Your Starter Guide to Data Governance

Mar 18

Every organization needs an effective data governance program. Insights gleaned from data can drive critical business decisions, fuel innovation, and provide value to the customer. That said, numerous examples demonstrate how data governance failures harm consumers and cause significant reputational and financial damage to the organization. The following three examples show what happens when an organization fails to anticipate how its customer data can leak or be misused.

In 2006, Netflix launched a competition called the Netflix Prize, offering a $1 million prize to anyone who could improve the company's recommendation algorithm by 10%. Netflix provided contestants with a dataset of user movie ratings, which they used to predict how users would rate movies they had yet to see. The dataset included anonymized user IDs, movie IDs, ratings, and timestamps. Researchers later showed that individuals could be re-identified by combining the Netflix dataset with publicly available data such as IMDb ratings. This exposure resulted in a class action lawsuit against Netflix. Even though the lawsuit was eventually dismissed, the damage to Netflix’s reputation was done.

In 2012, it was reported that Target had developed an algorithm that predicted which customers were pregnant based on their purchasing habits. However, Target's data governance processes failed to anticipate the consequences: ads mailed to customers' homes could reveal a pregnancy to other members of the household, as in the widely reported case of a pregnant teenager.

In our final example, it came to light in 2016 that employees at Wells Fargo had opened millions of unauthorized accounts and credit cards for customers without their knowledge or consent, often resulting in multiple unknown fees for unsuspecting customers. In addition to the reputational damage and millions of dollars in penalties, four mid-level executives were dismissed and stripped of their bonuses and stock awards.

Data Governance

Data governance establishes standards for data collection, storage, and analysis, ensuring accuracy and mitigating risks associated with regulatory non-compliance. Moreover, governance promotes ethical data practices, safeguarding individual privacy rights and societal norms.

However, governance approaches must balance innovation and control, empowering your organization while mitigating risks.

This delicate balance enables organizations to harness the transformative power of data while safeguarding against unintended consequences and ethical dilemmas. Through structured and automated governance frameworks, organizations can navigate the complexities of data management, fostering innovation while upholding accountability and trust.

Key Governance Practices

Let’s look at the key elements that make up robust Data Governance.

Data Quality

Data Quality Assurance ensures that the data used in decision-making processes is accurate, reliable, and relevant. It involves data validation, cleansing, and enrichment techniques to maintain high data integrity standards. Validation ensures data accuracy by verifying its conformity to predefined rules and standards. Cleansing involves identifying and correcting errors or inconsistencies in the data, enhancing its accuracy and completeness. Enrichment supplements data with additional attributes or information, improving its depth and usefulness for analysis. Together, these practices make the data more trustworthy, which makes data-driven insights more reliable and supports informed decision-making.
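To make the three steps concrete, here is a minimal sketch of a cleanse, validate, and enrich pipeline over plain dictionaries. The field names (customer_id, age, country), the rules, and the REGION_BY_COUNTRY lookup are all assumptions invented for illustration; a real program would take its rules from the governance policy.

```python
# Hypothetical lookup table used for enrichment; not part of any real system.
REGION_BY_COUNTRY = {"US": "Americas", "DE": "EMEA", "JP": "APAC"}


def cleanse(record: dict) -> dict:
    """Fix common inconsistencies: stray whitespace, casing, numeric strings."""
    cleaned = dict(record)
    for key in ("customer_id", "country"):
        if isinstance(cleaned.get(key), str):
            cleaned[key] = cleaned[key].strip()
    if isinstance(cleaned.get("country"), str):
        cleaned["country"] = cleaned["country"].upper()
    age = cleaned.get("age")
    if isinstance(age, str) and age.strip().isdigit():
        cleaned["age"] = int(age.strip())
    return cleaned


def validate(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    if not record.get("customer_id"):
        errors.append("customer_id is required")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        errors.append("age must be an integer between 0 and 120")
    if record.get("country") not in REGION_BY_COUNTRY:
        errors.append("country must be one of the known codes")
    return errors


def enrich(record: dict) -> dict:
    """Add a derived attribute (region) from the lookup table."""
    enriched = dict(record)
    enriched["region"] = REGION_BY_COUNTRY.get(record.get("country"))
    return enriched


def process(records):
    """Cleanse each record, then either enrich it or reject it with reasons."""
    accepted, rejected = [], []
    for raw in records:
        record = cleanse(raw)
        errors = validate(record)
        if errors:
            rejected.append((raw, errors))
        else:
            accepted.append(enrich(record))
    return accepted, rejected
```

Keeping rejected records together with the reasons they failed, rather than silently dropping them, gives the governance team something to measure and improve.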

In many cases, data needs to be anonymized to protect individual privacy. Anonymization challenges the data quality process, especially when trying to draw insights about group preferences and trends. Techniques like data profiling and outlier detection help maintain data integrity throughout the anonymization process. Additionally, post-anonymization validation ensures that the anonymized data retains its quality and usefulness for analysis.
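As a rough illustration of post-anonymization validation, the sketch below pseudonymizes an identifier with a keyed hash, generalizes age into bands, and then checks that the result still supports group-level analysis. The field names and SECRET_KEY are assumptions for the example. Note that, as the Netflix example shows, replacing identifiers alone does not guarantee that individuals cannot be re-identified, so treat this as one check among several rather than a complete approach.

```python
import hashlib
import hmac
from collections import Counter

# Assumption: in practice the key would live in a secrets manager, not in code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(user_id: str) -> str:
    """Replace an identifier with a keyed hash that is stable for the same input."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int) -> str:
    """Generalize an exact age into a 10-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(rows):
    """Drop the raw identifier and exact age; keep the rating untouched."""
    return [
        {
            "user": pseudonymize(r["user_id"]),
            "age_band": age_band(r["age"]),
            "rating": r["rating"],
        }
        for r in rows
    ]


def validate_anonymized(original, anonymized):
    """Return a list of problems; an empty list means all checks passed."""
    raw_ids = {r["user_id"] for r in original}
    problems = []
    if len(original) != len(anonymized):
        problems.append("row count changed")
    if any(a["user"] in raw_ids for a in anonymized):
        problems.append("raw identifier leaked")
    if Counter(r["rating"] for r in original) != Counter(a["rating"] for a in anonymized):
        problems.append("rating distribution changed")
    if len(raw_ids) != len({a["user"] for a in anonymized}):
        problems.append("distinct user count changed")
    return problems
```

The checks compare properties that analysts rely on (row counts, value distributions, distinct users) before and after anonymization, so a broken anonymization job is caught before the data is published.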

Security & Access Control

Access Controls and Securing Data are a cornerstone of any governance program. These controls define the security of an organization’s data by establishing policies and necessary validations to safeguard sensitive information. The governance policies should state what must be protected, not how to implement the controls. That way, the adopting teams (e.g., the domain teams publishing the data products; see our blog post on data mesh) can implement compliant controls in an automated manner appropriate for their systems.

It is common for teams to adopt robust access control mechanisms, including role-based access control (RBAC), attribute-based access control (ABAC), and least privilege principles. These mechanisms restrict access to authorized users based on their roles, responsibilities, and permissions. Access control elements must be automated to facilitate frictionless user entitlement and rapid remediation of security issues.
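A minimal sketch of how RBAC, ABAC, and least privilege can combine in a single automated check follows. The roles, the permission mapping, and the rule that PII datasets are limited to the owning department are hypothetical examples, not recommendations for any particular system.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping (RBAC). Anything not listed is denied.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "data_engineer": {"read", "write"},
    "steward": {"read", "write", "grant"},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    department: str


@dataclass(frozen=True)
class Dataset:
    name: str
    owner_department: str
    contains_pii: bool = False


def is_allowed(user: User, action: str, dataset: Dataset) -> bool:
    """Deny by default; allow only when both the role and attribute checks pass."""
    # RBAC: at least one of the user's roles must grant the action.
    role_ok = any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)
    if not role_ok:
        return False
    # ABAC: datasets containing PII are limited to the owning department.
    if dataset.contains_pii and user.department != dataset.owner_department:
        return False
    return True
```

Because unknown roles and unlisted actions fall through to a denial, adding a new role grants nothing until someone explicitly maps it to permissions, which is the least privilege principle in practice.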

Data must be encrypted in transit and at rest by default. Encryption helps prevent unauthorized access or data breaches. Governance should define the encryption strategies and ensure that they are in place. Again, the mechanisms to encrypt and secure data need to be automated.
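One way to automate the "ensure that they are in place" part is a policy check that runs against an inventory of data stores. The sketch below assumes a hypothetical StoreConfig record and illustrative policy values (AES-256 at rest, TLS 1.2 or newer in transit); the actual requirements would come from your governance body, and the inventory from your platform's own metadata.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical policy values chosen for illustration only.
APPROVED_AT_REST = {"AES-256"}
MIN_TLS_VERSION = (1, 2)


@dataclass
class StoreConfig:
    name: str
    at_rest_algorithm: Optional[str]          # None means unencrypted at rest
    tls_version: Optional[Tuple[int, int]]    # None means plaintext transport


def encryption_violations(store: StoreConfig) -> list:
    """Return the policy violations for one data store (empty if compliant)."""
    violations = []
    if store.at_rest_algorithm not in APPROVED_AT_REST:
        violations.append(f"{store.name}: at-rest encryption missing or not approved")
    if store.tls_version is None or store.tls_version < MIN_TLS_VERSION:
        violations.append(f"{store.name}: transport must use TLS 1.2 or newer")
    return violations


def audit(stores) -> list:
    """Collect violations across every store in the inventory."""
    return [v for store in stores for v in encryption_violations(store)]
```

Running a check like this on a schedule or in a deployment pipeline turns the encryption policy into something that is continuously verified rather than assumed.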

Ethics & Compliance

Compliance and Ethical Considerations are paramount for the organization, particularly regarding regulatory frameworks like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). To address compliance requirements, teams must ensure that data collection, storage, and processing practices align with legal mandates, such as obtaining explicit consent for data usage and providing mechanisms for data subjects to access, rectify, or delete their information. Additionally, robust security measures must be implemented to safeguard sensitive data and prevent unauthorized access. Digging into the details of frameworks like GDPR and CCPA is complex and outside the scope of this blog post. That said, they and similar frameworks must be thoroughly understood as part of the governance process.

Ethical considerations extend beyond legal compliance and encompass responsible data practices throughout the data lifecycle. This includes transparency in data collection methods, fairness in algorithmic decision-making, and accountability for potential biases or discriminatory outcomes. Organizations must actively mitigate the risks of privacy infringement, data misuse, and algorithmic bias by adhering to ethical guidelines, engaging in ongoing honest discussions, and incorporating diverse perspectives into their analyses. Such governance is typically accomplished via a steering committee or a community of data stewards. These bodies provide guidance that is enabled via automation.

Documentation

Documentation and transparency are essential for fostering trust and understanding in your organization's data science endeavors. Clear documentation of data sources, methodologies, and assumptions provides insights into the reliability and validity of analyses. Findings should be communicated transparently, highlighting both strengths and limitations. However, it's crucial to acknowledge that documentation and transparency alone cannot eliminate all uncertainties; they serve as guiding principles for promoting accountability and informed decision-making within your organization’s community.

The most common approaches are establishing and tracking data lineage, developing well-documented code that transforms and/or creates features, and establishing data books. Lineage creates traceability back to the source data. Well-documented code ensures that data transformations are repeatable. Finally, publishing and maintaining a data book (a document that describes your data sets) ensures that consumers understand what the data is and how it was intended to be used.
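Lineage can be pictured as a graph from each derived data set back to its inputs. The sketch below uses a hypothetical in-memory LINEAGE dictionary with made-up data set names; real deployments usually capture this in a catalog or lineage tool, but the traversal idea is the same.

```python
# Hypothetical lineage records: each derived data set lists its direct inputs
# and a short description of the transformation that produced it.
LINEAGE = {
    "customer_features": {
        "inputs": ["clean_orders", "clean_customers"],
        "transform": "join and aggregate",
    },
    "clean_orders": {
        "inputs": ["raw_orders"],
        "transform": "deduplicate, fix currencies",
    },
    "clean_customers": {
        "inputs": ["raw_customers"],
        "transform": "normalize addresses",
    },
}


def trace_sources(dataset: str, lineage: dict) -> set:
    """Walk upstream until reaching data sets with no recorded inputs (the sources)."""
    sources, stack, seen = set(), [dataset], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles and repeated paths
        seen.add(current)
        inputs = lineage.get(current, {}).get("inputs", [])
        if not inputs:
            sources.add(current)
        stack.extend(inputs)
    return sources
```

The same records can feed a data book entry, so that the description consumers read and the lineage auditors trace come from one place.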

Collaboration

Collaboration and Communication are vital in data governance to ensure alignment, transparency, and accountability across teams. They facilitate knowledge sharing, consensus-building, and effective decision-making, ultimately enhancing the success and sustainability of data governance initiatives.

Enabling collaboration between data scientists, stakeholders, and governance teams requires fostering a culture of openness, trust, and mutual respect. Regular meetings, workshops, and forums promote cross-functional collaboration and the open exchange of ideas. Clearly defined roles and responsibilities, along with the setting of expectations, must exist for each stakeholder group to ensure clarity and accountability.

Quick Start to Kicking Off Your Governance Process

The Pre-Work, Working Group

The working group should comprise cross-functional leaders/executives across the organization. Its purpose is to get a “lay of the land” before the organization establishes an organization-wide governance program.

The group should first understand what data exists in the organization, what it contains, what it can be used for, and where it can be used. The next step is compiling a list of the compliance frameworks the organization will likely be subject to. Finally, this group needs to identify/quantify the intended use cases for the data.

Once the groundwork is done and documented, the organization can create its governance framework and processes. This process must be iterative. Monolithic approaches to governance can be problematic as they often lack the flexibility to adapt to modern organizations' diverse and evolving needs. The four process elements below should be approached with a cyclic mindset, allowing for continual refinement and adaptation based on the organization’s evolving requirements and feedback.

Define Objectives and Scope

With input from the groundwork, senior leadership should clearly articulate the goals and scope of your data governance initiative, focusing on areas such as data quality, compliance, security, and decision-making support. The objectives must also clearly articulate the business objectives the governance framework will achieve, such as increased business partner revenue growth through anonymized and balanced customer class registration data.

Select Team Members and Establish Roles and Responsibilities

The organization needs a diverse and cross-functional team composed of members from all key business units. Functional representation from data management, compliance, and IT security is also essential.

The business units focus on how the data will benefit the business, customers, or both. The functional team representatives provide compliance, technical, and security insights.

The groundwork group usually seeds the creation of this cross-functional team. They are usually closest to understanding the problems that the organization is trying to solve with data. Over time and iterations, this group will change and morph as the needs and understandings evolve.

Develop Policies and Standards

With roles and responsibilities established, the team can create robust policies, standards, and metrics/KPIs. Successful teams start with a single use case and draft the initial policy that frames the life cycle of the data for that use case. They then solicit feedback from the business unit(s), customers, and implementation teams.

Implementation, Monitoring, and Continuous Improvement

The team should participate in implementing and applying the defined policies and standards, preferably with test cases and pilots for each use case. Metrics from the automation that implements the governance standards should tie back to the established metrics and KPIs. These metrics provide transparency around the success/failure of the implementation. They also provide feedback for the improvement and evolution of objectives and scope for the governance framework.
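As a sketch of how automation metrics might tie back to KPIs, the example below rolls up pass/fail results from automated checks into a pass rate per KPI and compares it with a target. The KPI names and target values are hypothetical; your own would come from the policies and standards defined in the previous step.

```python
from collections import defaultdict

# Hypothetical KPI targets: minimum share of automated checks that must pass.
KPI_TARGETS = {"data_quality": 0.98, "access_control": 1.0}


def kpi_report(check_results, targets):
    """Summarize (kpi_name, passed) tuples emitted by automation against targets."""
    passed, total = defaultdict(int), defaultdict(int)
    for kpi, ok in check_results:
        total[kpi] += 1
        passed[kpi] += int(ok)
    report = {}
    for kpi, target in targets.items():
        # A KPI with no results has no pass rate and is reported as not met.
        rate = passed[kpi] / total[kpi] if total[kpi] else None
        report[kpi] = {
            "pass_rate": rate,
            "target": target,
            "met": rate is not None and rate >= target,
        }
    return report
```

Treating a KPI with no data as not met is a deliberate choice here: a missing measurement is itself a signal that the implementation needs attention.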

Final Thoughts

At the risk of flogging a fossilized equine, creating and implementing governance of any kind should be iterative. Iteration enables small and focused implementations, fast feedback from end users, and short cycles—principles adopted by “everything”-ops.

When your operations teams begin reviewing and selecting tools, they should default to cloud-based tools. Cloud-based tools are generally less expensive, more straightforward to integrate with (assuming your data is already in the cloud), and less costly to maintain while providing a broader range of options than their on-premises counterparts.

Data Governance, Data Mess, Data Swamp

Paul Karsten