Navigating the complexity of data governance and AI models

December 01, 2023

As businesses rapidly adopt Generative AI, experts warn that neglecting robust data governance can limit potential and pose security and ethical risks

Businesses are embracing AI at an accelerating pace, enticed by the promise of data-driven insights and automation.

But as organisations move quickly to employ Generative AI models in order to transform their business, they are often failing to consider the importance of established data governance capabilities.

According to a report by Deloitte, data governance plays a pivotal role in fostering innovation in the evolving AI landscape by ensuring responsible data practices, mitigating biases, and safeguarding privacy. A robust data governance strategy, it says, is the key to unlocking the full potential of generative AI use cases.

However, this rapid adoption poses a myriad of challenges around data governance and model management. Without robust data governance capabilities, the potential impact and value added by Generative AI will be severely limited and may even expose organisations to data and cybersecurity risks.

The multifaceted challenges of compliance, quality and security

Jinender Jain, Senior VP and Sales Head UK and Ireland at Tech Mahindra, highlights this complexity. “Data governance and AI model management present complex challenges for businesses. Compliance with data privacy regulations, such as GDPR and CCPA, is non-negotiable to avoid legal consequences and safeguard consumer trust.” Jain also emphasises the pivotal role of data quality and privacy: “Maintaining data quality is equally critical, as AI relies on accurate, unbiased data for dependable outcomes. Data cleansing and validation processes must be continuously upheld to ensure data integrity.”

Zoë Webster, AI Director at BT Group, echoes these sentiments, particularly when it comes to data quality: “Ensuring the quality and quantity of data is paramount. To develop robust AI models, you need sufficient, high-quality data. Different AI techniques have varying data appetites, with deep learning and large language models being particularly data-hungry.”

The issue of security is another thorny challenge, especially in sectors like healthcare, retail, and BFSI, Jain notes. “Businesses that fail to protect their customer's data run the risk of losing their trust, leading to a loss of sales, ruining the company’s reputation, and potentially subjecting them to legal liability. In an environment where consumer loyalty and relationships are everything, businesses must take every measure possible to keep customer data safe and secure,” he says. “Protecting customer trust and mitigating cybersecurity risks are just a few examples of why privacy and security are vital to businesses.”

Ethical concerns and the risk of biases

Nina Bryant, Senior Managing Director in the Technology segment at FTI Consulting, turns the spotlight on ethical considerations: “Research has identified critical risks within AI algorithms, including racial, gender and socioeconomic biases and age verification issues, alongside significant data protection risks.” Bryant also points to the attention AI is drawing from regulators: “There is also significant regulatory focus, with different jurisdictions taking alternative approaches, especially around assessing risks, transparency, explainability and accountability, which will contribute to making AI governance equally challenging and important.”

Webster also takes up the theme of ethical considerations, commenting on the potential for biases: “When AI is trained on historical data, it has the potential to inherit biases unless those biases are consciously addressed.” She adds that the sources on which AI models are typically trained may themselves introduce biases.

“It's worth noting that the well-known large language models are generally trained on vast amounts of data from the World Wide Web. The WWW does not necessarily reflect the values and experiences of everyone across the world in equal measure, so this may inherently introduce biases. Data scientists and their teams must prioritise understanding their data sets to avoid ‘baking-in’ bias into the model.”

Long-term reliability

Jason Tooley, VP North EMEA at Informatica, raises concerns around the role of effective data governance in ensuring long-term AI reliability. As he puts it, organisations are tempted to ‘jump in at the deep end’.

“In reality, poor quality data leads to incorrect outcomes. Without appropriate data governance to support a self-service model for data consumers, organisations risk creating data insights that are damaging in the short-term,” Tooley says. “While in the long-term, it reduces the potential credibility of AI technologies.”

Effective data governance goes beyond the immediacy of deployment and also plays a role in ensuring ongoing reliability. Webster underscores this point: “Data governance includes continuous monitoring and management of the data feeding the AI model, ensuring that it aligns with the data on which the model was trained.

“A primary concern is data drift, which refers to changes in the data used to feed the model. This can occur for various reasons, such as a breakdown in data feeds and fluctuations in the behaviour of the entities the data relates to, for example, in customer behaviour caused by events like a pandemic. When data drift happens, it can significantly affect the model's performance. Without proper governance, you might end up with inaccurate and unreliable AI output.”
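
One common way to watch for the kind of data drift Webster describes is to compare the distribution of a feature in live data against the data the model was trained on. The sketch below is illustrative only and is not drawn from BT Group or any of the contributors. It computes a Population Stability Index (PSI) in plain Python; the function names, the choice of ten bins and the 0.25 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Return the PSI between a reference (training) sample and a current (live) sample.

    Bins are equal-width and derived from the reference sample only
    (an assumption; quantile bins are another common choice).
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    low, high = min(reference), max(reference)
    if high == low:
        raise ValueError("reference sample has no spread")
    width = (high - low) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            # Values outside the reference range are clamped into the edge bins.
            index = int((value - low) / width)
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # A small floor avoids taking log(0) when a bin is empty.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_share = proportions(reference)
    cur_share = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


def has_drifted(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.25,  # hypothetical alert level, to be tuned per use case
) -> bool:
    """Flag drift when the PSI exceeds the chosen threshold."""
    return population_stability_index(reference, current) > threshold
```

In practice a check like this would run on a schedule for each input feature, with an alert prompting someone to investigate whether a data feed has broken or whether the underlying behaviour has genuinely changed.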

Another critical aspect of maintaining AI reliability, Webster says, is recognising and addressing different forms of drift, not just data drift. “It could be technical issues within the model, such as components not functioning as intended. Effective AI governance ensures these issues are detected and resolved promptly.”

Transparency and accountability

When it comes to transparency and accountability, Bryant considers it essential for effective data governance to incorporate checks and balances: “Core to effective data governance is a series of checks and balances, procedures and controls, that assess the risk of potential harm from new AI solutions. These ensure that as AI is developed key considerations are managed and documented to drive transparency and reduce the risk of unintended bias.”

Tooley stresses the role of a unified data governance model for achieving transparency: “Increasingly, organisations need to move towards a single platform that can connect, integrate and automate all of the data management capabilities. This approach provides greater visibility, but it also offers synergy and simplicity. It ensures organisations can integrate data from multiple points and apply powerful AI principles, while ensuring the quality, governance and traceability of data is built into the underlying data management models for AI.”

As Webster concludes, ultimately, humans should be accountable for AI. “While some people talk about giving AI systems legal rights, accountability must rest with those who make decisions about AI use and deployment. It's the responsibility of humans to ensure that AI systems are governed correctly, that biases are addressed, and that ethical considerations are upheld. Having strong data and AI governance practices in place helps uphold accountability by guiding the responsible use of AI.”
