
Overview

With the explosion of Artificial Intelligence (AI) in recent years, driven by the development of generative AI and by the arrival on the market of applications that exploit Large Language Models (LLMs), an ever-growing public has come to perceive the incredible potential of these tools.

Fully exploiting the features made available by Artificial Intelligence in a business context means analyzing which use cases can be implemented and creating solutions that improve the efficiency and effectiveness of entire business processes. At the same time, however, a series of risks and critical issues related to the use of these technologies must be taken into account.

The proposed AI Governance approach is based on Quantyca's many years of experience in the field of Data & Metadata Management. Since its foundation in 2009, moving from the world of Business Intelligence to Data Integration and up to the management of Cloud platforms, Quantyca has always paid particular attention not only to project delivery, but also to data governance and metadata management.

We are convinced that many of the practices and policies defined for the management of structured data should also be applied to projects related to Artificial Intelligence, where the information contained in unstructured or semi-structured data is taking on ever greater value (e.g., company procedure documents, institutional website FAQs, multimedia content).

In particular, in the definition and development of solutions that exploit Artificial Intelligence, Quantyca guarantees particular attention to the following principles:

Data-centric approach
Data is a product and should be considered a business asset; thus, it must be managed accordingly. This corporate asset should be valued and shared to be leveraged in various use cases.
Platform thinking
Our vision revolves around a unified platform where analytical and service integrations converge. The interoperability of its components, particularly data products, is fundamental to us.
Data Governance by design
Governance is an integral part of the projects we work on. We conceive it as a proactive, sustainable, and measurable process.

 

Building on these principles and on the strategic consultancy experience gained over the years, Quantyca supports and guides companies in implementing an AI strategy based on the following aspects:

[Figure: Quantyca's AI strategy]

Challenges

The main aspects to consider in an enterprise context when using new technologies provided by Artificial Intelligence are as follows:

  • Operational Management

Given the various functionalities that can be leveraged with a Large Language Model (LLM), enterprise use cases are numerous. The most interesting scenarios include using LLMs to understand functional context and generate structured information (Named Entity Recognition) or to interact with external tools (Tool Calling). In these contexts, it is generally possible to significantly streamline Data Management activities, such as automatic ingestion from document images/scans, or to introduce natural-language user interfaces for existing systems, like chatbots or voice assistants.
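As a minimal illustration of the Named Entity Recognition scenario, the sketch below asks a model to return structured entities as JSON; `call_llm` is a hypothetical stand-in for whatever model endpoint (vendor API or local model) is actually in use, and the defensive parsing reflects the fact that LLM output is never guaranteed to be well formed.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM endpoint (vendor API or local model)."""
    raise NotImplementedError("wire this to your model provider")

# Ask for a machine-readable answer; double braces escape the JSON example.
NER_PROMPT = """Extract every company name and date from the text below.
Answer only with a JSON object: {{"companies": [...], "dates": [...]}}.

Text: {text}"""

def extract_entities(text: str) -> dict:
    raw = call_llm(NER_PROMPT.format(text=text))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The model may produce malformed JSON: fail closed, not loudly.
        return {"companies": [], "dates": []}
```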

The number of parameters composing an LLM, and consequently the computational power required to train it, make it difficult for most organizations to develop such models: only a few players can afford, both technologically and economically, to create them. The quality of a model heavily depends on the type of training and "instruction fine-tuning" it has undergone, as well as on the data used for training.

Does this mean it is impossible to modify the behavior of an LLM? Must it be used exactly as provided by a vendor, or as publicly available?

In reality, two fundamental aspects can set or modify an LLM’s behavior: prompt engineering and fine-tuning.

Thus, the operational management of such models in an enterprise context should focus on the following points:

How do I best manage the prompts generated by my users?
How can I collect the necessary data to effectively perform fine-tuning on a model?
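To make the first of these levers concrete, here is a minimal sketch of prompt engineering: a few-shot template whose worked examples steer the model's behavior at inference time, with no retraining involved (the review text and labels are purely illustrative).

```python
# A few-shot prompt: the embedded examples show the model the expected
# behavior and output format before it sees the real input.
FEW_SHOT_SENTIMENT = """Classify the sentiment of each review as positive or negative.

Review: "The support team solved my issue in minutes."
Sentiment: positive

Review: "The dashboard keeps crashing on login."
Sentiment: negative

Review: "{review}"
Sentiment:"""

prompt = FEW_SHOT_SENTIMENT.format(review="Setup was painless and well documented.")
# `prompt` is now ready to be sent to whichever LLM endpoint the application uses.
```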

  • Governance

Just as in Data Management, the use of Large Language Models involves significant aspects related to Governance.

The risks associated with employing these tools can be numerous:

When using a Large Language Model from a third-party provider, you’re essentially using APIs or an application designed to allow users to interact with the model. A well-known example is the ChatGPT application, which enables interaction with GPT-x models. The issue of Data Ownership concerns how the data submitted by users to the service provider is used, specifically how the conversations users have with the chatbot are handled.

To improve model quality, it is crucial to provide high-quality textual information during training. Conversations, already structured and implicitly “validated” by end users, are an excellent example. Additionally, users often have the option to provide feedback on the model’s response, even if only to indicate whether it was positive or negative.

Using these systems in low-risk or “recreational” contexts poses few risks. However, in an enterprise setting, several aspects need to be considered.

For example, consider an unregulated use of a Large Language Model, employed merely to summarize a verbose company document. The user submits parts of the document to the service and asks the model to summarize the content. If the document contains sensitive and limited-distribution information, there is a risk of making such information public. The Large Language Model provider could use the user’s conversation for future training cycles, potentially including sensitive information in the training set and making it available in responses to other users in the future.


As previously mentioned, the quality of Large Language Models heavily depends on the data and information used for their training. The volume of textual data needed to train these models is rapidly becoming a real bottleneck: the datasets used are directly sourced from all publicly available text on the internet and consist of billions of samples.

However, not all public text on the internet can be considered high quality. Additionally, some sources may present a certain cognitive bias.

For example, consider a model trained on a thousand textual documents with a neutral sentiment on a specific political topic, plus a single document expressing a positive stance on a particular viewpoint. Given that text generation is fundamentally statistical, that single document will not significantly influence the generated responses or introduce bias. The situation changes as the percentage of documents carrying such biases increases.

Since training sets and training methods are the intellectual property of the vendors providing these models, and are disclosed to the public only in a few commendable cases, users must trust that the text used for training is of high quality and free from such cognitive biases. The risk nevertheless remains: studies have shown that, in some cases, models exhibit a preference for generating responses with a certain political, gender, or religious inclination.


If these solutions are used to generate text that is then shown to end users, it is therefore necessary to ensure that this text contains no discriminatory or stereotyped information, nor information reflecting cognitive biases of any kind.

In the context of Large Language Models, “hallucinations” refer to the generation of text that appears coherent in response to a user’s prompt but does not correspond to reality. In these cases, the model produces information that is not based on the data it was trained on and does not follow any learned patterns. This leads to inaccurate or nonsensical results.

For example, if one asks the model to provide titles of scientific papers proving that the Earth is flat, even though the model has never encountered such documents during training, it might still know the typical structure of a scientific paper’s title. Based on the request, it could invent coherent and formally valid titles on the topic, which would obviously be entirely incorrect.

Generally, when using an LLM, one must always consider the statistical nature underlying its functionality. The intelligence these tools display is actually simulated and stems from a simple task: estimating the most probable next word in a sentence given the previous input. A Large Language Model may perform this task with varying degrees of skill, but regardless, the risk of hallucination is always present due to the way these models are constructed.
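As a toy illustration of this statistical core, the sketch below turns a handful of raw next-token scores (logits) into a probability distribution and samples from it; real models do the same over vocabularies of tens of thousands of tokens, but the mechanism is identical, and it is also why a low-probability (possibly wrong) continuation can always be drawn.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Softmax over raw scores, then sample one token from the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Sampling, not argmax: even unlikely tokens are occasionally chosen,
    # which is one structural reason hallucinations cannot be ruled out.
    return random.choices(list(probs), weights=list(probs.values()))[0]

# "The capital of France is ..." — toy scores for three candidate tokens.
print(sample_next_token({"Paris": 5.0, "London": 2.0, "flat": 0.1}))
```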

Solutions

To facilitate the adoption of new Artificial Intelligence solutions and to govern and control how these tools are used, a fundamental aspect to consider is metadata management.

A common experience in the field of Data Management, with the advent of DevOps practices first and MLOps later, is the systematic collection of the metadata and artifacts produced by developers, so as to enable automated deployments and facilitate new developments.

For example, in MLOps practices, everything that can be considered a decision factor in a data science project is collected and managed by a suite of services: the data used for training, how features were engineered, the choice of model, the initial configurations, and the results obtained. This extends to the deployment of the model, the collection and use of its outputs to enable new operational processes or provide KPIs for existing ones, and the monitoring of performance, and its potential degradation, over time.

Managing this information enables a better experience for developers by sharing what has been produced in other scenarios and facilitating integration and improved control over models once they are in production environments.

The same approach can and should be applied to the use of Large Language Models (LLMs). From an operational perspective, there are two main ways to modify and customize an LLM’s behavior: prompt engineering and fine-tuning.

Therefore, data and metadata must be collected to best manage these activities.

Prompt engineering refers to all the procedures necessary to create, maintain, and manage the prompts used to instruct Large Language Models.

Exactly as happens with the feature store in the context of Machine Learning, it is essential to save all the prompts used in a dedicated store. Examples of the metadata to collect for each prompt include:

Category
A “certified” prompt used in enterprise-level applications or a prompt used in a specific context.

Use case
Used to instruct a chatbot, implement a specific RAG strategy, or a tool-calling agent, etc.

Prompt Strategy
The strategy adopted in the prompt, such as zero-shot, few-shot, chain of thought.

Context Description
A natural language description of what the prompt is used for, enabling semantic searches.

Prompt Parameters
For prompts that accept input parameters, a classification and description of how these parameters are used within the prompt.

Prompt Quality
A parameter expressing the quality of the results obtained using the prompt, possibly updated automatically based on user feedback.

Collecting this information in a structured manner allows users to avoid starting from scratch. Instead, they can search within a Prompt Repository to find an existing version as a starting point and filter for prompts that have achieved better results in certain use cases.
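A minimal sketch of such a Prompt Repository, assuming a simple in-memory store (a production version would sit on a database or a dedicated catalog service, and the field names below are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class PromptRecord:
    name: str
    template: str              # the prompt text, possibly with {placeholders}
    category: str              # e.g. "certified" vs. context-specific
    use_case: str              # e.g. "chatbot", "RAG", "tool-calling agent"
    strategy: str              # e.g. "zero-shot", "few-shot", "chain-of-thought"
    description: str           # natural-language context, enabling search
    parameters: dict[str, str] = field(default_factory=dict)
    quality: float = 0.0       # updated over time from user feedback

class PromptRepository:
    def __init__(self) -> None:
        self._records: list[PromptRecord] = []

    def register(self, record: PromptRecord) -> None:
        self._records.append(record)

    def search(self, use_case: str, min_quality: float = 0.0) -> list[PromptRecord]:
        """Return matching prompts, best rated first, so nobody starts from scratch."""
        hits = [r for r in self._records
                if r.use_case == use_case and r.quality >= min_quality]
        return sorted(hits, key=lambda r: r.quality, reverse=True)
```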

Another fundamental aspect of using Large Language Models (LLMs) is logging.

To properly fine-tune a model, it is necessary to submit a series of conversations—pairs of requests made to the model and responses received—so that the LLM can recognize specific patterns in a given context and replicate them in its responses. The more high-quality information used during fine-tuning, the better the results.

For this reason, it is essential, when deploying any application that uses LLMs, to accurately save every request made to the model by users, as well as each response provided by the model. Additionally, it is crucial to establish methods for collecting user feedback to assess whether a response is considered correct or at least reliable, as opposed to incorrect or imprecise.

This can be achieved using logging services provided by cloud providers, as well as various open-source frameworks that are increasingly emerging in the context of LLMs (e.g., LangChain).
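As a minimal, framework-free sketch of such an interaction log, the snippet below appends every request/response pair, plus any later feedback, to a JSONL file; in production this would typically be replaced by a cloud logging service or a dedicated LLM-tracing framework.

```python
import json
import time
import uuid
from pathlib import Path

LOG_FILE = Path("llm_interactions.jsonl")

def log_interaction(prompt: str, response: str, model: str) -> str:
    """Append one request/response pair; the returned id links later feedback."""
    interaction_id = str(uuid.uuid4())
    entry = {"id": interaction_id, "ts": time.time(),
             "model": model, "prompt": prompt, "response": response}
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return interaction_id

def log_feedback(interaction_id: str, positive: bool) -> None:
    """Record the user's verdict so good conversations can feed fine-tuning."""
    entry = {"feedback_for": interaction_id, "ts": time.time(), "positive": positive}
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```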

Besides the benefit of customizing a model based on user feedback, thereby improving response quality, another positive aspect of fine-tuning is cost reduction. It is possible to use a less powerful, and often less expensive, model that, through fine-tuning, can achieve the performance of a more powerful model.
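Building on the log format sketched above, positively rated conversations could then be exported as a fine-tuning set; the chat-style JSONL shown here is one widely used format, and the exact schema depends on the model provider.

```python
import json
from pathlib import Path

def export_finetune_set(log_file: Path, out_file: Path) -> int:
    """Export request/response pairs with positive feedback as chat-style JSONL."""
    interactions, positive_ids = [], set()
    for line in log_file.read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        if "feedback_for" in rec and rec.get("positive"):
            positive_ids.add(rec["feedback_for"])
        elif "prompt" in rec:
            interactions.append(rec)
    written = 0
    with out_file.open("w", encoding="utf-8") as f:
        for rec in interactions:
            if rec["id"] in positive_ids:
                sample = {"messages": [
                    {"role": "user", "content": rec["prompt"]},
                    {"role": "assistant", "content": rec["response"]},
                ]}
                f.write(json.dumps(sample, ensure_ascii=False) + "\n")
                written += 1
    return written
```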

Considering the aspects related to data governance and the risks involved in using the models previously described, a solution to mitigate these risks is the implementation of guardrails.

The term “guardrail” refers to an application component that performs filtering operations both at the request and response stages when using a Large Language Model. It is, of course, possible to use more than one component to govern different aspects, but generally, the pipeline to be constructed is as follows:

 

[Figure: guardrail pipeline filtering both the requests sent to the model and the responses it generates]

Guardrails allow for a series of specific checks on both requests and responses, such as verifying whether sensitive information has been included in the interaction with the model or ensuring that a certain formal structure is present in the model’s output.

In general, the application logic within guardrails can be of two types: deterministic formal logic, where the checks are precise and quantitative, or logic implemented with artificial intelligence itself, such as a qualitative check for the presence of topics with particular ethical impact in the text. In this second scenario, the aim is usually to use the best available model to achieve the highest possible quality. Even so, it would be a mistake to consider such a process free of errors.

Thus, guardrails constitute an important element of control and verification for both the information provided by users and that generated by the models.
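A minimal guardrail pipeline might look like the sketch below, with a deterministic check implemented via regular expressions and a placeholder for an AI-based qualitative check; the patterns and function names are illustrative, not a prescribed implementation.

```python
import re
from typing import Callable

# Deterministic guardrail: precise, quantitative pattern checks.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

def redact_pii(text: str) -> str:
    """Mask sensitive identifiers before text reaches (or leaves) the model."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IBAN_RE.sub("[IBAN]", text)

def ethics_check(text: str) -> bool:
    """Placeholder for an AI-based qualitative check (itself not error-free)."""
    return True  # wire to a moderation model or a dedicated classifier

def guarded_call(prompt: str, call_llm: Callable[[str], str]) -> str:
    safe_prompt = redact_pii(prompt)        # guardrail on the request
    response = call_llm(safe_prompt)
    if not ethics_check(response):          # guardrail on the response
        return "The generated answer was blocked by policy checks."
    return redact_pii(response)             # never echo sensitive data back
```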

 

The complete process

1. Definition of an operational strategy for the use of LLMs
2. Management of the services necessary for timely life-cycle management
3. Activation of logs for every request/response generated in interaction with an LLM
4. Use of a guardrail framework for better control of LLMs

Benefits

Correct governance of AI tools, achieved through the solutions described, brings some important advantages:

Simplified operational management
Developers can focus on adding value through the solutions they build, without having to manage platform aspects or the automatic collection of the metadata needed for future use.
More control
Introduction of methodologies and architectural components dedicated to controlling the information submitted to the models and generated by them, reducing the risk of reputational damage caused by the use of generative AI.
