Quantyca Technologies

Quantyca is proud to be a part of the Build with Confluent initiative. By verifying our streaming-based use cases with Confluent, you can have confidence that our Confluent-based service offering is not only built on the leading data streaming platform but also verified by the experts at Confluent.

Confluent is the data streaming platform that is pioneering a fundamentally new category of data infrastructure that sets data in motion.
Confluent’s cloud-native offering is the foundational platform for data in motion – designed to be the intelligent connective tissue enabling real-time data, from multiple sources, to constantly stream across the organization. With Confluent, organizations can meet the new business imperative of delivering rich, digital front-end customer experiences and transitioning to sophisticated, real-time, software-driven backend operations.

Monolithic architectures limit scalability and business agility, prompting a shift from app-centric to data-centric approaches. Data is now viewed as a product, optimizing value for its consumers. A data product, the smallest deployment unit within a data platform, comprises data, metadata, infrastructural components, and application code.

Data Governance plays a crucial role in ensuring interoperability between data products. It is federated and automated, providing end-to-end governance across technologies. It improves the developer experience with automated, self-service capabilities and gives consumers meaningful information about data availability, characteristics, semantics, usage, and quality.

Quality is governed through automatic policy enforcement, applied at data product deployment time, to accept or reject instances, and throughout their life cycle through continuous monitoring. The effectiveness and sustainability of the approach rest on implementing quality gates directly next to the data sources rather than at the various points that consume their outputs. In real-time applications this increases latency, but it is a worthwhile trade-off in highly regulated contexts or analytical platforms. Centralizing policies and their application across the platform, rather than relying on local, fragmented policies, is key: it facilitates management and allows policies to be organized into suites. A holistic approach is adopted to manage the life cycle of cross-technology data products.

The Confluent Cloud platform offers a wide variety of features required by data governance practices.
The Schema Registry is certainly the first among these: it allows data contracts to be defined and enforced on the data exchanged between producers and consumers, regulating not only the physical schema but also the semantics and permitted uses.
Thanks to Advanced Stream Governance, it is possible to define domain rules applied to a schema and therefore to all messages that use it. It is also possible to invoke custom policies defined externally and perform custom actions based on their outcome.
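
As an illustrative sketch, such a domain rule could be attached to a schema's rule set as shown below. The rule type (ODM_POLICY), the policy name, and the dead letter queue handling are hypothetical placeholders; the structure follows Confluent's data contracts rule set format, which may vary by platform version.

  {
    "ruleSet": {
      "domainRules": [
        {
          "name": "orderPolicyCheck",
          "kind": "CONDITION",
          "type": "ODM_POLICY",
          "mode": "WRITE",
          "expr": "order-validation-policy",
          "onFailure": "DLQ"
        }
      ]
    }
  }

Here the rule expression carries the name of a policy that is defined and versioned centrally, rather than an expression evaluated locally.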

At Quantyca, we have launched the Open Data Mesh Initiative and made our internal specification for describing data products, the Data Product Descriptor Specification (DPDS), open-source (https://dpds.opendatamesh.org/).
The initiative has also launched an open source data-ops platform, the Open Data Mesh Platform, that manages the full lifecycle of a data product from deployment to decommissioning.

This use case pertains to a data platform structured into data products, governed in our scenario by this open-source specification and data-ops platform, with real-time data ingestion facilitated by Confluent Cloud services and the requirement to apply quality gates that enforce centrally defined policies. The integration is accomplished by combining the Confluent Schema Registry and Advanced Stream Governance with the Open Data Mesh Platform’s services.

Value Proposition

The solution addresses the following needs:

  • Exploit a wider syntax than the one currently supported by Schema Registry’s domain rules 
  • Reference a central repository for policy definitions 
  • Apply policy enforcement as close as possible to data producers 
  • Define custom behaviour to handle bad data 

 

It leverages the following building blocks in Confluent Cloud:

  • Advanced Stream Governance to enable usage of domain rules
  • Schema Registry to bind policies defined externally to schemas

 

Its key advantages are:

  • Central, technology-agnostic repository for policy management: avoids redundancy and facilitates integration
  • Proactive quality gates: data is verified before publication, enabling effective quality management

Solution architecture diagram

Each time a data producer wants to write a message, a rule executor is used to talk to the Open Data Mesh Platform’s Policy Service through a REST API call. This service is responsible for verifying compliance with policies defined centrally in a Policy Server. The rule executor uses the message and the associated schema to call the Policy Service: the schema specifies, as domain rules, the names of the policies to check, since they are part of the data product’s data contract. Policy definitions are managed centrally on the Policy Server; this makes it possible to organize them into suites and to manage their enforcement across different technologies while keeping a single repository. After collecting the verification result, a rule action is performed, for example sending an email and forwarding the message to a dead letter queue if the requirements aren’t satisfied.
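
A minimal sketch of such a rule executor is shown below. It assumes the RuleExecutor extension point from the Confluent Schema Registry client (package io.confluent.kafka.schemaregistry.rules) and a hypothetical /validate endpoint on the Policy Service; the class name, URL, and response contract are illustrative placeholders, not the actual adapter published by Quantyca.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  import io.confluent.kafka.schemaregistry.rules.RuleContext;
  import io.confluent.kafka.schemaregistry.rules.RuleException;
  import io.confluent.kafka.schemaregistry.rules.RuleExecutor;

  // Illustrative custom rule executor that delegates validation of each
  // message to the Open Data Mesh Policy Service over REST.
  public class OdmPolicyRuleExecutor implements RuleExecutor {

    // Hypothetical Policy Service endpoint; in a real deployment this would
    // come from the executor's configuration.
    private static final String POLICY_SERVICE_URL =
        "https://policy-service.example.com/api/v1/policies";

    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public String type() {
      // Must match the "type" declared in the schema's domain rule.
      return "ODM_POLICY";
    }

    @Override
    public Object transform(RuleContext ctx, Object message) throws RuleException {
      // The domain rule's expression carries the name of the centrally defined policy.
      String policyName = ctx.rule().getExpr();
      try {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(POLICY_SERVICE_URL + "/" + policyName + "/validate"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(String.valueOf(message)))
            .build();
        HttpResponse<String> response =
            http.send(request, HttpResponse.BodyHandlers.ofString());
        // For a condition rule, returning true lets the message through and
        // false triggers the configured onFailure action (e.g. the dead letter queue).
        return response.statusCode() == 200;
      } catch (Exception e) {
        throw new RuleException("Call to the Policy Service failed", e);
      }
    }
  }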


Demo of the solution

Imagine a data product responsible for the real-time production of orders managed by an e-commerce portal.

We want the orders to undergo a validation that only lets through those that meet the requirements, signaling and discarding the others, before they are propagated to the data warehouse, which follows near-real-time analytical logic. The data product is represented through a descriptor that specifies data, metadata, application code, and the necessary infrastructural components according to the DPDS specification. In our case, the orders data product has an associated topic on Confluent and a schema on the Schema Registry where the policies to be verified are indicated, referencing by name policies defined centrally in a Policy Server. When a producer tries to write a message, a quality gate is implemented through a custom policy definition that behind the scenes makes a REST call to a service of our framework and receives the outcome of the validation. Invalid messages are quarantined on a dead letter queue topic for subsequent management, and a report is sent by email.
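
To sketch the "handle bad data" side of the demo, the snippet below assumes the companion RuleAction extension point from the same Schema Registry client package; the action name and the notification logic are hypothetical placeholders. In the demo, the actual quarantining is handled by the dead letter queue, while the report is sent by email.

  import io.confluent.kafka.schemaregistry.rules.RuleAction;
  import io.confluent.kafka.schemaregistry.rules.RuleContext;
  import io.confluent.kafka.schemaregistry.rules.RuleException;

  // Illustrative rule action triggered when the policy check fails:
  // it records the violation and notifies the data product team.
  public class OdmNotifyAction implements RuleAction {

    @Override
    public String type() {
      // Must match the action name referenced by the rule's onFailure setting.
      return "ODM_NOTIFY";
    }

    @Override
    public void run(RuleContext ctx, Object message, RuleException ex) throws RuleException {
      String policyName = ctx.rule().getName();
      // Placeholder notification: a real implementation could send the email
      // report here or push the violation to a monitoring system.
      System.err.printf("Policy '%s' rejected message: %s%n", policyName, message);
    }
  }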

Resources

  • Slide (Free, 27/05/2024): Quality gates through centralized computational policy enforcement
  • Link (Free, 27/05/2024): Github repo with Confluent policy enforcement adapter

Need more info? Contact us!
