AWS Glue

Data integration is becoming increasingly crucial in modern IT architectures. Digital transformation requires ever closer connections among a growing number of applications, which makes governing and streamlining integration processes an ever greater challenge.

Contacts

Andrea Gioia

Chief Technology Officer

Giandomenico Avelluto

Strategy Advisor

Overview

AWS Glue consolidates the main data integration functionalities into a single serverless service. Key functionalities include:

Data Classification

Crawlers determine the technical schema of the data. AWS Glue provides built-in classifiers for the most common file formats, e.g. CSV, JSON, XML and Avro, and can also crawl the most common relational database management systems through a JDBC connection.
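As a rough illustration, a crawler over S3 data can be registered and started with the boto3 Glue client (`create_crawler`, `start_crawler`). The sketch below separates building the request from calling AWS, so the request can be inspected without credentials; the crawler name, IAM role, database and bucket paths used with it are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Any


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: list[str],
    schedule: str | None = None,
) -> dict[str, Any]:
    """Build keyword arguments for the boto3 Glue create_crawler call."""
    request: dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        # Tables discovered by the crawler are written to this Data Catalog database.
        "DatabaseName": database,
        # One S3 target per prefix; built-in classifiers (CSV, JSON, XML, Avro, ...)
        # are tried whenever no custom classifier matches the data.
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
    }
    if schedule is not None:
        # Cron expression in the AWS format, e.g. "cron(0 2 * * ? *)".
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request: dict[str, Any]) -> None:
    """Register the crawler and run it once (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

For example, `create_and_start_crawler(build_crawler_request("sales-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole", "sales_raw", ["s3://example-bucket/sales/"]))` would catalogue everything under the (hypothetical) `sales/` prefix.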

Data Catalog

A persistent metadata repository: the catalog contains table definitions, job definitions and other control information for managing data entities in AWS.
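To give an idea of what the catalog holds, the sketch below reads table definitions through the boto3 `get_tables` paginator and reduces each one to its column names and types. The parsing is kept in a pure function so it can be tried on a sample response; the database name passed to `fetch_catalog_summary` is a placeholder, and the summary shape is our own choice for illustration.

```python
from __future__ import annotations

from typing import Any


def summarize_tables(get_tables_response: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Map each table name to its (column name, column type) pairs."""
    summary: dict[str, list[tuple[str, str]]] = {}
    for table in get_tables_response.get("TableList", []):
        columns = table.get("StorageDescriptor", {}).get("Columns", [])
        # Partition keys are stored separately from the regular columns.
        columns = columns + table.get("PartitionKeys", [])
        # "Type" is optional in the API, so default to an empty string.
        summary[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
    return summary


def fetch_catalog_summary(database: str) -> dict[str, list[tuple[str, str]]]:
    """Summarize every table in a Data Catalog database (requires boto3 and credentials)."""
    import boto3

    glue = boto3.client("glue")
    summary: dict[str, list[tuple[str, str]]] = {}
    # get_tables is paginated; the paginator follows NextToken automatically.
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        summary.update(summarize_tables(page))
    return summary
```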

ETL/ELT

The AWS Glue jobs system provides managed infrastructure for defining, scheduling and running ETL/ELT operations that prepare and consolidate data for analysis.
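From an orchestration point of view, running a job comes down to starting a run and following its state. The sketch below uses the Glue API operations `start_job_run` and `get_job_run`; the client is passed in (normally `boto3.client("glue")`) so the polling logic can be exercised with a stand-in. The job name and the `--target_date` argument in the example are hypothetical.

```python
from __future__ import annotations

import time
from typing import Any

# Job run states after which GetJobRun will not report further progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(
    glue: Any,
    job_name: str,
    arguments: dict[str, str] | None = None,
    poll_seconds: float = 30.0,
) -> str:
    """Start a Glue job run and poll until it reaches a terminal state."""
    kwargs: dict[str, Any] = {"JobName": job_name}
    if arguments:
        # Job parameters use a leading "--", e.g. {"--target_date": "2024-01-31"}.
        kwargs["Arguments"] = arguments
    run_id = glue.start_job_run(**kwargs)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] in TERMINAL_STATES:
            return run["JobRunState"]
        time.sleep(poll_seconds)
```

A call such as `run_job_and_wait(boto3.client("glue"), "nightly-orders-etl", {"--target_date": "2024-01-31"})` blocks until the run ends; in production a scheduler or workflow would usually take over this role.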

Streaming Processing

In addition to batch processing, AWS Glue supports streaming jobs that run continuously, e.g. consuming data from Apache Kafka, Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK).
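A streaming job is defined like a batch Spark job but with the `gluestreaming` command type. The sketch below only builds the keyword arguments for the boto3 `create_job` call; the job name, role, script location, Glue version and worker sizing are assumptions chosen for illustration, and the actual Kafka or Kinesis source is configured inside the job script (or through a catalog table or connection), not here.

```python
from __future__ import annotations

from typing import Any


def build_streaming_job_request(
    name: str,
    role_arn: str,
    script_s3_path: str,
    workers: int = 2,
) -> dict[str, Any]:
    """Build keyword arguments for boto3 Glue create_job for a streaming ETL job."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            # "gluestreaming" selects a continuously running streaming job
            # ("glueetl" would be a batch Spark job).
            "Name": "gluestreaming",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Illustrative sizing; tune the version, worker type and count per workload.
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }
```

The resulting dictionary would be passed as `boto3.client("glue").create_job(**request)`, after which `start_job_run` launches the continuously running job.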

Using the AWS Glue Studio component, data integration processes can be created, executed and monitored via a graphical interface. AWS Glue Studio allows you to visually compose data transformation flows and easily execute them on the serverless ETL engine based on Apache Spark.

AWS Glue is well suited to running data pipelines efficiently in AWS. Because it is serverless, it helps optimise costs: its pay-as-you-go billing model means you can start using it without upfront costs.
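Pay-as-you-go billing for Spark jobs is based on DPU-hours. The sketch below estimates the charge for one run, assuming per-second billing with a 1-minute minimum (as for Glue 2.0 and later Spark jobs); the rate is a parameter because prices vary by region and over time, so check the AWS Glue pricing page rather than relying on any figure used here.

```python
def estimate_job_cost(
    dpus: float,
    runtime_seconds: float,
    price_per_dpu_hour: float,
    minimum_seconds: float = 60.0,
) -> float:
    """Estimate one job run's charge under per-second DPU-hour billing."""
    # Runs shorter than the minimum billing duration are charged as the minimum.
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return dpus * (billed_seconds / 3600.0) * price_per_dpu_hour
```

With a hypothetical rate of 0.44 USD per DPU-hour, a 15-minute run on 10 DPUs costs about 10 × 0.25 × 0.44 = 1.10 USD, and nothing is charged while no job is running.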

At Quantyca, we have developed libraries that extend Glue's functionality, such as custom tracing and logging libraries that capture additional execution information and integrate with external monitoring tools such as Elasticsearch.
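The general idea behind such logging extensions can be sketched with Python's standard `logging` module: emit one JSON document per log line, enriched with job context, so that an external tool like Elasticsearch can index it. This is a hypothetical illustration, not Quantyca's library; `JsonFormatter`, `get_job_logger` and the field names are our own, and shipping the lines from the job's logs to Elasticsearch (e.g. via a log shipper) is not shown.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render log records as single-line JSON documents with job context fields."""

    def __init__(self, job_name: str, job_run_id: str) -> None:
        super().__init__()
        self.context = {"job_name": job_name, "job_run_id": job_run_id}

    def format(self, record: logging.LogRecord) -> str:
        document = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            **self.context,
        }
        # Optional structured data passed as logger.info(..., extra={"metrics": {...}}).
        metrics = getattr(record, "metrics", None)
        if metrics is not None:
            document["metrics"] = metrics
        return json.dumps(document)


def get_job_logger(job_name: str, job_run_id: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr, configured only once."""
    logger = logging.getLogger(f"glue.{job_name}")
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(job_name, job_run_id))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```

Inside a job, `get_job_logger("orders-etl", run_id).info("loaded %d rows", n, extra={"metrics": {"rows": n}})` would produce one indexable JSON line per event.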

Partnership

As an AWS Partner, we bring our expertise in cloud data management processes, leveraging the flexibility, scalability, and reliability of the AWS Glue service.

Our consulting services:

Consulting for new cloud-native projects
Assessment of existing solutions and data platform migrations
Design and implementation of data management landing zones for multi-account and multi-region management
Support in developing data integration pipelines
Maintenance of cloud environments