Data Science Lab

Data Products

A systematic approach to data analysis via a platform for the industrialisation of data science processes

Industries:

Finance & Insurance - Retail & FMCG - Transportation - Energy & Utility - Life Science - Industrial

Solutions:

Data Products

Technologies:

Power BI - TIBCO - AWS - Python - PyTorch - Databricks

Quantyca Data Science Lab use case image

Overview

In a modern data-driven company, data is a fundamental asset, and actions and strategic directions are guided by the insights gained from analysing data that comes from many different sources.
Data science is the set of methods, processes, algorithms and technologies that extract useful knowledge from the structured and unstructured data the company holds in its data warehouse, data lake or, more generally, its data platform.
Artificial intelligence (AI) and machine learning (ML) techniques are thus redefining entire market sectors, from online retail to transport services and from home automation to insurance and banking. They make it possible to understand correlations and trends in complex phenomena such as consumer preferences, the evolution of demand for a specific product or service, and market competition.
Over the last ten years, these technologies have spread not only in large companies but increasingly also in SMEs; both have spent these years experimenting, alternating between promising results and costly failures.

Challenges

The main problems of on-premises data platforms are:

Inability to scale resources elastically. At times of high load the platform often struggles, while at times of low load the company pays for idle resources.
Inability to scale storage and computation independently. Increasing one of the two means increasing the other as well, because the scaling unit is the server within the cluster.
High operating costs to configure and manage complex, distributed architectures that often consist of multiple technologies developed by different vendors.

87% of data science projects never make it into production (VentureBeat AI)

90% of data scientists face reproducibility issues in production (Nature)

99% of AI research focuses on ML and neglects data preparation (Andrew Ng)

Solution

The entire solution is based on an infrastructure that automates the data processing needed to calculate the features required by ML models, the training and execution of those models, and their exposure via APIs.
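As a rough illustration of the chain described above (feature calculation, training, execution and exposure via an API), the following is a minimal sketch in plain Python. It is not the platform's actual implementation: the field names `ad_spend` and `sales`, the one-variable linear model and the `handle_request` function standing in for an API endpoint are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


def compute_features(raw_rows):
    """Feature step: turn raw records into (feature, target) pairs.

    The 'ad_spend' and 'sales' fields are hypothetical.
    """
    return [(float(r["ad_spend"]), float(r["sales"])) for r in raw_rows]


@dataclass
class LinearModel:
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def train(pairs):
    """Training step: ordinary least squares for one feature."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = covariance / variance
    return LinearModel(slope=slope, intercept=y_bar - slope * x_bar)


def handle_request(model, payload):
    """Exposure step: stand-in for an API endpoint (dict in, dict out)."""
    return {"prediction": model.predict(float(payload["ad_spend"]))}


def run_pipeline(raw_rows):
    """Run the feature and training steps in order and return the model."""
    return train(compute_features(raw_rows))


# Toy data invented for the example.
RAW_ROWS = [
    {"ad_spend": 1, "sales": 3},
    {"ad_spend": 2, "sales": 5},
    {"ad_spend": 3, "sales": 7},
]

if __name__ == "__main__":
    model = run_pipeline(RAW_ROWS)
    print(handle_request(model, {"ad_spend": 4}))
```

In a real setting each function would typically be a separately scheduled and monitored step, with the endpoint served by a web framework rather than called directly.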

There are also tools for isolating environments and projects, provisioning the development environment, and versioning code, data and models.
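One common way to version code, data and models together is to derive a single identifier from all three, so that any change produces a new version. The sketch below shows the idea with content hashes from the Python standard library. It is an assumption about how such versioning could work, not a description of the tools used in the Data Science Lab, and the names `fingerprint` and `run_version` are hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def run_version(code_revision: str, dataset: bytes, model_params: dict) -> str:
    """Combine code, data and model parameters into one short version id.

    code_revision is assumed to be an identifier such as a commit hash.
    """
    manifest = {
        "code": code_revision,
        "data": fingerprint(dataset),
        # sort_keys makes the serialisation independent of dict ordering
        "model": fingerprint(json.dumps(model_params, sort_keys=True).encode("utf-8")),
    }
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return fingerprint(canonical)[:12]


if __name__ == "__main__":
    v1 = run_version("abc123", b"units,price\n1,2\n", {"lr": 0.01, "epochs": 5})
    v2 = run_version("abc123", b"units,price\n1,3\n", {"lr": 0.01, "epochs": 5})
    print(v1, v2, v1 != v2)
```

Because the identifier depends only on content, re-running with identical code, data and parameters yields the same version, which is what makes a result reproducible later.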

The Data Science Lab environment reconciles the data scientist’s need for agility with IT’s need for stability and maintainability, thus accelerating the release of new models.

The complete route

1. Inception

Mapping the existing skills of the data science team and closing technological and methodological gaps through training (face-to-face workshops, e-learning, ...)

2. Foundation

Setting up the infrastructure, then importing the first ML models and releasing them to production, starting from the priority use cases

3. Expansion & Optimization

Implementation of new use cases by expanding the coverage of the Data Science Lab, automating training and production release procedures...