An AWS Data Integration platform for Data Analytics and Machine Learning – A Case Study

The project is an AWS-based data integration platform built for one of the world’s most widely recognized cruise brands. It accelerates the capture of digital analytics on the customer journey and uses Machine Learning (ML) approaches to identify buying and cancellation patterns.

THE CHALLENGE

The client sought assistance in executing two specific use cases (given below) to validate the applicability and performance of a typical AWS Big Data / Machine Learning (ML) services platform.

Use Case 01 - Analysis: process the analytics data on a best-effort basis to identify business queries, and visualize the data with AWS-based BI tools.

Use Case 02 - Build a predictive model from the available Adobe Analytics data using AWS ML and associated services, and use the ML model to make business decisions.

HOW AUXENTA HELPED

  • Auxenta provided an offshore team of AWS-certified developers who were experienced across AWS core and Big Data services.
  • The team first designed an AWS-based solution architecture to address the requirements of the two use cases, then developed the data integration platform on that architecture.
  • Automated the data ingestion process and accelerated the velocity of digital analytics.
  • Provided quick translation of business vocabulary through automatic data discovery and metadata capture.
  • Enabled a dashboard in Amazon QuickSight to visualize KPIs and business metrics.
  • Deployed the AWS Machine Learning and Data Sciences platform to compute the propensity of cancellation and to identify the features that influence or determine the likelihood of cancellation.

THE SOLUTION

1. Use Case 01 - Analysis: process the analytics data on a best-effort basis to identify business queries, and visualize the data with AWS-based BI tools.

Use Case 01 has two separate processes: one for historical data and one for incremental data.

Manipulate historical data - [Steps]


  • Amazon S3 stores the historical data, which is ingested from the client side.
  • Transfer the historical data from the S3 bucket to the Amazon Redshift staging table using an Amazon EC2 instance that performs the work defined by an AWS Data Pipeline activity.
  • Remove unwanted fields and data with the ETL process and load the data into the Redshift fact table.
  • Convert the data into Parquet format and store the files in an S3 bucket using an Amazon EMR step.
  • Query the Parquet files in the S3 bucket with Amazon Athena and visualize the results with Amazon QuickSight.
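
The staging and fact-table steps above can be sketched as two SQL statements built in Python. This is a minimal illustration under stated assumptions, not the SQL used in the project: the table names, column names, bucket path, IAM role and the CSV input format are all hypothetical. COPY with IAM_ROLE, FORMAT AS CSV and IGNOREHEADER is standard Amazon Redshift syntax.

```python
from typing import Sequence


def build_copy_sql(staging_table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads S3 files into a staging table.

    Assumes CSV files with one header row (hypothetical; the case study
    does not state the input format).
    """
    return (
        f"COPY {staging_table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


def build_fact_load_sql(
    fact_table: str,
    staging_table: str,
    keep_columns: Sequence[str],
    row_filter: str,
) -> str:
    """Build the staging-to-fact load.

    Unwanted fields are dropped by selecting only keep_columns;
    unwanted rows are dropped by row_filter.
    """
    if not keep_columns:
        raise ValueError("keep_columns must name at least one column")
    columns = ", ".join(keep_columns)
    return (
        f"INSERT INTO {fact_table} ({columns}) "
        f"SELECT {columns} FROM {staging_table} "
        f"WHERE {row_filter};"
    )


if __name__ == "__main__":
    # All names below are placeholders, not values from the project.
    print(build_copy_sql(
        "stg_hits",
        "s3://example-bucket/historical/",
        "arn:aws:iam::123456789012:role/example-redshift-role",
    ))
    print(build_fact_load_sql(
        "fact_hits", "stg_hits",
        ["visitor_id", "page_url"], "visitor_id IS NOT NULL",
    ))
```

The statements are plain strings, so they can be run by whatever SQL client the EC2 worker uses. The arguments are interpolated directly and should come from trusted pipeline configuration, never from user input.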

Manipulate incremental data - [Steps]


  • Amazon S3 was selected to store the incremental data, which is ingested on a daily basis.
  • An AWS Lambda function is triggered when new files arrive in the S3 bucket and sends an acknowledgement, through Amazon SQS, to an Amazon EC2 instance that performs the work defined by an AWS Data Pipeline activity.
  • The incremental data is then transferred into the Amazon Redshift staging table and copied into the fact table, with unwanted fields and values removed.
  • The transformed data is loaded into the S3 bucket in CSV format.
  • AWS Glue converts the CSV files into Parquet format and loads them into the S3 folder that contains the Parquet files.
  • Query the Parquet files in the S3 bucket with Amazon Athena and visualize the results with Amazon QuickSight.
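
The Lambda-to-SQS notification in the second step could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's code: the queue URL and message layout are hypothetical, and the SQS client can be passed in so the handler can be exercised without AWS access. The event fields read here (Records, s3.bucket.name, s3.object.key) follow the standard S3 event notification structure, and send_message is the standard boto3 SQS call.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical queue URL; replace with the real queue in a deployment.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-incremental-load"


def extract_new_objects(event):
    """Return (bucket, key) pairs for every object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context, sqs_client=None):
    """Send one SQS message per newly arrived S3 object."""
    if sqs_client is None:
        # Imported lazily so the module loads where boto3 is not installed.
        import boto3
        sqs_client = boto3.client("sqs")

    sent = 0
    for bucket, key in extract_new_objects(event):
        sqs_client.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
        sent += 1
    return {"messages_sent": sent}
```

The worker on the EC2 instance would poll the queue and start the Redshift load for each message. Passing the client as an optional argument is a design choice made for this sketch so that the handler can be tested locally with a stand-in object.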

2. Use Case 02 - Build a predictive model from the available Adobe Analytics data using AWS ML and associated services.

  • Amazon SageMaker was selected as the platform of choice for data storytelling.
  • Ingest the relevant feature columns (humanize data-driven insights from experience).
  • Reduce overfitting and identify the right data.
  • Remove unnecessary noise.
  • Take models from deployment through to visualization in SageMaker.
  • Select the right integrated visualization (Amazon QuickSight vs. custom BI).
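
To make the idea of a cancellation propensity concrete, the sketch below trains a tiny logistic regression in plain Python. It is only an illustration: the case study does not say which algorithm was used, a SageMaker training job would replace this hand-written loop, and the feature names and the six data rows are invented for the example. The sign of each learned weight hints at whether a feature pushes the predicted likelihood of cancellation up or down.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_propensity(weights, bias, features):
    """Return the modelled probability of cancellation for one row."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_propensity(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Invented toy data: label 1 means the booking was cancelled.
FEATURE_NAMES = ["cancellation_page_visits", "itinerary_views"]
ROWS = [[0, 5], [1, 4], [0, 3], [4, 1], [5, 0], [3, 1]]
LABELS = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(ROWS, LABELS)

if __name__ == "__main__":
    for name, weight in zip(FEATURE_NAMES, weights):
        print(f"{name}: weight {weight:+.2f}")
    print(f"propensity for [5, 0]: {predict_propensity(weights, bias, [5, 0]):.2f}")
```

On real data, the fit would be checked against a held-out validation set to detect overfitting before any feature weight is interpreted.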

THE TECHNOLOGY STACK

Amazon S3, AWS Lambda, AWS Data Pipeline, Amazon SQS, Amazon EC2, Amazon Redshift, Amazon SageMaker, Amazon EMR, AWS Glue, Amazon Athena, Amazon QuickSight

BENEFITS TO THE CLIENT

  • A data integration platform that accelerates digital analytics capture of the customer journey.
  • Visualization of the analytics data based on business queries.
  • Adaptive, dynamic decision-making, delivered as ML as a Service, based on existing conditions and past history and driven by the analysis of a large number of parameters.
  • Exploration of the likelihood of purchase and the likelihood of cancellation with a similar adaptive decision framework.

Ridma Gamage

Senior Software Engineer