May 2019 - August 2019
Insurance & Care NSW (iCare) is one of Australia’s largest general insurers. iCare’s data team had workflows that took more than 20 hours across data processing, model training and data consumption, and there was a requirement to migrate its existing training and deployment solution to the AWS Cloud. iCare needed a solution that would move its ETL jobs to AWS and minimise the time those jobs took to run, so that more reliable input data would reach iCare’s Qlik application faster.
Cloudten were engaged to deliver a proof of concept (POC) and to provide AWS best-practice methodologies for iCare’s data team. The proposed solution was to implement iCare’s data model and migrate its ETL jobs to AWS.
Cloudten proposed AWS Glue crawlers and Glue Data Catalog databases, together with Zeppelin notebooks, for the data transformation jobs.
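The case study does not include the crawler definitions. As a minimal sketch, assuming raw files land in S3 and a single crawler registers them in a Glue Data Catalog database, the function below builds the keyword arguments for the Glue `CreateCrawler` API as exposed by boto3’s `glue` client. The crawler name, IAM role ARN, bucket path, database name and nightly schedule are all hypothetical.

```python
import json


def build_crawler_request(name, role_arn, database_name, s3_paths, schedule):
    """Build keyword arguments for the AWS Glue CreateCrawler API.

    Intended for boto3.client("glue").create_crawler(**request);
    nothing is sent to AWS here.
    """
    return {
        "Name": name,
        "Role": role_arn,
        # Tables the crawler discovers are registered in this Data Catalog database.
        "DatabaseName": database_name,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
        # Six-field cron expression, evaluated in UTC.
        "Schedule": schedule,
        # Update table definitions in place; only log objects that disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


# Hypothetical values for illustration only.
request = build_crawler_request(
    name="icare-raw-claims-crawler",
    role_arn="arn:aws:iam::123456789012:role/example-glue-crawler-role",
    database_name="icare_raw",
    s3_paths=["s3://example-icare-raw/claims/"],
    schedule="cron(0 2 * * ? *)",
)
print(json.dumps(request, indent=2))
```

Once the crawler has run, transformation scripts developed in the notebooks can refer to the catalogued tables by database and table name rather than hard-coding S3 paths and schemas.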
For model training and deployment, Cloudten proposed Amazon SageMaker notebooks and SageMaker endpoints. iCare’s algorithm was Dockerized so that it could be customised and run in a container by SageMaker.
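The case study does not describe the container’s internals. SageMaker’s bring-your-own-container hosting contract is that the image is started with the argument `serve`, answers `GET /ping` health checks and `POST /invocations` inference requests on port 8080, and finds model artifacts extracted under `/opt/ml/model`. The standard-library sketch below implements that contract; `predict` is a hypothetical stand-in for iCare’s algorithm, and JSON lists of numeric records are an assumed input format.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(records):
    """Hypothetical stand-in for iCare's model: sums each numeric record.

    A real container would load artifacts from /opt/ml/model at start-up.
    """
    return [sum(record) for record in records]


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # SageMaker health check; a 200 tells it the container is ready.
        self._reply(200 if self.path == "/ping" else 404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, b"")
            return
        length = int(self.headers.get("Content-Length", 0))
        records = json.loads(self.rfile.read(length) or b"[]")
        body = json.dumps({"predictions": predict(records)}).encode()
        self._reply(200, body, "application/json")

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port=8080):
    # SageMaker sends /ping and /invocations to port 8080 inside the container.
    return HTTPServer(("0.0.0.0", port), InferenceHandler)


# SageMaker runs the image with the argument "serve" when hosting it.
if __name__ == "__main__" and sys.argv[1:2] == ["serve"]:
    make_server().serve_forever()
```

Packaging this script in an image whose entrypoint runs it, and pushing that image to Amazon ECR, lets a SageMaker model and endpoint reference it by image URI.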
To minimise execution time for training and inference, Cloudten proposed M5/C5 instance types for the SageMaker notebooks, with scheduled Auto Scaling of these instances during peak load.
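The case study does not give the schedule. Application Auto Scaling, which provides scheduled scaling for SageMaker, targets endpoint variants, so this minimal sketch assumes the peak-load schedule is applied to the endpoint used for inference. It builds arguments for `PutScheduledAction` on boto3’s `application-autoscaling` client; the variant must already be registered as a scalable target. The endpoint and variant names, the Sydney business-hours window and the capacities are hypothetical.

```python
DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def build_scheduled_action(endpoint_name, variant_name, action_name, schedule,
                           min_capacity, max_capacity):
    """Arguments for put_scheduled_action on boto3.client("application-autoscaling")."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ScheduledActionName": action_name,
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": DIMENSION,
        # cron(minutes hours day-of-month month day-of-week year), UTC by default.
        "Schedule": schedule,
        "ScalableTargetAction": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    }


# 21:00 UTC Sun-Thu is 07:00 Mon-Fri in Sydney on AEST (UTC+10); 08:00 UTC is 18:00.
scale_up = build_scheduled_action("icare-model-endpoint", "AllTraffic",
                                  "peak-scale-up", "cron(0 21 ? * SUN-THU *)", 2, 4)
scale_down = build_scheduled_action("icare-model-endpoint", "AllTraffic",
                                    "off-peak-scale-down", "cron(0 8 ? * MON-FRI *)", 1, 2)
```

Raising the capacity floor shortly before the expected peak means instances are already warm when load arrives, rather than being added only after it is detected.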
For security, the notebooks and Glue jobs were deployed in a VPC with no internet gateway or NAT connectivity, and the notebooks ran with direct internet access disabled. API calls to AWS services from the SageMaker notebooks were routed through a proxy server. Bucket policies restricted bucket access, and objects were encrypted with AWS KMS.
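The actual bucket policies are not shown in the case study. As a minimal sketch, assuming access should be limited to the Glue and notebook roles and every upload must request SSE-KMS, the function below builds an S3 bucket policy with three deny statements. The bucket name and role ARNs are hypothetical.

```python
import json


def build_bucket_policy(bucket_name, allowed_role_arns):
    """Bucket policy sketch: TLS only, SSE-KMS uploads only, listed roles only."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    object_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request not made over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, object_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that do not request SSE-KMS encryption.
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
            {
                # Reject callers other than the listed roles (e.g. Glue and notebook roles).
                "Sid": "DenyOtherPrincipals",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, object_arn],
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": list(allowed_role_arns)}
                },
            },
        ],
    }


policy = build_bucket_policy(
    "example-icare-curated",
    [
        "arn:aws:iam::123456789012:role/example-glue-job-role",
        "arn:aws:iam::123456789012:role/example-sagemaker-notebook-role",
    ],
)
print(json.dumps(policy, indent=2))
```

The printed JSON could be applied with `aws s3api put-bucket-policy`. The principal restriction also blocks administrators who are not on the list, so in practice an administrative role is usually included.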
To save costs, Cloudten proposed Auto Scaling for the SageMaker endpoint based on the number of incoming requests.
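Scaling on request volume maps naturally onto a target-tracking policy in Application Auto Scaling using the predefined `SageMakerVariantInvocationsPerInstance` metric (average invocations per instance per minute). The sketch below builds the `RegisterScalableTarget` and `PutScalingPolicy` arguments and adds a rough estimate of the instance count the policy steers towards. The endpoint name, variant name, capacity limits, target value and cooldowns are hypothetical, and the estimate ignores cooldowns and instance warm-up.

```python
import math

METRIC = "SageMakerVariantInvocationsPerInstance"  # invocations per instance per minute
DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def build_autoscaling_requests(endpoint_name, variant_name, min_instances,
                               max_instances, target_invocations):
    """Arguments for register_scalable_target and put_scaling_policy
    on boto3.client("application-autoscaling")."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {"PredefinedMetricType": METRIC},
            "ScaleOutCooldown": 60,   # seconds; react quickly to rising load
            "ScaleInCooldown": 300,   # seconds; remove capacity more cautiously
        },
    }
    return scalable_target, scaling_policy


def estimate_instance_count(total_invocations_per_minute, target_invocations,
                            min_instances, max_instances):
    """Rough steady-state instance count the policy aims for, clamped to the limits."""
    needed = math.ceil(total_invocations_per_minute / target_invocations)
    return max(min_instances, min(max_instances, needed))


target, policy = build_autoscaling_requests("icare-model-endpoint", "AllTraffic", 1, 4, 200)
```

With a target of 200 invocations per instance per minute, 450 requests per minute settle at about three instances, while quiet periods fall back to the single-instance floor, which is where the cost saving comes from.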
To build the pipeline as infrastructure as code (IaC), Cloudten used AWS CloudFormation, AWS CodeCommit and AWS CodeDeploy.
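The templates themselves are not part of the case study. As a minimal sketch of the IaC side, assuming CloudFormation in JSON form, the function below generates a template with a CodeCommit repository and a SageMaker notebook instance that clones it, placed in a private subnet with direct internet access disabled to match the network setup described earlier. The repository name, parameter names and default `ml.m5.xlarge` instance type are hypothetical; the proxy configuration, Glue resources and deployment stages are not shown.

```python
import json


def build_notebook_template(instance_type="ml.m5.xlarge"):
    """CloudFormation template sketch for a VPC-only notebook backed by CodeCommit."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: VPC-only SageMaker notebook with a CodeCommit repository",
        "Parameters": {
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
            "SecurityGroupId": {"Type": "AWS::EC2::SecurityGroup::Id"},
            "NotebookRoleArn": {"Type": "String"},
            "KmsKeyId": {"Type": "String"},
        },
        "Resources": {
            "NotebookRepo": {
                "Type": "AWS::CodeCommit::Repository",
                # Hypothetical repository holding the notebook files.
                "Properties": {"RepositoryName": "icare-notebooks"},
            },
            "NotebookInstance": {
                "Type": "AWS::SageMaker::NotebookInstance",
                "Properties": {
                    "InstanceType": instance_type,
                    "RoleArn": {"Ref": "NotebookRoleArn"},
                    # Private subnet only; no route to an internet gateway or NAT.
                    "SubnetId": {"Ref": "SubnetId"},
                    "SecurityGroupIds": [{"Ref": "SecurityGroupId"}],
                    "DirectInternetAccess": "Disabled",
                    # KMS key for the notebook's storage volume.
                    "KmsKeyId": {"Ref": "KmsKeyId"},
                    # Clone the CodeCommit repository so notebooks stay version-controlled.
                    "DefaultCodeRepository": {"Fn::GetAtt": ["NotebookRepo", "CloneUrlHttp"]},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_notebook_template(), indent=2))
```

The generated JSON could be deployed with `aws cloudformation deploy`, passing the four parameters through `--parameter-overrides`. SageMaker only accepts `DirectInternetAccess` set to `Disabled` when a subnet is supplied, which is why `SubnetId` is a required parameter here.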
Since iCare’s engagement with Cloudten, the Glue ETL jobs have been tested and now run in 20 minutes, compared with 5 hours previously, roughly a 15-fold reduction. The model algorithms have been Dockerized, making it easier to build packages locally. The end-to-end pipeline gives data analysts and data scientists a shared, collaborative workflow. SageMaker notebook files are version-controlled in AWS CodeCommit. The endpoints serving the deployed models now auto scale, returning results more quickly. Once the project receives the required approvals, iCare plans to implement the solution in its production environments.
Glue, Lambda, Athena, SageMaker, S3, IAM, CloudFormation, CodeBuild, API Gateway, CodeCommit, ECR.