
Cooperative and Automated Vehicle Initiative (CAVI)

Department of Transport and Main Roads (TMR)

Project Date

October 2019 - November 2019

Problem statement

The Cooperative and Automated Vehicle Initiative (CAVI) is being developed by the Department of Transport and Main Roads (TMR). It mainly focuses on new vehicle technologies and their safety, mobility and environmental benefits on Queensland roads. Its goal is to test cooperative and automated vehicle technologies that make roads safer, contributing towards a vision of zero road deaths and serious injuries on the state’s roads.

Cloudten was engaged to advise TMR on industry best practices and to review its current data analytics pipeline and large datastore.

 

Proposal

As part of the engagement, Cloudten provided the following:

  • Proposed industry standards and best practices for ETL security, data encryption (in transit and at rest) and data partitioning
  • Developed Glue ETL jobs to convert the datastore from CSV to Parquet and to apply further transformations
  • Used CTAS (CREATE TABLE AS SELECT) statements to create new tables, specifying format, compression, partition fields and location
  • Developed schema validation scripts using Lambda, Step Functions and DynamoDB
  • Configured, tested and deployed Glue workflows for the development and non-production environments
  • Optimised the data ingestion pipeline
  • Refactored ETL code to handle Glue crawler functionality within Glue jobs, aiming to reduce the time consumed by crawlers from hours to a couple of minutes
  • Configured helper scripts to handle duplicate data across datastores
  • Set up cross-account SNS subscriptions to handle ETL triggers across TMR's AWS accounts
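As an illustration of the CTAS step listed above, the following is a minimal sketch of how such a statement could be assembled for Athena. The helper name `build_ctas`, the database, table and column names, and the S3 location are all hypothetical and do not come from the CAVI project; only the `format`, `parquet_compression`, `external_location` and `partitioned_by` table properties are standard Athena CTAS options.

```python
def build_ctas(target_table, source_table, columns, partition_columns,
               external_location, data_format="PARQUET", compression="SNAPPY"):
    """Return an Athena CTAS statement as a string (hypothetical helper)."""
    # Athena expects partition columns to come last in the SELECT list.
    select_list = ", ".join(list(columns) + list(partition_columns))
    partitioned_by = ", ".join(f"'{name}'" for name in partition_columns)
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        f"  format = '{data_format}',\n"
        f"  parquet_compression = '{compression}',\n"
        f"  external_location = '{external_location}',\n"
        f"  partitioned_by = ARRAY[{partitioned_by}]\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Illustrative names only; none of these identifiers come from the project.
sql = build_ctas(
    target_table="example_db.vehicle_events_parquet",
    source_table="example_db.vehicle_events_csv",
    columns=["vehicle_id", "speed_kmh", "event_time"],
    partition_columns=["event_date"],
    external_location="s3://example-bucket/parquet/vehicle_events/",
)
print(sql)
```

The resulting string could then be submitted to Athena, for example through the console or an API client. Keeping the statement builder separate from the submission step makes it easy to review the generated SQL before it runs.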

Outcomes and results

As part of this engagement, Cloudten successfully delivered the proposed solution. The CAVI account now has an ETL job that is quicker and more efficient and runs in minimal time. Data streaming was enabled for some of the services, from source to their ETL tools, for further processing. Major fixes were also made to the legacy code.

 

TCO Analysis Performed:

Analysed the current run costs of Glue crawlers and jobs

 

Lessons Learned

A Glue crawler takes a long time because it crawls the entire dataset, causing overhead and additional cost. This was mitigated by moving the Glue crawler functionality into AWS Glue jobs.
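The case study does not say how the crawler functionality was moved into the jobs. One common approach, shown here purely as an assumption, is for the job to register the partitions it has just written through the Glue `BatchCreatePartition` API instead of re-crawling the whole dataset. In this sketch the helper names, bucket, table and column names are hypothetical, and `glue_client` is assumed to be a boto3 Glue client created elsewhere.

```python
PARQUET_INPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def partition_input(table_location, partition_keys, partition_values, columns):
    """Build one PartitionInput dict for a Hive-style Parquet partition."""
    suffix = "/".join(f"{k}={v}" for k, v in zip(partition_keys, partition_values))
    return {
        "Values": [str(v) for v in partition_values],
        "StorageDescriptor": {
            "Columns": columns,
            "Location": f"{table_location.rstrip('/')}/{suffix}/",
            "InputFormat": PARQUET_INPUT,
            "OutputFormat": PARQUET_OUTPUT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
    }


def register_partitions(glue_client, database, table, inputs, batch_size=100):
    """Register partitions in batches (the API accepts at most 100 per call)."""
    errors = []
    for start in range(0, len(inputs), batch_size):
        response = glue_client.batch_create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInputList=inputs[start:start + batch_size],
        )
        # Failed partitions are reported in the response, not raised.
        errors.extend(response.get("Errors", []))
    return errors


# Illustrative values only; a real job would derive these from what it wrote.
columns = [{"Name": "vehicle_id", "Type": "string"},
           {"Name": "speed_kmh", "Type": "double"}]
example = partition_input("s3://example-bucket/parquet/vehicle_events",
                          ["event_date"], ["2019-10-01"], columns)
print(example["StorageDescriptor"]["Location"])
```

Because the job already knows which partitions it produced, this touches only the new paths, which is consistent with the reduction from hours to minutes described above.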

 

Diagram:  

AWS services used

Glue Data Catalog – Glue crawlers were used to populate the tables in the AWS Glue Data Catalog

Glue jobs – Transform data from one format to another, for example CSV to Parquet

Glue database – Serves as the datastore

DynamoDB – Stores table schemas

Glue Endpoints – Connect to local Zeppelin notebooks for debugging

Athena – Create queries and views for Tableau

S3 - Data lake

IAM – Create restricted policies and roles to be consumed by AWS services and end users

CloudFormation – Deploy the pipeline, including Glue jobs, crawlers, Lambda functions and S3 buckets

CodeCommit – Stores code for CloudFormation templates, JSON files for DynamoDB schemas, and Lambda functions

Lambda – Functions to validate schemas and copy files from Raw to Staging

Step Functions – Orchestrates the Lambda functions
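To illustrate the schema validation role that Lambda, DynamoDB and Step Functions play in the list above, here is a minimal sketch of the comparison logic only. It assumes the expected column names have already been read from a DynamoDB item and the CSV text from an S3 object; those lookups, the function name `validate_header` and the sample columns are hypothetical and not taken from the project.

```python
import csv
import io


def validate_header(csv_text, expected_columns):
    """Compare a CSV header row with the expected column names.

    Column order is ignored in this sketch; only presence is checked.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    actual = [name.strip() for name in header]
    missing = [c for c in expected_columns if c not in actual]
    unexpected = [c for c in actual if c not in expected_columns]
    return {
        "valid": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
    }


# Illustrative input; a real function would load both values at runtime.
result = validate_header(
    "vehicle_id,speed_kmh\n1,50\n",
    ["vehicle_id", "speed_kmh", "event_time"],
)
print(result)
```

Returning a small dictionary with a boolean `valid` field would let a state machine branch on the outcome, for example copying the file from Raw to Staging only when validation passes.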

Third party application or solution used

None