
Data Lake

Federal Government Agency

Project Date

December 2019 - Present

problem statement

A large Australian federal government agency was seeking to gain greater business value and deeper insights from its wide range of disparate data assets. In addition to uplifting its advanced analytics capabilities, the agency wanted the data lake to deliver a range of real-world benefits, including improved fraud detection, enhanced market awareness and predictive modelling in areas such as superannuation, mortgages and business lending.

The agency launched an open tender in 2019 to partner with a trusted data services provider to design, build and manage an enterprise-grade Data Lake that incorporated best-of-breed tools and was capable of passing an independent IRAP security assessment at the PROTECTED level.

Cloudten won this competitive tender and has been working with the client on an ongoing basis to design and deliver the platform. We currently have an established eight-person team of Data Architects, Engineers, Analysts and Data Scientists working closely with the customer's business and IT teams.

proposal

This hybrid solution has existing business intelligence (Qlik Sense) components hosted on-premises, with all Data Lake infrastructure running inside the AWS public cloud. All of the underlying AWS services that make up the Data Lake have previously been IRAP certified. From an access and identity perspective, all components of the solution are federated to the agency's Active Directory domain, and all logging is integrated with its existing commercial SIEM solution.

The successful pilot for this project incorporated the industry-leading commercial products Snowflake EDW, Databricks UAP (for ETL, processing and ML) and Alex Solutions (for governance, cataloguing, lineage and data quality), as well as a number of native AWS services, including Glue, Athena and Kinesis Data Streams for live feeds.

This project is now in its full build phase and includes a number of additional cloud-native machine learning tools, such as Amazon Textract, which provides advanced Optical Character Recognition (OCR) for unstructured datasets such as PDF documents and scanned images.
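To illustrate the kind of OCR step described above, here is a minimal, hypothetical sketch using the AWS SDK for Python (boto3). It is not the project's actual pipeline: it assumes boto3 is installed, AWS credentials and a region are configured, and the document already sits in S3. The function names, bucket and key are placeholders. It calls Textract's synchronous `detect_document_text` and keeps only the `LINE` blocks from the response.

```python
"""Hypothetical sketch: pull plain-text lines out of a scanned page with Amazon Textract.

Assumptions (not taken from the project): boto3 is installed, AWS credentials and
region are configured, and the input document is already stored in S3.
"""
from typing import Any, Dict, List


def lines_from_textract_response(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def ocr_document_in_s3(bucket: str, key: str) -> List[str]:
    """Run synchronous text detection on a single-page document stored in S3."""
    import boto3  # imported lazily so the parser above works without AWS access

    client = boto3.client("textract")
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return lines_from_textract_response(response)
```

For example, `ocr_document_in_s3("example-landing-bucket", "scans/page-001.png")` would return that page's text lines (both names are placeholders). The synchronous call handles single-page documents. Multi-page PDFs would need the asynchronous `start_document_text_detection` / `get_document_text_detection` pair instead. Keeping the response parsing in a separate pure function makes it easy to test without calling AWS.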

The following diagram gives a high-level overview of the solution:

Outcomes and results

One of the key challenges of this project was stakeholder management and helping to define the customer's data strategy and governance policy. Before the project began, the customer had fragmented business teams, and its data practices lacked structure and maturity. This required Cloudten to provide a range of resources, including business and data analysts, data architects and project managers, to help the client through the cultural shift and organisational changes needed to make the project a success.

We have also needed to work collaboratively with business and IT teams during the current COVID-19 outbreak, when remote working and social distancing have become necessary. Cloudten has risen to the challenge and is working effectively with the customer to co-ordinate and deliver the project. One of the key success factors has been our ability to quickly comply with, and adapt to, the customer's policies and standards relating to security and remote access.

AWS services used

Glue, Athena and Kinesis Data Streams.

Third-party applications and solutions used

Snowflake EDW, Databricks UAP and Alex Solutions.