Federal Government Agency
December 2019 - Present
A large Australian federal government agency was seeking to gain greater business value and deeper insights from its wide range of disparate data assets. In addition to uplifting its advanced analytics capabilities, the agency wanted to use a data lake to deliver a range of real-world benefits, including improved fraud detection, enhanced market awareness and predictive modelling in areas such as superannuation, mortgages and business lending.
In 2019 the agency launched an open tender to find a trusted data services provider to design, build and manage an enterprise-grade Data Lake. The platform had to incorporate best-of-breed tools and be capable of passing an independent IRAP security assessment at PROTECTED level.
Cloudten won this competitive tender and has been working with the client on an ongoing basis to design and deliver the platform. We currently have an established eight-person team of Data Architects, Engineers, Analysts and Data Scientists working closely with the customer's business and IT teams.
This hybrid solution keeps the existing business intelligence components (Qlik Sense) hosted on-premises, with all Data Lake infrastructure running in the AWS public cloud. All the underlying AWS services that make up the Data Lake have previously been IRAP certified. From an access and identity perspective, all components of the solution are federated to the agency's Active Directory domain, and all logging is integrated with its existing commercial SIEM solution.
The successful pilot for this project incorporated industry-leading commercial products, namely Snowflake EDW, Databricks UAP (for ETL, processing and ML) and Alex Solutions (for governance, cataloguing, lineage and data quality), as well as a number of native AWS services, including Glue, Athena and Kinesis Data Streams for live feeds.
The project is now in its full build phase and includes a number of additional cloud-native machine learning tools, such as Amazon Textract, which provides advanced Optical Character Recognition (OCR) for unstructured data such as PDF documents and scanned images.
The following diagram gives a high-level overview of the solution:
One of the key challenges of this project was stakeholder management, along with helping define the customer's data strategy and governance policy. Before the project, the customer had fragmented business teams and a general lack of structure and maturity in its data practices. Cloudten therefore provided business and data analysts, data architects and project managers to help the client through the cultural shift and organisational changes required to make the project a success.
We have also needed to work collaboratively with business and IT teams during the COVID-19 outbreak, when remote working and social distancing became necessary. Cloudten has risen to the challenge and is working effectively with the customer to coordinate and deliver the project. A key success factor has been our ability to quickly comply with, and adapt to, the customer's policies and standards for security and remote access.