To stay competitive, every business needs to be able to make informed decisions. A data lake can surface a significant amount of previously hidden insight thanks to its ability to accumulate and store structured, semi-structured and unstructured data. The low cost, durability, scalability and simplicity of AWS Simple Storage Service (S3) inspire many organisations to start leveraging the service by building their own repository of enterprise data.
However, the real challenge every organisation faces lies at the very start of its data lake design and implementation. How do you avoid ending up with yet another data swamp? Things can go wrong easily and quickly as high-velocity data arrives from diverse sources and the lake is shared by multiple tenants. Something meant to be a solution then becomes a source of anxiety and frustration for the organisation’s IT department and executives.
As an AWS Advanced Consulting Partner, Cloudten applies a set of industry best practices to provide its clients with highly efficient, cutting-edge technologies that meet their business goals. Building a big data storage solution is no exception, and following clear and concise recommendations helps to unleash the full potential of a data lake.
For instance, AWS recommends managing a data lake with object tagging. This simple technique makes it possible to associate every data asset with the organisations, lines of business, users and applications that use or process it, and to set entity-specific policies for coherent data management.
An AWS S3 object may have up to 10 tags. Tags are mutable key-value pairs, with keys of up to 128 and values of up to 256 Unicode characters. A well-designed object tagging schema serves the following tasks:
· Data classification, which can be especially important for organisations dealing with sensitive information. It is not recommended, however, to include confidential information in the tags themselves.
· Fine-grained control of access permissions, implemented with AWS Identity and Access Management (IAM).
· Fine-grained data lifecycle control, based on lifecycle policies that contain tag-based filters.
· Data monitoring and audit with Amazon CloudWatch metrics and AWS CloudTrail logs.
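The tag-based access and lifecycle controls listed above can be sketched as plain policy documents. The Python below is a minimal illustration under stated assumptions rather than a definitive implementation: the bucket name, the tag keys `classification` and `retention`, their values and the rule ID are all hypothetical, and the helper functions are invented for this example. The `s3:ExistingObjectTag/<key>` condition key and the lifecycle `Filter`/`Tag` structure are taken from the S3 documentation.

```python
import json
from typing import Any, Dict


def read_policy_for_tag(bucket: str, tag_key: str, tag_value: str) -> Dict[str, Any]:
    """Build an IAM policy that allows reading only objects carrying a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The condition is evaluated against the tags already on the object.
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


def expiry_rule_for_tag(rule_id: str, tag_key: str, tag_value: str, days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that expires objects matching a single tag."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # Only objects tagged tag_key=tag_value are affected by this rule.
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    print(json.dumps(read_policy_for_tag("example-data-lake", "classification", "public"), indent=2))
    print(json.dumps(expiry_rule_for_tag("expire-temporary", "retention", "temporary", 30), indent=2))
```

With boto3, the lifecycle document could be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. That call replaces the bucket's existing lifecycle configuration, so any rules already in place would need to be included in the same document.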
Furthermore, this additional metadata can become a vital enabler and an important source of insights, as data science professionals gain better control and understanding of the objects stored in an AWS S3 data lake.
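Because the limits on object tags are fixed, a tagging schema can be checked before any data is uploaded. The following is a minimal sketch of such a check, assuming the limit of 10 tags per object and 256 characters per value stated above, plus the 128-character key limit from the S3 documentation. The function name and the example tag keys are hypothetical, and the check does not cover the set of characters S3 allows in tags.

```python
from typing import Dict, List

MAX_TAGS_PER_OBJECT = 10   # per-object tag limit
MAX_KEY_LENGTH = 128       # Unicode characters, per the S3 documentation
MAX_VALUE_LENGTH = 256     # Unicode characters


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a tag set; an empty list means it fits the limits."""
    problems = []
    if len(tags) > MAX_TAGS_PER_OBJECT:
        problems.append(f"{len(tags)} tags exceed the limit of {MAX_TAGS_PER_OBJECT}")
    for key, value in tags.items():
        if not key:
            problems.append("tag key must not be empty")
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key[:20]!r}... is longer than {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for key {key!r} is longer than {MAX_VALUE_LENGTH} characters")
    return problems


if __name__ == "__main__":
    # Hypothetical schema: classification level and owning line of business.
    print(validate_tags({"classification": "internal", "line-of-business": "finance"}))
```

Returning a list of problems rather than raising on the first one lets a whole schema be reviewed in a single pass.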
Below is an illustration of tagging a single object in AWS S3 during the upload process. For operations on multiple objects, it is recommended to use automation tools such as S3 Batch Operations or cloud-native ELT solutions.
AWS S3 object tagging during upload
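The same single-object tagging can also be done programmatically. The sketch below assumes the boto3 S3 client, whose `put_object` call accepts a `Tagging` argument encoded as URL query parameters. The function name `upload_with_tags`, the bucket, the object key and the tag names are hypothetical. The client is passed in as a parameter, so the example does not itself depend on boto3 being installed.

```python
from typing import Any, Dict
from urllib.parse import quote, urlencode


def encode_tagging(tags: Dict[str, str]) -> str:
    """Encode tags as URL query parameters, e.g. 'key1=value1&key2=value2'."""
    return urlencode(tags, quote_via=quote)


def upload_with_tags(s3_client: Any, bucket: str, key: str, body: bytes, tags: Dict[str, str]) -> Any:
    """Upload one object and attach its tags in the same request."""
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        Tagging=encode_tagging(tags),
    )


# Usage with a real client (requires boto3 and AWS credentials; names are illustrative):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_with_tags(
#       s3, "example-data-lake", "raw/sales/2020-01.csv", b"...",
#       {"classification": "internal", "line-of-business": "finance"},
#   )
```

Setting tags in the upload request avoids a window in which the object exists without its classification, which matters when access or lifecycle policies depend on those tags.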
Working in partnership with Australian enterprises and government institutions, Cloudten has contributed to a number of impressive quality transformations in data lake formation and usage efficiency thanks to object tagging. This technique helps to turn any data lake into a well-organised big data store and opens up additional possibilities for advanced analytics.