Using AWS Quicksight For A Fast Datalake

Posted on August 01, 2020

Setting Up A Datalake on AWS And Connecting It To Quicksight

The apps we make generate a lot of data.

This is particularly true if you build an event-driven system or leverage cloud-native solutions.

For example, with AWS DynamoDB or Kinesis it’s really easy to capture every update that ever happens to something in your app, or even every click or other important interaction.

This data usually arrives without the relationship data needed to make it useful. For example, you might have an account guid but not the rest of the account information.

This creates a big problem. You want to make all of this data useful, but it’d take a lot of effort to massage it into a query-friendly format.

Additionally, any schema you come up with will almost certainly not be optimized for the crazy way business people want to use it in the future.

This is where Datalakes come in. You can use something like AWS Glue to stream your data from DynamoDB and Kinesis to a cheap storage solution like S3.
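
To make the "just dump it in S3" idea concrete, here is a rough sketch of writing a batch of raw events as newline-delimited JSON. It is a hand-rolled stand-in, not how Glue does it. The key layout, the function names, and the bucket name are all made up for illustration; the only real API assumed is a boto3-style S3 client with put_object.

```python
import json
from datetime import datetime, timezone


def build_s3_object(events, source, now=None):
    """Return (key, body) for one batch of raw events.

    The key layout raw/<source>/dt=YYYY-MM-DD/<time>.json is a
    hypothetical convention, not something S3 or Glue requires.
    """
    now = now or datetime.now(timezone.utc)
    key = f"raw/{source}/dt={now:%Y-%m-%d}/{now:%H%M%S%f}.json"
    # One JSON document per line, with no schema imposed up front.
    body = "\n".join(json.dumps(event, sort_keys=True) for event in events)
    return key, body.encode("utf-8")


def upload_batch(s3_client, bucket, events, source, now=None):
    """Write one batch using a boto3-style client (anything with put_object)."""
    key, body = build_s3_object(events, source, now)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

With boto3 installed you would pass boto3.client("s3") as s3_client. Note that nothing here cares what fields the events contain, which is the whole point of the approach.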

You can then use AWS Quicksight to generate the schema at query time. Quicksight resolves guids, timestamps, and other relational data automatically, so you just need to worry about getting your data into S3 and then setting up business users with Quicksight.
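
If "schema at query time" sounds abstract, this small sketch shows the general idea in plain Python. It is only an illustration of the concept, not what Quicksight does internally. The sample records, field names, and the report function are all invented for the example: raw events carry just an account guid and an epoch timestamp, and the readable columns are worked out only when the report is built.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw data, stored as-is with no shared schema.
RAW_EVENTS = """\
{"account_guid": "a-1", "action": "click", "ts": 1596240000}
{"account_guid": "a-2", "action": "purchase", "ts": 1596243600, "amount": 20}
"""
RAW_ACCOUNTS = """\
{"guid": "a-1", "name": "Acme"}
{"guid": "a-2", "name": "Globex"}
"""


def read_lines(raw):
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def report(raw_events, raw_accounts):
    """Join events to accounts and shape the columns at read time."""
    accounts = {a["guid"]: a for a in read_lines(raw_accounts)}
    rows = []
    for event in read_lines(raw_events):
        account = accounts.get(event["account_guid"], {})
        when = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        rows.append({
            "account": account.get("name", "unknown"),
            "action": event["action"],
            "when": when.isoformat(),
            # Fields missing from older events get a default, not an error.
            "amount": event.get("amount", 0),
        })
    return rows


for row in report(RAW_EVENTS, RAW_ACCOUNTS):
    print(row)
```

The stored data never changes. If a business user wants a different shape next month, only the read side changes.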

I’ve found it a pretty great solution for letting internal users build their own custom reports that connect rather distantly related models.