The data lakehouse architecture has been around for a few years now. My first blog post about it, Warehouses, lakes, and lakehouses, dates back to March 2021. Most of that post is still relevant today, including the essential idea that a lakehouse aims to blend the best of data warehouses and data lakes.
I have quite a bit more hands-on experience with the lakehouse architecture now than I did in 2021. While the 2021 post touches on Azure Synapse, my focus in the last few years has been on Azure Databricks as my primary lakehouse platform. In particular, I’ve been using a lakehouse approach to address data analytics challenges for electric cooperatives. Electric coops are focused on member value and grid reliability. Data analytics, particularly analytics that incorporate granular meter data, can support this mission by identifying cost savings, enabling proactive maintenance, and supporting better decision making.
The lakehouse architecture is an excellent choice for electric coops. Here are five reasons why:
- Spark-based compute provides the scalability required to handle large volumes of meter data, which can quickly grow to billions of data points (see the first sketch after this list).
- Object storage in a data lake plus adaptable compute provide a flexible solution for a variety of inputs and analyses. Data sources include meter data, NISC iVUE, spreadsheets from an energy provider, and more (the second sketch after this list shows one way these land in the lake).
- Low-cost data lake storage and pay-as-you-go compute make for a cost-effective solution. Data lake storage is inexpensive, and Databricks compute uses a pay-as-you-go model, so combined monthly storage and compute costs can reasonably land in the hundreds of dollars. This is critical for coops that keep a laser focus on managing costs in order to maximize member value.
- Cloud-hosted resources mean low maintenance. Managed cloud compute and storage minimize both the costs and the IT resources needed to run on-premises servers.
- A variety of tools, including low-code / no-code options, make the platform approachable. While Databricks is a code-first environment, capabilities such as notebook-based development, strong Git integration, Unity Catalog, and more enable productive development. With some training and a core data foundation in place, experienced software engineers are not required.
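
To make the scalability point concrete, here is a minimal PySpark sketch of the kind of rollup that gets painful at meter-data volumes outside of Spark. The catalog, table, and column names (meter_id, read_ts, kwh) are hypothetical placeholders for this example, and spark is the session that Databricks notebooks provide automatically.

```python
from pyspark.sql import functions as F

# Hypothetical bronze table of 15-minute interval reads:
# meter_id (string), read_ts (timestamp), kwh (double).
reads = spark.table("coop.bronze.interval_reads")

# Roll interval reads up to daily usage per meter. Spark distributes
# the scan and aggregation across the cluster, so the same code works
# at millions or billions of rows.
daily_usage = (
    reads
    .groupBy("meter_id", F.to_date("read_ts").alias("read_date"))
    .agg(F.sum("kwh").alias("daily_kwh"))
)

daily_usage.write.mode("overwrite").saveAsTable("coop.silver.daily_usage")
```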
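
And to illustrate the flexibility point, here is a sketch of landing two very different sources in a bronze layer. The storage paths and table names are made up for the example, and the energy provider spreadsheet is assumed to have been exported to CSV first.

```python
# Interval reads exported as CSV by the meter data management system.
meter_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@coopstorage.dfs.core.windows.net/meter/")
)
meter_raw.write.mode("append").saveAsTable("coop.bronze.interval_reads")

# Rate data from an energy provider spreadsheet, exported to CSV.
rates_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@coopstorage.dfs.core.windows.net/provider_rates/")
)
rates_raw.write.mode("append").saveAsTable("coop.bronze.provider_rates")
```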
Last but not least, using Databricks compute as part of the data engineering solution can greatly simplify what is needed downstream in Power BI reporting. For example, it can simplify the data models and minimize the amount of DAX / M code required to create the desired analytics (the sketch below shows the idea). Databricks is also working closely with Microsoft on Power BI integration scenarios.
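
As a sketch of what that looks like in practice, the daily rollup from the earlier example can be pre-aggregated into a gold table shaped for reporting, so the Power BI model imports one narrow table instead of computing measures over raw intervals in DAX. The names here are the same hypothetical ones used above.

```python
from pyspark.sql import functions as F

daily = spark.table("coop.silver.daily_usage")

# Pre-compute monthly measures in Databricks rather than in DAX,
# so the Power BI model stays a simple import of a small table.
monthly_usage = (
    daily
    .groupBy(
        "meter_id",
        F.date_trunc("month", "read_date").alias("usage_month"),
    )
    .agg(
        F.sum("daily_kwh").alias("monthly_kwh"),
        F.avg("daily_kwh").alias("avg_daily_kwh"),
    )
)

monthly_usage.write.mode("overwrite").saveAsTable("coop.gold.monthly_usage")
```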
Bottom line – I really like the lakehouse architecture, especially for electric coops! Feel free to reach out to me at david@lakedatainsights.com if you want to discuss this further.