My last post, Data Lakehouse Architecture for Electric Coops, described an overall architecture for using a lakehouse at electric coops. Meter data is an essential part of coop analytics, and ingesting it is the focus of this post. The relevant section of the diagram is outlined at the end of the post.
In my first iteration of the data lakehouse, raw data was ingested from the meter system (Aclara TWACS). Daily ASCII dumps were processed over several steps to transform the meter data from wide format to long format, scale it properly with multipliers, compensate for sub-meters, and more. Last year, I learned about the NISC Meter Data Management (MDM) infrastructure from a peer at another coop. This is a game changer for creating a reproducible solution across coops. The architecture here uses MDM as the source for meter data.
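To make the wide-to-long transform concrete, here is a minimal pandas sketch. The column names (`meter_id`, `read_date`, `multiplier`, and the hourly `int_*` interval columns) are hypothetical stand-ins, not the actual TWACS dump schema.

```python
import pandas as pd

def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Unpivot hourly interval columns into rows and apply the meter multiplier."""
    interval_cols = [c for c in df.columns if c.startswith("int_")]
    long_df = df.melt(
        id_vars=["meter_id", "read_date", "multiplier"],
        value_vars=interval_cols,
        var_name="interval",
        value_name="raw_kwh",
    )
    # Raw register values are scaled by the meter's multiplier.
    long_df["kwh"] = long_df["raw_kwh"] * long_df["multiplier"]
    return long_df.drop(columns=["raw_kwh", "multiplier"])
```

Sub-meter compensation would be an additional step on the long-format result, joining sub-meter reads back to their parent meters.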
There are some important considerations for leveraging MDM for the lakehouse. First of all, you need to know how to extract data from MDM on a regular cadence. Fortunately, NISC has a good solution for this with the MDM Extract administration tool. There’s good documentation on the specifics of this from NISC, but I’ll provide an overview here.
First, you’ll want to set up a regularly running (daily) extract job using the administration tool. Expect some lag: in the typical pattern, NISC receives the previous day’s meter data from the coop around mid-day, then processes it and makes it available for extract the next day. Long story short, on any given day you’ll typically have data through the day before yesterday.
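That lag matters when scheduling downstream pipeline steps, so it can be worth encoding the expectation explicitly. A trivial sketch, assuming the two-day lag described above:

```python
from datetime import date, timedelta

def latest_available_read_date(today: date) -> date:
    """NISC receives yesterday's reads mid-day and the extract runs the
    following day, so the freshest complete day is two days back."""
    return today - timedelta(days=2)
```

A pipeline can use this to validate that the expected extract file actually arrived before proceeding.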
To avoid manually downloading the data from the job, you’ll want to work with NISC to set up a Secure File Transfer Protocol (SFTP) destination for the data. There are a couple of destinations that will work with a lakehouse:
- An Azure Storage account can be set up for SFTP transfers. Since this is the ultimate destination for the data, this is the simplest approach. It’s also more secure than the alternative. However, it has one non-trivial negative. Enabling SFTP on a storage account incurs an ongoing charge of $0.30 an hour. If you do the math, that’s over $200 / month for this capability. That can be a pretty high proportion of a lakehouse cost for functionality that is fairly minor.
- Alternatively, you can set up an SFTP destination on an on-premises server. Once the data lands there, you’ll need to move it to the Azure Storage account for the lakehouse. While it seems a bit silly to move data from the NISC cloud to an on-premises server and then on to your Azure cloud, sometimes you have to take the long route to save some money. Here are a couple of ways to move the data from the server to Azure Storage:
- Azure Data Factory can track new files that land in a particular folder and incrementally upload them. This is the approach I would recommend. It requires installing a Self-Hosted Integration Runtime on a server so ADF can reach into your on-premises network, but you also need that runtime for an ODBC connection to iVUE.
- An alternative is a service running on the server that uses the Azure APIs to upload the data. I’ve done this with PowerShell and Windows Task Scheduler. The downside is one more place to coordinate the orchestration of your pipeline.
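The upload-service option boils down to "find files we haven't pushed yet, push them, remember them." Here is a minimal Python sketch of that pattern. The directory layout, container path, and the injected `container_client` (in practice a `ContainerClient` from the `azure-storage-blob` package) are assumptions for illustration:

```python
from pathlib import Path

def find_new_files(landing_dir: Path, uploaded: set[str]) -> list[Path]:
    """Return extract files in the landing directory not yet uploaded."""
    return sorted(p for p in landing_dir.glob("*.csv") if p.name not in uploaded)

def upload_new_files(landing_dir: Path, uploaded: set[str], container_client) -> list[str]:
    """Upload each new file to a hypothetical Bronze landing path and report what was pushed."""
    pushed = []
    for path in find_new_files(landing_dir, uploaded):
        with path.open("rb") as fh:
            # With azure-storage-blob, container_client.upload_blob(name=..., data=...)
            container_client.upload_blob(name=f"landing/meter/{path.name}", data=fh)
        pushed.append(path.name)
    return pushed
```

The set of already-uploaded names would be persisted between runs (a manifest file or table), which is exactly the orchestration bookkeeping the bullet above warns about.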
In the diagram, you’ll notice a special “Landing Zone” for the meter data in the Bronze tier of the medallion architecture. This is a good pattern to follow when raw source data is being “pushed” to the storage account. Once the raw files land in the landing zone, another process can do some preliminary processing to finalize the ingestion portion of the pipeline. That is what will be covered in Meter Data Ingestion, Part 2.
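One practical detail of a landing zone is a predictable, dated folder layout, so the downstream ingestion process can find each day's files. A sketch, using a hypothetical path convention:

```python
from datetime import date

def landing_path(filename: str, extract_date: date) -> str:
    """Hypothetical landing-zone layout: bronze/landing/meter/YYYY/MM/DD/<file>."""
    return f"bronze/landing/meter/{extract_date:%Y/%m/%d}/{filename}"
```

Partitioning by date this way lets the Part 2 ingestion step process only the newly arrived day rather than rescanning the whole zone.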
