Roche’s Maxim of Data Transformation

“Data should be transformed as far upstream as possible, and as far downstream as necessary.”

I came across this statement recently and found it to be succinct guidance for decision-making in data pipelines. It is known as Roche’s Maxim of Data Transformation (https://ssbipolar.com/2021/05/31/roches-maxim) and was authored by Matthew Roche, a Principal PM at Microsoft.

The general idea is to transform your data close to the source (upstream), where the work is cheaper and the result can be reused by more analytics use cases. Sometimes the information needed for a transformation is only available close to the report or model (downstream), which forces that transformation further downstream. Read Matthew’s blog post for further explanation and some great examples.
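To make the upstream/downstream split concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample rows, the field names, and the functions `clean_orders`, `total_by`, and `share_of_selection` are all hypothetical and do not come from Matthew’s post. Cleaning happens once, upstream, so every report reuses it. The share calculation stays downstream because it depends on a region the report user picks.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might arrive from two source systems.
RAW_ORDERS = [
    {"source": "crm", "customer": " Acme ", "amount": "120.50", "region": "emea"},
    {"source": "erp", "customer": "ACME", "amount": "79.50", "region": "EMEA"},
    {"source": "crm", "customer": "initech", "amount": "100.00", "region": "Emea"},
    {"source": "erp", "customer": "Globex", "amount": "200.00", "region": "amer"},
]


def clean_orders(raw_rows):
    """Upstream step: standardise names, types, and codes once for all consumers."""
    return [
        {
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
            "region": row["region"].upper(),
        }
        for row in raw_rows
    ]


def total_by(rows, key):
    """Sum the amount column grouped by the given key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


def share_of_selection(rows, selected_region):
    """Downstream step: needs a region that is only known when the report is used."""
    selected = [row for row in rows if row["region"] == selected_region]
    per_customer = total_by(selected, "customer")
    grand_total = sum(per_customer.values())
    return {name: amount / grand_total for name, amount in per_customer.items()}


# Transform once, upstream...
orders = clean_orders(RAW_ORDERS)

# ...then several reports build on the same cleaned rows.
sales_by_customer = total_by(orders, "customer")
sales_by_region = total_by(orders, "region")

# This calculation cannot move upstream: it depends on the user's selection.
emea_share = share_of_selection(orders, "EMEA")
```

Both summaries read from the same cleaned rows, so the name and region fixes are written once instead of being repeated in each report.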

I was recently talking to a prospective client about Power BI reporting. They were getting data from multiple source systems and struggling to model and transform that data inside Power BI. Doing all of that work in the report layer runs counter to Roche’s Maxim. Following the maxim would push them towards an incremental store containing transformed data. That would simplify their report development and allow other reports to build on the same transformations.