Dynamics 365 Supply Chain Management Demand Forecasting R Module


This is the second post in a series on Dynamics 365 SCM Demand Forecasting Machine Learning. It focuses on the R module in the Azure ML experiment. The full list of posts includes:

  • ML Overview – An overview of Azure ML Studio and the interface from SCM to Azure ML (Dynamics 365 Supply Chain Management Demand Forecasting ML Overview – Lake Data Insights).
  • R Script Details – A dive into the R script that does the heavy lifting (this post).
  • Azure Auto ML Comparison – Use an external data set to show the differences between the SCM out-of-the-box functionality and what Azure Auto ML does.
  • Results Discussion – An explanation of the results of the experiment from the previous post.
  • Customization Opportunities – A discussion on the various ways that customization can be done given the background from the previous posts.

Quick review

In the first post of this series, I reviewed the blocks in the Azure ML experiment shown below. I discussed the input and output formats, how to run the experiment interactively, and the function of each block.

[Screenshot: the "Dynamics Ax7 - demand forecasting" experiment in Microsoft Machine Learning Studio (classic). The canvas shows the Enter Data Manually (SAMPLE DATA), Enter Data Manually (PARAMETERS), Web service input, Execute R Script (GENERATE FORECAST), Split Data, Select Columns in Dataset, and Web service output blocks.]

What I ignored was the guts of the experiment, the Execute R Script GENERATE FORECAST block.

What is R?

Before going further, I should briefly introduce the R programming language. R is an open source project that is a “language and environment for statistical computing and graphics” (R: What is R? (r-project.org)). It has a myriad of packages available for data wrangling, machine learning, visualization, and more. It’s also the language that I used throughout my MOOC and graduate school studies on data science. Because of this, it’s still my go-to tool for exploratory analysis.

R has historically been a top choice for time series forecasting, in part because several well-known authors and researchers wrote their forecasting packages for R. That said, most of these packages have since been ported to Python and other languages, so I doubt there is much difference for the most common methods these days.

Getting started with Execute R Script

Selecting the Execute R Script block and expanding the slide out on the right will show you the R code. You can expand the pane and the editor to show more code. What you’ll find is 400+ lines of code that start and stop with calls to the maml functions that map inputs and outputs to the other blocks in the experiment.

[Screenshot: the Properties pane for the Execute R Script (GENERATE FORECAST) block. The R Script editor shows the first lines of the script, including library(forecast) and library(plyr). Random Seed is set to 42, R Version is Microsoft R Open 3.2.2, and the run status is Finished.]

You can copy and paste the code into a local editor. I use RStudio (RStudio | Open source & professional software for data science teams – RStudio) as my local IDE, and getting the code into it was one of the first things I did.

Understanding the forecast methods

The R script has quite a lot of functionality built into it. It can fill in missing values, address seasonality, deal with different train vs. test scenarios, and more. But the heart of the code involves iterating over each time series (aka grain) to find the best forecasting method of the methods coded into the script.
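To make the "best method per grain" idea concrete, here is a minimal sketch in Python rather than R. It is not taken from the SCM script. The three candidate methods are simple stand-ins for ARIMA, ETS, and STL, and the choice of a holdout window scored by MAPE is my assumption about how a selection like this could work. The function names and the sample data are hypothetical.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Repeat the historical average."""
    return [mean(history)] * horizon


def ses_forecast(history, horizon, alpha=0.5):
    """Simple exponential smoothing: forecast the final smoothed level."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Stand-in candidates; the real script chooses among ARIMA, ETS, STL and two ensembles.
METHODS = {"naive": naive_forecast, "mean": mean_forecast, "ses": ses_forecast}


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def best_method_per_grain(grains, test_size):
    """Hold out the last `test_size` points of each grain and keep the
    method with the lowest MAPE on that holdout."""
    winners = {}
    for grain, series in grains.items():
        train, test = series[:-test_size], series[-test_size:]
        scores = {name: mape(test, fn(train, test_size)) for name, fn in METHODS.items()}
        winners[grain] = min(scores, key=scores.get)
    return winners


grains = {
    "item_A": [10, 12, 14, 16, 18, 20, 22, 24, 26],  # steady upward trend
    "item_B": [40, 20, 40, 20, 40, 20, 30, 30, 30],  # oscillates, then settles at its average
}
print(best_method_per_grain(grains, test_size=3))
# {'item_A': 'naive', 'item_B': 'mean'}
```

Each grain is scored on its own, so different grains can end up with different winning methods.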

There are three primary forecasting methods used in the R script: ARIMA (autoregressive integrated moving average), ETS (exponential smoothing), and STL (seasonal and trend decomposition using Loess). The script also has a couple of ensemble options available, ETS+ARIMA and ETS+STL, so there are five options in total.
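One common way to build an ensemble is to average the forecasts of the individual methods point by point. Whether the SCM script combines ETS and ARIMA exactly this way is an assumption on my part. The sketch below is in Python, and the function name and forecast values are hypothetical.

```python
def average_ensemble(*forecasts):
    """Combine equal-length forecasts point by point with an unweighted mean."""
    if len({len(f) for f in forecasts}) != 1:
        raise ValueError("all forecasts must cover the same horizon")
    return [sum(points) / len(points) for points in zip(*forecasts)]


ets_forecast = [100.0, 104.0, 108.0]   # hypothetical output of an ETS model
arima_forecast = [96.0, 100.0, 110.0]  # hypothetical output of an ARIMA model

print(average_ensemble(ets_forecast, arima_forecast))
# [98.0, 102.0, 109.0]
```

The appeal of averaging is that the errors of the two methods can partly cancel out.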

These methods are “classical statistical” methods typically used on “univariate data”. I’ll elaborate, starting with the latter term.

“Univariate” data means that only the historical time series itself is available for future forecasts. There can be multiple time series (aka grains), but each time series is treated independently from the rest. Contrast this with “multivariate data”, where additional independent variables are available for the forecast. These variables are still tied to the same time periods as the series, but carry extra information that can make the forecast more accurate. For example, promotions often drive sales and can be planned in advance, so including historical and future promotion information could significantly improve the forecast. That said, it can be tricky to set up additional independent variables.

“Classical statistical” methods typically only work with univariate data. They work by looking for components of the historical data like trend and seasonality. Contrast this with “machine learning regression” methods which reconfigure the dataset in such a way that supervised learning methods can be applied. These methods look at all the grains collectively and can factor in additional independent variables (multivariate data).
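The "reconfiguring" step usually means turning each series into rows of lagged values so that an ordinary regression model can learn from them. Here is a minimal sketch in Python that assumes two lags and a hypothetical promotion flag as the extra independent variable. The function name, column names, and data are made up for illustration.

```python
def to_supervised(grains, n_lags, promo=None):
    """Turn {grain: [values]} into (features, target) rows.

    Each row holds the grain id, the previous `n_lags` values and,
    optionally, a promotion flag for the period being predicted.
    """
    rows = []
    for grain, series in grains.items():
        for t in range(n_lags, len(series)):
            features = {"grain": grain}
            for lag in range(1, n_lags + 1):
                features[f"lag_{lag}"] = series[t - lag]
            if promo is not None:
                features["promo"] = promo[grain][t]
            rows.append((features, series[t]))
    return rows


grains = {"item_A": [10, 12, 30, 13], "item_B": [5, 6, 7, 8]}
promo = {"item_A": [0, 0, 1, 0], "item_B": [0, 0, 0, 0]}  # 1 = promotion ran that period

rows = to_supervised(grains, n_lags=2, promo=promo)
for features, target in rows:
    print(features, "->", target)
```

All grains land in one table, which is what lets a single supervised model learn across them. The promotion flag in the first row is the kind of signal that can explain the jump to 30.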

My experience is that the classical statistical methods work best with univariate data. Because the Azure ML interface for demand forecasting requires univariate data, it’s reasonable to assume that the script will perform well. We’ll test that theory in the next blog post. We’ll also explore the benefits of having multivariate data and using machine learning regression methods.

Picture details:  10/6/2020, Canon PowerShot G3 X, f/5, 1/640 s, ISO-125