Processing and visualizing NCAA tourney pool results with R and Power BI

IMG_0604

In my last post, Aggregating Spreadsheets with R, I described how Excel has gone from my jack-of-all-trades data processing tool to one that is used for data entry and limited exploration. This post will describe another use of R for processing and the value of Power BI for visualization.

The picture below shows the overall flow for the automation of the pool. The last post showed the R code for looping through the submitted Excel files and extracting the eight teams selected by each participant into a single comma separated value (CSV) with all submissions in it.

NCAAPoolFlow

Now that I have all submissions in a single file, I want an easy way to track the pool results over the three weekends of the tournament and provide updates to the participants. I do this in three steps:

1. Use Excel to update the tournament results

This file has all 64 teams in a worksheet along with columns for each round. The pool scoring system is embedded in the spreadsheet using some simple multiplication formulas. This is mostly due to the legacy of running this pool back when Excel was the sole tool in the toolbox. Excel works great for this data entry and simple formula task.

2. Use R to join the results Excel file with the submissions

By joining information from the submissions file and the results file, the R code produces a pool results file that has the results for each round for each team submitted by each participant. The code for this is shown below. The R package, dplyr, is easily the package I use the most in R. In this example, I use it to select columns and join the two data sets together.

library(dplyr)
inputPath <- "2019\\"

# Get submissions list.
submissions <- read.csv(paste0(inputPath, "2019AllSubmissions.csv"))

# Get results list. Convert it to a data frame and remove a few redundant columns.
resultsSheet <- readxl::read_excel(paste0(inputPath, "2019Results.xlsx"), "Results")
scoring <- as.data.frame(resultsSheet)
scoring <- scoring %>% select(-"Bracket Order", -Seed, -Price, -Region)

# Join the current results with the submissions list.
results <- left_join(submissions, scoring, by=c("Submission.Team"="Team"))
write.csv(results, paste0(inputPath, "NCAAResults.csv"), row.names=F)

3. Refresh the Power BI report

Power BI makes it easy to create interactive charts. My default page design has a summary chart at the top of the page and details below it. Here’s the current state of the pool for this year with only the Final Four games left to be played. By selecting my ‘bar’ in the top chart, the other bar chart and the contents of the table are filtered down to my round and team detail.

NCAAPoolPBI

Sharing the report is easy within a company by publishing it. Sharing outside the company, as is the requirement in this case, is also possible although I haven’t explored it. Instead, I screenshot the report and share the update over e-mail.

In summary, use the right tool for the right job!  For scenarios like this, I use Excel for data entry, R for automation, and Power BI for visualization.  It’s a combination that works well.

Picture details:  “Spring kayak”, May 12, 2018, iPhone 7, f1.8, 1/120 s, ISO-25