Data science grad school projects


One of my favorite parts of my data science graduate school experience has been the self-driven projects that we were required to complete for various classes.  Roughly half of my 12 classes have required a project that applies the techniques learned in the class.  One class, DS 745 (Visualization and Unstructured Data Analysis), was broken into three major sections, and each section required its own self-driven project.

These projects were typically quite time-consuming.  But since they were on topics that I was interested in, it wasn’t hard to put in the effort.  The reward was definitely there, as applying new skills outside the structure of the classroom forces you to dig deeper.  Often the hardest part is figuring out how to take the first step in applying those new skills to your own problem.

I spent a few minutes this weekend looking through the projects that I’ve done over the last three years in the program.  Here’s the list that quickly bubbled up:

  • Predicting MLB Breakout Pitchers – used Hadoop, Pig, and R to evaluate MLB pitching data since 1871 to understand the signals that predict a starting pitcher’s breakout season.
  • Machine Learning for Employee Retention – used R to evaluate multiple ML approaches for employee retention based on accuracy, inference, and robustness.
  • NCAA Tournament Visualization – developed multiple visualizations in Power BI to provide insight on NCAA basketball tourney progression based only on seed information.
  • Customer Satisfaction Comments Analysis – used text analytics in R to classify and analyze comments from a product feedback survey.
  • Identifying Customer Communities in a Product Ecosystem – used network analysis in R to analyze a product ecosystem containing the core product, partner implementations, and customers. Used a bipartite network approach to identify customer communities based on partner affiliations.
  • Ethical Concerns When Applying Machine Learning in Clinical Healthcare – 12-page paper using ethical analysis to justify the need for Explainable AI in healthcare.

It’s pretty easy to see from this list where my interests lie: sports and work!  The one exception that doesn’t seem closely aligned with these two topics is the last project, on healthcare.  That said, Explainable AI is a topic that I’m very interested in, and I recently presented on it for an Engineering Excellence session at the Microsoft Fargo campus.  This presentation will get some more mileage at the local university, NDSU, in the coming months.

Being able to do self-driven projects like this and get meaningful instructor feedback is an advantage of a master’s program over MOOCs, certificate programs, or vendor education.  It was definitely a valuable part of the degree program for me.  I’ll likely talk more about these projects in future posts.

Picture details:  1/20/2019, Sunrise over Lida, Canon PowerShot SD4000 IS, f/5, 1/320 s, –1 step, ISO-1600