Data science grad school top 10

Diploma

It’s been about two months since I wrapped up my last class in the UW Data Science master’s degree program. Given that my diploma (pictured above) arrived in the mail this week, it seems like a good time to do a bit of reflection on the program.

With apologies to David Letterman, I’m going to write this as a top 10 countdown. I figure I put close to 1,000 hours into 12 classes over three years, so scoping it down to 10 key takeaways leaves out a lot. But, hey, it’s a blog post…

Without further ado, here is Dave’s “Top 10 takeaways from data science grad school”:

#10 – the value of leveraging research papers for learning

DS 780, Strategic Decision Making, required multiple papers that drew on academic research sources. At first, I thought this was unnecessary overhead. But I came around to seeing the value of this activity and realized that I’ve undervalued academic papers as a resource in my job and my learning.

#9 – digital transformation and how marketing is changing

DS 780 also provided perspective on the buzzy idea of digital transformation. As someone who has spent his career in product engineering, I found it interesting to learn how marketing has changed in the digital world.

#8 – improved communication skills

You can always improve your communication skills. Through lectures and practice in DS 735, I was reminded of the importance of being direct and concise in writing. I have less proficiency and practice with presentations than with writing, so the opportunity to learn new data presentation skills, practice them, and get feedback was valuable.

#7 – how operations research techniques apply to data science

When I started the program, I didn’t have a good appreciation of the difference between predictive techniques (forecasting what will happen) and prescriptive techniques (deciding what to do about it). Learning operations research techniques such as linear programming and stochastic simulation helped me realize there’s more to data science than ML. This is an under-appreciated aspect of the field.
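To make the predictive vs. prescriptive distinction concrete, here is a minimal stochastic simulation sketch in Python for a classic newsvendor-style question: how many units should we stock when demand is uncertain? Everything in it is hypothetical: the price (10), unit cost (4), the uniform demand model of 50 to 150 units, and the function names `simulate_profit` and `best_order_quantity`. A predictive model would forecast demand; the prescriptive step is searching for the decision that maximizes expected profit across simulated demand scenarios.

```python
import random


def simulate_profit(order_qty, demands, price=10.0, cost=4.0):
    """Average profit of one order quantity across simulated demand scenarios."""
    total = 0.0
    for demand in demands:
        sold = min(order_qty, demand)               # cannot sell more than we stocked
        total += price * sold - cost * order_qty    # we pay for every unit ordered
    return total / len(demands)


def best_order_quantity(candidates, n_scenarios=10_000, seed=42):
    """Pick the candidate order quantity with the highest simulated expected profit."""
    rng = random.Random(seed)
    # Hypothetical demand model: uniformly distributed integer between 50 and 150 units
    demands = [rng.randint(50, 150) for _ in range(n_scenarios)]
    # Evaluate every candidate on the same scenarios (common random numbers),
    # so comparisons between candidates are less affected by sampling noise
    return max(candidates, key=lambda q: simulate_profit(q, demands))


if __name__ == "__main__":
    q = best_order_quantity(range(50, 151))
    print(f"Simulated best order quantity: {q} units")
```

With these made-up numbers, the textbook newsvendor answer is the demand quantile at (price - cost) / price = 0.6, roughly 110 units, so the simulated optimum should land close to that. Linear programming would be the better fit when the decisions are continuous quantities under linear constraints; simulation is handy when the uncertainty is hard to capture in closed form.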

#6 – a broad survey of ML algorithms

Speaking of ML, we covered a plethora of algorithms in DS 740. Gaining intuition for how each algorithm works and understanding its prerequisites were two valuable takeaways. I also now own one of the most commonly used data science references, Springer’s The Elements of Statistical Learning, for future use.

#5 – an introduction to big data techniques and languages

I’d had a bit of exposure to Hive, the big data query language, before this class, but frankly I didn’t get it. Having to write MapReduce programs by hand before learning Hive and Pig was a great way to understand what was going on under the hood.
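For readers who, like me, found Hive opaque at first, here is a minimal single-process sketch of the three MapReduce phases (map, shuffle/group-by-key, reduce) applied to the classic word-count problem. It is plain Python meant to show the data flow, not the Hadoop API, and the function names are hypothetical. In Hive, a similar result would come from a query along the lines of `SELECT word, COUNT(*) FROM words GROUP BY word` (assuming a hypothetical `words` table with one word per row), which classic Hive translated into MapReduce jobs of roughly this shape.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(docs))
```

On a real cluster, the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the logic per phase stays this simple.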

#4 – a solid grounding in statistical techniques

Null hypothesis testing is a classic statistics essential, and it is not intuitive. The process and language are very specific. Practice is required to do it well, and practice is what we got in DS 705. We also learned about the importance of the normal distribution, what to do when your data aren’t normally distributed, t-tests, ANOVA, and more.
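As a small illustration of the testing workflow, the sketch below compares two made-up samples. It computes Welch’s t statistic (a two-sample t-test that does not assume equal variances) and, as a distribution-free option for when normality is doubtful, a permutation-test p-value for the difference in means. The permutation test is my choice of illustration, not necessarily a method from DS 705, and the data and function names are hypothetical. Only the Python standard library is used, so the sketch does not compute the t-test’s own p-value, which needs the t distribution (SciPy’s `scipy.stats.ttest_ind` provides it, for example).

```python
import math
import random
from statistics import fmean, variance


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b), without assuming equal variances."""
    std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / std_err


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in means.

    Null hypothesis: the group labels are exchangeable, i.e. both samples
    come from the same distribution. No normality assumption is needed.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                        # randomly reassign group labels
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:               # tolerance guards against float round-off
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly zero
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up measurements for a control group and a treatment group
    control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
    treatment = [12.9, 13.1, 12.6, 13.4, 12.8, 13.0, 12.7, 13.2]
    print(f"Welch t = {welch_t(control, treatment):.2f}")
    print(f"Permutation p-value = {permutation_p_value(control, treatment):.4f}")
```

The language matters as much as the arithmetic: a small p-value means the observed difference would be unlikely if the null hypothesis were true, which is grounds to reject it, not proof that the alternative is true.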

#3 – a good understanding of ethical challenges in data science

Socio-technical problems are hard because they are constantly on new and changing ground. Learning how to apply ethical frameworks in DS 760, and realizing that technical publications from 15-20 years ago were already leaning into these problems, was an eye-opener. I have a lot more respect for some of the work that Microsoft and other leaders are doing in this space now, and a lot less patience for companies that aren’t being proactive about privacy, security, and other ethical considerations.

#2 – what the bias-variance trade-off is all about

Bias results from a model that under-represents the real-world problem… an overly simple model. Variance occurs when a model changes significantly when trained on new data… an overfit model. Both are sources of model error, and an optimal model strikes a balance between the two. This is essential knowledge for applying ML.
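The trade-off can be stated precisely: for squared-error loss, a model’s expected error at a point decomposes into bias squared, plus variance, plus irreducible noise. The sketch below estimates the first two empirically for k-nearest-neighbor regression, where k acts as the complexity knob: k = 1 follows the training data closely (low bias, high variance), while averaging over every training point gives a nearly constant prediction (high bias, low variance). The sine-shaped "true" function, noise level, sample sizes, and function names are all assumptions chosen for illustration.

```python
import math
import random
from statistics import fmean, pvariance


def true_f(x):
    """Hypothetical ground-truth function the model is trying to learn."""
    return math.sin(2 * math.pi * x)


def knn_predict(train, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return fmean([y for _, y in nearest])


def bias_variance_at(x0, k, n_train=50, n_trials=500, noise=0.3, seed=1):
    """Estimate bias^2 and variance of k-NN at x0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        # Draw a fresh training set from the same data-generating process
        train = []
        for _ in range(n_train):
            x = rng.random()
            train.append((x, true_f(x) + rng.gauss(0, noise)))
        preds.append(knn_predict(train, x0, k))
    bias_sq = (fmean(preds) - true_f(x0)) ** 2   # systematic error of the average model
    var = pvariance(preds)                        # how much predictions move between training sets
    return bias_sq, var


if __name__ == "__main__":
    for k in (1, 5, 15, 50):
        b, v = bias_variance_at(0.25, k)
        print(f"k={k:2d}  bias^2={b:.4f}  variance={v:.4f}")
```

Running it should generally show bias squared growing and variance shrinking as k increases; the best k sits somewhere in between, which is the middle ground described above.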

And the #1 takeaway from data science grad school… data science is a lot more than algorithms!

The #1 takeaway has to do with the breadth of topics addressed in the courses. Unlike a boot camp where you learn to code in R or a MOOC on neural networks, the curriculum touched on a wide variety of topics in data science. As this top 10 shows, many of those topics are not technical in nature. For the technical topics, understanding the underlying concepts is important for developing intuition about how to apply the techniques.

A few more thoughts…

I could write a lot more about the program itself, but this post is already getting long. I will emphasize one point, however. The UW program is a multi-university, multi-department program. This diversity is a definite strength of the program and was an important factor in my top takeaway.

Given the variety of backgrounds entering the program, this list will vary quite a bit from student to student. You don’t see many software engineering or programming items in my list, for example. For others, becoming proficient in R or Python might be their top takeaway.

I am fortunate to work for a company, Microsoft, with a tuition reimbursement program. I would be remiss if I didn’t call that out, as it was a huge enabler for me. Tuition reimbursement is an under-utilized benefit… take advantage of it if you can!

From a personal standpoint, my goal was to leverage this investment into new roles and challenges in my career. The analytics portion of my job has been slowly increasing over the last few years, and I am now transitioning into a role focused on adding intelligent, data-driven capabilities to the Dynamics 365 Finance and Operations product. This is an exciting new challenge made possible by the investment and learning from my data science graduate school experience.