Tools of the trade: Data Science Virtual Machine

IMG_0569

This is the first of a series of blog posts using a theme of “Tools of the trade”. If you’ve been following along, it will be obvious that the trade is data science. Each post will feature a tool… which may or may not be an actual tool. In addition to software tools applied to data science, topics will include statistical concepts, data science techniques, or related items. In all cases, the tool/topic will contribute to accomplishing data science tasks.

The target audience for the posts will be engineers, analysts, and managers who want to build their knowledge and skills in data science. The posts will typically be fairly high level and short, but some will include code snippets and a deeper dive into a tool.

With that introduction, here’s the inaugural Tools of the Trade post on the Data Science Virtual Machine.

What is it?

An Azure Data Science Virtual Machine (DSVM) is a cloud-hosted environment with a variety of pre-installed tools to jump-start the data science process: Anaconda Python, R, Power BI, Jupyter notebooks, VS Code, Azure ML SDKs, and several other Azure SDKs. It’s a functional and cost-effective way to perform data science activities without the headaches of environment setup and management.

A quad-CPU VM with 16 GB of RAM and premium disks can be used for under $150 per month (as of October 2019). That rate only applies while the VM is running; there is no compute charge when the VM is shut down and deallocated. As long as your data set sizes are reasonable, this environment has plenty of horsepower for data munging and ML work.
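To get a feel for what "only charged while running" means in practice, here is a rough back-of-the-envelope sketch. It treats the $150 figure above as an always-on ceiling and scales it by the hours the VM is actually up. The 730 hours per month and the function name are my own assumptions for illustration, and the sketch ignores disk storage and regional price differences, so treat the result as a ballpark rather than a quote.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month (8760 / 12)


def estimated_monthly_cost(full_month_cost, hours_running):
    """Scale an always-on monthly price by the hours the VM is actually running.

    Hypothetical helper for illustration; it ignores disk storage charges.
    """
    if not 0 <= hours_running <= HOURS_PER_MONTH:
        raise ValueError("hours_running must be between 0 and 730")
    hourly_rate = full_month_cost / HOURS_PER_MONTH
    return round(hourly_rate * hours_running, 2)


# Example: 8 hours a day for 22 working days, against the $150 ceiling
work_hours = 8 * 22  # 176 hours
print(estimated_monthly_cost(150.0, work_hours))
```

If you only run the VM during working hours, the compute portion lands at roughly a quarter of the always-on price, which is why remembering to shut it down is worth the habit.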

How do I use it?

We’ve been leveraging the DSVM on customer proof-of-concept projects. With some process built around it, it’s a good tool for ensuring compliance with data handling and environment access requirements while having all the necessary software tools available.

Discussion

I generally like to manage my own environments and run on ‘real’ hardware. For example, I avoided the cloud environments available for my data science master’s work in favor of local installs. Understanding the installation process is part of understanding your work. And you can be productive on an airplane!

That said, I’ve learned to appreciate these pre-configured environments, and not only for customer engagements. Sometimes it’s easier to update an entire environment than to update things piecemeal. It also reinforces the need to have all your work under source code control so you can easily re-create any project. This is how I work even if it’s a simple exploration.

A DSVM is a tool you could use for your everyday work or reach for only as the situation dictates.

Picture details: Sunrise, 10/23/2019, Apple iPhone 7 Plus, f/1.8, 1/15 sec, ISO-80