Introduction to Data Engineering


Data engineering is a hot discipline right now. A quick search on LinkedIn shows over 200,000 job openings. While that doesn’t quite match the 350,000 data scientist openings, it’s in the ballpark.

Data engineering sits upstream from data science and is a prerequisite to successful data science projects. In their book Fundamentals of Data Engineering: Plan and Build Robust Data Systems, Joe Reis and Matt Housley define data engineering as follows:

Data engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning.

In my consulting practice, I’m finding a stronger need for data engineering skills than data science skills. This is especially true for companies in the early stages of their data-driven evolution, but it remains critical as they grow and governance and other requirements need to strengthen.

Seeing the need for data engineers and a strong alignment with software engineering, I pitched a new class for the NDSU Computer Science curriculum last winter: “Introduction to Data Engineering”. It’s a senior-level elective that supports the data science and software engineering specializations in the program. I’ll be teaching it for the first time this fall.

Fortunately, the aforementioned book from Reis and Housley will serve as an excellent textbook for the class. The book is just out and is currently available only from Amazon in electronic format. As the title suggests, it’s about fundamentals rather than specific technologies. The timing has been perfect, as I’m just getting into building out the course details.

Check out this book if you’re interested in data engineering. Stay tuned for future posts on the topic, and let me know if you are interested in learning more about data engineering at your company.