Ep. 37 - The Rise of the Data Engineer

FromfreeCodeCamp Podcast

Start listening View podcast show

Ep. 37 - The Rise of the Data Engineer

FromfreeCodeCamp Podcast

ratings:

Length:

19 minutes

Released:

Jul 2, 2018

Format:

Podcast episode

Description

When Maxime worked at Facebook, his role started evolving. He was developing new skills, new ways of doing things, and new tools. And — more often than not — he was turning his back on traditional methods. He was a pioneer. He was a data engineer! In this podcast, you'll learn about the rise of the data engineer and what it takes to be one. Written by Maxime Beauchemin: https://twitter.com/mistercrunch Read by Abbey Rennemeyer: https://twitter.com/abbeyrenn Original article: https://fcc.im/2tHLCST Learn to code for free at: https://www.freecodecamp.org Intro music by Vangough: https://fcc.im/2APOG02 Transcript: I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. I wasn’t promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence. The role we’d created for ourselves was a new discipline entirely. My team was at forefront of this transformation. We were developing new skills, new ways of doing things, new tools, and — more often than not — turning our backs to traditional methods. We were pioneers. We were data engineers! Data Engineering? Data science as a discipline was going through its adolescence of self-affirming and defining itself. At the same time, data engineering was the slightly younger sibling, but it was going through something similar. The data engineering discipline took cues from its sibling, while also defining itself in opposition, and finding its own identity. Like data scientists, data engineers write code. They’re highly analytical, and are interested in data visualization. Unlike data scientists — and inspired by our more mature parent, software engineering — data engineers build tools, infrastructure, frameworks, and services. In fact, it’s arguable that data engineering is much closer to software engineering than it is to a data science. In relation to previously existing roles, the data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering. This discipline also integrates specialization around the operation of so called “big data” distributed systems, along with concepts around the extended Hadoop ecosystem, stream processing, and in computation at scale. In smaller companies — where no data infrastructure team has yet been formalized — the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure. This includes tasks like setting up and operating platforms like Hadoop/Hive/HBase, Spark, and the like. In smaller environments people tend to use hosted services offered by Amazon or Databricks, or get support from companies like Cloudera or Hortonworks — which essentially subcontracts the data engineering role to other companies. In larger environments, there tends to be specialization and the creation of a formal role to manage this workload, as the need for a data infrastructure team grows. In those organizations, the role of automating some of the data engineering processes falls under the hand of both the data engineering and data infrastructure teams, and it’s common for these teams to collaborate to solve higher level problems. While the engineering aspect of the role is growing in scope, other aspects of the original business engineering role are becoming secondary. Areas like crafting and maintaining portfolios of reports and dashboards are not a data engineer’s primary focus. We now have better self-service tooling where analysts, data scientist and the general “information worker” is becoming more data-savvy and can take care of data consumption autonomously. ETL is changing We’ve also observed a general shift away from drag-and-drop ETL (Extract Transform and Load) tools towards a more programmatic approach. Product know-how on platforms like Informatica, IBM Datastage, Cognos, AbInitio or Micros

Released:

Jul 2, 2018

Format:

Podcast episode

Titles in the series (100)

The official podcast of the freeCodeCamp open source community. Learn to code with free online courses, programming projects, and interview preparation for developer jobs.

Skip carousel

More Episodes from freeCodeCamp Podcast

Skip carousel

Related podcast episodes

Skip carousel

Discover this podcast and so much more

Ep. 37 - The Rise of the Data Engineer

Ep. 37 - The Rise of the Data Engineer

Description

Titles in the series (100)

More Episodes from freeCodeCamp Podcast

Related podcast episodes