28 min listen
Beam and Spark with Holden Karau
Length:
35 minutes
Released:
May 9, 2018
Format:
Podcast episode
Description
Holden Karau is on the podcast this week to talk with Mark and Melanie all about Spark and Beam, two open source tools that help process data at scale.
Holden Karau
Holden Karau is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, Beam, and related “big data” tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML & Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
Cool things of the week
Twitter’s collaboration with Google Cloud blog & tweet
Kaggle CERN TrackML Particle Tracking Challenge Competition site
Open-sourcing gVisor, a sandboxed container runtime blog & repo
Announcing Stackdriver Kubernetes Monitoring blog
MLPerf: collaborative effort to standardize ML benchmarks site
Interview
Spark site & community site
Beam site
Cloud Dataflow site & docs
Cloud Dataproc site & docs
Using Spark on Kubernetes Engine blog
Testing future Apache Spark releases and changes on Google Kubernetes Engine and Cloud Dataproc blog
Spark Packages site
Spark testing base repo
Flink site
Arrow site
Upcoming Talks:
PyCon 2018 & Debugging PySpark talk
Scala Days & Keeping the “fun” in Spark talk
Strata London & Understanding Spark tuning with auto-tuning talk
J on the Beach & General Purpose Big Data Systems are eating the world talk
Spark Summit 2018 & Accelerating TF with Apache Arrow on Spark talk
Question of the week
I have a continuous integration build process set up with Container Builder, but it’s all sequential. I want to speed things up by processing parts of it in parallel. How do I do that?
Configure Build Step Order docs
Where can you find us next?
Mark can be found streaming Agones development on Twitch.
Melanie is speaking at the Internet2 Global Summit on May 9th in San Diego, and will also be talking at the Understand Risk Forum on May 17th in Mexico City.
Special shout out: Google I/O and PyCon are both happening this week