Data Shapley

From Linear Digressions

Length:

17 minutes

Released:

Aug 19, 2019

Format:

Podcast episode

Description

We often talk about which features in a dataset are most important, but a new paper making the rounds turns the idea of importance on its head: Data Shapley is an algorithm for working out which examples in a dataset are most important. It makes a lot of intuitive sense: data that just repeats examples you’ve already seen, or that’s noisy or an extreme outlier, might not be that valuable for training a machine learning model. But some data is very valuable: it’s disproportionately useful in helping the algorithm figure out the most important trends. Data Shapley is explicitly designed to help machine learning researchers understand which data points are most valuable and why.
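
To make the idea concrete, here is a minimal sketch of a Shapley-style valuation of training points. It is an illustration under stated assumptions, not the paper's implementation: the toy 1-D dataset, the 1-nearest-neighbor classifier, and the function names `utility` and `data_shapley` are all made up for this example. It enumerates every ordering of the training set, which is only feasible for a handful of points; larger datasets need an approximation such as Monte Carlo sampling of orderings.

```python
from itertools import permutations


def utility(train, valid):
    """Validation accuracy of a 1-nearest-neighbor classifier fit on `train`.

    Hypothetical utility function for this sketch; an empty training set scores 0.
    """
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += nearest[1] == y
    return correct / len(valid)


def data_shapley(train, valid):
    """Exact Shapley value of each training point.

    Averages each point's marginal gain in utility over all orderings
    in which the training points could be added one at a time.
    """
    n = len(train)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        prev = utility(subset, valid)
        for i in order:
            subset.append(train[i])
            cur = utility(subset, valid)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / len(orderings) for v in values]


# Toy data (assumed for illustration): (feature, label) pairs.
train = [
    (0.0, 0),  # class 0
    (0.1, 0),  # near-duplicate of the point above
    (1.0, 1),  # the only class-1 example
    (0.9, 0),  # mislabeled: sits in class-1 territory
]
valid = [(0.02, 0), (0.2, 0), (0.8, 1), (1.2, 1)]

values = data_shapley(train, valid)
for point, value in zip(train, values):
    print(point, round(value, 4))
```

In this toy setup the lone class-1 point receives the largest value, the two near-duplicate class-0 points split their credit equally, and the mislabeled point receives the smallest value. The values sum to the accuracy of the model trained on all four points, which is the fairness property that Shapley values are built around.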

Relevant links:
http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
https://blog.acolyer.org/2019/07/15/data-shapley/

Titles in the series (100)

Linear Digressions is a podcast about machine learning and data science. Machine learning is being used to solve a ton of interesting problems, and to accomplish goals that were out of reach even a few short years ago.
