Dataset Creation and Curation - Christiaan Swart

From DataTalks.Club



Length: 56 minutes
Released: Sep 9, 2022
Format: Podcast episode

Description

We talked about:

Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human-level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online


Links:

My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/


ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
