AI Positive
From AI Inside
Length:
62 minutes
Released:
Jan 24, 2024
Format:
Podcast episode
Description
On the premiere episode of the AI Inside podcast, hosts Jeff Jarvis and Jason Howell discuss AI copyright issues with Rich Skrenta of the Common Crawl Foundation. The conversation centers on news outlets limiting access to content they publish publicly, which undermines the integrity of Common Crawl's internet archive. In recent years, LLMs have used the archive as AI training data, and restricting access to information has a dramatic effect on the quality of the data that remains.

INTERVIEW
Introduction and background on AI Inside podcast
Discussion of the recent AI oversight Senate hearing Jeff testified at
Introduction of guest Rich Skrenta from Common Crawl Foundation
Overview of Common Crawl and its goals to archive the open web
Discussion of how Common Crawl data is used to train AI models
News publishers wanting content removed from Common Crawl
Debate around copyright, fair use, and AI's "right to read"
Mechanics of how Common Crawl works and what it archives
Concerns about restricting AI access to data for training
Risk of regulatory capture and only big companies being able to use AI
Discussion of recent court ruling related to web scraping
Hopes for Common Crawl's growth and evolution

NEWS BITES
Interesting device announcement from CES - Rabbit R1 with Perplexity AI integration
Study on actual risk of AI automating jobs away in the near future

Hosted on Acast. See acast.com/privacy for more information.
Titles in the series (18)
Schibsted's AI Strategy with Sven Størmer Thaulow: Building a Large Language Model for Media by AI Inside