9 min listen
word2vec
FromData Skeptic
ratings:
Length:
31 minutes
Released:
Feb 1, 2019
Format:
Podcast episode
Description
Word2vec is an unsupervised machine learning model which is able to capture semantic information from the text it is trained on. The model is based on neural networks. Several large organizations like Google and Facebook have trained word embeddings (the result of word2vec) on large corpora and shared them for others to use. The key algorithmic ideas involved in word2vec is the continuous bag of words model (CBOW). In this episode, Kyle uses excerpts from the 1983 cinematic masterpiece War Games, and challenges Linhda to guess a word Kyle leaves out of the transcript. This is similar to how word2vec is trained. It trains a neural network to predict a hidden word based on the words that appear before and after the missing location.
Released:
Feb 1, 2019
Format:
Podcast episode
Titles in the series (100)
[MINI] Anscombe's Quartet: This mini-episode discusses Anscombe's Quartet, a series of four datasets which are clearly very different but share some similar statistical properties with one another. For example, each of the four plots has the same mean and variance on both... by Data Skeptic