35 min listen
31: Crawling the Web using Elixir with Oleg Tarasenko and Tze Yiing
Length:
51 minutes
Released:
Jan 19, 2021
Format:
Podcast episode
Description
We talk with Oleg Tarasenko and Tze Yiing about crawling the web using Elixir. Oleg created the crawly project to help solve this problem and Tze Yiing joined him as a contributor and maintainer. We cover how Elixir is well suited to orchestrate crawling, how to deal with login pages, understanding the legal concerns, building a codeless scraper and much more!
Show Notes online - http://podcast.thinkingelixir.com/31
Elixir Community News
- https://dashbit.co/blog/ten-years-ish-of-elixir – January 9th marked 10 years since the first commit to the Elixir repository
- https://github.com/elixir-lang/elixir/commit/337c3f2d569a42ebd5fcab6fef18c5e012f9be5b – First commit on the repository
- https://twitter.com/josevalim/status/1349010127270129670 – Jose Valim reveals that his secret project is called 'Nx'
- https://remote.com/blog/welcoming-elixir-creator-jose-valim – Jose Valim joins Remote as a Technical Advisor
- https://twitter.com/josevalim/status/1347858475267854336 – ExUnit will catch the SIGQUIT signal from CTRL+\ and show the tests that were running
- https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/tasks/test.ex#L34 – ExUnit will print how much time the test suite spent on async tests vs sync tests
- https://twitter.com/fhunleth/status/1348092050487570433 – Nerves support on the M1 is looking good
- https://www.youtube.com/playlist?list=PLqj39LCvnOWZl_Pb0Y7wGWijKbTvL4gJg – ElixirConf 2020 videos have all been publicly released!
Do you have some Elixir news to share? Tell us at @ThinkingElixir (https://twitter.com/ThinkingElixir) or email at show@thinkingelixir.com (mailto:show@thinkingelixir.com)
Discussion Resources
- https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13
- https://oltarasenko.medium.com/using-elixir-and-crawly-for-price-monitoring-7364d345fc64 – Using Elixir for price monitoring
- https://hex.pm/packages/crawly
- https://github.com/oltarasenko/crawly
- https://www.erlang-solutions.com/blog/web-scraping-with-elixir.html – Oleg's older web scraping with Elixir article
- https://www.erlang-solutions.com/blog/how-to-build-a-machine-learning-project-in-elixir.html – Building a machine learning project with Elixir, TensorFlow and Crawly
- https://oltarasenko.medium.com/what-is-web-scraping-and-why-you-might-want-to-use-it-a0e4b621f6d0 – What is web scraping, and why you might want to use it?
- https://www.pillowskin.com – Ziinc's project using scraping and aggregation
- https://www.tensorflow.org/
- https://oltarasenko.medium.com/the-unofficial-guide-to-extracting-google-search-results-in-2021-with-elixir-7a6ef80d0f5b
- https://scrapy.org/