Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation)

Data is everywhere! Enhance your career and acquire new skills by taking a course on DataCamp! Click here to take the first chapter of any course for FREE: https://bit.ly/36lKg44 (you’ll be supporting my channel too!)

In this video we scrape Wikipedia pages to create a dataset on Disney movies.

The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.

We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with pytest
- Pattern matching with regular expressions (re library)
- Working with dates (datetime library)
- Saving & loading data with the pickle library
- Accessing data from an API using the Requests library

Link to code & datasets: https://github.com/KeithGalli/disney-data-science-tasks
Previous tutorial on Beautiful Soup: https://youtu.be/GjKQ6V_ViQE

If you enjoyed this video, make sure to like & subscribe :)

This video was sponsored by DataCamp

---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup

Task #1: Scrape the infobox from the Toy Story 3 wiki page (save in a Python dictionary) (4:24)
Link: https://en.wikipedia.org/wiki/Toy_Story_3
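
If you want a starting point for Task #1, here is a minimal sketch of turning an infobox into a dictionary. It is not the exact code from the video. To keep it runnable offline it parses a tiny hand-written HTML sample (`SAMPLE_HTML`, a simplified stand-in I made up for illustration) instead of downloading the real page, and the function name `infobox_to_dict` is my own.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified stand-in for a Wikipedia infobox.
# The real page has many more rows and nested tags.
SAMPLE_HTML = """
<table class="infobox vevent">
  <tr><th colspan="2">Toy Story 3</th></tr>
  <tr><th>Directed by</th><td>Lee Unkrich</td></tr>
  <tr><th>Running time</th><td>103 minutes</td></tr>
</table>
"""


def infobox_to_dict(html):
    """Return the infobox rows as a {label: value} dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="infobox")
    movie = {}
    for index, row in enumerate(table.find_all("tr")):
        header = row.find("th")
        value = row.find("td")
        if index == 0 and header is not None:
            # Assumption: the first row holds only the movie title.
            movie["title"] = header.get_text(" ", strip=True)
        elif header is not None and value is not None:
            # Regular rows pair a label cell with a value cell.
            movie[header.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return movie


movie_info = infobox_to_dict(SAMPLE_HTML)
print(movie_info)
```

For the real page you would fetch the HTML first (for example with the Requests library) and pass the response text to the same function. Expect to add special cases for rows that contain lists or images.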

Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
Link: https://en.wikipedia.org/wiki/List_of_Walt_Disney_Pictures_films
30:30 - Robots.txt (Are you allowed to scrape a site?)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
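
For the robots.txt step, one way to check programmatically whether a path may be crawled is the standard library's `urllib.robotparser`. This is a sketch under assumptions: the rules in `SAMPLE_ROBOTS_TXT` and the `example.org` URLs are invented for illustration and are not Wikipedia's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not any real site's robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("*", "https://example.org/wiki/Toy_Story_3")
blocked = parser.can_fetch("*", "https://example.org/private/page")
print(allowed, blocked)
```

Against a live site you would call `parser.set_url(...)` with the site's robots.txt address and then `parser.read()` instead of `parse()`.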

Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1], [2], etc.) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
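
The cleaning subtasks above can be sketched with the re and datetime libraries. Treat this as a rough outline, not the solution from the video: the function names are my own, the sample strings are only examples, and the money parser deliberately ignores harder cases such as ranges or other currencies.

```python
import re
from datetime import datetime


def strip_references(text):
    """Remove bracketed citation markers such as [1] or [23]."""
    return re.sub(r"\[\d+\]", "", text).strip()


def minutes_to_int(running_time):
    """Convert a string like '103 minutes' to 103 (None if no number)."""
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None


# Word multipliers this sketch understands; anything else is left unscaled.
MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def money_to_float(text):
    """Convert '$200 million' or '$12,345' to a float number of dollars."""
    pattern = r"\$\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?"
    match = re.search(pattern, text, re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return number * MULTIPLIERS[word.lower()] if word else number


def parse_date(text):
    """Try a couple of common date layouts; return None if none fit."""
    for fmt in ("%B %d, %Y", "%d %B %Y"):
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


print(strip_references("June 18, 2010[1]"))
print(minutes_to_int("103 minutes"))
print(money_to_float("$200 million"))
print(parse_date("June 18, 2010"))
```

Returning `None` for values that do not match makes the problem rows easy to find later, which fits the "examine errors" step.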

Task #4: Attach IMDB, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)

Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
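
A sketch of how Task #5 might look with only the standard library. The two records and their field names are made up for illustration, and the output goes to in-memory strings so the example runs without touching the disk. One detail worth knowing: datetime objects are not JSON serializable, so this sketch converts them to ISO strings first (my assumption about a reasonable format).

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical cleaned records; the real dataset has many more fields.
movies = [
    {"title": "Toy Story 3", "Running time (int)": 103,
     "Release date": datetime(2010, 6, 18)},
    {"title": "Example Movie", "Running time (int)": 90},
]


def to_serializable(movie):
    """Copy a record, turning datetime values into ISO 8601 strings."""
    return {
        key: (value.isoformat() if isinstance(value, datetime) else value)
        for key, value in movie.items()
    }


rows = [to_serializable(movie) for movie in movies]

# JSON: dump the list of dictionaries as one document.
json_text = json.dumps(rows, ensure_ascii=False, indent=2)

# CSV: collect every key seen in any record so no column is lost.
fieldnames = sorted({key for row in rows for key in row})
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # missing fields are written as empty cells
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

To write real files, replace the in-memory buffer with `open("some_name.csv", "w", newline="", encoding="utf-8")` and use `json.dump` with an open file handle.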

---------------------
Extra resources!
Setup Jupyter notebook: https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html
Google Colab (cloud-based notebook): https://colab.research.google.com/
Learn regular expressions: https://youtu.be/K8L6KVGG-7o

Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite for 6 months and I love it! https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=keithgalli&utm_content=description-only

---------------------
Follow me on social media!
Instagram | https://www.instagram.com/keithgalli/
Twitter | https://twitter.com/keithgalli

If you are curious to learn how I make my tutorials, check out this video: https://youtu.be/LEO4igyXbLs

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

Video: Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation), from the channel Keith Galli
Video information
October 1, 2020, 20:02:01
03:24:18