Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation)
Data is everywhere! Enhance your career and acquire new skills by taking a course on DataCamp! Click here to take the first chapter of any course for FREE: https://bit.ly/36lKg44 (you’ll be supporting my channel too!)
In this video we scrape Wikipedia pages to create a dataset on Disney movies.
The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.
We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with Pytest
- Pattern matching with regular expressions (re library)
- Working with dates (datetime library)
- Saving & loading data with Pickle library
- Accessing data from an API using Requests library
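To give a feel for the last few topics, here is a minimal sketch of why a Pickle checkpoint can be handy once dates have been converted: the standard json module cannot serialize datetime objects, while pickle can. The record, its field names, and the file name are illustrative assumptions, not the exact ones used in the video.

```python
import json
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

# Illustrative record only; the field names are assumptions, not the video's exact keys.
movie = {
    "title": "Toy Story 3",
    "Running time (int)": 103,
    "Release date (datetime)": datetime(2010, 6, 18),
}


def save_pickle(data, path):
    """Write any picklable Python object to disk."""
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path):
    """Read an object back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def json_can_store(data):
    """Return True if the json module can serialize the data as-is."""
    try:
        json.dumps(data)
        return True
    except TypeError:
        return False


# Round-trip the record through a temporary pickle file.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "movies_checkpoint.pickle"  # hypothetical file name
    save_pickle([movie], checkpoint)
    restored = load_pickle(checkpoint)

print(json_can_store(movie))  # False: datetime is not JSON serializable
print(restored == [movie])    # True: pickle keeps the datetime intact
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.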
Link to code & datasets: https://github.com/KeithGalli/disney-data-science-tasks
Previous tutorial on Beautiful Soup: https://youtu.be/GjKQ6V_ViQE
If you enjoyed this video, make sure to like & subscribe :)
This video was sponsored by DataCamp
---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup
Task #1: Scrape the infobox from Toy Story 3 wiki page (save in Python dictionary) (4:24)
Link: https://en.wikipedia.org/wiki/Toy_Story_3
Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
Link: https://en.wikipedia.org/wiki/List_of_Walt_Disney_Pictures_films
30:30 - Robots.txt (Are you allowed to scrape a site?)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1], [2], etc.) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
Task #4: Attach IMDb, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)
Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
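
If you want a starting point for the Task #3 cleaning steps before watching the solutions, here is a rough sketch using only the standard library. The function names, the regular expressions, and the two date formats are my assumptions for illustration and are not necessarily how the video solves these tasks; real infobox values are messier (ranges, multiple currencies, several dates per field).

```python
import re
from datetime import datetime


def strip_references(text):
    """Task #3.1 idea: remove citation markers such as [1] or [23]."""
    return re.sub(r"\[\d+\]", "", text).strip()


def minutes_to_int(running_time):
    """Task #3.4 idea: pull the first integer out of a string like '103 minutes'."""
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None


# Multipliers for the word that may follow the number (assumed vocabulary).
WORD_VALUES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

# A dollar sign, a number with optional commas/decimals, then an optional scale word.
MONEY_PATTERN = re.compile(
    r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(thousand|million|billion)?",
    flags=re.IGNORECASE,
)


def money_to_float(money):
    """Task #3.5 idea: convert '$200 million' or '$1,067,000' to dollars as a float."""
    match = MONEY_PATTERN.search(money)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return number * WORD_VALUES[word.lower()] if word else number


def parse_date(date_text):
    """Task #3.6 idea: try a couple of common date layouts, return None if none fit."""
    for fmt in ("%B %d, %Y", "%d %B %Y"):
        try:
            return datetime.strptime(date_text, fmt)
        except ValueError:
            continue
    return None


print(strip_references("Lee Unkrich[1][2]"))  # Lee Unkrich
print(minutes_to_int("103 minutes"))          # 103
print(money_to_float("$200 million"))         # 200000000.0
print(parse_date("June 18, 2010"))            # 2010-06-18 00:00:00
```

Returning None for values that do not match keeps the scraper running on odd pages, and it makes the leftovers easy to find when you examine errors (Task #3.3).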
---------------------
Extra resources!
Setup Jupyter notebook: https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html
Google Colab (cloud-based notebook): https://colab.research.google.com/
Learn regular expressions: https://youtu.be/K8L6KVGG-7o
Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite for 6 months and I love it! https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=keithgalli&utm_content=description-only
---------------------
Follow me on social media!
Instagram | https://www.instagram.com/keithgalli/
Twitter | https://twitter.com/keithgalli
If you are curious to learn how I make my tutorials, check out this video: https://youtu.be/LEO4igyXbLs
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
Video: Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation), from the Keith Galli channel