This document covers Beautiful Soup version 4.8.1. The examples in this documentation should work the same way in Python 2.7 and Python 3.2. You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful Soup 3 is no longer being developed and that support for it will be dropped on or after December 31, 2020.
In this tutorial, we will show you, how to perform web scraping in Python using Beautiful Soup 4 for getting data out of HTML, XML and other markup languages. In this we will try to scrap webpage from various different websites (including IMDB). We will cover beautiful soup 4, python basic tools for efficiently and clearly navigating, searching and parsing HTML web page. We have tried to cover almost all the functionalities of Beautiful Soup 4 in this tutorial. You can combine multiple functionalities introduced in this tutorial into one bigger program to capture multiple meaningful data from the website into some other sub-program as input.
This tutorial is basically designed to guide you in scarping a web page. Basic requirement of all this is to get meaningful data out of huge unorganized set of data. The target audience of this tutorial can be anyone of:
Anyone who wants to know – how to scrap webpage in python using BeautifulSoup 4.
Any data science developer/enthusiasts or anyone, how wants to use this scraped (meaningful) data to different python data science libraries to make better decision.
Though there is NO mandatory requirement to have for this tutorial. However, if you have any or all (supercool) prior knowledge on any below mentioned technologies that will be an added advantage −
Knowledge of any web related technologies (HTML/CSS/Document object Model etc.).
Python Language (as it is the python package).
Developers who have any prior knowledge of scraping in any language.
Basic understanding of HTML tree structure.