Beautiful Soup (HTML parser)
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags; it is named after tag soup. It creates a parse tree for parsed pages that can be used to extract data from HTML,[3] which is useful for web scraping.[2]
| Original author(s) | Leonard Richardson | 
|---|---|
| Initial release | 2004 | 
| Stable release | |
| Repository | |
| Written in | Python | 
| Platform | Python | 
| Type | HTML parser library, Web scraping | 
| License | Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)[2] | 
| Website | www | 
Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project.[4] The project is additionally supported by Tidelift, a paid subscription service for open-source maintenance.[5]
It is available for Python 2.7 and Python 3.
#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
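The example above depends on a live network request. As a complementary illustration of how malformed markup is handled, the following minimal sketch parses a string in which the `<p>` and `<b>` tags are never closed. The markup, the `BROKEN_HTML` name, and the `extract_links` helper are hypothetical and exist only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> and <b> tags are never closed.
BROKEN_HTML = (
    '<html><body><p>Hello, <b>world'
    '<a href="/wiki/Soup">soup</a></body></html>'
)


def extract_links(markup):
    """Return (text, href) pairs for every anchor that carries an href."""
    soup = BeautifulSoup(markup, "html.parser")
    # href=True restricts the search to anchors that have the attribute.
    return [(a.get_text(), a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(BROKEN_HTML)
print(links)
```

Despite the unclosed tags, the anchor can still be located in the resulting parse tree.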
Advantages and disadvantages of parsers
    
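This table summarizes the advantages and disadvantages of each parser library.[2]
| Parser | Typical usage | Advantages | Disadvantages | 
|---|---|---|---|
| Python’s html.parser | BeautifulSoup(markup, "html.parser") | Batteries included; decent speed | Not as fast as lxml; less lenient than html5lib | 
| lxml’s HTML parser | BeautifulSoup(markup, "lxml") | Very fast; lenient | External C dependency | 
| lxml’s XML parser | BeautifulSoup(markup, "lxml-xml") | Very fast; the only currently supported XML parser | External C dependency | 
| html5lib | BeautifulSoup(markup, "html5lib") | Extremely lenient; parses pages the same way a web browser does; creates valid HTML5 | Very slow; external Python dependency | 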
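The parser is chosen by the second argument to the constructor, as the "Typical usage" column shows. Because lxml and html5lib are separate installs, one possible approach is to try parsers in order of preference and fall back to the built-in `html.parser`. The sketch below assumes that strategy; the `make_soup` helper and its preference order are hypothetical, while `FeatureNotFound` is the exception Beautiful Soup raises when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns a (soup, parser_name) pair.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # The requested parser is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


soup, parser_used = make_soup("<p>Unclosed paragraph")
print(parser_used, soup.p.get_text())
```

Different parsers can build different trees from the same invalid markup, so naming the parser explicitly, rather than relying on whichever happens to be installed, keeps results reproducible across machines.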
Release
    
Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release is Beautiful Soup 4.x. Beautiful Soup 4 can be installed with pip install beautifulsoup4.
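Note that the distribution is named `beautifulsoup4` while the importable module is `bs4`. As a small sketch of how an installation could be checked from Python, the following uses the standard library's `importlib.metadata` (Python 3.8 or later); the `installed_version` helper is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="beautifulsoup4"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


bs4_version = installed_version()
print(bs4_version or "beautifulsoup4 is not installed")
```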
References
    
1. https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG. Retrieved 16 December 2021.
2. "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself".
3. Hajba, Gábor László (2018). "Using Beautiful Soup". Website Scraping with Python: Using BeautifulSoup and Scrapy. Apress. pp. 41–96. doi:10.1007/978-1-4842-3925-4_3. ISBN 978-1-4842-3925-4.
4. "Code : Leonard Richardson". Launchpad. Retrieved 19 September 2020.
5. Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". tidelift.com. Retrieved 19 September 2020.
