Web Scraping with Python and BeautifulSoup
Posted on June 1, 2024 (Last modified on June 8, 2024)
Web scraping allows you to extract data from websites. This guide covers the basics of web scraping with Python using the BeautifulSoup library, including how to extract data from HTML and handle web requests.
First, install the necessary libraries.
pip install beautifulsoup4 requests
Make a basic GET request with the requests library.
import requests
url = "http://example.com"
response = requests.get(url)
print(response.text)
Handle errors when making web requests.
if response.status_code == 200:
    print(response.text)
else:
    print(f"Failed to retrieve data: {response.status_code}")
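Network failures such as DNS errors and timeouts raise exceptions before any status code exists, so it can also help to wrap the request in a try/except. A minimal sketch, using a `fetch` helper name of our own invention (not part of the requests API):

```python
import requests

def fetch(url, timeout=10):
    """Return the page body, or None if the request fails for any reason."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        return response.text
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, and bad status codes alike
        print(f"Failed to retrieve data: {exc}")
        return None
```

With this helper, `fetch("http://example.com")` returns the page text on success and None on any failure, which keeps the calling code simple.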
Next, parse the HTML with BeautifulSoup.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
# Extracting the title of the page
title = soup.title.string
print(title)
# Extracting all paragraphs
paragraphs = soup.find_all('p')
for p in paragraphs:
    print(p.text)
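The same calls work on any HTML string, so you can experiment without hitting a live site. Here is a small self-contained example (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Sample Page</title></head>
<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)      # Sample Page
for p in soup.find_all("p"):
    print(p.text)             # First paragraph. / Second paragraph.
```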
BeautifulSoup also supports CSS selectors.
links = soup.select('a[href]')
for link in links:
    print(link['href'])
# Extracting an element by ID
element = soup.find(id="example-id")
if element:  # find() returns None if no match
    print(element.text)
# Extracting elements by class
elements = soup.find_all(class_="example-class")
for element in elements:
    print(element.text)
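These lookups can be combined. The snippet below runs all three against an inline HTML string, so it works offline (the ids, classes, and links are invented for the example):

```python
from bs4 import BeautifulSoup

html = """
<div id="example-id">By ID</div>
<span class="example-class">One</span>
<span class="example-class">Two</span>
<a href="/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find(id="example-id").text)                             # By ID
print([el.text for el in soup.find_all(class_="example-class")])   # ['One', 'Two']
print([a["href"] for a in soup.select("a[href]")])                 # ['/about']
```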
To scrape multiple pages, iterate through them by changing the page number in the URL, or use a dedicated framework such as Scrapy for larger jobs.
base_url = "http://example.com/page"
for i in range(1, 6):
    url = f"{base_url}{i}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract data from soup here
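When looping over many pages it is also polite to pause between requests. One way to structure this, sketched here as a hypothetical `scrape_pages` helper that accepts any fetch function (the name and signature are ours, not a library API):

```python
import time
import requests

def scrape_pages(base_url, n_pages, fetch=None, delay=1.0):
    """Fetch base_url + '1' .. base_url + str(n_pages), pausing between requests."""
    if fetch is None:
        fetch = lambda url: requests.get(url).text
    results = []
    for i in range(1, n_pages + 1):
        results.append(fetch(f"{base_url}{i}"))
        if i < n_pages:
            time.sleep(delay)  # avoid hammering the server
    return results
```

Passing a stub fetch function and `delay=0` lets you test the pagination logic without touching the network.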
Web scraping is a powerful tool for extracting data from websites. Practice using BeautifulSoup to parse HTML and extract information, and use requests to handle web requests effectively.