Web Scraping with Python and BeautifulSoup
Posted on June 1, 2024 (Last modified on June 8, 2024)
Web scraping allows you to extract data from websites. This guide covers the basics of web scraping with Python using the BeautifulSoup library, including how to extract data from HTML and handle web requests.
First, install the necessary libraries.
pip install beautifulsoup4 requests
Make a basic GET request with the requests library.
import requests
url = "http://example.com"
response = requests.get(url)
print(response.text)
Handle errors when making web requests.
if response.status_code == 200:
    print(response.text)
else:
    print(f"Failed to retrieve data: {response.status_code}")
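Network failures such as DNS errors and timeouts raise exceptions before any status code exists, so it can also help to wrap the request in a try/except. A minimal sketch, using a `fetch` helper name of our own invention (not part of the requests API):

```python
import requests

def fetch(url, timeout=10):
    """Return the page body, or None if the request fails for any reason."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        return response.text
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, and bad status codes alike
        print(f"Failed to retrieve data: {exc}")
        return None
```

With this helper, `fetch("http://example.com")` returns the page text on success and None on any failure, which keeps the calling code simple.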
Next, parse the HTML with BeautifulSoup.
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
# Extracting the title of the page
title = soup.title.string
print(title)
# Extracting all paragraphs
paragraphs = soup.find_all('p')
for p in paragraphs:
    print(p.text)
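The same calls work on any HTML string, so you can experiment without hitting a live site. Here is a small self-contained example (the markup is made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Sample Page</title></head>
<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)      # Sample Page
for p in soup.find_all("p"):
    print(p.text)             # First paragraph. / Second paragraph.
```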
BeautifulSoup also supports CSS selectors.
links = soup.select('a[href]')
for link in links:
    print(link['href'])
# Extracting an element by ID
element = soup.find(id="example-id")
if element:  # find() returns None if no match
    print(element.text)
# Extracting elements by class
elements = soup.find_all(class_="example-class")
for element in elements:
    print(element.text)
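These lookups can be combined. The snippet below runs all three against an inline HTML string, so it works offline (the ids, classes, and links are invented for the example):

```python
from bs4 import BeautifulSoup

html = """
<div id="example-id">By ID</div>
<span class="example-class">One</span>
<span class="example-class">Two</span>
<a href="/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find(id="example-id").text)                             # By ID
print([el.text for el in soup.find_all(class_="example-class")])   # ['One', 'Two']
print([a["href"] for a in soup.select("a[href]")])                 # ['/about']
```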
To scrape multiple pages, iterate through them by changing the page number in the URL, or use a dedicated framework such as Scrapy for larger jobs.
base_url = "http://example.com/page"
for i in range(1, 6):
    url = f"{base_url}{i}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract data from soup here
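When looping over many pages it is also polite to pause between requests. One way to structure this, sketched here as a hypothetical `scrape_pages` helper that accepts any fetch function (the name and signature are ours, not a library API):

```python
import time
import requests

def scrape_pages(base_url, n_pages, fetch=None, delay=1.0):
    """Fetch base_url + '1' .. base_url + str(n_pages), pausing between requests."""
    if fetch is None:
        fetch = lambda url: requests.get(url).text
    results = []
    for i in range(1, n_pages + 1):
        results.append(fetch(f"{base_url}{i}"))
        if i < n_pages:
            time.sleep(delay)  # avoid hammering the server
    return results
```

Passing a stub fetch function and `delay=0` lets you test the pagination logic without touching the network.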
Web scraping is a powerful tool for extracting data from websites. Practice using BeautifulSoup to parse HTML and extract information, and use requests to handle web requests effectively.