How to use XPath with BeautifulSoup?
Last Updated: 12 Apr, 2025
We have an HTML page and our task is to extract specific elements using XPath, which BeautifulSoup doesn't support directly. For example, if we want to extract the heading from the Wikipedia page on Nike, we can't do it with BeautifulSoup alone, but by combining it with lxml's etree module we can. This article explains how to use XPath with BeautifulSoup by leveraging the lxml module.
Required Libraries
Before we start, install the following Python libraries:
pip install requests
pip install beautifulsoup4
pip install lxml
- requests: Fetches the HTML content of a webpage.
- bs4: Parses HTML using BeautifulSoup.
- lxml: Enables XPath support and integrates with BeautifulSoup.
Understanding XPath
XPath works much like a traditional file system.

To access File1 at the root of the drive:
C:/File1
To access File2 nested inside folders:
C:/Documents/User1/File2
The same idea applies to HTML:
//div[@id="content"]/h1
This expression finds an <h1> tag inside a <div> with id="content".
With XPath, we can target tags, IDs, classes, text values or even element positions.
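Here is a minimal, self-contained sketch of these selection styles, using lxml directly on a small inline document (the markup and expressions are invented purely for illustration):
Python
from lxml import etree

# A tiny, made-up document standing in for a real page
html = """
<html><body>
  <div id="content" class="main">
    <h1>First heading</h1>
    <p class="note">Hello</p>
    <p class="note">World</p>
  </div>
</body></html>
"""
dom = etree.HTML(html)

print(dom.xpath('//div[@id="content"]/h1')[0].text)    # select by id       -> First heading
print(dom.xpath('//p[@class="note"]/text()'))           # select by class    -> ['Hello', 'World']
print(dom.xpath('//p[2]/text()'))                        # select by position -> ['World']
print(dom.xpath('//p[text()="Hello"]/@class'))           # select by text     -> ['note']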
Why BeautifulSoup Alone Isn’t Enough
BeautifulSoup provides its own search methods such as .find() and CSS selectors via .select(), but it has no built-in XPath support.
To use XPath:
- Parse the page with BeautifulSoup.
- Convert it into an lxml.etree object.
- Use .xpath() to extract data.
This hybrid approach gives you the flexibility of XPath with the simplicity of BeautifulSoup.
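As a minimal sketch of that three-step flow (using an inline HTML string instead of a fetched page):
Python
from bs4 import BeautifulSoup
from lxml import etree

# Inline HTML standing in for a downloaded page (illustrative only)
html = '<div id="content"><h1>Sample heading</h1></div>'

soup = BeautifulSoup(html, "html.parser")            # 1. parse with BeautifulSoup
dom = etree.HTML(str(soup))                          # 2. convert to an lxml.etree document
print(dom.xpath('//div[@id="content"]/h1/text()'))   # 3. run an XPath query -> ['Sample heading']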
To find the XPath for a particular element on a page:
- Right-click the element on the page and select Inspect.
- In the Elements tab, right-click the highlighted element.
- Choose Copy > Copy XPath.

Approach
- Import the required modules
- Send a request to the target webpage
- Parse the HTML with BeautifulSoup
- Convert it to lxml.etree format
- Use .xpath() to extract elements
Note: If the copied XPath doesn't return the desired result, copy the full XPath instead (Copy > Copy full XPath); the remaining steps stay the same.
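As a rough sketch of that fallback idea (the inline document and both expressions are illustrative; copy the real paths from DevTools):
Python
from lxml import etree

# Illustrative stand-in for a real page
dom = etree.HTML('<html><body><div id="content"><h1><span>Title</span></h1></div></body></html>')

short_xpath = '//*[@id="content"]/h1/span'   # as produced by "Copy XPath"
full_xpath = '/html/body/div/h1/span'        # as produced by "Copy full XPath" (hypothetical here)

# Fall back to the full path if the short expression matches nothing
matches = dom.xpath(short_xpath) or dom.xpath(full_xpath)
if matches:
    print(matches[0].text)   # Title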
The example below shows how XPath can be used with BeautifulSoup.
Example: Extracting the main heading from the Wikipedia page for Nike using XPath.
Python
from bs4 import BeautifulSoup
from lxml import etree
import requests
# Target URL
url = "https://en.wikipedia.org/wiki/Nike,_Inc."
# Set headers to avoid blocking
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-US,en;q=0.5'
}
# Fetch the page
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, "html.parser")
# Convert to etree for XPath
dom = etree.HTML(str(soup))
# Extract heading using XPath
heading = dom.xpath('//*[@id="firstHeading"]/span')[0].text
print(heading)
Output:
Nike, Inc.
Explanation:
- etree.HTML(str(soup)): Converts BeautifulSoup object to an XPath-compatible structure.
- .xpath(...): Executes the XPath query.
- [0].text: Retrieves the text of the first matching element.
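A couple of small variations, reusing the dom object from the example above (the second expression is deliberately non-existent, just to show how an empty result looks):
Python
# text() returns the text nodes directly instead of element objects
texts = dom.xpath('//*[@id="firstHeading"]//text()')
print("".join(texts).strip())

# .xpath() always returns a list, so guard against missing matches
matches = dom.xpath('//*[@id="no-such-id"]/span')
print(matches[0].text if matches else "No match found")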