How to use XPath with BeautifulSoup?

Last Updated : 12 Apr, 2025

We have an HTML page and our task is to extract specific elements using XPath, which BeautifulSoup doesn't support directly. For example, if we want to extract the heading from the Wikipedia page on Nike, we can't do it with BeautifulSoup alone, but we can by combining it with lxml's etree module. This article explains how to use XPath with BeautifulSoup by leveraging the lxml module.

Required Libraries

Before we start, install the following Python libraries:

pip install requests
pip install beautifulsoup4
pip install lxml

  • requests: Fetches the HTML content of a webpage.
  • beautifulsoup4: Parses HTML using BeautifulSoup (imported in code as bs4).
  • lxml: Enables XPath support and integrates with BeautifulSoup.

Understanding XPath

XPath addresses elements much like paths in a traditional file system.

To access File 1:

C:/File1

Similarly, to access File 2:

C:/Documents/User1/File2

Similarly in HTML:

//div[@id="content"]/h1   # Finds an <h1> tag inside a <div> with id="content"

With XPath, we can target tags, IDs, classes, text values or even element positions.
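As a quick illustration of these patterns, here is a minimal sketch that runs lxml directly on an inline HTML string (the snippet, its tags and attribute values are made up for demonstration):

Python
from lxml import etree

# A small, made-up HTML snippet for demonstration
html = """
<html>
  <body>
    <div id="content">
      <h1>Main Heading</h1>
      <p class="intro">First paragraph</p>
      <p>Second paragraph</p>
    </div>
  </body>
</html>
"""

dom = etree.HTML(html)

# By tag, inside a <div> with a given id
print(dom.xpath('//div[@id="content"]/h1')[0].text)        # Main Heading

# By class attribute
print(dom.xpath('//p[@class="intro"]')[0].text)            # First paragraph

# By position (XPath positions start at 1)
print(dom.xpath('//div[@id="content"]/p[2]')[0].text)      # Second paragraph

# By text value
print(dom.xpath('//p[text()="First paragraph"]')[0].text)  # First paragraph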

Why BeautifulSoup Alone Isn’t Enough

BeautifulSoup provides its own search methods such as .find() and CSS selectors via .select(), but it has no XPath support.

To use XPath:

  1. Parse the page with BeautifulSoup.
  2. Convert it into an lxml.etree object.
  3. Use .xpath() to extract data.

This hybrid approach gives you the flexibility of XPath with the simplicity of BeautifulSoup.
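As a minimal sketch of this three-step flow (using an inline HTML string instead of a live page; the tag and id are made up for illustration):

Python
from bs4 import BeautifulSoup
from lxml import etree

# Step 1: parse the HTML with BeautifulSoup
html = '<div id="content"><h1>Hello</h1></div>'
soup = BeautifulSoup(html, "html.parser")

# Step 2: convert the parsed document into an lxml.etree object
dom = etree.HTML(str(soup))

# Step 3: run an XPath query against it
print(dom.xpath('//div[@id="content"]/h1')[0].text)  # Hello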

To find the XPath for a particular element on a page:

  • Right-click the element on the page and choose Inspect.
  • In the Elements tab, right-click the highlighted element.
  • Choose Copy > Copy XPath.

Approach

  • Import the required modules
  • Send a request to the target webpage
  • Parse the HTML with BeautifulSoup
  • Convert it to lxml.etree format
  • Use .xpath() to extract elements

Note: If the copied XPath does not return the desired result, copy the full XPath instead (Copy > Copy full XPath); the remaining steps stay the same.

Given below is an example showing how XPath can be used with BeautifulSoup.

Example: Extracting the title from the Wikipedia page for Nike using XPath.

Python
from bs4 import BeautifulSoup
from lxml import etree
import requests

# Target URL
url = "https://en.wikipedia.org/wiki/Nike,_Inc."

# Set headers to avoid blocking
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-US,en;q=0.5'
}

# Fetch the page
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, "html.parser")

# Convert to etree for XPath
dom = etree.HTML(str(soup))

# Extract heading using XPath
heading = dom.xpath('//*[@id="firstHeading"]/span')[0].text
print(heading)

Output:

Nike, Inc.

Explanation:

  • etree.HTML(str(soup)): Converts the BeautifulSoup object into an XPath-compatible lxml structure.
  • .xpath(…): Executes the XPath query.
  • [0].text: Retrieves the text of the first matching element.
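Since .xpath() returns a list, indexing with [0] raises an IndexError when nothing matches. Continuing from the dom object in the example above, a slightly more defensive version of the extraction step might look like this:

Python
# Defensive variant: check for an empty result before indexing
matches = dom.xpath('//*[@id="firstHeading"]/span')
if matches:
    print(matches[0].text)
else:
    print("No element matched the XPath; the page structure may have changed.")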

