How to use XPath with BeautifulSoup?

Last Updated : 12 Apr, 2025

We have an HTML page and our task is to extract specific elements using XPath, which BeautifulSoup doesn't support directly. For example, if we want to extract the heading from the Wikipedia page on Nike, we can't do it with BeautifulSoup alone, but we can by combining it with lxml's etree module. This article explains how to use XPath with BeautifulSoup by leveraging the lxml module.

Required Libraries

Before we start, install the following Python libraries:

pip install requests
pip install beautifulsoup4
pip install lxml

  • requests: Fetches the HTML content of a webpage.
  • beautifulsoup4: Parses HTML using BeautifulSoup (imported in code as bs4).
  • lxml: Enables XPath support and integrates with BeautifulSoup.

Understanding XPath

XPath addresses elements much like paths in a traditional file system.

To access File 1:

C:/File1

Similarly, to access File 2:

C:/Documents/User1/File2

Similarly in HTML:

//div[@id="content"]/h1   # Finds an <h1> tag inside a <div> with id="content"

With XPath, we can target tags, IDs, classes, text values or even element positions.
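As a quick illustration of these patterns, here is a minimal sketch that runs lxml directly on an inline HTML string (the snippet, its tags and attribute values are made up for demonstration):

Python
from lxml import etree

# A small, made-up HTML snippet for demonstration
html = """
<html>
  <body>
    <div id="content">
      <h1>Main Heading</h1>
      <p class="intro">First paragraph</p>
      <p>Second paragraph</p>
    </div>
  </body>
</html>
"""

dom = etree.HTML(html)

# By tag, inside a <div> with a given id
print(dom.xpath('//div[@id="content"]/h1')[0].text)        # Main Heading

# By class attribute
print(dom.xpath('//p[@class="intro"]')[0].text)            # First paragraph

# By position (XPath positions start at 1)
print(dom.xpath('//div[@id="content"]/p[2]')[0].text)      # Second paragraph

# By text value
print(dom.xpath('//p[text()="First paragraph"]')[0].text)  # First paragraph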

Why BeautifulSoup Alone Isn’t Enough

BeautifulSoup provides its own search methods such as .find() and CSS selectors via .select(), but it has no XPath support.

To use XPath:

  1. Parse the page with BeautifulSoup.
  2. Convert it into an lxml.etree object.
  3. Use .xpath() to extract data.

This hybrid approach gives you the flexibility of XPath with the simplicity of BeautifulSoup.
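As a minimal sketch of this three-step flow (using an inline HTML string instead of a live page; the tag and id are made up for illustration):

Python
from bs4 import BeautifulSoup
from lxml import etree

# Step 1: parse the HTML with BeautifulSoup
html = '<div id="content"><h1>Hello</h1></div>'
soup = BeautifulSoup(html, "html.parser")

# Step 2: convert the parsed document into an lxml.etree object
dom = etree.HTML(str(soup))

# Step 3: run an XPath query against it
print(dom.xpath('//div[@id="content"]/h1')[0].text)  # Hello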

To find the XPath for a particular element on a page:

  • Right-click the element on the page and choose Inspect.
  • In the Elements tab, right-click the highlighted element.
  • Choose Copy > Copy XPath.

Approach

  • Import the required modules
  • Send a request to the target webpage
  • Parse the HTML with BeautifulSoup
  • Convert it to lxml.etree format
  • Use .xpath() to extract elements

Note: If the copied XPath does not return the desired result, copy the full XPath instead (Copy > Copy full XPath); the remaining steps stay the same.

Given below is an example showing how XPath can be used with BeautifulSoup.

Example: Extracting the title from the Wikipedia page for Nike using XPath.

Python
from bs4 import BeautifulSoup
from lxml import etree
import requests

# Target URL
url = "https://en.wikipedia.org/wiki/Nike,_Inc."

# Set headers to avoid blocking
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-US,en;q=0.5'
}

# Fetch the page
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.content, "html.parser")

# Convert to etree for XPath
dom = etree.HTML(str(soup))

# Extract heading using XPath
heading = dom.xpath('//*[@id="firstHeading"]/span')[0].text
print(heading)

Output:

Nike, Inc.

Explanation:

  • etree.HTML(str(soup)): Converts the BeautifulSoup object into an XPath-compatible lxml structure.
  • .xpath(…): Executes the XPath query.
  • [0].text: Retrieves the text of the first matching element.
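Since .xpath() returns a list, indexing with [0] raises an IndexError when nothing matches. Continuing from the dom object in the example above, a slightly more defensive version of the extraction step might look like this:

Python
# Defensive variant: check for an empty result before indexing
matches = dom.xpath('//*[@id="firstHeading"]/span')
if matches:
    print(matches[0].text)
else:
    print("No element matched the XPath; the page structure may have changed.")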

