Implementing Web Scraping in Python with BeautifulSoup

Last Updated: 18 Jul, 2025

BeautifulSoup is a Python library used for web scraping. It parses HTML and XML documents, making it easy to navigate and extract specific parts of a webpage. This article explains the steps of web scraping using BeautifulSoup.

Steps involved in web scraping

- Send an HTTP Request: Use the requests library to send a request to the webpage URL and get the HTML content in response.
- Parse the HTML Content: Use a parser like html.parser or html5lib to convert the raw HTML into a structured format (parse tree).
- Extract Data: Use BeautifulSoup to navigate the parse tree and extract the required data using tags, classes, or IDs.

Now, let's go through the web scraping process step by step.

Before starting, make sure to install the necessary libraries. Run the following commands in a command prompt or terminal using pip:

    pip install requests
    pip install beautifulsoup4

Step 1: Fetch HTML Content

The first step in web scraping is to send an HTTP request to the target webpage and fetch its raw HTML content. This is done using the requests library.

    import requests

    url = "https://www.geeksforgeeks.org/data-structures/"
    response = requests.get(url)  # send a GET request to the page
    print(response.text)          # raw HTML as a string

Explanation:

- A GET request is sent to the URL using the requests library.
- The .text attribute of the response object returns the HTML content of the page as a string.

Note: If you run into an error such as "403 Forbidden", try sending a browser User-Agent header as shown below. You can look up the user agent string for your device and browser online.

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.get(url, headers=headers)

Step 2: Parse HTML with BeautifulSoup

Now that we have the raw HTML, the next step is to parse it using BeautifulSoup so we can easily navigate and extract specific parts of the content.

    from bs4 import BeautifulSoup

    # Build a parse tree from the raw HTML fetched in Step 1
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup.prettify())  # prints well-formatted HTML

Explanation:

- The raw HTML is passed to BeautifulSoup to create a parse tree.
- html.parser is Python's built-in HTML parser.

Note: BeautifulSoup supports different parsers, such as html.parser, lxml and html5lib. Choose one by passing its name as the second argument. Unlike html.parser, lxml and html5lib are third-party packages and must be installed separately (for example, pip install html5lib). For example:

    soup = BeautifulSoup(response.text, 'html5lib')

Step 3: Extract Specific Data

Now that the HTML is parsed, specific elements like text, links or images can be extracted by targeting tags and classes using BeautifulSoup methods like .find() or .find_all().

Suppose we want to extract quotes from a website:

    import requests
    from bs4 import BeautifulSoup

    url = "http://www.values.com/inspirational-quotes"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    quotes = []
    # Each quote sits in a <div> with this exact class attribute
    quote_boxes = soup.find_all('div', class_='col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top')

    for box in quote_boxes:
        # The image alt text holds "quote lines #author"
        quote_text = box.img['alt'].split(" #")
        quote = {
            'theme': box.h5.text.strip(),
            'image_url': box.img['src'],
            'lines': quote_text[0],
            'author': quote_text[1] if len(quote_text) > 1 else 'Unknown'
        }
        quotes.append(quote)

    # Display extracted quotes
    for q in quotes[:5]:  # print only first 5 for brevity
        print(q)

Explanation:

- soup.find_all() locates all quote containers based on their class. When class_ is given a string with several class names, it matches the class attribute exactly as written, so the names and their order must match the page.
- For each quote box, box.img['alt'] gives the text containing the quote lines and the author, and split(" #") separates the quote from the author.
- A dictionary is created with theme, image_url, lines and author.
- The quotes are collected into a list of dictionaries, and the first 5 are printed for brevity.

Understanding the HTML Structure

Before extracting data, it's helpful to inspect the HTML structure using soup.prettify() to find out where the information sits in the page's markup.
For example, if the quotes are inside a <div> with a specific id or class, we can find it using:

    container = soup.find('div', attrs={'id': 'all_quotes'})

find() returns the first <div> that has id="all_quotes", or None if no such element exists. If there are multiple quote boxes inside that section, we can collect them by calling container.find_all() with the tag name or class of the boxes.

Step 4: Save Data to CSV

Now that the data is extracted, it can be saved into a CSV file for easy storage and future use. Python's built-in csv module is used to write the data in a structured format.

    import csv

    filename = "quotes.csv"
    # newline='' lets the csv module control line endings itself
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['theme', 'image_url', 'lines', 'author'])
        writer.writeheader()
        for quote in quotes:
            writer.writerow(quote)

Explanation:

- The with open() statement creates a new CSV file (quotes.csv) in write mode with UTF-8 encoding.
- csv.DictWriter() sets up a writer object that writes dictionaries to the file using the specified column headers.
- writer.writeheader() writes the header row to the CSV using the defined field names.
- The for loop writes each quote dictionary as a row in the CSV using writer.writerow(quote).

This script scrapes inspirational quotes from the website, parses the HTML content, extracts the relevant information and saves the data to a quotes.csv file for later use.

Author: kartik
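The alt-text splitting from Step 3 and the CSV write from Step 4 can be tried without any network access. The following is a minimal sketch under stated assumptions, not the article's own script: the sample alt texts, the theme value, the example.com URL and the helper names split_alt_text and quotes_to_csv are hypothetical, and an in-memory io.StringIO buffer stands in for quotes.csv.

```python
import csv
import io

FIELDNAMES = ["theme", "image_url", "lines", "author"]


def split_alt_text(alt_text):
    """Split alt text of the form 'quote lines #author' (hypothetical helper)."""
    parts = alt_text.split(" #")
    lines = parts[0]
    author = parts[1] if len(parts) > 1 else "Unknown"
    return lines, author


def quotes_to_csv(quotes):
    """Write quote dictionaries to CSV text using csv.DictWriter."""
    buffer = io.StringIO(newline="")  # in-memory stand-in for quotes.csv
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(quotes)
    return buffer.getvalue()


# Hypothetical sample data standing in for scraped alt texts.
sample_alts = [
    "Be kind whenever possible. #Sample Author",
    "A quote with no author marker",
]

quotes = []
for alt in sample_alts:
    lines, author = split_alt_text(alt)
    quotes.append({
        "theme": "KINDNESS",                      # assumed theme value
        "image_url": "https://example.com/q.jpg",  # placeholder URL
        "lines": lines,
        "author": author,
    })

csv_text = quotes_to_csv(quotes)

# Read the CSV back to confirm the rows survive the round trip.
rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
for row in rows:
    print(row["author"], "|", row["lines"])
```

Keeping the parsing and writing in small functions makes it possible to check the output format before pointing the scraper at a live page.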