Python | Parse a website with regex and urllib

Last Updated : 23 Jan, 2019

Let's discuss parsing a web page using Python. Python has many modules, but for this task we only need urllib and re (regular expressions). Together they let us fetch a web page and pull data out of it. Note that parsing a website here means fetching the whole page source for a given URL and then searching it; the raw response is a bulk of HTML that is hard to read on its own. The demonstration below walks through each step.

Code #1: Libraries needed

    # importing libraries
    import urllib.request
    import urllib.parse
    import re

Code #2:

    url = 'https://www.geeksforgeeks.org/'
    values = {'s': 'python programming', 'submit': 'search'}

We have defined a URL and the values we want to search for. The values are defined as a dictionary; in its key-value pairs, 's' holds the search term, python programming, to look up on the defined URL.

Code #3:

    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    resp = urllib.request.urlopen(req)
    respData = resp.read()

The first line URL-encodes the values defined earlier into a query string. The second line encodes that string to UTF-8 bytes, the form in which it is sent over the network. The third line builds a request for the defined URL with those values; because data is supplied, the request is sent as a POST. The urlopen() function then opens the HTML document at that URL. In the last line, read() reads the entire response body as bytes and assigns it to the variable respData.

Code #4:

    # decode the bytes to text so the str pattern can be applied to it
    html = respData.decode('utf-8', errors='replace')
    paragraphs = re.findall(r'<p>(.*?)</p>', html, re.DOTALL)
    for eachP in paragraphs:
        print(eachP)

To extract the relevant data we apply a regular expression. re.findall() needs the pattern and the text to be of the same type, so the bytes in respData are first decoded to a string. The re.DOTALL flag lets . match newlines, so paragraphs that span several lines are still found. Each match is then printed with the print() function.
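The encoding and decoding steps described above can be tried without sending any request. The following is a minimal offline sketch, assuming Python 3.7+ for dictionary ordering; sample_body is made-up data standing in for what resp.read() would return, not real output from the site.

```python
import urllib.parse

# Same form values as in the walkthrough above.
values = {'s': 'python programming', 'submit': 'search'}

# urlencode() turns the dict into a query string; spaces become '+'.
query = urllib.parse.urlencode(values)
print(query)

# encode() converts the str to bytes, the type Request() expects for data.
data = query.encode('utf-8')
print(type(data).__name__)

# A response body is also bytes.
# sample_body is a made-up stand-in for resp.read() (assumption, not real output).
sample_body = b'<p>Hello</p>\n<p>World</p>'

# str() on bytes keeps the b'...' wrapper and escapes the newline ...
as_repr = str(sample_body)
# ... while decode() yields the actual text.
as_text = sample_body.decode('utf-8')
print(as_repr)
print(as_text)
```

The difference between the last two prints is why decoding the response is usually preferable to wrapping it in str() before applying a regular expression.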
Below are a few examples. Each program prints the text found between <p> and </p> tags on the returned page.

Example #1:

    import urllib.request
    import urllib.parse
    import re

    url = 'https://www.geeksforgeeks.org/'
    values = {'s': 'python programming', 'submit': 'search'}

    # encode the form values and send them with the request
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    resp = urllib.request.urlopen(req)
    respData = resp.read()

    # decode the bytes to text, then collect every <p>...</p> body
    html = respData.decode('utf-8', errors='replace')
    paragraphs = re.findall(r'<p>(.*?)</p>', html, re.DOTALL)
    for eachP in paragraphs:
        print(eachP)

Example #2:

    import urllib.request
    import urllib.parse
    import re

    url = 'https://www.geeksforgeeks.org/'
    values = {'s': 'pandas', 'submit': 'search'}

    # encode the form values and send them with the request
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    resp = urllib.request.urlopen(req)
    respData = resp.read()

    # decode the bytes to text, then collect every <p>...</p> body
    html = respData.decode('utf-8', errors='replace')
    paragraphs = re.findall(r'<p>(.*?)</p>', html, re.DOTALL)
    for eachP in paragraphs:
        print(eachP)

Author : jitender_1998
Article Tags : Python, Web Technologies, python-utility
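The pattern used in the examples, <p>(.*?)</p>, only matches paragraph tags that carry no attributes. A slightly more tolerant variant might look like the sketch below. The names extract_paragraphs, PARAGRAPH_RE, TAG_RE and sample_page are invented for illustration and are not part of the original article; sample_page is made-up HTML standing in for a decoded response body.

```python
import html
import re

# Non-greedy match of a <p> element; [^>]* allows attributes such as class="...".
# re.DOTALL lets '.' span newlines, re.IGNORECASE accepts <P> as well.
PARAGRAPH_RE = re.compile(r'<p\b[^>]*>(.*?)</p>', re.DOTALL | re.IGNORECASE)
# Matches any tag, used to strip markup nested inside a paragraph.
TAG_RE = re.compile(r'<[^>]+>')


def extract_paragraphs(page_source):
    """Return the text of each <p> element in page_source (hypothetical helper)."""
    results = []
    for inner in PARAGRAPH_RE.findall(page_source):
        text = TAG_RE.sub('', inner)            # drop nested tags like <a> or <b>
        text = html.unescape(text)              # turn &amp; into &
        results.append(' '.join(text.split()))  # collapse runs of whitespace
    return results


# Made-up sample input (assumption): stands in for a decoded response body.
sample_page = '''
<p class="intro">Python &amp; <b>regex</b></p>
<div>not a paragraph</div>
<p>spans
two lines</p>
'''

for paragraph in extract_paragraphs(sample_page):
    print(paragraph)
```

Keeping the extraction in a function that takes a string makes it possible to check the pattern against saved or hand-written HTML before pointing it at a live page.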