How to scrape the web with Playwright in Python
Last Updated: 06 Sep, 2022
In this article, we will discuss the Playwright framework, its features, its advantages, and how to scrape a basic webpage with it.
Playwright is a framework for web testing and automation. It is a fairly new tool from Microsoft, introduced to let users automate webpages more efficiently and with fewer initial requirements than the already existing tool Selenium. Playwright improves on Selenium in terms of speed, usability, and reliability, and it allows testing Chromium, Firefox, and WebKit with a single API. Playwright is built to enable cross-browser web automation that is reliable and fast.
Features of Playwright
- Headless execution.
- Auto wait for elements.
- Intercept network activity.
- Emulate mobile devices, geolocation, and permissions.
- Support web components via shadow piercing selectors.
- Capture video, screenshots, and HAR files.
- Contexts allow for isolated sessions.
- Parallel execution.
Advantages of Playwright
- Cross-browser executable
- Completely open source
- Well documented
- Executes tests in parallel
- API testing
- Context isolation
- Python support
Creating a Python virtual environment
It is always advisable to work in a separate virtual environment, especially when you are installing a particular library. Here, we create a virtual environment named “venv” and activate it.
Creating virtual environment
virtualenv venv
Activating it (on Windows):
venv/Scripts/activate
Activating it (on Linux/macOS):
source venv/bin/activate
Installing and setting up Playwright:
pip install playwright
playwright install
Automating and scraping data from a webpage
After installing the Playwright library, now it's time to write some code to automate a webpage. For this article, we will use quotes.toscrape.com.
Step 1: We will import some necessary packages and set up the main function.
Python3
from playwright.sync_api import sync_playwright

def main():
    pass

if __name__ == '__main__':
    main()
Step 2: Now we will write our code in the ‘main’ function. This code opens the above webpage, waits for 10000 milliseconds (10 seconds), and then closes it.
Python3
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto('https://quotes.toscrape.com/')
    page.wait_for_timeout(10000)
    browser.close()
Step 3: This code selects all boxes with the ‘quote’ class, then iterates through each element with a for loop and extracts the quote text and its author's name. It is recommended to use a Python dictionary to store the different data fields as key-value pairs. After that, we print each dictionary to the terminal.
Python3
all_quotes = page.query_selector_all('.quote')
for quote in all_quotes:
    text = quote.query_selector('.text').inner_text()
    author = quote.query_selector('.author').inner_text()
    print({'Author': author, 'Quote': text})
page.wait_for_timeout(10000)
browser.close()
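The dictionary-per-record pattern recommended above can be sketched without opening a browser. In this minimal sketch, the sample tuples are hypothetical placeholders standing in for the (.text, .author) pairs Playwright would return, not data fetched from the site:

```python
# Hypothetical sample data: each tuple stands in for one (text, author)
# pair that the Playwright loop above would extract from the page.
scraped = [
    ('"A sample quote."', 'Sample Author'),
    ('"Another sample quote."', 'Another Author'),
]

# Build one dictionary per record and collect them in a list,
# instead of only printing each record as it is found.
quotes = []
for text, author in scraped:
    quotes.append({'Author': author, 'Quote': text})

print(quotes)
```

Collecting records into a list of dictionaries makes it easy to post-process the data later (filtering, deduplication, export) instead of losing it to the terminal.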
Code Implementation
Complete code to scrape quotes and their authors:
Python3
from playwright.sync_api import sync_playwright

def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://quotes.toscrape.com/')
        all_quotes = page.query_selector_all('.quote')
        for quote in all_quotes:
            text = quote.query_selector('.text').inner_text()
            author = quote.query_selector('.author').inner_text()
            print({'Author': author, 'Quote': text})
        page.wait_for_timeout(10000)
        browser.close()

if __name__ == '__main__':
    main()
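Printing is fine for a demo, but scraped data is usually persisted. As one possible extension (not part of the original script), the dictionaries could be written to a CSV file with Python's built-in csv module. The rows below are made-up samples standing in for what the scraper would collect:

```python
import csv

# Hypothetical sample rows standing in for the dictionaries the scraper builds.
quotes = [
    {'Author': 'Sample Author', 'Quote': '"A sample quote."'},
    {'Author': 'Another Author', 'Quote': '"Another sample quote."'},
]

with open('quotes.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['Author', 'Quote'])
    writer.writeheader()      # header row: Author,Quote
    writer.writerows(quotes)  # one row per scraped quote
```

DictWriter maps each dictionary's keys onto the declared fieldnames, so the export stays correct even if you later add more fields to each record.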
Output: the script prints one dictionary per quote to the terminal.