Scraping a Table from an HTTPS Site Using R
Last Updated: 28 Apr, 2025
In this article, we cover the basics of web scraping in R using the rvest and tidyverse libraries. We show how to extract tables from a website and how to work with the resulting data. The examples should give anyone looking to scrape tables from websites a good starting point. The following are the key concepts related to scraping tables in R:
- Web scraping with R: R provides various libraries such as rvest and XML that can be used to extract data from websites.
- Reading HTML: R can read HTML pages, and these pages can be parsed to extract the data we are interested in.
- Selectors: To extract data from a website, we need to know the HTML structure of the page. Selectors in R allow us to select elements from the HTML page using CSS selectors or XPath.
- Parsing HTML: After selecting the elements of interest, the next step is to parse the HTML content and extract the data.
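The concepts above can be sketched in a few lines of R. This is a minimal, self-contained example (the HTML snippet and its class name are made up for illustration) that selects the same table twice, once with a CSS selector and once with XPath, and then parses it:

```r
library(rvest)

# A small inline HTML document to demonstrate selectors
# (hypothetical markup, used only for illustration)
page <- read_html('
  <html><body>
    <table class="stats">
      <tr><th>City</th><th>Population</th></tr>
      <tr><td>Oslo</td><td>700000</td></tr>
    </table>
  </body></html>')

# Select the same table two ways:
css_hit   <- html_element(page, "table.stats")                      # CSS selector
xpath_hit <- html_element(page, xpath = "//table[@class='stats']")  # XPath

# Parsing the selected node yields a data frame
df <- html_table(css_hit)
df
```

Both selections point at the same node; which syntax to use is a matter of taste, though CSS selectors are usually shorter for simple cases.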
Before we start scraping tables, the following prerequisites must be met: R should be installed on the system, and the rvest library (plus tidyverse, which the second example uses) must be installed in R. If they are not installed, they can be installed by running the following command in the R console:
install.packages(c("rvest", "tidyverse"))
Scraping a Table from a Static Website
In this example, we use the read_html function to read the HTML content of the website. Then we use the html_nodes function with a CSS selector to collect all table nodes on the page. Finally, we convert them with the html_table function, keep the second table on the page via [[2]], and print its first six rows.
R
library(rvest)
# Read the HTML content of the website
webpage <- read_html(
  "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita"
)
# Select the table using CSS selector
table_node <- html_nodes(webpage, "table")
# Extract the table content
table_content <- html_table(table_node)[[2]]
# Print the table
head(table_content)
Output:
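The same read_html → html_nodes → html_table pipeline can be tried without a network connection by feeding read_html an inline HTML string. This sketch uses made-up country data purely for illustration:

```r
library(rvest)

# The same pipeline on an inline HTML snippet (hypothetical data),
# so it runs without a network connection
html <- '<table>
  <tr><th>Country</th><th>GDP_per_capita</th></tr>
  <tr><td>A</td><td>50,000</td></tr>
  <tr><td>B</td><td>42,000</td></tr>
</table>'

page   <- read_html(html)
tables <- html_nodes(page, "table")  # all <table> nodes in the document
df     <- html_table(tables)[[1]]    # first (and here only) table

head(df)
```

This is a handy way to debug a selector or check what html_table does with a given markup pattern before pointing the code at a live site.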
Scraping a Table from a Dynamic Website
This example scrapes a table from a page that updates its figures frequently. One caveat: rvest only sees the HTML the server returns and cannot execute JavaScript, so this approach works only when the table is present in the page source (as it is here); for tables rendered entirely by JavaScript, a browser-automation tool such as RSelenium or chromote is needed instead. In this example, the rvest library is used to read the HTML code of the webpage and extract the table. The html_nodes function selects the first table on the page, and the html_table function converts its HTML into a data frame. Finally, the first few rows of the data frame are displayed using the head function.
R
library(rvest)
library(tidyverse)
# URL of the website
url <- "https://www.worldometers.info/world-population/population-by-country/"
# Read the HTML code of the page
html_code <- read_html(url)
# Use the html_nodes function to extract the table
table_html <- html_code %>% html_nodes("table") %>% .[[1]]
# Use the html_table function to convert the table
# HTML code into a data frame
table_df <- table_html %>% html_table()
# Inspect the first few rows of the data frame
head(table_df)
Output:
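After scraping, numeric columns often arrive as character vectors with thousands separators, so a common next step is cleaning them. This is a sketch on made-up data; the column name "Population" is hypothetical, so check names(table_df) against what the real page actually returns:

```r
# Hypothetical scraped data: numbers stored as text with commas
df <- data.frame(
  Country    = c("X", "Y"),
  Population = c("1,402,112,000", "331,002,651")
)

# Strip the separators and convert to numeric
df$Population <- as.numeric(gsub(",", "", df$Population))

str(df)
```

Once the columns are numeric, the data frame can be sorted, filtered, and plotted with the usual tidyverse tools.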