
How to Import XML into DataFrame using R

Last Updated : 07 Aug, 2024

In R, a data frame is a two-dimensional, size-mutable, heterogeneous data structure with labeled rows and columns: each column can hold a different type of data (numbers, text, logical values, and so on). It is the most common structure for data analysis in R, so importing data into a data frame is usually the first step in any manipulation or analysis. Data frames can be created from many sources, including CSV, JSON, and XML files.

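As a quick illustration of the "heterogeneous" part, here is a small data frame built by hand. The names come from the example file used later; the student column is made up for this sketch. str() shows the type of every column:

```r
# Build a small data frame by hand; each column has its own type
df <- data.frame(
  name = c("Mahesh", "Eswar"),   # character column
  age = c(23L, 22L),             # integer column
  student = c(TRUE, FALSE),      # logical column (made-up example data)
  stringsAsFactors = FALSE
)

# Inspect the structure: number of rows, columns, and each column's type
str(df)
```
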
XML Files in R

XML (Extensible Markup Language) is a markup language for encoding documents in a format that is both human-readable and machine-readable. XML data is hierarchical (elements are nested inside other elements), so this structure has to be processed and flattened into rows and columns when it is imported into a data frame.

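To see the hierarchy that has to be flattened, the sketch below parses a short XML string with the xml2 package (introduced in the next section) and walks down one level at a time. The inline XML is a shortened copy of the example file used later; read_xml() accepts literal XML as well as a file path.

```r
library(xml2)

# read_xml() also accepts a literal XML string instead of a file path
doc <- read_xml("<data>
  <record><name>Mahesh</name><age>23</age><city>India</city></record>
  <record><name>Eswar</name><age>22</age><city>India</city></record>
</data>")

root <- xml_root(doc)
xml_name(root)                          # the root element: "data"
xml_name(xml_children(root))            # one child per row: "record" "record"
xml_name(xml_children(xml_child(root))) # fields of the first record: "name" "age" "city"
```

The root holds the records and each record holds the fields; flattening means turning each record into a row and each field into a column.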
Using Various R Packages for Import

In R, several packages can import XML data into a data frame. The two most popular are xml2 and XML.

  • xml2: A modern, straightforward package for parsing XML and HTML documents. It provides functions for reading XML files and extracting data from them.
  • XML: An older package that offers a wide range of tools for parsing XML and converting it into R objects.

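For comparison, the XML package includes a helper, xmlToDataFrame(), that treats each child of the root element as a row and each of its child elements as a column. A minimal sketch, assuming the XML package is installed (install.packages("XML")) and every record is flat, with no further nesting inside it:

```r
library(XML)

# Parse an XML string; asText = TRUE tells xmlParse() the input is not a file path
doc <- xmlParse("<data>
  <record><name>Mahesh</name><age>23</age><city>India</city></record>
  <record><name>Eswar</name><age>22</age><city>India</city></record>
</data>", asText = TRUE)

# Each <record> becomes a row; each child element becomes a column
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Values are read as text, so convert numeric columns explicitly
df$age <- as.integer(df$age)
print(df)
```

The rest of this article uses xml2, which gives finer control over which nodes become columns.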
Now we will walk step by step through importing XML into a data frame in the R programming language, using the xml2 package.

Step 1: Install and Load the xml2 Package

Before using the xml2 package, we need to install it (once) and load it into the R session. Use the commands below to install and load xml2.

install.packages("xml2")
library(xml2)
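Because install.packages() downloads and reinstalls the package every time it runs, a common idiom in scripts is to install only when the package is missing. A small sketch using base R's requireNamespace():

```r
# Install xml2 only if it is not already available, then attach it
if (!requireNamespace("xml2", quietly = TRUE)) {
  install.packages("xml2")
}
library(xml2)
```
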

Step 2: Create an XML File

Next, we need an XML file whose data we want to convert into a data frame. Assume the file has the following structure, with one <record> element per row:

<data>
  <record>
    <name>Mahesh</name>
    <age>23</age>
    <city>India</city>
  </record>
  <record>
    <name>Eswar</name>
    <age>22</age>
    <city>India</city>
  </record>
</data>

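If you would rather not create the file by hand, the same sample can be written from R. This sketch saves it as example.xml in the session's temporary directory (tempdir() is an assumption chosen so the code runs on any machine); the resulting xml_path can then be passed to read_xml() in the next step.

```r
# The sample XML from above, as a character string
xml_text_src <- '<data>
  <record>
    <name>Mahesh</name>
    <age>23</age>
    <city>India</city>
  </record>
  <record>
    <name>Eswar</name>
    <age>22</age>
    <city>India</city>
  </record>
</data>'

# Write it to example.xml in the temporary directory
xml_path <- file.path(tempdir(), "example.xml")
writeLines(xml_text_src, xml_path)

file.exists(xml_path)
```
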
Step 3: Read the XML file

Use the read_xml() function to read the XML file into the R session:

xml_file <- read_xml("path/to/your/example.xml")
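After reading, it is worth checking that the document parsed as expected. A minimal sketch, using an inline string in place of the file path so it runs on its own; it also shows that malformed XML makes read_xml() raise an error, which tryCatch() can catch:

```r
library(xml2)

# Literal XML stands in for "path/to/your/example.xml"
xml_file <- read_xml("<data><record><name>Mahesh</name></record></data>")

inherits(xml_file, "xml_document")   # TRUE when parsing succeeded
xml_name(xml_root(xml_file))         # name of the root element: "data"

# A mismatched closing tag cannot be parsed; return NULL instead of stopping
result <- tryCatch(read_xml("<data><record></data>"), error = function(e) NULL)
is.null(result)                      # TRUE because parsing failed
```
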

First, extract the individual <record> nodes:

records <- xml_find_all(xml_file, "//record")

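The XPath "//record" matches <record> elements at any depth in the document. When the layout is known, an absolute path such as "/data/record" is more precise; for the sample file both return the same two nodes. A minimal sketch with an inline copy of the sample:

```r
library(xml2)

xml_file <- read_xml("<data>
  <record><name>Mahesh</name><age>23</age><city>India</city></record>
  <record><name>Eswar</name><age>22</age><city>India</city></record>
</data>")

records_any <- xml_find_all(xml_file, "//record")      # <record> at any depth
records_abs <- xml_find_all(xml_file, "/data/record")  # only direct children of <data>

length(records_any)   # 2
length(records_abs)   # 2
```
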
Next, extract the data from each node:

names <- xml_text(xml_find_all(records, "name"))
ages <- xml_text(xml_find_all(records, "age"))
cities <- xml_text(xml_find_all(records, "city"))
  • xml_find_all(records, "name"): finds the <name> element within each record.
  • xml_text: extracts the text content from these elements.
  • The same process is repeated for age and city.

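One caveat: xml_find_all() applied to a set of nodes returns all matches flattened into a single list, so if a record is missing a field, the names, ages, and cities vectors end up with different lengths and the rows no longer line up. xml_find_first() returns exactly one result per record (a missing node when nothing matches), and xml_text() turns missing nodes into NA. A sketch with a deliberately incomplete second record (made-up data):

```r
library(xml2)

# The second record has no <age> element
xml_file <- read_xml("<data>
  <record><name>Mahesh</name><age>23</age><city>India</city></record>
  <record><name>Eswar</name><city>India</city></record>
</data>")
records <- xml_find_all(xml_file, "//record")

# Flattened: only one <age> is found across two records
ages_all <- xml_text(xml_find_all(records, "age"))

# One result per record, with NA where <age> is missing
ages_first <- xml_text(xml_find_first(records, "age"))

length(ages_all)   # 1
ages_first         # "23" NA
```
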
Finally, create the DataFrame from the extracted data:

df <- data.frame(
  Name = names,
  Age = as.integer(ages),
  City = cities,
  stringsAsFactors = FALSE
)
print(df)
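  • data.frame: creates a data frame with the columns Name, Age, and City.
  • as.integer(ages): converts the age values, which xml_text() returns as character strings, to integers.
  • stringsAsFactors = FALSE: ensures that character columns are not converted to factors. This is already the default since R 4.0.0, but setting it explicitly keeps the behavior the same on older versions.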

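xml2 also provides typed extractors, xml_integer() and xml_double(), which convert the text for you and combine naturally with xml_find_first(). A sketch of the same data frame built that way, using an inline copy of the sample file:

```r
library(xml2)

xml_file <- read_xml("<data>
  <record><name>Mahesh</name><age>23</age><city>India</city></record>
  <record><name>Eswar</name><age>22</age><city>India</city></record>
</data>")
records <- xml_find_all(xml_file, "//record")

df <- data.frame(
  Name = xml_text(xml_find_first(records, "name")),
  Age  = xml_integer(xml_find_first(records, "age")),  # already integer, no as.integer() needed
  City = xml_text(xml_find_first(records, "city")),
  stringsAsFactors = FALSE
)
str(df)
```
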
Practical Implementation to Import XML into DataFrame using R

First, create an XML file: copy the content below into a text editor such as Notepad and save it with a .xml extension (for example, sample.xml).

XML
<data>
  <record>
    <name>Rahul Sharma</name>
    <age>28</age>
    <city>Mumbai</city>
  </record>
  <record>
    <name>Anjali Mehta</name>
    <age>32</age>
    <city>Delhi</city>
  </record>
  <record>
    <name>Vikram Singh</name>
    <age>45</age>
    <city>Bangalore</city>
  </record>
  <record>
    <name>Priya Verma</name>
    <age>26</age>
    <city>Hyderabad</city>
  </record>
  <record>
    <name>Rohit Gupta</name>
    <age>35</age>
    <city>Chennai</city>
  </record>
  <record>
    <name>Neha Agarwal</name>
    <age>29</age>
    <city>Pune</city>
  </record>
  <record>
    <name>Ajay Kumar</name>
    <age>40</age>
    <city>Kolkata</city>
  </record>
  <record>
    <name>Kavita Nair</name>
    <age>33</age>
    <city>Ahmedabad</city>
  </record>
  <record>
    <name>Sunil Patil</name>
    <age>38</age>
    <city>Surat</city>
  </record>
  <record>
    <name>Meera Joshi</name>
    <age>24</age>
    <city>Jaipur</city>
  </record>
</data>

Now import the XML file into a data frame with R. Replace the path passed to read_xml() with the location where you saved the file.

R
# Install and load the xml2 package
install.packages("xml2")
library(xml2)

# Read the XML file
xml_file <- read_xml("C:\\Users\\GFG19565\\Downloads\\sample.xml")

# Extract nodes
records <- xml_find_all(xml_file, "//record")

# Extract data
names <- xml_text(xml_find_all(records, "name"))
ages <- xml_text(xml_find_all(records, "age"))
cities <- xml_text(xml_find_all(records, "city"))

# Create DataFrame
df <- data.frame(
  Name = names,
  Age = as.integer(ages),
  City = cities,
  stringsAsFactors = FALSE
)

# Print the DataFrame
print(df)

Output:

           Name Age      City
1  Rahul Sharma  28    Mumbai
2  Anjali Mehta  32     Delhi
3  Vikram Singh  45 Bangalore
4   Priya Verma  26 Hyderabad
5   Rohit Gupta  35   Chennai
6  Neha Agarwal  29      Pune
7    Ajay Kumar  40   Kolkata
8   Kavita Nair  33 Ahmedabad
9   Sunil Patil  38     Surat
10  Meera Joshi  24    Jaipur

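The code above hard-codes the field names. When records have many fields, or not every record has every field, a more general option is to take the column names from the child elements themselves. The helper below, records_to_df(), is a hypothetical sketch and not part of xml2; it assumes each record is flat (text-only children, no namespaces) and returns every column as character, so types are converted afterwards:

```r
library(xml2)

# Hypothetical helper: one row per record node, one column per distinct child name
records_to_df <- function(doc, xpath = "//record") {
  records <- xml_find_all(doc, xpath)
  # Union of child element names across all records, in first-seen order
  fields <- unique(unlist(lapply(records, function(r) xml_name(xml_children(r)))))
  # For each field, take one value per record (NA when a record lacks it)
  cols <- lapply(fields, function(f) xml_text(xml_find_first(records, f)))
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

doc <- read_xml("<data>
  <record><name>Rahul Sharma</name><age>28</age><city>Mumbai</city></record>
  <record><name>Anjali Mehta</name><age>32</age><city>Delhi</city></record>
</data>")

df <- records_to_df(doc)
df$age <- as.integer(df$age)   # convert types after flattening
print(df)
```
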
Conclusion

Importing XML data into a data frame involves parsing the XML structure and flattening it into a tabular format. With R packages such as xml2 and XML, we can efficiently extract data from XML files and build data frames ready for analysis. Handling XML's hierarchical structure and converting each column to the correct data type are the essential steps in this process. Once you master these techniques, you can work efficiently with XML data in your R projects.

