How to extract text from a web page using Selenium Java and save it as a text file?
Last Updated: 08 Oct, 2024
Extracting text from a web page using Selenium in Java is a common requirement in web automation and scraping tasks. Selenium, a popular browser automation tool, allows developers to interact with web elements and retrieve data from a webpage.
In this article, we will walk through extracting the visible text of a web page with Selenium WebDriver in Java and writing it to a text file so it can be reused later.
Example to extract text from a web page using Selenium Java and save it as a text file
The following step-by-step example demonstrates how to extract text from a web page using Selenium Java and save it to a text file.
1. Add Selenium dependencies to your pom.xml file
XML
<dependencies>
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>4.25.0</version> <!-- You can replace this with the latest version -->
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.7</version>
    </dependency>
    <!-- WebDriver Manager for managing browser drivers automatically -->
    <dependency>
        <groupId>io.github.bonigarcia</groupId>
        <artifactId>webdrivermanager</artifactId>
        <version>5.5.1</version>
    </dependency>
</dependencies>
2. Create a Java class
Create a Java class that extracts the text of a web element, and import the required Selenium and java.io classes into it.
WebTextExtractor.java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class WebTextExtractor {

    public static void main(String[] args) {
        // Set the path for the ChromeDriver (replace the placeholder with your local path).
        // With Selenium 4.6 or later, Selenium Manager can resolve the driver
        // automatically, so this line can usually be removed.
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        // Initialize WebDriver
        WebDriver driver = new ChromeDriver();

        try {
            // Navigate to the target web page
            driver.get("https://example.com");

            // Locate the element containing the text
            WebElement element = driver.findElement(By.tagName("body")); // Adjust selector as needed

            // Extract the visible text from the element
            String pageText = element.getText();

            // Save the extracted text to a file in the working directory
            saveTextToFile("output.txt", pageText);
            System.out.println("Text extracted and saved successfully.");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the browser and end the WebDriver session
            driver.quit();
        }
    }

    // Writes the given text to a file, overwriting the file if it already exists
    private static void saveTextToFile(String filename, String text) {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(filename))) {
            writer.write(text);
        } catch (IOException e) {
            System.err.println("Error writing to file: " + e.getMessage());
        }
    }
}
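The saveTextToFile method above uses FileWriter without naming a character set, so the encoding of the file depends on the platform default. As a minimal sketch, assuming Java 11 or later, the same step could be written with java.nio.file.Files and an explicit UTF-8 encoding. The class name TextFileSaver, the output/page-text.txt path, and the sample string are illustrative assumptions, not part of the original example; the string simply stands in for the value returned by element.getText().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TextFileSaver {

    // Hypothetical helper: writes text to the given path as UTF-8,
    // creating missing parent directories first.
    public static Path saveTextToFile(Path target, String text) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // With no open options given, an existing file is truncated and overwritten.
        return Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the value returned by element.getText()
        String pageText = "Example Domain\nCaf\u00e9 menu";

        Path saved = saveTextToFile(Path.of("output", "page-text.txt"), pageText);

        // Read the file back to confirm the text survived the round trip
        String readBack = Files.readString(saved, StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + readBack.equals(pageText));
    }
}
```

Unlike the original method, this variant lets the IOException propagate to the caller instead of printing it, which may be preferable when a failed save should stop the run.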
Output
- Running the code opens the page, extracts its visible text, and saves it to output.txt in the working directory.
- Folder structure showing the created output.txt file
Extracted text output
Conclusion
In conclusion, extracting text from a web page using Selenium Java is a straightforward process that involves locating the target elements, retrieving the text, and saving it to a file. This method is particularly useful for web scraping, automated data collection, and content analysis.
By using Selenium WebDriver and Java, you can automate the extraction process and easily store the results for further processing or analysis.
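When the text comes from several elements rather than from the whole body, for example by calling getText() on each element returned by driver.findElements, the results can be cleaned and written one per line. The sketch below covers only the file-handling side and uses plain strings as stand-ins for the extracted values; the class name ExtractedLinesWriter, its methods, and the headings.txt file name are hypothetical, and Java 11 or later is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractedLinesWriter {

    // Hypothetical helper: drops null or blank entries and strips
    // leading and trailing whitespace from the rest.
    public static List<String> cleanTexts(List<String> rawTexts) {
        return rawTexts.stream()
                .filter(t -> t != null)
                .map(String::strip)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Writes each cleaned entry on its own line, encoded as UTF-8.
    public static void saveLines(Path target, List<String> rawTexts) throws IOException {
        Files.write(target, cleanTexts(rawTexts), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for values collected with element.getText()
        List<String> extracted = Arrays.asList("  Example Domain ", "", "More information...");

        Path target = Path.of("headings.txt");
        saveLines(target, extracted);
        System.out.println("Lines written: " + Files.readAllLines(target, StandardCharsets.UTF_8).size());
    }
}
```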