Spring Batch - Data Transformation with ItemProcessors
Last Updated: 24 Apr, 2025
In Spring Batch, the processor sits between the reader and the writer in a batch step: it receives one item at a time from the reader, applies business logic to it, and hands the result to the writer. A processor is represented by the ItemProcessor interface, which is declared as:
public interface ItemProcessor<I, O> {
O process(I item) throws Exception;
}
where,
- "I": It is the type of input item read by the reader.
- "O": It is the type of output item that will be passed to the writer.
When configuring a Spring Batch job, you define a processor by implementing the ItemProcessor interface. For example:
public class MyItemProcessor implements ItemProcessor<InputType, OutputType> {
@Override
public OutputType process(InputType item) throws Exception {
// Business logic goes here: transform, enrich, or validate the input item.
// Returning null tells Spring Batch to filter the item out, so it never reaches the writer.
OutputType processedItem = transform(item); // transform(...) is a placeholder for your own logic
return processedItem;
}
}
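As a concrete illustration of this contract, the sketch below turns a raw price string into a number and returns null for blank input, which is how a processor filters an item out. To keep it runnable without Spring on the classpath, it declares a local ItemProcessor interface with the same shape as the real one; PriceParsingProcessor is a hypothetical name used only for this example.

```java
public final class PriceProcessorSketch {

    // Local stand-in with the same shape as Spring Batch's ItemProcessor,
    // declared here only so the sketch compiles without the framework.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Hypothetical processor: String in, Double out.
    static final class PriceParsingProcessor implements ItemProcessor<String, Double> {
        @Override
        public Double process(String item) {
            if (item == null || item.trim().isEmpty()) {
                return null; // null means "filter this item out"
            }
            return Double.parseDouble(item.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        ItemProcessor<String, Double> processor = new PriceParsingProcessor();
        System.out.println(processor.process(" 19.99 ")); // prints 19.99
        System.out.println(processor.process("   "));     // prints null
    }
}
```

Note that the input and output types differ here (String and Double); a processor does not have to return the same type it receives.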
In the Spring Batch job configuration, you can then wire this processor into your step:
Java
@Bean
public Step myStep(ItemReader<InputType> reader,
ItemProcessor<InputType, OutputType> processor,
ItemWriter<OutputType> writer) {
return stepBuilderFactory.get("myStep")
.<InputType, OutputType>chunk(10)
.reader(reader)
.processor(processor)
.writer(writer)
.build();
}
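Here, processor is an instance of your custom ItemProcessor implementation, and chunk(10) tells the step to read and process items one at a time and pass them to the writer in groups of 10.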
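To make the role of the chunk size more tangible, here is a simplified, framework-free model of what a chunk-oriented step does with its reader, processor, and writer. It is only a sketch: it ignores transactions, restarts, and listeners, and the class and method names (ChunkLoopSketch, runStep) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public final class ChunkLoopSketch {

    // Simplified model of a chunk-oriented step (no transactions, no restart).
    // Returns the chunks that would have been passed to the writer.
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int chunkSize) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items, one at a time.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process each item individually; null means "filtered out".
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                O output = processor.apply(input);
                if (output != null) {
                    outputs.add(output);
                }
            }
            // 3. Hand the whole chunk to the writer in a single call.
            //    (This sketch simply skips chunks that ended up empty.)
            if (!outputs.isEmpty()) {
                writtenChunks.add(outputs);
            }
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "", "d", "e");
        List<List<String>> chunks = runStep(
                source.iterator(),
                s -> s.isEmpty() ? null : s.toUpperCase(),
                2);
        System.out.println(chunks); // prints [[A, B], [D], [E]]
    }
}
```

The second chunk contains a single item because the chunk size counts items read, and one of the two items read for that chunk was filtered by the processor.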
Example:
Let's illustrate the concept of processors in Spring Batch with a real-life analogy: a content processing system for a website like GeeksforGeeks. Imagine GeeksforGeeks needs to update and process the content on its website regularly. It has a massive database of programming tutorials written in different formats and wants to standardize the content before publishing it. This is where the content processing system comes into play.
- Reader (Content Retriever): The system retrieves content from the GeeksforGeeks database. Each piece of content represents a programming tutorial in various languages (Python, Java, Ruby, C, C++, etc.).
- Processor (Content Processor): The processor is like the team of editors and reviewers who ensure that the content follows a standardized format and meets certain quality criteria before it goes live on the website.
In the context of Spring Batch, the ItemProcessor plays the role of this content processing logic. For example, it could check for consistency in code formatting, add standardized headers, or perform language-specific adjustments.
Real-life analogy: If the tutorial content has code snippets, the processor might ensure that all code follows a consistent style guide and includes necessary comments.
public class CodeFormattingProcessor implements ItemProcessor<Tutorial, Tutorial> {
@Override
public Tutorial process(Tutorial tutorial) throws Exception {
// Check and standardize code formatting for the tutorial
tutorial.setCode(CodeFormatter.format(tutorial.getCode()));
return tutorial;
}
}
- Writer (Content Publisher): The writer is responsible for publishing the processed content to the website. In our analogy, this corresponds to updating the GeeksforGeeks database with the standardized content.
Real-life analogy: After the content processor has ensured consistent code formatting, the writer updates the database with the processed tutorial.
public class DatabaseWriter implements ItemWriter<Tutorial> {
@Override
public void write(List<? extends Tutorial> tutorials) throws Exception {
// Update the GeeksforGeeks database with the processed tutorials
tutorialDatabaseService.updateTutorials(tutorials);
}
}
By using Spring Batch, GeeksforGeeks can efficiently automate this content processing system, ensuring that all programming tutorials meet certain quality standards before being published on their website. The processor component, represented by the ItemProcessor, allows for the customization and standardization of content processing logic.
Advantages of Data Transformation with ItemProcessors in Spring Batch
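- Modularity: Keeps transformation logic in its own class, separate from reading and writing.
- Reusability: The same processor can be plugged into different steps or jobs that handle the same item type.
- Scalability: Processors run inside chunk-oriented steps, which Spring Batch can execute with multiple threads or partitions, provided the processor is thread-safe.
- Error Handling: Exceptions thrown from process() can be handled by a fault-tolerant step's skip and retry settings instead of failing the whole job.
- Complex Transformations: Centralizes intricate changes in one place; several small processors can also be chained.
- Integration: A processor is ordinary Java code, so it can call other Spring beans or external services.
- Testing and Debugging: Each processor can be unit tested on its own, without running a full job.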
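The modularity and reusability points can be sketched by chaining two small processors into one. Spring Batch ships a CompositeItemProcessor for this purpose; the framework-free version below only illustrates the idea, using a local stand-in interface and a hypothetical chain helper.

```java
public final class ProcessorChainSketch {

    // Local stand-in with the same shape as Spring Batch's ItemProcessor.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Runs `first`, then feeds its result to `second`.
    // A null from `first` short-circuits the chain (the item is filtered).
    static <A, B, C> ItemProcessor<A, C> chain(ItemProcessor<A, B> first,
                                               ItemProcessor<B, C> second) {
        return item -> {
            B intermediate = first.process(item);
            return intermediate == null ? null : second.process(intermediate);
        };
    }

    public static void main(String[] args) throws Exception {
        // Two small, independently testable steps (both hypothetical).
        ItemProcessor<String, String> trim = String::trim;
        ItemProcessor<String, String> dropBlankThenUpperCase =
                s -> s.isEmpty() ? null : s.toUpperCase();

        ItemProcessor<String, String> pipeline = chain(trim, dropBlankThenUpperCase);
        System.out.println(pipeline.process("  spring batch ")); // prints SPRING BATCH
        System.out.println(pipeline.process("    "));            // prints null
    }
}
```

Each link in the chain stays small enough to test in isolation, and the same link can be reused in another pipeline.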
Data Transformation with ItemProcessors in Spring Batch
In Spring Batch, the ItemProcessor transforms data during batch processing. It lets you apply custom logic to modify or enrich the data read by the ItemReader before it is written by the ItemWriter. Let's extend the GeeksforGeeks content processing example with additional attributes and walk through data transformation with an ItemProcessor step by step. We start by creating a Spring Boot project and adding the necessary dependencies, using Maven as the build tool.
Step 1: Create a Spring Boot Project
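- Go to Spring Initializr (start.spring.io)
- Set the following configurations:
- Project: Maven Project
- Language: Java
- Spring Boot: A 2.x release, matching the pom.xml below (2.6.1). The code in this article uses JobBuilderFactory, StepBuilderFactory, and ItemWriter.write(List), which belong to Spring Batch 4; Spring Batch 5 (Spring Boot 3) deprecates the builder factories and changes the writer signature.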
- Group: Your desired group name, e.g. com.geeksforgeeks
- Artifact: Your desired artifact name, e.g. content-processor
- Dependencies:
- Spring Batch
- Spring Web
- Lombok
- Click on the "Generate" button to download the project zip file.
Step 2: Extract and Import into IDE
Extract the downloaded zip file and import the project into your preferred IDE (Eclipse, IntelliJ, etc.).
Step 3: Add Additional Dependencies
Open the pom.xml file in your project and add the necessary dependencies. For this example, we'll use H2 database for simplicity. If you are using a different database, adjust the dependencies accordingly. Below is the full pom.xml file configuration.
XML
<?xml version="1.0" encoding="UTF-8"?>
<!-- Maven Project Object Model (POM) file for the GeeksforGeeksContentProcessor Spring Boot App -->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- Specify the Maven version and POM format -->
<modelVersion>4.0.0</modelVersion>
<!-- Parent POM for Spring Boot projects -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.6.1</version>
<relativePath /> <!-- lookup parent from repository -->
</parent>
<!-- Project information -->
<groupId>com.geeksforgeeks</groupId>
<artifactId>GeeksforGeeksContentProcessor</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>GeeksforGeeksContentProcessor</name>
<description>RESTful API for GeeksforGeeks Content Processor Spring Boot App</description>
<!-- Project properties -->
<properties>
<java.version>8</java.version> <!-- Java version for the project -->
</properties>
<!-- Project dependencies -->
<dependencies>
<!-- Spring Boot starter for Spring Data JPA -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<!-- Spring Boot starter for Spring Batch -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<!-- Spring Boot starter for building web applications -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- H2 Database as a runtime dependency -->
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<!-- Spring Boot devtools for development -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
<!-- Lombok for simplified Java code -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<!-- Spring Boot starter for testing -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<!-- Spring Batch testing support -->
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<!-- Maven Build Configuration -->
<build>
<plugins>
<!-- Spring Boot Maven Plugin -->
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<!-- Exclude Lombok from Spring Boot plugin -->
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>
Step 4: Create a ProgrammingTutorial Class
Create the ProgrammingTutorial class with the additional attributes as per the requirement.
Java
package com.geeksforgeeks.model;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;
/**
* Represents a programming tutorial entity.
*
* This class is annotated with JPA annotations for entity mapping. Lombok
* annotations are used to generate getters, setters, and constructors.
* Hibernate annotations are used to handle timestamp creation and updates.
*
* @author rahul.chauhan
*/
@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProgrammingTutorial {
/**
* Unique identifier for the tutorial.
*/
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
/**
* Title of the programming tutorial.
*/
@Column
private String title;
/**
* Programming language covered in the tutorial.
*/
@Column
private String language;
/**
* Content of the programming tutorial.
*/
@Column
private String content;
/**
* Author of the programming tutorial.
*/
@Column
private String author;
/**
* Timestamp representing the creation time of the tutorial. Automatically
* populated by Hibernate.
*/
@CreationTimestamp
private Date createTime;
/**
* Timestamp representing the last update time of the tutorial. Automatically
* updated by Hibernate.
*/
@UpdateTimestamp
private Date lastUpdateTime;
}
Step 5: Create a TutorialRepository
Create a simple repository interface for accessing the database.
Java
package com.geeksforgeeks.repository;
import org.springframework.data.jpa.repository.JpaRepository;
import com.geeksforgeeks.model.ProgrammingTutorial;
public interface TutorialRepository extends JpaRepository<ProgrammingTutorial, Long> {
}
Step 6: Configure Application Properties
In your application.properties file, configure the H2 data source, the H2 console, Hibernate, and the server port.
# DataSource settings
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=testUser
spring.datasource.password=password
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
# H2 Console settings
spring.h2.console.enabled=true
spring.h2.console.path=/h2-console
# Hibernate settings
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
# Server port
server.port=8080
Step 7: Implement the ItemReader, ItemWriter, ItemProcessor, and Batch Configuration Classes
Java
/*
BatchConfiguration.java
*/
package com.geeksforgeeks.batch;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import com.geeksforgeeks.model.ProgrammingTutorial;
@Configuration
@EnableBatchProcessing
@EnableScheduling
public class BatchConfiguration {
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
private final JobLauncher jobLauncher;
private final JobCompletionNotificationListener notificationListener;
private final TutorialItemReader tutorialItemReader;
private final TutorialItemProcessor tutorialItemProcessor;
private final TutorialItemWriter tutorialItemWriter;
@Autowired
public BatchConfiguration(
JobBuilderFactory jobBuilderFactory,
StepBuilderFactory stepBuilderFactory,
JobLauncher jobLauncher,
JobCompletionNotificationListener notificationListener,
TutorialItemReader tutorialItemReader,
TutorialItemProcessor tutorialItemProcessor,
TutorialItemWriter tutorialItemWriter
) {
this.jobBuilderFactory = jobBuilderFactory;
this.stepBuilderFactory = stepBuilderFactory;
this.jobLauncher = jobLauncher;
this.notificationListener = notificationListener;
this.tutorialItemReader = tutorialItemReader;
this.tutorialItemProcessor = tutorialItemProcessor;
this.tutorialItemWriter = tutorialItemWriter;
}
@Bean
public Job processContentJob() {
return jobBuilderFactory.get("processContentJob")
.incrementer(new RunIdIncrementer())
.listener(notificationListener)
.flow(processContentStep())
.end()
.build();
}
@Bean
public Step processContentStep() {
return stepBuilderFactory.get("processContentStep")
.<ProgrammingTutorial, ProgrammingTutorial>chunk(10)
.reader(tutorialItemReader)
.processor(tutorialItemProcessor)
.writer(tutorialItemWriter)
.build();
}
}
Java
/*
JobCompletionNotificationListener.java
*/
package com.geeksforgeeks.batch;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {
@Override
public void afterJob(JobExecution jobExecution) {
if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
System.out.println("Batch Job Completed Successfully! Time to verify the results.");
}
}
}
Java
/*
TutorialItemProcessor.java
*/
package com.geeksforgeeks.batch;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;
import com.geeksforgeeks.model.ProgrammingTutorial;
@Component
public class TutorialItemProcessor implements ItemProcessor<ProgrammingTutorial, ProgrammingTutorial> {
@Override
public ProgrammingTutorial process(ProgrammingTutorial tutorial) throws Exception {
// Your transformation logic here
tutorial.setTitle("Transformed: " + tutorial.getTitle());
tutorial.setContent(transformContent(tutorial.getContent()));
return tutorial;
}
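private String transformContent(String content) {
// Your content transformation logic here
// For example, perform language-specific adjustments
// The content column is nullable, so guard against null before upper-casing
return content == null ? null : content.toUpperCase();
}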
}
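Because a processor is a plain object with a single method, its logic can be checked without starting a job. The sketch below mirrors the transformation above (prefix the title, upper-case the content) on a minimal stand-in class, so it runs without Spring, JPA, or Lombok; Tutorial and TutorialProcessorCheck are hypothetical names, and the null guard on the content is an assumption of this sketch.

```java
public final class TutorialProcessorCheck {

    // Minimal stand-in for ProgrammingTutorial (no JPA or Lombok here).
    static final class Tutorial {
        String title;
        String content;

        Tutorial(String title, String content) {
            this.title = title;
            this.content = content;
        }
    }

    // Same transformation idea as TutorialItemProcessor.process:
    // prefix the title and upper-case the content (null-safe in this sketch).
    static Tutorial process(Tutorial tutorial) {
        tutorial.title = "Transformed: " + tutorial.title;
        tutorial.content = tutorial.content == null
                ? null
                : tutorial.content.toUpperCase();
        return tutorial;
    }

    public static void main(String[] args) {
        Tutorial result = process(new Tutorial("Streams", "map and filter"));
        System.out.println(result.title);   // prints Transformed: Streams
        System.out.println(result.content); // prints MAP AND FILTER
    }
}
```

In a real project the same check would typically live in a unit test that instantiates TutorialItemProcessor directly and asserts on the returned entity.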
Java
/*
TutorialItemReader.java
*/
package com.geeksforgeeks.batch;
import java.util.Iterator;
import java.util.List;
import org.springframework.batch.item.ItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;
@Component
public class TutorialItemReader implements ItemReader<ProgrammingTutorial> {
@Autowired
private TutorialRepository tutorialRepository;
private Iterator<ProgrammingTutorial> tutorialIterator;
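@Override
public ProgrammingTutorial read() throws Exception {
// Load the tutorials once, on the first call of a job run
if (tutorialIterator == null) {
initializeIterator();
}
if (tutorialIterator.hasNext()) {
return tutorialIterator.next();
}
// Input exhausted: reset the iterator for the next job run and return null to end the step.
// Reloading the data at this point instead would re-read the table and the step would never finish.
tutorialIterator = null;
return null;
}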
private void initializeIterator() {
List<ProgrammingTutorial> tutorials = tutorialRepository.findAll();
tutorialIterator = tutorials.iterator();
}
}
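The key rule of the reader contract is that read() returns one item per call and null once the input is exhausted; the step keeps calling read() until it sees null. The framework-free sketch below shows that contract with a list-backed reader. The ItemReader interface here is a local stand-in, and ListBackedReader is a hypothetical name (Spring Batch itself provides ListItemReader for in-memory lists).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public final class ReaderContractSketch {

    // Local stand-in with the same shape as Spring Batch's ItemReader.
    interface ItemReader<T> {
        T read() throws Exception;
    }

    // Hypothetical reader that serves items from an in-memory list.
    static final class ListBackedReader<T> implements ItemReader<T> {
        private final Iterator<T> iterator;

        ListBackedReader(List<T> items) {
            this.iterator = items.iterator();
        }

        @Override
        public T read() {
            // One item per call; null signals "no more input".
            return iterator.hasNext() ? iterator.next() : null;
        }
    }

    // Drains a reader the way a step would: call read() until it returns null.
    static <T> List<T> drain(ListBackedReader<T> reader) {
        List<T> seen = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            seen.add(item);
        }
        return seen;
    }

    public static void main(String[] args) {
        ListBackedReader<String> reader =
                new ListBackedReader<>(Arrays.asList("Java", "Python", "C++"));
        System.out.println(drain(reader)); // prints [Java, Python, C++]
        System.out.println(reader.read()); // prints null
    }
}
```

A reader that never returns null would keep the step running indefinitely, which is why the end-of-input case deserves its own test.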
Java
/*
TutorialItemWriter.java
*/
package com.geeksforgeeks.batch;
import java.util.List;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import com.geeksforgeeks.model.ProgrammingTutorial;
import com.geeksforgeeks.repository.TutorialRepository;
@Component
public class TutorialItemWriter implements ItemWriter<ProgrammingTutorial> {
@Autowired
private TutorialRepository tutorialRepository;
@Override
public void write(List<? extends ProgrammingTutorial> tutorials) throws Exception {
tutorialRepository.saveAll(tutorials);
}
}
Step 8: Create ContentProcessingController
Java
package com.geeksforgeeks.controller;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/api/content")
public class ContentProcessingController {
private final JobLauncher jobLauncher;
private final Job processContentJob;
public ContentProcessingController(JobLauncher jobLauncher, Job processContentJob) {
this.processContentJob = processContentJob;
this.jobLauncher = jobLauncher;
}
@PostMapping("/process")
public ResponseEntity<String> processContent() {
try {
JobParameters jobParameters = new JobParametersBuilder()
.addString("jobParam1", String.valueOf(System.currentTimeMillis())).toJobParameters();
JobExecution jobExecution = jobLauncher.run(processContentJob, jobParameters);
return ResponseEntity.ok("Content processing job initiated successfully. Job ID: " + jobExecution.getId());
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body("Error initiating content processing job: " + e.getMessage());
}
}
}
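The resulting project contains four packages under com.geeksforgeeks: model (ProgrammingTutorial), repository (TutorialRepository), batch (BatchConfiguration, JobCompletionNotificationListener, TutorialItemProcessor, TutorialItemReader, TutorialItemWriter), and controller (ContentProcessingController), alongside pom.xml and application.properties.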
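Run the Spring Boot Application
Start the application from your IDE or with mvn spring-boot:run. It listens on port 8080, as configured in application.properties.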
Testing Spring Boot Application
Use Postman to Trigger the Batch Job: Open Postman and create a new request:
- Method: POST
- URL: http://localhost:8080/api/content/process
Alternatively, paste the following cURL command into Postman or run it in a terminal. The endpoint takes no request body:
curl -X POST http://localhost:8080/api/content/process
The in-memory H2 database starts empty, so the job only has items to transform if ProgrammingTutorial rows exist when it runs (for example, rows inserted through the H2 console at /h2-console).
Send the request, and you should receive a response indicating that the content processing job has been initiated, for example:
"Content processing job initiated successfully. Job ID: 3"
The job ID varies from run to run.
Conclusion
Building a Spring Batch application for processing programming tutorials involves defining the entity class, implementing the ItemReader, ItemProcessor, and ItemWriter components, wiring them into a step and a job, and exposing an endpoint that launches the job. The ItemProcessor is the place for transformation logic: it receives each item from the reader, changes or filters it, and passes the result to the writer. The application can be triggered and checked with Postman or cURL, and the same structure carries over to larger datasets and other databases.