
Spring Boot with Spring Batch

Last Updated : 25 Sep, 2024

Spring Batch is a lightweight, robust framework for developing batch processing applications. It provides reusable functions that are essential for processing large volumes of data, such as logging and tracing, transaction management, job processing statistics, job restart, skip, and resource management. Spring Batch is not a scheduler itself, but it is designed to work alongside one. When integrated with Spring Boot, Spring Batch simplifies the development and execution of batch jobs, enabling developers to focus more on business logic.

Batch jobs typically involve processing large amounts of data, often following steps such as reading, processing, and writing. Spring Batch efficiently handles such jobs by breaking them down into steps that can be executed sequentially or in parallel.

Spring Batch

Spring Batch is built for efficient large-scale batch processing. A batch job is a task that processes large volumes of data without human intervention, often performed at scheduled intervals. Some common use cases of batch jobs include:

  • Processing large datasets
  • Migrating databases
  • Generating reports
  • ETL (Extract, Transform, Load) operations in data pipelines

Spring Batch provides pre-built functionalities such as transaction management, retries, error handling, and resource management, which are essential for any batch processing system. Below, we will explore the core concepts of Spring Batch and explain how it works.

1. Jobs, Steps, and Flow

Spring Batch organizes batch work into jobs. Each job consists of one or more steps, and the job definition determines the order in which those steps run.

  • Job: A job is a collection of steps that execute in a particular sequence. It encapsulates the entire batch process.
  • Step: A step represents an individual phase of the job. Typically, a step involves reading data, processing it, and writing the results.
  • Flow: Steps can be executed sequentially or in parallel, depending on how the flow of the job is defined. Conditional flows can also be implemented, such as "Step 2 runs only if Step 1 succeeds."

A typical (chunk-oriented) step in Spring Batch is divided into three main phases: reading, processing, and writing.
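The relationship between a job and its steps can be illustrated without any framework code. The following is a minimal plain-Java sketch, not Spring Batch's implementation: the class SimpleJobSketch and its run method are hypothetical names, and each step is modelled as a function that returns true on success. It shows only the simplest flow, where a later step runs only if the previous one succeeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a job: an ordered map of step name -> step body.
// Each step body returns true on success and false on failure.
class SimpleJobSketch {

    // Runs the steps in insertion order and stops at the first failure,
    // mirroring the rule "Step 2 runs only if Step 1 succeeds".
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> log = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            boolean ok = step.getValue().getAsBoolean();
            log.add(step.getKey() + (ok ? ":COMPLETED" : ":FAILED"));
            if (!ok) {
                break; // later steps are never started
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("load", () -> true);
        steps.put("validate", () -> false);
        steps.put("export", () -> true);
        System.out.println(run(steps)); // prints [load:COMPLETED, validate:FAILED]
    }
}
```

In the real framework the same idea is expressed declaratively on the job builder, and the outcome of every step is recorded in the Job Repository rather than in a list.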

2. Core Components of Spring Batch

Spring Batch operates using a well-defined structure to manage batch jobs. The key components are:

a. ItemReader

The ItemReader is responsible for reading data from a source. It is the first phase of a batch step and abstracts the input source. Common input sources include:

  • Files (CSV, XML, JSON, etc.)
  • Databases
  • Message queues
  • APIs

The ItemReader reads one item at a time and hands it to the step for processing. Once the input is exhausted, read() returns null, which tells the step that there is no more data to read.

Example of a StringReader:

import org.springframework.batch.item.ItemReader;

public class StringReader implements ItemReader<String> {
    private final String[] data = {"Spring", "Batch", "Example"};
    private int index = 0;

    @Override
    public String read() throws Exception {
        if (index < data.length) {
            return data[index++]; // Returns the next string in the array
        } else {
            return null; // Signals the end of reading
        }
    }
}

b. ItemProcessor

The ItemProcessor processes the data retrieved by the reader. This is where business logic is applied to transform the data.

For example, a processor can:

  • Filter out unwanted data
  • Transform the data (e.g., formatting, type conversion)
  • Perform calculations

Example of a StringProcessor that converts text to uppercase:

import org.springframework.batch.item.ItemProcessor;

public class StringProcessor implements ItemProcessor<String, String> {
    @Override
    public String process(String item) throws Exception {
        return item.toUpperCase(); // Converts each string to uppercase
    }
}
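Filtering deserves a short illustration of its own. In Spring Batch, an ItemProcessor signals that an item should be dropped by returning null, and the item is then never passed to the writer. The plain-Java sketch below imitates that convention with java.util.function.Function in place of ItemProcessor; the class FilteringProcessorSketch and the blank-string rule are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class FilteringProcessorSketch {

    // Applies the processor to each item and drops items for which it returns null,
    // imitating how a chunk-oriented step treats a null result from a processor.
    static <I, O> List<O> processAll(List<I> items, Function<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I item : items) {
            O result = processor.apply(item);
            if (result != null) {
                output.add(result);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical rule: discard blank strings, trim and upper-case the rest.
        Function<String, String> processor =
                s -> s.isBlank() ? null : s.trim().toUpperCase();
        System.out.println(processAll(List.of("Spring", " ", "Batch"), processor));
        // prints [SPRING, BATCH]
    }
}
```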

c. ItemWriter

The ItemWriter writes the processed data to a target destination, such as a database, file, or message queue. The writer receives the transformed data from the processor and persists it.

Example of a ConsoleWriter that writes data to the console:

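import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

public class ConsoleWriter implements ItemWriter<String> {
    // Spring Batch 5 passes the items as a Chunk; earlier versions used List<? extends String>
    @Override
    public void write(Chunk<? extends String> chunk) throws Exception {
        for (String item : chunk) {
            System.out.println(item); // Writes each processed item to the console
        }
    }
}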

3. Chunk-Oriented Processing

Spring Batch employs chunk-oriented processing, where data is processed in chunks instead of as a large batch. A chunk is a set of data items that are read, processed, and written together in a single transaction.

In each step:

  1. Items are read one at a time until the chunk size is reached or the input runs out.
  2. Each item in the chunk is passed through the processor, the resulting items are handed to the writer as a single batch, and the transaction is then committed.

This approach ensures efficient transaction management and improved performance.

Example of defining a chunk-oriented step:

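// Spring Batch 5 style; jobRepository and transactionManager are injected beans.
// (StepBuilderFactory from earlier versions was removed in Spring Batch 5.)
new StepBuilder("step", jobRepository)
        .<String, String>chunk(10, transactionManager) // Define a chunk size of 10 items
        .reader(reader()) // Configure the ItemReader
        .processor(processor()) // Configure the ItemProcessor
        .writer(writer()) // Configure the ItemWriter
        .build();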

In this example:

  • Spring Batch will read 10 items.
  • Process those 10 items.
  • Write the 10 items in one transaction.
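To make the read, process, write cycle concrete, here is a minimal plain-Java sketch of a chunk loop. It is an illustration under simplifying assumptions, not the framework's implementation: ChunkLoopSketch is a hypothetical class, an Iterator stands in for the ItemReader, and there is no real transaction, so the returned chunk count only marks where commits would happen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ChunkLoopSketch {

    // Reads up to chunkSize items, processes each one, then hands the whole
    // chunk to the writer. Returns the number of chunks written; in Spring Batch
    // each of these would correspond to one transaction commit.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());             // read phase
            }
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                processed.add(processor.apply(item)); // process phase
            }
            writer.accept(processed);                 // write phase, once per chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            input.add(i);
        }
        int chunks = run(input.iterator(), i -> i * 2,
                chunk -> System.out.println("writing " + chunk.size() + " items"), 10);
        System.out.println("chunks: " + chunks); // 25 items with chunk size 10 give 3 chunks
    }
}
```

With 25 input items and a chunk size of 10, the writer is called three times, with 10, 10, and 5 items. The last chunk is smaller because the input ran out before the chunk was full.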

4. Job Repository and Metadata

The Job Repository is responsible for keeping track of job executions and their associated metadata. It records information such as:

  • JobInstance: The unique instance of a batch job. Each run with different parameters creates a new instance.
  • JobExecution: Represents an actual run of a job instance, storing details like start time, end time, status, and exit codes.
  • StepExecution: Tracks the execution details of each step within a job.

This repository allows Spring Batch to support features like:

  • Restartability: If a batch job fails, it can resume from the point of failure.
  • Monitoring: Developers can monitor the progress and completion status of batch jobs.

The Job Repository typically uses a relational database (such as HSQLDB, MySQL, or others) to store this metadata.
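The distinction between a JobInstance and a JobExecution is easier to see with a toy model. The sketch below is a hypothetical in-memory registry written in plain Java; the class JobInstanceSketch, its key format, and its methods are assumptions for illustration and do not reflect how the real repository stores its data. It captures one rule only: the job name together with its parameters identifies an instance, and every launch of that instance is a separate execution.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class JobInstanceSketch {

    // Hypothetical in-memory registry: instance key -> number of executions.
    private final Map<String, Integer> executions = new HashMap<>();

    // A job instance is identified by the job name plus its parameters.
    // TreeMap gives a stable ordering so equal parameter sets yield equal keys.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<>(parameters);
    }

    // Records one execution and returns how many executions this instance has had.
    int launch(String jobName, Map<String, String> parameters) {
        return executions.merge(instanceKey(jobName, parameters), 1, Integer::sum);
    }

    int instanceCount() {
        return executions.size();
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        repository.launch("importUserJob", Map.of("time", "1000"));
        repository.launch("importUserJob", Map.of("time", "1000")); // same instance, second execution
        repository.launch("importUserJob", Map.of("time", "2000")); // new instance
        System.out.println(repository.instanceCount()); // prints 2
    }
}
```

The real framework is stricter than this sketch: it refuses to launch an instance again once that instance has completed successfully. This is why the launch examples in this article add the current time as a job parameter, so that every run creates a new instance.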

5. Transaction Management and Error Handling

Spring Batch runs each step under transaction management. In chunk-oriented processing, every chunk is executed in its own transaction: if an error occurs while a chunk is being processed or written, that chunk is rolled back, while chunks that were already committed remain in place. The chunk size therefore determines the scope of each transaction.

Error handling can be configured in Spring Batch using:

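  • Retry Mechanism: Automatically retries the failed operation when it throws one of the configured exceptions, up to a retry limit.
  • Skip Logic: Skips items that fail with configured exceptions, up to a skip limit, instead of failing the whole step.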
  • Listeners: Adds custom error-handling logic before or after the step execution.

Example of retry configuration:

// jobRepository and transactionManager are injected beans
new StepBuilder("step", jobRepository)
        .<String, String>chunk(10, transactionManager)
        .reader(reader())
        .processor(processor())
        .writer(writer())
        .faultTolerant()
        .retry(Exception.class) // Retry on any exception
        .retryLimit(3) // Retry up to 3 times
        .build();
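The idea behind a retry limit can be sketched in a few lines of plain Java. This is not Spring Batch's retry implementation: RetrySketch and withRetry are hypothetical names, and the sketch assumes that the limit counts the total number of attempts, including the first one.

```java
import java.util.function.Supplier;

class RetrySketch {

    // Calls the operation until it succeeds or maxAttempts is exhausted,
    // then rethrows the last failure.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical flaky operation: fails twice, succeeds on the third call.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "written";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: written after 3 attempts
    }
}
```

Retrying is only useful for failures that may succeed on a later attempt, such as a temporary lock or network error. For bad input data, skipping the item is usually the better fit.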

6. Scheduling Batch Jobs

Batch jobs are typically run on a scheduled basis. Spring Boot integrates easily with Spring’s scheduling features or external job schedulers like Quartz.

Using Spring's @Scheduled annotation, we can trigger batch jobs at fixed intervals or with a cron expression.

Example of scheduling a batch job:

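import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration // Registers this class as a bean so that @Scheduled methods are detected
@EnableScheduling
public class BatchScheduler {

    @Autowired
    private JobLauncher jobLauncher; // Launches the job

    @Autowired
    private Job job; // The job to be scheduled

    @Scheduled(cron = "0 0 12 * * ?") // Runs the job every day at noon
    public void runJob() throws Exception {
        JobParameters parameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis()) // Add current time as a parameter
                .toJobParameters();
        jobLauncher.run(job, parameters); // Launch the job
    }
}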
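Spring cron expressions have six fields: second, minute, hour, day of month, month, and day of week. The expression "0 0 12 * * ?" therefore fires at 12:00:00 every day. As a rough illustration of what that trigger computes, the plain-Java sketch below works out the next noon from a given time using java.time; NextNoonSketch and nextNoon are hypothetical names, and the sketch ignores time zones.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

class NextNoonSketch {

    // Returns the next occurrence of 12:00 strictly after `now`,
    // which is when a daily-at-noon trigger would fire next.
    static LocalDateTime nextNoon(LocalDateTime now) {
        LocalDateTime todayNoon = now.toLocalDate().atTime(LocalTime.NOON);
        return now.isBefore(todayNoon) ? todayNoon : todayNoon.plusDays(1);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        LocalDateTime next = nextNoon(now);
        System.out.println("Next run: " + next
                + " (in " + Duration.between(now, next).toMinutes() + " minutes)");
    }
}
```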

Example Project: Spring Boot with Spring Batch

This project demonstrates how to create a simple batch processing job with Spring Batch. It reads data from a CSV file, processes it, and writes the results to a MySQL database.

Step 1: Create a New Spring Boot Project

Use IntelliJ IDEA to create a new Spring Boot project with the following options:

  • Name: spring-batch-example
  • Language: Java
  • Type: Maven
  • Packaging: Jar

Click on the Next button.

Project Metadata

Step 2: Add the Dependencies

Include the following dependencies in your pom.xml:

  • Spring Batch
  • Lombok
  • Spring Boot DevTools
  • Spring Data JPA
  • Spring Web
  • MySQL Driver

Click on the Create button.

Add Dependencies

Project Structure

After the project is created, the folder structure looks like this:

Project Folder Structure

Step 3: Configure Application Properties

In application.properties, configure your MySQL database, Hibernate, and Spring Batch settings.

# Application name
spring.application.name=spring-batch-example

# MySQL Database Configuration
spring.datasource.url=jdbc:mysql://localhost:3306/spring_batch_db?useSSL=false&serverTimezone=UTC
spring.datasource.username=root
spring.datasource.password=mypassword
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

# Hibernate properties
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect

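# Spring Batch settings
# The job is launched explicitly by the CommandLineRunner in BatchConfig,
# so Boot's automatic launch at startup is disabled to avoid running it twice
spring.batch.job.enabled=false
# Create the Spring Batch metadata tables on startup
# (replaces the older spring.batch.initialize-schema property)
spring.batch.jdbc.initialize-schema=always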

# Other settings
server.port=8080

Step 4: Create CSV Data File

Add a data.csv file in the src/main/resources/ folder with the following content:

firstName,lastName
Mahesh ,Kadambala
Ravi,Teja
Lakshmi,Narayana
Praveen,Chowdary
Kiran ,Kumar
Saneep,Kumar
Akhil,Hero
Gautam,P
Madhavo ,Reddy
Suresh,Kumar
Ravi,Teja
Lakshmi,Narayana
Anusha,Reddy
Venkat,Rao
Praveen,Chowdary
Sowmya,Krishna
Kiran,Kumar
Manjula,Rao
Naveen,Prasad
Madhavi,Reddy
Srinivas,Rao
Ramya,Lakshmi
Venkatesh,Babu
Sujatha,Rani
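Note that the file starts with a header row and that a few values carry a trailing space before the comma (for example "Mahesh ,Kadambala"). The plain-Java sketch below shows the kind of handling a delimited reader needs for such a file: skip the header, split each line on commas, and trim the fields. CsvLineSketch is a hypothetical helper written for this illustration; it does not handle quoted fields and is not how FlatFileItemReader is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSketch {

    // Splits one comma-delimited line and trims each field, so that
    // "Mahesh ,Kadambala" yields "Mahesh" and "Kadambala".
    static String[] parseLine(String line) {
        String[] fields = line.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    // Skips the first line (the header) and parses the remaining lines.
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            rows.add(parseLine(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("firstName,lastName", "Mahesh ,Kadambala", "Ravi,Teja");
        for (String[] row : parse(lines)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```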

Step 5: Create SQL Schema

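In src/main/resources/schema.sql, define the database table that stores the imported data. By default, Spring Boot runs schema.sql automatically only against embedded databases, so for MySQL either set spring.sql.init.mode=always in application.properties or create the table manually:

CREATE TABLE IF NOT EXISTS people (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(255),
    last_name VARCHAR(255)
);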

Step 6: Spring Batch Configuration

Create BatchConfig.java to configure Spring Batch with a reader, processor, and writer.

BatchConfig.java

Java
package com.gfg.springbatchexample;


import org.springframework.batch.core.*;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class BatchConfig {

    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(new ClassPathResource("data.csv"))
                .linesToSkip(1) // Skip the header row (firstName,lastName)
                .delimited()
                .names(new String[]{"firstName", "lastName"})
                .fieldSetMapper(fieldSet -> {
                    Person person = new Person();
                    person.setFirstName(fieldSet.readString("firstName"));
                    person.setLastName(fieldSet.readString("lastName"));
                    return person;
                })
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
                .sql("INSERT INTO people (first_name, last_name) VALUES (:firstName, :lastName)")
                .dataSource(dataSource)
                .build();
    }


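    @Bean
    public PersonItemProcessor processor() {
        return new PersonItemProcessor(); // Upper-cases first and last names (see Step 10)
    }

    @Bean
    public Step step(JobRepository jobRepository, PlatformTransactionManager transactionManager, FlatFileItemReader<Person> reader, PersonItemProcessor processor, JdbcBatchItemWriter<Person> writer) {
        return new StepBuilder("step", jobRepository)
                .<Person, Person>chunk(10, transactionManager) // Write and commit after every 10 items
                .reader(reader)
                .processor(processor) // Apply the transformation between reading and writing
                .writer(writer)
                .build();
    }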

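    @Bean
    public Job job(JobRepository jobRepository, Step step, JobCompletionNotificationListener listener) {
        return new JobBuilder("importUserJob", jobRepository)
                .listener(listener) // Log the imported rows when the job completes (see Step 9)
                .start(step)
                .build();
    }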

    @Bean
    public CommandLineRunner runJob(JobLauncher jobLauncher, Job job) {
        return args -> {
            try {
                // Create unique JobParameters
                JobParameters jobParameters = new JobParametersBuilder()
                        .addLong("time", System.currentTimeMillis())  // unique parameter
                        .toJobParameters();

                jobLauncher.run(job, jobParameters);
                System.out.println("Batch job has been invoked.");
            } catch (JobExecutionException e) {
                System.err.println("Job failed: " + e.getMessage());
            }
        };
    }

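    @Bean
    public PlatformTransactionManager transactionManager() {
        // ResourcelessTransactionManager does not open real database transactions, so writes
        // made in a failed chunk may not be rolled back. It keeps this demo simple; prefer a
        // DataSource-backed transaction manager when the writes must be atomic.
        return new ResourcelessTransactionManager();
    }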
}

Step 7: Create the Person Model Class

Create a simple model class Person.java.

Person.java:

Java
package com.gfg.springbatchexample;

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import lombok.Getter;
import lombok.Setter;

@Setter
@Getter
@Entity
@Table(name = "people") // Map to the table defined in schema.sql and used by the writer
public class Person {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Getters and setters are generated by Lombok
    private String firstName;
    private String lastName;


    public Person() {
    }


    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "Person{" +
                "firstName='" + firstName + '\'' +
                ", lastName='" + lastName + '\'' +
                '}';
    }
}

Step 8: Create Repository Interface

Create PersonRepository.java to access the database.

Java
package com.gfg.springbatchexample;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface PersonRepository extends JpaRepository<Person, Long> {
}

Step 9: Add Job Completion Listener

Create JobCompletionNotificationListener.java to log results after batch completion.

JobCompletionNotificationListener.java

Java
package com.gfg.springbatchexample;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

// JobExecutionListenerSupport is deprecated in Spring Batch 5; implement the interface directly
@Component
public class JobCompletionNotificationListener implements JobExecutionListener {

    private static final Logger log = LoggerFactory.getLogger(JobCompletionNotificationListener.class);

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            log.info("Job finished, checking the results...");

            jdbcTemplate.query("SELECT first_name, last_name FROM people",
                    (rs, row) -> new Person(
                            rs.getString(1),
                            rs.getString(2))
            ).forEach(person -> log.info("Found <" + person + "> in the database."));
        }
    }
}

Step 10: Process Data

Create PersonItemProcessor.java to transform data.

PersonItemProcessor.java

Java
package com.gfg.springbatchexample;

import org.springframework.batch.item.ItemProcessor;

public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) throws Exception {
        String firstName = person.getFirstName().toUpperCase();
        String lastName = person.getLastName().toUpperCase();

        return new Person(firstName, lastName);
    }
}

Step 11: Main class

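The main class is the entry point of the Spring application. This example annotates it with @EnableBatchProcessing. Note that from Spring Boot 3 onward the annotation is optional, and when it is present Spring Boot's own Spring Batch auto-configuration backs off, so the spring.batch.* properties from Step 3 are not applied. Remove the annotation if you rely on that auto-configuration.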

Java
package com.gfg.springbatchexample;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnableBatchProcessing
public class SpringBatchExampleApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBatchExampleApplication.class, args);
    }

}

pom.xml File:

XML
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.4</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.gfg</groupId>
    <artifactId>spring-batch-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>spring-batch-example</name>
    <description>spring-batch-example</description>
    <url/>
    <licenses>
        <license/>
    </licenses>
    <developers>
        <developer/>
    </developers>
    <scm>
        <connection/>
        <developerConnection/>
        <tag/>
        <url/>
    </scm>
    <properties>
        <java.version>17</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>com.mysql</groupId>
            <artifactId>mysql-connector-j</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Step 12: Run the application

Once the project is complete, run the application. It starts on port 8080.

Application Runs
Console Logs

After the application starts, the batch job runs and inserts the CSV data into the database.

Person Table Data in Database


This project sets up a basic Spring Batch application that reads from a CSV, processes the data, and writes to a database.

