Spring XD-based Architecture

Spring XD and Spring Cloud Data Flow reduce the overhead of big data engineering by providing frameworks for data ingestion, processing, analytics, and more. Spring XD builds on Spring technology and provides an architecture with modules such as sources, processors, and sinks to create data pipelines for real-time and batch processing. Pivotal GemFire is a distributed data platform that pools resources across processes to manage data and behavior with high availability, performance, scalability, and fault tolerance.

Uploaded by

Tarikh Khan

Big data engineering is time-consuming and requires niche skills for
data acquisition and data processing, two aspects that most solutions
need. Pivotal introduced Spring XD and Spring Cloud Data Flow to reduce
this overhead.

Spring XD
The first round of innovation came in the form of Spring XD, which
provides a readily consumable solution for common tasks related to data
processing. Spring XD is built on top of proven Spring technology and
provides support for data ingestion, movement, processing, deep
analytics, stream processing, and batch processing.
Spring XD provides a sophisticated, stable, scalable framework for
real-time and batch processing. Picking up data and moving it from
various sources to targets is much easier with Spring XD.
Spring XD architecture has been widely adopted in traditional enterprise
ETL, real-time analytics, and creation of data science project workbenches.

Spring XD-based Architecture

The Spring XD-based architecture is depicted in the diagram below. With
the help of the modules described below, we can create, deploy, run, and
destroy data pipelines and perform data processing on them.
The main components of Spring XD are Admin and Container.

1. The Admin UI is used to send a request to the server, and the
server processes the request with the relevant module performing
the requested task. Here, a module is a component that creates a
Spring application context.
2. All modules require an XD container in which to run and execute
their associated tasks.

The following are the key modules in the Spring XD architecture.

• Source: A stream always starts with a source module. A source can
use a polling mechanism or an event-driven mechanism, and it only
produces output.
• Processor: Takes an input message, performs some type of processing
on it, and produces an output message.
• Sink: As the name suggests, this module terminates the stream and
sends the output to an external resource, e.g. HDFS.
• Job: This module performs a batch job.
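
The source, processor, and sink roles above can be sketched in a few lines. This is only a conceptual illustration in Python, not Spring XD code or its API: the names `number_source`, `square_processor`, `ListSink`, and `run_stream` are hypothetical, and a plain in-memory list stands in for an external resource such as HDFS.

```python
from typing import Iterable, Iterator, List


def number_source(limit: int) -> Iterator[int]:
    """Source: only produces output (here, a simple counter)."""
    for n in range(limit):
        yield n


def square_processor(messages: Iterable[int]) -> Iterator[int]:
    """Processor: takes each input message and emits a transformed one."""
    for message in messages:
        yield message * message


class ListSink:
    """Sink: terminates the stream by writing to an external resource.

    A plain list stands in for a real target such as HDFS (assumption).
    """

    def __init__(self) -> None:
        self.written: List[int] = []

    def consume(self, messages: Iterable[int]) -> None:
        for message in messages:
            self.written.append(message)


def run_stream(limit: int) -> List[int]:
    """Wire source -> processor -> sink and run the stream to completion."""
    sink = ListSink()
    sink.consume(square_processor(number_source(limit)))
    return sink.written


if __name__ == "__main__":
    print(run_stream(4))  # prints [0, 1, 4, 9]
```

The source has no input and the sink has no output, which mirrors the rule that a stream starts with a source and is terminated by a sink; any number of processors could be chained in between.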

Starting Spring XD with xd-singlenode

Open a CMD window
Go to: .\spring-xd-1.3.1.RELEASE\xd\bin
Run xd-singlenode

Build the Spring project to produce a jar file, SpringXDJob-0.0.1.jar.
Assume we place the jar file at:
C:\Users\Loi\SpringXDJob-0.0.1
Open a second CMD window
Go to: .\spring-xd-1.3.1.RELEASE\shell\bin
Run xd-shell

Pivotal GemFire
Pivotal GemFire is a data management platform that
provides real-time, consistent access to data-intensive
applications throughout widely distributed cloud
architectures. GemFire pools memory, CPU, network
resources, and optionally local disk across multiple processes
to manage application objects and behavior. It uses dynamic
replication and data partitioning techniques to implement
high availability, improved performance, scalability, and fault
tolerance. In addition to being a distributed data container,
GemFire is an in-memory data management system that
provides reliable asynchronous event notifications and
guaranteed message delivery.
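
The idea of combining data partitioning with replication can be illustrated with a toy sketch. It is not GemFire code and does not use the GemFire API: the class `PartitionedStore`, its `redundant_copies` parameter, and the CRC32-based placement are all assumptions chosen for illustration, and a real system assigns and rebalances data quite differently.

```python
import zlib
from typing import Any, Collection, Dict, List


class PartitionedStore:
    """Toy key-value store spread over several in-process 'members'."""

    def __init__(self, member_count: int, redundant_copies: int = 1) -> None:
        if member_count < 1:
            raise ValueError("need at least one member")
        if not 0 <= redundant_copies < member_count:
            raise ValueError("redundant_copies must be in [0, member_count)")
        self.members: List[Dict[str, Any]] = [{} for _ in range(member_count)]
        self.redundant_copies = redundant_copies

    def owners(self, key: str) -> List[int]:
        """Partitioning: hash the key to a primary member, then pick
        the next members in ring order to hold the redundant copies."""
        primary = zlib.crc32(key.encode("utf-8")) % len(self.members)
        return [(primary + i) % len(self.members)
                for i in range(self.redundant_copies + 1)]

    def put(self, key: str, value: Any) -> None:
        """Replication: write the entry to every owner of the key."""
        for index in self.owners(key):
            self.members[index][key] = value

    def get(self, key: str, failed: Collection[int] = ()) -> Any:
        """Read from the first owner that is not marked as failed."""
        for index in self.owners(key):
            if index not in failed and key in self.members[index]:
                return self.members[index][key]
        raise KeyError(key)


if __name__ == "__main__":
    store = PartitionedStore(member_count=3, redundant_copies=1)
    store.put("order-1", {"total": 10})
    primary = store.owners("order-1")[0]
    # The value is still readable when the primary owner is unavailable.
    print(store.get("order-1", failed={primary}))
```

Partitioning spreads keys across members so capacity grows with the cluster, while the redundant copy is what keeps a key readable after a single member fails.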