Technical Overview

Fivetran Local Data Processing:


High-volume, real-time data replication

High-volume data integration simplified

The Fivetran data movement platform can integrate all of your SaaS, file, event and
database systems. For enterprise log-based database replication, Fivetran Local Data
Processing (LDP) simplifies high-volume, real-time data movement. Our simple, yet
powerful data integration and synchronization solution reduces infrastructure costs while
allowing you to create new and enhanced services driven by data stored on-premises and
in the cloud.

With Local Data Processing, you can replace your batch processes with a variety of near real-time data delivery and integration scenarios, and enable consolidated, near real-time analytics that improve your business insights. This whitepaper describes the technologies and capabilities available, covering:

Use cases and topologies
Integrated, all-in-one capabilities
Secure and efficient data transfer
Broad platform support
Flexible architectures

During the first few decades of IT automation, organizations implemented database-based applications on-premises. Such implementations are still very common, despite the popularity of cloud-based deployments. But individual applications are data silos, and organizations recognize the value of consolidated access to data for analytics. With the growth in data volumes, building consolidated, high-performance and scalable data analytics environments has become complex and very costly.

To address these challenges, enterprises are modernizing their infrastructure, adding cloud data warehouses and data lakes to their data platforms to keep up with growing data demands. This creates a new challenge: moving data between systems seamlessly, automatically and securely.
Use cases and topologies

Local Data Processing is a data replication and validation solution that works across
complex, heterogeneous environments. It works with major on-premises and cloud
solutions, data lake solutions and data formats. This flexibility makes it ideal for a wide
range of data analytics, operational and transactional use cases. We support these use
cases through the following topologies:

Uni-directional
Move data in one direction, from source to target. This is the most common topology. Organizations use this topology to feed a data warehouse or data lake, and for data migration projects.

Consolidation
A consolidation topology takes data from multiple identical data stores and merges it into a central data store, data warehouse or data lake. Local Data Processing gives you the option to add columns and set metadata values to enable uniqueness across multiple identical sources.

Broadcast
To deliver data from production systems into multiple destinations, for example a data lake and a data warehouse, or simply to distribute data, the broadcast topology captures data from a single source and delivers it to multiple targets. Local Data Processing transfers the same data everywhere, or you can use filtering to distribute different data from the source database to different targets.

Cascading environment
In this use case, Local Data Processing allows you to take data from one source, put it in a centralized operational data store, and then broadcast it to several downstream data warehouses or data marts.
Secure and efficient data transfer

As your data moves across your infrastructure, you need to ensure your data is secure
and protected. Local Data Processing offers robust security from source to target, in
transit and at rest.

LDP uses TLS 1.3 to secure data in transit between the High Volume Agent (HVA) and the LDP hub. Data encryption at rest secures data stored on disk. In a distributed setup, the firewall only needs to be opened on a single port in a single direction, for the LDP hub server to reach the HVA.

Local Data Processing improves the efficiency of data transport by capturing only
relevant changes from the source and then compressing the data as soon as it is
captured. Data always moves compressed between HVA and LDP hub, and is only
decompressed just before delivery into the target. To deliver data into a target, LDP
uses high-speed native technologies, including clustered file systems, staging tables and
other technology-appropriate options for maximum performance.
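The capture-then-compress flow described above can be sketched as follows. This is a minimal illustration using `zlib`; LDP's actual wire format and compression codec are not public, so treat this only as the general principle, and the change-record shape as a hypothetical example:

```python
import json
import zlib

# Sketch of the principle: changes are serialized and compressed at capture
# time, stay compressed in transit, and are only decompressed at delivery.
# The record shape below is an illustrative assumption, not LDP's wire format.

def capture(changes: list[dict]) -> bytes:
    """Serialize and compress captured changes as soon as they are read."""
    return zlib.compress(json.dumps(changes).encode("utf-8"))

def deliver(payload: bytes) -> list[dict]:
    """Decompress only just before applying changes to the target."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))

changes = [{"op": "update", "table": "orders", "id": 7, "total": 99.5}]
payload = capture(changes)          # compact bytes move over the network
assert deliver(payload) == changes  # round-trips losslessly
```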

Most data warehouses are not architected to handle continuous updates from busy
transaction processing systems. LDP’s micro-batch (burst) approach provides an
optimized way to transfer changes in near real-time. The burst method first stages the
data. It then generates set-based SQL statements to merge changes into the target.
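The burst approach can be sketched as generating a single set-based MERGE from a staged batch. This is illustrative only: the table and column names are assumptions, and LDP's real statement generation is technology-specific and more sophisticated:

```python
# Illustrative sketch of the micro-batch (burst) pattern: changes are first
# staged, then applied with one set-based MERGE instead of row-by-row updates.
# Table and column names here are hypothetical examples.

def burst_merge_sql(target: str, staging: str, key: str, cols: list[str]) -> str:
    """Generate one set-based MERGE applying a staged batch to the target."""
    updates = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    values = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )

sql = burst_merge_sql("orders", "orders_stage", "order_id", ["status", "total"])
```

Because the merge is a single set-based statement, the target warehouse processes one bulk operation per micro-batch rather than a continuous stream of row-level updates.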

A flexible architecture

Fivetran’s technology is designed to fit into your architecture. With Local Data
Processing we recommend a distributed setup with High Volume Agent installation and
a central LDP hub for three reasons:

1. Performance: on the source the HVA identifies relevant changes with low latency
log-based Change Data Capture (CDC) before data is passed to the destination. All
data transfer between HVA and LDP is densely compressed and uses optimum
network communication.
2. Scalability: individual agents perform part of the work, offloading resource-intensive
processing from the hub.
3. Security: data transfer between HVA and LDP is always encrypted using TLS 1.3.

However, LDP also supports agentless connections to sources and destinations, an easier-to-configure alternative for environments requiring less performance or scalability.
The LDP hub server manages all configurations, controls all processing and directs data
where it needs to go. In addition, the hub collects statistics and maintains metadata about
the Fivetran ecosystem.

The hub server is managed through a browser-based interface. It is used to define the
replication, schedule full and partial data loads, start/stop capture and apply jobs, as well
as perform data validation and repair. The hub server can be deployed anywhere in the
infrastructure—on the source, target or its own dedicated server. For most use cases with
more than two endpoints connected, the hub server is deployed on a separate middle tier
environment.

A complete data movement platform: An all-in-one, integrated solution

Fivetran Local Data Processing offers everything you need to set up and manage near
real-time data integration. You can integrate dozens of data storage platforms,
services and formats using one small download. The comprehensive data replication
and validation solution includes:

Centralized management/administrative console to configure, manage and maintain data replication pipelines.
Target table creation and initial data load with automated table and data type mapping, including support for automatic propagation of data definition changes (DDL) in heterogeneous configurations.
Real-time, high-volume, log-based change data capture (CDC) on a multitude of
database technologies.
Cross platform data validation and repair between databases.
Rich monitoring and reporting.

Broad platform support

Bridging the gap between traditional relational database applications and modern
cloud-based data systems, Fivetran Local Data Processing supports an extensive range
of sources and targets, shown in the diagram below. LDP supports log-based Change
Data Capture on all supported source database technologies, including SAP HANA.

Due to the complexity of SAP ERP, the LDP technology provides unique capabilities to
streamline low-impact, log-based CDC from the enterprise application suite:
Table selection for SAP is through modules, based on table descriptions and with the
ability to see related tables based on the SAP dictionary.
Support for decoding of cluster, pool and long text tables in long-time SAP ECC
deployments that have not migrated to SAP HANA.
Full support for customer Z tables.
The ability to access the database through Netweaver instead of a direct database
connection, to allow for database runtime licenses.

LDP supports commonly used cloud-based platforms including Snowflake, Google BigQuery and Databricks, as well as numerous other technologies. An Agent Plugin script or program can be used to deliver changes into a destination that is not natively supported by LDP. Plugin examples are available on GitHub.
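The general shape of such a delivery script might look like the sketch below. The interface is entirely hypothetical: LDP's real plugin contract, record format and invocation are documented with the product, and none of the names here are taken from it:

```python
import json

# Hypothetical sketch of an agent-plugin-style delivery script: consume change
# records and apply them to a destination LDP does not natively support.
# The record format and apply_change() interface are illustrative assumptions,
# not LDP's actual plugin contract.

def apply_change(change: dict, destination: list[dict]) -> None:
    """Apply one change record to a toy in-memory destination."""
    if change["op"] == "insert":
        destination.append(change["row"])
    elif change["op"] == "delete":
        destination[:] = [r for r in destination
                          if r["id"] != change["row"]["id"]]

destination: list[dict] = []
for line in [
    json.dumps({"op": "insert", "row": {"id": 1, "name": "a"}}),
    json.dumps({"op": "insert", "row": {"id": 2, "name": "b"}}),
    json.dumps({"op": "delete", "row": {"id": 1}}),
]:
    apply_change(json.loads(line), destination)
# destination now holds only the row with id 2
```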

Getting started

For flexible, high-volume and near real-time data integration between databases and
cloud-based environments, look no further than Fivetran.

Reach out to [email protected] to learn more about our data movement platform.
