ADF - Realtime Problem Statement
Architecture
Driver Program
The Driver Program is a process that runs the main() function of the application and
creates the SparkContext object.
The SparkContext coordinates Spark applications, which run as independent sets of
processes on a cluster.
Graphical
Context diagram (entities): Stock Data Source APIs, Stock News, Integration & Report
Dashboard System, Stock Market Customer
Level-1 DFD
Processes:
1. Pull Data for Current Date & Take Top(N)
2. Sorting
3. Merge the sorted data with relevant news
Data: Raw Stock Data, UnSorted Raw Stock Data, Sorted Top(N) Stock Data, Raw Stock News
Data, Stock Data Today with relevant News
External entities: Stock Data Source APIs, Stock News, Dashboard Reporting
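The three Level-1 processes can be sketched in plain Python. This is only a minimal sketch under assumptions: the record fields (`symbol`, `date`, `volume`, `headline`), the choice of `volume` as the Top(N) ranking key, and all function names are hypothetical, since the slides do not specify the API payloads or what Top(N) ranks by. It also assumes one stock record per symbol per day.

```python
import heapq
from datetime import date


def pull_top_n(raw_stock, today, n, key="volume"):
    """Process 1: keep today's records and take the top N by `key` (assumed field)."""
    todays = [r for r in raw_stock if r["date"] == today]
    top = {r["symbol"] for r in heapq.nlargest(n, todays, key=lambda r: r[key])}
    # Input order is preserved, so the result is still unsorted.
    return [r for r in todays if r["symbol"] in top]


def sort_stocks(stocks, key="volume"):
    """Process 2: sort the Top(N) records in descending order of `key`."""
    return sorted(stocks, key=lambda r: r[key], reverse=True)


def merge_with_news(sorted_stocks, raw_news):
    """Process 3: attach the headlines that share a symbol with each stock."""
    news_by_symbol = {}
    for item in raw_news:
        news_by_symbol.setdefault(item["symbol"], []).append(item["headline"])
    return [{**s, "news": news_by_symbol.get(s["symbol"], [])} for s in sorted_stocks]


# Hypothetical sample data standing in for the two source APIs.
TODAY = date(2024, 1, 2)
RAW_STOCK = [
    {"symbol": "AAA", "date": TODAY, "volume": 100},
    {"symbol": "DDD", "date": TODAY, "volume": 200},
    {"symbol": "CCC", "date": date(2024, 1, 1), "volume": 900},  # not today
    {"symbol": "BBB", "date": TODAY, "volume": 300},
]
RAW_NEWS = [
    {"symbol": "BBB", "headline": "BBB example headline"},
    {"symbol": "ZZZ", "headline": "Unrelated headline"},
]

unsorted_top = pull_top_n(RAW_STOCK, TODAY, n=2)
report = merge_with_news(sort_stocks(unsorted_top), RAW_NEWS)
```

Keeping the three processes as separate functions mirrors the DFD, so each intermediate result (unsorted Top(N), sorted Top(N), merged report) can be inspected on its own.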
ER Diagram
System Architecture
Data Factory Orchestration: Source -> Pipeline -> Sink
Source: Source Data, Stock API
Sink: Structured data
Output: Generate Reports
Triggers
Schedule Trigger: runs on a calendar/clock
Event Trigger: runs in response to events
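Data Factory triggers are defined as JSON documents, so the two trigger kinds can be sketched as Python dictionaries. The sketch assumes the `ScheduleTrigger` and `BlobEventsTrigger` types of the Data Factory trigger schema and should be checked against the current documentation before use. The trigger names, the pipeline name `StockReportPipeline`, the blob path, and the placeholder storage scope are all hypothetical.

```python
import json


def pipeline_ref(name):
    """Reference to the pipeline that the trigger starts."""
    return {"pipelineReference": {"type": "PipelineReference", "referenceName": name}}


def schedule_trigger(pipeline, frequency="Day", interval=1,
                     start="2024-01-01T06:00:00Z"):
    """Calendar/clock trigger: fires on a recurrence."""
    return {
        "name": "DailyStockTrigger",  # hypothetical name
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,
                    "interval": interval,
                    "startTime": start,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [pipeline_ref(pipeline)],
        },
    }


def event_trigger(pipeline, scope, path_prefix):
    """Event trigger: fires when a blob is created under `path_prefix`."""
    return {
        "name": "NewStockFileTrigger",  # hypothetical name
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": path_prefix,
                "ignoreEmptyBlobs": True,
                "scope": scope,  # resource ID of the storage account
                "events": ["Microsoft.Storage.BlobCreated"],
            },
            "pipelines": [pipeline_ref(pipeline)],
        },
    }


daily = schedule_trigger("StockReportPipeline")
on_file = event_trigger(
    "StockReportPipeline",
    scope="<storage-account-resource-id>",  # placeholder, not a real ID
    path_prefix="/stock-raw/blobs/",
)
print(json.dumps(daily, indent=2))
```

Both definitions point at the same pipeline, so the report can be produced either on a fixed daily schedule or whenever new source data arrives.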
Features
Benefits from Data Factory scheduling and
monitoring capabilities.
Data Flows
Types
Only available in some regions
https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview#available-regions
Activities only run once the upstream dependency has been satisfied
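The dependency rule can be illustrated with a small simulation. This is not Data Factory code: it is a sketch in plain Python in which the activity names and the `depends_on` mapping are hypothetical, loosely modeled on the `dependsOn` list of a pipeline activity. An activity runs only if every upstream activity succeeded; otherwise it is marked as skipped.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_pipeline(depends_on, actions):
    """Run activities in dependency order and return each activity's status."""
    status = {}
    # static_order() yields every activity after all of its upstream activities.
    for name in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(name, ())
        if all(status.get(u) == "Succeeded" for u in upstream):
            try:
                actions[name]()
                status[name] = "Succeeded"
            except Exception:
                status[name] = "Failed"
        else:
            # An upstream dependency was not satisfied, so do not run.
            status[name] = "Skipped"
    return status


# Hypothetical activities for the stock report pipeline.
DEPENDS_ON = {
    "PullStockData": [],
    "PullStockNews": [],
    "SortTopN": ["PullStockData"],
    "MergeWithNews": ["SortTopN", "PullStockNews"],
}

log = []
ACTIONS = {name: (lambda n=name: log.append(n)) for name in DEPENDS_ON}

status = run_pipeline(DEPENDS_ON, ACTIONS)
```

If `SortTopN` raised an exception here, `MergeWithNews` would be reported as skipped, while `PullStockNews` would still run because it has no upstream dependency.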
Custom-made Solution