Spark Optimizations & Deployment
Big Data
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
Wide and Narrow transformations
Narrow transformations
• Local computations applied to each partition block
  no communication between processes (or nodes)
  only local dependencies (between parent & child RDDs)
• Map()   • Filter()   • Union()
• In case of a sequence of Narrow transformations:
  possible pipelining inside one step
• In case of failure:
  recompute only the damaged partition blocks
  recompute/reload only their parent blocks (Lineage)
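A minimal sketch of narrow transformations in Scala (the input file name and the sample values are illustrative): each step works locally on its partition block, with no shuffle, so the steps can be pipelined.

    val lines   = sc.textFile("access.log")
    val errors  = lines.filter(_.contains("ERROR"))      // narrow: local filtering of each block
    val lengths = errors.map(_.length)                   // narrow: pipelined with the filter above
    val all     = lengths.union(sc.parallelize(Seq(0)))  // narrow: blocks are appended, no data movement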
Wide and Narrow transformations
Wide transformations
• Computations requiring data from all parent RDD blocks
  many communications between processes (and nodes) (shuffle & sort)
  non-local dependencies (between parent & child RDDs)
• groupByKey()   • reduceByKey()
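A small sketch showing where the wide transformation appears in the lineage (the input file name is hypothetical): the narrow map and filter are pipelined inside one stage, while reduceByKey introduces a shuffle boundary, which toDebugString makes visible.

    val counts = sc.textFile("data.txt")
      .map(line => (line.split(",")(0), 1))   // narrow
      .filter(_._1.nonEmpty)                  // narrow: pipelined with the map
      .reduceByKey(_ + _)                     // wide: shuffle & sort between stages

    println(counts.toDebugString)             // lineage printout shows the ShuffledRDD step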
Wide and Narrow transformations
Avoiding wide transformations with co-partitioning
• With identical partitioning of inputs:
  wide transformation → narrow transformation
  (join with inputs not co-partitioned vs. join with inputs co-partitioned)
• With co-partitioned inputs:
  • less expensive communications
  • possible pipelining
  • less expensive fault tolerance (using the same partition map)
→ Control RDD partitioning, force co-partitioning

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
   • RDD Persistence
   • RDD Co-partitioning
   • RDD controlled distribution
   • Traffic minimization
   • Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
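A minimal sketch of forcing co-partitioning before a join (the pair RDDs users and events are hypothetical, both keyed by the same user id): partitioning both sides with the same HashPartitioner lets the join run without an extra shuffle.

    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(100)
    val usersPart   = users.partitionBy(partitioner).persist()   // re-partitioned once, kept in memory
    val eventsPart  = events.partitionBy(partitioner)            // same partitioner, same key mapping
    val joined      = usersPart.join(eventsPart)                 // co-partitioned join: narrow dependencies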
Optimizations: persistence
Persistence of the RDD
RDDs are stored:
• in the memory space of the Spark Executors
• or on disk (of the node) when the memory space of the Executor is full
By default: an old RDD is removed when memory space is required
(Least Recently Used policy)
An old RDD has to be re-computed (using its lineage) when needed again.
Spark allows making an RDD « persistent » to avoid recomputing it.

Optimizations: persistence
Persistence of the RDD to improve Spark application performance
The Spark application developer has to add instructions to force RDD
storage, and to force RDD forgetting:
    myRDD.persist(StorageLevel)   // or myRDD.cache()
    …                             // Transformations and Actions
    myRDD.unpersist()
Available storage levels:
• MEMORY_ONLY          : in Spark Executor memory space
• MEMORY_ONLY_SER      : + serializing the RDD data
• MEMORY_AND_DISK      : on local disk when no memory space
• MEMORY_AND_DISK_SER  : + serializing the RDD data in memory
• DISK_ONLY            : always on disk (and serialized)
The RDD is saved in the Spark Executor memory/disk space,
limited to the Spark session.
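A usage sketch (the input file name is illustrative): persist an RDD that is reused by several actions, then release it.

    import org.apache.spark.storage.StorageLevel

    val cleaned = sc.textFile("data.txt")
      .filter(_.nonEmpty)
      .persist(StorageLevel.MEMORY_AND_DISK_SER)   // keep it, spilling serialized blocks to disk

    val nbLines = cleaned.count()                             // 1st action: computes and stores the RDD
    val nbWords = cleaned.map(_.split("\\s+").length).sum()   // 2nd action: reuses the stored blocks

    cleaned.unpersist()                            // free Executor memory/disk when no longer needed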
Optimizations: persistence
Persistence of the RDD to improve fault tolerance
To face short-term failures: the Spark application developer can force
RDD storage with replication in the local memory/disk of several
Spark Executors:
    myRDD.persist(StorageLevel.MEMORY_AND_DISK_SER_2)
    …                 // Transformations and Actions
    myRDD.unpersist()
To face serious failures: the Spark application developer can checkpoint
the RDD outside of the Spark data space, on HDFS or S3 or…:
    myRDD.sparkContext.setCheckpointDir(directory)
    myRDD.checkpoint()
    …                 // Transformations and Actions
Longer, but secure!

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
   • RDD Persistence
   • RDD Co-partitioning
   • RDD controlled distribution
   • Traffic minimization
   • Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimizations: RDD co-partitioning
5 main internal properties of a RDD:
• A list of partition blocks
    getPartitions()
• A function for computing each partition block
    compute(…)
• A list of dependencies on other RDDs: parent RDDs and transformations to apply
    getDependencies()
  → to compute and re-compute the RDD (from its parent RDDs) when a failure happens
Optionally (metadata):
• A Partitioner for key-value RDDs, specifying the RDD partitioning
    partitioner()
  → to control the RDD partitioning, to achieve co-partitioning…
• A list of nodes where each partition block can be accessed faster due to data locality
    getPreferredLocations(…)
  → to improve data locality with HDFS & YARN…

Optimizations: RDD co-partitioning
Specify a « partitioner »
    val rdd2 = rdd1
      .partitionBy(new HashPartitioner(100))
      .persist()
Creates a new RDD (rdd2):
• partitioned according to the hash partitioner strategy
• split into 100 partition blocks, spread over the Spark Executors
Redistributes the RDD (rdd1 → rdd2): a WIDE (expensive) transformation
• Do not keep the original partitioning (rdd1) in memory / on disk
• Keep the new partitioning (rdd2) in memory / on disk,
  to avoid repeating a WIDE transformation when rdd2 is re-used
Optimizations: RDD co-partitioning
Specify a « partitioner »
    val rdd2 = rdd1
      .partitionBy(new HashPartitioner(100))
      .persist()
Partitioners:
• Hash partitioner:
  Key0, Key0+100, Key0+200… on one Spark Executor
• Range partitioner:
  [Key-min ; Key-max] on one Spark Executor
• Custom partitioner (develop your own partitioner):
  Ex: Key = URL, hash partitioned,
  BUT hash only the domain name of the URL
  → all pages of the same domain on the same Spark Executor,
  because they are frequently linked

Optimizations: RDD co-partitioning
Avoid repetitive WIDE transformations on large data sets
• Repeated op. A.join(B): a Wide transformation at each call when the same
  partitioner is not used on the same set of keys
• Make ONE Wide op (one time) to avoid many Wide ops:
  re-partition A into A’ one time, then repeat A’.join(B)
• An explicit partitioning « propagates » to the transformation result
• Replace Wide ops by Narrow ops
• Do not re-partition a RDD to use it only once!
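A sketch of the custom partitioner idea above: hash only the domain name of a URL key so that all pages of the same domain land in the same partition block. The class name, the URL parsing and the pair RDD pagesRDD are illustrative assumptions, not the course's exact code.

    import org.apache.spark.Partitioner
    import java.net.URI

    class DomainPartitioner(override val numPartitions: Int) extends Partitioner {
      def getPartition(key: Any): Int = {
        val host = new URI(key.toString).getHost                // keep only the domain part of the URL
        val h    = if (host == null) 0 else host.hashCode
        ((h % numPartitions) + numPartitions) % numPartitions   // non-negative partition id
      }
      override def equals(other: Any): Boolean = other match {
        case p: DomainPartitioner => p.numPartitions == numPartitions   // lets Spark detect co-partitioning
        case _                    => false
      }
    }

    // Usage sketch:
    // val pagesByDomain = pagesRDD.partitionBy(new DomainPartitioner(100)).persist()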
Optimizations: RDD co-partitioning
Co-partitioning
• Use the same partitioner
• Avoid repeating Wide ops
Repeated op.: re-partition A into A’ (Wide, one time),
then each A’.join(B) is Narrow

Optimizations: RDD co-partitioning
PageRank with partitioner (see further)
    val links = ……   // previous code
    val links1 = links.partitionBy(new HashPartitioner(100)).persist()
    var ranks = links1.mapValues(v => 1.0)
• Pb: flatMap{… urlinks.map(…) …} can change the partitioning?!
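A sketch of the full PageRank loop with an explicit partitioner (input format, file name, iteration count and damping constants follow the classic Spark example, not necessarily the course's exact code). mapValues preserves the partitioner of links1, so the join stays co-partitioned; the flatMap result loses it, and reduceByKey re-hashes the contributions onto the key space.

    import org.apache.spark.HashPartitioner

    val links1 = sc.textFile("links.txt")                       // lines: "srcUrl destUrl"
      .map { line => val p = line.split("\\s+"); (p(0), p(1)) }
      .groupByKey()
      .partitionBy(new HashPartitioner(100))                    // explicit partitioner
      .persist()                                                // avoid repeating the Wide re-partition

    var ranks = links1.mapValues(_ => 1.0)                      // same partitioner as links1

    for (_ <- 1 to 10) {
      val contribs = links1.join(ranks)                         // co-partitioned join
        .flatMap { case (_, (dests, rank)) =>
          dests.map(dest => (dest, rank / dests.size))          // contribution of each page to its links
        }
      ranks = contribs.reduceByKey(_ + _)                       // shuffle the contributions back per key
        .mapValues(0.15 + 0.85 * _)
    }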
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
   • RDD Persistence
   • RDD Co-partitioning
   • RDD controlled distribution
   • Traffic minimization
   • Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds

Optimization: RDD distribution
Create and distribute a RDD
• By default: the level of parallelism is set by the nb of partition blocks
  of the input RDD
• When the input is an in-memory collection (list, array…), it needs
  to be parallelized:
    val theData = List(("a",1), ("b",2), ("c",3),……)
    sc.parallelize(theData).theTransformation(…)
  Or:
    val theData = List(1,2,3,……).par
    theData.theTransformation(…)
Spark adopts a distribution adapted to the cluster…
… but it can be tuned
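A small sketch (sample data is illustrative): the optional second argument of parallelize fixes the number of partition blocks of the created RDD.

    val theData    = List(("a", 1), ("b", 2), ("c", 3), ("a", 4))
    val rddDefault = sc.parallelize(theData)      // nb of blocks chosen from the cluster configuration
    val rddTuned   = sc.parallelize(theData, 8)   // force 8 partition blocks

    println(rddDefault.getNumPartitions)          // cluster-dependent
    println(rddTuned.getNumPartitions)            // 8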
Optimization: RDD distribution
Control of the RDD distribution
• Most transformations support an extra parameter to control
  the distribution (and the parallelism)
• Example:
  Default parallelism:
    val theData = List(("a",1), ("b",2), ("c",3),……)
    sc.parallelize(theData).reduceByKey((x,y) => x+y)
  Tuned parallelism:
    val theData = List(("a",1), ("b",2), ("c",3),……)
    sc.parallelize(theData).reduceByKey((x,y) => x+y, 8)
  → 8 partition blocks imposed for the result of the reduceByKey

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
   • RDD Persistence
   • RDD Co-partitioning
   • RDD controlled distribution
   • Traffic minimization
   • Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
Optimization: traffic minimization
RDD reduction with groupByKey:
• moves almost all input data in the shuffle step (merging value lists:
  1 list + 1 list → 1 longer list, no data reduction)
• huge traffic in the shuffle step!!
• groupByKey will be time consuming:
  • no computation time…
  • … but huge traffic on the network of the cluster/cloud

RDD reduction with reduceByKey:
• values are combined locally inside each block before the shuffle
  ((x,y) => x+y): 1 int + 1 int → 1 int
• limited traffic in the shuffle step

TD-1: Optimize computations and communications in a Spark program
Optimization: traffic minimization
RDD reduction with different input and reduced datatypes:
Scala: rdd.aggregateByKey(init_acc)(
         …,   // mergeValueAccumulator fct
         …    // mergeAccumulators fct
       )
Scala: rdd.combineByKey(
         …,   // createAccumulator fct
         …,   // mergeValueAccumulator fct
         …    // mergeAccumulators fct
       )

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
   • RDD Persistence
   • RDD Co-partitioning
   • RDD controlled distribution
   • Traffic minimization
   • Maintaining parallelism
3. Page Rank example
4. Deployment on clusters & clouds
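A sketch filling in the aggregateByKey skeleton (sample data is illustrative): the input values are Int but the accumulator is a (sum, count) pair, i.e. a different reduced datatype, built locally before the shuffle.

    val pairs = sc.parallelize(List(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

    val sumCount = pairs.aggregateByKey((0, 0))(          // init_acc = (sum = 0, count = 0)
      (acc, v) => (acc._1 + v, acc._2 + 1),               // mergeValueAccumulator fct (local, per block)
      (a, b)   => (a._1 + b._1, a._2 + b._2)              // mergeAccumulators fct (after the shuffle)
    )
    // Only one small (sum, count) accumulator per key and per block crosses the network.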
Optimization: maintaining parallelism
Computing an average value per key in parallel
theMarks: {("julie", 12), ("marc", 10), ("albert", 19), ("julie", 15), ("albert", 15), …}
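A sketch of one way to do it with combineByKey, which keeps the work parallel and avoids groupByKey (the exact solution used in the course may differ): accumulate (sum, count) per key, then divide.

    val theMarks = sc.parallelize(List(
      ("julie", 12), ("marc", 10), ("albert", 19), ("julie", 15), ("albert", 15)))

    val avgPerKey = theMarks.combineByKey(
        (v: Int) => (v, 1),                                           // createAccumulator
        (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),        // mergeValueAccumulator
        (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)  // mergeAccumulators
      )
      .mapValues { case (sum, count) => sum.toDouble / count }        // finish: average per key

    // avgPerKey.collect()  e.g.  Array(("julie", 13.5), ("marc", 10.0), ("albert", 17.0))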
Page Rank example
Contribution of page v to the rank of page u: rank(v) / (nb of outgoing links of v)
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
   • Task DAG execution

Task DAG execution
• A RDD is a dataset distributed among the Spark compute nodes
• Transformations are lazy operations: saved and executed later
• Actions trigger the execution of the saved sequence of transformations
A job is a sequence of RDD transformations, ended by an action
(ex: RDD → map → RDD → mapValues → … → reduceByKey → …)
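A small sketch of this laziness (the input file name is illustrative): nothing is executed until the action is reached, then the whole DAG runs as one job.

    val counts = sc.textFile("data.txt")          // transformation: only recorded
      .map(line => (line.split(",")(0), 1))       // transformation: only recorded
      .reduceByKey(_ + _)                         // transformation: still nothing executed

    val nbKeys = counts.count()                   // action: triggers ONE job executing the whole DAG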
Task DAG execution
Execution time as a function of the number of Spark Executors
Ex. of Spark application run:
• from 1 up to 15 executors
• with 1 executor per node
Plot: « Spark pgm run on 1-15 nodes », Exec Time (s) vs Nb of nodes
Good overall decrease, but plateaus appear!
Probable load balancing problem…
Ex: a graph of 4 parallel tasks:
on 1 node: T    on 2 nodes: T/2    on 3 nodes: T/2  → a plateau appears

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
   • Task DAG execution
   • Spark execution on clusters
     • Using the Spark cluster manager (standalone mode)
     • Using YARN as cluster manager
     • Using Mesos as cluster manager
   • Ex of Spark execution on cloud
Spark cluster configuration:
• Add the list of cluster worker nodes in the Spark Master config.
• Specify the maximum amount of memory per Spark Executor
    spark-submit --executor-memory XX …
• Specify the total amount of CPU cores used to process one
  Spark application (through all its Spark Executors)
    spark-submit --total-executor-cores YY …

Spark cluster configuration:
• Default config:
  − (only) 1GB / Spark Executor
  − unlimited nb of CPU cores per application execution
  − the Spark Master creates one mono-core Executor on all
    Worker nodes to process each job…
• You can limit the total nb of cores per job
• You can concentrate the cores into few multi-core Executors
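An example of a standalone-mode submission combining these flags (the master host, class and jar names, and the resource sizes, are illustrative):

    spark-submit --master spark://master-node:7077 \
                 --executor-memory 4G \
                 --total-executor-cores 16 \
                 --class myPackage.MyApp myApp.jar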
Cluster deployment mode:
Diagram: the Spark Master / Cluster Manager, the Spark app. Driver (DAG builder,
DAG scheduler-optimizer, Task scheduler) running inside the cluster, and the
Spark Executors on the cluster worker nodes; the laptop connection can be
turned off (production mode).

HDFS Name Node + cluster worker nodes & Hadoop Data Nodes:
The cluster Worker nodes should be the Data nodes, storing the initial
RDD values or the newly generated (and saved) RDDs
→ will improve the global data-computation locality
When using HDFS: the Hadoop Data nodes should be
re-used as Worker nodes for the Spark Executors
Using the Spark Master as cluster manager (standalone mode)
    spark-submit --master spark://node:port … myApp
Diagram: Spark Master (Cluster Manager), HDFS Name Node, and cluster worker
nodes & Hadoop Data Nodes.
Strengths and weaknesses of standalone mode:
• Nothing more to install (included in Spark)
• Easy to configure
• Can run different jobs concurrently
• Cannot share the cluster with non-Spark applications
• Cannot launch Executors on the data nodes hosting input data
• Limited scheduling mechanism (unique queue)

Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
   • Task DAG execution
   • Spark execution on clusters
     • Using the Spark cluster manager (standalone mode)
     • Using YARN as cluster manager
     • Using Mesos as cluster manager
   • Ex of Spark execution on cloud
Using YARN as cluster manager
Diagram: HDFS Name Node, cluster worker nodes & Hadoop Data Nodes.
Spark cluster configuration:
• Add an env. variable defining the path to the Hadoop conf directory
• Specify the maximum amount of memory per Spark Executor
• Specify the amount of CPU cores used per Spark Executor
    spark-submit --executor-cores YY …
• Specify the nb of Spark Executors per job: --num-executors

Spark cluster configuration:
• By default:
  − (only) 1GB / Spark Executor
  − (only) 1 CPU core per Spark Executor
  − (only) 2 Spark Executors per job
• Usually better with few large Executors (RAM & nb of cores)…
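An example of a YARN submission combining these flags (class and jar names, and the resource sizes, are illustrative):

    spark-submit --master yarn \
                 --num-executors 4 \
                 --executor-cores 4 \
                 --executor-memory 8G \
                 --class myPackage.MyApp myApp.jar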
Spark optimizations & deployment
1. Wide and Narrow transformations
2. Optimizations
3. Page Rank example
4. Deployment on clusters & clouds
   • Task DAG execution
   • Spark execution on clusters
   • Ex of Spark execution on cloud
Ex. of Spark execution on cloud
Diagram: cluster « MyCluster-1 » with a Spark Master, an HDFS Name Node, and a
Spark app. Driver (DAG scheduler-optimizer, Task scheduler).
Diagram: clusters « MyCluster-1 » and « MyCluster-2 », each with a standalone
Spark Master, an HDFS Name Node, Spark Executors, and a Spark app. Driver
(DAG builder, DAG scheduler-optimizer, Task scheduler).
Ex. of Spark execution on cloud: spark-ec2 cluster lifecycle
(diagram: standalone Spark Master and HDFS Name Node on « MyCluster-1 »)
    spark-ec2 stop MyCluster-1           → stops the billing
    spark-ec2 … start MyCluster-1        → restarts the billing
    spark-ec2 destroy MyCluster-1

• Allocate the right number of nodes
• Stop the cluster when you do not use it, and re-start it later
• Choose to allocate reliable or preemptible machines:
  • reliable machines during all the session (standard)
  • preemptible machines (5x less expensive!)
    require supporting the loss of some tasks, or checkpointing…
Spark optimizations & deployment