SparkContext initialization flow
- SparkConf is Spark's configuration object. It describes Spark's configuration, loading settings mainly as key-value pairs.
- As soon as the object is instantiated via `new SparkConf()`, it loads every `spark.*` JVM system property by default.

```scala
class SparkConf(loadDefaults: Boolean) {
  def this() = this(true)
}
```
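A minimal sketch of that default-loading behavior, using only the public SparkConf API; the property value here is just for illustration:

```scala
import org.apache.spark.SparkConf

// loadDefaults = true (what the no-arg constructor passes) copies every JVM system
// property whose key starts with "spark." into the new SparkConf.
System.setProperty("spark.app.name", "from-system-props")   // illustrative value
val conf = new SparkConf()                                   // same as new SparkConf(true)
assert(conf.get("spark.app.name") == "from-system-props")
conf.setMaster("local[2]")                                   // explicit set() still overrides/extends the defaults
```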
Notes
- Instantiating a SparkContext requires a SparkConf object as a constructor parameter.
- Inside SparkContext, that SparkConf is cloned, producing an object whose properties are all identical to, but which is not the same instance as, the SparkConf that was passed in.
- All of SparkContext's subsequent operations use this cloned SparkConf.
- Caveat: once the SparkConf has been passed to the SparkContext constructor, later modifications to that SparkConf have no effect.

```scala
override def clone: SparkConf = {
  val cloned = new SparkConf(false)
  settings.entrySet().asScala.foreach { e =>
    cloned.set(e.getKey(), e.getValue(), true)
  }
  cloned
}
```
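A small sketch of the consequence of this cloning, run against a local master; `spark.demo.flag` is a made-up key used only to show that later `set()` calls are invisible to the running SparkContext:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("clone-demo")
val sc = new SparkContext(conf)          // sc clones conf internally

conf.set("spark.demo.flag", "true")      // modifies only the original SparkConf
// sc works against its own clone, so the key added afterwards is not visible there:
println(sc.getConf.contains("spark.demo.flag"))   // false
```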
SparkContext
SparkContext initialization steps:
1. A SparkConf object is created; it reads the default configuration and can have additional settings applied.
2. The SparkConf is loaded into the SparkContext, which initializes its various configuration properties.
3. The createTaskScheduler method instantiates the TaskScheduler and the DAGScheduler.
```scala
// Create the SchedulerBackend and TaskScheduler based on the master URL passed in
// SparkContext.scala, line 2692
private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._

  // When running locally, don't try to re-execute tasks on failure.
  val MAX_LOCAL_TASK_FAILURES = 1

  master match {
    // setMaster("local"): local mode with a single thread
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)

    // setMaster("local[N]") or setMaster("local[*]"): local mode with N threads
    case LOCAL_N_REGEX(threads) =>
      def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
      // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
      val threadCount = if (threads == "*") localCpuCount else threads.toInt
      if (threadCount <= 0) {
        throw new SparkException(s"Asked to run locally with $threadCount threads")
      }
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
      scheduler.initialize(backend)
      (backend, scheduler)

    // Standalone mode: spark://host:port
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)

    // local-cluster[numSlaves, coresPerSlave, memoryPerSlave] mode; other cluster managers
    // such as Mesos and YARN are handled by cases omitted from this excerpt
    case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
      // Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
      val memoryPerSlaveInt = memoryPerSlave.toInt
      if (sc.executorMemory > memoryPerSlaveInt) {
        throw new SparkException(
          "Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
            memoryPerSlaveInt, sc.executorMemory))
      }
      val scheduler = new TaskSchedulerImpl(sc)
      val localCluster = new LocalSparkCluster(
        numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
      val masterUrls = localCluster.start()
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      backend.shutdownCallback = (backend: StandaloneSchedulerBackend) => {
        localCluster.stop()
      }
      (backend, scheduler)
  }
}
```
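For reference, the kinds of master strings the cases above match; the hosts, ports, and counts are placeholders:

```scala
import org.apache.spark.SparkConf

new SparkConf().setMaster("local")                     // "local": one thread, MAX_LOCAL_TASK_FAILURES = 1
new SparkConf().setMaster("local[4]")                  // LOCAL_N_REGEX: exactly 4 threads
new SparkConf().setMaster("local[*]")                  // LOCAL_N_REGEX: one thread per available core
new SparkConf().setMaster("spark://m1:7077,m2:7077")   // SPARK_REGEX: standalone, possibly multiple masters
new SparkConf().setMaster("local-cluster[2,1,1024]")   // LOCAL_CLUSTER_REGEX: 2 workers, 1 core, 1024 MB each
```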
TaskScheduler
```scala
/**
 * Low-level task scheduler interface, currently implemented exclusively by
 * [[org.apache.spark.scheduler.TaskSchedulerImpl]].
 * This interface allows plugging in different task schedulers. Each TaskScheduler schedules tasks
 * for a single SparkContext. These schedulers get sets of tasks submitted to them from the
 * DAGScheduler for each stage, and are responsible for sending the tasks to the cluster, running
 * them, retrying if there are failures, and mitigating stragglers. They return events to the
 * DAGScheduler.
 */
```

TaskScheduler is a low-level task scheduling interface with, at present, a single implementation: TaskSchedulerImpl. It can be attached to different backends (SchedulerBackend). Each TaskScheduler schedules tasks for exactly one SparkContext; it serves the currently running Spark application, and when a new Spark application is submitted the current TaskScheduler is destroyed and a new one is created to handle the new application's tasks.

The TaskScheduler receives a TaskSet for each stage from the DAGScheduler. It is responsible for submitting those tasks to the cluster, running them, resubmitting them if they fail, mitigating stragglers, and reporting the execution results back to the DAGScheduler.

(Stragglers: among the tasks sent to the cluster, some may lag far behind the rest; such tasks need to be dealt with so that one or two slow tasks do not hold up the whole job.)
TaskSchedulerImpl
Clients must first call initialize() and start(); only then can TaskSets be submitted via the runTasks method.
```scala
// line 81
// Interval at which to check for speculatable tasks; default 100ms
val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")

// line 92
// How long to wait before warning that the initial TaskSet may be starved; default 15s
val STARVATION_TIMEOUT_MS = conf.getTimeAsMs("spark.starvation.timeout", "15s")

// line 95
// Number of CPU cores allocated to each task
val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)

// line 136
// Scheduling mode; default FIFO
private val schedulingModeConf = conf.get(SCHEDULER_MODE_PROPERTY, SchedulingMode.FIFO.toString)
```
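The same keys can be set from user code; a sketch of a SparkConf touching the settings quoted above (the values are illustrative, not recommendations):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.task.cpus", "1")                  // CPUS_PER_TASK: cores reserved per task
  .set("spark.speculation", "true")             // enables the speculation thread started in start()
  .set("spark.speculation.interval", "100ms")   // SPECULATION_INTERVAL_MS
  .set("spark.scheduler.mode", "FIFO")          // schedulingModeConf: FIFO or FAIR
```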
CoarseGrainedSchedulerBackend (coarse-grained scheduler): executors live for the whole lifetime of the job. When a task finishes, its executor is not released immediately; when a new task arrives, no new executor is created and the existing one is reused. This implements executor reuse.

Fine-grained scheduler: when a task finishes, its executor is released, and a new executor is created for each new task.

Standalone and YARN modes only support the coarse-grained scheduler; Mesos also supports the fine-grained scheduler.

Task scheduling modes:
- FIFO: first in, first out. Executors are allocated on one worker first; only when that worker's resources run out are executors placed on other workers.
- FAIR: fair scheduling. Executors are spread evenly across the worker nodes for load balancing.

```scala
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
      case _ =>
        throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
          s"$schedulingMode")
    }
  }
  schedulableBuilder.buildPools()
}

override def start() {
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}
```
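A sketch of switching an application to FAIR scheduling and routing jobs to a named pool; "batch" is an illustrative pool name, and the pools themselves are defined in the fair-scheduler allocation file:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[*]").setAppName("fair-demo")
  .set("spark.scheduler.mode", "FAIR")           // selects FairSchedulableBuilder in initialize()
val sc = new SparkContext(conf)

// Jobs submitted from this thread are scheduled in the "batch" pool (illustrative name).
sc.setLocalProperty("spark.scheduler.pool", "batch")
```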
StandaloneSchedulerBackend
```scala
override def start() {
  // Calls the implementation in the parent class CoarseGrainedSchedulerBackend:
  //   driverEndpoint = createDriverEndpointRef(properties)
  // which instantiates the driver's RPC endpoint
  super.start()

  // ...

  // Build an application description carrying the resources the application needs
  val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
    webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
  // Create the client that carries this application description and
  // communicates with the cluster manager (the standalone Master)
  client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  // Wait until registration completes; registration itself happens in StandaloneAppClient
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
```
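Most of the fields packed into ApplicationDescription come straight from the application's configuration; a sketch of where some of them originate (the master host and the values are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // standalone master URL, placeholder host
  .setAppName("resource-demo")             // -> sc.appName
  .set("spark.executor.memory", "2g")      // -> sc.executorMemory
  .set("spark.cores.max", "8")             // -> maxCores
  .set("spark.executor.cores", "2")        // -> coresPerExecutor
```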
DriverEndpoint
DriverEndpoint is an inner class of CoarseGrainedSchedulerBackend and serves as the driver-side communication endpoint.
```scala
override def onStart() {
  // Periodically revive offers to allow delay scheduling to work
  val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")

  reviveThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      // Send a ReviveOffers message to itself
      Option(self).foreach(_.send(ReviveOffers))
    }
  }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
}
```
```scala
// Build resource offers describing the free resources on all active executors
private def makeOffers() {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
    // Filter out executors under killing
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map {
      case (id, executorData) =>
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}
```
StandaloneAppClient
```scala
override def onStart(): Unit = {
  try {
    // Send a registration message to the Master.
    // The argument 1 means this is the first attempt; if registration fails,
    // the counter is incremented and registration is retried.
    // Once the number of failures reaches 3, registration is given up.
    registerWithMaster(1)
  } catch {
    case e: Exception =>
      logWarning("Failed to connect to master", e)
      markDisconnected()
      stop()
  }
}
```
```scala
private def registerWithMaster(nthRetry: Int) {
  registerMasterFutures.set(tryRegisterAllMasters())
  registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
    override def run(): Unit = {
      if (registered.get) {
        registerMasterFutures.get.foreach(_.cancel(true))
        registerMasterThreadPool.shutdownNow()
      } else if (nthRetry >= REGISTRATION_RETRIES) {
        markDead("All masters are unresponsive! Giving up.")
      } else {
        registerMasterFutures.get.foreach(_.cancel(true))
        registerWithMaster(nthRetry + 1)
      }
    }
  }, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
}
```
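Stripped of the timer and futures, the shape of this logic is a bounded retry; a self-contained sketch, with names that are not Spark's:

```scala
// Illustrative bounded-retry loop: try once, and on failure retry up to maxRetries attempts.
@annotation.tailrec
def registerWithRetry(attempt: Int, maxRetries: Int)(tryOnce: () => Boolean): Unit = {
  if (tryOnce()) ()                                  // registered: stop retrying
  else if (attempt >= maxRetries) sys.error("All masters are unresponsive! Giving up.")
  else registerWithRetry(attempt + 1, maxRetries)(tryOnce)
}

// Usage sketch (tryRegisterOnce is a hypothetical function returning true on success):
// registerWithRetry(1, 3)(() => tryRegisterOnce())
```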
Master
```scala
// line 258
// In Master's receive method, messages sent from the driver side are pattern matched
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    // Create the application object, wrapping the driver-side resource description
    val app = createApplication(description, driver)
    // Register the application inside the Master
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    // Persist the application's metadata so it can be used later
    persistenceEngine.addApplication(app)
    // Tell the driver that registration is complete
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
```
```scala
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    //    RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    //    changing.
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
}
```
Stage and task division
SparkContext.runJob
```scala
// An RDD action triggers runJob, which creates a job
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    resultHandler: (Int, U) => Unit): Unit = {
  if (stopped.get()) {
    throw new IllegalStateException("SparkContext has been shutdown")
  }
  val callSite = getCallSite
  val cleanedFunc = clean(func)
  logInfo("Starting job: " + callSite.shortForm)
  if (conf.getBoolean("spark.logLineage", false)) {
    logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
  }
  // From here the job leaves SparkContext and is forwarded to the DAGScheduler
  dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
  progressBar.foreach(_.finishAll())
  rdd.doCheckpoint()
}
```
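A sketch of what reaches this method from user code: any action ends up in one of the runJob overloads. This assumes a live SparkContext `sc` (as provided in spark-shell):

```scala
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// collect() is an action: it calls sc.runJob over all 4 partitions of the RDD
// and assembles the per-partition results into a single Array on the driver.
val doubled: Array[Int] = rdd.map(_ * 2).collect()
```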
DAGScheduler.runJob
```scala
def runJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): Unit = {
  val start = System.nanoTime
  // The job generated by the action is submitted via submitJob;
  // runJob then blocks here until the job completes
  val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
  ThreadUtils.awaitReady(waiter.completionFuture, Duration.Inf)
  waiter.completionFuture.value.get match {
    case scala.util.Success(_) =>
      logInfo("Job %d finished: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
    case scala.util.Failure(exception) =>
      logInfo("Job %d failed: %s, took %f s".format
        (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
      val callerStackTrace = Thread.currentThread().getStackTrace.tail
      exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
      throw exception
  }
}
```
```scala
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): JobWaiter[U] = {
  // ...
  // eventProcessLoop: a JobSubmitted event is posted to the event-processing loop;
  // a background thread will pick it up and handle it shortly
  eventProcessLoop.post(JobSubmitted(
    jobId, rdd, func2, partitions.toArray, callSite, waiter,
    SerializationUtils.clone(properties)))
  waiter
}
```
EventLoop
run
```scala
override def run(): Unit = {
  try {
    while (!stopped.get) {
      // Take one event from the event queue
      val event = eventQueue.take()
      try {
        // Handle the event. onReceive is abstract in this class;
        // the implementation used here is DAGSchedulerEventProcessLoop
        onReceive(event)
      } catch {
        case NonFatal(e) =>
          try {
            onError(e)
          } catch {
            case NonFatal(e) => logError("Unexpected error in " + name, e)
          }
      }
    }
  } catch {
    case ie: InterruptedException => // exit even if eventQueue is not empty
    case NonFatal(e) => logError("Unexpected error in " + name, e)
  }
}
```
DAGSchedulerEventProcessLoop
```scala
override def onReceive(event: DAGSchedulerEvent): Unit = {
  val timerContext = timer.time()
  try {
    doOnReceive(event)
  } finally {
    timerContext.stop()
  }
}
```
```scala
private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  // Handle the JobSubmitted event
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
}
```
handleJobSubmitted
```scala
// This method is the heart of the DAG scheduler:
// both stage division and task division start here
private[scheduler] def handleJobSubmitted(jobId: Int,
    finalRDD: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    callSite: CallSite,
    listener: JobListener,
    properties: Properties) {
  var finalStage: ResultStage = null
  try {
    // Create the ResultStage: the final stage of a job is always a ResultStage
    finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
  } catch {
    case e: Exception =>
      logWarning("Creating new stage failed due to exception - job: " + jobId, e)
      listener.jobFailed(e)
      return
  }
  // ...
  // Submit the stages that were just created
  submitStage(finalStage)
}
```
```scala
private def createResultStage(
    rdd: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    jobId: Int,
    callSite: CallSite): ResultStage = {
  // Find the parent stages of the ResultStage
  val parents = getOrCreateParentStages(rdd, jobId)
  val id = nextStageId.getAndIncrement()
  val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite)
  stageIdToStage(id) = stage
  updateJobIdStageIdMaps(jobId, stage)
  stage
}

private def getOrCreateParentStages(rdd: RDD[_], firstJobId: Int): List[Stage] = {
  getShuffleDependencies(rdd).map { shuffleDep =>
    getOrCreateShuffleMapStage(shuffleDep, firstJobId)
  }.toList
}

// Stages are divided by wide/narrow dependencies: walk backwards from the final RDD,
// recursing until no further parent stage can be found
private def getOrCreateShuffleMapStage(
    shuffleDep: ShuffleDependency[_, _, _],
    firstJobId: Int): ShuffleMapStage = {
  shuffleIdToMapStage.get(shuffleDep.shuffleId) match {
    case Some(stage) =>
      stage
    case None =>
      // Create stages for all missing ancestor shuffle dependencies.
      getMissingAncestorShuffleDependencies(shuffleDep.rdd).foreach { dep =>
        // Even though getMissingAncestorShuffleDependencies only returns shuffle dependencies
        // that were not already in shuffleIdToMapStage, it's possible that by the time we
        // get to a particular dependency in the foreach loop, it's been added to
        // shuffleIdToMapStage by the stage creation process for an earlier dependency. See
        // SPARK-13902 for more information.
        if (!shuffleIdToMapStage.contains(dep.shuffleId)) {
          createShuffleMapStage(dep, firstJobId)
        }
      }
      // Finally, create a stage for the given shuffle dependency.
      createShuffleMapStage(shuffleDep, firstJobId)
  }
}
```
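A sketch of how this division plays out for a simple job with one shuffle, assuming a live SparkContext `sc`; the input path is a placeholder:

```scala
// reduceByKey introduces a ShuffleDependency, so getOrCreateShuffleMapStage builds one
// ShuffleMapStage covering textFile -> flatMap -> map, and createResultStage builds the
// ResultStage that ends at collect(). One task per partition is created for each stage.
val counts = sc.textFile("hdfs://namenode:8020/input/words.txt")   // placeholder path
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)     // wide (shuffle) dependency: stage boundary
  .collect()              // action: the job is submitted to the DAGScheduler
```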
```scala
private def submitStage(stage: Stage) {
  val jobId = activeJobForStage(stage)
  if (jobId.isDefined) {
    logDebug("submitStage(" + stage + ")")
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      // Find the parent stages whose results are not yet available
      val missing = getMissingParentStages(stage).sortBy(_.id)
      logDebug("missing: " + missing)
      if (missing.isEmpty) {
        logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
        // No missing parents: submit this stage's tasks
        submitMissingTasks(stage, jobId.get)
      } else {
        // Otherwise submit the parent stages first and queue this stage
        for (parent <- missing) {
          submitStage(parent)
        }
        waitingStages += stage
      }
    }
  } else {
    abortStage(stage, "No active job for stage " + stage.id, None)
  }
}
```
```scala
// Builds the tasks for a stage and submits them
private def submitMissingTasks(stage: Stage, jobId: Int) {
  // ...
  val tasks: Seq[Task[_]] = try {
    val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
    stage match {
      case stage: ShuffleMapStage =>
        stage.pendingPartitions.clear()
        partitionsToCompute.map { id =>
          val locs = taskIdToLocations(id)
          val part = stage.rdd.partitions(id)
          stage.pendingPartitions += id
          new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,
            taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
            Option(sc.applicationId), sc.applicationAttemptId)
        }

      case stage: ResultStage =>
        partitionsToCompute.map { id =>
          val p: Int = stage.partitions(id)
          val part = stage.rdd.partitions(p)
          val locs = taskIdToLocations(id)
          new ResultTask(stage.id, stage.latestInfo.attemptId,
            taskBinary, part, locs, id, properties, serializedTaskMetrics,
            Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)
        }
    }
  } catch {
    case NonFatal(e) =>
      abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
      runningStages -= stage
      return
  }

  if (tasks.size > 0) {
    logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
      s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
    // Wrap the tasks in a TaskSet and hand it to the TaskScheduler
    taskScheduler.submitTasks(new TaskSet(
      tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
    stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
  } else {
    // Because we posted SparkListenerStageSubmitted earlier, we should mark
    // the stage as completed here in case there are no tasks to run
    markStageAsFinished(stage, None)

    val debugString = stage match {
      case stage: ShuffleMapStage =>
        s"Stage ${stage} is actually done; " +
          s"(available: ${stage.isAvailable}," +
          s"available outputs: ${stage.numAvailableOutputs}," +
          s"partitions: ${stage.numPartitions})"
      case stage: ResultStage =>
        s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})"
    }
    logDebug(debugString)

    submitWaitingChildStages(stage)
  }
}
```
```scala
// TaskSchedulerImpl (simplified excerpt: TaskSetManager bookkeeping omitted)
override def submitTasks(taskSet: TaskSet) {
  backend.reviveOffers()
}

// CoarseGrainedSchedulerBackend
override def reviveOffers() {
  driverEndpoint.send(ReviveOffers)
}

// DriverEndpoint.receive now picks up the message
override def receive: PartialFunction[Any, Unit] = {
  case ReviveOffers =>
    makeOffers()
}

private def makeOffers() {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
    // Filter out executors under killing
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map {
      case (id, executorData) =>
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}

// Launch tasks returned by a set of resource offers
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  for (task <- tasks.flatten) {
    val serializedTask = TaskDescription.encode(task)
    if (serializedTask.limit >= maxRpcMessageSize) {
      scheduler.taskIdToTaskSetManager.get(task.taskId).foreach { taskSetMgr =>
        try {
          var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
            "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
            "spark.rpc.message.maxSize or using broadcast variables for large values."
          msg = msg.format(task.taskId, task.index, serializedTask.limit, maxRpcMessageSize)
          taskSetMgr.abort(msg)
        } catch {
          case e: Exception => logError("Exception in error callback", e)
        }
      }
    } else {
      val executorData = executorDataMap(task.executorId)
      executorData.freeCores -= scheduler.CPUS_PER_TASK

      logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
        s"${executorData.executorHost}.")

      // On the driver side, send a LaunchTask message containing the serialized task
      // to the target executor
      executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
    }
  }
}
```
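The oversized-task branch in launchTasks is what users typically hit when a task closure drags a large value along; the error message itself suggests broadcast variables. A hedged sketch of that fix, assuming a live SparkContext `sc` and a hypothetical loadBigTable() helper:

```scala
val bigTable: Map[String, Int] = loadBigTable()   // hypothetical helper returning a large Map
val keys = sc.parallelize(Seq("a", "b", "c"))

// Captured directly, bigTable would be serialized along with every task and can push the
// message past spark.rpc.message.maxSize. Broadcasting ships it to each executor once and
// keeps the per-task payload small.
val bigTableBc = sc.broadcast(bigTable)
val resolved: Array[Int] = keys.map(k => bigTableBc.value.getOrElse(k, 0)).collect()
```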
CoarseGrainedExecutorBackend
The driver has sent the task-launch message to the executor side; CoarseGrainedExecutorBackend receives it as follows.
```scala
override def receive: PartialFunction[Any, Unit] = {
  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      // Decode the task information sent from the driver,
      // including the resources the task depends on
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskDesc)
    }
}
```
```scala
// Wrap the task in a TaskRunner and run it on the executor's thread pool
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
```
```scala
// Inside TaskRunner.run, around line 334:
// `value` is the result of executing the task
val value = try {
  val res = task.run(
    taskAttemptId = taskId,
    attemptNumber = taskDescription.attemptNumber,
    metricsSystem = env.metricsSystem)
  threwException = false
  res
} // ... (catch/finally clauses omitted)

// line 407
// val serializedResult: ByteBuffer = ...
// The computed result is serialized and sent back to the driver:
// line 429
// execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
```
Spark job submission
Following the submission flow from rdd.action(): an action ultimately calls sc.runJob(), which runs the computation on all partitions of the RDD and returns the results as an Array. Through a series of runJob overloads, the job is eventually forwarded to dagScheduler.runJob for execution.

submitJob posts a JobSubmitted event to the job-processing queue (EventLoop). The loop runs a background thread that keeps pulling events off the queue and handling them; the handling is done in DAGSchedulerEventProcessLoop.doOnReceive.

dagScheduler.handleJobSubmitted performs the stage division, with createResultStage doing the core work:
1. Spark has two kinds of stages: ResultStage and ShuffleMapStage.
2. The ResultStage is created directly with new.
3. The upstream ShuffleMapStages (the parents) are created via getOrCreateParentStages.
4. submitStage(finalStage) submits the stages, recursing along the dependencies so stages are submitted in order:

```scala
if (missing.isEmpty) {
  submitMissingTasks(stage, jobId.get)   // the first stage with no missing parents
} else {
  for (parent <- missing) {
    submitStage(parent)                  // submit the stages one by one
  }
}
```

Tasks are built and submitted in submitMissingTasks: a ResultStage builds ResultTasks and a ShuffleMapStage builds ShuffleMapTasks. Note that the result is a Seq[Task[_]], i.e. one task is created per partition.

```scala
taskScheduler.submitTasks(new TaskSet(
  tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
```

1. The job is submitted through the TaskScheduler; what is submitted is the TaskSet of tasks that the DAGScheduler built for the stage.
2. The tasks are then handed to the backend associated with the TaskScheduler: backend.reviveOffers().

Loop: the real submission is carried out by CoarseGrainedSchedulerBackend through its DriverEndpoint, which sends itself a ReviveOffers message, which in turn triggers makeOffers(). The core of makeOffers is:

```scala
if (!taskDescs.isEmpty) {
  launchTasks(taskDescs)
}
```

The DriverEndpoint then sends a LaunchTask message to the executor, which is received and handled by CoarseGrainedExecutorBackend via executor.launchTask(this, taskDesc). The task is wrapped in a TaskRunner and submitted to the executor's thread pool, where:
1. the serialized task received from the driver is first deserialized;
2. the task's run method is invoked to do the work;
3. the return value of the task execution is captured in `value`.

The actual work happens in runTask(context). Different task types execute differently; a ShuffleMapTask and a ResultTask certainly do not run the same way, which leads to the two paths below.
| shuffleMapTask | Data is read via ExternalSorter: by default 48 MB of data is pulled from upstream into the buffer at a time for the computation |
| --- | --- |
| A ShuffleWriter is created through the ShuffleManager to write the task's output to disk; the final result status is wrapped in a MapStatus | func(context, rdd.iterator(partition, context)) |
| execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult): after the task finishes, its status is updated, carrying the serializedResult produced in the previous step | computeOrReadCheckpoint |
| The result is further wrapped in a StatusUpdate case class and sent to the driver (DriverEndpoint); makeOffers(executorId) then drives the next round of work | ShuffledRDD.compute(split, context) |
| Loop --> | shuffleManager.getReader reads the upstream data (SortShuffleManager) |
| | The read method of BlockStoreShuffleReader reads the upstream data |
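The 48 MB per-fetch buffer mentioned in the table maps to a regular configuration key; a sketch with the default value:

```scala
import org.apache.spark.SparkConf

// Maximum size of map output fetched simultaneously by each reduce task; 48m is the default.
val conf = new SparkConf().set("spark.reducer.maxSizeInFlight", "48m")
```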