Spark source code notes

This post walks through SparkContext initialization, covering SparkContext, TaskScheduler, TaskSchedulerImpl and StandaloneSchedulerBackend, and explains the roles of DriverEndpoint, StandaloneAppClient and the Master. It then details how stages and tasks are divided, following the calls from SparkContext.runJob into DAGScheduler.runJob, and finishes with the Spark job submission flow.


SparkContext initialization flow

- SparkConf is Spark's configuration object; it describes the Spark configuration as key-value pairs.
- Once instantiated via new SparkConf(), it loads every spark.* property (from the system properties) by default.
class SparkConf(loadDefaults: Boolean) {
    def this() = this(true)
}
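
A minimal usage sketch of the key-value style (the app name, master URL and memory value below are placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf()              // loadDefaults = true: also picks up spark.* system properties
  .setAppName("conf-demo")
  .setMaster("local[*]")
  .set("spark.executor.memory", "1g")   // plain key-value setting

println(conf.get("spark.app.name"))     // -> conf-demo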

Notes

- Instantiating a SparkContext requires a SparkConf object as a parameter.
- Inside SparkContext, that SparkConf is cloned, producing an object whose property values are identical but which is not the same object as the one passed in.
- All of SparkContext's subsequent operations use this cloned SparkConf.
- Note: once the SparkConf has been handed to the SparkContext, modifying the original SparkConf object has no effect.
​
​
override def clone: SparkConf = {
    val cloned = new SparkConf(false)
    settings.entrySet().asScala.foreach { e =>
        cloned.set(e.getKey(), e.getValue(), true)
    }
    cloned
}
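
The practical consequence of the clone, as a small sketch (local master; the property name is just an example):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("clone-demo").setMaster("local[*]")
val sc   = new SparkContext(conf)                       // SparkContext keeps its own clone of conf

conf.set("spark.executor.memory", "4g")                 // mutates only the original object
println(sc.getConf.contains("spark.executor.memory"))   // false: the clone inside SparkContext is unaffected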

SparkContext

The SparkContext initialization process:
1. A SparkConf object is created; it reads the default configuration, and further settings can be applied to it.
2. The SparkConf is loaded into the SparkContext, which initializes its various configuration properties from it.
3. The createTaskScheduler method instantiates the SchedulerBackend and the TaskScheduler; the DAGScheduler is created in SparkContext right afterwards, as the excerpt below shows.
// Create the SchedulerBackend and TaskScheduler according to the master URL passed in
// SparkContext.scala, line 2692
private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._
​
    // When running locally, don't try to re-execute tasks on failure.
    val MAX_LOCAL_TASK_FAILURES = 1
​
    master match {
        // setMaster("local"): local mode with a single thread
        case "local" =>
            val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
            val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
            scheduler.initialize(backend)
            (backend, scheduler)
​
        // setMaster("local[2]") or setMaster("local[*]"): local mode with N threads (or one per core)
        case LOCAL_N_REGEX(threads) =>
            def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
            // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
            val threadCount = if (threads == "*") localCpuCount else threads.toInt
            if (threadCount <= 0) {
                throw new SparkException(s"Asked to run locally with $threadCount threads")
            }
            val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
            val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
            scheduler.initialize(backend)
            (backend, scheduler)
​
        // Standalone mode
        case SPARK_REGEX(sparkUrl) =>
            val scheduler = new TaskSchedulerImpl(sc)
            val masterUrls = sparkUrl.split(",").map("spark://" + _)
            val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
            scheduler.initialize(backend)
            (backend, scheduler)
​
        // local-cluster mode: a pseudo-distributed cluster inside the local JVM (used mainly for tests);
        // other cluster managers such as YARN or Mesos are handled by further cases not shown here
        case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
            // Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
            val memoryPerSlaveInt = memoryPerSlave.toInt
            if (sc.executorMemory > memoryPerSlaveInt) {
                throw new SparkException(
                    "Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
                        memoryPerSlaveInt, sc.executorMemory))
            }
​
            val scheduler = new TaskSchedulerImpl(sc)
            val localCluster = new LocalSparkCluster(
                numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
            val masterUrls = localCluster.start()
            val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
            scheduler.initialize(backend)
            backend.shutdownCallback = (backend: StandaloneSchedulerBackend) => {
                localCluster.stop()
            }
            (backend, scheduler)
    }
}
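
For reference, these are the kinds of master strings that land in the branches above (a sketch using the standard master URL syntax; host names are placeholders):

import org.apache.spark.SparkConf

new SparkConf().setMaster("local")                           // "local": one thread, no task retries
new SparkConf().setMaster("local[4]")                        // LOCAL_N_REGEX: 4 threads
new SparkConf().setMaster("spark://host1:7077,host2:7077")   // SPARK_REGEX: standalone cluster
new SparkConf().setMaster("local-cluster[2,1,1024]")         // LOCAL_CLUSTER_REGEX: 2 workers, 1 core and 1024 MB each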

TaskScheduler

/**
 * Low-level task scheduler interface, currently implemented exclusively by
 * [[org.apache.spark.scheduler.TaskSchedulerImpl]].
 * This interface allows plugging in different task schedulers. Each TaskScheduler schedules tasks
 * for a single SparkContext. These schedulers get sets of tasks submitted to them from the
 * DAGScheduler for each stage, and are responsible for sending the tasks to the cluster, running
 * them, retrying if there are failures, and mitigating stragglers. They return events to the
 * DAGScheduler.
 */
TaskScheduler is a low-level task scheduling interface; its only implementation at the moment is TaskSchedulerImpl. A TaskScheduler can be plugged onto different backends (SchedulerBackend).
Each TaskScheduler schedules tasks for a single SparkContext: it is initialized for the current application, and if a new Spark application is submitted, the current TaskScheduler is torn down and a new one is created for the new application.
The TaskScheduler receives a TaskSet for each stage from the DAGScheduler, submits those tasks to the cluster to run, resubmits them on failure, mitigates stragglers, and reports events back to the DAGScheduler.
(Stragglers: some tasks submitted to the cluster may fall far behind; they need to be handled, e.g. re-run speculatively, so that one or two slow tasks do not drag down the whole job.)
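
The straggler mitigation mentioned above is speculative execution; a hedged sketch of the related settings (the property names are real Spark settings, the values are only examples):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.interval", "100ms")  // how often to check for speculatable tasks
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median marks a task as a straggler
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before speculation starts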

TaskSchedulerImpl

Clients must first call initialize() and start(), and only then submit TaskSets through submitTasks.
// line 81
// Interval between checks for speculatable (straggler) tasks; default 100ms
val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")
// line 92
// How long a TaskSet may go without being offered resources before a starvation warning is logged; default 15s
val STARVATION_TIMEOUT_MS = conf.getTimeAsMs("spark.starvation.timeout", "15s")
// line 95
// Number of CPU cores allocated to each task
val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)
// line 136
// Scheduling mode; FIFO by default
private val schedulingModeConf = conf.get(SCHEDULER_MODE_PROPERTY, SchedulingMode.FIFO.toString)
/*
CoarseGrainedSchedulerBackend (coarse-grained scheduler backend)
    executors live for the whole lifetime of the job/application
    when a task finishes, its executor is not released immediately
    when a new task arrives, no new executor is created; the existing executor is reused
    i.e. executors are reused
FineGrainedSchedulerBackend (fine-grained scheduler backend)
    when a task finishes, its executor is released
    when a new task arrives, a new executor is created to run it
Standalone and YARN modes support only the coarse-grained backend;
Mesos additionally supports the fine-grained backend.
Task scheduling modes:
    FIFO: first in, first out
        executors are preferentially allocated on one worker; only when that worker runs out of resources are executors allocated on other workers
    FAIR: fair scheduling
        based on load balancing, executors are spread evenly across the worker nodes
*/
def initialize(backend: SchedulerBackend) {
    this.backend = backend
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)
        case _ =>
          throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
          s"$schedulingMode")
      }
    }
    schedulableBuilder.buildPools()
}
​
override def start() {
    backend.start()
​
    if (!isLocal && conf.getBoolean("spark.speculation", false)) {
      logInfo("Starting speculative execution thread")
      speculationScheduler.scheduleWithFixedDelay(new Runnable {
        override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
          checkSpeculatableTasks()
        }
      }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
    }
}
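
A sketch of how the scheduling mode that initialize() switches on is selected from user code (spark.scheduler.mode and the per-thread pool property are real settings; the pool name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fair-demo")
  .setMaster("local[*]")
  .set("spark.scheduler.mode", "FAIR")                      // FIFO is the default

val sc = new SparkContext(conf)
sc.setLocalProperty("spark.scheduler.pool", "reporting")    // jobs from this thread go into the "reporting" pool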

StandaloneSchedulerBackend

override def start() {
    // Calls the implementation in the parent class CoarseGrainedSchedulerBackend:
    // driverEndpoint = createDriverEndpointRef(properties)
    // i.e. it instantiates the driver's RPC endpoint
    super.start()
​
    // ...
    
    // Build an ApplicationDescription carrying the resources the application requires
    val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
    // Create the client that holds this application description
    // and communicates with the cluster manager (the Master)
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
    client.start()
    launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
    // Wait until registration completes; completion is signalled from StandaloneAppClient
    waitForRegistration()
    launcherBackend.setState(SparkAppHandle.State.RUNNING)
  }

DriverEndpoint

DriverEndpoint is an inner class of CoarseGrainedSchedulerBackend and is the driver-side communication endpoint.
override def onStart() {
    // Periodically revive offers to allow delay scheduling to work
    val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
​
    reviveThread.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
            // Send a ReviveOffers message to itself
            Option(self).foreach(_.send(ReviveOffers))
        }
    }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
}
// Build the resource offers (free cores) for the alive executors
private def makeOffers() {
    // Make sure no executor is killed while some task is launching on it
    val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
        // Filter out executors under killing
        val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
        val workOffers = activeExecutors.map { case (id, executorData) =>
            new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
        }.toIndexedSeq
        scheduler.resourceOffers(workOffers)
    }
    if (!taskDescs.isEmpty) {
        launchTasks(taskDescs)
    }
}
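
To make the offer model concrete, here is a toy sketch (not Spark's actual resourceOffers logic) of matching free cores against pending tasks; all names below are made up for the illustration:

case class Offer(executorId: String, host: String, freeCores: Int)   // mirrors what a WorkerOffer carries
case class PendingTask(taskId: Long, coresPerTask: Int = 1)

def assign(offers: Seq[Offer], tasks: Seq[PendingTask]): Seq[(Long, String)] = {
  val queue    = scala.collection.mutable.Queue(tasks: _*)
  val launched = scala.collection.mutable.ArrayBuffer.empty[(Long, String)]
  for (offer <- offers) {
    var free = offer.freeCores
    while (queue.nonEmpty && free >= queue.head.coresPerTask) {
      val task = queue.dequeue()
      free -= task.coresPerTask
      launched += ((task.taskId, offer.executorId))   // (taskId, executorId) pairs to launch
    }
  }
  launched.toSeq
}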

StandaloneAppClient

override def onStart(): Unit = {
    try {
        // Send a registration message to the master.
        // The argument 1 means this is the first attempt; if registration fails, the counter
        // is incremented and registration is retried; after 3 failed attempts it gives up
        registerWithMaster(1)
    } catch {
        case e: Exception =>
        logWarning("Failed to connect to master", e)
        markDisconnected()
        stop()
    }
}
private def registerWithMaster(nthRetry: Int) {
    registerMasterFutures.set(tryRegisterAllMasters())
    registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
        override def run(): Unit = {
            if (registered.get) {
                registerMasterFutures.get.foreach(_.cancel(true))
                registerMasterThreadPool.shutdownNow()
            } else if (nthRetry >= REGISTRATION_RETRIES) {
                markDead("All masters are unresponsive! Giving up.")
            } else {
                registerMasterFutures.get.foreach(_.cancel(true))
                registerWithMaster(nthRetry + 1)
            }
        }
    }, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
}

Master

// line 258
// Master.receive pattern-matches on messages sent from the driver side
​
case RegisterApplication(description, driver) =>
    // TODO Prevent repeatd registrations from some driver
    if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
    } else {
        logInfo("Registering app " + description.name)
        // Create the application object, wrapping the description and the driver reference
        val app = createApplication(description, driver)
        // Register the application inside the Master
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        // Persist the application metadata so it can be recovered, e.g. on Master failover
        persistenceEngine.addApplication(app)
        // Tell the driver side that registration is complete
        driver.send(RegisteredApplication(app.id, self))
        schedule()
    }
override def receive: PartialFunction[Any, Unit] = {
    case RegisteredApplication(appId_, masterRef) =>
        // FIXME How to handle the following cases?
        // 1. A master receives multiple registrations and sends back multiple
        // RegisteredApplications due to an unstable network.
        // 2. Receive multiple RegisteredApplication from different masters because the master is
        // changing.
        appId.set(appId_)
        registered.set(true)
        master = Some(masterRef)
        listener.connected(appId.get)

Stage and Task division

SparkContext.runJob

// An RDD action triggers runJob and produces a job
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    resultHandler: (Int, U) => Unit): Unit = {
    if (stopped.get()) {
        throw new IllegalStateException("SparkContext has been shutdown")
    }
    val callSite = getCallSite
    val cleanedFunc = clean(func)
    logInfo("Starting job: " + callSite.shortForm)
    if (conf.getBoolean("spark.logLineage", false)) {
        logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
    }
    // From here the job leaves SparkContext and is handed off to the DAGScheduler
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    progressBar.foreach(_.finishAll())
    rdd.doCheckpoint()
}
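
A minimal trigger for this path (assuming an existing SparkContext named sc):

val rdd   = sc.parallelize(1 to 100, numSlices = 4)
val total = rdd.map(_ * 2).reduce(_ + _)   // reduce is an action: SparkContext.runJob -> DAGScheduler.runJob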

DAGScheduler.runJob

def runJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): Unit = {
    val start = System.nanoTime
    // Submit the job via submitJob: the job produced by the action is posted to the
    // DAGScheduler's event loop (and eventually reaches the TaskScheduler as TaskSets)
    val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
    ThreadUtils.awaitReady(waiter.completionFuture, Duration.Inf)
    waiter.completionFuture.value.get match {
        case scala.util.Success(_) =>
        logInfo("Job %d finished: %s, took %f s".format
                (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        case scala.util.Failure(exception) =>
        logInfo("Job %d failed: %s, took %f s".format
                (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
        val callerStackTrace = Thread.currentThread().getStackTrace.tail
        exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
        throw exception
    }
}
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): JobWaiter[U] = {
    // ...
    // eventProcessLoop: post a JobSubmitted event to the event loop;
    // post only enqueues the event, and the loop's background thread will process it shortly
    eventProcessLoop.post(JobSubmitted(
        jobId, rdd, func2, partitions.toArray, callSite, waiter,
        SerializationUtils.clone(properties)))
    waiter
}
​

EventLoop

run

override def run(): Unit = {
    try {
        while (!stopped.get) {
            val event = eventQueue.take()       // take one event from the event queue
            try {
                // Receive and handle the event.
                // onReceive is abstract in this class;
                // the implementation lives in DAGSchedulerEventProcessLoop
                onReceive(event)
            } catch {
                case NonFatal(e) =>
                try {
                    onError(e)
                } catch {
                    case NonFatal(e) => logError("Unexpected error in " + name, e)
                }
            }
        }
    } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
    }
}
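
The pattern is a plain producer/consumer queue; a toy sketch of the same idea (not Spark's EventLoop, just an illustration):

import java.util.concurrent.LinkedBlockingDeque

class MiniEventLoop[E](name: String)(handle: E => Unit) {
  private val queue = new LinkedBlockingDeque[E]()
  private val thread = new Thread(name) {
    override def run(): Unit =
      try { while (true) handle(queue.take()) }     // block until an event arrives, then dispatch it
      catch { case _: InterruptedException => () }  // exit when interrupted
  }
  def start(): Unit = { thread.setDaemon(true); thread.start() }
  def post(event: E): Unit = queue.put(event)       // producers only enqueue; the loop thread handles
}

// usage: val loop = new MiniEventLoop[String]("demo")(e => println(s"handled $e")); loop.start(); loop.post("JobSubmitted")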

DAGSchedulerEventProcessLoop

override def onReceive(event: DAGSchedulerEvent): Unit = {
    val timerContext = timer.time()
    try {
        doOnReceive(event)
    } finally {
        timerContext.stop()
    }
}
private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
    // Handle the JobSubmitted event (other event types are elided from this excerpt)
    case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
        dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
}

handleJobSubmitted

// This method is the heart of the DAGScheduler:
// both Stage division and Task division start from here
private[scheduler] def handleJobSubmitted(jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      callSite: CallSite,
      listener: JobListener,
      properties: Properties) {
    var finalStage: ResultStage = null
    try {
      // Create the ResultStage: the final stage of a job is always a ResultStage
      finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
    } catch {
      case e: Exception =>
        logWarning("Creating new stage failed due to exception - job: " + jobId, e)
        listener.jobFailed(e)
        return
    }
    // ...
    // Submit the final stage (its missing parent stages are submitted first)
    submitStage(finalStage)
  }
private def createResultStage(
    rdd: RDD[_],
    func: (TaskContext, Iterator[_]) => _,
    partitions: Array[Int],
    jobId: Int,
    callSite: CallSite): ResultStage = {
    // Find (or create) the parent stages of the ResultStage
    val parents = getOrCreateParentStages(rdd, jobId)
    val id = nextStageId.getAndIncrement()
    val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite)
    stageIdToStage(id) = stage
    updateJobIdStageIdMaps(jobId, stage)
    stage
}
​
private def getOrCreateParentStages(rdd: RDD[_], firstJobId: Int): List[Stage] = {
    getShuffleDependencies(rdd).map { shuffleDep =>
        getOrCreateShuffleMapStage(shuffleDep, firstJobId)
    }.toList
}
​
// Stages are split at shuffle (wide) dependencies: starting from the final RDD the search walks
// backwards, recursively creating parent stages until no further shuffle dependency is found
private def getOrCreateShuffleMapStage(
    shuffleDep: ShuffleDependency[_, _, _],
    firstJobId: Int): ShuffleMapStage = {
    shuffleIdToMapStage.get(shuffleDep.shuffleId) match {
        case Some(stage) =>
            stage
​
        case None =>
            // Create stages for all missing ancestor shuffle dependencies.
            getMissingAncestorShuffleDependencies(shuffleDep.rdd).foreach { dep =>
                // Even though getMissingAncestorShuffleDependencies only returns shuffle dependencies
                // that were not already in shuffleIdToMapStage, it's possible that by the time we
                // get to a particular dependency in the foreach loop, it's been added to
                // shuffleIdToMapStage by the stage creation process for an earlier dependency. See
                // SPARK-13902 for more information.
                if (!shuffleIdToMapStage.contains(dep.shuffleId)) {
                    createShuffleMapStage(dep, firstJobId)
                }
            }
            // Finally, create a stage for the given shuffle dependency.
            createShuffleMapStage(shuffleDep, firstJobId)
    }
}
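
A worked example of the split (assuming an existing SparkContext named sc): reduceByKey introduces a ShuffleDependency, so this small job is cut into one ShuffleMapStage and one ResultStage:

val words  = sc.parallelize(Seq("a", "b", "a", "c"))
val counts = words.map(w => (w, 1))   // narrow dependency: stays in the same stage as parallelize
                  .reduceByKey(_ + _) // shuffle dependency: stage boundary, a parent ShuffleMapStage is created here
counts.collect()                      // action: the final ResultStage, whose parent is the ShuffleMapStage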
private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
        logDebug("submitStage(" + stage + ")")
        if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
            // Find the parent stages whose results are not yet available (the missing parents)
            val missing = getMissingParentStages(stage).sortBy(_.id)
            logDebug("missing: " + missing)
            if (missing.isEmpty) {
                logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
                // No missing parents: build and submit this stage's tasks
                submitMissingTasks(stage, jobId.get)
            } else {
                for (parent <- missing) {
                    submitStage(parent)
                }
                waitingStages += stage
            }
        }
    } else {
        abortStage(stage, "No active job for stage " + stage.id, None)
    }
}
// Builds the tasks for a stage and submits them to the TaskScheduler as a TaskSet
private def submitMissingTasks(stage: Stage, jobId: Int) {
    // ...
    
    
    val tasks: Seq[Task[_]] = try {
        val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
        stage match {
            case stage: ShuffleMapStage =>
                stage.pendingPartitions.clear()
                partitionsToCompute.map { id =>
                    val locs = taskIdToLocations(id)
                    val part = stage.rdd.partitions(id)
                    stage.pendingPartitions += id
                    new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,
                                       taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
                                       Option(sc.applicationId), sc.applicationAttemptId)
                }
​
            case stage: ResultStage =>
                partitionsToCompute.map { id =>
                    val p: Int = stage.partitions(id)
                    val part = stage.rdd.partitions(p)
                    val locs = taskIdToLocations(id)
                    new ResultTask(stage.id, stage.latestInfo.attemptId,
                                   taskBinary, part, locs, id, properties, serializedTaskMetrics,
                                   Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)
                }
        }
    } catch {
        case NonFatal(e) =>
        abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
        runningStages -= stage
        return
    }
​
    if (tasks.size > 0) {
        logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
                s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
        // Wrap the tasks in a TaskSet and hand it to the TaskScheduler
        taskScheduler.submitTasks(new TaskSet(
            tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
        stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
    } else {
        // Because we posted SparkListenerStageSubmitted earlier, we should mark
        // the stage as completed here in case there are no tasks to run
        markStageAsFinished(stage, None)
​
        val debugString = stage match {
            case stage: ShuffleMapStage =>
            s"Stage ${stage} is actually done; " +
            s"(available: ${stage.isAvailable}," +
            s"available outputs: ${stage.numAvailableOutputs}," +
            s"partitions: ${stage.numPartitions})"
            case stage : ResultStage =>
            s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})"
        }
        logDebug(debugString)
​
        submitWaitingChildStages(stage)
    }
}
override def submitTasks(taskSet: TaskSet) {
    // (TaskSetManager creation and pool bookkeeping elided from this excerpt)
    backend.reviveOffers()
}
override def reviveOffers() {
    driverEndpoint.send(ReviveOffers)
}
// DriverEndpoint.receive then picks up the ReviveOffers message
override def receive: PartialFunction[Any, Unit] = {
    case ReviveOffers =>
        makeOffers()
}
​
private def makeOffers() {
    // Make sure no executor is killed while some task is launching on it
    val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
        // Filter out executors under killing
        val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
        val workOffers = activeExecutors.map { case (id, executorData) =>
            new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
        }.toIndexedSeq
        scheduler.resourceOffers(workOffers)
    }
    if (!taskDescs.isEmpty) {
        launchTasks(taskDescs)
    }
}
​
// Launch tasks returned by a set of resource offers
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
    for (task <- tasks.flatten) {
        val serializedTask = TaskDescription.encode(task)
        if (serializedTask.limit >= maxRpcMessageSize) {
            scheduler.taskIdToTaskSetManager.get(task.taskId).foreach { taskSetMgr =>
                try {
                    var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
                    "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
                    "spark.rpc.message.maxSize or using broadcast variables for large values."
                    msg = msg.format(task.taskId, task.index, serializedTask.limit, maxRpcMessageSize)
                    taskSetMgr.abort(msg)
                } catch {
                    case e: Exception => logError("Exception in error callback", e)
                }
            }
        }
        else {
            val executorData = executorDataMap(task.executorId)
            executorData.freeCores -= scheduler.CPUS_PER_TASK
​
            logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
                     s"${executorData.executorHost}.")
            
            // On the driver side, send a LaunchTask message to the executor, carrying the serialized task
            executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
        }
    }
}

CoarseGrainedExecutorBackend

The driver has sent the task launch message to the executor side, where CoarseGrainedExecutorBackend receives it:
override def receive: PartialFunction[Any, Unit] = {
    case LaunchTask(data) =>
        if (executor == null) {
            exitExecutor(1, "Received LaunchTask command but executor was null")
        } else {
            // Decode the TaskDescription sent over from the driver,
            // including the files, jars and properties the task depends on
            val taskDesc = TaskDescription.decode(data.value)
            logInfo("Got assigned task " + taskDesc.taskId)
            executor.launchTask(this, taskDesc)
        }
}
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
    val tr = new TaskRunner(context, taskDescription)
    runningTasks.put(taskDescription.taskId, tr)
    threadPool.execute(tr)
}
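
launchTask simply wraps the task in a Runnable (the TaskRunner) and hands it to the executor's thread pool; the same pattern in isolation (a toy sketch, not Spark's TaskRunner):

import java.util.concurrent.Executors

val pool = Executors.newFixedThreadPool(4)
val work: Runnable = () => println(s"running on ${Thread.currentThread().getName}")
pool.execute(work)        // analogous to threadPool.execute(tr) above
pool.shutdown()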
// Inside TaskRunner, in the run method (around line 334):
// this value is the task's execution result
val value = try {
    val res = task.run(
        taskAttemptId = taskId,
        attemptNumber = taskDescription.attemptNumber,
        metricsSystem = env.metricsSystem)
    threwException = false
    res
}
​
// line 407
// val serializedResult: ByteBuffer = 
// The computed result is serialized and sent back to the driver
​
// line 429
// execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)

Spark job submission

rdd.action() is the entry point for job submission; it ends up calling sc.runJob(), which runs the computation on all partitions of the RDD and returns the results as an array.

Through a chain of runJob overloads, the job is eventually forwarded to dagScheduler.runJob for execution.

submitJob

posts a JobSubmitted event onto the job-processing queue (EventLoop).

The queue is drained by a background thread that keeps taking events and handling them;

the handling is done in DAGSchedulerEventProcessLoop.doOnReceive,

which calls dagScheduler.handleJobSubmitted, where the stage division is performed.

createResultStage -- the core of stage division:
1. Spark has two kinds of stages: ResultStage and ShuffleMapStage.
2. The ResultStage is created directly with new.
3. The upstream ShuffleMapStages (the parents) are created via getOrCreateParentStages.
4. The stages are then submitted with submitStage(finalStage).

Following the dependencies recursively, stages are submitted in order:
if (missing.isEmpty) {
    submitMissingTasks(stage, jobId.get)  // the first stage that has no parent
} else {
    for (parent <- missing) {
        // submit the stages one at a time, parents first
        submitStage(parent)
    }
}
Tasks are submitted in submitMissingTasks: for a ResultStage, ResultTasks are built; for a ShuffleMapStage, ShuffleMapTasks are built. Note that the return value is a Seq[Task[_]], i.e. one task is generated per partition.
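
A quick way to see the one-task-per-partition point (assuming an existing SparkContext named sc):

val rdd = sc.parallelize(1 to 8, numSlices = 4)
println(rdd.getNumPartitions)  // 4 -> the ResultStage of rdd.collect() is built from 4 ResultTasks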
​
taskScheduler.submitTasks(new TaskSet(tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
1. The job is handed over through the TaskScheduler; what gets submitted is the TaskSet the DAGScheduler built for the stage.
2. The TaskScheduler then passes the work to its associated backend: backend.reviveOffers().

loop --> the real submission is done by CoarseGrainedSchedulerBackend via the DriverEndpoint, which sends itself a ReviveOffers event ---> makeOffers()
Core code:
if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
}
​
The DriverEndpoint sends a LaunchTask event to the executor, which is received and handled by CoarseGrainedExecutorBackend: executor.launchTask(this, taskDesc).

The task is wrapped in a TaskRunner (a Runnable) and handed to the executor's thread pool to run.

1. First, the serialized task received from the driver is deserialized.
2. The task's run method is called to execute the work.
3. The task's return value is captured in value.

The actual work is done in runTask(context);
since different task types run differently, a ShuffleMapTask and a ResultTask execute different logic.

From here there are two threads to follow:
A ShuffleMapTask pulls its input from upstream in chunks (48 MB in flight by default) into a buffer and runs the user computation, with an ExternalSorter handling aggregation and sorting where needed.
A ShuffleWriter created through the ShuffleManager writes the task's output to disk, and the final state is wrapped in a MapStatus; the computation itself is func(context, rdd.iterator(partition, context)) / computeOrReadCheckpoint.
After the work completes, execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult) updates the task state, carrying the serialized result of the previous step.
The result is wrapped in a StatusUpdate case class and sent to the driver (DriverEndpoint), which then calls makeOffers(executorId) to schedule the next round of work.
On the shuffle-read side, a ShuffledRDD's compute(split, context) goes through shuffleManager.getReader (SortShuffleManager) to fetch the upstream data,
and BlockStoreShuffleReader.read performs the actual read.
