Spark Job Submission Flow (Organized Edition)

In production, Spark jobs are usually submitted to YARN for execution. The overall flow is shown in the figure below.

1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM starts containers through nmClient and launches the ExecutorBackend processes in them.
6. The Executors register themselves back with the Driver.
7. The Executors run the tasks.

[Figure: Spark-on-YARN job submission flow]
We will walk through these seven steps with excerpts from the Spark source code.
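For reference, this whole flow is kicked off by a submission like the following (the main class and jar path are placeholders for your own application):

    bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.WordCount \
      /path/to/app.jar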

1. The client submits the application to the RM.

Starting from spark-submit.sh, we find the first class on the call path and locate its main entry point.
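The bin/spark-submit script itself is a thin wrapper; its final line hands all arguments over to the SparkSubmit class:

    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"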
①main

  // In class org.apache.spark.deploy.SparkSubmit
  override def main(args: Array[String]): Unit = {
    // Parse the command line; appArgs.action is initialized as
    // action = Option(action).getOrElse(SUBMIT), i.e. it defaults to SUBMIT
    val appArgs = new SparkSubmitArguments(args)
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      // other actions elided
    }
  }
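For completeness, the elided branches of that match handle job-control actions as well (the exact set varies by Spark version); only SUBMIT matters for this walkthrough:

    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    }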

②submit(appArgs, uninitLog)

private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    // Resolve the arguments, classpath, conf and main class for the deploy mode
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

    def doRunMain(): Unit = {
      if (args.proxyUser != null) {
        // proxy-user branch elided; see the sketch below
      } else {
        runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
      }
    }

    if (args.isStandaloneCluster && args.useRest) {
      // standalone-cluster REST gateway submission elided
    } else {
      // All other modes, including yarn-cluster, go straight to doRunMain()
      doRunMain()
    }
  }
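The elided proxy-user branch runs the same doRunMain() under the proxy user's identity; roughly, from the Spark 2.x source (exception handling omitted):

    val proxyUser = UserGroupInformation.createProxyUser(
      args.proxyUser, UserGroupInformation.getCurrentUser())
    proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
      override def run(): Unit = doRunMain()
    })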

③runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
In yarn-cluster mode, childMainClass is the class "org.apache.spark.deploy.yarn.YarnClusterApplication" (in yarn-client mode it would instead be the user's own main class).

private def runMain(
      childArgs: Seq[String],
      childClasspath: Seq[String],
      sparkConf: SparkConf,
      childMainClass: String,
      verbose: Boolean): Unit = {
    var mainClass: Class[_] = null

    try {
      // Load the main class by reflection
      mainClass = Utils.classForName(childMainClass)
    } // catch clauses elided

    val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
      // YarnClusterApplication implements SparkApplication, so instantiate it directly
      mainClass.newInstance().asInstanceOf[SparkApplication]
    } // else branch elided; see the note below

    try {
      app.start(childArgs.toArray, sparkConf)
    } // catch clauses elided
  }
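The elided else branch covers ordinary classes that only expose a static main(); in the Spark 2.x source they are wrapped so that everything goes through the same SparkApplication interface:

    } else {
      // Adapts a plain main(args) class to the SparkApplication interface
      new JavaMainApplication(mainClass)
    }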

④app.start(childArgs.toArray, sparkConf)
app is the instance of "org.apache.spark.deploy.yarn.YarnClusterApplication" that was just loaded via reflection.

private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // Build a YARN Client from the submit arguments and run it
    new Client(new ClientArguments(args), conf).run()
  }
}

⑤new Client(new ClientArguments(args), conf).run()
One detail worth noting: constructing Client also creates a YarnClient internally (yarnClient = YarnClient.createYarnClient). run() itself is short:

  def run(): Unit = {
    // Submit the application and record its YARN application id
    this.appId = submitApplication()
    // The rest of run(), which monitors the application's state, is elided
  }

⑥this.appId = submitApplication()
submitApplication() is where yarnClient actually submits the application to YARN. Note that what gets submitted is essentially a launch command: the RM picks a NodeManager, which parses and executes that command. That brings us to the next step.

  def submitApplication(): ApplicationId = {
    var appId: ApplicationId = null
    try {
      launcherBackend.connect()

      // yarnClient was created via YarnClient.createYarnClient (a YarnClientImpl)
      yarnClient.init(hadoopConf)
      yarnClient.start()

      // Get a new application from our RM
      val newApp = yarnClient.createApplication()
      val newAppResponse = newApp.getNewApplicationResponse()
      appId = newAppResponse.getApplicationId()

      // Set up the appropriate contexts to launch our AM.
      // createContainerLaunchContext() packages up a launch command; in cluster
      // mode that command starts "org.apache.spark.deploy.yarn.ApplicationMaster"
      val containerContext = createContainerLaunchContext(newAppResponse)
      val appContext = createApplicationSubmissionContext(newApp, containerContext)

      // Finally, submit the application to the RM
      yarnClient.submitApplication(appContext)
      appId
    } // catch clauses elided
  }
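For intuition, the command assembled by createContainerLaunchContext() looks roughly like the sketch below (memory settings, paths and extra --arg flags vary with configuration; this is not the exact string):

    {{JAVA_HOME}}/bin/java -server -Xmx<amMemory>m \
      org.apache.spark.deploy.yarn.ApplicationMaster \
      --class <your main class> --jar <your app jar> \
      --properties-file <generated spark conf>

The RM hands this command to a NodeManager, which launches it inside a container; that JVM is the ApplicationMaster of step 2.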