Arthas入门教程

少年的白日梦

于 2025-05-14 14:19:33 发布

阅读量757

点赞数 22

CC 4.0 BY-SA版权

分类专栏：工具文章标签： java arthas

本文链接：https://blog.csdn.net/qq_38310244/article/details/147953160

工具专栏收录该内容

2 篇文章

订阅专栏

背景

线上诊断的工具，实时查看java应用的信息，可以在不修改代码的情况下，诊断线上应用问题，包括查看方法调用的出入参、异常，监测方法执行耗时，类加载信息等。

官网

https://arthas.aliyun.com/doc/

下载

https://arthas.aliyun.com/arthas-boot.jar

快速开始

启动

java -jar arthas-boot.jar

在这里插入图片描述
选择需要监控的程序，这里第三个是我自己的demo程序，输入3回车：

在这里插入图片描述

到此，arthas启动完成

停止

输入stop，回车

代码Demo

package gov.df.ys.demo.controller;

import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;


@Slf4j
@RestController
@RequestMapping("/arthas")
public class ArthasController {

    private List<String> nameList = new ArrayList<>();
    private static List<String> staticList = new ArrayList<>();

    @GetMapping("/hello")
    public String hello(String name){
        log.info(name);
        String s = testHello(name);
        log.info(s);
        testExc(s);
        getJson(name);
        testLongTime();
        return s;
    }

    private String testHello(String name){
        testHello();
        if ("cc".equals(name)){
            return "-1";
        }
        return "hello " + name;
    }

    private void testExc(String name){
        if ("-1".equals(name)){
            throw new RuntimeException("name is 'cc'");
        }
        nameList.add(name);
        staticList.add("static:"+name);
    }

    private JSONObject getJson(String name){
        for (int i = 0; i < 10; i++) {
            byte[] c = new byte[500* 1024]; //300k

            JSONObject jsonObject = new JSONObject();
            jsonObject.put("name",name);
            jsonObject.put("c",c);
        }
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("name",name);
        return jsonObject;
    }

    private void testLongTime(){
        try {
            log.info("模拟耗时...");
            Thread.sleep(1500L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private JSONObject testJson(String name){
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("testJsonName",name);
        return jsonObject;
    }

    private void testHello(){
        log.info("二级方法");
    }




    @GetMapping("/fullCpu")
    public String fullCpu(){
        // 自定义ThreadFactory
        ThreadFactory namedThreadFactory = r -> new Thread(r, "FullCpuThread" );
        ExecutorService executorService = Executors.newSingleThreadExecutor(namedThreadFactory);
        executorService.execute(()->{
            while (true){
                log.info("线程执行");
            }
        });
        executorService.shutdown();
        return "启动fullCpu";
    }

    // -Xmx30m
    @GetMapping("/fullGc")
    public String fullGc(){
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.execute(()->{
            while (true){
                byte[] c = new byte[3* 1024* 1024];
                try {
                    Thread.sleep(2000); // 每次分配后暂停1秒
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });
        log.info("启动fullGc");
        return "启动fullGc";
    }

    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    @GetMapping("/testLock")
    public String testLock() {
        Thread thread1 = new Thread(() -> {
            synchronized (lockA) {
                log.info("Thread 1: Holding lock A");
                try {
                    Thread.sleep(10); //休眠10ms，确保拿到lockA
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

                log.info("Thread 1: Waiting for lock B");
                synchronized (lockB) {
                    log.info("Thread 1: Acquired lock B");
                }
            }
        });

        Thread thread2 = new Thread(() -> {
            synchronized (lockB) {
                log.info("Thread 2: Holding lock B");

                try {
                    Thread.sleep(10); // 休眠10ms，确保拿到lockB
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }

                log.info("Thread 2: Waiting for lock A");
                synchronized (lockA) {
                    log.info("Thread 2: Acquired lock A");
                }
            }
        });

        thread1.start();
        thread2.start();

        return "启动死锁";
    }
}

常用命令

万事不知就：help命令

dashboard

打开监控面板，ctrl+c可以中断执行

语法：

dashboard -i -n

例如：
dashboard -i 1000 -n 2 #间隔10秒，刷新2次后 ==> 退出
dashboard -i 1000	#每间隔10秒刷新一次

-i：指定刷新面板时间间隔
-n：指定刷新次数
什么都不加，默认每5秒刷新一次

在这里插入图片描述

ID：Java级别的线程ID，注意这个ID不能跟jstack中的nativeID一一对应
NAME：线程名
GROUP：线程组名
PRIORITY：线程优先级, 1~10之间的数字，越大表示优先级越高
STATE：线程的状态
CPU%：线程消耗的cpu占比，采样100ms，将所有线程在这100ms内的cpu使用量求和，再算出每个线程的cpu使用占比。
TIME：线程运行总时间，数据格式为分：秒
INTERRUPTED：线程当前的中断位状态
DAEMON：是否是daemon线程

thread

查看线程状态

语法：

参数名称	参数说明
id	线程 id
[n:]	指定最忙的前 N 个线程并打印堆栈
[b]	找出当前阻塞其他线程的线程
[i `<value>`]	指定 cpu 使用率统计的采样间隔，单位为毫秒，默认值为 200
[–all]	显示所有匹配的线程

thread #查看全部线程状态
thread -b	#查看阻塞的线程
thread --state #查看指定状态的线程
thread -state waiting	#查看等待状态的线程
thread 线程id		#查看线程的详细信息（排查问题很重要，会打印堆栈报错信息）
thread -i #指定采样时间间隔
thread -i 1000 #统计最近 1000ms 内的线程 CPU 时间
thread -i 1000 -n 3 # 列出 1000ms 内最忙的 3 个线程栈

在这里插入图片描述

monitor

监控方法的执行情况

语法：

monitor 类名 方法名 -c 次数

monitor 类名 * -c 次数	# 方法名为*，表示监控该类下所有的方法
    
例如：monitor gov.df.ys.demo.controller.ArthasController hello -c 5
    监控gov.df.ys.demo.controller.ArthasController的hello方法，每5秒更新一次

在这里插入图片描述

watch

监控方法返回值

语法：

watch 的参数比较多，主要是因为它能在 4 个不同的场景观察对象

参数名称	参数说明
class-pattern	类名表达式匹配
method-pattern	函数名表达式匹配
express	观察表达式，默认值：`{params, target, returnObj}`，可选值：`throwExp`
condition-express	条件表达式
[b]	在函数调用之前观察
[e]	在函数异常之后观察
[s]	在函数返回之后观察
[f]	在函数结束之后(正常返回和异常返回)观察
[E]	开启正则表达式匹配，默认为通配符匹配
[x:]	指定输出结果的属性遍历深度，默认为 1，最大值是 4（一般是2，2的话会打印出具体入参的值，1只会打印入参类型）

特别说明

watch 命令定义了 4 个观察事件点，即 -b 函数调用前，-e 函数异常后，-s 函数返回后，-f 函数结束后
4 个观察事件点 -b、-e、-s 默认关闭，-f 默认打开，当指定观察点被打开后，在相应事件点会对观察表达式进行求值并输出
这里要注意函数入参和函数出参的区别，有可能在中间被修改导致前后不一致，除了 -b 事件点 params 代表函数入参外，其余事件都代表函数出参
当使用 -b 时，由于观察事件点是在函数调用前，此时返回值或异常均不存在
在 watch 命令的结果里，会打印出location信息。location有三种可能值：AtEnter，AtExit，AtExceptionExit。对应函数入口，函数正常 return，函数抛出异常
-n控制打印次数，例如-n 2，则只会打印两次

常规语法

watch 类的全限定名 方法名 "{params,target,returnObj}" -x 2

例如：watch gov.df.ys.demo.controller.ArthasController hello "{params,target,returnObj}" -x 2

在这里插入图片描述

监控gov.df.ys.demo.controller.ArthasController类的hello方法的执行。

条件过滤

watch 类的全限定名 方法名 "{params,returnObj}" "条件" -x 2
    
例如：
# 监控入参为zhang的hello方法，
watch gov.df.ys.demo.controller.ArthasController hello "{params,returnObj}" "params[0].equals('zhang')" -x 2

其中params[0]代表hello方法的第一个入参，条件的匹配方式有：<、>、=、equals、contains等，这里表示监控hello方法第一个参数匹配上zhang则输出监控日志。

格式化输出

watch 类的全限定名 方法名 "{params,returnObj.toSting()}" "条件" -x 2
    
# 例如在getJson方法中，返回的是一个JSONObject，当我们监控这个方法时：
watch gov.df.ys.demo.controller.ArthasController getJson "{params,returnObj}" -x 2

在这里插入图片描述

返回的是一个对象，当使用：watch gov.df.ys.demo.controller.ArthasController getJson "{params,returnObj.toSting()}" -x 2 时：

在这里插入图片描述

toString()方法也可以放在params上

查看类成员变量

watch 类的全限定名 方法名 "{params,target.类的成员变量,returnObj}" "条件" -x 2
    
例如：查看ArthasController中的nameList

watch gov.df.ys.demo.controller.ArthasController getJson "{params,target.nameList,returnObj}"  -x 2

在这里插入图片描述

调用类方法

watch 类的全限定名 方法名 "{params,target.方法名,returnObj}" "条件" -x 2
   
例如：调用ArthasController的testJson方法
watch gov.df.ys.demo.controller.ArthasController getJson "{params,target.testJson(params[0]),returnObj}"  -x 2

# 这里调用了testJson方法，并把getJson方法的第一个入参作为testJson方法的入参

在这里插入图片描述

按照耗时过滤

watch 类的全限定名 方法名 "{params,target,returnObj}" "#cost>10" -x 2

# cost是方法执行的耗时，单位是ms
    
例如：
watch gov.df.ys.demo.controller.ArthasController * "{params,target,returnObj}"  "#cost>1000" -x 2
# 监控ArthasController类下所有的方法，打印耗时超过1000毫秒的方法

在这里插入图片描述

排除掉指定的类

如果需要监控某个包，但又需要排除一些不需要监控的类：

watch 包名 * --exclude-class-pattern 类名

例如：
watch gov.df.ys.demo.controller * --exclude-class-pattern gov.df.ys.demo.controller.ArthasController

不匹配子类

默认情况下 watch/trace/monitor/stack/tt 命令都会匹配子类。如果想不匹配，可以通过全局参数关掉。

options disable-sub-class true

trace

语法：

方法内部调用路径，并输出方法路径上的每个节点上耗时

参数名称	参数说明
class-pattern	类名表达式匹配
method-pattern	方法名表达式匹配
condition-express	条件表达式
[E]	开启正则表达式匹配，默认为通配符匹配
`[n:]`	命令执行次数，默认值为 100。
`#cost`	方法执行耗时
`[m <arg>]`	指定 Class 最大匹配数量，默认值为 50。长格式为`[maxMatch <arg>]`。

trace 类的全限定名 方法名

例如：监控trace gov.df.ys.demo.controller.ArthasController方法下所有的方法
trace gov.df.ys.demo.controller.ArthasController *

在这里插入图片描述

会以树形结构打印方法的调用链路，每个方法执行的耗时和耗时占总耗时的百分比

监控耗时大于100ms的方法：
    
trace gov.df.ys.demo.controller.ArthasController * '#cost > 100'

stack

输出当前方法被调用的调用路径

语法：

stack 类的全限定名 方法名

例如：
# 查看getJson的调用路径
stack gov.df.ys.demo.controller.ArthasController getJson

在这里插入图片描述

可能看起来有点像报错日志

tt

方法执行数据的时空隧道，记录下指定方法每次调用的入参和返回信息，并能对这些不同的时间下调用进行观测

语法：

tt -t 类的全限定名 方法名

例如： 监控ArthasController类下所有的方法：
tt -t gov.df.ys.demo.controller.ArthasController *

在这里插入图片描述

当需要查看某个方法时：

tt -i id

例如：
# 查看getJson方法，因为getJson方法的id是1021：
tt -i 1021

在这里插入图片描述

可以根据当前的参数重新调用方法

# 重新调用
tt -i id -p
#	再重新调用3次
tt -i id -p --replay-interval 3
#	再重新调用3次,并且间隔2S
tt -i id -p --replay-times 3 --replay-interval 2000

jad

反编译指定已加载类的源码

语法:

jad 类的全限定名 方法名

#反编译结果里会带有ClassLoader信息，通过--source-only选项，可以只打印源代码
jad --source-only 类的全限定名 方法名
    
#不显示行号 lineNumber 参数默认值为 true，显示指定为 false 则不打印行号
jad 类的全限定名 --lineNumber false

例如：
jad --source-only gov.df.ys.demo.controller.ArthasController #反编译查看ArthasController类
jad --source-only gov.df.ys.demo.controller.ArthasController hello #反编译查看ArthasController类的hello方法

ogln

执行 ognl 表达式，一般用来查看类的静态变量

如：查看ArthasController类的静态变量staticList:

ognl @gov.df.ys.demo.controller.ArthasController@staticList

在这里插入图片描述

memory

查看 JVM 内存信息

语法

memory

在这里插入图片描述

quit

退出当前 Arthas 客户端，其他 Arthas 客户端不受影响

当需要重新连接的时候：

java -jar arthas-boot.jar

stop

关闭 Arthas 服务端，所有 Arthas 客户端全部退出

打开多个终端

有时排查问题一个终端是不够，需要打开多个终端执行不同的命令，这时我们新建终端，然后执行：

telnet loaclhost 3658

就可以打开新的终端连上arthas了，但是新建终端监控的服务还是跟原来的一样

说明：

3658是arthas启动的端口号
新建终端只能跟原始的终端监控同一个服务

常见问题排查

排查方法入参

假如调用接口出现异常，需要排查方法的入参问题，这里以http://localhost:8081/admin/api/arthas/hello?name=cc为例，监控hello方法：

watch gov.df.ys.demo.controller.ArthasController hello "{params,returnObj,throwExp}" -x 2

在这里插入图片描述

可以看到入参已经被打印了出来，并且打印了报错日志，从报错日志中我们可以定位到报错的类名、方法名和行号，我们进一步排查，进行反编译查看代码：

jad --source-only gov.df.ys.demo.controller.ArthasController testExc

在这里插入图片描述

此时反编译后定位到第46行后发现，根据入参进行了判断，再进一步看看入参是什么：

watch gov.df.ys.demo.controller.ArthasController testExc "{params,throwExp}" -x 2

在这里插入图片描述

我们可以观察到testExc方法的入参hello cc，再根据之前的反编译查看代码的结果，就可以推断出是该方法的入参出了问题。

如果还需要继续追溯方法入参的来源是被哪个方法修改了，就只能翻代码查看具体逻辑了。

排查耗时方法

假如需要某个接口耗时很长，可以对该接口进行监控。这里以http://localhost:8081/admin/api/arthas/hello?name=aa为例，监控hello方法：

trace gov.df.ys.demo.controller.ArthasController hello

在这里插入图片描述

可以看到gov.df.ys.demo.controller.ArthasController:testLongTime()耗时非常高，我们可以进一步使用trace观察gov.df.ys.demo.controller.ArthasController:testLongTime()的耗时情况。

Full Gc

我们可能遇到系统卡顿甚至假死的情况，大部分情况都是由于频繁Full GC问题导致。出现Full GC主要是因为堆的内存不足，代码中又频繁创建新对象，需要频繁进行GC回收内存用于给新对象分配空间，GC回收后的内存不足以给新对象使用时就会出现OOM。

这里以fullGc接口为例：http://localhost:8081/admin/api/arthas/fullGc:

设置demo启动的vm参数：

在这里插入图片描述

-Xmx30m #指定JVM的最大堆内存大小位30M

启动后，调用http://localhost:8081/admin/api/arthas/fullGc，启动10个线程不断的在堆上创建对象。这时，再调用http://localhost:8081/admin/api/arthas/hello?name=aa接口的时候，会出现系统卡顿或者假死。

dashboard

在这里插入图片描述

可以看到GC的线程都靠前，Eden区和老年区的使用率都快接近100%，并且老年代的GC次数和时间越来越大。

避免频繁GC：

设置堆栈大小
- -Xms：指定JVM的初始堆内存大小，即堆内存的起始分配量。例如，-Xms512m表示初始堆内存大小为512MB。
- -Xmx：指定JVM的最大堆内存大小，即堆内存的最大分配量。例如，-Xmx2g表示最大堆内存大小为2GB
选择合适的垃圾收集器

CPU飙高

生产排查问题的时候，可能会使用top命令观察系统CPU的使用情况，当出现CPU飙高的时候，可以进一步借用Arthas来排查问题。

这里以fullCpu接口为例：http://localhost:8081/admin/api/arthas/fullCpu

启动服务后调用：http://localhost:8081/admin/api/arthas/fullCpu，打开线程监控面板：

thread

在这里插入图片描述

可以看到名为FullCpuThread的线程使用率相对其他很高。进一步查看这个线程：

thread 73

在这里插入图片描述

可以从输出发现使用CPU高的代码片段在99行，使用反编译查看代码：

jad --source-only gov.df.ys.demo.controller.ArthasController

在这里插入图片描述

定位到第99行，可以排查出问题的结果。

我们是通过线程名称来判断该线程就是代码中定义的线程名称，由此可见：在日常开发中使用线程时，为线程指定名称十分重要

线程死锁

有时会出现一些请求阻塞的问题，有可能是线程死锁导致的。这里以testLock为例：http://localhost:8081/admin/api/arthas/testLock

启动服务后，调用调用http://localhost:8081/admin/api/arthas/testLock，打开线程监控面板：

thread

在这里插入图片描述

可以在线程面板顶部看到，处于BLOCKED（阻塞）状态的线程有两个。进一步排查看是哪两个线程：

thread -b

在这里插入图片描述

可以看到线程id为60的和线程id为61的相互阻塞了，进一步查看线程问题：

thread 60

在这里插入图片描述

定位到了代码片段，继续反编译代码查看：

jad --source-only gov.df.ys.demo.controller.ArthasController

在这里插入图片描述

定位到140行，根据代码可以发现出现了死锁。

总结

排查问题时
- 先使用dashboard看一下全局的内存、GC、线程状态
- 再使用thread命令查看线程的状态是否正常
当需要排查某个接口的问题时
- 使用watch、trace、stack等命令对接口进行监控
压测时，主要就是使用monitor命令查看接口耗时了，根据耗时情况再进一步排查耗时的方法