Linux Performance 2018: Brendan Gregg

Summary: This slide deck reviews Linux performance changes and issues as of 2018. It covers the KPTI patches for the Meltdown vulnerability and their performance cost, comparing two servers where one handles 27% fewer MySQL queries per second. It ties that cost to translation lookaside buffer (TLB) flushing: TLB miss walks account for 19.2% of cycles on the slower server versus 3.5% on the faster one. Finally, it covers enhanced Berkeley Packet Filter (eBPF) and its use for off-CPU analysis, intrusion detection, and identifying disk I/O latency outliers.

Linux Performance

2018

Brendan Gregg
Senior Performance Architect

Oct 2018
http://neuling.org/linux-next-size.html
Post frequency:

4 per year: https://kernelnewbies.org/Linux_4.18
4 per week: https://lwn.net/Kernel/
400 per day: LKML, http://vger.kernel.org/vger-lists.html#linux-kernel
https://meltdownattack.com/
KPTI: Linux 4.15 & backports
[Diagram: where the mitigations are applied]
Cloud Hypervisor (patches)
Linux Kernel (KPTI)
CPU (microcode)
Application (retpoline)
Server A: 31353 MySQL queries/sec
serverA# mpstat 1
Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU)
01:09:13 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:09:14 AM all 86.89 0.00 13.08 0.00 0.00 0.00 0.00 0.00 0.00 0.03
01:09:15 AM all 86.77 0.00 13.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:16 AM all 86.93 0.00 13.02 0.00 0.00 0.00 0.03 0.00 0.00 0.03
[...]

Server B: 22795 queries/sec (27% slower)


serverB# mpstat 1
Linux 4.14.12-virtual (bgregg-c5.9xl-i-xxx) 02/09/2018 _x86_64_ (36 CPU)
01:09:44 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
01:09:45 AM all 82.94 0.00 17.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:46 AM all 82.78 0.00 17.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:09:47 AM all 83.14 0.00 16.86 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[...]
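The "27% slower" figure can be checked directly from the two throughput numbers above. A minimal sketch of the arithmetic; the function name is only illustrative:

```python
def percent_slower(baseline_qps: float, observed_qps: float) -> float:
    """Return how much lower observed throughput is, as a percentage of baseline."""
    return (baseline_qps - observed_qps) / baseline_qps * 100.0


# Queries/sec as quoted on the slides for Server A and Server B.
server_a_qps = 31353
server_b_qps = 22795

drop = percent_slower(server_a_qps, server_b_qps)
print(f"Server B is {drop:.1f}% slower than Server A")  # 27.3%
```

The mpstat output tells the same story from the CPU side: %sys rises from about 13% on Server A to about 17% on Server B.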
Linux KPTI patches for Meltdown flush the Translation Lookaside Buffer

[Diagram: the CPU sends a virtual address to the MMU, which checks the TLB. On a hit, the physical address goes straight to main memory; on a miss, the MMU walks the page table.]
Server A: TLB miss walks 3.5%
serverA# ./tlbstat 1
K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB%
95913667 99982399 1.04 86588626 115441706 1507279 1837217 1.57 1.92
95810170 99951362 1.04 86281319 115306404 1507472 1842313 1.57 1.92
95844079 100066236 1.04 86564448 115555259 1511158 1845661 1.58 1.93
95978588 100029077 1.04 86187531 115292395 1508524 1845525 1.57 1.92
[...]

Server B: TLB miss walks 19.2% (16 percentage points higher)


serverB# ./tlbstat 1
K_CYCLES K_INSTR IPC DTLB_WALKS ITLB_WALKS K_DTLBCYC K_ITLBCYC DTLB% ITLB%
95911236 80317867 0.84 911337888 719553692 10476524 7858141 10.92 8.19
95927861 80503355 0.84 913726197 721751988 10518488 7918261 10.96 8.25
95955825 80533254 0.84 912994135 721492911 10524675 7929216 10.97 8.26
96067221 80443770 0.84 912009660 720027006 10501926 7911546 10.93 8.24
[...]
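The derived columns in the tlbstat output can be reproduced from the raw counters. The sketch below assumes, based only on the numbers shown above, that IPC is K_INSTR / K_CYCLES and that DTLB% and ITLB% are K_DTLBCYC and K_ITLBCYC as a share of K_CYCLES; the parser and class names are illustrative and not part of tlbstat:

```python
from dataclasses import dataclass


@dataclass
class TlbSample:
    k_cycles: int
    k_instr: int
    k_dtlb_cycles: int
    k_itlb_cycles: int

    @property
    def ipc(self) -> float:
        # Instructions per cycle.
        return self.k_instr / self.k_cycles

    @property
    def dtlb_pct(self) -> float:
        # Share of cycles spent in data TLB miss walks.
        return 100.0 * self.k_dtlb_cycles / self.k_cycles

    @property
    def itlb_pct(self) -> float:
        # Share of cycles spent in instruction TLB miss walks.
        return 100.0 * self.k_itlb_cycles / self.k_cycles


def parse_tlbstat_line(line: str) -> TlbSample:
    """Parse one data row in the column order shown in the header above."""
    f = line.split()
    return TlbSample(int(f[0]), int(f[1]), int(f[5]), int(f[6]))


server_a = parse_tlbstat_line(
    "95913667 99982399 1.04 86588626 115441706 1507279 1837217 1.57 1.92")
server_b = parse_tlbstat_line(
    "95911236 80317867 0.84 911337888 719553692 10476524 7858141 10.92 8.19")

for name, s in (("A", server_a), ("B", server_b)):
    print(f"Server {name}: IPC {s.ipc:.2f}, "
          f"TLB miss walks {s.dtlb_pct + s.itlb_pct:.1f}% of cycles")
```

Summing the two percentages gives roughly 3.5% for Server A and 19% for Server B, which is where the headline figures come from.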
http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
Enhanced BPF (Linux 4.*)

also known as just "BPF"

[Diagram: user-defined BPF programs are loaded into the kernel runtime and attached to event targets]
User-defined BPF programs: SDN configuration, DDoS mitigation, intrusion detection, container security, observability, firewalls (bpfilter), device drivers
Kernel runtime: verifier, BPF, BPF actions
Event targets: sockets, kprobes, uprobes, tracepoints, perf_events

eBPF is solving new things: off-CPU + wakeup analysis
eBPF bcc Linux 4.4+

https://github.com/iovisor/bcc
e.g., identify multimodal disk I/O latency and outliers
with bcc/eBPF biolatency
# biolatency -mT 10
Tracing block device I/O... Hit Ctrl-C to end.

19:19:04
msecs : count distribution
0 -> 1 : 238 |********* |
2 -> 3 : 424 |***************** |
4 -> 7 : 834 |********************************* |
8 -> 15 : 506 |******************** |
16 -> 31 : 986 |****************************************|
32 -> 63 : 97 |*** |
64 -> 127 : 7 | |
128 -> 255 : 27 |* |

19:19:14
msecs : count distribution
0 -> 1 : 427 |******************* |
2 -> 3 : 424 |****************** |
[…]
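The histogram rows above are power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7, ...). The sketch below shows that bucketing in plain user-space Python. It is an illustration of the idea only, not how bcc implements it (bcc does the bucketing in kernel context), and the sample latencies are made up:

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Bucket index: 0 covers 0..1, 1 covers 2..3, 2 covers 4..7, and so on."""
    return max(value.bit_length() - 1, 0)


def bucket_range(index: int):
    """Inclusive (low, high) bounds of a bucket."""
    low = 0 if index == 0 else 1 << index
    high = (1 << (index + 1)) - 1
    return low, high


def log2_hist(values):
    """Return [((low, high), count), ...] for every bucket up to the largest one used."""
    counts = Counter(log2_bucket(v) for v in values)
    if not counts:
        return []
    return [(bucket_range(i), counts.get(i, 0)) for i in range(max(counts) + 1)]


def print_hist(rows, label="msecs", width=40):
    """Print rows in a layout similar to the output above."""
    peak = max((count for _, count in rows), default=0) or 1
    print(f"{label:>12} : count distribution")
    for (low, high), count in rows:
        bar = "*" * (count * width // peak)
        print(f"{low:>5} -> {high:<5}: {count:<5} |{bar:<{width}}|")


# Hypothetical latencies in milliseconds, for illustration only.
sample_latencies_ms = [0, 1, 3, 5, 6, 9, 17, 20, 25, 200]
print_hist(log2_hist(sample_latencies_ms))
```

Log-scale buckets keep the table short while still separating the fast mode from slow outliers, which is why a handful of rows is enough to show a multimodal distribution.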
bcc/eBPF programs are laborious: biolatency
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

typedef struct disk_key {
    char disk[DISK_NAME_LEN];
    u64 slot;
} disk_key_t;
BPF_HASH(start, struct request *);
STORAGE

// time block I/O
int trace_req_start(struct pt_regs *ctx, struct request *req)
{
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
    return 0;
}

// output
int trace_req_completion(struct pt_regs *ctx, struct request *req)
{
    u64 *tsp, delta;

    // fetch timestamp and calculate delta
    tsp = start.lookup(&req);
    if (tsp == 0) {
        return 0;   // missed issue
    }
    delta = bpf_ktime_get_ns() - *tsp;
    FACTOR

    // store as histogram
    STORE

    start.delete(&req);
    return 0;
}
"""

# code substitutions
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
    label = "msecs"
else:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
    label = "usecs"
if args.disks:
    bpf_text = bpf_text.replace('STORAGE',
        'BPF_HISTOGRAM(dist, disk_key_t);')
    bpf_text = bpf_text.replace('STORE',
        'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
        'void *__tmp = (void *)req->rq_disk->disk_name; ' +
        'bpf_probe_read(&key.disk, sizeof(key.disk), __tmp); ' +
        'dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
    bpf_text = bpf_text.replace('STORE',
        'dist.increment(bpf_log2l(delta));')
if debug or args.ebpf:
    print(bpf_text)
    if args.ebpf:
        exit()

# load BPF program
b = BPF(text=bpf_text)
if args.queued:
    b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
else:
    b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
    b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
b.attach_kprobe(event="blk_account_io_completion",
    fn_name="trace_req_completion")

print("Tracing block device I/O... Hit Ctrl-C to end.")

# output
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    dist.print_log2_hist(label, "disk")
    dist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()
… rewritten in bpftrace (launched Oct 2018)!
#!/usr/local/bin/bpftrace

BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

kprobe:blk_account_io_start
{
    @start[arg0] = nsecs;
}

kprobe:blk_account_io_completion
/@start[arg0]/
{
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}
eBPF bpftrace (aka BPFtrace) Linux 4.9+

# Syscall count by program
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Read size distribution by process
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'

# Files opened by process
bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm,
    str(args->filename)); }'

# Trace kernel function
bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s", comm); }'

# Trace user-level function
bpftrace -e 'uretprobe:/bin/bash:readline { printf("%s\n", str(retval)); }'


Good for one-liners & short scripts; bcc is good for complex tools

https://github.com/iovisor/bpftrace
bpftrace Internals
eBPF XDP Linux 4.8+

https://www.netronome.com/blog/frnog-30-faster-networking-la-francaise/
eBPF bpfilter Linux 4.18+

ipfwadm (1.2.1)
ipchains (2.2.10)
iptables
nftables (3.13)
bpfilter (4.18+): jit-compiled, NIC offloading

https://lwn.net/Articles/747551/
Linux 4.9
BBR
TCP congestion control algorithm
Bottleneck Bandwidth and RTT
1% packet loss: we see 3x better throughput

https://twitter.com/amernetflix/status/892787364598132736
https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/
https://queue.acm.org/detail.cfm?id=3022184
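On Linux, a congestion control algorithm can be selected per socket with the TCP_CONGESTION socket option. The sketch below shows one way an application might request BBR. It assumes a Linux host where "bbr" is available to the calling process (for unprivileged processes it generally has to be listed in net.ipv4.tcp_allowed_congestion_control); the helper names are illustrative:

```python
import socket

# TCP_CONGESTION is Linux-specific. Python 3.6+ exposes it as socket.TCP_CONGESTION;
# 13 is the Linux value, used here only as a fallback.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)


def try_set_congestion_control(sock: socket.socket, name: str) -> bool:
    """Ask the kernel to use the named algorithm on this socket.

    Returns False if the kernel rejects the request (unknown algorithm,
    not permitted, or option unsupported on this platform).
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, name.encode())
    except OSError:
        return False
    return True


def get_congestion_control(sock: socket.socket) -> str:
    """Read back the algorithm name currently set on the socket."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
    return raw.split(b"\0", 1)[0].decode()


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if try_set_congestion_control(s, "bbr"):
            print("using:", get_congestion_control(s))
        else:
            print("bbr not available; keeping the system default")
```

Setting the option per socket leaves the system-wide default untouched, which can be convenient when comparing algorithms side by side.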
Linux 4.12
Kyber
Multiqueue block I/O scheduler
Tune target read & write latency
Up to 300x lower 99th percentile latencies in our testing

[Diagram: Kyber (simplified): reads (sync) and writes (async) each have their own dispatch queue; completions are used to adjust the queue sizes.]

https://lwn.net/Articles/720675/
Linux 4.17
Hist Triggers
ftrace advanced summaries

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
# trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
[…]
{ stacktrace:
         __kmalloc+0x11b/0x1b0
         seq_buf_alloc+0x1b/0x50
         seq_read+0x2cc/0x370
         proc_reg_read+0x3d/0x80
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x140
         SyS_read+0x46/0xb0
         system_call_fastpath+0x12/0x6a
} hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768

https://www.kernel.org/doc/html/latest/trace/histogram.html
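The "trigger info" line in the output above is the same string that is written to the event's trigger file to enable the histogram. A small sketch of composing that string; the helper name is hypothetical, and actually enabling the trigger means writing the result, as root, to the trigger file next to the hist file shown above:

```python
def build_hist_trigger(keys, vals=(), sort=None, size=None) -> str:
    """Compose a hist trigger string from its colon-separated parts."""
    parts = ["hist", "keys=" + ",".join(keys)]
    if vals:
        parts.append("vals=" + ",".join(vals))
    if sort:
        parts.append("sort=" + sort)
    if size:
        parts.append("size=%d" % size)
    return ":".join(parts)


# Reproduces the trigger shown on the slide.
trigger = build_hist_trigger(
    keys=["stacktrace"],
    vals=["bytes_req", "bytes_alloc"],
    sort="bytes_alloc",
    size=2048,
)
print(trigger)

# To enable it (requires root and a mounted tracing filesystem), something like:
#   with open("/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger", "w") as f:
#       f.write(trigger)
```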
Linux 4.?
PSI (not merged yet)

Pressure Stall Information

More saturation metrics!
/proc/pressure/cpu
/proc/pressure/memory
/proc/pressure/io
10-, 60-, and 300-second averages

[Diagram: The USE Method: for each resource, check Utilization, Saturation, and Errors.]

https://lwn.net/Articles/759781/
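Each /proc/pressure file reports the three averages as key=value pairs. The sketch below assumes the line format from the kernel's PSI documentation ("some avg10=... avg60=... avg300=... total=...") and a kernel that has PSI enabled; the sample text is made up for illustration:

```python
from pathlib import Path


def parse_pressure(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_pressure(resource: str):
    """Read /proc/pressure/<resource>; returns None if PSI is not available."""
    path = Path("/proc/pressure") / resource
    try:
        return parse_pressure(path.read_text())
    except OSError:
        return None


# Hypothetical sample, not taken from a real system.
sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=12345\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
parsed = parse_pressure(sample)
print("memory 'some' pressure over the last 10s:", parsed["some"]["avg10"], "%")
```

A non-zero "some" average means at least one task was stalled waiting on that resource during the window, which is a direct saturation signal.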
More perf 4.4 - 4.19 (2016 - 2018)

● TCP listener lockless (4.4)
● copy_file_range() (4.5)
● madvise() MADV_FREE (4.5)
● epoll multithread scalability (4.5)
● Kernel Connection Multiplexor (4.6)
● Writeback management (4.10)
● Hybrid block polling (4.10)
● BFQ I/O scheduler (4.12)
● Async I/O improvements (4.13)
● In-kernel TLS acceleration (4.13)
● Socket MSG_ZEROCOPY (4.14)
● Asynchronous buffered I/O (4.14)
● Longer-lived TLB entries with PCID (4.14)
● mmap MAP_SYNC (4.15)
● Software-interrupt context hrtimers (4.16)
● Idle loop tick efficiency (4.17)
● perf_event_open() [ku]probes (4.17)
● AF_XDP sockets (4.18)
● Block I/O latency controller (4.19)
● CAKE for bufferbloat (4.19)
● New async I/O polling (4.19)

… and many minor improvements to:
• perf
• CPU scheduling
• futexes
• NUMA
• Huge pages
• Slab allocation
• TCP, UDP
• Drivers
• Processor support
• GPUs
Take Aways
1. Run latest
2. Browse major features
e.g., https://kernelnewbies.org/Linux_4.19
Some Linux perf Resources
- http://www.brendangregg.com/linuxperf.html
- https://kernelnewbies.org/LinuxChanges
- https://lwn.net/Kernel
- https://github.com/iovisor/bcc
- http://blog.stgolabs.net/search/label/linux
- http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html
