-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Fix docker stats parsing with large amount of interrupts #49734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix docker stats parsing with large amount of interrupts #49734
Conversation
vvoland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I left some comments.
Also, could you please add unit tests that test this function against the example file contents we provided?
daemon/stats_unix.go
Outdated
| rdr := bufio.NewReader(f) | ||
|
|
||
| for { | ||
| data, isPrefix, err := rdr.ReadLine() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isPrefix is a bit weird name - looking at the code it isn't exactly obvious what it actually is.
Perhaps we could rename it to:
| data, isPrefix, err := rdr.ReadLine() | |
| data, tooLong, err := rdr.ReadLine() |
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's its name in https://pkg.go.dev/bufio#Reader.ReadLine
But, it'll be set for each partial read - then not-set on the final read of the long line. The data from that final read needs to be discarded too.
It'd be best to discard long lines before converting the bytes to a string, there's no need to check whether a line begins with "cpu" if the line's too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's its name in https://pkg.go.dev/bufio#Reader.ReadLine
But, it'll be set for each partial read - then not-set on the final read of the long line. The data from that final read needs to be discarded too.
It'd be best to discard long lines before converting the bytes to a string, there's no need to check whether a line begins with "cpu" if the line's too long.
Good catch! I've separated the condition into two and skipping any further processing if partial read is true. The only trouble happens a bit with the last read... Didn't know it works this way.
Maybe it is alright it's converted into string and then discarded in the other condition that checks for cpu string? Since it's only one line with small size. Otherwise I'd have to probably add some kind of helper variable or nested loop that'd add more complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I'm sorry - I completely misread it, and missed the assumption that cpu lines are first in the file. Thank you for moving the string(data), I think that's slightly better - but not the issue I first thought! So, I think it can just be:
// Assume all cpu* records are at the start of the file, like glibc:
// https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/getsysstats.c#L108-L135
if isPartial || len(data) < 4 {
break
}
line := string(data)
if line[:3] != "cpu" {
break
}(Please do squash the commits though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed!
I think I've messed up squashing the commits.
Yes, I'll try my best to create the unit tests... Never did it so it takes me a bit to figure out how to do them properly... :) |
84379a3 to
c294f39
Compare
I've added the unit test. Please let me know if any improvements are needed. I was also thinking if we should add more unit tests if the /proc/stat structure is different. But then I thought that may never happens i guess? |
robmry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thank you!
It wouldn't do any harm to test the error paths - but as it is, the change is already an improvement. |
c294f39 to
302bf7c
Compare
|
(Rebased to sort out the validate-vendor test.) |
|
Oh, for the Windows unit tests, |
vvoland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And a couple of non-blocking suggestions 😄
Not really blockers though, so can be left for a follow up.
| // information. | ||
| func getSystemCPUUsage() (cpuUsage uint64, cpuNum uint32, _ error) { | ||
| f, err := os.Open("/proc/stat") | ||
| f, err := os.Open(procStatPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could split this into two functions:
func getSystemCPUUsage() (cpuUsage uint64, cpuNum uint32, _ error) {
f, err := os.Open("/proc/stat")
if err != nil {
return 0, 0, err
}
defer f.Close()
return readSystemCPUUsage(f)
}func readSystemCPUUsage(r io.Reader) (cpuUsage uint64, cpuNum uint32, _ error) {
rdr := bufio.NewReaderSize(r, 1024)
...
}| dummyFilePath := filepath.Join("testdata", "stat") | ||
| expectedCpuUsage := uint64(65647090000000) | ||
| expectedCpuNum := uint32(128) | ||
|
|
||
| origStatPath := procStatPath | ||
| procStatPath = dummyFilePath | ||
| defer func() { procStatPath = origStatPath }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way we can avoid replacing the file path and inline the testdata in the test:
| dummyFilePath := filepath.Join("testdata", "stat") | |
| expectedCpuUsage := uint64(65647090000000) | |
| expectedCpuNum := uint32(128) | |
| origStatPath := procStatPath | |
| procStatPath = dummyFilePath | |
| defer func() { procStatPath = origStatPath }() | |
| input := strings.NewReader(``` | |
| <stat content> | |
| ```) |
| _, err := os.Stat(dummyFilePath) | ||
| assert.NilError(t, err) | ||
|
|
||
| cpuUsage, cpuNum, err := getSystemCPUUsage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then test the function that allows passing the file content directly:
| cpuUsage, cpuNum, err := getSystemCPUUsage() | |
| cpuUsage, cpuNum, err := readSystemCPUUsage(input) |
| assert.Equal(t, cpuUsage, expectedCpuUsage) | ||
| assert.Equal(t, cpuNum, expectedCpuNum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can inline expectedCpuUsage and expectedCpuNum and use cmp assertions:
| assert.Equal(t, cpuUsage, expectedCpuUsage) | |
| assert.Equal(t, cpuNum, expectedCpuNum) | |
| assert.Check(t, is.Equal(cpuUsage, uint64(65647090000000))) | |
| assert.Check(t, is.Equal(cpuNum, uint32(128))) |
fa20c22 to
5c0c8ce
Compare
This fix address issues where the scanner was unable to properly parse longer outputs from /proc/stat. This could happen on an ARM machine with large amount of CPU cores (and interrupts). By switching to reader we have more control over data parsing and dump unnecessary data Signed-off-by: Patrik Leifert <[email protected]>
5c0c8ce to
e22d04e
Compare
@robmry @vvoland fixed. I think I've messed up your previous rebase, so you might need to do it again... I'm still fighting with it a bit sometimes 🙈 |
|
We can ignore the failed unit tests - |
vvoland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Fixes #49709
- What I did
I've fixed a bug where
docker statswas unable to collect stats on a machine with large amount of CPU cores and interrupts.- How I did
I've changed scanner for reader to allow larger lines parsing (default 64kB scanner buffer) from /proc/stat. Scanner has of course option to increase the internal buffer size, but we wanted to avoid increasing the buffer and wasting memory just for a lines that are discarded anyway.
Another issue was if scanner threw an error, there's no way how to read at least the part that fits into the buffer, so it wasn't possisble to determine where the error exactly happened.
ReadLine function has a flag isPrefix that is set to true if the line is longer and doesn't fit into internal buffer. Based on that we can skip further processing.
- How to verify it
On a machine with large amount of interrupts (where the line exceeds 64kB in /proc/stat) run command
docker stats.- A picture of a cute animal (not mandatory but encouraged)