DIY CPU profiler: the simplest case of symbolization
TweetThu 23 March 2023
In the BPF maps to pprof post we managed to collect CPU samples and store them in pprof format. Here I would like to explore the simplest case of symbolization and put shared libraries aside.
Symbolization is resolving sampled memory addresses to function names (symbols).
In our case, simply searching for an address in .symtab
ELF section.
What's in the profile?
Let's profile a C program that ensures we'll get lots of CPU samples.
It computes Fibonacci numbers naively (without caching),
so fibNaive
function should dominate the profile.
long fibNaive(long n) {
if (n <= 2) {
return 1;
}
return fibNaive(n-2) + fibNaive(n-1);
}
#include <stdio.h>
// fibNaive calculates a Fibonacci number n,
// e.g., when n=7, the result is 13: 1 1 2 3 5 8 13.
//
// The running time complexity is exponential 2**n,
// the space complexity is n because
// when we descend the binary tree recursively,
// we use number of stacks equal to its height which is n
// (the stack frames n=7->5->3->1).
long fibNaive(long n) {
if (n <= 2) {
return 1;
}
return fibNaive(n-2) + fibNaive(n-1);
}
int main() {
// Calculating 50th Fibonacci number will keep CPU busy
// and give us time to run a CPU profiler.
long n = 50;
long res = fibNaive(n);
printf("Fibonacci number %li: %li\n", n, res);
return 0;
}
﹩ # For reference, I used gcc 11.3.0 and Ubuntu 22.04.
﹩ gcc -Og -fno-pie -no-pie -fcf-protection=none -o fib main.c
﹩ ./fib &
﹩ sudo go run ./cmd/profiler3 -pid=$(pidof fib)
/proc/14548/maps
start=0x401000 limit=0x402000 offset=0x1000 /vagrant/diy-parca-agent/fib
start=0x7f3cd28ac000 limit=0x7f3cd2a41000 offset=0x28000 /usr/lib/x86_64-linux-gnu/libc.so.6
start=0x7f3cd2ab6000 limit=0x7f3cd2ae0000 offset=0x2000 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
start=0x7ffc69d62000 limit=0x7ffc69d64000 offset=0x0 [vdso]
start=0xffffffffff600000 limit=0xffffffffff601000 offset=0x0 [vsyscall]
Waiting for stack traces for 10s...
{PID:14548 UserStackID:535 KernelStackID:-14} seen 138 times
535 [401126]
...
After ten seconds the profiler will terminate leaving us with cpu.pprof
file.
Let's inspect it with
protoc.
﹩ git clone https://github.com/google/pprof.git
﹩ sudo snap install protobuf --classic
﹩ zcat cpu.pprof | \
protoc --decode perftools.profiles.Profile ./pprof/proto/profile.proto
...
sample {
location_id: 10
value: 138
}
...
mapping {
id: 1
memory_start: 4198400
memory_limit: 4202496
file_offset: 4096
filename: 3
}
...
location {
id: 10
mapping_id: 1
address: 4198694
}
...
string_table: ""
string_table: "samples"
string_table: "count"
string_table: "/vagrant/diy-parca-agent/fib"
string_table: "/usr/lib/x86_64-linux-gnu/libc.so.6"
string_table: "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2"
string_table: "[vdso]"
string_table: "[vsyscall]"
sample_type {
type: 1
unit: 2
}
sample {
location_id: 1
value: 18
}
sample {
location_id: 2
value: 27
}
sample {
location_id: 3
value: 7
}
sample {
location_id: 4
value: 286
}
sample {
location_id: 5
value: 1
}
sample {
location_id: 6
value: 111
}
sample {
location_id: 7
value: 7
}
sample {
location_id: 8
value: 79
}
sample {
location_id: 5
value: 2
}
sample {
location_id: 9
value: 29
}
sample {
location_id: 10
value: 138
}
sample {
location_id: 11
value: 1
}
sample {
location_id: 11
value: 36
}
sample {
location_id: 12
value: 34
}
sample {
location_id: 10
value: 2
}
sample {
location_id: 13
value: 1
}
sample {
location_id: 14
value: 22
}
sample {
location_id: 12
value: 1
}
sample {
location_id: 15
value: 37
}
sample {
location_id: 10
value: 1
}
sample {
location_id: 5
value: 105
}
sample {
location_id: 16
value: 54
}
mapping {
id: 1
memory_start: 4198400
memory_limit: 4202496
file_offset: 4096
filename: 3
}
mapping {
id: 2
memory_start: 139899207073792
memory_limit: 139899208732672
file_offset: 163840
filename: 4
}
mapping {
id: 3
memory_start: 139899209211904
memory_limit: 139899209383936
file_offset: 8192
filename: 5
}
mapping {
id: 4
memory_start: 140722084126720
memory_limit: 140722084134912
filename: 6
}
mapping {
id: 5
memory_start: 18446744073699065856
memory_limit: 18446744073699069952
filename: 7
}
location {
id: 1
mapping_id: 1
address: 4198715
}
location {
id: 2
mapping_id: 1
address: 4198705
}
location {
id: 3
mapping_id: 1
address: 4198700
}
location {
id: 4
mapping_id: 1
address: 4198724
}
location {
id: 5
mapping_id: 1
address: 4198706
}
location {
id: 6
mapping_id: 1
address: 4198708
}
location {
id: 7
mapping_id: 1
address: 4198727
}
location {
id: 8
mapping_id: 1
address: 4198736
}
location {
id: 9
mapping_id: 1
address: 4198745
}
location {
id: 10
mapping_id: 1
address: 4198694
}
location {
id: 11
mapping_id: 1
address: 4198712
}
location {
id: 12
mapping_id: 1
address: 4198707
}
location {
id: 13
mapping_id: 1
address: 4198739
}
location {
id: 14
mapping_id: 1
address: 4198743
}
location {
id: 15
mapping_id: 1
address: 4198744
}
location {
id: 16
mapping_id: 1
address: 4198719
}
string_table: ""
string_table: "samples"
string_table: "count"
string_table: "/vagrant/diy-parca-agent/fib"
string_table: "/usr/lib/x86_64-linux-gnu/libc.so.6"
string_table: "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2"
string_table: "[vdso]"
string_table: "[vsyscall]"
string_table: "cpu"
string_table: "nanoseconds"
time_nanos: 1679419143530427477
duration_nanos: 10000000000
period_type {
type: 8
unit: 9
}
period: 10000000
The example above shows a pprof sample that tells us
that some function (its memory address: 4198694
) has been seen 138 times in stack traces.
That function is located in the mapping id 1 according to location id 10,
somewhere between memory_start: 4198400
and memory_limit: 4202496
addresses.
The mapping points to file_offset: 4096
within some file whose path is defined by filename: 3
.
I assume it's the third value in the pprof string table counting from zero:
string_table: "/vagrant/diy-parca-agent/fib"
.
What does it all mean?
If we recall from Linux process post,
Linux organizes the virtual memory of a process as a collection of areas (segments)
where an area is a chunk of related virtual pages.
An area can be associated (mapped) with a contiguous chunk of a file,
in this case it is fib
executable object file starting at file offset 4096.
Let's look at the file's segments and find the one mentioned in the sample.
﹩ readelf --segments --wide fib
Elf file type is EXEC (Executable file)
Entry point 0x401040
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0002d8 0x0002d8 R 0x8
INTERP 0x000318 0x0000000000400318 0x0000000000400318 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0004f0 0x0004f0 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x00019d 0x00019d R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000104 0x000104 R 0x1000
LOAD 0x002e10 0x0000000000403e10 0x0000000000403e10 0x000220 0x000228 RW 0x1000
DYNAMIC 0x002e20 0x0000000000403e20 0x0000000000403e20 0x0001d0 0x0001d0 RW 0x8
NOTE 0x000338 0x0000000000400338 0x0000000000400338 0x000020 0x000020 R 0x8
NOTE 0x000358 0x0000000000400358 0x0000000000400358 0x000044 0x000044 R 0x4
GNU_PROPERTY 0x000338 0x0000000000400338 0x0000000000400338 0x000020 0x000020 R 0x8
GNU_EH_FRAME 0x002020 0x0000000000402020 0x0000000000402020 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002e10 0x0000000000403e10 0x0000000000403e10 0x0001f0 0x0001f0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .dynamic .got
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
...
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x00019d 0x00019d R E 0x1000
...
Section to Segment mapping:
Segment Sections...
...
03 .init .plt .text .fini
...
As we can see the ELF program header describes
Flg=R E
readable and executable segment
loaded from Offset=0x001000
(that's hex of file_offset: 4096
) within the file
into memory starting at 0x0000000000401000
virtual address (that's hex of memory_start: 4198400
).
According to FileSiz=0x00019d
only 413 bytes are loaded from the file.
Given the alignment requirement Align=0x1000
,
the memory area ends at 0x0000000000402000
virtual address (that's hex of memory_limit: 4202496
).
So the area consists of a single page, i.e., 4096 bytes = memory_limit - memory_start.
Here is how a process's virtual address space looks when the executable file is mapped.
Executable object file Virtual address space
_______ _______
| | 0x3def | | N
| | | |
| ... | 0x2000 | ... | 0x402000
|-------| -----------> vm_end |-------|
| .fini | 0x1190 | .fini | 0x401190
| .text | 0x1040 | .text | 0x401040
| .plt | 0x1020 | .plt | 0x401020
| .init | 0x1000 | .init | 0x401000
|-------| ---------> vm_start |-------|
| ... | | ... |
|_______| 0 |_______| 0
We can also see which ELF sections are placed within the segment: .init .plt .text .fini
.
Our function in question should be somewhere in .text
section,
because it's where machine code of the compiled program lives.
﹩ readelf --sections --wide fib
There are 30 section headers, starting at offset 0x3670:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 0000000000400318 000318 00001c 00 A 0 0 1
[ 2] .note.gnu.property NOTE 0000000000400338 000338 000020 00 A 0 0 8
[ 3] .note.gnu.build-id NOTE 0000000000400358 000358 000024 00 A 0 0 4
[ 4] .note.ABI-tag NOTE 000000000040037c 00037c 000020 00 A 0 0 4
[ 5] .gnu.hash GNU_HASH 00000000004003a0 0003a0 00001c 00 A 6 0 8
[ 6] .dynsym DYNSYM 00000000004003c0 0003c0 000060 18 A 7 1 8
[ 7] .dynstr STRTAB 0000000000400420 000420 000050 00 A 0 0 1
[ 8] .gnu.version VERSYM 0000000000400470 000470 000008 02 A 6 0 2
[ 9] .gnu.version_r VERNEED 0000000000400478 000478 000030 00 A 7 1 8
[10] .rela.dyn RELA 00000000004004a8 0004a8 000030 18 A 6 0 8
[11] .rela.plt RELA 00000000004004d8 0004d8 000018 18 AI 6 23 8
[12] .init PROGBITS 0000000000401000 001000 00001b 00 AX 0 0 4
[13] .plt PROGBITS 0000000000401020 001020 000020 10 AX 0 0 16
[14] .text PROGBITS 0000000000401040 001040 00014e 00 AX 0 0 16
[15] .fini PROGBITS 0000000000401190 001190 00000d 00 AX 0 0 4
[16] .rodata PROGBITS 0000000000402000 002000 00001f 00 A 0 0 4
[17] .eh_frame_hdr PROGBITS 0000000000402020 002020 000034 00 A 0 0 4
[18] .eh_frame PROGBITS 0000000000402058 002058 0000ac 00 A 0 0 8
[19] .init_array INIT_ARRAY 0000000000403e10 002e10 000008 08 WA 0 0 8
[20] .fini_array FINI_ARRAY 0000000000403e18 002e18 000008 08 WA 0 0 8
[21] .dynamic DYNAMIC 0000000000403e20 002e20 0001d0 10 WA 7 0 8
[22] .got PROGBITS 0000000000403ff0 002ff0 000010 08 WA 0 0 8
[23] .got.plt PROGBITS 0000000000404000 003000 000020 08 WA 0 0 8
[24] .data PROGBITS 0000000000404020 003020 000010 00 WA 0 0 8
[25] .bss NOBITS 0000000000404030 003030 000008 00 WA 0 0 1
[26] .comment PROGBITS 0000000000000000 003030 00002b 01 MS 0 0 1
[27] .symtab SYMTAB 0000000000000000 003060 000348 18 28 18 8
[28] .strtab STRTAB 0000000000000000 0033a8 0001b1 00 0 0 1
[29] .shstrtab STRTAB 0000000000000000 003559 000116 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
...
[12] .init PROGBITS 0000000000401000 001000 00001b 00 AX 0 0 4
[13] .plt PROGBITS 0000000000401020 001020 000020 10 AX 0 0 16
[14] .text PROGBITS 0000000000401040 001040 00014e 00 AX 0 0 16
[15] .fini PROGBITS 0000000000401190 001190 00000d 00 AX 0 0 4
...
Symbol table
Information about functions and global variables (aka symbols)
defined and referenced in our program lives in a symbol table in .symtab
section.
An object file's symbol table holds information needed to locate and relocate a program's symbolic definitions and references. https://www.man7.org/linux/man-pages/man5/elf.5.html
This implies that we should be able to find
which symbol corresponds to memory address: 4198694
or 0x401126
in hexadecimal form.
Let's inspect the symbol table.
﹩ readelf --symbols fib
Symbol table '.dynsym' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@GLIBC_2.34 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.3.4 (3)
Symbol table '.symtab' contains 35 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS crt1.o
2: 000000000040037c 32 OBJECT LOCAL DEFAULT 4 __abi_tag
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
4: 0000000000401080 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones
5: 00000000004010b0 0 FUNC LOCAL DEFAULT 14 register_tm_clones
6: 00000000004010f0 0 FUNC LOCAL DEFAULT 14 __do_global_dtors_aux
7: 0000000000404030 1 OBJECT LOCAL DEFAULT 25 completed.0
8: 0000000000403e18 0 OBJECT LOCAL DEFAULT 20 __do_global_dtor[...]
9: 0000000000401120 0 FUNC LOCAL DEFAULT 14 frame_dummy
10: 0000000000403e10 0 OBJECT LOCAL DEFAULT 19 __frame_dummy_in[...]
11: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
12: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
13: 0000000000402100 0 OBJECT LOCAL DEFAULT 18 __FRAME_END__
14: 0000000000000000 0 FILE LOCAL DEFAULT ABS
15: 0000000000403e20 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
16: 0000000000402020 0 NOTYPE LOCAL DEFAULT 17 __GNU_EH_FRAME_HDR
17: 0000000000404000 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_
18: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_mai[...]
19: 0000000000404020 0 NOTYPE WEAK DEFAULT 24 data_start
20: 0000000000404030 0 NOTYPE GLOBAL DEFAULT 24 _edata
21: 0000000000401190 0 FUNC GLOBAL HIDDEN 15 _fini
22: 0000000000404020 0 NOTYPE GLOBAL DEFAULT 24 __data_start
23: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
24: 0000000000404028 0 OBJECT GLOBAL HIDDEN 24 __dso_handle
25: 0000000000402000 4 OBJECT GLOBAL DEFAULT 16 _IO_stdin_used
26: 0000000000404038 0 NOTYPE GLOBAL DEFAULT 25 _end
27: 0000000000401070 5 FUNC GLOBAL HIDDEN 14 _dl_relocate_sta[...]
28: 0000000000401040 38 FUNC GLOBAL DEFAULT 14 _start
29: 0000000000404030 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
30: 000000000040115a 52 FUNC GLOBAL DEFAULT 14 main
31: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __printf_chk@GLI[...]
32: 0000000000404030 0 OBJECT GLOBAL HIDDEN 24 __TMC_END__
33: 0000000000401000 0 FUNC GLOBAL HIDDEN 12 _init
34: 0000000000401126 52 FUNC GLOBAL DEFAULT 14 fibNaive
Symbol table '.symtab' contains 35 entries:
Num: Value Size Type Bind Vis Ndx Name
...
34: 0000000000401126 52 FUNC GLOBAL DEFAULT 14 fibNaive
So here you go, fibNaive C function we were looking for:
Value=0000000000401126
The symbol's absolute run-time address since it's an executable object file. For relocatable object files, it would be an offset from the beginning of the section defined byNdx=14
.Size=52
The size of the function's code is 52 bytes.Type=FUNC
The symbol is associated with a function.Bind=GLOBAL
This indicates that the symbol is global, i.e., it can be referenced by other modules.Vis=DEFAULT
Default symbol visibility rules apply: strong (functions and initialized global vars) and weak symbols (uninitialized global vars) are available to other modules.Ndx=14
A symbol is "defined" in relation to.text
section according to section header table index, seereadelf --sections fib
.Name=fibNaive
The symbol namefibNaive
was taken from string table in section.strtab
, seereadelf --string-dump .strtab fib
.
😌
Great, let's check the remaining memory addresses we've got in the pprof file
(I've sorted them for our convenience).
Unfortunately, I couldn't find corresponding symbols in .symtab
section.
4198694 = 0x401126 fibNaive
4198700 = 0x40112c ?
4198705 = 0x401131 ?
4198706 = 0x401132 ?
4198707 = 0x401133 ?
4198708 = 0x401134 ?
4198712 = 0x401138 ?
4198715 = 0x40113b ?
4198719 = 0x40113f ?
4198724 = 0x401144 ?
4198727 = 0x401147 ?
4198736 = 0x401150 ?
4198739 = 0x401153 ?
4198743 = 0x401157 ?
4198744 = 0x401158 ?
4198745 = 0x401159 ?
What's interesting is that these addresses are within Size=52
bytes
from fibNaive
memory address, e.g.,
the distance to the highest address is 51 bytes = 4198745 - 4198694.
If we sort the symbols from .symtab
section by Value
,
we'll see that fibNaive
is between frame_dummy
and main
symbols.
The rest of unresolved memory addresses from the samples
fit in between fibNaive
and main
symbols.
0000000 = 0x000000 crt1.o
0000000 = 0x000000 crtstuff.c
0000000 = 0x000000 main.c
0000000 = 0x000000 crtstuff.c
0000000 = 0x000000
0000000 = 0x000000 __libc_start_main@GLIBC_2.34
0000000 = 0x000000 __gmon_start__
0000000 = 0x000000 __printf_chk@GLIBC_2.3.4
4195196 = 0x40037c __abi_tag
4198400 = 0x401000 _init
4198464 = 0x401040 _start
4198512 = 0x401070 _dl_relocate_static_pie
4198528 = 0x401080 deregister_tm_clones
4198576 = 0x4010b0 register_tm_clones
4198640 = 0x4010f0 __do_global_dtors_aux
4198688 = 0x401120 frame_dummy
4198694 = 0x401126 fibNaive
4198746 = 0x40115a main
4198800 = 0x401190 _fini
4202496 = 0x402000 _IO_stdin_used
4202528 = 0x402020 __GNU_EH_FRAME_HDR
4202752 = 0x402100 __FRAME_END__
4210192 = 0x403e10 __frame_dummy_init_array_entry
4210200 = 0x403e18 __do_global_dtors_aux_fini_array_entry
4210208 = 0x403e20 _DYNAMIC
4210688 = 0x404000 _GLOBAL_OFFSET_TABLE_
4210720 = 0x404020 data_start
4210720 = 0x404020 __data_start
4210728 = 0x404028 __dso_handle
4210736 = 0x404030 completed.0
4210736 = 0x404030 _edata
4210736 = 0x404030 __bss_start
4210736 = 0x404030 __TMC_END__
4210744 = 0x404038 _end
0x401120 frame_dummy
0x401126 fibNaive
0x40112c ?
0x401131 ?
0x401132 ?
0x401133 ?
0x401134 ?
0x401138 ?
0x40113b ?
0x40113f ?
0x401144 ?
0x401147 ?
0x401150 ?
0x401153 ?
0x401157 ?
0x401158 ?
0x401159 ?
0x40115a main
Disassembling the program confirms that memory addresses in question
correspond to instructions of fibNaive
function.
﹩ objdump --disassemble fib
...
0000000000401120 <frame_dummy>:
401120: f3 0f 1e fa endbr64
401124: eb 8a jmp 4010b0 <register_tm_clones>
0000000000401126 <fibNaive>:
401126: 48 83 ff 02 cmp $0x2,%rdi
40112a: 7f 06 jg 401132 <fibNaive+0xc>
40112c: b8 01 00 00 00 mov $0x1,%eax
401131: c3 ret
401132: 55 push %rbp
401133: 53 push %rbx
401134: 48 83 ec 08 sub $0x8,%rsp
401138: 48 89 fb mov %rdi,%rbx
40113b: 48 8d 7f fe lea -0x2(%rdi),%rdi
40113f: e8 e2 ff ff ff call 401126 <fibNaive>
401144: 48 89 c5 mov %rax,%rbp
401147: 48 8d 7b ff lea -0x1(%rbx),%rdi
40114b: e8 d6 ff ff ff call 401126 <fibNaive>
401150: 48 01 e8 add %rbp,%rax
401153: 48 83 c4 08 add $0x8,%rsp
401157: 5b pop %rbx
401158: 5d pop %rbp
401159: c3 ret
000000000040115a <main>:
40115a: 48 83 ec 08 sub $0x8,%rsp
40115e: bf 32 00 00 00 mov $0x32,%edi
401163: e8 be ff ff ff call 401126 <fibNaive>
401168: 48 89 c1 mov %rax,%rcx
40116b: ba 32 00 00 00 mov $0x32,%edx
401170: be 04 20 40 00 mov $0x402004,%esi
401175: bf 01 00 00 00 mov $0x1,%edi
40117a: b8 00 00 00 00 mov $0x0,%eax
40117f: e8 ac fe ff ff call 401030 <__printf_chk@plt>
401184: b8 00 00 00 00 mov $0x0,%eax
401189: 48 83 c4 08 add $0x8,%rsp
40118d: c3 ret
...
0000000000401126 <fibNaive>:
401126: 48 83 ff 02 cmp $0x2,%rdi
40112a: 7f 06 jg 401132 <fibNaive+0xc>
40112c: b8 01 00 00 00 mov $0x1,%eax
401131: c3 ret
401132: 55 push %rbp
401133: 53 push %rbx
401134: 48 83 ec 08 sub $0x8,%rsp
401138: 48 89 fb mov %rdi,%rbx
40113b: 48 8d 7f fe lea -0x2(%rdi),%rdi
40113f: e8 e2 ff ff ff call 401126 <fibNaive>
401144: 48 89 c5 mov %rax,%rbp
401147: 48 8d 7b ff lea -0x1(%rbx),%rdi
40114b: e8 d6 ff ff ff call 401126 <fibNaive>
401150: 48 01 e8 add %rbp,%rax
401153: 48 83 c4 08 add $0x8,%rsp
401157: 5b pop %rbx
401158: 5d pop %rbp
401159: c3 ret
It means that bpf_get_stackid
we used in
profiler
collected values stored in rip
register,
i.e., memory addresses of machine instructions.
If we compiled the program with frame pointers,
bpf_get_stackid
would have additionally unwound a stack which
could look like [401126, ..., 40115a]
(fibNaive, ..., main)
instead of [401126]
when frame pointers are disabled (only rip
value is stored),
see Javier's explanation.
PC to symbol address
Since we have sorted symbol addresses from .symtab
,
we can binary search them to find
whose instructions were executing when a sample was taken.
0x401120 frame_dummy
0x401126 fibNaive
0x40112c ?
0x40115a main
When the program counter (PC) contains instruction address 0x401126
,
then fibNaive
function symbol is found in the symbol table.
When PC points to 0x40112c
instruction,
there is no match in the symbol table,
so the main
symbol's predecessor is resolved
because binary search stops at 0x40115a
.
func PC2SymbolAddress(symbols []int, pc int) int {
lft := 0
var mid int
rgt := len(symbols) - 1
for lft <= rgt {
mid = lft + (rgt-lft)/2
symAddr := symbols[mid]
switch {
case symAddr == pc:
return symAddr
case symAddr < pc:
lft = mid + 1
case symAddr > pc:
rgt = mid - 1
}
}
// Since pc wasn't found in the array,
// now mid points to symbol address > pc, i.e., 0x40115a.
//
// 0x401120 frame_dummy
// 0x401126 fibNaive
// 0x40112c ?
// 0x40115a main
//
// Therefore the desired symbol's address is 0x401126.
if mid >= 1 {
return symbols[mid-1]
}
return -1
}
In reality symbolization is much more complicated. Most likely you would have to resolve symbols in shared libraries, and also deal with things like:
- DWARF-based Stack Walking Using eBPF.
Alas, frame pointers are disabled by default in gcc,
but you can enable them with the
-fno-omit-frame-pointer
flag. - Looking for symbols formatted in DWARF
which could be part of an executable file or distributed separately,
see Fantastic Symbols and Where to Find Them.
The
-g
gcc flag produces debugging information in DWARF format.
When profiling JIT-compiled code you would have to
look for symbols in /tmp/perf-$PID.map
file, see
Fantastic Symbols and Where to Find Them - Part 2.
Category: Profiler Tagged: bpf profiling