This is a write-up of a bug in Undo’s software replay technology that I diagnosed and fixed. I’m writing it up while the details are still fairly clear in my memory and I have access to the relevant debugging transcripts.
Some background first. Undo’s product LiveRecorder is a software record-and-replay technology. You run your buggy program under LiveRecorder and get a recording of its execution, which you can then replay in UDB (the Undo debugger), allowing you to run it forwards and backwards and jump to any point in the execution history to examine the registers and memory at that point. If the program crashes on an incorrect value, you can set a watchpoint on the location of the value and run the program backwards to see where the incorrect value came from. The technology relies on the fact that most computation is deterministic: that is, the outcome only depends on the state of the program’s registers and memory. This means that the recording doesn’t need to save all the details of the program’s execution, but only the nondeterministic operations: that is, the results of system calls and a few special instructions like RDTSC. Other details can be re-computed at replay time, keeping the recording size manageable.
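To make the record/replay idea concrete, here is a minimal sketch in C, using a hypothetical event_log interface; this is not LiveRecorder's actual code. Deterministic computation is simply re-executed at replay time, and only nondeterministic results are saved and later substituted.

/*
 * Minimal sketch of the record/replay idea (hypothetical event_log
 * interface; not LiveRecorder's actual code).
 */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

extern bool recording;                        /* true = record, false = replay */
extern void event_log_append(uint64_t value); /* hypothetical: save to the recording */
extern uint64_t event_log_next(void);         /* hypothetical: read from the recording */

/* A nondeterministic "instruction": the current time in nanoseconds. */
uint64_t nondeterministic_now(void)
{
    if (recording) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        uint64_t now = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
        event_log_append(now);   /* record the result */
        return now;
    }
    return event_log_next();     /* replay: substitute the recorded result */
}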
This particular bug was provoked when I pushed a small commit removing some long-dead code. This caused a backwards-compatibility test to fail in the overnight run, and since my commit was “responsible” for the failure (as determined by git bisect), it fell to me to fix the problem. This put me in a bad mood, because I was certain that there was nothing wrong with my commit, and that what was going on was that removing the dead code had perturbed the code layout, causing some previously latent bug to spring into life. There’s little that’s more annoying than to be blamed for something that isn’t your fault. But afterwards I was glad I worked on it.
What was the failure? The backwards-compatibility test took a recording made with LiveRecorder version 5.3, and replayed it on the latest version of UDB (which at the time was 6.6). When the recording was replayed, UDB crashed with this message:
**************************************************************
Fatal error: xsave_type != 0xcafedead && (xsave_type & ~(~0x2f)) == 0
Additional information: B-code has not set up instr_ab_to_nonab_vars.xsave.type correctly: cafedead
Location: src/engine_impl/events/event_x86.inc.c:185:event_xsave_replay_sanity_checks
[10635:198932]
**************************************************************
What this means is that there was a discrepancy between the execution sequence of the program when it was recorded and the execution sequence when it was replayed. When recorded, the program had executed an XSAVE
instruction, but at the corresponding point in replay the program executed a JLE
instead (see the disassembly below). Emulating the XSAVE
instruction should have set the xsave_type
variable to an appropriate value, but it remained at its default value of 0xcafedead. The transcript below demonstrates how to reproduce the bug in UDB after the recording has been loaded. I used the ugo time
command to jump to just before the crash, and the info events
command to query the non-deterministic events around this time (you can see that the recording thinks it is about to execute an XSAVE
event). I used the step-instruction
command (si
for short) to step over the instructions, demonstrating that the crash occurs on the JLE at 0x2b033904e0a9.
start 1> ugo time 26863981
0x00002b033904e0a0 in ?? ()
98% 26,863,981> info events -a -200 -b +1
time=26,863,832:0xffffffffffffffff: write. result=0x69 size=80.
time=26,863,981:0xffffffffffffffff: XSAVE. size=96.
98% 26,863,981> disas $pc,$pc+11
Dump of assembler code from 0x2b033904e0a0 to 0x2b033904e0ab:
=> 0x00002b033904e0a0 <_IO_file_write@@GLIBC_2.2.5+32>:  sub    %rax,%rbx
   0x00002b033904e0a3 <_IO_file_write@@GLIBC_2.2.5+35>:  add    %rax,%rbp
   0x00002b033904e0a6 <_IO_file_write@@GLIBC_2.2.5+38>:  test   %rbx,%rbx
   0x00002b033904e0a9 <_IO_file_write@@GLIBC_2.2.5+41>:  jle    0x2b033904e100 <_IO_file_write@@GLIBC_2.2.5+128>
End of assembler dump.
98% 26,863,981> si 3
0x00002b033904e0a9 in _IO_file_write@@GLIBC_2.2.5 () from /tmp/undodb.11859.1637847800.0086906.848e573eb2bbc4f3/debuggee-1-9wz3qm99/symbol-files/lib64/libc.so.6
98% 26,863,981> si
**************************************************************
Fatal error: xsave_type != 0xcafedead && (xsave_type & ~(~0x2f)) == 0
Additional information: B-code has not set up instr_ab_to_nonab_vars.xsave.type correctly: cafedead
Location: src/engine_impl/events/event_x86.inc.c:185:event_xsave_replay_sanity_checks
[11904:198932]
**************************************************************
We call this kind of bug, where replay behaviour does not match record behaviour, a “non-determinism” bug, because the usual cause is that we failed to capture some non-deterministic feature of the system at record time, and so when we replay the program it takes a different execution path which we eventually discover through failure to execute the expected instruction. However, it seemed doubtful that this was the cause in this case, because the recording replayed correctly in UDB version 6.5.
To find the point where the execution paths diverged, I set up two instances of UDB replaying the same recording, one with version 6.5 (which I knew could replay the recording correctly) and one with 6.6 (which could not). Then I made both instances go to the same time in execution history and compared the execution state (registers and memory) between the two. I did a binary search to find the point where the replay of the recording diverged: this was a comparison instruction whose outcome differed between 6.5 and 6.6. The comparison was with the XMM0 register. Here’s the value of the XMM0 register in 6.6:
98% 26,861,822> p/x $xmm0.uint128
$3 = 0x0
and here’s the same register at the same point in 6.5:
98% 26,861,822> p/x $xmm0.uint128
$3 = 0x3f618961b5740d10
So in 6.6 the XMM0
register was somehow incorrectly cleared. I traced this back to the preceding XRSTOR
instruction:
98% 26,860,035> ugo time 26,860,035:0x2b0335972bf4
0x00002b0335972bf4 in _dl_runtime_resolve_xsavec () from /tmp/undodb.15785.1637859117.4496183.6b6e82dbc96fddf0/debuggee-1-03wan4qf/symbol-files/lib64/ld-linux-x86-64.so.2
98% 26,860,035> disas $pc,$pc+1
Dump of assembler code from 0x2b0335972bf4 to 0x2b0335972bf5:
=> 0x00002b0335972bf4 <_dl_runtime_resolve_xsavec+132>:  xrstor 0x40(%rsp)
End of assembler dump.
98% 26,860,035> p/x $xmm0.uint128
$22 = 0x3f618961b5740d10
98% 26,860,035> si
0x00002b0335972bf9 in _dl_runtime_resolve_xsavec () from /tmp/undodb.15785.1637859117.4496183.6b6e82dbc96fddf0/debuggee-1-03wan4qf/symbol-files/lib64/ld-linux-x86-64.so.2
98% 26,860,035> p/x $xmm0.uint128
$23 = 0x0
In 6.6 this XRSTOR
instruction cleared the XMM0
register but in 6.5 it left XMM0
unchanged (by restoring the same value). What was going on here? On x86-64, the XRSTOR
instruction reverses the effect of the XSAVE
instruction: XSAVE
copies the processor state (or a subset thereof, depending on the flags in the RAX
register) to memory, and XRSTOR
restores processor state from memory. (You can read the gory details in the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2: Instruction Set Reference.) The operation of the XRSTOR
instruction is controlled by the XSTATE_BV
field, at offset 512 from the base of the XSAVE
region. This is a bit-vector describing the data stored in the XSAVE
region. It is intersected with the requested-feature bitmap in the RAX
register to determine what parts of the processor state get restored.
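As a rough illustration of this mechanism, here is a simplified model in C of how XRSTOR handles the SSE state component (the XMM registers). This is a sketch for exposition, not a faithful emulation of the instruction; the offsets are the ones discussed in this section.

/*
 * Simplified model of how XRSTOR uses the XSTATE_BV field, shown only
 * for the SSE component (the XMM registers). A sketch for exposition,
 * not a faithful emulation of the instruction.
 */
#include <stdint.h>
#include <string.h>

#define XSTATE_SSE    (UINT64_C(1) << 1)  /* SSE state component: XMM registers */
#define XSTATE_BV_OFF 512                 /* XSTATE_BV is at the start of the XSAVE header */
#define XMM0_OFF      160                 /* XMM0 lives here in the legacy region */

void sketch_xrstor_xmm0(const uint8_t *xsave_area,
                        uint64_t rfbm,    /* requested-feature bitmap */
                        uint8_t xmm0[16])
{
    uint64_t xstate_bv;
    memcpy(&xstate_bv, xsave_area + XSTATE_BV_OFF, sizeof xstate_bv);

    if (rfbm & XSTATE_SSE) {
        if (xstate_bv & XSTATE_SSE)
            memcpy(xmm0, xsave_area + XMM0_OFF, 16);  /* restore from the area */
        else
            memset(xmm0, 0, 16);                      /* initialise the register */
    }
    /* If SSE state is not requested in the bitmap, XMM0 is left unchanged. */
}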
The XSAVE
area was on the stack at 0x40(%rsp)
and the requested-feature bitmap was in the RAX
register:
98% 26,860,035> p/x $rax
$10 = 0xee
The value 0xee
included, among other flags, the XSTATE_SSE
flag, which includes the XMM
registers. The XSTATE_BV
field was here:
98% 26,860,035> p/x *(uint64_t*)($rsp+0x40+512)
$11 = 0x2
The value 0x2
was the XSTATE_SSE
flag, so these were the registers that were supposed to be restored. The XMM0
register is stored at offset 160 from the base of the XSAVE
area. In UDB version 6.6 this contained the wrong value (zero):
98% 26,860,035> x/2g $rsp+0x40+(10*16)
0x7ffd1741d260:  0x0000000000000000  0x0000000000000000
but in version 6.5 the corresponding contents were non-zero:
98% 26,860,035> x/2g $rsp+0x40+(10*16)
0x7ffd1741d260:  0x3f618961b5740d10  0x0000000000000000
Something must have gone wrong at the preceding XSAVE
instruction, which failed to save the XMM0
register to the XSAVE
area. I used the ugo event prev
command to jump to this instruction. You’ll see from the disassembly below that this was actually an XSAVEC
instruction (a variant of XSAVE
that “compresses” the result by omitting the parts of the processor state that were not requested to be saved).
98% 26,860,035> ugo event prev
Going to: time=26,859,273:0xffffffffffffffff: XSAVE. size=96.
0x00002b0335972bd8 in _dl_runtime_resolve_xsavec () from /tmp/undodb.28658.1637919175.2831407.d35582fb6fc784c6/debuggee-1-y0jqeq4w/symbol-files/lib64/ld-linux-x86-64.so.2
98% 26,859,273> disas $pc, $pc+1
Dump of assembler code from 0x2b0335972bd8 to 0x2b0335972bd9:
=> 0x00002b0335972bd8 <_dl_runtime_resolve_xsavec+104>:  xsavec 0x40(%rsp)
End of assembler dump.
I checked the value of the XMM0
register, used the step-instruction
command to step over the XSAVE
instruction, and then examined memory to see what had been saved:
98% 26,859,273> p/x $xmm0.uint128
$15 = 0x3f618961b5740d10
98% 26,859,273> si
0x00002b0335972bdd in _dl_runtime_resolve_xsavec () from /tmp/undodb.28093.1637918403.5949795.bc5f1f623b0c8cb6/debuggee-1-r8r7phvh/symbol-files/lib64/ld-linux-x86-64.so.2
98% 26,859,274> x/2g $rsp+0x40+(10*16)
0x7ffd1741d260:  0x0000000000000000  0x0000000000000000
In version 6.6 (above) the XMM0
register was not saved by the XSAVEC
instruction, whereas in version 6.5 (below) it was saved correctly:
98% 26,859,274> x/2g $rsp+0x40+(10*16)
0x7ffd1741d260:  0x3f618961b5740d10  0x0000000000000000
How could the XSAVEC
have failed to save the XMM0
register? One possibility is that the requested-feature flags in the RAX
register omitted the XSTATE_SSE
flag. But this wasn’t the case:
98% 26,860,035> p/x $rax
$16 = 0xee
The other possibility was that something had gone wrong with the translation of the XSAVEC
instruction. This is where I have to confess that I’ve been concealing some complexity from you! The transcripts above contain disassemblies of the program being executed. But these are not the instructions that were actually executed! In order to record or replay the non-deterministic behaviour of the program, LiveRecorder uses a “just-in-time” approach to translate the original code (which we call “A-code”) into the code that is actually executed (which we call “B-code”). So long as A-code is executing deterministic instructions, the translation is simple, but when A-code executes a non-deterministic instruction like SYSCALL
or XSAVEC, we translate it into code that will allow us to record or replay the non-deterministic results. When everything is working correctly we can ignore this and pretend that A-code is executing, but when something goes wrong we have to peek under the hood. I used the umaintenance info bbs
command to look at the B-code translation of the A-code:
95% 26,072,993> umaint info bbs -d $pc
Dump of assembler code from 0x2b0335972bd8 to 0x2b0335972bdd:
=> 0x00002b0335972bd8 <_dl_runtime_resolve_xsavec+104>:  xsavec 0x40(%rsp)
End of assembler dump.
Dump of assembler code from 0x5f40045b5f80 to 0x5f40045b6030:
   0x00005f40045b5f80:  movabs %rax,0x3f80011ea0c0
   0x00005f40045b5f8a:  mov    %rdx,-0x3174d49(%rip)        # 0x5f4001441248
   0x00005f40045b5f91:  movabs 0x3f8000dc6d88,%rax
   0x00005f40045b5f9b:  mov    %rdi,0x8(%rax)
   0x00005f40045b5f9f:  mov    %rsi,0x10(%rax)
   0x00005f40045b5fa3:  mov    %rcx,0x30(%rax)
   0x00005f40045b5fa7:  mov    %r8,0x40(%rax)
   0x00005f40045b5fab:  mov    %r9,0x48(%rax)
   0x00005f40045b5faf:  movabs 0x3f80011ea0c0,%rax
   0x00005f40045b5fb9:  lea    0x40(%rsp),%rsi
   0x00005f40045b5fbe:  movabs 0x3f8000dc6d88,%rax
   0x00005f40045b5fc8:  mov    %rsp,0x80(%rax)
   0x00005f40045b5fcf:  lea    0x8(%rax),%rsp
   0x00005f40045b5fd3:  pushf
   0x00005f40045b5fd4:  movabs 0x3f80011ea0c0,%rax
   0x00005f40045b5fde:  and    $0x3133097c,%eax
   0x00005f40045b5fe4:  and    $0x323134,%edx
   0x00005f40045b5fea:  mov    %rax,%r8
   0x00005f40045b5fed:  mov    %rdx,%r9
   0x00005f40045b5ff0:  movabs 0x3f8000dc6d88,%rax
   0x00005f40045b5ffa:  movabs $0x5f4001441280,%rax
   0x00005f40045b6004:  decl   (%rax)
   0x00005f40045b6006:  movabs $0x8c,%rdi
   0x00005f40045b6010:  movabs $0x2b0335972bdd,%rcx
   0x00005f40045b601a:  rex.W jmp *0x7(%rip)        # 0x5f40045b6028
   0x00005f40045b6021:  nop
   0x00005f40045b6022:  nop
   0x00005f40045b6023:  nop
   0x00005f40045b6024:  nop
   0x00005f40045b6025:  nop
   0x00005f40045b6026:  movabs $0x3f8000196d10,%r14
End of assembler dump.
Found 1 BB(s) overlapping the specified range.
BB length: 5, A-code from 0x2b0335972bd8 - 0x2b0335972bdd, B-code from 0x5f40045b5f80 - 0x5f40045b6030, BBT_XSAVEC
There’s a lot of detail there, so here’s the important bit from the B-code:
   0x00005f40045b6021:  nop
   0x00005f40045b6022:  nop
   0x00005f40045b6023:  nop
   0x00005f40045b6024:  nop
   0x00005f40045b6025:  nop
These five bytes are where the XSAVEC
instruction should have been. No wonder the XMM0
register was not being saved: in fact, nothing was being saved! To summarize the results of the investigation so far:
- The XSAVE event was replayed, but the XSAVE type was the default value;
- because the replayed program diverged from the recorded execution path before reaching the point where the XSAVE had been executed;
- because the XMM0 register had the wrong value;
- because it had been cleared by the preceding XRSTOR instruction;
- because the XSAVE area had zeroes in the XMM0 field;
- because the XMM0 register had not been saved;
- even though the XSAVE instruction had been executed;
- because the B-code contained NOPs instead of an XSAVE instruction.
To make further progress I needed to look at what went wrong in the just-in-time instruction translator. Translation takes place inside the process being debugged (the “child” process): after a block of B-code finishes executing, it jumps to a run-time management function which has the job of ensuring that the next block of A-code has been translated. Debugging this run-time management code is slightly tricky because the Unix ptrace mechanism allows a process to have only one debugger at a time, and in this situation the child already has a debugger, namely the instance of UDB that is replaying the recording. So the way we support debugging the child code is to multiplex our ptrace connection so that we can have two debuggers running simultaneously, one debugging the original code via the B-code translation, and the other debugging the management code in the same process. As you might imagine, this is a bit delicate when the process switches from original code to management code or vice versa. Luckily for me I didn’t need to spend much time looking at the management code.
The translation of XSAVEC
instructions takes place in the function xlate_XSAVEC(), so after attaching a debugger to the “debug server” (the multiplexed debugging system described above) I set a breakpoint there:
(udebug) b xlate_XSAVEC
Breakpoint 1 at 0x3f80001acbb0: file src/engine_impl/instr/xlate_x86_64.inc.c, line 3564.
(udebug) c
Continuing.

Breakpoint 1, xlate_XSAVEC (...) at src/engine_impl/instr/xlate_x86_64.inc.c:3564
3564        if (instr_config()->simulate_xsavec)
(udebug) list
3561    xlate_XSAVEC(pc_t src, pc_t effective_src, ASMGEN_PTR_DECL,
3562                 size_t instr_len, bool to_xsaveopt, uint64_t recorded_xcr0)
3563    {
3564        if (instr_config()->simulate_xsavec)
3565        {
3566            return s_xlate_simulate_xsavec(src, ASMGEN_PTR, instr_len,
3567                                           false, recorded_xcr0);
3568        }
3569
3570        return s_common_xlate_XSAVE(src, effective_src, ASMGEN_PTR,
(udebug) p instr_config()->simulate_xsavec
$1 = true
The XSAVEC
translator takes one of two code paths, according to the simulate_xsavec
flag. This feature is designed to allow UDB to take a recording made on a CPU supporting the XSAVEC
instruction and replay it on a CPU lacking this feature. But this couldn’t explain what I was seeing, because the replay machine (my laptop in this case) has a Broadwell CPU, which supports the XSAVEC
instruction. So it seemed more likely that the simulate_xsavec
flag had been configured incorrectly.
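For reference, here is how support for XSAVEC on the host CPU can be checked. This is a minimal sketch using GCC's <cpuid.h>, not Undo's asm_xsave_supported() implementation.

/*
 * Sketch: detect XSAVEC support on the host CPU via CPUID leaf 0xD,
 * sub-leaf 1 (EAX bit 1). Uses GCC/Clang's <cpuid.h>; this is not
 * Undo's asm_xsave_supported() implementation.
 */
#include <cpuid.h>
#include <stdbool.h>

static bool cpu_supports_xsavec(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(0xD, 1, &eax, &ebx, &ecx, &edx))
        return false;               /* CPUID leaf 0xD not available */
    return (eax & (1u << 1)) != 0;  /* bit 1 of EAX reports XSAVEC */
}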
The configuration of the instruction translation system takes place in a separate process, called “Nanny” because its job is to look after the various “child” processes involved in recording and replaying. Nanny and child processes share some regions of memory, so that after Nanny configures the instruction translation system, the children can access the configuration by calling instr_config(). Nanny code is much easier to debug than child code, because it’s possible to use LiveRecorder to make a recording of Nanny, and then this can be replayed forwards and backwards to investigate the problem. The instruction translation system is configured in the function instr_configure(), so I set a breakpoint there:
start 1> b instr_configure
Breakpoint 1 at 0x3f80001997d0: file src/engine_impl/instr/instr_config.c, line 16.
start 1> c
Continuing.

Breakpoint 1, instr_configure (ifs=ifs@entry=0x3f8001211a40) at src/engine_impl/instr/instr_config.c:16
16          *s_instr_config = *ifs;
1% 1,139,945> p ifs->simulate_xsavec
$1 = true
To find out how this flag was set, I set a watchpoint on the memory location of the flag, and used the reverse-continue
command (rc
for short) to run backwards to the point where the flag was set:
1% 1,139,945> watch -l ifs->simulate_xsavec
Hardware watchpoint 2: -location ifs->simulate_xsavec
1% 1,139,945> rc
Continuing.

Hardware watchpoint 2: -location ifs->simulate_xsavec

Old value = true
New value = false
0x00003f80001dc2f0 in s_set_global_state (mi=0x3f8001210ef0) at src/engine_impl/save_load/load.c:405
405         instr_config.simulate_xsavec = !(asm_xsave_supported() & ASM_CPUID_XSAVEC) ||
The “Old value” and “New value” labels are from the debugger’s point of view, not the debuggee’s, so when running backwards we have to mentally swap them: this means that when Nanny was recorded, it set the simulate_xsavec
flag to true at this point. Why did it do that? Well, there were three possible reasons:
1% 1,139,944> list
400          * recording, and is not user configurable in any way.
401          */
402         instr_config.simulate_adx = true;
403
404     #if defined(__i386__) || defined(__x86_64__)
405         instr_config.simulate_xsavec = !(asm_xsave_supported() & ASM_CPUID_XSAVEC) ||
406                                        config_get()->simulate_instr.xsavec ||
407                                        s_is_xsave_features_subset(mi->xsave_xcr0);
408     #endif
409
These three possibilities were:
1. !(asm_xsave_supported() & ASM_CPUID_XSAVEC): that is, XSAVEC
was not supported on the replay CPU, as discussed above. But this was not the case: the replay CPU did support this instruction.
2. config_get()->simulate_instr.xsavec: that is, a global configuration flag requested simulation of XSAVEC, for example for testing purposes. But this flag had not been set:
1% 1,139,944> p s_config->simulate_instr.xsavec
$2 = false
3. s_is_xsave_features_subset(mi->xsave_xcr0): that is, the replay CPU lacked one or more of the features of the recording CPU. Here mi->xsave_xcr0
is a set of flags describing the XSAVE
capabilities of the processor, extracted from extended control register 0 on the machine where the recording was made, and saved in the recording. (Note that the function name is misleading since s_is_xsave_features_subset()
returns true if the recording CPU’s XSAVE
capabilities are not a subset of the replay CPU’s XSAVE
capabilities.)
So it was condition 3 that was satisfied:
1% 1,139,944> p/x mi->xsave_xcr0
$3 = 0x34353734310b2c
1% 1,139,944> rs
s_is_xsave_features_subset (saved_xcr0=) at src/engine_impl/save_load/load.c:275
275         return (saved_xcr0 & ~replay_xcr0) != 0;
1% 1,139,943> list
270         uint64_t replay_xcr0;
271
272         int e = proc_get_xcr0(libc_heap, NULL, &replay_xcr0);
273         verify_warnf(e == 0, "failed to get XCR0 value: %s", debug_errno_string(e));
274
275         return (saved_xcr0 & ~replay_xcr0) != 0;
276     }
277     #endif
1% 1,139,943> p/x replay_xcr0
$4 = 0x1f
1% 1,139,943> p/x 0x34353734310b2c & ~replay_xcr0
$5 = 0x34353734310b20
The value in mi->xsave_xcr0
looked suspicious: extended control register 0 should not have so many set bits. (At this point I failed to notice that the value of mi->xsave_xcr0 looks very like ASCII-encoded text: in particular, the bytes 34 35 37 34 31
encode the string "45741". This would have been a useful hint as to the underlying cause of the bug! But the good thing about working backwards through program execution with UDB is that you don’t have to be so clever.) Where did this value come from? I set a watchpoint on the location of mi->xsave_xcr0
and reverse-continued to find the point where it was set:
1% 1,139,943> del 2
1% 1,139,943> watch -l mi->xsave_xcr0
Hardware watchpoint 3: -location mi->xsave_xcr0
1% 1,139,943> rc
Continuing.

Hardware watchpoint 3: -location mi->xsave_xcr0

Old value = 14695210004056876
New value = 14695210002221868
0x00003f800014f2a0 in TrioWriteNumber (self=self@entry=0x3f8001211100, number=1, flags=flags@entry=0, width=width@entry=0, precision=-1, base=10) at 3rd_party/util/libc_sprintf.c:2038
2038          *pointer-- = digits[number % base];
0% 13,781>
What happened here? This does not look like an update to mi->xsave_xcr0! But it’s important to remember that a location watchpoint watches a value at a specific location in memory, and that memory location can be reused for different data structures during the execution of the program. So what this demonstrated was that mi->xsave_xcr0
was never updated at all, and so its value was whatever was at that location in memory before the mi
structure was allocated.
Note also that the reverse-continue
command ran a long way backwards: you can see that the bbcount in the prompt reduced from 1,139,943 all the way to 13,781. So this libc_snprintf()
call was very early in the program’s execution, long before it was loading the recording and updating the recording_misc_info_t
data structure. When a reverse-execution command takes you a long way from the code you were investigating it is often convenient to use the ugo undo
command (uu
for short) to get back to where you were:
0% 13,781> uu
0x00003f80001dc2f0 in s_set_global_state (mi=0x3f8001210ef0) at src/engine_impl/save_load/load.c:405
405         instr_config.simulate_xsavec = !(asm_xsave_supported() & ASM_CPUID_XSAVEC) ||
1% 1,139,943>
So how come the most recent update to mi->xsave_xcr0
was the apparently unrelated update to the local variable pointer
in the libc_snprintf()
function? Could it be that mi
was allocated on the stack? It sure was:
1% 1,139,944> p mi
$10 = (const recording_misc_info_t *) 0x3f8001210ef0
1% 1,139,944> p/x $sp
$11 = 0x3f8001210ce0
So this finally explained the bug! The recording_misc_info_t
structure was a local variable in the function handle_command_load():
recording_misc_info_t misc_info;
Being a local variable, it was allocated on the stack, and since this is the C programming language, local variables are not initialized unless they have explicit initializers. The misc_info
structure was supposed to be initialized by a call to recording_read_misc_info()
to fill in the fields of the structure with data from the recording file. But this function only filled in the fields which were actually read from the file, and recording files did not have an xsave_xcr0
field until LiveRecorder version 6.0. (In older versions of the product, the replay CPU had to have a superset of the features of the recording CPU, but this condition proved onerous, and starting with version 6.0 we began to add support for replay via emulation where features were missing from the replay CPU.) To summarize the analysis:
- The XSAVE event was replayed, but the XSAVE type was the default value;
- because the replayed program diverged from the recorded execution path before reaching the point where the XSAVE was executed;
- because the XMM0 register had the wrong value;
- because it had been cleared by the preceding XRSTOR instruction;
- because the XSAVE area had zeroes in the XMM0 field;
- because the XMM0 register had not been saved;
- even though the XSAVE instruction was executed;
- because the B-code contained NOPs instead of an XSAVE instruction;
- because the instruction had been translated by s_xlate_simulate_xsavec();
- because the simulate_xsavec flag had been configured incorrectly;
- because s_is_xsave_features_subset(mi->xsave_xcr0) returned true;
- because mi->xsave_xcr0 had the bogus value 0x34353734310b2c;
- because the misc_info structure was an uninitialized stack variable and the recording, made with version 5.3, contained no xsave_xcr0 field.
This analysis satisfyingly explained the symptoms of the bug:
It only affected recordings made with LiveRecorder version 5.3 or earlier, because later recordings contained an xsave_xcr0
field that was used to initialize the corresponding field in the recording_misc_info_t
structure.
It randomly manifested or not, depending on changes to nearby code, because these changes affected the stack layout and so changed the bogus value in the xsave_xcr0
field in the recording_misc_info_t
structure. In particular, the dead code that I had removed in the commit that started the whole thing had contained an (unused) stack-allocated data structure whose removal pushed the recording_misc_info_t
structure upward in memory, causing it to overlap the memory used by the local variable pointer
in a very early libc_snprintf()
call.
The code introducing the bug had been committed in June 2020, but the bug had remained latent for a year and a half before I caused it to reveal itself.
I won’t claim that this is the largest ratio of investigation to diff that I’ve ever been involved with (it’s probably beaten by the last lousy bug) but it’s an impressive ratio nonetheless:
-    recording_misc_info_t misc_info;
+    recording_misc_info_t misc_info = { 0 };
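To see why this one-line change matters, here is a minimal, self-contained illustration of the hazard, using a made-up structure and reader rather than the real recording_misc_info_t and recording_read_misc_info():

/*
 * Illustration of the uninitialized-stack-structure hazard, with a
 * made-up structure and reader (not the real recording_misc_info_t).
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t version;
    uint64_t xsave_xcr0;   /* only present in newer recording files */
} misc_info_t;

/* Fills in only the fields that exist in an old-format recording. */
static void read_misc_info_v5(misc_info_t *mi)
{
    mi->version = 5;
    /* mi->xsave_xcr0 deliberately not written: old recordings lack it. */
}

int main(void)
{
    misc_info_t bad;            /* xsave_xcr0 is whatever was on the stack */
    misc_info_t good = { 0 };   /* the one-line fix: every field starts as zero */

    read_misc_info_v5(&bad);
    read_misc_info_v5(&good);

    /* Reading bad.xsave_xcr0 is undefined behaviour in C; that is the bug. */
    printf("bad.xsave_xcr0  = %#llx\n", (unsigned long long)bad.xsave_xcr0);
    printf("good.xsave_xcr0 = %#llx\n", (unsigned long long)good.xsave_xcr0);
    return 0;
}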
Of course, the fix is not the end of the matter. Applying the “five whys” principle, we ought to try to understand how the bug was introduced, and take appropriate action to reduce the risk of something similar happening in future. From general approaches to specific, here are the ideas I had:
The bug would have been caught immediately if we could have run Nanny under Valgrind. Valgrind’s “Memcheck” tool detects dangerous use of uninitialized values, and would have caught the dependency on the uninitialized xsave_xcr0
field. Unfortunately, we can’t run Nanny under Valgrind because Nanny is not launched from an executable, but forked from the original child process. When LiveRecorder was first developed in the early 2000s, there were technical reasons for doing it this way, but Linux’s capabilities have become much more powerful in the last couple of decades, so we should now refactor the Nanny–child relationship to make it possible (among other benefits) to run Nanny under Valgrind. (We do run other parts of the UDB software under Valgrind.)
The bug was not caught in review. We can’t expect reviewers to spot subtle action-at-a-distance bugs like this, but what we can do instead (following the “software inspection” principle) is to write down a rule (that all stack-allocated data structures must be initialized), and ask reviewers to apply the rule to new code. It’s easy to check the rule locally, and if it had been applied here, it would have fixed the problem without anyone needing to spot the bug.
A programmer might reasonably expect that a function named recording_read_misc_info()
fills in the whole of the recording_misc_info_t
structure. The fact that it only fills in the fields that were present in the recording, and leaves other fields uninitialized, is a bug magnet. We should update this function so that it updates the whole structure, either by moving initialization code into the function, or moving fields out of the structure. (A sketch of this “defaults first” approach appears after this list.)
A programmer might reasonably expect that a function named s_is_xsave_features_subset()
returns true if the recording CPU's XSAVE
capabilities are a subset of the replay CPU’s capabilities. We should rename the function or change its behaviour to correspond to the name.
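Here is a sketch of the “defaults first” pattern suggested in the third item above. The types and field names are hypothetical; the real recording_misc_info_t and its reader have many more fields than this.

/*
 * Sketch of the "defaults first" pattern. The types and field names here
 * are hypothetical, not the real recording_misc_info_t or its reader.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t xsave_xcr0;
    /* ... */
} misc_info_t;

typedef struct {
    bool     has_xsave_xcr0;   /* false for recordings older than 6.0 */
    uint64_t xsave_xcr0;
} recording_file_t;

static void read_misc_info(misc_info_t *mi, const recording_file_t *rec)
{
    /* Give every field a defined default before consulting the file, so that
     * fields absent from old recordings still end up with sensible values. */
    *mi = (misc_info_t){ 0 };

    if (rec->has_xsave_xcr0)
        mi->xsave_xcr0 = rec->xsave_xcr0;
}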