In a previous article, Using GDB to time travel, I explained the basics of time travel with GNU Debugger (GDB) and how you can go back to see the past. However, as all fans of time travel media know, merely seeing the past is just the surface of what you can do.
In this article, we'll discuss more interesting things you can do, such as creating time loops and changing the past without paradoxes. I won’t explain new commands this time, but I will put a few of them together in new and exciting ways.
The inciting incident
We will use a toy example for this tutorial. In this case, imagine you are working on a game where there is a random chance for the player to hit the enemy, and you end up with the following code:
1 │ #include <stdio.h>
2 │ #include <stdlib.h>
3 │ #include <time.h>
4 │ unsigned char roll() { // returns 0-100, inclusive
5 │     int percent = rand() % 256;
6 │     return (percent * 100)/255;
7 │ }
8 │ int hit(int chance, float acc_boost, float evasion_boost) {
9 │     int r = roll();
10 │     acc_boost = (2 + acc_boost) / 2;
11 │     evasion_boost += 2;
12 │     evasion_boost = 2 / evasion_boost;
13 │     r *= acc_boost/evasion_boost;
14 │     return r < chance;
15 │ }
16 │ int main() {
17 │     srand(time(NULL));
18 │
19 │     int hits = 0;
20 │     if (hit(0, 0, 0))
21 │         hits ++; // Should always fail
22 │     if (hit(50, 0, 0))
23 │         hits ++; // Should hit 50% of the time
24 │     if (hit(100, 0, 0))
25 │         hits ++; // Should always hit
26 │
27 │     printf ("Number of hits: %d\n", hits);
28 │     return 0;
29 │ }
You would expect this program to print 1 or 2 about equally often. Let’s see what happens:
~ gcc -g random.c -o random
~ for i in {0..60}; do ./random; sleep 1; done
Number of hits: 2
Number of hits: 2
Number of hits: 1
Number of hits: 1
Number of hits: 2
Number of hits: 2
Number of hits: 2
Number of hits: 1
Number of hits: 0
Number of hits: 1
Wait a second, how did we get a zero? We should always have at least 1, because line 24 should always succeed.
This bug would be pretty difficult to debug, since we only learn about the problem after it has happened. Even with recording, catching a failing run by hand would be a pain, right?
Time loops
Time loops were first introduced to me by Vicențiu Ciorbaru as a passing comment in his FOSDEM talk (his comments on time loops come at around 4:27). Time loops can be extremely helpful in debugging sporadic failures. If you could automatically run and record a test, and restart the process whenever the test passes, you would eventually be left with a recording of a failing run, which makes debugging that much easier.
While Vicențiu shows the command using RR, GDB can also do that using breakpoint commands. Let’s see how we could set up a time loop for this program. The first order of business is to identify three key locations: where the buggy code might start, somewhere that we are sure is only hit when the bug hasn’t triggered, and somewhere where we are sure the bug has triggered. I am using lines 19, 25, and 27 for this program. With those identified, let’s set breakpoints on all of them:
➜ ~ gdb random
Reading symbols from random...
(gdb) break 19
Breakpoint 1 at 0x401200: file t.c, line 19.
(gdb) break 25
Breakpoint 2 at 0x40120e: file t.c, line 25.
(gdb) break 27
Breakpoint 3 at 0x401212: file t.c, line 27.
Then using commands, we’ll set it so GDB automates the start of our testing. These are the three commands I’m using:
- Tell GDB to not report the breakpoint. Reporting it would just be unnecessary noise.
- Start recording the execution, in case this is the time where the bug happens.
- Continue the execution of the inferior.
The following code illustrates this:
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>record
>continue
>end
Next, we set it so that if the bug hasn’t triggered, GDB will automatically restart the inferior. This is why I could choose line 27 as the location for the bug path: if no bug has happened, breakpoint 2 restarts the program and we never get past line 25.
The commands used to make this happen are as follows:
- Sleep for one second. This is very specific for this example, because my code uses the current time to seed a random number generator, so if the program ran multiple times in the same second, we would get the same output multiple times.
- Fully restart the inferior. This also causes GDB to stop recording and delete the history.
The following code illustrates this:
(gdb) command 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>python time.sleep(1)
>run
>end
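To see why the one-second sleep matters, here is a minimal C sketch, separate from the article's program, showing that seeding rand() with the same value always yields the same first roll. The helper names first_roll_for_seed and same_second_repeats are made up for illustration; the roll reuses the rand() % 256 expression from line 5.

```c
#include <stdlib.h>

/* Hypothetical helper: seed the generator, then take the first raw roll,
 * using the same rand() % 256 expression as line 5 of the example. */
int first_roll_for_seed(unsigned int seed) {
    srand(seed);
    return rand() % 256;
}

/* Returns 1 when two "runs" seeded with the same value get the same roll,
 * which is the situation when time(NULL) has not advanced between runs. */
int same_second_repeats(unsigned int seed) {
    return first_roll_for_seed(seed) == first_roll_for_seed(seed);
}
```

Because time(NULL) has one-second resolution, two runs started within the same second would pass the same seed to srand and therefore repeat the same rolls, so looping faster than once per second would only re-test an outcome we have already seen.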
Finally, we have to import Python’s time module, because the commands for breakpoint 2 use it to sleep. Then we can set the loop in motion for the first time as follows:
(gdb) python import time
(gdb) run
Starting program: /home/gwenthekween/random
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 2, main () at random.c:25
25 hits ++; // Should always hit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 2, main () at random.c:25
25 hits ++; // Should always hit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Now you’re free to go make some coffee or tea. In time loops, the slowdown of recording becomes very noticeable. In this case, the performance feels even worse because we wait 1 second after every iteration.
It is usually worth keeping an eye on the first execution of your program, because you may hit the maximum number of recorded instructions and stop progressing. In that case, I would recommend narrowing down the recorded range further or using another technique.
As I explained in the previous article, reaching the maximum number of instructions will require at least 20 GB of RAM, likely 30 or more. So increasing that limit is only feasible if your machine is very hefty. However, this program is small enough that you shouldn’t hit the limit.
Now that you have your coffee and the inferior finally hit breakpoint number 3, you should be able to review the faulty execution and debug as usual. This is an exercise for the readers.
One final thing I’ll note is that sometimes it can be a long process to start up the inferior or to gather unrelated data. So restarting the inferior may not be a good fit for all projects. If that is your case, continue reading because things are going to get interesting.
Changing the past
On a completely different note, since we already have time travel, changing the past is a natural thing that developers might want to do. Maybe the data never changes, but your program can’t cache the input, which makes testing slow. Or maybe compilation takes a long time, so it would be convenient if you could workshop solutions without needing to recompile. Maybe it’s some secret third reason.
No matter why, I can see the convenience of changing the past. But pop culture loves to tell us about all sorts of paradoxes, and GDB doesn’t like it when you try to change the past. If you try to do it (using the previous recording and having stepped backwards once), you will get the following error:
(gdb) set hits = 1
Because GDB is in replay mode, writing to memory will make the execution log unusable from this point onward. Write memory at address 0x7fffffffdd3c?(y or n) n
Process record canceled the operation.
That’s because GDB will not execute instructions again. When in the past, instead of truly executing the inferior, GDB will just swap the values saved in the registers or memory, to pretend like things are being executed. It will never actually let the CPU do anything until you’re back in the present and recording restarts.
So you can imagine that a change made in the past will not propagate to the following instructions; the new value will simply be saved as the way things were. Worse yet, because you changed the past, GDB doesn’t remember what the value was before the change, corrupting the history and prompting the dramatic warning that the execution log will be "unusable from this point onward."
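The swapping behavior described above can be pictured with a toy model in C. This is only a sketch of the idea as the article describes it, not GDB's actual implementation; the names log_entry, cross_entry, and replay_demo are invented for illustration.

```c
/* One log entry: a memory location plus the value it holds on the
 * other side of the recorded step. */
struct log_entry {
    int *addr;
    int saved;
};

/* Crossing an entry in either direction swaps the saved value with
 * memory. No program instruction runs. */
static void cross_entry(struct log_entry *e) {
    int tmp = *e->addr;
    *e->addr = e->saved;
    e->saved = tmp;
}

int replay_demo(void) {
    int x = 5, doubled = -1;
    struct log_entry entry = { &doubled, doubled };  /* record the old value */

    doubled = x * 2;      /* the step really executes once: doubled == 10 */

    cross_entry(&entry);  /* step backwards: doubled is -1 again */
    x = 7;                /* "change the past" */
    cross_entry(&entry);  /* step forwards in replay: doubled == 10 */

    return doubled;       /* 10, not 14: the edit to x never propagated */
}
```

In this model, stepping forward only restores what was recorded, so the edit to x has no effect on doubled. Getting 14 would require running the multiplication again, which is exactly what replay mode never does.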
So why am I making you think of something that GDB can’t do? Well, I like this idea so much that I have opened a bug to ensure this doesn’t end up forgotten. But unfortunately, there are higher priority tasks for me right now. I will welcome any changes in this direction (or any support for it in the bug). More importantly, there is a bootleg way to do it anyway.
When you return to the past, GDB recreates everything exactly as it was back then. If at that point you tell GDB that you aren’t interested in reverse debugging anymore, by telling it record stop, it will drop all its memory of past and future execution. It will leave you where you are, stranded in the past, only able to move forward by executing the inferior again.
At that point, you can change the state, and the instructions will execute again. Let’s see it in action, assuming you still have the time loop I explained previously:
Breakpoint 3, main () at random.c:27
27 printf ("Number of hits: %d\n", hits);
Missing rpms, try: dnf --enablerepo='*debug*' install glibc-debuginfo-2.40-21.fc41.x86_64
(gdb) reverse-step
hit (chance=100, acc_boost=1, evasion_boost=1) at random.c:15
15 }
(gdb) reverse-next
14 return r < chance;
(gdb) record stop
Process record is stopped and all execution logs are deleted.
(gdb) print r
$1 = 100
(gdb) set r = 99
(gdb) print r
$2 = 99
(gdb) finish
Run till exit from #0 hit (chance=100, acc_boost=1, evasion_boost=1)
at random.c:14
0x00000000004012ac in main () at random.c:24
24 if (hit(100, 0, 0))
Value returned is $3 = 1
(gdb) n
Breakpoint 2, main () at random.c:25
25 hits ++; // Should always hit
As you can see, by changing the value of r in the past, we can change how the program executes, making the function return the correct value (which is a hint for the intrepid readers actually wanting to debug the program). In this debug example, I was just exploring what would happen if the result was just slightly different. However, if you wanted to workshop a solution in that case, because the problem is the return statement, you could instead use the GDB return command with a different expression, such as return r <= chance.
Putting them together
Astute readers may have already thought of this, but to spell it out: there’s no reason why you can’t combine these two techniques to avoid a slow startup. If executing the recorded section backwards is faster than waiting for the inferior to start, you can make the following changes to the commands for breakpoints 1 and 2:
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>record stop
>record
>continue
>end
(gdb) command 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>python time.sleep(1)
>reverse-continue
>end
The changes should be fairly self-explanatory. For breakpoint 1, instead of just starting the recording like we did before, we need to stop recording and start again to clear the history, because the inferior is no longer restarted. For breakpoint 2, instead of run, we tell the inferior to execute backwards until it hits breakpoint 1.
If you plan to use this modified time loop, it is important to understand where non-deterministic things may happen in your program. For instance, this would not work for the example program: the RNG is seeded on line 17, but the loop only rewinds to the breakpoint on line 19, so every iteration would be the same.
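The following C sketch illustrates the problem under a simplifying assumption: a toy generator with explicit state (toy_next, a hypothetical stand-in for rand()) is rewound to a snapshot taken after seeding, the way the loop rewinds to line 19. None of these names come from the example program.

```c
/* Toy generator with explicit state, a hypothetical stand-in for rand(). */
static unsigned int toy_next(unsigned int *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) % 256;
}

/* Returns 1 if rewinding to a snapshot taken after seeding replays
 * exactly the same rolls. */
int rewound_rolls_match(unsigned int seed) {
    unsigned int state = seed;      /* "line 17": seeding happens once */
    unsigned int snapshot = state;  /* "line 19": where the loop rewinds to */
    unsigned int first[3], second[3];

    for (int i = 0; i < 3; i++)
        first[i] = toy_next(&state);

    state = snapshot;               /* rewind: the seeded state is restored */

    for (int i = 0; i < 3; i++)
        second[i] = toy_next(&state);

    return first[0] == second[0] && first[1] == second[1]
        && first[2] == second[2];
}
```

The function returns 1 for any seed, which is the point: unless the source of randomness sits inside the rewound region, each iteration of the loop sees identical inputs and can never stumble onto the failing case.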
Reverse debugging
Reverse debugging is a very versatile tool, and I hope that this article has helped expand your horizons on how to use it effectively and creatively. If you’d like a refresher on the basics, read my first article. If you are in need of more debugging tips, check out these articles about how to make full use of breakpoints and how to debug lambdas in GDB.