Debugging a live/stuck Ruby process with GDB, fully explained, plus many goodies!

September 11, 2019 -
Tags: c, debugging, linux, operating_systems, ruby, sysadmin

Debugging a live/stuck Ruby process is a well-known subject.

The way it’s generally exposed is simply a series of instructions and their outcome; given the expectation, this is fine of course, however, when I saw that a manual copy/paste operation was required, I decided to… step in (pun intended 😂).

This article adds only a few concepts, operatively speaking, but it clarifies all the concept involved, and employs neat approaches to accomplish the task. I will also employ several goodies available to Linux systems.

Contents:

Table of contents

Premises

In the following sections I will use the terms “process” and “program”. A process is the executed instance of a program; I use the terms respectively when I refer to the context of in-memory execution as opposite to the static structure of a binary.

Since we’ll work from multiple terminals, I’ll use the convention <terminal>:$ <command>, where <terminal> is ruby, where Ruby test scripts run, and <debug>, where we perform investigations.

Each Ruby script should run until a new one is introduced; all the commands in the debug terminal therefore use the last Ruby script exposed in the article.

The procedure provided in this article is guaranteed to run on Real Operating Systems™, which in turn, is not guaranteed to include systems advertised with people throwing hammers at screens.

A brief overview of debugging a process, and ptrace

From a systems perspective, debugging a process is a collaboration of two services:

  • the operating system debugging API(s), and
  • the debugger.

Both are required because the debugging API(s) are very simple. This is an appropriate design because debugging itself requires specialized processing that is not meaningful in the context of a kernel.

The debugging APIs design varies depending on the operating system. Windows and Linux have very different philosophies:

  • Windows uses a minimal amount of calls, at the core, to manage the debugging events, and leaving the other functionality, like reading a process’ memory, to the standard APIs;
  • Linux instead uses a system call, ptrace, that has much more functionality, even allowing the manipulation of a process’ file descriptors.

This is however, one side of the coin. The debugger, which in Linux is (typically) GDB, has a lot of work to do, as it needs to known how the program is structured, in order to interact with it.

A representative functionality is function call, as there are different so-called calling conventions. Aside the obvious concepts such as the function location, the debugger needs to know how to manage the parameters allocation:

  • which parameter is pushed in the stack first? leftmost or rightmost?
  • who resets the stack? the function callee or the function?
  • are all the parameters pushed in the stack, or are some passed via CPU registers?

Therefore, debuggers need specific support for programs being debugged. GDB supports, among the others, C, C++ and Golang. Note that this does not mean that any other language is not supported; for example, Ruby is written in C, and while we can’t invoke Ruby code directly, we can do it indirectly via the internal program functions.

Using pgrep

When dealing with PIDs, pgrep is a convenience replacement to the typical ps invocations; in general terms, it avoids having to process the ps output.

Our pgrep invocation is:

ruby:$ ruby -e "sleep"

debug:$ pgrep --newest ruby
22127

which returns only the PID, so that we can directly use it in scripting.

We use the --newest options (short form: -n) to select the newest matching processes, in case we have multiple matching processes running (therefore, in the context of this article, we assume that we don’t execute other extraneous Ruby processes after launching the target one).

A brief overview of file descriptors

Operating systems interact with files via file handles, which are the abstract representation of a file (or, more generally, an I/O resource).

In Unix, they’re called file descriptors.

If a process, say, writes to a log, there will be a file descriptor open. Let’s inspect one:

ruby:$ ruby -e 'File.open("/tmp/test.log", "w") { sleep }'

debug:$ ls -l /proc/$(pgrep -n ruby)/fd
total 0
lrwx------ 1 myuser myuser 64 Sep 17 09:56 0 -> /dev/pts/5
lrwx------ 1 myuser myuser 64 Sep 17 09:56 1 -> /dev/pts/5
lrwx------ 1 myuser myuser 64 Sep 17 09:56 2 -> /dev/pts/5
lr-x------ 1 myuser myuser 64 Sep 17 09:56 3 -> 'pipe:[188892]'
l-wx------ 1 myuser myuser 64 Sep 17 09:56 4 -> 'pipe:[188892]'
lr-x------ 1 myuser myuser 64 Sep 17 09:56 5 -> 'pipe:[188893]'
lrwx------ 1 myuser myuser 64 Sep 17 09:56 6 -> /dev/pts/5
l-wx------ 1 myuser myuser 64 Sep 17 09:56 7 -> 'pipe:[188893]'
l-wx------ 1 myuser myuser 64 Sep 17 09:56 8 -> /tmp/test.log

The descriptors 0, 1 and 2 are the standard POSIX descriptors for stdin, stdout and stderr; they’re symlinked to the current terminal:

ruby:$ tty
/dev/pts/5

Curious readers may try to overwrite the symlinks:

debug:$ sudo ln -sf /dev/null /proc/$(pgrep -n ruby)/fd/1
ln: failed to create symbolic link '/proc/6179/fd/1': No such file or directory

This doesn’t work; we’ll need GDB/ptrace for that ☺️.

The reverse search - finding which processes use a certain file, is performed via fuser:

debug:$ fuser /tmp/test.log
/tmp/test.log:        6179

Putting together Ruby and GDB (with fancy grep!)

As described in a previous section, we’ll use GDB to inspect the Ruby process, and call the Linux system calls and Ruby functions that may help.

The system calls involved are:

int open(const char *pathname, int flags);

Given a pathname for a file, open() returns a file descriptor […] The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.

and

int close(int fd);

Closes a file descriptor […]

something important to notice is that if a file descriptor is closed, a subsequent call to open will return that file descriptor. Their application will be explained in the actual debug procedure.

The Ruby C function we’ll use is rb_backtrace(). For fun, let’s find the prototype and the implementation in the Ruby source code:

debug:$ git clone https://github.com/ruby/ruby.git /tmp/ruby

debug:$ grep -P '\brb_backtrace\(' --include="*.h" -r !$
/tmp/ruby/include/ruby/intern.h:void rb_backtrace(void);

debug:$ grep -Pzo '(?s)void\srb_backtrace\(.+?\n\}' --include="*.c" -r !$
./vm_backtrace.c:void
rb_backtrace(void)
{
    vm_backtrace_print(stderr);
}

here we notice that the backtrace is printed to stderr. Readers can follow down the chain out of curiosity.

A few notes about the grep goodies:

  • -P: use only Real Regular Expressions™ (ie. the Perl format);
  • \b: metachacter for word boundary; in this case, we don’t want to match something like print_rb_backtrace(;
  • -z: match multiple lines, by treating the input as a single string joined via null character;
  • -o: print only the match (otherwise, when using -z, the entire file will be printed, because the match unit is the file, not the line);
  • (?s): match newlines with the dot (.) (see Perl Compatible Regular Expressions “dotall” option).

finally, don’t forget to use the non-greedy matcher (.+?), otherwise, the match will proceed until the last occurrent of \n\}!

Basic GDB usage

The starting point for a GDB session is to attach to another process:

ruby:$ ruby -e '1.upto(Float::INFINITY) { |i| $stdout.puts "#{i}. out"; $stderr.puts "#{i}. err"; sleep 2 }'

# short form: `-p <pid>`
debug:$ sudo gdb program $(pgrep -n ruby)

which will halt the attached process.

One can continue via continue (shortcut: c), and halt with Ctrl+C:

(gdb) continue
Continuing.
# Ctrl+C pressed
Thread 1 "ruby" received signal SIGINT, Interrupt.
[...]
(gdb)

The most basic functionality we can use is probably print (shortcut: p):

(gdb) print "abc"
$1 = "abc"
(gdb) p 2 * 3
$2 = 6

something immediately noticeable is that the result is associated with symbols ($) with increasing numbers. They are the “convenience variables”: for any operation (except when nothing is returned), the result is stored in a new instance, that be subsequently used:

(gdb) print $1
$3 = "abc"

since print() returns the value printed, the latter is assigned to a new convenience variable.

We can run a shell command via the shell command:

(gdb) shell ls -ld /tmp
drwxrwxrwt 17 root root 32768 Sep 18 10:38 /tmp

but the output can’t be captured.

Calling functions is one of the most important functionalities:

(gdb) call (void) rb_backtrace()

# result, in the ruby terminal:
  from -e:1:in `<main>'
  from -e:1:in `upto'
  from -e:1:in `block in <main>'
  from -e:1:in `sleep'

there are two notable things:

  1. in some cases we need to specify the function return type; as we’ve seen in the previous section, rb_backtrace has no return value (void), so we need to specify it;
  2. the backtrace is printed in the ruby terminal! in the following section we’ll take care of this 😉

Finally, GDB can execute a command specified from the commandline:

debug:$ sudo gdb --eval-command="p 123"
$1 = 1
(gdb) p $1
$2 = 1

note that the command is executed exactly like if it was by the user, so that the convenience variable is also instantiated, and can be used; this will come useful later.

The procedure

Now we have all the basics required to perform the procedure.

We’ve seen how to print the backtrace of a process, however in daemons/background processes stdout and stderr are typically redirected, for example, to a log file; this implies that if we execute rb_backtrace(), it will go somewhere that is not immediately visible.

Therefore, our procedure will be:

  1. find out the debug terminal device file;
  2. attach to the Ruby process;
  3. replace the stdout and stderr descriptors with the debug terminal;
  4. have fun!

Let’s start the debugging session:

debug:$ sudo gdb program $(pgrep -n ruby)

Now, remembering that:

  • rb_backtrace prints to stderr;
  • the stderr file descriptor is 2;
  • we use close() and open() system calls to work with file descriptors, all we need to do is:
(gdb) shell tty
/dev/pts/0
(gdb) call close(2)               # close the current stderr file descriptor
$1 = 0
(gdb) call open("/dev/pts/0", 1)  # open a descriptor to the debug terminal, in O_WRONLY mode (1)

(note that we open() in O_WRONLY mode, since no reads are performed from stderr)

aaaand… action!:

(gdb) call (void) rb_backtrace()
  from -e:1:in `<main>'
  from -e:1:in `upto'
  from -e:1:in `block in <main>'
  from -e:1:in `sleep'

if we now continue:

(gdb) c
Continuing.
3. err
4. err
5. err

the $stderr.puts() call from the Ruby program now goes to the debug terminal, as expected.

If we wanted to redirect also the stdout calls, we just open/close the file descriptor 1.

The procedure, neater!

The procedure works as expected, however, it requires something horrible: manual copy/paste (of the terminal device file pathname).

How to solve that?

Let’s do some shell trickery. We know that:

  • we can execute a GDB command on startup, and that the commandline option takes a regular string as value;
  • the return value of this command is associated to a convenience variable that we can use in the GDB context;
  • we can therefore interpolate the commandline option string in the shell!

So, we turn this:

debug:$ tty
/dev/pts/0
debug:$ sudo gdb --eval-command="p \"/dev/pts/0\""
$1 = "/dev/pts/0"
(gdb)

into this:

debug:$ sudo gdb --eval-command="p \"$(tty)\""
$1 = "/dev/pts/0"
(gdb)

and we can run a fully automatable sequence:

debug:$ sudo gdb --eval-command="p \"$(tty)\"" program $(pgrep -n ruby)
$1 = "/dev/pts/0"
(gdb) call close(2)
$2 = 0
(gdb) call open($1, 1)
$3 = 2
(gdb) call (void) rb_backtrace()
  from -e:1:in `<main>'
  from -e:1:in `upto'
  from -e:1:in `block in <main>'
  from -e:1:in `sleep'

Safety of messing with file descriptors

While the procedure works fine, it’s important to highlight that messing with file descriptors is not a safe operation. There are 4⁸ possible things that can go wrong, the simplest example being a buffer not flushed (a more extended discussion can be found on Stack overflow).

In this article, the contexts are:

  • a test Ruby process;
  • hypthetically, a hung application server.

even if case #2 is a production environment, a hung process is typically killed.

All in all, one needs to balance the risk of the procedure, with the worst case scenario of the given context.

Having said that, somebody anyway routinely use this strategy for switching log files on the fly - this article gave the reader all the tools to fully understand it.

Other GDB/Ruby tools

The GDB/Ruby potential is unlimited; in this section I’ll briefly examine two concepts.

Evaluating Ruby statements

The function rb_backtrace() is only one of the many available in the Ruby C API. We can for example print debug information and execute eval(); the related APIs are:

void rb_p(VALUE obj);                                       // debug print within C code
VALUE rb_eval_string(const char*);                          // evaluate the given string in an isolated binding
VALUE rb_eval_string_protect(const char *str, int *pstate); // same as above, but store the return value in *pstate instead of raising an exception

Now, the VALUE data type is not available in this GDB context; additionally, the underlying data type is set at compile-time:

debug:$ git clone https://github.com/ruby/ruby.git /tmp/ruby
debug:$ grep -P 'typedef .*VALUE;' --before 1 --include="*.h" -r !$
./include/ruby/ruby.h-#if defined HAVE_UINTPTR_T && 0
./include/ruby/ruby.h:typedef uintptr_t VALUE;
--
./include/ruby/ruby.h-#elif SIZEOF_LONG == SIZEOF_VOIDP
./include/ruby/ruby.h:typedef unsigned long VALUE;
--
./include/ruby/ruby.h-#elif SIZEOF_LONG_LONG == SIZEOF_VOIDP
./include/ruby/ruby.h:typedef unsigned LONG_LONG VALUE;

Commonly, the underlying data type is unsigned long, an at least 32 bits in size integer type (running configure on my machine, it’s resolved to 64 bits), therefore, we’ll use that as return data type on calls:

(gdb) call (void) rb_p((unsigned long) rb_eval_string_protect("puts 1", (int*)0))
# will print the following to the stdout of the associated terminal 
123
nil

note that we discard the pstate variable, by passing a null pointer ((int*)0), therefore, ignoring errors.

An alternative option to redirecting stdout/stderr

Using the eval API above, we can redirect stdout/stderr by reassigning $stdout/$stderr to a logfile:

(gdb) call (unsigned long) rb_eval_string("$stdout = File.open('/tmp/ruby.log', 'w'); $stdout.sync = true")
(gdb) call (unsigned long) rb_eval_string("$stdout.puts 'to the debug log!'")

This approach is referenced in some websites; while it does redirect the $stdout output, it causes the Ruby process to hang (on my machines), likely because it corrupts the interpreter internal state.

Note that Ruby makes available also the constants STDOUT/STDERR, pointing to the original device files. It’s not possible to know if they have been overwritten by the target program; it’s bad practice, but it’s still possible.

Other interesting sources

An interesting article, which uses GDB in depth to debug a Ruby issue is Finding a Ruby bug with GDB.

Conclusion

In this article, I’ve taken a trivial concept that is often presented as a sequence to copy/paste, and extended the concepts (both in breadth and depth), and the tools employed; this subject is, in particular, rather cross-cutting, so there’s plenty of things to dive in.

Enjoy debugging production! 😬

Ticketsolve is hiring!

Ticketsolve, the company I work for, and who sponsors this blog, is hiring a remote Rails developer! See the About page for contacting me 😄