
Coding and decoding crash dump handlers

All software has bugs. Even if you could write perfect, bug-free software, every layer beneath it has bugs too, right down to the CPU, as the recent Meltdown and Spectre bugs show. Unfortunately, this means software will sometimes crash. When that happens it is useful to capture as much information as possible to help stop it happening again.

One of the first things I did after coming back to work from the holiday break was to code a new crash dump handler for MariaDB ColumnStore. When a crash happens, it writes a stack trace for the current thread into a file. This is very useful for daemons, because it helps find the root cause of a problem without having to run them under a debugger.

Compiler Options

The first thing you will want to do is enable debugging symbols and frame pointers when compiling your binaries. This may add a small overhead to execution, a few percent at most (mostly from keeping the frame pointer; “-g” on its own mainly makes the binary larger), but it is worth it to be able to run a postmortem on crashes. The useful options are “-g” and “-fno-omit-frame-pointer”.

Crash Handler

This is a basic crash handler. It dumps the crash data into a file in /tmp named after the PID of the process. You will likely want to expand on this to add more information and error handling. The important thing is to avoid calling malloc() as much as possible, because the heap may already be corrupted when the crash happens:

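#include <execinfo.h>  /* backtrace(), backtrace_symbols_fd() */
#include <signal.h>    /* sigaction(), raise() */
#include <stdio.h>     /* snprintf(), fopen(), fprintf(), fileno() */
#include <string.h>    /* memset() */
#include <time.h>      /* time(), localtime_r(), strftime() */
#include <unistd.h>    /* getpid() */

void fatalHandler(int sig)
{
  char filename[128];
  void* addrs[128];
  snprintf(filename, sizeof(filename), "/tmp/%d.log", (int)getpid());
  FILE* logfile = fopen(filename, "w");
  if (logfile != NULL)
  {
    /* Record when the crash happened and which signal caused it */
    char s[30];
    struct tm tim;
    time_t now = time(NULL);
    localtime_r(&now, &tim);
    strftime(s, sizeof(s), "%F %T", &tim);
    fprintf(logfile, "Date/time: %s\n", s);
    fprintf(logfile, "Signal: %d\n\n", sig);
    /* Flush the FILE buffer before writing to the raw descriptor */
    fflush(logfile);
    int fd = fileno(logfile);
    /* backtrace_symbols_fd() writes straight to the fd without calling malloc() */
    int count = backtrace(addrs, sizeof(addrs) / sizeof(addrs[0]));
    backtrace_symbols_fd(addrs, count, fd);
    fclose(logfile);
  }
  /* Restore the default action and re-raise so the process still dies */
  struct sigaction sigact;
  memset(&sigact, 0, sizeof(sigact));
  sigact.sa_handler = SIG_DFL;
  sigaction(sig, &sigact, NULL);
  raise(sig);
}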

This opens the file and writes the current date/time into it, along with the number of the signal that caused the crash. It then gets the backtrace and writes it into the file. Finally, it restores the default handler for that signal and re-raises it, so the process still terminates (and can still produce a core dump) as it normally would. Besides execinfo.h, which is part of glibc and provides the backtrace functionality, the handler needs stdio.h, string.h, time.h, unistd.h and signal.h.

Adding to Application

Near the beginning of your ‘main’ function, you need to add the signal handler hooks. You’ll need to include ‘signal.h’ for this to work:

  /* Send the signals that indicate a crash to fatalHandler() */
  struct sigaction crsh;
  memset(&crsh, 0, sizeof(crsh));
  crsh.sa_handler = fatalHandler;
  sigaction(SIGSEGV, &crsh, NULL);
  sigaction(SIGABRT, &crsh, NULL);
  sigaction(SIGFPE, &crsh, NULL);

Testing

Once the application is compiled and running, an easy way to test this is to send it a signal that makes it look as though it has crashed. You can do this with “kill -11 <PID>”, which sends SIGSEGV. You should then find the crash dump in /tmp, named after the PID.

Analysing

The crash dump file contains a list of function calls and address offsets. That alone may be useful, but with the same binaries you can also turn it into source line numbers. The following is an example from a MariaDB ColumnStore binary:

Date/time: 2018-01-03 15:47:16
Signal: 6

[0x5573f0e18014]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fda7d43a390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fda7b98b428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fda7b98d02a]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7fda7c2ce84d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7fda7c2cc6b6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7fda7c2cc701]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7fda7c2cc919]
/usr/local/mariadb/columnstore/lib/libmessageqcpp.so.1(_ZN11messageqcpp18MessageQueueClient5setupEb+0x194)[0x7fda7ea19e84]
/usr/local/mariadb/columnstore/lib/libmessageqcpp.so.1(_ZN11messageqcpp18MessageQueueClientC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPN6config6ConfigEb+0xd6)[0x7fda7ea1b566]
/usr/local/mariadb/columnstore/lib/libjoblist.so.1(_ZN7joblist21DistributedEngineComm5SetupEv+0x665)[0x7fda816663a5]
/usr/local/mariadb/columnstore/lib/libjoblist.so.1(_ZN7joblist21DistributedEngineCommC2EPNS_15ResourceManagerEb+0x1ed)[0x7fda8166807d]
/usr/local/mariadb/columnstore/lib/libjoblist.so.1(_ZN7joblist21DistributedEngineComm8instanceEPNS_15ResourceManagerEb+0x4a)[0x7fda816681fa]
[0x5573f0e01fdb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fda7b976830]
[0x5573f0e056d9]

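The first few frames belong to the crash handler itself and to the signal and abort machinery (libpthread, abort() and the libstdc++ terminate handler). The first useful line in this dump is therefore: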

/usr/local/mariadb/columnstore/lib/libmessageqcpp.so.1(_ZN11messageqcpp18MessageQueueClient5setupEb+0x194)[0x7fda7ea19e84]
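Each named frame has the form library(symbol+offset)[address]. The symbol and offset can be pulled out in code. The following is a minimal sketch that assumes the line layout shown in the dump above; parseFrame is a hypothetical helper name. It reports frames without a symbol name, such as [0x5573f0e18014] or libpthread.so.0(+0x11390), as failures.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits a backtrace_symbols_fd() line of the form
     library(symbol+0xOFFSET)[0xADDRESS]
   into the symbol name and its offset. Returns 0 on success, or -1 if the
   line has no "(symbol+offset)" part or symbolLen is too small. */
int parseFrame(const char* line, char* symbol, size_t symbolLen,
               unsigned long* offset)
{
  const char* lparen = strchr(line, '(');
  const char* rparen = lparen ? strchr(lparen, ')') : NULL;
  if (!lparen || !rparen)
    return -1;
  /* The offset follows the last '+' inside the parentheses */
  const char* plus = NULL;
  for (const char* p = lparen + 1; p < rparen; p++)
    if (*p == '+')
      plus = p;
  if (!plus)
    return -1;
  size_t nameLen = (size_t)(plus - (lparen + 1));
  if (nameLen == 0 || nameLen >= symbolLen)
    return -1; /* no symbol name, e.g. "(+0x11390)", or buffer too small */
  memcpy(symbol, lparen + 1, nameLen);
  symbol[nameLen] = '\0';
  char* end = NULL;
  *offset = strtoul(plus + 1, &end, 16); /* accepts the "0x" prefix */
  return end == rparen ? 0 : -1;
}
```

For the line above, this yields the symbol _ZN11messageqcpp18MessageQueueClient5setupEb and the offset 0x194.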

We pass the mangled C++ function name to the tool ‘nm’ to get the function’s start address within the library:

nm /usr/local/mariadb/columnstore/lib/libmessageqcpp.so | grep _ZN11messageqcpp18MessageQueueClient5setupEb

0000000000011cf0 T _ZN11messageqcpp18MessageQueueClient5setupEb

Next, we add the offset from the stack dump (0x194) to the address that ‘nm’ reported above (0x11cf0). A hex calculator gives us 0x11e84. We can pass this to the utility ‘addr2line’ to get the line number:

addr2line -e /usr/local/mariadb/columnstore/lib/libmessageqcpp.so 0x11e84

/home/linuxjedi/Programming/Git/mariadb-columnstore-server/mariadb-columnstore-engine/utils/messageqcpp/messagequeue.cpp:170 (discriminator 2)

That line in the source is:

throw runtime_error(msg);

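This exception was never caught, so the C++ runtime’s terminate handler called abort(), which raised signal 6 (SIGABRT). That is exactly what triggered this crash.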
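The hex-calculator step above can also be done in code. The following is a minimal sketch that assumes the matching line of ‘nm’ output is passed in as a string; addr2lineAddress is a hypothetical helper name.

```c
#include <stdlib.h>

/* Hypothetical helper: takes one line of nm output
   ("0000000000011cf0 T symbol") and the offset from the backtrace, and
   returns the address to give addr2line, or 0 if the line does not start
   with a hex address. */
unsigned long addr2lineAddress(const char* nmLine, unsigned long offset)
{
  char* end = NULL;
  /* strtoul() stops at the space after the address */
  unsigned long base = strtoul(nmLine, &end, 16);
  if (end == nmLine)
    return 0;
  return base + offset;
}
```

For the ‘nm’ line and offset from this example, it returns 0x11e84, the address passed to ‘addr2line’ above.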

LinuxJedi


  • In some cases nm needs the -D (--dynamic) option, which reads the dynamic symbol table instead of the .symtab section.
