Protocol reverse engineering with tcpdump

Sometimes network protocols don’t behave entirely as documented. Other times there is no documentation at all beyond the code. Either way, you may need to sniff the traffic of a connection to find out what is really going on.

Whilst I have been working on MariaDB ColumnStore for a year now, there are still some parts of the codebase I know little about. I recently had to write some code that worked with ColumnStore’s network protocol, but in a few places it was difficult to understand exactly what was happening just by looking at the code. This is where tcpdump came in.

tcpdump is a powerful tool for sniffing the raw packet data of network connections. It can be very verbose, showing parts of the TCP/IP handshake, headers and so on. That is often far more than I need for reverse engineering a network protocol, so I use tcpflow to filter the results. The final command looks a little like this:

sudo tcpdump -i lo -l -w - port <PORT> | tcpflow -D -C -r -

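Breaking this down: tcpdump listens on the loopback interface (-i lo), line buffers its output (-l) and writes the raw packets to stdout (-w -) so that they can be piped. tcpflow then reads those packets from the pipe (-r -) and prints just the payload data to the console (-C) as hex (-D).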
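If you want to start the same capture from a script, the pipeline can be wrapped in a few lines of Python. This is only a sketch: the function names build_capture_pipeline and run_capture are hypothetical, the flags are exactly the ones from the command above, and it assumes sudo, tcpdump and tcpflow are all on your PATH.

```python
import shlex
import subprocess


def build_capture_pipeline(port, interface="lo"):
    """Return the argv lists for the tcpdump | tcpflow pipeline shown above."""
    tcpdump_cmd = ["sudo", "tcpdump", "-i", interface, "-l", "-w", "-",
                   "port", str(port)]
    tcpflow_cmd = ["tcpflow", "-D", "-C", "-r", "-"]
    return tcpdump_cmd, tcpflow_cmd


def run_capture(port, interface="lo"):
    """Run the pipeline until interrupted (hypothetical helper, needs root)."""
    dump_cmd, flow_cmd = build_capture_pipeline(port, interface)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    flow = subprocess.Popen(flow_cmd, stdin=dump.stdout)
    # Drop our copy of the pipe so tcpdump notices if tcpflow exits first.
    dump.stdout.close()
    try:
        flow.wait()
    except KeyboardInterrupt:
        pass  # Ctrl-C is delivered to both child processes by the terminal
    finally:
        dump.wait()


# Show the equivalent shell command without actually starting a capture.
dump_cmd, flow_cmd = build_capture_pipeline(8616)
print(shlex.join(dump_cmd) + " | " + shlex.join(flow_cmd))
```

Calling run_capture(8616) would then start the capture for the DBRM controller port used in the example that follows.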

If we look at port 8616 (DBRM controller) for ColumnStore the end result can look a little like this during a small insert query:

0000: 37c1 fb14 0500 0000 3100 0000 00 7.......1....

0000: 37c1 fb14 0600 0000 0000 0000 0000 7.............

0000: 37c1 fb14 0100 0000 2d 7.......-

0000: 37c1 fb14 0d00 0000 00bd 1d00 0000 0000 0000 0000 00 7....................

0000: 37c1 fb14 0100 0000 34 7.......4

0000: 37c1 fb14 0500 0000 0029 0000 00 7........)...

0000: 37c1 fb14 9100 0000 1a05 0000 0000 102d 0000 0000 0000 0000 0000 0000 80ff ffff 7..............-................
0020: ffff ffff 7ffe ffff ff00 202d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff .......... -....................
0040: 7ffe ffff ff00 302d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ......0-........................
0060: ff00 502d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ff00 702d ..P-..........................p-
0080: 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ff .........................

From reading the ColumnStore messaging code I know that “37c1 fb14” is the header of an uncompressed packet and that the next 4 bytes are the packet length. In the first line of the capture, for example, the length bytes “0500 0000” are 5 in little-endian order, and exactly 5 bytes follow. The next byte is usually the packet type (or response), which we can look up in some enums. From there we can figure out the rest of the packet contents. I won’t go into details here, but on some occasions it required printing this data out and using highlighters to mark up the parts of the packet.
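The framing described above can be sketched in a few lines of Python. This is a minimal illustration rather than ColumnStore’s own code: the name split_messages is hypothetical, and the little-endian length is an assumption inferred from the capture rather than taken from the source. The sample bytes are two of the messages from the dump above, joined together purely for demonstration.

```python
import struct

# Header bytes seen at the start of every message in the capture above.
UNCOMPRESSED_MAGIC = bytes.fromhex("37c1fb14")
HEADER_SIZE = 8  # 4 bytes of magic + 4 bytes of length


def split_messages(stream):
    """Yield the payload of each framed message found in a byte string."""
    offset = 0
    while offset < len(stream):
        header = stream[offset:offset + HEADER_SIZE]
        if len(header) < HEADER_SIZE or header[:4] != UNCOMPRESSED_MAGIC:
            raise ValueError(f"unexpected header at offset {offset}")
        # Assumption: the length is an unsigned 32-bit little-endian integer.
        (length,) = struct.unpack("<I", header[4:])
        start = offset + HEADER_SIZE
        payload = stream[start:start + length]
        if len(payload) < length:
            raise ValueError(f"truncated payload at offset {offset}")
        yield payload
        offset = start + length


# Two messages copied from the dump above and concatenated.
captured = bytes.fromhex(
    "37c1fb14 05000000 3100000000"
    "37c1fb14 01000000 2d"
)

for payload in split_messages(captured):
    # The first payload byte is usually the packet type (or response).
    print(f"type=0x{payload[0]:02x} length={len(payload)} payload={payload.hex()}")
```

Once the messages are split like this, the first payload byte can be compared against the enums in the code, which is much quicker than counting bytes by hand.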

This method has been extremely useful for other things in the past as well, such as debugging MySQL’s replication protocol. It is definitely part of my toolset for working on network daemons. If you use any similar tools, please put them in the comments below. I’m always interested in improving my workflow.

Image credit: Terry Robinson, used under a Creative Commons license

LinuxJedi
