Protocol reverse engineering with tcpdump

Sometimes network protocols don’t behave entirely as documented. Other times there is no documentation at all beyond the code. Either way, you may find you need to sniff the traffic of a connection to find out what is really going on.

Whilst I have been working on MariaDB ColumnStore for a year now, there are still parts of the codebase I know little about. I recently had to write some code that worked with ColumnStore’s network protocol, but in a few places it was difficult to understand exactly what was happening just by looking at the code. This is where tcpdump came in.

tcpdump is a powerful tool for sniffing the raw packet data of network connections. It can be very verbose, showing the TCP/IP handshake, headers, and so on. That is usually far more than I need when reverse engineering network protocols, so I pipe its output into tcpflow, which reassembles the TCP streams and shows just the payload data. The final command looks something like this:

sudo tcpdump -i lo -l -w - port <PORT> | tcpflow -D -C -r -

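Breaking this down, tcpdump listens on the loopback interface (-i lo) for traffic on the given port, asks for line-buffered output (-l) and writes the raw packets to standard output (-w -). tcpflow then reads those packets from the pipe (-r -) and prints just the payload data in hex to the console, without the source and destination details (-D -C).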
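tcpflow’s -D output is an ordinary hex dump. To make the layout concrete, here is a minimal Python sketch (not tcpflow’s own code) that formats a payload the same way: a hex offset, bytes printed in 2-byte groups, 32 bytes per row as in the capture in this post, and an ASCII column where non-printable bytes appear as a dot. The hexdump function is hypothetical and its exact spacing is only an approximation of tcpflow’s.

```python
from __future__ import annotations


def hexdump(data: bytes, width: int = 32) -> list[str]:
    """Return hex-dump rows: offset, 2-byte hex groups, ASCII column.

    A rough approximation of tcpflow -D output; spacing may differ.
    """
    words_per_row = (width + 1) // 2
    # Width of a full hex column, e.g. 16 groups of 4 digits + 15 spaces = 79.
    hex_width = width * 2 + words_per_row - 1
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Group the bytes in pairs: b"\x37\xc1\xfb\x14" -> "37c1 fb14"
        hex_col = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        # Printable ASCII (0x20-0x7e) is shown as-is, anything else as "."
        ascii_col = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        rows.append(f"{offset:04x}: {hex_col.ljust(hex_width)}  {ascii_col}")
    return rows


if __name__ == "__main__":
    # First message from the ColumnStore capture in this post.
    for row in hexdump(bytes.fromhex("37c1fb14050000003100000000")):
        print(row)
```

Run on the first message of the capture, it prints the same hex and ASCII columns shown for that message, give or take whitespace.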

If we look at port 8616 (the DBRM controller) in ColumnStore, the output during a small insert query looks something like this:

0000: 37c1 fb14 0500 0000 3100 0000 00 7.......1....

0000: 37c1 fb14 0600 0000 0000 0000 0000 7.............

0000: 37c1 fb14 0100 0000 2d 7.......-

0000: 37c1 fb14 0d00 0000 00bd 1d00 0000 0000 0000 0000 00 7....................

0000: 37c1 fb14 0100 0000 34 7.......4

0000: 37c1 fb14 0500 0000 0029 0000 00 7........)...

0000: 37c1 fb14 9100 0000 1a05 0000 0000 102d 0000 0000 0000 0000 0000 0000 80ff ffff 7..............-................
0020: ffff ffff 7ffe ffff ff00 202d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff .......... -....................
0040: 7ffe ffff ff00 302d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ......0-........................
0060: ff00 502d 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ff00 702d ..P-..........................p-
0080: 0000 0000 0000 0000 0000 0000 80ff ffff ffff ffff 7ffe ffff ff .........................

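From reading the ColumnStore messaging code I know that “37c1 fb14” is an uncompressed packet header and the next 4 bytes are the packet length, stored little-endian and counting only the bytes that follow. The next byte is usually the packet type (or response), which can be looked up in the enums in the source. From there we can figure out the rest of the packet contents. For example, the last message above has a length field of “9100 0000”, i.e. 0x91 = 145, and the dump is 0x80 + 25 = 153 bytes long: 8 bytes of header and length followed by 145 bytes of payload. I won’t go into details here, but on some occasions I had to print this data out and use highlighters to work out the different parts of the packet.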
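The framing described above can be sketched in Python as a small stream splitter. It is a minimal illustration built only from what the capture shows, not ColumnStore’s own code: it assumes the header marker and the length are both little-endian 32-bit integers, it only accepts the uncompressed marker, and the name split_messages is hypothetical. Because TCP is a byte stream, one read does not necessarily hold exactly one message, so the sketch also returns any incomplete trailing bytes for the caller to keep.

```python
from __future__ import annotations

import struct

# Assumptions, based on the capture above rather than on ColumnStore's source:
# the marker bytes 37 c1 fb 14 read as a little-endian uint32 (0x14fbc137),
# followed by a little-endian uint32 payload length.
UNCOMPRESSED_MAGIC = 0x14FBC137
HEADER = struct.Struct("<II")  # (marker, payload length), 8 bytes


def split_messages(buf: bytes) -> tuple[list[tuple[int | None, bytes]], bytes]:
    """Split reassembled TCP stream bytes into (type byte, payload) pairs.

    Returns the complete messages and any trailing partial message, which a
    caller would keep and prepend to the next chunk of stream data.
    """
    messages = []
    pos = 0
    while len(buf) - pos >= HEADER.size:
        magic, length = HEADER.unpack_from(buf, pos)
        if magic != UNCOMPRESSED_MAGIC:
            # Compressed packets (or a desynchronised stream) are not handled.
            raise ValueError(f"unexpected marker {magic:#010x} at offset {pos}")
        end = pos + HEADER.size + length
        if end > len(buf):
            break  # payload not fully received yet
        payload = buf[pos + HEADER.size:end]
        # The first payload byte is usually the packet type (or response).
        messages.append((payload[0] if payload else None, payload))
        pos = end
    return messages, buf[pos:]


if __name__ == "__main__":
    # The first three messages from the capture, concatenated for illustration.
    stream = bytes.fromhex(
        "37c1fb14" "05000000" "3100000000"
        "37c1fb14" "06000000" "000000000000"
        "37c1fb14" "01000000" "2d"
    )
    messages, _ = split_messages(stream)
    for msg_type, payload in messages:
        print(f"type={msg_type:#04x} len={len(payload)}")
    # type=0x31 len=5
    # type=0x00 len=6
    # type=0x2d len=1
```

Keeping the partial tail rather than discarding it matters when a large message is split across several TCP segments.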

This method has been extremely useful for other things in the past as well, such as debugging MySQL’s replication protocol. It is definitely part of my toolset for working on network daemons. If you use any similar tools, please mention them in the comments below. I’m always interested in improving my workflow and toolset.

Image credit: Terry Robinson, used under a Creative Commons license

Published by LinuxJedi

I am a Senior Software Engineer for MariaDB leading the development of MariaDB ColumnStore.
