How MariaDB ColumnStore’s filenames work

Unlike most storage engines, MariaDB ColumnStore does not store its data files in the datadir. Instead, they are stored on the Performance Modules under what appears to be a strange numbering system. In this post I will walk you through deciphering that numbering system.

If you are still using InfiniDB with MySQL, the system is exactly the same as outlined in this post, but the default path that the data is stored in will be a little different.

By default the data is stored in /usr/local/mariadb/columnstore/data[dbRoot], where “dbRoot” is the DB root number selected when the ColumnStore system was configured.

From here onwards we are looking at directories with three-digit names ending in “.dir”. Every file is nested in a path similar to 000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf.

Now, to understand this you first need to understand how ColumnStore’s storage works. As the name implies, every column of a table is stored separately. These columns are broken up into “extents” of 2^23 (roughly 8M) entries. Either 1 or 2 extents (depending on how much data you have) make up a segment file. Each segment file is given a segment ID, and a collection of four segments is given a partition ID. In addition to all this, every column is given an “Object ID”.
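As a quick sanity check on those numbers, here is a small sketch of the arithmetic. It assumes the “roughly 8M” figure means exactly 2^23 entries per extent, and takes the “up to 2 extents per segment file” and “four segments per partition” figures from the description above; the constant names are my own and are not ColumnStore settings.

```python
# Illustrative constants only; the names are made up for this sketch.
EXTENT_ROWS = 2 ** 23            # entries per extent ("roughly 8M")
MAX_EXTENTS_PER_SEGMENT = 2      # a segment file holds 1 or 2 extents
SEGMENTS_PER_PARTITION = 4       # four segment files share a partition ID

max_rows_per_segment = EXTENT_ROWS * MAX_EXTENTS_PER_SEGMENT
max_rows_per_partition = max_rows_per_segment * SEGMENTS_PER_PARTITION

print(EXTENT_ROWS)               # 8388608
print(max_rows_per_segment)      # 16777216
print(max_rows_per_partition)    # 67108864
```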

You can find the object ID for every column using the information_schema.columnstore_columns table, and details about every extent, including the partition and segment IDs, using the information_schema.columnstore_extents table. This will be useful when working out the file names.

The following is how to work out a filename from an object ID. Note that object IDs are 32-bit and that the output of each of these parts is written in decimal:

Part 1: The top byte from the object ID (object ID >> 24)
Part 2: The next byte from the object ID ((object ID & 0x00ff0000) >> 16)
Part 3: The next byte from the object ID ((object ID & 0x0000ff00) >> 8)
Part 4: The last byte from the object ID (object ID & 0x000000ff)
Part 5: The partition ID
Part 6 (the filename): The segment ID

Each part here apart from the final part is a directory appended with “.dir”. The filename is prepended with FILE and appended with “.cdf”. There is of course a much easier way of finding out this information. The information_schema.columnstore_files table will give you the filename for each object/partition/segment combination currently in use as well as the file size information.
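For anyone who wants to work the path out by hand anyway, the six parts above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post: the function name is hypothetical, and the three-digit zero padding is inferred from the example path shown earlier. The result is relative to the data[dbRoot] directory.

```python
def columnstore_relative_path(object_id, partition_id, segment_id):
    """Build the relative path of a segment file from its IDs.

    Hypothetical helper: follows the six-part rule described in this
    post, with each part zero-padded to three decimal digits.
    """
    if not 0 <= object_id <= 0xFFFFFFFF:
        raise ValueError("object_id must fit in 32 bits")

    parts = [
        object_id >> 24,                  # Part 1: top byte
        (object_id & 0x00FF0000) >> 16,   # Part 2: next byte
        (object_id & 0x0000FF00) >> 8,    # Part 3: next byte
        object_id & 0x000000FF,           # Part 4: last byte
        partition_id,                     # Part 5: partition ID
    ]
    # Every part except the last is a directory with ".dir" appended.
    dirs = "/".join(f"{part:03d}.dir" for part in parts)
    # Part 6: the segment ID, prefixed with FILE and suffixed with ".cdf".
    return f"{dirs}/FILE{segment_id:03d}.cdf"


print(columnstore_relative_path(1001, 0, 0))
# 000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf
```

Object ID 1001 is 3 * 256 + 233, which is where the 003.dir/233.dir pair in the example path comes from.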

Image credit: Marcin Wichary, used under a Creative Commons license

Published by

LinuxJedi

I am a Senior Software Engineer for MariaDB leading the development of MariaDB ColumnStore.
