How MariaDB ColumnStore’s filenames work

Unlike most storage engines, MariaDB ColumnStore does not store its data files in the datadir. Instead, they are stored on the Performance Modules under what appears to be a strange numbering system. In this post I will walk you through deciphering that numbering system.

If you are still using InfiniDB with MySQL, the system is exactly the same as outlined in this post, but the default path that the data is stored in will be a little different.

The default path for the data to be stored is /usr/local/mariadb/columnstore/data[dbRoot] where “dbRoot” is the DB root number selected when the ColumnStore system was configured.

From here onwards we are looking at directories named with three digits followed by “.dir”. Every data file is nested in a path similar to 000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf.

Now, to understand this you first need to understand how ColumnStore’s storage works. As the name implies, every column of a table is stored separately. These columns are broken up into “extents” of 2^23 (roughly 8M) entries. Either 1 or 2 extents (depending on how much data you have) make up a segment file. Each segment file is given a segment ID, and a collection of four segments is given a partition ID. In addition to all this, every column is given an “Object ID”.

You can find the object ID for every column using the information_schema.columnstore_columns table. Details about every extent, including the partition and segment IDs, are in the information_schema.columnstore_extents table. Both will be useful when working out the filenames.

The following is how to work out a filename from an object ID. Note that object IDs are 32-bit, and the value of each part is written in decimal:

Part 1: The top byte from the object ID (object ID >> 24)
Part 2: The next byte from the object ID ((object ID & 0x00ff0000) >> 16)
Part 3: The next byte from the object ID ((object ID & 0x0000ff00) >> 8)
Part 4: The last byte from the object ID (object ID & 0x000000ff)
Part 5: The partition ID
Part 6 (the filename): The segment ID

Each part except the last is a directory name with “.dir” appended. The filename is the segment ID prefixed with “FILE” and suffixed with “.cdf”. There is, of course, a much easier way of finding out this information: the information_schema.columnstore_files table will give you the filename for each object/partition/segment combination currently in use, as well as the file size information.
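If you only have an object ID to hand, the six parts can also be worked out by hand. The following is a minimal Python sketch of that derivation, not ColumnStore's own code. The function names are made up for illustration, and the zero-padding to three digits is an assumption inferred from the example path earlier in this post. The result is relative to the data[dbRoot] directory.

```python
def columnstore_relative_path(object_id: int, partition_id: int, segment_id: int) -> str:
    """Build the nested path for one segment file (hypothetical helper)."""
    if not 0 <= object_id <= 0xFFFFFFFF:
        raise ValueError("object ID must fit in 32 bits")
    parts = [
        object_id >> 24,                 # Part 1: top byte
        (object_id & 0x00FF0000) >> 16,  # Part 2: next byte
        (object_id & 0x0000FF00) >> 8,   # Part 3: next byte
        object_id & 0x000000FF,          # Part 4: last byte
        partition_id,                    # Part 5: partition ID
    ]
    # Assumption: each part is written as three zero-padded decimal digits.
    directories = "/".join(f"{part:03d}.dir" for part in parts)
    # Part 6: the segment ID becomes the filename.
    return f"{directories}/FILE{segment_id:03d}.cdf"


def parse_relative_path(path: str) -> tuple[int, int, int]:
    """Reverse the mapping: return (object_id, partition_id, segment_id)."""
    pieces = path.split("/")
    if len(pieces) != 6:
        raise ValueError("expected five directories and one filename")
    *dirs, filename = pieces
    if not all(d.endswith(".dir") for d in dirs):
        raise ValueError("every directory must end in .dir")
    if not (filename.startswith("FILE") and filename.endswith(".cdf")):
        raise ValueError("filename must look like FILEnnn.cdf")
    values = [int(d[: -len(".dir")]) for d in dirs]
    segment_id = int(filename[len("FILE"): -len(".cdf")])
    # Reassemble the four object ID bytes, most significant first.
    object_id = (values[0] << 24) | (values[1] << 16) | (values[2] << 8) | values[3]
    return object_id, values[4], segment_id


if __name__ == "__main__":
    print(columnstore_relative_path(1001, 0, 0))
    print(parse_relative_path("000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf"))
```

Under these assumptions, object ID 1001 (3 × 256 + 233) with partition 0 and segment 0 gives 000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf, which matches the example path above. Treat the columnstore_files table as the authority if the two ever disagree.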

Image credit: Marcin Wichary, used under a Creative Commons license

LinuxJedi
