Unicode 7 in CentOS 7 TUI Code

I’m in the middle of developing a project in C which uses NCurses and Unicode 7+ characters. This has been working out great on macOS and Fedora 31, where I have been doing most of my testing. But on CentOS 7 I have been having big problems getting the characters to render. This post goes into why that happens and how to fix it.

Precursor

So, if you want NCurses to work with Unicode / wide characters at all, you first need to make sure you are linking against ncursesw instead of ncurses, which adds support for all the wide functions. You also need to set the locale before calling initscr() and use the correct headers and functions:

#include <locale.h>
#define NCURSES_WIDECHAR 1
#include <ncursesw/curses.h>

...

setlocale(LC_ALL, "");

You should then be able to use NCurses wide characters. It is worth noting that on macOS the default NCurses is not compiled with wide character support, so you need to use the version from Homebrew.

Normal stdout

The first hurdle I hit was just trying to get wprintf() and the like to print anything at all. It turns out that CentOS (at least via SSH) will not print wide characters by default. You need to enable support for this by setting stdout to wide orientation early on in your program:

fwide(stdout, 1);
wprintf(L"Can you see me? 🖥️ \n");

You could of course do a regular printf() of the character and hope it works out. For the most part this should be fine, but you will need to pad with a space afterwards since this is a double-width character.

NCurses

“Great!” I thought. I’ll just plug this into my NCurses based code and it will all print fine… nope! Normal characters echo fine and in fact do not need the fwide() above, but Unicode “emoji” characters were not printing to screen. After doing a bit of research I found the problem comes down to glibc, and this blog post summarises it well.

Basically NCurses uses a function called wcwidth() to figure out how many columns each character takes up when it is printed. The problem is that with glibc < 2.22, wcwidth() returned a width of -1 for Unicode 7+ characters, which effectively made them invisible.
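To make the effect of that -1 concrete, here is a minimal sketch of the kind of column counting a TUI library has to do. The helper name string_columns() is made up for this example and is not part of NCurses or glibc. It only illustrates how one unmeasurable character can spoil the layout of a whole string.

```c
#define _XOPEN_SOURCE 700 /* exposes wcwidth() in <wchar.h> on glibc */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: total number of terminal columns for a wide string,
   or -1 as soon as wcwidth() reports a character it cannot measure. */
int string_columns(const wchar_t *s)
{
    int total = 0;
    for (size_t i = 0; s[i] != L'\0'; i++) {
        int w = wcwidth(s[i]);
        if (w < 0)
            return -1; /* unknown or non-printable character */
        total += w;
    }
    return total;
}
```

Call setlocale(LC_ALL, "") first and then pass in a string containing one of the newer characters. If the result is -1, your C library does not know how wide that character is.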

The NCurses Fix

This means there are two ways of fixing this:
1. Link to a newer version of glibc (which will give all sorts of headaches for distribution).
or
2. Replace wcwidth() with our own implementation.

Of course I opted for the second option and luckily in the blog post above there is a link to an implementation of wcwidth() that worked. The blog post, however, is discussing using LD_PRELOAD to replace the function at runtime. I want this done at compile time, so it requires a little bit of work.

For this I am using CMake, but you can do this in any build system. The key parts are:
1. Detect glibc version
2. If glibc < 2.22 also compile wcwidth.c

I found a handy CMake file here which does the glibc detection, solving the first part of the puzzle.
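If you are not using CMake, the same check can be sketched in C itself. This is an alternative to the CMake file, not what that file does. It uses glibc's gnu_get_libc_version() to read the version at runtime. The helper names version_older_than() and needs_wcwidth_override() are made up for this example.

```c
#include <stdio.h>

#ifdef __GLIBC__
#include <gnu/libc-version.h> /* gnu_get_libc_version(), glibc only */
#endif

/* Hypothetical helper: returns 1 if the "major.minor" string in `version`
   is older than want_major.want_minor, 0 if it is not, and -1 if the
   string cannot be parsed. */
int version_older_than(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major < want_major;
    return minor < want_minor;
}

/* Hypothetical helper: 1 when the glibc loaded at runtime predates 2.22,
   0 otherwise (including on systems that do not use glibc at all). */
int needs_wcwidth_override(void)
{
#ifdef __GLIBC__
    return version_older_than(gnu_get_libc_version(), 2, 22) == 1;
#else
    return 0;
#endif
}
```

A runtime check like this is mostly useful for diagnostics. Choosing which source files to compile still has to happen in the build system.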

In my codebase I put wcwidth.c into a directory called extern/wcwidth and had to add the following lines below #include <wchar.h> to add the missing function prototypes so that it would compile:

int mk_wcwidth(wchar_t ucs);
int mk_wcswidth(const wchar_t *pwcs, size_t n);
int mk_wcwidth_cjk(wchar_t ucs);
int mk_wcswidth_cjk(const wchar_t *pwcs, size_t n);

From there we can now add the missing CMake. I added glibc.cmake to the cmake directory of my source and did the following in CMakeLists.txt (where project_sources is the list of source files for my project):

IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
    # Run the detection first so GLIBC_VERSION is set before it is tested
    INCLUDE(cmake/glibc.cmake)
    CHECK_GLIBC_VERSION()
    # VERSION_LESS compares component by component, so 2.5 sorts before 2.22
    IF (GLIBC_VERSION VERSION_LESS "2.22")
        MESSAGE(STATUS "GLIBC ${GLIBC_VERSION} found, this is less than 2.22, adding wcwidth() override for Unicode 7")
        SET(project_sources ${project_sources} extern/wcwidth/wcwidth.c)
        SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Dmk_wcwidth=wcwidth -Dmk_wcswidth=wcswidth")
    ENDIF ()
ENDIF ()

And that is it. Compiling this on Fedora and macOS still works as normal. On CentOS 7 it adds wcwidth.c, and the ‘-D’ directives rename mk_wcwidth() and mk_wcswidth() in that source file to wcwidth() and wcswidth(), so they are used in place of the glibc versions. Now everything renders correctly no matter which current *nix based OS is used!
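If the ‘-D’ renaming trick is unfamiliar, here is a minimal sketch of the mechanism with made-up names (mk_demo_width, demo_width and columns_for are all hypothetical and the width rule is a toy, not a real table). The macro is written as a #define so the sketch is self-contained. On the command line it would be -Dmk_demo_width=demo_width.

```c
#include <wchar.h>

/* Normally passed on the command line as -Dmk_demo_width=demo_width.
   Both names are hypothetical. */
#define mk_demo_width demo_width

/* Written under its "mk_" name, as in wcwidth.c, but after preprocessing
   the compiler only ever sees a function called demo_width(). */
int mk_demo_width(wchar_t ucs)
{
    /* Toy rule for illustration only, not a real width table */
    return (ucs >= 0x1F600 && ucs <= 0x1F64F) ? 2 : 1;
}

/* A caller that knows nothing about the mk_ prefix. */
int columns_for(wchar_t ucs)
{
    return demo_width(ucs);
}
```

In the real build the new name is wcwidth, so the definition compiled into the program is the one that gets linked in place of the C library's.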

Published by

LinuxJedi

Lead Software Engineer / Manager for the MariaDB Corporation and an Open Source Software advocate.
