
Unicode 7 in CentOS 7 TUI Code

I’m in the middle of developing a project in C which uses NCurses and Unicode 7+ characters. This has been working out great on macOS and Fedora 31, where I have been doing most of my testing. But on CentOS 7 I have been having big problems getting the characters to render. This post explains why, and how to fix it.

Precursor

If you want NCurses to work with Unicode / wide characters at all, you first need to make sure you are linking against ncursesw instead of ncurses, which adds support for all the wide-character functions. You also need to include the correct headers and call setlocale() before initscr() so that the locale is set correctly:

#include <locale.h>
#define NCURSES_WIDECHAR 1
#include <ncursesw/curses.h>

...

setlocale(LC_ALL, "");

You should then be able to use NCurses wide characters. It is worth noting that on macOS the default NCurses is not compiled with wide character support, so you need to use the version from Homebrew.

Normal stdout

The first hurdle I hit was just trying to get wprintf() and the like to print anything at all. It turns out that CentOS (at least via SSH) will not print wide characters by default. You need to enable support for this by adding the following early on in your program, before anything else is written to stdout:

fwide(stdout, 1);
wprintf(L"Can you see me? 🖥️ \n");

You could of course do a regular printf() of the character and hope it works out. For the most part this should be fine, but you will need to pad with a space afterwards since this is a double-width character.
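The reason fwide() has to come early is that a stream's orientation is fixed by the first operation on it. The snippet below is a minimal sketch of that behaviour; enable_wide() is a hypothetical helper name of mine, not something from NCurses or glibc.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: try to switch `stream` to wide orientation.
 * Returns 1 if the stream is wide-oriented afterwards, 0 otherwise.
 * fwide() cannot change an orientation that has already been set, so
 * this only succeeds before the first byte-oriented write (printf(),
 * fputs(), ...) to the same stream. */
int enable_wide(FILE *stream)
{
    return fwide(stream, 1) > 0;
}
```

Calling enable_wide(stdout) as the first thing in main() and checking the result is one way to find out whether a later wprintf() has any chance of working.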

NCurses

“Great!” I thought. I’ll just plug this into my NCurses based code and it will all print fine… nope! Normal characters echo fine and in fact do not need the fwide() above, but Unicode “emoji” characters were not printing to screen. After doing a bit of research I found the problem comes down to glibc, and this blog post summarises it well.

Basically NCurses uses a function called wcwidth() to figure out how many columns a character takes up when it is printed. The problem is that with glibc < 2.22, wcwidth() returns a width of -1 for Unicode 7+ characters, which effectively makes them invisible.
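To check whether a given system is affected, you can ask wcwidth() directly. This is a small sketch rather than anything NCurses itself does; has_known_width() is a hypothetical name, and the _XOPEN_SOURCE define is there because glibc only declares wcwidth() when an X/Open feature level is requested.

```c
#define _XOPEN_SOURCE 700 /* makes glibc declare wcwidth() */
#include <wchar.h>

/* Hypothetical helper: returns 1 if the C library reports a usable
 * column width (0 or more) for `wc`, and 0 if wcwidth() returns -1,
 * which is the case described above where nothing gets drawn. */
int has_known_width(wchar_t wc)
{
    return wcwidth(wc) >= 0;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, calling has_known_width((wchar_t)0x1F5A5) for the desktop computer emoji used earlier should tell you whether your glibc knows about that character.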

The NCurses Fix

This means there are two ways of fixing this:
1. Link to a newer version of glibc (which will give all sorts of headaches for distribution).
or
2. Replace wcwidth() with our own implementation.

Of course I opted for the second option and luckily in the blog post above there is a link to an implementation of wcwidth() that worked. The blog post, however, is discussing using LD_PRELOAD to replace the function at runtime. I want this done at compile time, so it requires a little bit of work.

For this I am using CMake, but you can do this in any build system. The key parts are:
1. Detect glibc version
2. If glibc < 2.22 also compile wcwidth.c
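If you would rather not do the detection in the build system, the same decision can be sketched in C using the __GLIBC__ and __GLIBC_MINOR__ macros that glibc's headers define. This is an alternative to the CMake approach I use below, not part of it, and both function names are hypothetical. Note that it reflects the headers you compile against, not the library found at runtime.

```c
#include <wchar.h> /* on glibc, any libc header defines __GLIBC__ and __GLIBC_MINOR__ */

/* Hypothetical helper: 1 if version major.minor is older than 2.22. */
int glibc_is_older_than_2_22(int major, int minor)
{
    return major < 2 || (major == 2 && minor < 22);
}

/* Hypothetical helper: 1 when compiled against glibc < 2.22, and 0 for
 * newer glibc or for a non-glibc C library such as the one on macOS. */
int needs_wcwidth_override(void)
{
#if defined(__GLIBC__) && defined(__GLIBC_MINOR__)
    return glibc_is_older_than_2_22(__GLIBC__, __GLIBC_MINOR__);
#else
    return 0;
#endif
}
```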

I found a handy CMake file here which does the glibc detection which solves one part of the puzzle.

In my codebase I put wcwidth.c into a directory called extern/wcwidth and had to add the following lines below #include <wchar.h> to add the missing function prototypes so that it would compile:

int mk_wcwidth(wchar_t ucs);
int mk_wcswidth(const wchar_t *pwcs, size_t n);
int mk_wcwidth_cjk(wchar_t ucs);
int mk_wcswidth_cjk(const wchar_t *pwcs, size_t n);

From there we can now add the missing CMake logic. I added glibc.cmake to the cmake directory of my source and did the following in CMakeLists.txt (where project_sources is the list of source files for my project):

IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
    # Run the detection first so that GLIBC_VERSION is set before it is tested
    INCLUDE(cmake/glibc.cmake)
    CHECK_GLIBC_VERSION()
    IF (GLIBC_VERSION VERSION_LESS "2.22")
        MESSAGE(STATUS "GLIBC ${GLIBC_VERSION} found, this is less than 2.22, adding wcwidth() override for Unicode 7")
        # Build the bundled wcwidth.c and rename its mk_ functions to the glibc names
        SET(project_sources ${project_sources} extern/wcwidth/wcwidth.c)
        SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Dmk_wcwidth=wcwidth -Dmk_wcswidth=wcswidth")
    ENDIF ()
ENDIF ()

And that is it. Compiling this on Fedora and macOS still works as normal. On CentOS 7 the build adds wcwidth.c, and the -D definitions rename its mk_wcwidth() and mk_wcswidth() functions so that they replace the glibc wcwidth() and wcswidth(). Now everything renders correctly no matter which current *nix based OS is used!
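If the -D trick looks like magic, it is just the preprocessor rewriting an identifier before the compiler sees it. Here is a toy sketch of the same mechanism with made-up names (mk_demo_width, demo_width and total_width are all hypothetical, and the width rule is for illustration only, not the real table from wcwidth.c):

```c
#include <stddef.h>

/* Same effect as passing -Dmk_demo_width=demo_width to the compiler. */
#define mk_demo_width demo_width

/* Written under its "mk_" name, like the functions in wcwidth.c. The
 * macro above rewrites it, so the symbol that actually gets compiled
 * is demo_width(). */
int mk_demo_width(int code_point)
{
    /* Toy rule: treat code points from U+1F300 upwards as double width. */
    return code_point >= 0x1F300 ? 2 : 1;
}

/* The caller only knows the public name and picks up the replacement. */
int total_width(const int *code_points, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += demo_width(code_points[i]);
    return total;
}
```

In the real build the renamed definition ends up in your executable, so calls to wcwidth() resolve to it instead of the glibc version.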

LinuxJedi
