Introduction: Display Unicode in Arduino

This Instructable shows how to display Unicode text on Arduino.

Supplies

Any Arduino dev board with a display supported by Arduino_GFX.

Ref.:

https://github.com/moononournation/Arduino_GFX

Step 1: Unicode & UTF-8

As of Unicode 14.0, Unicode defines 144k+ characters covering 159 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.
Unicode can be implemented by different character encodings. The Unicode standard defines the Unicode Transformation Formats (UTF) UTF-8, UTF-16 and UTF-32, as well as several other encodings.

For better backward compatibility, the Arduino IDE, most modern operating systems and most web pages use UTF-8 encoding.

UTF-8 was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well.
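To make the byte layout concrete, here is a minimal sketch of a UTF-8 encoder in C++. The function name utf8Encode is hypothetical (it is not part of Arduino_GFX); it only needs <stdint.h> and <stddef.h>, so it should also compile for an Arduino board. Code points below 128 come out as a single identical byte, which is exactly the ASCII compatibility described above.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper (not an Arduino_GFX API): encode one Unicode code point
// as UTF-8 into out[0..3]. Returns the number of bytes written (1 to 4),
// or 0 if cp is a UTF-16 surrogate or lies beyond U+10FFFF.
size_t utf8Encode(uint32_t cp, uint8_t out[4]) {
  if (cp < 0x80) {                   // U+0000..U+007F: 0xxxxxxx (same as ASCII)
    out[0] = (uint8_t)cp;
    return 1;
  }
  if (cp < 0x800) {                  // U+0080..U+07FF: 110xxxxx 10xxxxxx
    out[0] = (uint8_t)(0xC0 | (cp >> 6));
    out[1] = (uint8_t)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp >= 0xD800 && cp <= 0xDFFF) {
    return 0;                        // surrogates cannot be encoded on their own
  }
  if (cp < 0x10000) {                // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = (uint8_t)(0xE0 | (cp >> 12));
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {              // U+10000..U+10FFFF: 4 bytes
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                          // outside the Unicode code space
}
```

For example, 'A' (U+0041) stays the single byte 0x41, the degree sign ° (U+00B0) becomes C2 B0, and ℃ (U+2103) becomes E2 84 83.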

Ref.:

https://en.wikipedia.org/wiki/Unicode

https://en.wikipedia.org/wiki/UTF-8

https://en.wikipedia.org/wiki/ASCII

Step 2: Why Do We Need UTF-8?

Some projects work fine with ASCII characters alone, without Unicode.

Others require Unicode to support multiple languages, e.g.:

  • Bluetooth device that shows your phone notifications
  • RSS reader display that shows the latest news
  • Domestic weather report panel
  • Social network comments dashboard
  • E-book reader
  • and many more text display projects

Step 3: Extended ASCII

Arduino_GFX inherits from Adafruit_GFX and, by default, uses the classic fixed-width bitmap font that has shipped since Adafruit_GFX 1.0. This font, called glcdfont, is 5 x 7 pixels and contains 128 ASCII characters plus 128 Extended ASCII characters. You can view all of the characters in the AsciiTable example.

That is how text works before UTF-8 encoding is enabled. Arduino_GFX can toggle UTF-8 encoding with this function:

gfx->setUTF8Print(true);

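Once UTF-8 encoding is enabled, the Extended ASCII characters can no longer be used, but you can use the corresponding UTF-8 encoded characters instead.

For example, printing the degree Celsius sign with Extended ASCII looks like this (the literal is split into "\xF8" and "C" because "\xF8C" would be read as a single hex escape):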

gfx->print("\xF8""C");

Since the Arduino IDE can use UTF-8 encoded strings directly, printing the same sign in UTF-8 is:

gfx->print("°C");

Or:

gfx->print("℃");

Which one works depends on which glyphs are included in the selected UTF-8 font file.
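With UTF-8 printing enabled, the library has to turn a run of bytes back into code points before it can look up glyphs; with it disabled, each byte is drawn as one glcdfont glyph. The decoder below is a simplified sketch of that step, not Arduino_GFX's actual implementation, and utf8Decode / utf8Length are hypothetical names. It shows why "°C" is 3 bytes but only 2 characters, and "℃" is 3 bytes but 1 character (the test strings use explicit byte escapes so they do not depend on the source file encoding).

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper (not an Arduino_GFX API): decode one UTF-8 sequence
// starting at s. Stores the code point in *cp and returns the number of bytes
// consumed. Malformed input yields U+FFFD and consumes 1 byte.
// Simplification: overlong forms and encoded surrogates are not rejected.
size_t utf8Decode(const char *s, uint32_t *cp) {
  const uint8_t *p = (const uint8_t *)s;
  size_t len;
  if (p[0] < 0x80) {                   // 0xxxxxxx: plain ASCII
    *cp = p[0];
    return 1;
  } else if ((p[0] & 0xE0) == 0xC0) {  // 110xxxxx: 2-byte sequence
    *cp = p[0] & 0x1F; len = 2;
  } else if ((p[0] & 0xF0) == 0xE0) {  // 1110xxxx: 3-byte sequence
    *cp = p[0] & 0x0F; len = 3;
  } else if ((p[0] & 0xF8) == 0xF0) {  // 11110xxx: 4-byte sequence
    *cp = p[0] & 0x07; len = 4;
  } else {                             // stray continuation or invalid lead byte
    *cp = 0xFFFD;                      // (e.g. the Extended ASCII byte 0xF8)
    return 1;
  }
  for (size_t i = 1; i < len; i++) {
    if ((p[i] & 0xC0) != 0x80) {       // missing continuation byte (also stops at '\0')
      *cp = 0xFFFD;
      return 1;
    }
    *cp = (*cp << 6) | (p[i] & 0x3F);  // append 6 payload bits
  }
  return len;
}

// Count characters (code points) rather than bytes in a NUL-terminated string.
size_t utf8Length(const char *s) {
  size_t count = 0;
  uint32_t cp;
  while (*s) {
    s += utf8Decode(s, &cp);
    count++;
  }
  return count;
}
```

Note that the old Extended ASCII string "\xF8""C" is not valid UTF-8: the 0xF8 byte decodes to the replacement character U+FFFD, which is one reason the two modes cannot be mixed.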

Step 4: Font Data Size

As mentioned, Unicode contains over 144k characters, so it is not easy to pack them all into an Arduino program.

Unifont is a font that covers most of the commonly used Unicode characters. The latest version, unifont_jp-14.0.02, contains 57,389 glyphs, and its BDF format font file is 9.4 MB.

Program space on typical boards is much smaller:

  • Common AVR family dev boards have only 32 KB of flash for the program
  • ESP8266 has 4 MB of flash, but the program is still limited to around 1 MB
  • RTL8720DN can store a 2 MB program
  • ESP32 in Huge APP mode can store a 3 MB program
  • Raspberry Pi Pico can store a 2 MB program (some variants can store up to 16 MB)
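To see how these limits compare with the converted font data sizes listed in Steps 8 to 11, here is a quick back-of-the-envelope check in C++. It assumes 1 MB = 1,048,576 bytes and counts only the font data, ignoring the rest of the program, so the real headroom is smaller. All constant and function names are made up for illustration.

```cpp
#include <stdint.h>

// Approximate program space per board, in bytes (assumes 1 KB = 1024 bytes).
const int64_t AVR_FLASH         = 32LL * 1024;          // e.g. ATmega328P
const int64_t ESP8266_PROGRAM   = 1LL * 1024 * 1024;    // ~1 MB program limit
const int64_t RTL8720DN_PROGRAM = 2LL * 1024 * 1024;
const int64_t ESP32_HUGE_APP    = 3LL * 1024 * 1024;    // Huge APP partition
const int64_t PICO_PROGRAM      = 2LL * 1024 * 1024;

// Font data sizes produced by bdfconv (Steps 8 to 11).
const int64_t FONT_UNIFONT_H_UTF8   = 2250360;
const int64_t FONT_UNIFONT_CHINESE  = 979557;
const int64_t FONT_UNIFONT_CHINESE4 = 298564;
const int64_t FONT_UNIFONT_CJK      = 1704862;

// Bytes left for everything else (code, libraries, other data) after the font.
// A negative result means the font data alone does not fit.
int64_t headroom(int64_t budget, int64_t fontBytes) {
  return budget - fontBytes;
}
```

For example, headroom(ESP32_HUGE_APP, FONT_UNIFONT_H_UTF8) leaves 895,368 bytes, while on a 2 MB Pico the full font is 153,208 bytes too large. On ESP8266 the full Chinese font would leave only 69,019 bytes for the rest of the sketch, which is likely too little once Wi-Fi and display code are added.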


Ref.:

https://en.wikipedia.org/wiki/GNU_Unifont

http://unifoundry.com/pub/unifont/unifont-14.0.02/font-builds/

Step 5: U8g2 Font

Arduino_GFX adopted the U8g2 font format as its UTF-8 solution. U8g2 fonts support UTF-8 encoding, and U8g2 also provides tools to convert font files into Arduino source files.

bdfconv is one of the tools provided by U8g2; it converts the Unifont BDF font file into an Arduino source file. Its output is in a compressed format, and bdfconv can also limit the output to selected encoding ranges; both features reduce the data size.

Ref.:

https://github.com/olikraus/u8g2/wiki/u8g2fontformat

https://github.com/olikraus/u8g2/tree/master/tools/font/bdfconv

Step 6: Select Font Subset

Since we cannot simply squeeze the full set of Unifont glyphs into limited program space, we need to select the subset of glyphs that a specific project will use.

U8g2 already provides many Unifont subsets for various languages, e.g.:

  • u8g2_font_unifont_t_polish
  • u8g2_font_unifont_t_vietnamese1
  • u8g2_font_unifont_t_chinese2
  • u8g2_font_unifont_t_japanese1
  • u8g2_font_unifont_t_korean1

Some languages still cannot fit all their glyphs on an Arduino, so U8g2 offers subsets of different sizes for different requirements. For example, the Chinese font has 3 subsets:

  • u8g2_font_unifont_t_chinese1 - sized 14,178 bytes
  • u8g2_font_unifont_t_chinese2 - sized 20,225 bytes
  • u8g2_font_unifont_t_chinese3 - sized 37,502 bytes

Refer to the U8g2 GitHub Wiki for more details:

https://github.com/olikraus/u8g2/wiki/fntgrpunifont

Step 7: Arduino_GFX Prepared Font Files

As mentioned in previous steps, some MCUs can store programs of 1 to 3 MB, so we can tailor-make font files that display as many glyphs as possible. Here are some extra font files prepared in Arduino_GFX:

  • u8g2_font_unifont_h_utf8
  • u8g2_font_unifont_t_chinese
  • u8g2_font_unifont_t_chinese4
  • u8g2_font_unifont_t_cjk

The source BDF font is unifont_jp-14.0.02, and the conversion tool is U8g2's bdfconv.

Step 8: Custom Font: u8g2_font_unifont_h_utf8

This font includes all glyphs in unifont_jp-14.0.02.
Number of glyphs: 57,389
Data size: 2,250,360 bytes
Converting script:
bdfconv -v -f 1 -b 1 -m "0-1114111" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_h_utf8.h -n u8g2_font_unifont_h_utf8
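The map "0-1114111" in this script is simply the whole Unicode code space written in decimal. A compile-time check in C++ (the constant names are made up for illustration) shows where that number comes from: 17 planes of 65,536 code points each.

```cpp
#include <stdint.h>

// The whole Unicode code space: planes 0..16, 65,536 code points per plane,
// so the last code point is U+10FFFF.
constexpr uint32_t kPlanes = 17;
constexpr uint32_t kCodePointsPerPlane = 0x10000;  // 65,536
constexpr uint32_t kMaxCodePoint = kPlanes * kCodePointsPerPlane - 1;

static_assert(kMaxCodePoint == 0x10FFFF, "last code point is U+10FFFF");
static_assert(kMaxCodePoint == 1114111, "same value in decimal, as used with -m");
```

Unifont does not have a glyph for every code point in that range; bdfconv can only output glyphs that exist in the BDF file, which is why the result holds 57,389 glyphs rather than over a million.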

Note:

Since the font data alone is over 2 MB, only the ESP32 family in Huge APP mode can store the program. Some Raspberry Pi Pico variants have more than 2 MB of flash, but I have not tested them yet.

Step 9: Custom Font: u8g2_font_unifont_t_chinese

This font includes all glyphs in the Chinese character ranges.

Number of glyphs: 22,145

Data size: 979,557 bytes

Converting script:

bdfconv -v -f 1 -m "32-127,11904-12351,19968-40959,63744-64255,65280-65376" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_chinese.h -n u8g2_font_unifont_t_chinese
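Before flashing, it can help to check whether the text a project needs is actually covered by a font's map. The sketch below tests a code point against a map string in the form used by the script above. inMap is a hypothetical helper, and it understands only plain decimal values and "a-b" ranges, which is all these scripts use; bdfconv's full map syntax has more features that are not handled here.

```cpp
#include <stdint.h>
#include <stdlib.h>

// Map used for u8g2_font_unifont_t_chinese (copied from the script above).
const char *CHINESE_MAP = "32-127,11904-12351,19968-40959,63744-64255,65280-65376";

// Hypothetical helper: true if code point cp is covered by a map string such
// as "32-127,19968-40959". Only decimal values and "a-b" ranges are handled.
bool inMap(const char *map, uint32_t cp) {
  const char *p = map;
  while (*p) {
    char *end;
    unsigned long lo = strtoul(p, &end, 10);  // start of the entry
    if (end == p) return false;               // not a number: unsupported syntax
    unsigned long hi = lo;
    if (*end == '-') {                        // "a-b" range
      p = end + 1;
      hi = strtoul(p, &end, 10);
      if (end == p) return false;
    }
    if (cp >= lo && cp <= hi) return true;
    p = end;
    if (*p == ',') p++;                       // move on to the next entry
    else if (*p) return false;                // unexpected character
  }
  return false;
}
```

For instance, 中 (U+4E2D) is covered, while the Hiragana あ (U+3042) and the degree sign ° (U+00B0) fall outside this map, so, assuming the script above is what produced the font, they would not be drawn with u8g2_font_unifont_t_chinese. The CJK map in Step 11 does cover Hiragana.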

Step 10: Custom Font: u8g2_font_unifont_t_chinese4

Since ESP8266 has a 1 MB program size limit, the font with all Chinese characters still cannot fit alongside the rest of the program. Another subset, narrowed down to commonly used characters only, is required.

The list of commonly used characters comes from 常用國字標準字體表 (Chart of Standard Forms of Common National Characters) in 字集 (Character Sets) and 字表:中国常用字 (Character list: common Chinese characters) in GlyphWiki.

Number of glyphs: 7,199

Data size: 298,564 bytes

Converting script:

bdfconv -v -f 1 -M common.txt unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_chinese4.h -n u8g2_font_unifont_t_chinese4
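The content of common.txt is not included here. As a sketch, assuming -M reads the same comma-separated syntax that -m takes, such a map could be generated on a PC from a plain UTF-8 list of the characters a project needs. buildMap is a hypothetical host-side helper (it is not meant to run on the board): it collects the distinct code points, always adds printable ASCII 32-127, and merges consecutive values into ranges.

```cpp
#include <cstdint>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical host-side helper: turn the characters in a UTF-8 string into a
// bdfconv-style map such as "32-127,20013,25991". Printable ASCII (32-127) is
// always included; control characters and malformed bytes are skipped.
// Assumption: the resulting text is accepted by bdfconv's -M option.
std::string buildMap(const std::string &text) {
  std::set<uint32_t> cps;
  for (uint32_t c = 32; c <= 127; c++) cps.insert(c);

  for (std::size_t i = 0; i < text.size();) {
    unsigned char b = (unsigned char)text[i];
    uint32_t cp;
    std::size_t len;
    if (b < 0x80)                { cp = b;        len = 1; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
    else { i++; continue; }                // invalid lead byte: skip it
    if (i + len > text.size()) break;      // truncated sequence at the end
    bool ok = true;
    for (std::size_t k = 1; k < len; k++) {
      unsigned char c = (unsigned char)text[i + k];
      if ((c & 0xC0) != 0x80) { ok = false; break; }
      cp = (cp << 6) | (c & 0x3F);         // append 6 payload bits
    }
    if (!ok) { i++; continue; }
    i += len;
    if (cp >= 32) cps.insert(cp);          // skip control characters such as '\n'
  }

  // Merge consecutive code points into "lo-hi" ranges.
  std::string map;
  auto it = cps.begin();
  while (it != cps.end()) {
    uint32_t lo = *it, hi = *it;
    ++it;
    while (it != cps.end() && *it == hi + 1) { hi = *it; ++it; }
    if (!map.empty()) map += ',';
    map += std::to_string(lo);
    if (hi != lo) map += '-' + std::to_string(hi);
  }
  return map;
}
```

For example, the two characters 中文 produce "32-127,20013,25991". Writing such a string to a text file and passing it to -M is the idea; check the bdfconv documentation for the exact file format it expects.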

Step 11: Custom Font: u8g2_font_unifont_t_cjk

This font covers the Chinese, Japanese and Korean character ranges. The three languages share the CJK Unified Ideographs (92,865 of them as of Unicode 14.0), so a single font file can conveniently display all three languages.

Number of glyphs: 41,364

Data size: 1,704,862 bytes

Converting script:

bdfconv -v -f 1 -m "32-127,4352-4607,11904-12255,12288-19903,19968-40943,43360-43391,44032-55203,55216-55295,63744-64255,65072-65103,65280-65519" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_cjk.h -n u8g2_font_unifont_t_cjk

Ref.:

https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

https://stackoverflow.com/questions/56310609/what-the-chinese-japanese-and-korean-characters-are-in-unicode

Step 12: Software Preparation

Arduino IDE

Download and install the Arduino IDE if you have not done so yet:

https://www.arduino.cc/en/main/software

Arduino_GFX Library

Open the Arduino IDE Library Manager by selecting the "Tools" menu -> "Manage Libraries...". Search for "GFX for various displays" and press the "Install" button.

You may refer to my previous Instructables for more information about Arduino_GFX.

Step 13: Unicode Example

Arduino_GFX provides various Unicode examples in the U8g2Font subfolder. In the Arduino IDE, select the "File" menu -> "Examples" -> "GFX Library for Arduino" -> "U8g2Font". The Unicode examples are:

  • U8g2FontPrintUTF8 - print Hello World in various languages with U8g2 built-in fonts
  • U8g2FontUTF8Chinese - print a sample Chinese article with the font file u8g2_font_unifont_t_chinese
  • U8g2FontUTF8FullCJK - print a simple greeting message in Chinese, Japanese and Korean with the font file u8g2_font_unifont_t_cjk
  • U8g2FontUTF8FullUnifont - print Hello World in 74 languages with the font file u8g2_font_unifont_h_utf8
  • U8g2RssReader - print an online RSS feed message with the font file u8g2_font_unifont_t_chinese4


Step 14: Happy Texting!

Now your Arduino projects have just broken the ASCII text limit! Enjoy!