What is the difference between C.UTF-8 and en_US.UTF-8 locales?

Question:

I’m migrating a Python application from an Ubuntu server with an en_US.UTF-8 locale to a new Debian server which comes with C.UTF-8 already set by default. I’m trying to understand if there could be any impact from this change.

Asked By: Marcelo


Answers:

There might be some impact, as the two locales differ in collation (sort order), upper/lower-case relationships, thousands separators, default currency symbol and more.

C.utf8 = the POSIX standards-compliant default locale (C) with UTF-8 as its character encoding. In the plain C locale only strict ASCII characters are valid; C.utf8 extends this to allow the basic use of UTF-8.

en_US.utf8 = American English UTF-8 locale.

I’m not sure about the specific effects you might encounter, but I believe you can set the locale and encoding inside your application if needed.
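As a minimal sketch of setting the locale from inside a Python application: the helper name use_locale and the order of the candidate names are my own assumptions, not something from the question. It tries each locale in turn and falls back to plain C, which should always be available.

```python
import locale

def use_locale(candidates=("en_US.UTF-8", "C.UTF-8", "C")):
    """Try each locale name in order and return the first one that works."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            # This locale is not installed on the machine; try the next one.
            continue
    raise RuntimeError("none of the requested locales is available")

chosen = use_locale()
print("active locale:", chosen)
print("preferred encoding:", locale.getpreferredencoding(False))
```

Calling this once at startup makes the application behave the same way regardless of what the server’s default locale happens to be, provided the first choice is installed on both machines.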

Answered By: adambg

In general, C is for computers, and en_US is for people in the US who speak English (and other people who want the same behaviour).

“For computers” means that the strings are sometimes more standardized (but still in English), so that the output of one program can be read by another program. With en_US, strings could be improved and alphabetic order could be improved (perhaps following new rules in the Chicago Manual of Style, etc.). So it is more user-friendly, but possibly less stable. Note: locales are not just for translation of strings, but also for collation (alphabetic order), numbers (e.g. the thousands separator), currency (I think it is safe to predict that $ and 2 decimal digits will remain), months, days of the week, etc.
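To see the collation and number-formatting differences for yourself, something like the following sketch may help. The helper names try_locale and describe and the sample words are made up for illustration. A locale that is not installed is reported as None rather than guessed at.

```python
import locale

def try_locale(name):
    """Switch to the named locale; return False if it is not installed."""
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False

def describe(name, words, number):
    """Return (sorted words, grouped number) under the named locale, or None."""
    if not try_locale(name):
        return None
    # strxfrm turns each string into a key that sorts by the locale's rules.
    ordered = sorted(words, key=locale.strxfrm)
    # grouping=True inserts the locale's thousands separator, if it has one.
    grouped = locale.format_string("%d", number, grouping=True)
    return ordered, grouped

words = ["banana", "Apple", "apple", "Banana"]
for name in ("C", "C.UTF-8", "en_US.UTF-8"):
    print(name, describe(name, words, 1234567))

# Restore the default so later code is not affected by the last locale tried.
locale.setlocale(locale.LC_ALL, "C")
```

Under C the words sort by code point (all uppercase before lowercase) and the number has no separators. Under en_US.UTF-8 you would typically see a dictionary-style order and comma-grouped digits, though the exact result depends on the locale data installed.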

In your case, it is just the UTF-8 version of both locales.

In general it should not matter. I usually prefer en_US.UTF-8, but usually it doesn’t matter, and in your case (a server app) it should only change log and error messages (if you use locale.setlocale()). You should handle client locales inside your app. Programs that read the output of other programs should set C before opening the pipe, so it should not really matter.

As you can see, it probably doesn’t matter. You may also use the POSIX locale, which is also defined in Debian. You can get the list of installed locales with locale -a.

Note: micro-optimization favours the C/C.UTF-8 locale: no message translation (gettext) and simple rules for collation and number formatting, but this should be visible only on the server side.
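Here is a rough sketch of the “set C before opening the pipe” idea in Python. The function name run_in_c_locale is hypothetical, and the child command is only a stand-in that echoes back the variable it received.

```python
import os
import subprocess
import sys

def run_in_c_locale(args):
    """Run a command with LC_ALL=C so its output does not depend on the caller's locale."""
    # Copy the current environment and override only LC_ALL for the child.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(args, env=env, capture_output=True, text=True, check=True)
    return result.stdout

# Demonstration: the child process reports the locale variable it was given.
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_in_c_locale(child).strip())
```

LC_ALL overrides LANG and the individual LC_* variables, so the child sees the C locale whatever the server default is, while the parent process keeps its own settings.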
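If you want to check this from Python before relying on a locale, a sketch along these lines might do. The helper names installed_locales and normalize are my own. The normalization is there because locale -a on glibc systems usually prints names like en_US.utf8 rather than en_US.UTF-8.

```python
import shutil
import subprocess

def installed_locales():
    """Return the names printed by `locale -a`, or an empty list if the command is missing."""
    if shutil.which("locale") is None:
        return []
    result = subprocess.run(
        ["locale", "-a"], capture_output=True, text=True, errors="replace"
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

def normalize(name):
    """Make 'en_US.UTF-8' and 'en_US.utf8' compare equal."""
    return name.lower().replace("utf-8", "utf8")

names = {normalize(n) for n in installed_locales()}
for wanted in ("C.UTF-8", "en_US.UTF-8", "POSIX"):
    print(wanted, "installed" if normalize(wanted) in names else "not found")
```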

Answered By: Giacomo Catenazzi

Here are some reasons why I added LC_TIME=C.UTF-8 in /etc/default/locale, in case it helps someone:

It provides a 24-hour clock instead of AM/PM in Firefox for HTML5 input type=time (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/time) and uses a datepicker in the format DD/MM/YYYY instead of MM/DD/YYYY for HTML5 input type=date (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date).

It allows using the YYYY-MM-DD international date format (ISO 8601) with a 24-hour clock when replying to emails in Thunderbird.

Previously, it was possible with LC_TIME=en_DK.UTF-8 (http://kb.mozillazine.org/Date_display_format) but there is currently a bug and it stopped working (https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c155).

Edit: Now even the LC_TIME=C.UTF-8 workaround does not work for Thunderbird (https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c197) but at least en_IE.UTF-8 provides the European date format DD/MM/YYYY instead of MM/DD/YYYY.
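For what it’s worth, the fact that LC_TIME can be changed on its own is easy to see from Python as well. This sketch only shows what the C library’s strftime does, not how Firefox or Thunderbird interpret the setting. The helper name time_formats and the sample timestamp are made up.

```python
import locale
import time

# A fixed moment (2024-03-09 13:05:09) so the output is reproducible.
moment = time.strptime("2024-03-09 13:05:09", "%Y-%m-%d %H:%M:%S")

def time_formats(name):
    """Return the locale's (date, time) representation, or None if not installed."""
    try:
        # Only the time category is changed; collation, numbers etc. stay as they were.
        locale.setlocale(locale.LC_TIME, name)
    except locale.Error:
        return None
    return time.strftime("%x", moment), time.strftime("%X", moment)

for name in ("C", "C.UTF-8", "en_US.UTF-8", "en_IE.UTF-8"):
    print(name, time_formats(name))

# Put LC_TIME back to the default afterwards.
locale.setlocale(locale.LC_TIME, "C")
```

With C the time comes out on a 24-hour clock (13:05:09), whereas en_US.UTF-8 would normally use AM/PM if that locale is installed.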

Answered By: baptx

I can confirm that the locale (C.UTF8 vs en_US.UTF8) has an effect. I recently deployed a Python program to a new server, and it behaved differently. The old and new servers both run Ubuntu 18, and the only difference was the locale (C.UTF8 vs en_US.UTF8). After setting the locale on the new server to C.UTF8, they now behave the same.

It is easy to set the locale for a single application in a Linux environment. You just need to add export LANG=C.UTF8; before your application. Assuming you execute your application as python myprogram.py, you type:

export LANG=C.UTF8; python myprogram.py

Answered By: Ben Lin