sphinx gettext does not match textdomains in subdirectories when gettext_compact=False

Question:

I’m currently working on a complex documentation project with python sphinx.
My next step is to enable internationalization.

Project overview (simplified):

doc
  build     # contains sphinx build output
  images    # contains image resources
  locales   # gnu gettext structure (simplified)
    en\LC_MESSAGES\index.po+mo
    en\LC_MESSAGES\articles\connect.po+mo
    de\LC_MESSAGES\index.po+mo
    de\LC_MESSAGES\articles\connect.po+mo
  source
    _static
    articles
      connect.rst
      commission.rst
    troubleshoot
      bugs.rst
    reference
      generated.rst
    about.rst
    conf.py  # contains sphinx configuration
    index.rst
    terminology.rst
  Makefile
Workbench  # contains work contained in generated reference

Localization options in conf.py:

locale_dirs = [
    '../locales/'
]
gettext_compact = False

Rule in Makefile to create HTML output:

html:
    sphinx-build -M html "source" "build" -Dlanguage="de" -v

Rule in Makefile to create *.pot files:

gettext:
    sphinx-build -b gettext "source" "build\gettext"

Rule in Makefile to update localizations:

update_po:
    sphinx-intl update -p "build\gettext" -Dlanguage="en" -Dlanguage="de"

As you may already be able to tell from the directory structure and path delimiter: I am using Windows 10.

Excerpt from the build output of make html, showing the localization steps:

Running Sphinx v4.2.0
loading translations [de]... done
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 15 changed, 0 removed

My problem is the following:

Sphinx does not match localized strings in textdomains that are contained in a subdirectory of LC_MESSAGES.

I’ve configured sphinx gettext with gettext_compact=False because I want to have separate translation files for each document.
This makes it easier for our team’s workflow to manage translations and progress.

When generating *.pot files using command make gettext I’m using the same configuration.

Now, when I generate HTML/PDF output, only the textdomains of the top-level documents are processed correctly, and only their localized strings are substituted in the resulting document.
Also, no errors are thrown while the translations are loaded (as you can see in the excerpt above). The number of files also matches the number of documents. I assume everything works fine up to this point.

I am wondering if this has something to do with Windows using a different path separator than Unix. Maybe gettext doesn’t find the correct textdomain because "articles/connect" != "articles\connect".
Or am I just missing something? I assumed that the make update_po command produces a valid file/directory structure under LC_MESSAGES that gettext is able to process. Is this assumption correct? I haven’t found any information on this topic yet.

Any help and/or ideas appreciated!

Asked By: walkslowly

||

Answers:

I have found the solution/cause.

My first assumption was that it might have to do with the locale_dirs entry in conf.py.
I moved the directory with *.po files containing sphinx-build localized strings to the location recommended in sphinx-intl docs.
Nothing changed.

When I inspected the generated *.po files again, I noticed something odd.
Some msgids were contained in multiple *.po files.
It turned out that Sphinx generates a *.po file for each *.rst document in the directory structure, or at least for each document that is part of the document hierarchy.
When one document pulls in another via the include directive, the texts of the included document are also treated as part of the including document.
The textdomain is matched the same way when generating the documentation for a specific language: included texts are looked up in the textdomain of the including document.
This kind of makes sense, because the include directive just inserts the contents of the included document into the current document.
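
To spot the msgids that show up in more than one *.po file, a small script can help. This is only a minimal sketch, not part of Sphinx or sphinx-intl: the function name find_shared_msgids is made up for illustration, and it only handles single-line msgid entries.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches single-line entries such as: msgid "Some text"
# The empty header entry (msgid "") does not match on purpose.
MSGID_RE = re.compile(r'^msgid "(.+)"\s*$')


def find_shared_msgids(lc_messages_dir):
    """Map each msgid to the sorted list of .po files that contain it.

    Only msgids found in more than one file are returned.
    Simplification: multi-line msgids (msgid "" followed by
    continuation lines) are skipped.
    """
    root = Path(lc_messages_dir)
    seen = defaultdict(set)
    for po_file in root.rglob("*.po"):
        for line in po_file.read_text(encoding="utf-8").splitlines():
            match = MSGID_RE.match(line)
            if match:
                # Store paths with forward slashes, independent of the OS.
                seen[match.group(1)].add(po_file.relative_to(root).as_posix())
    return {msgid: sorted(files) for msgid, files in seen.items() if len(files) > 1}
```

Calling find_shared_msgids on a language's LC_MESSAGES directory returns a dict whose keys are the shared msgids and whose values are the *.po files they appear in, which is a quick hint at texts coming from included documents.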

To work around this, texts have to be translated in the *.po file of the including document. Texts translated in the *.po file of the included document itself are ignored.
I think this behavior applies to the whole recursive stack of documents including other documents, but I haven’t tested that yet.
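
To get an overview of which *.po files would need the translations, the include directives can be followed through the source tree. The following is a rough sketch under the untested assumption above that the behavior is recursive. The names direct_includes and translation_owners are hypothetical, and only include paths relative to the including file are handled.

```python
import re
from pathlib import Path

# Matches lines such as: .. include:: shared/intro.rst
INCLUDE_RE = re.compile(r"^\s*\.\. include::\s+(\S+)\s*$", re.MULTILINE)


def direct_includes(rst_file):
    """Return the resolved paths of files pulled in via the include directive."""
    text = rst_file.read_text(encoding="utf-8")
    return [(rst_file.parent / target).resolve() for target in INCLUDE_RE.findall(text)]


def _rel(path, base):
    """Path relative to base with forward slashes; absolute if outside base."""
    try:
        return path.relative_to(base).as_posix()
    except ValueError:
        return path.as_posix()


def translation_owners(source_dir):
    """Map each included file to the documents that include it, directly or not.

    Assumption: the strings of an included file belong to the textdomain
    of every document that (transitively) includes it.
    """
    source = Path(source_dir).resolve()
    owners = {}

    def walk(doc, current, visited):
        for included in direct_includes(current):
            # Skip include cycles and targets that do not exist.
            if included in visited or not included.is_file():
                continue
            owners.setdefault(included, set()).add(doc)
            walk(doc, included, visited | {included})

    for doc in source.rglob("*.rst"):
        walk(doc, doc, {doc})
    return {
        _rel(inc, source): sorted(_rel(d, source) for d in docs)
        for inc, docs in owners.items()
    }
```

For each included file, the result lists the documents whose *.po file would have to carry its translations, so translators can see where the same text needs to be maintained.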

Hope someone else finds this useful.
I’m going to accept this answer as the correct answer.

Answered By: walkslowly

I wanted to comment but as a new user without reputation points, I can’t…

We ran into the same issue you described in your own answer: translators have to translate the text of an included document in the INCLUDING document. This causes major problems for our translation work, because the same content may be included in multiple places in the source language, and the translators then have to translate the same content in every including document.

I was wondering if you’ve found a solution to this problem. Any comment would be greatly appreciated! Thank you in advance.

Answered By: June