Issue when calling LibreOffice from Python to generate a PDF from a docx with charts

Question:

I am using Debian 9.5, Python 3.5, LibreOffice 5.2, on x86_64.

I have a Word file (docx) of 22 pages, which contains several charts.

When run from terminal using bash, the following command works correctly i.e. generates a pdf file of 22 pages:

/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx

output:

convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export

The issue is the following: the same external command executed from python using subprocess.run produces a pdf file of only one page, instead of 22 pages, with no error message.

No other instances of libreoffice are running.

import subprocess

cmd = '/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx'

print(subprocess.run(cmd, shell=True, check=True))

This is the output of this Python script:

convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export

CompletedProcess(args='/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx', returncode=0)

Apparently, PDF generation was successful, but only the first page of the docx file was converted.

It seems that the generation of the pdf terminates when libreoffice, started from python, encounters the first chart.

Does libreoffice require the java runtime for generating pdf?

Could there be an issue with headless operations of libreoffice?

Any hint?

Update:

I added the -env:UserInstallation option. This is the modified script, run from Python:

cmd = '/usr/bin/libreoffice -env:UserInstallation=file:///home/marco/  --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx'

print(subprocess.run(cmd, shell=True, check=True))

The output is now the following; it contains a warning about not finding a Java runtime environment:

javaldx: Could not find a Java Runtime Environment!

Warning: failed to read path from javaldx

convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export

CompletedProcess(args='/usr/bin/libreoffice -env:UserInstallation=file:///home/marco/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx', returncode=0)

Any idea how to specify, through command-line parameters, where libreoffice can find the Java runtime environment it needs?

Asked By: mrtexaz


Answers:

I found a solution, although the technical reason is not clear to me:

this WORKS (complete pdf generation using libreoffice of docx file with charts):

PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx

this DOES NOT WORK (partial pdf generation using libreoffice of docx file with charts):

PATH=/home/marco/venv/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx

It seems that python virtualenv causes some sort of conflict with libreoffice. I used strace but found nothing useful.

So the solution in my case is to remove the virtualenv path from the PATH environment variable when calling libreoffice from python. This can be achieved by deactivating the virtualenv:

marco@pc:~$ source venv/bin/activate
...
(venv) marco@pc:~$ deactivate && /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx
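
The same effect can probably be had from inside the Python script, without deactivating the virtualenv in the shell. This is only a minimal sketch, assuming the virtualenv was activated in the usual way (so VIRTUAL_ENV is set and its bin directory is on PATH). The names env_without_virtualenv and convert_to_pdf are hypothetical helpers, not part of any library:

```python
import os
import subprocess


def env_without_virtualenv(environ=None):
    """Return a copy of the environment with the active virtualenv removed."""
    env = dict(os.environ if environ is None else environ)
    venv = env.pop("VIRTUAL_ENV", None)
    if venv:
        # An activated virtualenv puts <venv>/bin at the front of PATH.
        venv_bin = os.path.normpath(os.path.join(venv, "bin"))
        parts = env.get("PATH", "").split(os.pathsep)
        env["PATH"] = os.pathsep.join(
            p for p in parts if os.path.normpath(p) != venv_bin
        )
    return env


def convert_to_pdf(docx_path, outdir, soffice="/usr/bin/libreoffice"):
    """Run the headless conversion with the virtualenv stripped from PATH."""
    cmd = [soffice, "--headless", "--convert-to", "pdf",
           "--outdir", outdir, docx_path]
    # An argument list avoids shell=True and any quoting problems.
    return subprocess.run(cmd, check=True, env=env_without_virtualenv())
```

With these helpers, convert_to_pdf('/tmp/docx5/output.docx', '/tmp/docx5/') would run the same command as above while the rest of the script keeps using the virtualenv.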
Answered By: mrtexaz

Adding this for anybody else who runs into this issue: the reason you get the error message is that the path variable used by subprocess and passed to libreoffice is not sufficient to find the JRE. I ran into the same issue and changed the call to the following, which seems to fix it.

subprocess.run(cmd,env={'HOME':'/home/username'})
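
A possible reason this works: the env argument of subprocess.run replaces the child's whole environment instead of adding to it, so passing only HOME also drops the PATH entry that an activated virtualenv adds. The sketch below is just an illustration of that behaviour. child_environment is a hypothetical helper, and a child Python process stands in for libreoffice:

```python
import json
import subprocess
import sys


def child_environment(env):
    """Run a child Python process with `env` and return the variable names it sees."""
    result = subprocess.run(
        [sys.executable, "-c",
         "import os, json; print(json.dumps(sorted(os.environ)))"],
        env=env,                  # replaces the inherited environment entirely
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text (works on Python 3.5)
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    names = child_environment({"HOME": "/home/username"})
    # Variables from the parent, such as PATH and VIRTUAL_ENV, are not inherited.
    print(names)
```

If the rest of the environment is still needed, dict(os.environ, HOME='/home/username') keeps it and overrides only HOME.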

Answered By: Mr. T

If this is happening for you on AWS Lambda, it's because libreoffice needs to create a directory in your user's home directory, but on AWS Lambda the user's home directory is read-only.

On Lambda, you can only write to temporary directories.

So the solution is to set a temporary directory as your home directory while you call subprocess.run(...).

import subprocess
import tempfile

temp_dir = tempfile.TemporaryDirectory()
temp_dir_path = temp_dir.name

subprocess.run(
    f"soffice --headless --convert-to pdf {temp_dir_path}/input.xlsx --outdir {temp_dir_path}",
    shell=True,
    check=True,
    # libreoffice needs to create a dir called .cache/dconf in the HOME dir,
    # so HOME must be writable. But on AWS Lambda, the default HOME is read-only.
    env={"HOME": temp_dir_path},
)
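
A variation on the same idea, as a sketch only: it keeps the rest of the environment (so soffice can still be found via PATH), overrides just HOME, and uses the temporary directory as a context manager so it is cleaned up afterwards. build_conversion and convert_to_pdf are hypothetical helper names, and output_dir is assumed to be a writable, existing directory:

```python
import os
import shutil
import subprocess
import tempfile


def build_conversion(input_path, work_dir, soffice="soffice"):
    """Return (argv, env) for a headless conversion with HOME set to work_dir."""
    argv = [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", work_dir, input_path]
    env = dict(os.environ)  # keep PATH and everything else
    env["HOME"] = work_dir  # redirect only HOME to a writable location
    return argv, env


def convert_to_pdf(input_path, output_dir):
    """Convert input_path to PDF and copy the result into output_dir."""
    with tempfile.TemporaryDirectory() as work_dir:
        argv, env = build_conversion(input_path, work_dir)
        subprocess.run(argv, check=True, env=env)
        pdf_name = os.path.splitext(os.path.basename(input_path))[0] + ".pdf"
        # Copy the PDF out before the temporary directory is removed.
        return shutil.copy(os.path.join(work_dir, pdf_name), output_dir)
```

Note that keeping the inherited PATH brings back the virtualenv entry discussed in the first answer, so this form is meant for environments like Lambda where no virtualenv is active.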
Answered By: Lucidity