Issue when calling LibreOffice from Python to generate a PDF from a docx with charts
Question:
Using Debian 9.5, Python 3.5, LibreOffice 5.2, x86_64 architecture.
I have a 22-page Word file (docx) that contains several charts.
When run from a terminal using bash, the following command works correctly, i.e. it generates a 22-page PDF file:
/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx
Output:
convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export
The issue is the following: the same external command, executed from Python using subprocess.run, produces a PDF file of only one page instead of 22, with no error message.
No other instances of LibreOffice are running.
import subprocess
cmd = '/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx'
print(subprocess.run(cmd, shell=True, check=True))
This is the output of the Python script:
convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export
CompletedProcess(args='/usr/bin/libreoffice --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx', returncode=0)
Apparently, PDF generation was successful, but only the first page of the docx file was converted.
It seems that PDF generation stops when LibreOffice, started from Python, encounters the first chart.
Does LibreOffice require the Java runtime to generate a PDF?
Could there be an issue with headless operation of LibreOffice?
Any hints?
Update:
I added the -env:UserInstallation option and ran the modified script from Python:
cmd = '/usr/bin/libreoffice -env:UserInstallation=file:///home/marco/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx'
print(subprocess.run(cmd, shell=True, check=True))
The output is the following. It now contains a warning about not finding a Java Runtime Environment:
javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx
convert /tmp/docx5/output.docx -> /tmp/docx5//output.pdf using filter : writer_pdf_Export
CompletedProcess(args='/usr/bin/libreoffice -env:UserInstallation=file:///home/marco/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx', returncode=0)
Any idea how to specify, through command-line parameters, where LibreOffice can find the Java Runtime Environment it needs?
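Because the return code is 0 even when the conversion is incomplete, it can help to capture the command's output and look for the javaldx warning explicitly. The following is only a minimal sketch: the helper name run_and_check_java is hypothetical, and it assumes the warning text matches the output shown above. It does not tell LibreOffice where Java is; it only makes the warning detectable from Python.

```python
import subprocess

# Warning text as it appears in the output quoted above.
JAVA_WARNING = "Could not find a Java Runtime Environment"


def run_and_check_java(cmd):
    """Run cmd (a list of arguments) and report whether the javaldx warning appeared.

    Returns a tuple (combined_output, warning_found).
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so the warning is caught either way
        universal_newlines=True,   # decode output to str (available on Python 3.5)
        check=True,
    )
    return result.stdout, JAVA_WARNING in result.stdout
```

A call such as run_and_check_java(['/usr/bin/libreoffice', '--headless', '--convert-to', 'pdf', '--outdir', '/tmp/docx5/', '/tmp/docx5/output.docx']) would then return the combined output together with a flag that the caller can log or act on.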
Answers:
I found a solution, although the technical reason is not clear to me.
This WORKS (complete PDF generation of the docx file with charts):
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx
This DOES NOT WORK (partial PDF generation of the docx file with charts):
PATH=/home/marco/venv/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx
It seems that the Python virtualenv causes some sort of conflict with LibreOffice. I used strace but found nothing useful.
So the solution in my case is to remove the virtualenv path from the PATH environment variable when calling LibreOffice from Python. This can be achieved by deactivating the virtualenv:
marco@pc:~$ source venv/bin/activate
...
(venv) marco@pc:~$ deactivate && /usr/bin/libreoffice -env:UserInstallation=file:///tmp/docx5/ --headless --convert-to pdf --outdir /tmp/docx5/ /tmp/docx5/output.docx
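If deactivating the virtualenv in the shell is not practical, the same effect can probably be had from inside Python by passing a cleaned environment to subprocess.run. This is a sketch under assumptions: the helper names env_without_virtualenv and convert_to_pdf are hypothetical, and it relies on the activate script having set VIRTUAL_ENV and prepended its bin directory to PATH.

```python
import os
import subprocess


def env_without_virtualenv(environ=None):
    """Return a copy of the environment with the active virtualenv's bin dir removed from PATH."""
    env = dict(os.environ if environ is None else environ)
    venv = env.pop("VIRTUAL_ENV", None)  # set by venv/bin/activate
    if venv:
        venv_bin = os.path.join(venv, "bin")
        parts = env.get("PATH", "").split(os.pathsep)
        env["PATH"] = os.pathsep.join(p for p in parts if p.rstrip("/") != venv_bin)
    return env


def convert_to_pdf(docx_path, outdir):
    """Call LibreOffice with the virtualenv stripped from PATH (argument list, no shell)."""
    cmd = ["/usr/bin/libreoffice", "--headless", "--convert-to", "pdf",
           "--outdir", outdir, docx_path]
    return subprocess.run(cmd, check=True, env=env_without_virtualenv())
```

For the files in the question, the call would be convert_to_pdf('/tmp/docx5/output.docx', '/tmp/docx5/'). The rest of the environment is kept unchanged, so only the PATH entry that differs between the working and non-working commands above is removed.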
Adding this for anybody else who runs into this issue: the reason you get the error message is that the path variable used by subprocess and passed to LibreOffice is not sufficient to find the JRE. I ran into the same issue and changed the call to the following, which seems to fix it.
subprocess.run(cmd,env={'HOME':'/home/username'})
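Note that the env argument of subprocess.run replaces the child's entire environment, so the call above passes HOME and nothing else. If other variables (PATH, locale settings) should be preserved, one option is to copy os.environ and override only HOME. The helper name env_with_home below is hypothetical, and whether this variant also resolves the Java warning has not been verified here.

```python
import os
import subprocess


def env_with_home(home, base=None):
    """Return a copy of the environment with HOME overridden and everything else kept."""
    env = dict(os.environ if base is None else base)
    env["HOME"] = home
    return env


def run_with_home(cmd, home):
    """Run a shell command string with HOME set to the given directory."""
    return subprocess.run(cmd, shell=True, check=True, env=env_with_home(home))
```

With the command string from the question, this would be called as run_with_home(cmd, '/home/username').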
If this is happening for you on AWS Lambda, it's because LibreOffice needs to create a directory in your user's home directory, but on AWS Lambda the user's home directory is read-only.
On Lambda, you can only write to temporary directories.
So the solution is to set a temporary directory as your home directory while you call subprocess.run(...).
import subprocess
import tempfile
temp_dir = tempfile.TemporaryDirectory()
temp_dir_path = temp_dir.name
subprocess.run(
    f"soffice --headless --convert-to pdf {temp_dir_path}/input.xlsx --outdir {temp_dir_path}",
    shell=True,
    check=True,
    # libreoffice needs to create a dir called .cache/dconf in the HOME dir.
    # So HOME must be writable. But on aws lambda, the default HOME is read-only.
    env={"HOME": temp_dir_path},
)