Batch convert PDFs to CSVs

Question:

What am I doing wrong?
Here is the code that I attempted:

import glob
import tabula

for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    tabula.convert_into(filepath, pages="all", output_format='csv')

Error:

TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 6>()
      5 # transform the pdfs into excel files
      6 for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
----> 7     tabula.convert_into(filepath, pages="all", output_format='csv')

TypeError: convert_into() missing 1 required positional argument: 'output_path'
Asked By: Kenny

Answers:

You have not passed output_path, the required argument that tells convert_into where to write the converted CSV.

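import glob
import os
import tabula

# output_path must be a file, so build one CSV path per PDF
output_dir = "C:/Users/username/Downloads/new Folder with CSvs"
os.makedirs(output_dir, exist_ok=True)
for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    name = os.path.splitext(os.path.basename(filepath))[0] + ".csv"
    tabula.convert_into(filepath, os.path.join(output_dir, name), pages="all", output_format='csv')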

Answered By: Gitago

This reads every PDF in your Downloads folder and writes the tables from each one to a CSV file in the same folder.

import os
import glob
import tabula

path="/Users/username/Downloads/"
for filepath in glob.glob(path+'*.pdf'):
    # drop the ".pdf" extension so the output is name.csv, not name.pdf.csv
    name=os.path.splitext(os.path.basename(filepath))[0]
    tabula.convert_into(input_path=filepath,
                        output_path=path+name+".csv",
                        pages="all")
Answered By: jose_bacoy
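
Both answers come down to the same thing: convert_into needs a distinct output file for every input PDF. The following is a minimal sketch of that idea using pathlib, not a definitive implementation. The helper names csv_path_for and convert_folder are hypothetical (they are not part of tabula-py), and the sketch assumes tabula-py and Java are installed when convert_folder is actually called.

```python
from pathlib import Path


def csv_path_for(pdf_path, out_dir):
    """Return out_dir/<pdf stem>.csv for a given PDF path (hypothetical helper)."""
    return Path(out_dir) / (Path(pdf_path).stem + ".csv")


def convert_folder(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir to a CSV of the same name in out_dir."""
    # Imported here so csv_path_for can be used without tabula-py or Java.
    import tabula

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        target = csv_path_for(pdf, out_dir)
        # output_path must name a file, not a folder
        tabula.convert_into(str(pdf), str(target), output_format="csv", pages="all")
        written.append(target)
    return written
```

Calling convert_folder with the folder of PDFs from the question and any output folder of your choice should produce one CSV per PDF. If you do not need control over the output names, tabula-py also provides convert_into_by_batch, which takes a directory instead of a single file; check the documentation of your installed version for its exact arguments.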