Reading a docx file with FastAPI and python-docx raises AttributeError: 'bytes' object has no attribute 'seek'

Question:

I'm using FastAPI (with a non-async endpoint) and the python-docx library to read an uploaded docx file, but reading the file raises an error.

My code:

@app.post('/translate_docx', response_class=PlainTextResponse)
def translateDocx(docFile: UploadFile = File(...), fileExtension: str = Form(...)):
 
    if(fileExtension == 'docx'):
        raw_txt = readDocx(docFile.file.read())

    return raw_txt


def readDocx(file):
    doc = Document(file)
    txt = ""
    for para in doc.paragraphs:
        txt = txt + para.text
    return txt

Logs:

  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/pkgreader.py", line 32, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "/translateProject/.venv/lib/python3.7/site-packages/docx/opc/phys_pkg.py", line 101, in __init__
    self._zipf = ZipFile(pkg_file, 'r')
  File "/usr/lib/python3.7/zipfile.py", line 1258, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.7/zipfile.py", line 1321, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/usr/lib/python3.7/zipfile.py", line 259, in _EndRecData
    fpin.seek(0, 2)
AttributeError: 'bytes' object has no attribute 'seek'

What is wrong with my code? Any help would be appreciated.

Asked By: Gissipi_453

||

Answers:

Don't call .read() on the file before passing it to Document(). Pass either the file path or an open file whose cursor is at offset 0. If the "file" you have is already a bytes object, wrap it in an io.BytesIO in-memory file and pass that to Document().

The docx_file parameter in Document(docx_file) can be a str file path or can be a file-like object (an open file created with open(...) or an in-memory file created with io.BytesIO), but it cannot be a bytes object (what is returned by file.read()).

Answered By: scanny
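
To make the point in the answer above concrete, here is a minimal sketch using only the standard library. It calls zipfile.ZipFile directly, which is the call that fails in the traceback, instead of python-docx itself. The helper name open_package and the stand-in archive are hypothetical, chosen only for illustration; the archive is a plain zip, not a valid .docx.

```python
import io
import zipfile


def open_package(data: bytes) -> zipfile.ZipFile:
    """Open a zip package (a .docx is one) from raw bytes."""
    # A bytes object has no seek()/tell(), so wrap it in a seekable
    # in-memory file before handing it to ZipFile.
    return zipfile.ZipFile(io.BytesIO(data), "r")


# Build a tiny stand-in package in memory (not a valid .docx, just a zip).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
raw = buffer.getvalue()  # plays the role of docFile.file.read()

with open_package(raw) as pkg:
    names = pkg.namelist()
print(names)
```

Applied to the question's code, the analogous change would be to call readDocx(io.BytesIO(docFile.file.read())), so that Document() receives a file-like object and not bytes.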

An UploadFile object has a file attribute, and that object's private _file attribute is an io.BytesIO.

This worked for me. It returns all the content of the document.

def endpoint(file: UploadFile = File(...)):
    doc = Document(file.file._file)

Answered By: Shuchita Bora
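
One caveat about the _file approach above, offered as a sketch and not a definitive statement about FastAPI: assuming UploadFile.file is a tempfile.SpooledTemporaryFile (which I believe is what Starlette uses, but treat that as an assumption), _file is a private attribute that is an io.BytesIO only while the data stays below the spooling threshold. The standard-library demo below shows that behaviour; max_size=16 is an arbitrary value picked for the demo.

```python
import io
import tempfile

# max_size=16 is an arbitrary small threshold chosen for this demo.
with tempfile.SpooledTemporaryFile(max_size=16) as spooled:
    spooled.write(b"small")
    in_memory_before = isinstance(spooled._file, io.BytesIO)

    # Exceeding max_size makes the file roll over to a real temporary file.
    spooled.write(b"x" * 32)
    in_memory_after = isinstance(spooled._file, io.BytesIO)

    # Alternative that avoids the private attribute: copy into a new BytesIO.
    spooled.seek(0)
    stream = io.BytesIO(spooled.read())

print(in_memory_before, in_memory_after, len(stream.getvalue()))
```

Copying the upload into a fresh io.BytesIO costs one extra read, but it does not depend on a private attribute or on the size of the upload.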