Error: FloatObject (b'0.000000000000-14210855') invalid; use 0.0 instead while using PyPDF2

Question:

I am using a function to count occurrences of a given word in PDF files with PyPDF2. While the function is running, I get this message in the terminal:

FloatObject (b'0.000000000000-14210855') invalid; use 0.0 instead

My code:

import os
import re

import PyPDF2


def count_words(word):
    print()
    print('Counting words..')

    files = os.listdir('./pdfs')
    counted_words = []

    for idx, file in enumerate(files, 1):
        with open(f'./pdfs/{file}', 'rb') as pdf_file:
            ReadPDF = PyPDF2.PdfFileReader(pdf_file, strict=False)
            pages = ReadPDF.numPages

            words_count = 0

            for page in range(pages):
                pageObj = ReadPDF.getPage(page)
                data = pageObj.extract_text()
                # \b anchors the match to whole words; re.I ignores case
                words_count += sum(1 for match in re.findall(rf'\b{word}\b', data, flags=re.I))

            counted_words.append(words_count)

        print(f'File: {idx}')

    return counted_words

How can I get rid of this message?

Asked By: iamstudentsorry


Answers:

See the PyPDF2 documentation on suppressing warnings: https://pypdf2.readthedocs.io/en/latest/user/suppress-warnings.html

import logging

logger = logging.getLogger("PyPDF2")
logger.setLevel(logging.ERROR)
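
If the message should be hidden only while the PDFs are being read, the same idea can be scoped with a context manager. This is a minimal sketch, assuming the installed PyPDF2 version emits the message through the standard logging module under the "PyPDF2" logger, as the snippet above implies. The helper name quiet_logger is hypothetical and not part of PyPDF2.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name="PyPDF2", level=logging.ERROR):
    """Temporarily raise a logger's level, then restore the previous one.

    `quiet_logger` is a hypothetical helper, not a PyPDF2 function.
    """
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore the old level even if the body raised an exception
        logger.setLevel(previous)
```

Wrapping the reading loop in `with quiet_logger():` would then silence warnings from that logger only for the duration of the block.
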
Answered By: Martin Thoma

The PDF specification has never allowed floats in scientific (mantissa/exponent) notation, which is roughly what yours looks like. A careless PDF producer has therefore written a malformed PDF file. PyPDF2's choice to read the value as 0.0 seems a sound response.
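
To illustrate why the token is rejected, here is a rough sketch of a check for PDF numeric syntax: an optional sign followed by decimal digits with at most one decimal point, and no exponent. This is not PyPDF2's actual parser. The names is_pdf_number and parse_pdf_number are hypothetical, and the 0.0 fallback only mirrors the behaviour described in the warning.

```python
import re

# Optional sign, then digits with at most one decimal point; no exponent.
# Matches forms such as 34.5, -3.62, +123.6, 4., -.002 and plain integers.
PDF_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_pdf_number(token: bytes) -> bool:
    """Return True if the whole token is a well-formed PDF number."""
    return PDF_NUMBER.fullmatch(token) is not None


def parse_pdf_number(token: bytes, fallback: float = 0.0) -> float:
    """Parse a PDF number, returning `fallback` for malformed tokens."""
    if is_pdf_number(token):
        return float(token.decode("ascii"))
    return fallback


print(is_pdf_number(b"0.000000000000-14210855"))  # the token from the warning
print(parse_pdf_number(b"0.000000000000-14210855"))
```

The token from the question fails the check because a minus sign appears in the middle of the digits, so the sketch falls back to 0.0.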

Answered By: johnwhitington