Is it possible to read information from an xlsx file without any libraries in Python?

Question:

I am forced to ask this question.

My mentor has given me a task to extract data from files with pure Python. There were some txt files, which were easy, but there is also a file with an xlsx extension, and I can’t find anywhere whether it is possible to extract the data from it with pure Python (I have been searching for more than 3 weeks now).

If it is not possible, please tell me so that I can show this to her with confidence, because my mentor keeps insisting that it is possible and that I should do it with pure Python, but she refuses to give me any clues or tips.

And if it is possible, tell me how to do it or where to read more about it.

Asked By: M.Amin


Answers:

All MS Office files with extensions ending in x (such as .docx, .xlsx and .pptx) are in fact zip archives (so you can change the extension to .zip and unpack them), and they typically contain a handful of XML files along with media (images, videos, etc.).
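To see this for yourself without renaming anything, the standard-library zipfile module can open the file directly. This is only a minimal sketch; the function name list_xlsx_members is made up for illustration.

```python
import zipfile


def list_xlsx_members(path_or_file):
    """Return the names of the files stored inside an .xlsx archive.

    Accepts a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()
```

Calling list_xlsx_members("excel_test.xlsx") on a typical workbook should show entries such as xl/workbook.xml and xl/worksheets/sheet1.xml, though the exact list depends on the program that saved the file.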

You can process all of these XML files as text or, using the xml module from the Python standard library, work with them in a slightly more advanced way.

The formats are complex, but often you can do basic things without going through thousands of pages of documentation.
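As a rough illustration of such a "basic thing", the sketch below reads the cells of one worksheet using only zipfile and xml.etree.ElementTree. It rests on several assumptions: the sheet lives at xl/worksheets/sheet1.xml (common, but not guaranteed), the file uses the usual transitional SpreadsheetML namespace, and text cells refer to the shared string table. The function names are hypothetical. Inline strings, dates, formulas and styles are not handled, and numbers come back as strings.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML parts (transitional OOXML).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}


def read_shared_strings(archive):
    """Return the shared string table as a list (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # One <si> per string; it may contain several <t> text runs, so join them.
    return [
        "".join(t.text or "" for t in si.iter("{%s}t" % MAIN_NS))
        for si in root.findall("m:si", NS)
    ]


def read_sheet(path_or_file, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the sheet as a list of rows, each a list of (cell_ref, value)."""
    with zipfile.ZipFile(path_or_file) as archive:
        strings = read_shared_strings(archive)
        root = ET.fromstring(archive.read(sheet_part))
    rows = []
    for row in root.findall("m:sheetData/m:row", NS):
        cells = []
        for cell in row.findall("m:c", NS):
            v = cell.find("m:v", NS)
            value = v.text if v is not None else None
            # t="s" means the stored value is an index into the shared strings.
            if cell.get("t") == "s" and value is not None:
                value = strings[int(value)]
            cells.append((cell.get("r"), value))
        rows.append(cells)
    return rows
```

With this in place, read_sheet("excel_test.xlsx") would return something like a list of rows where each cell is a pair such as ("A1", "some text"). Empty cells are usually simply missing from the XML, so rows can have different lengths.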

Answered By: sophros

The short answer is no; the long answer is that you can unpack the .xlsx file and iterate through the resulting .xml files "by hand".

Answered By: pietro_molina

Previous answers regarding unpacking/unzipping the XLSX file are the correct starting point. Thereafter you’ll need to know how the extracted files work together. It’s rather convoluted.
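One example of how the extracted files work together: xl/workbook.xml lists the sheet names, each with a relationship id, and xl/_rels/workbook.xml.rels maps those ids to the XML file that holds the sheet's cells. The sketch below follows that chain. It assumes the usual transitional OOXML namespaces and these two standard part names; the function name sheet_parts is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_parts(path_or_file):
    """Map each sheet name to the archive member that stores its cells."""
    with zipfile.ZipFile(path_or_file) as archive:
        workbook = ET.fromstring(archive.read("xl/workbook.xml"))
        rels = ET.fromstring(archive.read("xl/_rels/workbook.xml.rels"))

    # Relationship id (e.g. "rId1") -> target path inside the package.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter("{%s}Relationship" % PKG_REL_NS)
    }

    result = {}
    for sheet in workbook.iter("{%s}sheet" % MAIN_NS):
        target = targets[sheet.get("{%s}id" % DOC_REL_NS)]
        # Targets are usually relative to xl/; a leading slash means archive root.
        member = target.lstrip("/") if target.startswith("/") else "xl/" + target
        result[sheet.get("name")] = member
    return result
```

The returned dictionary tells you which member to parse for a given sheet name, instead of guessing a file name such as sheet1.xml.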

The best thing to do is to be specific about exactly what data you want to extract; then I’m sure you’ll get some sample code that shows how to achieve your objective.

Answered By: user2668284

I struggled with this too, but I finally found a way! It turns out that file.read() returns the bytes of the file, so you can try this:

file_path = "excel_test.xlsx"

# Open in binary mode: an .xlsx file is a zip archive, not plain text
with open(file_path, "rb") as xlsx_file:
    raw_bytes = xlsx_file.read()
    print(type(raw_bytes))
    print(raw_bytes)

Prints:
<class 'bytes'>
b'PK\x03\x04\x14\x00\x08\x08\x08\x00\x1b\x8fJV\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00_rels/.rels\xad\x92\xcfJ\x031\x10\x87\xef}\x8a\x90{w\xb6\x15Dd\xb3\xbd\x88\xd0\x9bH}…'
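The PK at the start of that output is the signature of a zip local file header, and _rels/.rels is the name of the first file stored in the archive, so the raw bytes are compressed data rather than the spreadsheet contents themselves. A small sketch of how to go from those bytes to the archive members, using only the standard library (the function names here are hypothetical):

```python
import io
import zipfile

# First four bytes of a zip archive that begins with a local file header.
ZIP_LOCAL_FILE_SIGNATURE = b"PK\x03\x04"


def looks_like_zip(raw_bytes):
    """Return True if the data begins with the zip local file header signature."""
    return raw_bytes[:4] == ZIP_LOCAL_FILE_SIGNATURE


def members_from_bytes(raw_bytes):
    """Open the in-memory bytes as a zip archive and list the stored file names."""
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        return archive.namelist()
```

Passing the bytes read from the file to members_from_bytes gives the list of XML parts, which can then be read with archive.read() and parsed.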
