Is there a way to read multiple plain text files into a dataframe?

Question:

I have multiple plain text files, and each one should be saved as one row of a data frame. The data frame should have two columns: the file names and the texts. The code below does not raise an error, but it creates a data frame that takes the file contents as column names, all placed in the header row, with no data rows.

Working code (revised following the suggestions of @Code Different):

from pathlib import Path

import pandas as pd

df = []
for file in Path("/content/").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
print(df)
  

The output:

Empty DataFrame
Columns: [                The Forgotten Tropical Ecosystem 
Index: []

[0 rows x 9712 columns]

How can the code be changed so that each file's text goes in its own row under the column heading ‘text’?

Asked By: Sangeun

||

Answers:

I have done this a lot at work and here’s how I typically do it:

from pathlib import Path

import pandas as pd

df = []
for file in Path("/content").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
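
One caveat, offered as a hedged note rather than a diagnosis of any particular setup: by default `pd.read_table` parses each file as a tab-separated table and uses the first line as the header. A prose file whose text sits on a single line can therefore end up as column names with no rows. The sketch below reproduces this with an in-memory string; the sample text and the column name `text` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical one-line "file" standing in for a plain-text document.
sample = "The Forgotten Tropical Ecosystem\n"

# Default behaviour: the first line is consumed as the header, so no rows remain.
as_table = pd.read_table(io.StringIO(sample))

# header=None with an explicit column name keeps the line as data instead.
as_rows = pd.read_table(io.StringIO(sample), header=None, names=["text"])

print(as_table.shape)
print(as_rows.shape)
```

Even with `header=None`, this gives one row per line rather than one row per file, so it suits tabular files better than free-form text.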
Answered By: Code Different

Here is one possible answer to my own question, which builds the data frame from a dictionary of lists. A friend helped me with this, and it works. I am not sure why the suggested answer did not work in my environment, but thanks anyway!

Code:

import os

import pandas as pd

# Table format: {"file_names": [...], "file_texts": [...]}
dictionary = {}
file_names = []
file_texts = []
for file_name in os.listdir('.'):
    # Only pick up files whose name ends in .txt
    if file_name.endswith('.txt'):
        # Read the whole text; the with block closes the file afterwards
        with open(file_name, "r") as f:
            text = f.read()

        file_names.append(file_name)
        file_texts.append(text)

dictionary["file_names"] = file_names
dictionary["file_texts"] = file_texts

pandas_dataframe = pd.DataFrame.from_dict(dictionary)

print(pandas_dataframe)
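
The same idea can also be sketched with `pathlib`, which avoids the manual open and read steps. This is only a minimal variant under stated assumptions: the helper name `texts_to_frame`, the UTF-8 encoding, and the column names `FileName` and `text` are illustrative choices, not something the original code specifies.

```python
from pathlib import Path

import pandas as pd


def texts_to_frame(folder):
    """Return one row per .txt file in `folder`, with its name and full text."""
    rows = [
        # read_text opens, reads, and closes the file in one call.
        # UTF-8 is an assumption; adjust it to match the actual files.
        {"FileName": path.name, "text": path.read_text(encoding="utf-8")}
        for path in sorted(Path(folder).glob("*.txt"))
    ]
    # Passing columns keeps both headings even when no files are found.
    return pd.DataFrame(rows, columns=["FileName", "text"])
```

Calling `texts_to_frame(".")` would cover the same files as the loop above, sorted by file name.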
Answered By: Sangeun