display all files in data frame using python pandas

Question

I am trying to create a single data frame from a dataset of 1000 .txt files by looping through the files and extracting the title, author, language, etc. from each one.

import re
from glob import glob

import pandas as pd

files = glob('dataset/*.txt')
files.sort()
files

for n in files:
    with open(n, 'r') as text_file:
        text = text_file.read()

    # These can be reused for each book    
    title = re.compile(r'Title: (.*)\n')
    author = re.compile(r'Author: (.*)\n')
    release_date = re.compile(r'Release Date: (.*)\s')
    language = re.compile(r'Language: (.*)\n')

    book_title = title.search(text).group(1)
#     book_author = author.search(text).group(1)
    book_language = language.search(text).group(1)
    book_release = release_date.search(text).group(1).split(' [')[0]

    books = pd.DataFrame({"Title": [book_title], "Author": [book_author], 
    "Release_Date": [book_release], "Language": [book_language]})
    
    books

This displays only a single row, but when I use print() it displays all the data as separate data frames.

How do I display all these frames as one single data frame?

Asked By: watch dog

Answer 1

Something like this should work. Be aware, however, that this program will raise an error if any of the book details is missing from any of the books: search() returns None when there is no match, and calling .group(1) on None raises an AttributeError.

Code:

import re

import pandas as pd

# files is the sorted list of paths built in the question
book_title, book_author, book_release, book_language = [], [], [], []

# These can be reused for each book
title = re.compile(r'Title: (.*)\n')
author = re.compile(r'Author: (.*)\n')
release_date = re.compile(r'Release Date: (.*)\s')
language = re.compile(r'Language: (.*)\n')

for n in files:
    with open(n, 'r') as text_file:
        text = text_file.read()

    book_title.append(title.search(text).group(1))
    book_author.append(author.search(text).group(1))
    book_language.append(language.search(text).group(1))
    book_release.append(release_date.search(text).group(1).split(' [')[0])


books = pd.DataFrame({"Title": book_title, "Author": book_author, 
    "Release_Date": book_release, "Language": book_language})

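If you would rather keep the question's one-row DataFrame per book, another common fix is to append each of those frames to a list inside the loop and join them once at the end with pd.concat. This is a minimal sketch assuming the same header layout; the two sample files written to a temporary directory are made up so that it runs on its own. Building plain lists first, as in the code above, avoids creating 1000 small intermediate frames, but both approaches give the same table.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

# Header patterns, compiled once and reused for every book
title = re.compile(r"Title: (.*)\n")
author = re.compile(r"Author: (.*)\n")
release_date = re.compile(r"Release Date: (.*)\s")
language = re.compile(r"Language: (.*)\n")

# Hypothetical sample files standing in for dataset/*.txt
samples = {
    "a.txt": "Title: Book A\nAuthor: Ann\nRelease Date: May 1, 2001 [EBook #1]\nLanguage: English\n",
    "b.txt": "Title: Book B\nAuthor: Bob\nRelease Date: June 2, 2002 [EBook #2]\nLanguage: French\n",
}

frames = []  # one single-row DataFrame per book
with tempfile.TemporaryDirectory() as tmp:
    for name, body in samples.items():
        Path(tmp, name).write_text(body, encoding="utf-8")

    for path in sorted(Path(tmp).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        frames.append(pd.DataFrame({
            "Title": [title.search(text).group(1)],
            "Author": [author.search(text).group(1)],
            # Keep only the date, dropping the trailing " [EBook #...]"
            "Release_Date": [release_date.search(text).group(1).split(" [")[0]],
            "Language": [language.search(text).group(1)],
        }))

# Stack the rows; ignore_index renumbers them 0..n-1 instead of repeating 0
books = pd.concat(frames, ignore_index=True)
print(books)
```

Without ignore_index=True every row would keep the index 0 from its own one-row frame.
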
Note:

To handle books that are missing some of these details, you could use this kind of check inside the loop (shown here for the author):

    if author.search(text) is not None:
        book_author.append(author.search(text).group(1))
    else:
        book_author.append('-')
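The same None check can be folded into a small helper so that every field, not just the author, falls back to a placeholder and the pattern is searched only once. In this sketch find_field is a hypothetical helper name, the texts are made-up in-memory strings instead of files, and the '-' default mirrors the snippet above.

```python
import re

import pandas as pd

title = re.compile(r"Title: (.*)\n")
author = re.compile(r"Author: (.*)\n")


def find_field(pattern, text, default="-"):
    """Return the first captured group of pattern in text, or default if absent."""
    match = pattern.search(text)
    return match.group(1) if match is not None else default


# Hypothetical book texts; the second one has no Author line
texts = [
    "Title: Book A\nAuthor: Ann\n",
    "Title: Book B\n",
]

books = pd.DataFrame({
    "Title": [find_field(title, t) for t in texts],
    "Author": [find_field(author, t) for t in texts],
})
print(books)
```

Inside the answer's loop this becomes book_author.append(find_field(author, text)), and likewise for the other fields.
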

Answered By: ScottC
