Convert Log file to Dataframe Pandas

Question:

I have log files containing many lines of the form:

<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" ref="00000000-2017-01" desc="This has been updated."></log>

I am trying to convert each line in the log file into a DataFrame and store it in CSV or Excel format. I only want the values of uri, t (the timestamp), u (the username) and desc (the description).

Something like this:

uri    Date        Time      User   Description
Brand  2017-01-24  11:33:54  Rohan  This has been updated.

and so on.

Asked By: Rejoy

||

Answers:

As mentioned by @Corralien in the comments, you can use BeautifulSoup and its find_all method to parse each line of your log file separately, then use the pandas.DataFrame constructor in a list comprehension to build one DataFrame per line:

import pandas as pd
import bs4  # pip install beautifulsoup4 html5lib

# Read the whole log file into memory
with open("/tmp/logfile.txt", "r") as f:
    logFile = f.read()

# html5lib is a lenient parser, so each <log ...></log> line becomes a tag
soupObj = bs4.BeautifulSoup(logFile, "html5lib")

# One single-row DataFrame per <log> tag; "t" is split into date and time at the "T"
dfList = [pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])],
                        columns=["uri", "Date", "Time", "User", "Description"])
           for x in soupObj.find_all("log")]

# This block creates one Excel file per DataFrame (writing .xlsx requires openpyxl)
for lineNumber, df in enumerate(dfList, start=1):
    df.to_excel(f"logfile_{lineNumber}.xlsx", index=False)

Output :

print(dfList[0])

     uri        Date      Time   User             Description
0  Brand  2017-01-24  11:33:54  Rohan  This has been updated.
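
If installing BeautifulSoup and html5lib is not an option, the same per-line parsing can probably be done with the standard library, since each line in the question looks like a well-formed XML element. The sketch below is only an illustration under that assumption: the helper name parse_log_lines is hypothetical, and the second sample line is invented for the example (only the first comes from the question).

```python
import xml.etree.ElementTree as ET

# First line is taken from the question; the second is an invented sample.
SAMPLE = '''<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" ref="00000000-2017-01" desc="This has been updated."></log>
<log uri="Brand" t="2017-01-25T09:15:00" u="Asha" a="U" ref="00000000-2017-02" desc="Second entry."></log>
'''


def parse_log_lines(lines):
    """Return (uri, date, time, user, description) tuples, one per valid line."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            elem = ET.fromstring(line)
        except ET.ParseError:
            continue  # assumption: lines that are not well-formed XML are skipped
        # Split the timestamp at the "T" separator into date and time parts
        date, _, time = elem.get("t", "").partition("T")
        rows.append((elem.get("uri"), date, time, elem.get("u"), elem.get("desc")))
    return rows


rows = parse_log_lines(SAMPLE.splitlines())
for row in rows:
    print(row)
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=["uri", "Date", "Time", "User", "Description"]) in the same way as the list built from find_all above. Unlike x["uri"], elem.get returns None for a missing attribute instead of raising an error, which may or may not be what you want.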

Update :
If you need a single DataFrame/spreadsheet for all the lines, use this:

with open("/tmp/logfile.txt", "r") as f:
    soupObj = bs4.BeautifulSoup(f, "html5lib")

df = pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])
                   for x in soupObj.find_all("log")],
                  columns=["uri", "Date", "Time", "User", "Description"])

df.to_excel("logfile.xlsx", index=False)
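
Since the question also mentions CSV, here is a minimal sketch of writing the combined DataFrame with to_csv instead of to_excel. The row is hard-coded from the question's sample so that the snippet runs on its own, and the in-memory buffer stands in for a real file; the file name in the closing comment is an assumption.

```python
import io

import pandas as pd

# Row copied from the sample line in the question, so the snippet is self-contained
rows = [("Brand", "2017-01-24", "11:33:54", "Rohan", "This has been updated.")]
df = pd.DataFrame(rows, columns=["uri", "Date", "Time", "User", "Description"])

# Write to an in-memory buffer here; index=False drops the 0, 1, 2... row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write a file instead (hypothetical name): df.to_csv("logfile.csv", index=False)
```

Writing CSV does not need an extra Excel engine such as openpyxl, which can make it the simpler choice when a plain-text export is enough.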
Answered By: Timeless