pandas make lines of text file into columns by name

Question:

i have a text file that looks like this:

Tag: liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson
  Global path: /net/liosprod8.cvc-global.net/export/viewstore/liprod/liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson.vws
  Server host: liosprod8.cvc-global.net
  Region: cssall
  Active: NO
  View tag uuid:ccd335a4.fb8011eb.af37.00:50:56:bf:58:95
View on host: liosprod8.cvc-global.net
View server access path: /export/viewstore/liprod/liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson.vws
View uuid: ccd335a4.fb8011eb.af37.00:50:56:bf:58:95
View attributes: snapshot
View owner: tmn/liprod

Tag: liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson
  Global path: /net/liosprod8.cvc-global.net/export/viewstore/liprod/liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson.vws
  Server host: liosprod8.cvc-global.net
  Region: cssall
  Active: NO
  View tag uuid:dc2ff6f7.fb8311eb.bb47.00:50:56:bf:58:95
View on host: liosprod8.cvc-global.net
View server access path: /export/viewstore/liprod/liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson.vws
View uuid: dc2ff6f7.fb8311eb.bb47.00:50:56:bf:58:95
View attributes: snapshot
View owner: tmn/liprod

I want to make an excel to have the output like this:

output

The output should have put the datas from the lines into correct columns, and when he find new "Tag" to put in on a new line.

example:

Tag                                                GlobalPath                                  ...

liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson     /net/liosprod8.cvc-global.net/export/viewst...

liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson      //net/liosprod8.cvc-global.net/export/v.... 

I have also made a script, but its not having the expected output: https://paste.pythondiscord.com/obulosuvuy

Maybe if you know another variant using pandas or you know what i did wrong in my script…

Ty!

Asked By: Champs

||

Answers:

You made it overly complicated. If your file always have the same structure (records separated by an empty line and the same 11 fields for each record) you can just do something like that:

import pandas as pd

data=[]

with open ("input.txt") as file:
    content=file.read()
    records=content.split("nn")

    for r in records:
        line_data=[]
        for line in r.splitlines():
            value=line.split(":")[1].strip()
            line_data.append(value)
        data.append(line_data)

df=pd.DataFrame(data, columns=[
    "Tag",
    "Global path",
    "Server host",
    "Region",
    "Active",
    "View tag uuid",
    "View on host",
    "View server access path",
    "View uuid",
    "View attributes",
    "View owner"
])

df.to_excel("output.xlsx", index=False)

Basically what it does is separating the records by using .split("nn"), then for each line of them extract what is after ":" and put in in an appropriate data structure (a list of lists) that is used to build the DataFrame.

Answered By: Liutprand

You can use StringMethods.split to split into 2 cols then pandas.DataFrame.pivot to reshape :

Try this :

import pandas as pd

df = pd.read_fwf('test.txt', header=None)

df[['col1','col2']] = df[0].str.split(pat=":", n=1, expand=True)

cols = df['col1'].str.strip().drop_duplicates().tolist()

out = (
        df.pivot(columns="col1", values="col2")
          .assign(Tag = lambda x: x['Tag'].ffill())
          .groupby('Tag').sum()
          .replace(0, 'N/A')
          .reset_index()
          .rename_axis(axis=1, mapper=None)
          .rename(columns= lambda x: x.strip())
          .reindex(columns=cols)
        )

out.to_excel('your_excel_name.xlsx', index=False) #to save a spreadsheet

# Output :

print(out)

                                                          Tag                                                                                                Global path                Server host   Region Active                             View tag uuid               View on host                                                       View server access path                                  View uuid View attributes   View owner
0              liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson   /net/liosprod8.cvc-global.net/export/viewstore/liprod/liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson.vws   liosprod8.cvc-global.net   cssall     NO  ccd335a4.fb8011eb.af37.00:50:56:bf:58:95   liosprod8.cvc-global.net   /export/viewstore/liprod/liprod_Liosprod8_LIOS_12.3_NIGHT_linux8_hudson.vws   ccd335a4.fb8011eb.af37.00:50:56:bf:58:95        snapshot   tmn/liprod
1               liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson    /net/liosprod8.cvc-global.net/export/viewstore/liprod/liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson.vws   liosprod8.cvc-global.net   cssall     NO  dc2ff6f7.fb8311eb.bb47.00:50:56:bf:58:95   liosprod8.cvc-global.net    /export/viewstore/liprod/liprod_Liosprod8_LIOS_DF3_NIGHT_linux8_hudson.vws   dc2ff6f7.fb8311eb.bb47.00:50:56:bf:58:95        snapshot   tmn/liprod
2   liprod_simpleSlave_LIOS_86_LATEST_MNT_hudson_dynamic_view    /net/rhliprod.cvc-global.net/export/viewstore/liprod_simpleSlave_LIOS_86_LATEST_MNT_hudson_dynamic_view    rhliprod.cvc-global.net   cssall     NO  de5ddadb.3e0011ed.91ef.00:0c:29:84:11:a7    rhliprod.cvc-global.net   /export/viewstore/liprod_simpleSlave_LIOS_86_LATEST_MNT_hudson_dynamic_view   de5ddadb.3e0011ed.91ef.00:0c:29:84:11:a7             N/A   tmn/liprod
Answered By: abokey