How to combine multiple columns into one single block

Question:

I’m trying to achieve a dataframe transformation (kinda complicated for me) with Pandas, see image below.
The original dataframe source is an Excel sheet (here is an example) that looks exactly like the input here :

INPUT → OUTPUT (made by draw.io)

enter image description here

Basically, I need to do these transformations by order :

  1. Select (in each block) the first four lines + the last two lines
  2. Stack all the blocks together
  3. Drop the last three unnamed columns
  4. Select columns A and E
  5. Fill down the column A
  6. Create a new column N1 that holds a sequence of values (ID-01 to ID-06)
  7. Create a new column N2 that concatente the first value of the block and its number

And for that, I made this code who unfortunately return a [0 rows × 56 columns] dataframe :

import pandas as pd

myFile = r"C:Userswasp_96bDesktopExcelSheet.xlsx"

df1 = pd.read_excel(myFile, sheet_name = 'Sheet1')

df2 = (pd.wide_to_long(df1.reset_index(), 'A' ,i='index',j='value').reset_index(drop=True))

df2.ffill(axis = 0)
                       
df2.insert(2, 'N1', 'ID-' + str(range(1, 1 + len(df2))))

df2.insert(3, 'N2', len(df2)//5)

display(df2)

Do you have any idea or explanation for this scenario ?

Is there any other ways I can obtain the result I’m looking for ?

Asked By: wasp-96b

||

Answers:

The Column names in your code and in the data are not matching. However, from the data and the output you desire, I think I am able to solve your query. The code is very specific for the data you provided and you might need to change it later

CODE

import pandas as pd

myFile = "ExcelSheet.xlsx"

df = pd.read_excel(myFile, sheet_name='Sheet1')

# Forwad filling the column
df["Housing"] = df["Housing"].ffill()

# Select the first 4 lines and last two lines
df = pd.concat([df.head(4), df.tail(2)]).reset_index(drop=True)

# Drop the unneccsary columns
df = df.drop(columns=[col for col in df.columns if not (col.startswith("Elements") or col == "Housing")])
df.rename(columns={"Elements": "Elements.0"}, inplace=True)

# Stack all columns
df = pd.wide_to_long(df.reset_index(), stubnames=["Elements."], i="index", j="N2").reset_index("N2")
df.rename(columns={"Elements.": "Elements"}, inplace=True)

# Adding N1 and N2
df["N1"] = "ID_" + (df.index + 1).astype("str")
df["N2"] = df["Housing"] + "-" + (df["N2"] + 1).astype("str")

# Finishing up
df = df[["Housing", "Elements", "N1", "N2"]].reset_index(drop=True)

print(df.head(12))

OUTPUT(only first 12 rows)

   Housing Elements    N1      N2
0     OID1        1  ID_1  OID1-1
1     OID1   M-0368  ID_2  OID1-1
2     OID1      JUM  ID_3  OID1-1
3     OID1   NODE-1  ID_4  OID1-1
4     OID4    BTM-B  ID_5  OID4-1
5     OID4        1  ID_6  OID4-1
6     OID1        1  ID_1  OID1-2
7     OID1   M-0379  ID_2  OID1-2
8     OID1      JUM  ID_3  OID1-2
9     OID1   NODE-2  ID_4  OID1-2
10    OID4    BTM-B  ID_5  OID4-2
11    OID4        2  ID_6  OID4-2
Answered By: vignesh kanakavalli
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.