How to merge several columns into one column with several records using python and pandas?

Question:

I have a data which I need to transform in order to get 2 cols insted of 4 :


data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]

processed_data = ?

needed_df = pandas.DataFrame(processed_data, columns=['Number', 'Name'])


I expect the following behaviour:

['123', 'Billy']
['123', 'Bill']
['123', 'Bi']
['234', 'James']
['234', 'J']
['234', 'Ji']

I’ve tried to use for in for loop but getting the wrong result:

for row in df.iterrows():
    for col in df.columns:
        new_row = ...
        processed_df = pandas.concat(df, new_row)

Such a construction gives a too big result

The similar question using sql:

How to split several columns into one column with several records in SQL?

Asked By: Devcool

||

Answers:

Let use list comprehension to create pairs of Name and Number then create a new dataframe

pd.DataFrame([[x, z] for x, *y in data for z in y], columns=['Number', 'Name'])

  Number   Name
0    123  Billy
1    123   Bill
2    123     Bi
3    234  James
4    234      J
5    234     Ji
6    543  Floyd
7    543    Flo
8    543      F
Answered By: Shubham Sharma

Or, you can convert you exists data into a dataframe then perform pandas dataframe reshaping with melt:

import pandas as pd

data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]


df = pd.DataFrame(data)

df.melt(0).sort_values(0)

Output:

     0  variable  value
0  123         1  Billy
3  123         2   Bill
6  123         3     Bi
1  234         1  James
4  234         2      J
7  234         3     Ji
2  543         1  Floyd
5  543         2    Flo
8  543         3      F
Answered By: Scott Boston

Here is one possible way to merge several columns into one column with several records using Python and Pandas:

import pandas as pd

data = [['123', 'Billy', 'Bill', 'Bi'],
        ['234', 'James', 'J', 'Ji'],
        ['543', 'Floyd', 'Flo', 'F'],
]

df = pd.DataFrame(data, columns=['Number', 'Name1', 'Name2', 'Name3'])

df = df.melt(id_vars=['Number'], value_vars=['Name1', 'Name2', 'Name3'])

df = df.drop('variable', axis=1).rename(columns={'value': 'Name'})

print(df)
Answered By: simeon2941
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.