How to iterate over columns and concatenate two columns into one

Question:

I have a dataframe:

      Border #1 [from] Border #1 [to] Border #2 [from] Border #2 [to]
index
0                   BE          BE_AL               PL             SK
1                   BE          BE_AL               PL             SK

I want to combine each pair of columns into one (I have many more columns). The desired result:

      Border #1 Border #2
index
0      BE_BE_AL     PL_SK
1      BE_BE_AL     PL_SK

For one column I could do:

df['Border #1'] = df['Border #1 [from]'] + '_' + df['Border #1 [to]']

but how can I do it for multiple columns?

Asked By: wychen

||

Answers:

Create a MultiIndex by splitting the column names on the whitespace followed by [, so that both levels can be selected with DataFrame.xs and joined with +:

df.columns = df.columns.str.strip(']').str.split(r'\s+\[', expand=True)
print(df)
  Border #1        Border #2    
       from     to      from  to
0        BE  BE_AL        PL  SK
1        BE  BE_AL        PL  SK

print(df.columns)
MultiIndex([('Border #1', 'from'),
            ('Border #1',   'to'),
            ('Border #2', 'from'),
            ('Border #2',   'to')],
           )

df = df.xs('from', axis=1, level=1) + '_' + df.xs('to', axis=1, level=1)
print(df)
  Border #1 Border #2
0  BE_BE_AL     PL_SK
1  BE_BE_AL     PL_SK
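
For anyone who wants to run the steps above end to end, here is a minimal sketch. The DataFrame constructor is my own reconstruction of the sample data in the question (an assumption, since the question only shows the printed frame), and `result` is a name I introduced for illustration.

```python
import pandas as pd

# Assumed reconstruction of the sample frame shown in the question
df = pd.DataFrame({
    "Border #1 [from]": ["BE", "BE"],
    "Border #1 [to]": ["BE_AL", "BE_AL"],
    "Border #2 [from]": ["PL", "PL"],
    "Border #2 [to]": ["SK", "SK"],
})
df.index.name = "index"

# Strip the trailing ']' and split on "whitespace + [" into two column levels
df.columns = df.columns.str.strip("]").str.split(r"\s+\[", expand=True)

# Select the 'from' and 'to' halves; xs drops level 1, so both halves
# share the same 'Border #N' labels and align when added
result = df.xs("from", axis=1, level=1) + "_" + df.xs("to", axis=1, level=1)
print(result)
```

This should only be relied on when every border has exactly one [from] and one [to] column; a border with a missing half would produce NaN for that column after the addition.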
Answered By: jezrael

You can group the columns by their common prefix and build a new DataFrame:

groups = df.columns.str.replace(r' \[.+\]', '', regex=True)
df2 = pd.concat({g: d.apply('_'.join, axis=1)
                 for g,d in df.groupby(groups, axis=1)}, axis=1)

output:

      Border #1 Border #2
index                    
0      BE_BE_AL     PL_SK
1      BE_BE_AL     PL_SK
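
As far as I know, pandas 2.1 and later warn that groupby(..., axis=1) is deprecated, so here is a rough sketch of the same idea without it. It assumes the sample frame from the question (the constructor is my reconstruction) and selects each group's columns with a boolean mask instead of groupby; `df2` and `groups` follow the names used above.

```python
import pandas as pd

# Assumed reconstruction of the sample frame shown in the question
df = pd.DataFrame({
    "Border #1 [from]": ["BE", "BE"],
    "Border #1 [to]": ["BE_AL", "BE_AL"],
    "Border #2 [from]": ["PL", "PL"],
    "Border #2 [to]": ["SK", "SK"],
})
df.index.name = "index"

# Drop the " [from]" / " [to]" suffix to get one group label per column
groups = df.columns.str.replace(r" \[.+\]", "", regex=True)

# For each label, pick its columns with a boolean mask and join them row-wise
df2 = pd.DataFrame({
    name: df.loc[:, groups == name].apply("_".join, axis=1)
    for name in groups.unique()
})
print(df2)
```

The values are joined in the order the columns appear in the frame, so this relies on [from] coming before [to] within each pair.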
Answered By: mozway

Group the columns with groupby(axis=1) and join each group row-wise:

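def function1(dd: pd.DataFrame):
    # Join the values of each row in the group with "_"
    return dd.agg("_".join, axis=1)

# columns.str[8] is the border number ('1', '2'); add_prefix restores 'Border #'
df1.groupby(df1.columns.str[8], axis=1).apply(function1).add_prefix(df1.columns.max()[:8])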

out:

      Border #1 Border #2
index                    
0      BE_BE_AL     PL_SK
1      BE_BE_AL     PL_SK
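
One caveat worth checking before using a positional key like str[8]: it picks a single character, so it implicitly assumes single-digit border numbers. The small sketch below compares it with a pattern-based key on hypothetical column names ('Border #10' is my own example, not from the question).

```python
import pandas as pd

# Hypothetical column names that include a two-digit border number
cols = pd.Index([
    "Border #1 [from]", "Border #1 [to]",
    "Border #10 [from]", "Border #10 [to]",
])

# Character at position 8 only: 'Border #1' and 'Border #10' collide
by_position = cols.str[8]

# Removing the bracketed suffix keeps the whole border label
by_pattern = cols.str.replace(r"\s+\[.+\]$", "", regex=True)

print(list(by_position))  # ['1', '1', '1', '1']
print(list(by_pattern))   # ['Border #1', 'Border #1', 'Border #10', 'Border #10']
```

With ten or more borders, a pattern-based key like by_pattern is the safer grouping key.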
Answered By: G.G