Appending a large amount of columns into 1 column

Question:

I have a dataset of ~700 columns. I’d like to join all columns into a single column.

Input:

id | A   | B  | C   | D  | E  | F  | ... | Z
0  | yes | no | yes | no | yes| no | ... | no
1  | no  | no | yes | no | no | no | ... | no
2  | yes | yes| yes | yes| yes| no | ... | no

Output:

id | A   | B  | C   | D  | E  | F  | ... | Z  | joined_column
0  | yes | no | yes | no | yes| no | ... | no | yes no yes no yes no ... no
1  | no  | no | yes | no | no | no | ... | no | no no yes no no no ... no
2  | yes | yes| yes | yes| yes| no | ... | no | yes yes yes yes yes no ... no

I’ve used the following in the past. However, I’m looking for a way to scale this when you’re working with a large amount of columns.

def join(df):
list = []
    for i in range(0, df.shape[0]):
        list.append( str(df['A'][i]) + ' ' + str(df['B'][i]) + ' ' + str(df['C'][i]))
    return list
Asked By: Ant

||

Answers:

Perhaps you can use a list comprehension:

df['joined_column'] = [' '.join(item) for item in df.iloc[:,1:].values]

   id    A    B    C    D    E   F   Z              joined_column
0   0  yes   no  yes   no  yes  no  no    yes no yes no yes no no
1   1   no   no  yes   no   no  no  no      no no yes no no no no
2   2  yes  yes  yes  yes  yes  no  no  yes yes yes yes yes no no
Answered By: sophocles

You can use agg to aggregate data over a specific axis using a specific operation:

df['joined_column'] = df.agg(' '.join, axis=1)

Potentially use

df['joined'] = df.iloc[:,1:].agg(' '.join, axis=1)

if you don’t want to join the first (or any other column).

I used ' '.join to join the values with a space as separator.

axis is set to 1 because you want to join columns and not rows.

Answered By: M. Spiller

Given df:

     A    B    C    D    E   F
0  yes   no  yes   no  yes  no
1   no   no  yes   no   no  no
2  yes  yes  yes  yes  yes  no

Doing:

cols = df.columns

# As a string:
df['joined_column_str'] = df[cols].agg(' '.join, axis=1)

# As a list:
df['joined_column_list'] = df[cols].agg(list, axis=1)

Output:

     A    B    C    D    E   F       joined_column_str             joined_column_list
0  yes   no  yes   no  yes  no    yes no yes no yes no    [yes, no, yes, no, yes, no]
1   no   no  yes   no   no  no      no no yes no no no      [no, no, yes, no, no, no]
2  yes  yes  yes  yes  yes  no  yes yes yes yes yes no  [yes, yes, yes, yes, yes, no]
Answered By: BeRT2me

Considering the dataframe df

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...

One can use a custom lambda function inside pandas.DataFrame.apply as follows

df['joined_column'] = df.apply(lambda x: ' '.join(x.astype(str)), axis=1)

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  0 yes no yes no yes no yes no yes no yes no ye...
1   1   no   no  ...   no   no  1 no no yes no no no no no no no no no no no n...
2   2  yes  yes  ...  yes  yes  2 yes yes yes yes yes no yes yes yes yes yes y...

However, as one doesn’t want the id to appear in the joined_column, one can add x.drop('id') as

df['joined_column'] = df.apply(lambda x: ' '.join(x.drop('id').astype(str)), axis=1)

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...

Alternatively, for this last case, one can also leverage Numpy, more specifically numpy.array as follows

df['joined_column'] = np.array([' '.join(x.astype(str)) for x in df.drop('id', axis=1).values])

[Out]:

   id    A    B  ...    Y    Z                                      joined_column
0   0  yes   no  ...  yes   no  yes no yes no yes no yes no yes no yes no yes ...
1   1   no   no  ...   no   no  no no yes no no no no no no no no no no no no ...
2   2  yes  yes  ...  yes  yes  yes yes yes yes yes no yes yes yes yes yes yes...
Answered By: Gonçalo Peres

A possible solution:

df['joined_column'] = df.iloc[:,1:].add(' ').sum(axis=1).str.rstrip()

Yet another possible solution, using functools.reduce:

from functools import reduce

df['joined_column'] = reduce(lambda x,y: x+' '+y, [df[i] for i in df.columns[1:]])

Output:

   id    A    B    C    D    E   F   Z              joined_column
0   0  yes   no  yes   no  yes  no  no    yes no yes no yes no no
1   1   no   no  yes   no   no  no  no      no no yes no no no no
2   2  yes  yes  yes  yes  yes  no  no  yes yes yes yes yes no no
Answered By: PaulS