Create a summary column with info from other columns in same row (python/pandas)

Question:

If I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'id': [1,2,3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1,1,1]
})


    id  date        finding unecessary_col
0   1   2021-09-08  Yes     1
1   2   2021-07-06  No      1
2   3   2021-03-04  Yes     1

How can I create an additional ‘summary’ column with values and descriptions from different columns within the same row? Not all columns would be included in this summary. Ideal output below:

    id  date        finding unecessary_col  summary
0   1   2021-09-08  Yes     1               "ID: 1; Date: 2021-09-08; Finding: Yes"
1   2   2021-07-06  No      1               "ID: 2; Date: 2021-07-06; Finding: No"
2   3   2021-03-04  Yes     1               "ID: 3; Date: 2021-03-04; Finding: Yes"

Thanks in advance

Asked By: Dr Wampa

Answers:

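You can use the following, which stores each row as a dict (the result below assumes unecessary_col has already been dropped from df):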

df['summary'] = df.to_dict(orient='records')

Result:

   id        date finding                                            summary
0   1  2021-09-08     Yes  {'id': 1, 'date': '2021-09-08', 'finding': 'Yes'}
1   2  2021-07-06      No   {'id': 2, 'date': '2021-07-06', 'finding': 'No'}
2   3  2021-03-04     Yes  {'id': 3, 'date': '2021-03-04', 'finding': 'Yes'}
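
If the extra column is still in the dataframe, one option is to select only the wanted columns before calling to_dict. This is a minimal sketch using the question's sample data; the name wanted is a hypothetical helper, not something from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1]
})

# Hypothetical helper: the columns that should appear in the summary.
wanted = ['id', 'date', 'finding']

# Build one dict per row from the selected columns only.
df['summary'] = df[wanted].to_dict(orient='records')
```

The original columns, including unecessary_col, stay in df; only the summary dicts are restricted.
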
Answered By: René

If you want a string for your summaries, a straightforward approach is to iterate over the rows, either with an explicit for loop or an implicit one via df.apply(axis=1). The outputs below assume unecessary_col has already been dropped from df.

Explicit for loop with df.iterrows():

df['summary'] = ['; '.join(f'{k}: {v}' for k, v in row.items()) for _, row in df.iterrows()]

print(df)
   id        date finding                                summary
0   1  2021-09-08     Yes  id: 1; date: 2021-09-08; finding: Yes
1   2  2021-07-06      No   id: 2; date: 2021-07-06; finding: No
2   3  2021-03-04     Yes  id: 3; date: 2021-03-04; finding: Yes
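
To get the capitalized labels from the question and skip columns that should not be summarized, the same loop can be driven by a mapping from column name to label. This is a sketch under that assumption; the labels dict is hypothetical and would need to list whichever columns you want.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1]
})

# Hypothetical mapping: column name -> label shown in the summary.
# Columns missing from this dict are left out of the summary.
labels = {'id': 'ID', 'date': 'Date', 'finding': 'Finding'}

df['summary'] = [
    '; '.join(f'{label}: {row[col]}' for col, label in labels.items())
    for _, row in df.iterrows()
]
```

The order of the entries in labels determines the order of the fields in each summary string.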

Implicit for loop with df.apply(…, axis=1):

df['summary'] = df.apply(
    lambda row: '; '.join(f'{k}: {v}' for k, v in row.items()),
    axis=1
)

print(df)
   id        date finding                                summary
0   1  2021-09-08     Yes  id: 1; date: 2021-09-08; finding: Yes
1   2  2021-07-06      No   id: 2; date: 2021-07-06; finding: No
2   3  2021-03-04     Yes  id: 3; date: 2021-03-04; finding: Yes
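
For a small, fixed set of columns, element-wise string concatenation is another option that avoids writing the row loop by hand. This is only a sketch using the question's sample data; the labels are hard-coded here, and astype(str) is applied to every column so that non-string values concatenate cleanly.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1]
})

# Concatenate label text and column values element-wise.
df['summary'] = (
    'ID: ' + df['id'].astype(str)
    + '; Date: ' + df['date'].astype(str)
    + '; Finding: ' + df['finding'].astype(str)
)
```

This gets verbose as the number of columns grows, at which point one of the loop-based versions is probably easier to maintain.
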
Answered By: Cameron Riddell