Convert Pandas DataFrame from Wide to Long Format

Question:

Given this DataFrame:

data = [['Tom', 'run', '2022-01-26', 'run', '2027-01-26'], ['Max', 'stop', '2020-11-16', 'run', '2022-04-26'], ['Bob', 'run', '2021-10-03', 'stop', '2022-01-26'], ['Ben', 'run', '2020-03-11', 'stop', '2013-01-26'], ['Eva', 'stop', '2017-11-16', 'run', '2015-01-26']]
df = pd.DataFrame(data, columns=['person', 'action_1', 'time_1', 'action_2', 'time_2'])


  person action_1      time_1 action_2      time_2
0    Tom      run  2022-01-26      run  2027-01-26
1    Max     stop  2020-11-16      run  2022-04-26
2    Bob      run  2021-10-03     stop  2022-01-26
3    Ben      run  2020-03-11     stop  2013-01-26
4    Eva     stop  2017-11-16      run  2015-01-26

I want it to look like:

  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26
Asked By: juststarted

||

Answers:

This can be done using pd.wide_to_long:

df = pd.wide_to_long(df, 
                     stubnames=['action', 'time'],
                     i='person',
                     j='num',
                     sep='_').reset_index()

Output:

  person  num action        time
0    Tom    1    run  2022-01-26
1    Max    1   stop  2020-11-16
2    Bob    1    run  2021-10-03
3    Ben    1    run  2020-03-11
4    Eva    1   stop  2017-11-16
5    Tom    2    run  2027-01-26
6    Max    2    run  2022-04-26
7    Bob    2   stop  2022-01-26
8    Ben    2   stop  2013-01-26
9    Eva    2    run  2015-01-26
Answered By: BeRT2me

One option with pivot_longer from pyjanitor – it can handle non-unique indices:

# pip install pyjanitor
import janitor

(df
.pivot_longer(
    index='person', 
    names_to = ['action', 'time'], 
    names_pattern = ['action', 'time'])
)
  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26

Another option, using .value:

(df
.pivot_longer(
    index='person', 
    names_to = '.value', 
    names_pattern = r"(.+)_.+")
) 
  person action        time
0    Tom    run  2022-01-26
1    Max   stop  2020-11-16
2    Bob    run  2021-10-03
3    Ben    run  2020-03-11
4    Eva   stop  2017-11-16
5    Tom    run  2027-01-26
6    Max    run  2022-04-26
7    Bob   stop  2022-01-26
8    Ben   stop  2013-01-26
9    Eva    run  2015-01-26
Answered By: sammywemmy
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.