Get row values as column values

Question:

I have a single row data-frame like below

Num     TP1(USD)    TP2(USD)    TP3(USD)    VReal1(USD)     VReal2(USD)     VReal3(USD)     TiV1 (EUR)  TiV2 (EUR)  TiV3 (EUR)  TR  TR-Tag
AA-24   0       700     2100    300     1159    2877    30       30     47      10  5

I want to get a dataframe like the one below

ID  Price   Net     Range
1   0       300     30
2   700     1159    30
3   2100    2877    47

The logic here is that
a. there will be 3 columns names that contain TP/VR/TV. So in the ID, we have 1, 2 & 3 (these can be generated by extracting the value from the column names or just by using a range to fill)
b. TP1 value goes into first row of column ‘Price’,TP2 value goes into second row of column ‘Price’ & so on
c. Same for VR & TV. The values go into ‘Net’ & ‘Range columns
d. Columns ‘Num’, ‘TR’ & ‘TR=Tag’ are not relevant for the result.

I tried df.filter(regex='TP').stack(). I get all the ‘TP’ column & I can access individual values be index ([0],[1],[2]). I could not get all of them into a column directly.

I also wondered if there may be a easier way of doing this.

Asked By: moys

||

Answers:

Assuming 'Num' is a unique identifier, you can use pandas.wide_to_long:

pd.wide_to_long(df, stubnames=['TP', 'VR', 'TV'], i='Num', j='ID')

or, for an output closer to yours:

out = (pd
 .wide_to_long(df, stubnames=['TP', 'VR', 'TV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})
 )

output:

       ID  Price   Net  Range
Num                          
AA-24   1      0   300     30
AA-24   2    700  1159     30
AA-24   3   2100  2877     47
updated answer
out = (pd
 .wide_to_long(df.set_axis(df.columns.str.replace(r'(USD)$', '', regex=True),
                           axis=1),
               stubnames=['TP', 'VReal', 'TiV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VReal': 'Net', 'TiV': 'Range'})
 )

output:

       ID  Price   Net  Range
Num                          
AA-24   1      0   300     30
AA-24   2    700  1159     30
AA-24   3   2100  2877     47
Answered By: mozway

pivot_wider (see mozway’s answer) is probably best here from a pure pandas perspective, but if you need more flexibility, you could also melt and pivot:

import pandas as pd

# recreating your dataframe
df = pd.DataFrame(['AA-24', '0', '700', '2100', '300', '1159', '2877', '30', '30', '47', '10', '5'], 
                  index= ['Num', 'TP1(USD)', 'TP2(USD)', 'TP3(USD)', 'VReal1(USD)', 'VReal2(USD)', 'VReal3(USD)', 'TiV1(EUR)', 'TiV2(EUR)', 'TiV3(EUR)', 'TR', 'TR-Tag']).T

# reshaping the data
(df.melt(id_vars=['Num','TR', 'TR-Tag'])
 .assign(col=lambda x: x['variable'].str[:2], idx=lambda x: x['variable'].str.extract("([0-9])"))
 .pivot(values='value', columns='col', index='idx')
 .rename(columns={'TP': 'Price', 'VR': 'Net', 'Ti': 'Range'})
)

Perhaps surprisingly, this is also faster than wide_to_long. Benchmarking gives 7.76 ms ± 841 µs per loop for this method.

The wide_to_long approach from mozway:

(pd
 .wide_to_long(df.set_axis(df.columns.str.replace(r'([A-Z]{3})$', '', regex=True),
                           axis=1),
               stubnames=['TP', 'VReal', 'TiV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VReal': 'Net', 'TiV': 'Range'})
 )

benchmarks at 30.4 ms ± 3.07 ms per loop on my machine.

Umar.H’s answer using stack is the faster than both:

df1 = df.filter(regex='TP|VR|TV')
df1.columns = df1.columns
     .str.replace('(d+)', r' 1' ,regex=True).str.split(' ',expand=True)
df1.stack(1).rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})

Runs at 6.07 ms ± 156 µs per loop

If you don’t mind the additional import, sammywemmy’s answer using pyjanitor’s pivot_wider offers speed and an elegant syntax.

(df
.select_columns('TP*', 'VR*', 'Ti*')
.pivot_longer(index = None, 
              names_to = ('.value', 'ID'), 
              names_pattern = ('(.+)(d).+'))
.rename(columns = {'TP':'Price', 'VReal':'Net', 'TiV':'Range'})
)

benchmarks at 11.2 ms ± 229 µs per loop

and the names pattern approach:

df.pivot_longer(index = None, 
                names_to = ('Price', 'Net', 'Range'), 
                names_pattern = ('TP.*', 'VR.*', 'Ti.*'), 
                ignore_index = False)

is the fastest of the lot as tested, coming in at 3.53 ms ± 95 µs per loop.

(It is worth noting that this dataset is probably too small to care about speed, and the order may not be the same on larger datasets)

Answered By: s_pike

let us create a Multiindex then use .stack

df1 = df.filter(regex='TP|VR|TV')
#i couldn't figure out to split by 
#wordnumber without creating an additional whitespace split.
df1.columns = df1.columns
     .str.replace('(d+)', r' 1' ,regex=True).str.split(' ',expand=True)

#or more succinctly.
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract('(D+)(d+)'))   

print(df1)

  TP              VR              TV
   1    2     3    1     2     3   1   2   3
0  0  700  2100  300  1159  2877  30  30  47

df1.stack(1).rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})
    
     Price  Range   Net
0 1      0     30   300
  2    700     30  1159
  3   2100     47  2877
Answered By: Umar.H

IIUC, you can use:

df = pd.DataFrame({'TP1':[0], 'TP2':[700], 'TP3':[2100], 'VR1':[300], 'VR2':[1159], 'VR3':[2877], 'TV1':[30], 'TV2':[30], 'TV3':[47]})

pd.wide_to_long(df.reset_index(), ["TP", "VR", "TV"], i="index", j="Nr").droplevel('index').rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})

Result:

    Price   Net  Range
Nr                    
1       0   300     30
2     700  1159     30
3    2100  2877     47
Answered By: René

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.select_columns('TP*', 'VR*', 'Ti*')
.pivot_longer(index = None, 
              names_to = ('.value', 'ID'), 
              names_pattern = ('(.+)(d).+'))
.rename(columns = {'TP':'Price', 'VReal':'Net', 'TiV':'Range'})
)
  ID  Price   Net  Range
0  1      0   300     30
1  2    700  1159     30
2  3   2100  2877     47

In the above solution, regex pattern is used to extract the relevant sub labels in the columns ; .value determines which of the sub labels remain as headers.

Another solution, that might be useful is to pass a list of regular expressions to names_pattern parameter:

df.pivot_longer(index = None, 
                names_to = ('Price', 'Net', 'Range'), 
                names_pattern = ('TP.*', 'VR.*', 'Ti.*'), 
                ignore_index = False)

   Price   Net  Range
0      0   300     30
0    700  1159     30
0   2100  2877     47
Answered By: sammywemmy
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.