Create dictionary with pairs from column from pandas dataframe using regex
Question:
I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Original': [92,93,94,95,100,101,102],
                   'Sub_90': [99,98,99,100,102,101,np.nan],
                   'Sub_80': [99,98,99,100,102,np.nan,np.nan],
                   'Gen_90': [99,98,99,100,102,101,101],
                   'Gen_80': [99,98,99,100,102,101,100]})
I would like to create the following dictionary
{
'Gen_90': 'Original',
'Sub_90': 'Gen_90',
'Gen_80': 'Original',
'Sub_80': 'Gen_80',
}
using regex, because in my original data I also have Gen_70, Gen_60, ..., Gen_10 and Sub_70, Sub_60, ..., Sub_10.
So I would like to pair each Sub with the Gen that has the same _number, and also pair each Gen with Original.
How could I do that?
Answers:
Use a dictionary comprehension with replace, sorting by the number after _:
d = {x: 'Original' if x.startswith('Gen') else x.replace('Sub', 'Gen')
     for x in sorted(df.columns.drop('Original'),
                     key=lambda x: (-int(x.split('_')[1]), x.split('_')[0]))}
print(d)
{'Gen_90': 'Original',
'Sub_90': 'Gen_90',
'Gen_80': 'Original',
'Sub_80': 'Gen_80'}
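Since the question asks for regex specifically, here is a minimal sketch of the same idea with the standard re module. The helper name build_pairs and the pattern are assumptions of mine about the naming scheme (Gen_<number> / Sub_<number>), not part of the answer above. It works on plain column names, so you could pass df.columns to it.

```python
import re

def build_pairs(columns, base="Original"):
    """Map each Gen_N to `base` and each Sub_N to Gen_N (hypothetical helper)."""
    pattern = re.compile(r"(Gen|Sub)_(\d+)")
    # Keep only names of the form Gen_<number> or Sub_<number>
    matches = [m for m in map(pattern.fullmatch, columns) if m]
    # Highest number first; within one number, 'Gen' sorts before 'Sub'
    matches.sort(key=lambda m: (-int(m.group(2)), m.group(1)))
    return {
        m.group(0): base if m.group(1) == "Gen" else f"Gen_{m.group(2)}"
        for m in matches
    }

# Stand-in for list(df.columns) from the question
columns = ["Original", "Sub_90", "Sub_80", "Gen_90", "Gen_80"]
d = build_pairs(columns)
print(d)
```

Note that this sketch maps Sub_N to Gen_N by name only; it does not check that a Gen_N column actually exists.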
You can use:
cols = df.sort_index(axis=1).columns
# digits after the underscore; 'Original' has none and yields NaN
group = cols[::-1].str.extract(r'_(\d+)', expand=False)
out = {a: b for l in map(list, cols.groupby(group).values())
       for a, b in zip(l, ['Original']+l)}
This should work irrespective of the order of the input.
Output:
{'Gen_90': 'Original',
'Sub_90': 'Gen_90',
'Gen_80': 'Original',
'Sub_80': 'Gen_80'}
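To make the grouping idea easier to follow, here is a rough plain-Python equivalent. It is not the pandas code above: the function name pairs_by_group is made up for illustration, and it assumes each group holds one Gen_N and one Sub_N. It groups names by the number captured with the regex, then chains each group starting from 'Original', and checks that every column order gives the same mapping.

```python
import re
from itertools import permutations

def pairs_by_group(columns, base="Original"):
    """Group names by their trailing number, then chain base -> Gen -> Sub."""
    groups = {}
    for name in columns:
        m = re.search(r"_(\d+)", name)
        if m:  # 'Original' has no number and is skipped
            groups.setdefault(int(m.group(1)), []).append(name)
    out = {}
    for number in sorted(groups, reverse=True):
        members = sorted(groups[number])  # 'Gen_N' sorts before 'Sub_N'
        # First member points at base, each later member at the previous one
        for a, b in zip(members, [base] + members):
            out[a] = b
    return out

names = ["Original", "Sub_90", "Sub_80", "Gen_90", "Gen_80"]
expected = pairs_by_group(names)
# The mapping is the same for every ordering of the input columns
assert all(pairs_by_group(p) == expected for p in permutations(names))
print(expected)
```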
You can use a simple loop:
l = sorted(df.columns[1:])  # exclude 'Original' (assumes it is the first column)
d = {}
# split the other columns into two sublists: one for Gen_XX and another for Sub_XX
for g, s in zip(l[:len(l)//2], l[len(l)//2:]):
    d[g] = 'Original'
    d[s] = g
Output:
>>> d
{'Gen_80': 'Original',
'Sub_80': 'Gen_80',
'Gen_90': 'Original',
'Sub_90': 'Gen_90'}
You can do:
gen_cols = df.filter(like='Gen_').columns
sub_cols = df.filter(like='Sub_').columns
d = dict(zip(sorted(sub_cols), sorted(gen_cols)))
d.update({g: 'Original' for g in gen_cols})
print(d)
{'Sub_80': 'Gen_80',
'Sub_90': 'Gen_90',
'Gen_90': 'Original',
'Gen_80': 'Original'}
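One caveat: zip pairs the two sorted lists purely by position, so if some Gen_N has no Sub_N (or vice versa) the pairs shift silently. A small pre-check can catch that. This is only a sketch; the helper name unmatched_numbers is made up for illustration, and it works on plain column names such as list(df.columns).

```python
def unmatched_numbers(columns):
    """Return suffixes that appear under only one of Gen_/Sub_ (hypothetical helper)."""
    gen = {c.split('_', 1)[1] for c in columns if c.startswith('Gen_')}
    sub = {c.split('_', 1)[1] for c in columns if c.startswith('Sub_')}
    # Symmetric difference: suffixes present on one side but not the other
    return sorted(gen ^ sub)

complete = ['Original', 'Sub_90', 'Sub_80', 'Gen_90', 'Gen_80']
incomplete = ['Original', 'Sub_90', 'Gen_90', 'Gen_80']  # no Sub_80

print(unmatched_numbers(complete))    # nothing missing
print(unmatched_numbers(incomplete))  # reports the suffix without a partner
```

If the returned list is not empty, positional pairing with zip is not safe for that dataframe.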