Move specific column information to a new row under the current row
Question:
Consider this df:
import pandas as pd
data = {'Name Type': ["Primary", "Primary", "Primary"],
        'Full Name': ["John Snow", "Daenerys Targaryen", "Brienne Tarth"],
        'AKA': ["Aegon Targaryen", None, None],
        'LQAKA': ["The Bastard of Winterfell", "Mother of Dragons", None],
        'Other': ["Info", "Info", "Info"]}
df = pd.DataFrame(data)
I need to move the AKA and LQAKA values, when they are not None, into new rows below each Primary name, and set Name Type to AKA or LQAKA accordingly. If a value is None, no row should be created. There are many other columns like Other whose values should stay on the same row as the Primary name. The expected result is:
Name Type | Full Name | Other |
---|---|---|
Primary | John Snow | Info |
AKA | Aegon Targaryen | |
LQAKA | The Bastard of Winterfell | |
Primary | Daenerys Targaryen | Info |
LQAKA | Mother of Dragons | |
Primary | Brienne Tarth | Info |
Answers:
You can use melt + dropna + sort_index, then post-process:
# reshape, remove None, and reorder
out = (df.melt(['Name Type', 'Other'], ignore_index=False)
         .dropna(subset='value')
         .sort_index(kind='stable', ignore_index=True)
         .rename(columns={'value': 'Full Name'})
      )
# flag the rows that came from AKA/LQAKA rather than "Full Name"
m = out['variable'].ne('Full Name')
# those rows take their source column as Name Type and get a blank Other
out.loc[m, 'Name Type'] = out.pop('variable')
out.loc[m, 'Other'] = ''
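To see why the stable sort is needed, here is a small sketch on the question's data that inspects the index right after melt. The names melted and ordered are only illustrative. With ignore_index=False each source row keeps its label, so sorting on that label regroups the rows per person, and a stable sort keeps the Full Name, AKA, LQAKA order inside each group.

```python
import pandas as pd

data = {'Name Type': ["Primary", "Primary", "Primary"],
        'Full Name': ["John Snow", "Daenerys Targaryen", "Brienne Tarth"],
        'AKA': ["Aegon Targaryen", None, None],
        'LQAKA': ["The Bastard of Winterfell", "Mother of Dragons", None],
        'Other': ["Info", "Info", "Info"]}
df = pd.DataFrame(data)

# after melt the rows are grouped by melted column:
# all Full Name rows first, then AKA, then LQAKA
melted = (df.melt(['Name Type', 'Other'], ignore_index=False)
            .dropna(subset='value'))
print(melted.index.tolist())         # [0, 1, 2, 0, 0, 1]
print(melted['variable'].tolist())

# a stable sort on the original labels regroups by source row
# without shuffling the rows that share a label
ordered = melted.sort_index(kind='stable')
print(ordered.index.tolist())        # [0, 0, 0, 1, 1, 2]
print(ordered['variable'].tolist())
```

A non-stable sort algorithm gives no guarantee about the relative order of rows with the same label, which is why kind='stable' is spelled out.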
Variant with stack, which reshapes directly in the desired order and drops the None values automatically:
out = df.set_index(['Name Type', 'Other']).stack().reset_index(name='Full Name')
m = out['level_2'].ne('Full Name')
out.loc[m, 'Name Type'] = out.pop('level_2')
out.loc[m, 'Other'] = ''
Output:
  Name Type Other                 Full Name
0   Primary  Info                 John Snow
1       AKA                 Aegon Targaryen
2     LQAKA       The Bastard of Winterfell
3   Primary  Info        Daenerys Targaryen
4     LQAKA               Mother of Dragons
5   Primary  Info             Brienne Tarth
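One caveat on the stack variant: it depends on stack discarding missing values. As far as I know, the newer stack implementation (opt-in through future_stack=True since pandas 2.1) keeps them. A sketch that should not depend on that behavior adds an explicit dropna(), which does nothing when stack has already removed the missing values:

```python
import pandas as pd

data = {'Name Type': ["Primary", "Primary", "Primary"],
        'Full Name': ["John Snow", "Daenerys Targaryen", "Brienne Tarth"],
        'AKA': ["Aegon Targaryen", None, None],
        'LQAKA': ["The Bastard of Winterfell", "Mother of Dragons", None],
        'Other': ["Info", "Info", "Info"]}
df = pd.DataFrame(data)

out = (df.set_index(['Name Type', 'Other'])
         .stack()
         .dropna()   # explicit, in case stack kept the missing values
         .reset_index(name='Full Name'))

# the unnamed stacked level is called 'level_2' after reset_index
m = out['level_2'].ne('Full Name')
out.loc[m, 'Name Type'] = out.pop('level_2')
out.loc[m, 'Other'] = ''
print(out)
```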
handling many columns
Assuming you can identify the 3 columns to melt, you could modify the above approach to:
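# identifier columns = everything except the 3 name columns;
# difference() returns them sorted, so 'Name Type' is assumed to come first
cols = df.columns.difference(['Full Name', 'AKA', 'LQAKA'])
out = (df.melt(cols, ignore_index=False)
         .dropna(subset='value')
         .sort_index(kind='stable', ignore_index=True)
         .rename(columns={'value': 'Full Name'})
      )
m = out['variable'].ne('Full Name')
out.loc[m, 'Name Type'] = out.pop('variable')
# blank every identifier column except 'Name Type' on the AKA/LQAKA rows
out.loc[m, cols[1:]] = ''
out = out[['Name Type', 'Full Name']+list(cols[1:])]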
Output (for an input that also has Other2 and Other3 columns):
  Name Type                 Full Name Other Other2 Other3
0   Primary                 John Snow  Info  info2  info3
1       AKA           Aegon Targaryen
2     LQAKA The Bastard of Winterfell
3   Primary        Daenerys Targaryen  Info  info2  info3
4     LQAKA         Mother of Dragons
5   Primary             Brienne Tarth  Info  info2  info3
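The output above includes Other2 and Other3, which the question's sample data does not define. Here is a runnable sketch, assuming those two columns simply hold the placeholder values shown in the output (how the input was built is my assumption). It also picks the identifier columns with a list comprehension instead of difference, so the original column order is kept and nothing depends on 'Name Type' sorting first; name_cols, id_cols and extra are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name Type': ["Primary", "Primary", "Primary"],
    'Full Name': ["John Snow", "Daenerys Targaryen", "Brienne Tarth"],
    'AKA': ["Aegon Targaryen", None, None],
    'LQAKA': ["The Bastard of Winterfell", "Mother of Dragons", None],
    'Other': ["Info", "Info", "Info"],
    'Other2': ["info2"] * 3,   # assumed extra columns
    'Other3': ["info3"] * 3,
})

name_cols = ['Full Name', 'AKA', 'LQAKA']                 # columns to melt
id_cols = [c for c in df.columns if c not in name_cols]   # original order kept
extra = [c for c in id_cols if c != 'Name Type']          # blanked on alias rows

out = (df.melt(id_cols, value_vars=name_cols, ignore_index=False)
         .dropna(subset='value')
         .sort_index(kind='stable', ignore_index=True)
         .rename(columns={'value': 'Full Name'}))
m = out['variable'].ne('Full Name')
out.loc[m, 'Name Type'] = out.pop('variable')
out.loc[m, extra] = ''
out = out[['Name Type', 'Full Name'] + extra]
print(out)
```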
exploding lists if any in the input
Adding an explode step:
import pandas as pd
data = {'Name Type': ["Primary", "Primary", "Primary"],
        'Full Name': ["John Snow", "Daenerys Targaryen", "Brienne Tarth"],
        'AKA': [["Aegon Targaryen", "Lord Snow"], None, None],
        'LQAKA': ["The Bastard of Winterfell", ["Mother of Dragons", "The Unburnt"],
                  None],
        'Other': ["Info", "Info", "Info"]}
df = pd.DataFrame(data)
cols = df.columns.difference(['Full Name', 'AKA', 'LQAKA'])
out = (df.melt(cols, ignore_index=False)
         .dropna(subset='value')
         .sort_index(kind='stable', ignore_index=True)
         .rename(columns={'value': 'Full Name'})
         .explode('Full Name', ignore_index=True)
      )
m = out['variable'].ne('Full Name')
out.loc[m, 'Name Type'] = out.pop('variable')
out.loc[m, cols[1:]] = ''
out = out[['Name Type', 'Full Name']+list(cols[1:])]
Output:
  Name Type                 Full Name Other
0   Primary                 John Snow  Info
1       AKA           Aegon Targaryen
2       AKA                 Lord Snow
3     LQAKA The Bastard of Winterfell
4   Primary        Daenerys Targaryen  Info
5     LQAKA         Mother of Dragons
6     LQAKA               The Unburnt
7   Primary             Brienne Tarth  Info
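The explode step works on a column that mixes plain strings and lists because explode leaves scalars untouched and only expands list-likes. A small sketch of that behavior on a standalone Series (mixed and with_empty are illustrative names), including one thing to watch for: an empty list becomes a missing value, and since dropna runs before explode in the chain above, such a row would survive.

```python
import pandas as pd

# scalars pass through unchanged, lists get one row per element
mixed = pd.Series(["The Bastard of Winterfell",
                   ["Mother of Dragons", "The Unburnt"]])
exploded = mixed.explode(ignore_index=True)
print(exploded.tolist())

# an empty list explodes to a single missing value
with_empty = pd.Series([[], ["Lord Snow"]]).explode(ignore_index=True)
print(with_empty.isna().tolist())
```

If empty lists can occur in the input, moving dropna (or adding a second one) after explode would remove those rows.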