Combining specific rows that have NaN for a different column on python using pandas
Question:
I am hoping this question makes sense.
I have a table I extracted from a PDF of chemical names that I am trying to format and I am having issues it looks like this: table
Some of the chemical names are split into multiple rows and I need each name in its won row. I did notice the chemicals whose names are split into multiple rows have an NaN in the first column.
EDIT: after running
dt.head(15).to_dict()
{‘Unnamed: 0’: {6: ‘1’,
7: nan,
8: ‘2’,
9: ‘3’,
10: nan,
11: nan,
12: ‘4’,
13: ‘5’,
14: nan,
15: nan,
16: ‘6’,
17: ‘7’,
18: ‘8’,
19: ‘9’,
20: nan},
‘Phenolics’: {6: ‘Dihydroquercetin’,
7: ‘7,30-dimethyl ether’,
8: ‘Artelin’,
9: ‘Esculin 7-‘,
10: ‘methylether’,
11: ‘(methylesculin)’,
12: ‘Esculin’,
13: ‘Scopoletin (7-‘,
14: ‘hydroxy-6-‘,
15: ‘methoxycoumarin)’,
16: ‘Axillarin’,
17: ‘Esculetin’,
18: ‘Isoscopoletin’,
19: ‘6-Beta-D-glucosyl-7-‘,
20: ‘methoxycoumarin’}}
Can anyone help me?
Thank you!
Answers:
df["group"] = df["Unnamed: 0"].ffill()
df.groupby("group").agg({"Phenolics": "".join})
A one-line solution
df = df.fillna(method='ffill').groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()
I am hoping this question makes sense.
I have a table I extracted from a PDF of chemical names that I am trying to format and I am having issues it looks like this: table
Some of the chemical names are split into multiple rows and I need each name in its won row. I did notice the chemicals whose names are split into multiple rows have an NaN in the first column.
EDIT: after running
dt.head(15).to_dict()
{‘Unnamed: 0’: {6: ‘1’,
7: nan,
8: ‘2’,
9: ‘3’,
10: nan,
11: nan,
12: ‘4’,
13: ‘5’,
14: nan,
15: nan,
16: ‘6’,
17: ‘7’,
18: ‘8’,
19: ‘9’,
20: nan},
‘Phenolics’: {6: ‘Dihydroquercetin’,
7: ‘7,30-dimethyl ether’,
8: ‘Artelin’,
9: ‘Esculin 7-‘,
10: ‘methylether’,
11: ‘(methylesculin)’,
12: ‘Esculin’,
13: ‘Scopoletin (7-‘,
14: ‘hydroxy-6-‘,
15: ‘methoxycoumarin)’,
16: ‘Axillarin’,
17: ‘Esculetin’,
18: ‘Isoscopoletin’,
19: ‘6-Beta-D-glucosyl-7-‘,
20: ‘methoxycoumarin’}}
Can anyone help me?
Thank you!
df["group"] = df["Unnamed: 0"].ffill()
df.groupby("group").agg({"Phenolics": "".join})
A one-line solution
df = df.fillna(method='ffill').groupby('Unnamed: 0')['Phenolics'].apply(' '.join).reset_index()