How to extract information from one column to create a new column in a pandas data frame

Question

I have a lot of Excel files that I want to combine, but as a first step I’m trying to manipulate the files.
My data more or less looks like this:

session	type	role
parliament: 12
1	standing	member
1	standing	member
parliament: 13
1	standing	member
2	standing	member

Now, what I’m trying to do is add a new column containing the parliament information from the session column, while keeping all the other information as it is. So my final Excel file should look like this:

session	type	role	parliament
1	standing	member	12
1	standing	member	12
1	standing	member	13
2	standing	member	13

Can you please help me understand how to solve this?

EDIT:
Here’s a slice of my data in dictionary form:

{'Session': {0: 'Parliament: 28', 1: 1, 2: 1, 3: 1, 4: 1},
 'Composition': {0: nan, 1: 'Senate', 2: 'Senate', 3: 'Senate', 4: 'Senate'},
 'Type': {0: nan, 1: 'Standing', 2: 'Standing', 3: 'Standing', 4: 'Standing'},
 'Role': {0: nan, 1: 'Chair', 2: 'Member', 3: 'Member', 4: 'Member'},
 'Organization': {0: nan,
  1: 'Committee of Selection',
  2: 'Standing Committee on Banking and Commerce',
  3: 'Standing Committee on Finance',
  4: 'Standing Committee on Immigration and Labour'},
 'Political Affiliation': {0: nan,
  1: 'Liberal Party of Canada',
  2: 'Liberal Party of Canada',
  3: 'Liberal Party of Canada',
  4: 'Liberal Party of Canada'}}

Asked By: futur3boy

Answer 1

You can group the rows by parliament using cumsum(): every 'parliament: N' row starts a new group. Then restructure each group inside apply to match the final output you want:

# The first row of each group is the 'parliament: N' header; take the part after the colon
(df.groupby(df.session.str.contains('parliament').cumsum())
   .apply(lambda s: s[1:].assign(parliament=s.session.iloc[0].split(':')[1].strip()))
   .reset_index(drop=True))

  session      type    role parliament
0       1  standing  member         12
1       1  standing  member         12
2       1  standing  member         13
3       2  standing  member         13
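
To make the cumsum() grouping easier to see, here is a minimal, self-contained sketch on the sample data from the question. It assumes every value in session is a string; the names is_header, group_id, number and result are only illustrative. It uses transform("first") rather than apply, which is a variation on the approach above and not the same code.

```python
import pandas as pd

# Sample frame mirroring the question; all `session` values are strings here.
df = pd.DataFrame({
    "session": ["parliament: 12", "1", "1", "parliament: 13", "1", "2"],
    "type": [None, "standing", "standing", None, "standing", "standing"],
    "role": [None, "member", "member", None, "member", "member"],
})

# True on header rows; the running sum gives each block its own group id.
is_header = df["session"].str.startswith("parliament")
group_id = is_header.cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 2, 2]

# Text after the colon (missing on rows that have no colon).
number = df["session"].str.split(":").str[1].str.strip()

# Keep the number on header rows only, then copy it to every row of its group.
df["parliament"] = number.where(is_header).groupby(group_id).transform("first")

# Drop the header rows now that their information lives in the new column.
result = df[~is_header].reset_index(drop=True)
print(result)
```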

Answered By: rafaelc

Answer 2

Here is one way to do it:

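df[['txt','parliament']]=df['Session'].str.split(':', expand=True).ffill()
df=df[~df['Type'].isnull()]
# drop() returns a new frame, so assign the result; then strip the space left after the colon
df=df.drop(columns='txt')
df['parliament']=df['parliament'].str.strip()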
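
For reference, a self-contained sketch of the same split-and-ffill idea, run on a trimmed copy of the dictionary from the question's edit (only three of the columns are kept). It assumes the Session column mixes strings and integers, so it casts to str first; the names parts and out are illustrative only.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Trimmed version of the slice posted in the question.
df = pd.DataFrame({
    "Session": {0: "Parliament: 28", 1: 1, 2: 1, 3: 1, 4: 1},
    "Type": {0: nan, 1: "Standing", 2: "Standing", 3: "Standing", 4: "Standing"},
    "Role": {0: nan, 1: "Chair", 2: "Member", 3: "Member", 4: "Member"},
})

# Cast to str so .str methods see every row, then split once on the colon.
parts = df["Session"].astype(str).str.split(":", n=1, expand=True)

# Column 1 holds " 28" on the header row and is missing elsewhere:
# strip the space and forward-fill down to the data rows.
df["Parliament"] = parts[1].str.strip().ffill()

# Header rows have no Type, so use that to filter them out.
out = df[df["Type"].notna()].copy()
out["Parliament"] = out["Parliament"].astype(int)
print(out)
```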

Answered By: Naveed

Answer 3

You can extract the number after "parliament:" and then forward-fill the value:

out = (df[~df['session'].str.startswith('parliament')]
           .join(df['session'].str.extract(r':\s(?P<parliament>\d+)').ffill()))
print(out)

# Output
  session      type    role parliament
1       1  standing  member         12
2       1  standing  member         12
4       1  standing  member         13
5       2  standing  member         13
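
Since the question mentions many Excel files, the extract-and-ffill step could be wrapped in a small helper and applied to each file's frame. The sketch below is only one possible shape: add_parliament and its col parameter are hypothetical names, and it assumes header rows look like "parliament: N" (matched case-insensitively).

```python
import pandas as pd


def add_parliament(df: pd.DataFrame, col: str = "session") -> pd.DataFrame:
    """Return a copy of df without header rows and with a `parliament` column.

    Hypothetical helper: the function name and `col` are not from the answers.
    """
    # Cast to str so columns mixing strings and integers are handled too.
    text = df[col].astype(str)
    # Digits following "parliament:"; rows without a match come back missing.
    number = text.str.extract(r"(?i)parliament:\s*(\d+)", expand=False)
    is_header = number.notna()
    # Forward-fill while the header rows are still present, then drop them.
    return df.assign(parliament=number.ffill())[~is_header].reset_index(drop=True)


df = pd.DataFrame({
    "session": ["parliament: 12", "1", "1", "parliament: 13", "1", "2"],
    "type": [None, "standing", "standing", None, "standing", "standing"],
    "role": [None, "member", "member", None, "member", "member"],
})
out = add_parliament(df)
print(out)
```

Each file could then be loaded with pd.read_excel, passed through the helper, and the results combined with pd.concat.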

Answered By: Corralien
