Splitting the columns and naming in pandas
Question:
I have a data file that contains time-interval columns such as 0-10, 11-20, 21-30, …, 81-90. There are also two additional columns, FH and SH. The following sample dataframe represents part of my data:
import pandas as pd

df = pd.DataFrame()
df['Team'] = ['A','B','C']
df['0-10'] = ['4-0','2-2','3-2']
df['11-20'] = ['2-1','2-2','3-0']
df['21-30'] = ['2-1','1-1','2-2']
df['FH'] = ['5-3','6-6','5-5']
df['SH'] = ['2-3','3-2','3-3']
What I want to do is split the values under the time intervals (0-10, 11-20, 21-30).
That means I will have two columns for each time interval: '0-10' becomes '0-10F' and '0-10A', and for Team A the value under '0-10F' will be 4 and the value under '0-10A' will be 0.
I will do the same for the other time intervals, e.g. '11-20' becomes '11-20F' and '11-20A'.
I could write code for each column separately as follows:
df['0-10F'] = df['0-10'].str.split('-').str[0]
df['0-10A'] = df['0-10'].str.split('-').str[1]
df['11-20F'] = df['11-20'].str.split('-').str[0]
df['11-20A'] = df['11-20'].str.split('-').str[1]
df['21-30F'] = df['21-30'].str.split('-').str[0]
df['21-30A'] = df['21-30'].str.split('-').str[1]
Is there a better way to do this for all columns with one generic piece of code? The following is the expected output:
Answers:
Use DataFrame.filter to select the columns whose names contain -, loop over each of those columns, create the two new columns with Series.str.split (expand=True returns a DataFrame with one column per part), and if necessary convert the values to integers:
for c in df.filter(like='-').columns:
    # Split e.g. '4-0' into two parts and store them as '<interval>F' and '<interval>A'
    df[[f'{c}F', f'{c}A']] = df[c].str.split('-', expand=True).astype(int)
print(df)
  Team 0-10 11-20 21-30   FH   SH  0-10F  0-10A  11-20F  11-20A  21-30F  21-30A
0    A  4-0   2-1   2-1  5-3  2-3      4      0       2       1       2       1
1    B  2-2   2-2   1-1  6-6  3-2      2      2       2       2       1       1
2    C  3-2   3-0   2-2  5-5  3-3      3      2       3       0       2       2
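One thing to watch with the loop above: the new columns ('0-10F', '0-10A', ...) also contain -, so running it a second time on the same dataframe would pick them up as well. A possible variation, shown here only as a sketch, selects the interval columns with a regular expression and attaches all new columns in one step with pd.concat. It assumes the interval columns are named exactly digits-digits; the names interval_cols, parts and result are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'C'],
    '0-10': ['4-0', '2-2', '3-2'],
    '11-20': ['2-1', '2-2', '3-0'],
    '21-30': ['2-1', '1-1', '2-2'],
    'FH': ['5-3', '6-6', '5-5'],
    'SH': ['2-3', '3-2', '3-3'],
})

# Assumption: interval columns are named exactly "<digits>-<digits>",
# so 'FH', 'SH' and already-split columns such as '0-10F' are skipped.
interval_cols = df.filter(regex=r'^\d+-\d+$').columns

parts = []
for c in interval_cols:
    # Split "4-0" into two integer columns named "<interval>F" and "<interval>A".
    split = df[c].str.split('-', expand=True).astype(int)
    split.columns = [f'{c}F', f'{c}A']
    parts.append(split)

# Attach all new columns at once; the original df is left unchanged.
result = pd.concat([df, *parts], axis=1)
print(result)
```

Because the pattern is anchored with ^ and $, applying the same selection to result again would still return only the three original interval columns.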