How to split and rename columns at the same time in pandas?
Question:
I have a dataframe called thyroid_df.
>>> thyroid_df
0 1 2
0 chr1:233276815:A:G A 0.277632
1 chr2:217427435:C:G C 0.357674
2 chr3:169800667:T:G T 0.207014
3 chr5:1279675:C:T T 0.182322
4 chr5:112150207:A:T A 0.314811
5 chr8:32575278:G:T G 0.277632
6 chr9:97775520:A:C A 0.524729
7 chr10:103934543:C:T T 0.343590
8 chr14:36063370:G:C G 0.329304
9 chr14:36269155:C:T T 0.593327
10 chr15:67165147:G:C C 0.207014
11 chr15:67163292:C:T T 0.215111
I want to split the first column on : and rename the resulting columns in the same line of code. I tried the following, which doesn't work:
thyroid_df[0].str.split(':', 3, expand=True).rename(columns = ["CHROM", "POS_GRCh38", "REF", "Effect_allele"])
Answers:
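This should work. You can then drop column 0 if you no longer need it. Note that n is passed by keyword, since recent pandas versions no longer accept it positionally.
thyroid_df[["CHROM", "POS_GRCh38", "REF", "Effect_allele"]] = thyroid_df[0].str.split(':', n=3, expand=True)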
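For anyone who wants to try the assignment approach end to end, here is a minimal sketch. The three-row frame is only a stand-in built from the first rows of the question, and the new_cols name is just for illustration.

```python
import pandas as pd

# Small stand-in for thyroid_df (first three rows from the question).
thyroid_df = pd.DataFrame({
    0: ["chr1:233276815:A:G", "chr2:217427435:C:G", "chr3:169800667:T:G"],
    1: ["A", "C", "T"],
    2: [0.277632, 0.357674, 0.207014],
})

new_cols = ["CHROM", "POS_GRCh38", "REF", "Effect_allele"]

# Split on ":" into four parts and assign them to four new, named columns.
thyroid_df[new_cols] = thyroid_df[0].str.split(":", n=3, expand=True)

# Drop the original combined column now that it has been split.
thyroid_df = thyroid_df.drop(columns=0)

print(thyroid_df)
```

The split pieces are strings, so POS_GRCh38 would still need something like astype(int) if you want to treat it as a number.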
You can do:
new_cols = ["CHROM", "POS_GRCh38", "REF", "Effect_allele"]
thyroid_df[0].str.split(':',expand=True).rename(columns={i:new_cols[i] for i in range(4)})
output:
CHROM POS_GRCh38 REF Effect_allele
0 chr1 233276815 A G
1 chr2 217427435 C G
2 chr3 169800667 T G
3 chr5 1279675 C T
4 chr5 112150207 A T
5 chr8 32575278 G T
6 chr9 97775520 A C
7 chr10 103934543 C T
8 chr14 36063370 G C
9 chr14 36269155 C T
10 chr15 67165147 G C
11 chr15 67163292 C T
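The attempt in the question fails because rename(columns=...) expects a mapping or a function, not a list of names. Below is a rough sketch of two ways to keep everything in one expression: building the mapping with dict(enumerate(...)), or handing the list to set_axis, which is a different method from the one used above. The sample frame is again a small stand-in for the real data.

```python
import pandas as pd

# Small stand-in for thyroid_df (first three rows from the question).
thyroid_df = pd.DataFrame({
    0: ["chr1:233276815:A:G", "chr2:217427435:C:G", "chr3:169800667:T:G"],
    1: ["A", "C", "T"],
    2: [0.277632, 0.357674, 0.207014],
})

new_cols = ["CHROM", "POS_GRCh38", "REF", "Effect_allele"]

# rename() needs a mapping, so build {0: "CHROM", 1: "POS_GRCh38", ...}.
via_rename = (
    thyroid_df[0]
    .str.split(":", expand=True)
    .rename(columns=dict(enumerate(new_cols)))
)

# set_axis() accepts the list of labels directly.
via_set_axis = (
    thyroid_df[0]
    .str.split(":", expand=True)
    .set_axis(new_cols, axis=1)
)

print(via_set_axis)
```

Both return a new dataframe and leave thyroid_df untouched, so assign the result or join it back if you want to keep the other columns.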