Remove part of the column name of a dataframe using a regular expression in Python

Question

I have a dataframe "counts" and I would like to rename its second column using a regular expression, because I have multiple files whose column names carry this "extra information". Currently I have:

| GeneID |  /home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam   |
| -------- | -------------- |
|  Ciclev10010164m.g.v1.0    | 2            |
|  Ciclev10007306m.g.v1.0    | 647            |
|  Ciclev10009318m.g.v1.0   | 39            |
|  Ciclev...   | ...           |
|  Ciclev10007306m.g.v1.0    | 112            |
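For a reproducible setup, the table above can be rebuilt roughly as follows. This is a minimal sketch: it uses only the rows shown, leaves out the elided `Ciclev...` row, and assumes the dataframe is called `counts1` as in the code below.

```python
import pandas as pd

# Column name copied verbatim from the table header above
path = "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam"

# Only the visible rows; the "Ciclev..." placeholder row is omitted
counts1 = pd.DataFrame({
    "GeneID": ["Ciclev10010164m.g.v1.0", "Ciclev10007306m.g.v1.0",
               "Ciclev10009318m.g.v1.0", "Ciclev10007306m.g.v1.0"],
    path: [2, 647, 39, 112],
})
print(counts1.columns.tolist())
```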

I tried the following code, without success:

for col in counts1:
  counts1.rename(columns={col:col.upper().replace("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam","SRR[d]{6}")},inplace=True)
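A likely reason the attempt above does nothing useful: `col.upper()` runs before `.replace()`, so the lowercase path no longer matches the uppercased name (the loop only uppercases every column, e.g. `GeneID` becomes `GENEID`). Even without `upper()`, `str.replace` is a plain literal substitution, so a regex-looking replacement string is inserted as text, not interpreted. The sketch below checks those behaviours on the column name alone, using only the standard library.

```python
import re

col = "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam"

# 1. upper() happens first, so the lowercase search text is not found:
#    the result is just the uppercased path, nothing replaced
attempt = col.upper().replace(col, "SRR[d]{6}")
print(attempt == col.upper())  # True

# 2. Without upper(), str.replace still works literally:
#    the "pattern" ends up as the new name, character for character
print(col.replace(col, "SRR[d]{6}"))  # SRR[d]{6}

# 3. An actual regex search extracts the run ID
print(re.search(r"SRR\d+", col).group())  # SRR1212121
```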

How can I obtain a df with the following format?

| GeneID |  SRR1212121   |
| -------- | -------------- |
|  Ciclev10010164m.g.v1.0    | 2            |
|  Ciclev10007306m.g.v1.0    | 647            |
|  Ciclev10009318m.g.v1.0   | 39            |
|  Ciclev...   | ...           |
|  Ciclev10007306m.g.v1.0    | 112            |

Asked By: Rodrigo Machado

Answer 1

You could try:

df.columns = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
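As a quick check, here is the one-liner applied to a small dataframe with the column names from the question (the two data rows are taken from the question's table). One thing to keep in mind: with `expand=False` and a single capture group, `Index.str.extract` returns an `Index`, and any column name that matches neither alternative (for example a path with no `SRR` ID) becomes `NaN`.

```python
import pandas as pd

path = "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam"
df = pd.DataFrame({
    "GeneID": ["Ciclev10010164m.g.v1.0", "Ciclev10007306m.g.v1.0"],
    path: [2, 647],
})

# Keep "SRR<digits>" after a "/", or the whole name if it contains no "/"
df.columns = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
print(df.columns.tolist())  # ['GeneID', 'SRR1212121']
```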

regex:

(?<=/)SRR\d+  # match SRR + digits if preceded by "/"
^[^/]+$        # else match the full string if it doesn't contain "/"
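If you would rather stay closer to the `replace` idea from the question, a possible alternative (not part of the answer above) is `Index.str.replace` with `regex=True` and a backreference. Names that don't match are left unchanged instead of becoming `NaN`. In this sketch the second path (`SRR3434343`) and the numbers in the data row are hypothetical, made up to show several files being handled at once.

```python
import pandas as pd

# Hypothetical column names: the SRR3434343 path is invented for illustration
cols = [
    "GeneID",
    "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam",
    "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR3434343_mapped.bamAligned.sortedByCoord.out.bam",
]
df = pd.DataFrame([["Ciclev10010164m.g.v1.0", 2, 5]], columns=cols)  # dummy values

# Replace the whole path with just the SRR ID captured in group 1;
# "GeneID" contains no "/" so the pattern doesn't match and it is kept as-is
df.columns = df.columns.str.replace(r'^.*/(SRR\d+)[^/]*$', r'\1', regex=True)
print(df.columns.tolist())  # ['GeneID', 'SRR1212121', 'SRR3434343']
```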

Answered By: mozway