Removing all characters after a certain character only if a digit is before that character

Question:

I want to remove all characters after ‘r’ in every column of a dataframe. All columns of df look like this:

1     311 889r/r +29.61%~sektor +21.56%
2     98 921r/r +218.42%~sektor +14.42%
3                                   NaN
4      37 215r/r +27.47%~sektor +11.80%
5                                   NaN
6       57 734r/r +5.28%~sektor -34.58%
7        89 883r/r -5.62%~sektor +2.74%
8       28 136r/r -7.12%~sektor +22.40%
9     385 084r/r +32.89%~sektor +32.89%
...

I want to keep only the leading number in each cell, like this:

 1       311 889
 2        98 921
 3           NaN
 4        37 215

Here are the dtypes of all my df columns:

Unnamed: 0            object
2004  (gru 04)        object
2005  (gru 05)        object
2006  (gru 06)        object
2007  (gru 07)        object
2008  (gru 08)        object
2009  (gru 09)        object
2010  (gru 10)        object
2011  (gru 11)        object
2012  (gru 12)        object
2013  (gru 13)        object
2014  (gru 14)        object
2015  (gru 15)        object
2016  (gru 16)        object
2017  (gru 17)        object
2018  (gru 18)        object
2019  (gru 19)        object
2020  (gru 20)        object
2021  (gru 21)        object
2022/Q1  (mar 22)     object
Unnamed: 20          float64
dtype: object
Asked By: Manawryn


Answers:

Try the code below:

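import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['57 734r/r +5.28%~sektor -34.58%', '89 883r/r -5.62%~sektor +2.74%', 'Aktywa obrotowe']})
# Split at the first letter and keep only the text before it.
# Rows that start with a letter leave an empty string, which is turned into NaN.
df['new_col'] = df['col'].str.split(r'[a-zA-Z].*', expand=True)[0].replace('', np.nan)
df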

Output:

new_col holds '57 734' and '89 883' for the first two rows and NaN for the 'Aktywa obrotowe' row.

Answered By: Abhishek
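
Note that the split above cuts at the first letter of any kind, so a text-only cell such as 'Aktywa obrotowe' ends up as NaN. If the goal is what the title says (cut at ‘r’ only when a digit comes right before it) and it should run over every column, a lookbehind regex may be a closer fit. The following is only a rough sketch: the sample frame, the 'Other row' label and the helper name strip_after_digit_r are made up for illustration, and it assumes the numeric columns (like the float64 one in the question) should be left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame that mimics the layout in the question.
df = pd.DataFrame({
    "Unnamed: 0": ["Aktywa obrotowe", "Other row"],
    "2004  (gru 04)": ["57 734r/r +5.28%~sektor -34.58%", np.nan],
    "2005  (gru 05)": ["89 883r/r -5.62%~sektor +2.74%",
                       "28 136r/r -7.12%~sektor +22.40%"],
    "Unnamed: 20": [np.nan, np.nan],
})


def strip_after_digit_r(frame):
    """Drop an 'r' and everything after it, but only where a digit precedes the 'r'."""
    out = frame.copy()
    for col in out.columns:
        # Skip numeric columns; the .str accessor only works on text data.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # (?<=\d) is a lookbehind: the 'r' must directly follow a digit.
        # NaN cells are passed through unchanged.
        out[col] = out[col].str.replace(r"(?<=\d)r.*", "", regex=True)
    return out


cleaned = strip_after_digit_r(df)
print(cleaned)
```

With this pattern the label column is kept as it is, because the ‘r’ in 'Aktywa obrotowe' follows a letter rather than a digit. The values are still strings such as '57 734'; turning them into numbers would be a separate step.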