PANDAS Python | Contain specific value in specific position
Question:
I'm trying to select only the rows where the column "Cuenta" contains "05" in the third and fourth positions, for example: 51050300, 51050600.
Año | Periodo | Cuenta |
---|---|---|
2023 | 1 | 51050300 |
2023 | 2 | 51053900 |
2023 | 1 | 74359570 |
2023 | 2 | 74452500 |
2023 | 6 | 51050300 |
2023 | 7 | 51050600 |
2023 | 7 | 52351005 |
2023 | 7 | 52353505 |
2023 | 7 | 52159500 |
I’m using this code:
pattern=r'..05*'
df[df['Cuenta'].str.contains(pattern)]
But it doesn't work. How can I do it?
Answers:
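You have to anchor your pattern to the start of the string. Unanchored, '..05*' matches any two characters followed by "0" and zero or more "5"s anywhere in the value. If the column holds integers, convert it to str first, because the .str accessor only works on string values:
pattern = '^..05'  # ^ anchors the match at the start of the string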
>>> df['Cuenta'].astype(str).str.contains(pattern)
0 True
1 True
2 False
3 False
4 True
5 True
6 False
7 False
8 False
Name: Cuenta, dtype: bool
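To make the difference between the two patterns concrete, here is a minimal sketch that rebuilds the question's data (assuming "Cuenta" is stored as integers, which the question does not state) and compares the unanchored and anchored patterns side by side:

```python
import pandas as pd

# Sample data copied from the question; integer dtype for "Cuenta" is an assumption.
df = pd.DataFrame({
    "Año": [2023] * 9,
    "Periodo": [1, 2, 1, 2, 6, 7, 7, 7, 7],
    "Cuenta": [51050300, 51053900, 74359570, 74452500, 51050300,
               51050600, 52351005, 52353505, 52159500],
})

# The .str accessor needs strings, so convert first.
cuenta = df["Cuenta"].astype(str)

# Unanchored: two characters, a "0", then zero or more "5", anywhere in the string.
loose = cuenta.str.contains(r"..05*")

# Anchored: the third and fourth characters must be exactly "05".
anchored = cuenta.str.contains(r"^..05")

print(loose.tolist())     # every row matches, so nothing is filtered out
print(anchored.tolist())  # only rows 0, 1, 4 and 5 match
```

With this data every value has a "0" somewhere after its second character, which is why the unanchored pattern keeps all rows.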
Or slice out the third and fourth characters and compare them directly:
df[df['Cuenta'].astype(str).str[2:4] == '05']
Output:
Año Periodo Cuenta
0 2023 1 51050300
1 2023 2 51053900
4 2023 6 51050300
5 2023 7 51050600
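The slicing approach generalizes to any position and any value. The sketch below wraps it in a small helper; the name filter_by_position and its parameters are hypothetical, introduced only for illustration, and the integer dtype of "Cuenta" is again an assumption:

```python
import pandas as pd

def filter_by_position(df, column, start, value):
    """Keep rows whose `column`, read as text, has `value` starting at 0-based index `start`."""
    text = df[column].astype(str)
    # Slice exactly as many characters as `value` has, then compare.
    return df[text.str[start:start + len(value)] == value]

# Sample data copied from the question.
df = pd.DataFrame({
    "Año": [2023] * 9,
    "Periodo": [1, 2, 1, 2, 6, 7, 7, 7, 7],
    "Cuenta": [51050300, 51053900, 74359570, 74452500, 51050300,
               51050600, 52351005, 52353505, 52159500],
})

# Third and fourth characters are at 0-based indices 2 and 3.
result = filter_by_position(df, "Cuenta", 2, "05")
print(result)
```

Compared with a regex, a plain slice has no special characters to escape, which may be easier to read when the position is fixed.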
For fun, assuming an integer column, an arithmetic solution would be:
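import numpy as np
# floor(log10(x)) + 1 is the digit count of a positive integer x;
# divide away everything after the fourth digit, then keep the last two of those four
m = df['Cuenta'].floordiv(10**(np.floor(np.log10(df['Cuenta']))-3)).mod(100).eq(5)
out = df.loc[m]
Or, if the number of digits is fixed (8 here):
m = df['Cuenta']//10000%100 == 5
How it works (an extra row with a 5-digit value, index 9, is added to show a different length):
df.assign(n_digits=np.floor(np.log10(df['Cuenta']))+1,
first_4=lambda d: d['Cuenta'].floordiv(10**(d['n_digits']-4)),
digits_3_4=lambda d: d['first_4'].mod(100)
)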
Año Periodo Cuenta n_digits first_4 digits_3_4
0 2023 1 51050300 8.0 5105.0 5.0
1 2023 2 51053900 8.0 5105.0 5.0
2 2023 1 74359570 8.0 7435.0 35.0
3 2023 2 74452500 8.0 7445.0 45.0
4 2023 6 51050300 8.0 5105.0 5.0
5 2023 7 51050600 8.0 5105.0 5.0
6 2023 7 52351005 8.0 5235.0 35.0
7 2023 7 52353505 8.0 5235.0 35.0
8 2023 7 52159500 8.0 5215.0 15.0
9 2024 8 12051 5.0 1205.0 5.0