Losing the negative sign when extracting data from a dataframe

Question:

I extract temperature from a website into a dataframe. It looks like this:

           Temp      Prec
0     3 / -4 °C         -
1    1 / -17 °C         -
2   -7 / -18 °C         -
3     6 / -8 °C         -
4      8 / 1 °C         -
5      8 / 0 °C   1.3  mm
6      8 / 0 °C   7.0  mm
7     6 / -1 °C         -
8      4 / 0 °C   4.0  mm
9      5 / 2 °C  23.8  mm
10     6 / 1 °C         -
11    5 / -1 °C         -
12    4 / -1 °C         -
13     7 / 0 °C  10.6  mm
14     7 / 1 °C  29.7  mm

Then I use this code to extract the temperature in the format I want:

df2['Temp'] = df2['Temp'].str.extract(r'(\d+)') + 'C'

and I get this result:

   Temp      Prec
0    3C         -
1    1C         -
2    7C         -
3    6C         -
4    8C         -
5    8C   1.3  mm
6    8C   7.0  mm
7    6C         -
8    4C   4.0  mm
9    5C  23.8  mm
10   6C         -
11   5C         -
12   4C         -
13   7C  10.6  mm
14   7C  29.7  mm

I have lost the negative sign (like on row 2) when it’s a temperature below zero. How can I keep the negative sign?
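The sign disappears because \d matches digits only, so (\d+) starts its match after the "-". Below is a minimal sketch of the smallest change. It assumes the goal is the first (left-hand) temperature, as in the question's own code: make the minus sign optional with -?. The four-row frame is a hypothetical subset of the scraped data.

```python
import pandas as pd

# Hypothetical subset of the scraped "Temp" column
df2 = pd.DataFrame({"Temp": ["3 / -4 °C", "1 / -17 °C", "-7 / -18 °C", "8 / 0 °C"]})

# "-?" allows an optional leading minus sign before the digits;
# expand=False returns a Series instead of a one-column DataFrame
df2["Temp"] = df2["Temp"].str.extract(r"(-?\d+)", expand=False) + "C"

print(df2["Temp"].tolist())
# ['3C', '1C', '-7C', '8C']
```

Row 2 now keeps its sign ("-7C"), while positive values are unchanged.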

Asked By: Rich


Answers:

Without a regex, use rsplit and then take the last piece with .str:

df["Temp"] = df["Temp"].str.rsplit("/", n=1).str[-1]
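A self-contained sketch of the rsplit approach on a few rows copied from the question (the small frame is only for illustration). rsplit("/", n=1) splits each string once, starting from the right, and .str[-1] keeps the piece after the slash, i.e. the second reading. The extra .str.strip() is an addition that removes the space left in front of the value by the split.

```python
import pandas as pd

df = pd.DataFrame({"Temp": ["3 / -4 °C", "-7 / -18 °C", "8 / 0 °C"]})

# Split once from the right on "/", keep the right-hand piece,
# then trim the space that preceded it (e.g. " -4 °C" -> "-4 °C")
df["Temp"] = df["Temp"].str.rsplit("/", n=1).str[-1].str.strip()

print(df["Temp"].tolist())
# ['-4 °C', '-18 °C', '0 °C']
```

No pattern matching is involved, so the minus sign simply stays part of the text.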

For the regex approach, make the minus sign optional with -? and include °C in the captured group:

df["Temp"] = df["Temp"].str.extract(r"(-?\d+\s*°C)", expand=False)
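A self-contained sketch of the regex approach on sample rows (the frame is hypothetical). Because °C is part of the pattern, the leftmost match has to be the number immediately followed by the unit, which is the second reading rather than the first number in the string.

```python
import pandas as pd

df = pd.DataFrame({"Temp": ["3 / -4 °C", "-7 / -18 °C", "8 / 0 °C"]})

# -?   optional minus sign
# \d+  one or more digits
# \s*  optional whitespace before the unit
# °C   the unit itself, which anchors the match to the value before "°C"
df["Temp"] = df["Temp"].str.extract(r"(-?\d+\s*°C)", expand=False)

print(df["Temp"].tolist())
# ['-4 °C', '-18 °C', '0 °C']
```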


Output:

print(df)

      Temp     Prec
0    -4 °C        -
1   -17 °C        -
2   -18 °C        -
3    -8 °C        -
4     1 °C        -
5     0 °C   1.3 mm
6     0 °C   7.0 mm
7    -1 °C        -
8     0 °C   4.0 mm
9     2 °C  23.8 mm
10    1 °C        -
11   -1 °C        -
12   -1 °C        -
13    0 °C  10.6 mm
14    1 °C  29.7 mm
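Each cell holds two readings: the question's code picks the first one, while both snippets above return the second. If both are needed, one possible sketch is to capture them with named groups and convert them to integers. The column names high and low are hypothetical and assume the left value is the daily high.

```python
import pandas as pd

df = pd.DataFrame({"Temp": ["3 / -4 °C", "-7 / -18 °C", "8 / 0 °C"]})

# Named groups (?P<name>...) become the column names of the returned DataFrame;
# "-?" keeps the sign on both readings
temps = df["Temp"].str.extract(r"(?P<high>-?\d+)\s*/\s*(?P<low>-?\d+)\s*°C")

# Convert the captured strings to integers for numeric work
temps = temps.astype(int)

print(temps)
```

Keeping the values numeric (instead of appending "C") makes sorting, comparisons and averages straightforward.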
Answered By: Timeless