Python: problem converting negative numbers to floats, issues with hyphen encoding

Question:

I have a Pandas dataframe that I’ve read from a file – pd.read_csv() – and I’m having trouble converting a column with string values to float.

Firstly, I’m not entirely sure why pandas is reading the column as strings to begin with, since all the values are numeric. The problem seems to be with the hyphen-minus sign for the negative numbers. There are other threads on this topic that mention how an em-dash can mess things up (here, for example).

However, when I try converting the hyphen type, it still gives me an error. For example,

df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").astype(float)

doesn’t change anything; all the values start with the '-' hyphen, so it’s not actually replacing anything. It still gives me the error:

ValueError: could not convert string to float: '-'

I’ve tried replacing all of the hyphens with a numeric value to see if that would work, and I’m able to convert to float (example: df['Verified_m'] = df['Verified_m'].str.replace("-", "0").astype(float)). But I’d like to retain the negative values in the dataset. Does anyone know what’s wrong with my hyphens?

Asked By: Feesh


Answers:

Try this:

df['Verified_m'] = df['Verified_m'].str.replace("\U00002013", "-").str.replace(r'^-$', '0', regex=True).astype(float)

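After converting the en dashes (U+2013) to hyphens, it converts a lone - to zero. That lone - is what the error message is complaining about: the cell contains only a hyphen with no digits, so it can't be parsed as a float.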
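Here is a minimal sketch of that approach on made-up data, in case it helps to see it end to end. The sample values and the as_zero / as_nan column names are assumptions for illustration only; they are not from the asker's file. It also shows one possible alternative, pd.to_numeric with errors="coerce", for when a lone - should be treated as missing rather than as zero.

```python
import pandas as pd

# Hypothetical sample: a normal negative, an en dash (U+2013) used as a
# minus sign, a lone "-" placeholder, and a plain positive number.
df = pd.DataFrame({"Verified_m": ["-1.5", "\u20132.25", "-", "3"]})

# Step 1: normalise the en dash to the ASCII hyphen-minus.
normalised = df["Verified_m"].str.replace("\u2013", "-", regex=False)

# Step 2 (as in the answer): turn a cell that is only "-" into "0",
# then convert the whole column to float.
df["as_zero"] = normalised.str.replace(r"^-$", "0", regex=True).astype(float)

# Alternative (an assumption about what may be wanted): keep the lone "-"
# as missing data. errors="coerce" turns unparseable strings into NaN.
df["as_nan"] = pd.to_numeric(normalised, errors="coerce")

print(df)
```

Whether a lone - should become 0 or NaN depends on what it means in the source file, so check that before choosing one.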

Answered By: Barmar