# ValueError: could not convert string to float: id

## Question:

I’m running the following Python script:

```
#!/usr/bin/python
import os,sys
from scipy import stats
import numpy as np
f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    list1=[float(x) for x in l1]
    list2=[float(x) for x in l2]
    result=stats.ttest_ind(list1,list2)
    print result[1]
```

However, I get an error like this:

```
ValueError: could not convert string to float: id
```

I’m confused by this.

When I try this for a single line in an interactive session, instead of using the for loop in the script:

```
>>> from scipy import stats
>>> import numpy as np
>>> f=open('data2.txt','r').readlines()
>>> w=f[1].split()
>>> l1=w[1:8]
>>> l2=w[8:15]
>>> list1=[float(x) for x in l1]
>>> list1
[5.3209183842, 4.6422726719, 4.3788135547, 5.9299061614, 5.9331108706, 5.0287087832, 4.57...]
```

It works well.

Can anyone explain a little bit about this?

Thank you.

## Answers:

This error is pretty descriptive:

```
ValueError: could not convert string to float: id
```

*Somewhere* in your text file, a line has the word `id` in it, which can’t really be converted to a number.

Your test code works because the word `id` isn’t present in line 2.
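One plausible reading, assuming `data2.txt` starts with a header row (the file itself is not shown): the loop starts at `f[0]`, while the interactive test used `f[1]`. A minimal sketch of that difference, using made-up sample lines and a hypothetical helper `to_floats`:

```python
# Made-up stand-in for data2.txt; the real file's contents are not shown in the question.
lines = [
    "name id s1 s2 s3",   # assumed header row containing the word "id"
    "g1 5.3 4.6 4.4 5.9",
    "g2 4.1 4.8 5.2 6.0",
]


def to_floats(line):
    """Convert every whitespace-separated field after the first one to float."""
    return [float(x) for x in line.split()[1:]]


# Index 1 is a data row, so this works, just like the interactive test with f[1].
second_line = to_floats(lines[1])

# Index 0 is the header, so this is where a loop starting at 0 fails.
try:
    to_floats(lines[0])
    header_error = None
except ValueError as exc:
    header_error = str(exc)

# Skipping the header with a slice lets the remaining rows convert cleanly.
data = [to_floats(line) for line in lines[1:]]
print(header_error)
```

If that is what is happening in the real file, iterating over `f[1:]` instead of `f` would be enough to avoid the error.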

If you want to catch that line, try this code. I cleaned your code up a tad:

```
#!/usr/bin/python
import os, sys
from scipy import stats
import numpy as np

with open('data2.txt', 'r') as f:
    for index, line in enumerate(f):
        w = line.split()  # split on any whitespace; also drops the trailing newline
        l1 = w[1:8]
        l2 = w[8:15]
        try:
            list1 = list(map(float, l1))
            list2 = list(map(float, l2))
        except ValueError:
            # Report the first line that holds a non-numeric field, then stop
            print('Line {i} is corrupt!'.format(i=index))
            break
        result = stats.ttest_ind(list1, list2)
        print(result[1])  # the p-value
```

Obviously some of your lines don’t have valid float data; specifically, some line contains the text `id`, which can’t be converted to a float.

When you try it at the interactive prompt you are only trying a single line (`f[1]`), so the best way is to print the line where you get this error, and you will know which line is wrong, e.g.:

```
#!/usr/bin/python
import os,sys
from scipy import stats
import numpy as np
f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        print("error {0} on line {1}".format(e, i))
        continue  # skip the t-test for a line that could not be parsed
    result=stats.ttest_ind(list1,list2)
    print(result[1])
```

Your data may not be what you expect: it seems you’re expecting floats but not getting them.

A simple way to figure out where this occurs is to add a try/except to the for loop:

```
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError as e:
        # report the error in some way that is helpful -- maybe print out i
        print("line {0}: {1}".format(i, e))
        continue
    result=stats.ttest_ind(list1,list2)
    print(result[1])
```
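The try/except snippets above either stop at a bad line or report it as they go. A minimal sketch of a variant that collects every unparseable line and still returns the good ones could look like this. The helper name `split_groups` and the sample rows are hypothetical; the column slices `1:8` and `8:15` are copied from the question, and the t-test itself is left out so the sketch does not depend on SciPy.

```python
def split_groups(lines, first=slice(1, 8), second=slice(8, 15)):
    """Parse each line into two lists of floats.

    Returns (rows, errors): rows holds (line_index, group1, group2) for lines
    that parsed; errors holds (line_index, message) for lines that did not.
    """
    rows, errors = [], []
    for index, line in enumerate(lines):
        fields = line.split()
        try:
            group1 = [float(x) for x in fields[first]]
            group2 = [float(x) for x in fields[second]]
        except ValueError as exc:
            errors.append((index, str(exc)))
            continue  # keep going so every bad line gets reported
        rows.append((index, group1, group2))
    return rows, errors


# Hypothetical sample: a header row followed by two data rows of 1 + 14 fields.
sample = [
    "name id " + " ".join("s%d" % n for n in range(13)),
    "g1 " + " ".join(str(1.0 + n) for n in range(14)),
    "g2 " + " ".join(str(2.0 + n) for n in range(14)),
]
rows, errors = split_groups(sample)
for index, message in errors:
    print("line {0}: {1}".format(index, message))
```

Each `(index, group1, group2)` tuple in `rows` can then be passed to `stats.ttest_ind(group1, group2)` as in the question.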

My error was very simple: the text file containing the data had a **space** character (so not visible) on the last line.

As an output of grep, I had `45 ` instead of just `45`.

Perhaps your numbers aren’t actually numbers, but letters masquerading as numbers?

In my case, the font I was using meant that “l” and “1” looked very similar. I had a string like ‘l1919’ which I thought was ‘11919’ and that messed things up.
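Lookalike characters are easier to spot when the raw token is shown with `repr()`-style formatting. The helper below is a hypothetical sketch (the name `explain_token` and its message format are my own), and the set of "expected" characters is a simplifying assumption that only covers plain decimal and exponent notation:

```python
def explain_token(token):
    """Return None if float() accepts the token, else a message naming the odd characters."""
    try:
        float(token)
        return None
    except ValueError:
        # Characters that would not normally appear in a decimal number
        suspicious = [ch for ch in token if not (ch.isdigit() or ch in "+-.eE")]
        return "token {!r} contains unexpected characters {!r}".format(token, suspicious)


for token in ["11919", "l1919"]:
    print(token, "->", explain_token(token))
```

The `{!r}` format prints the token with quotes and escapes, so stray letters and invisible characters show up explicitly.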

I solved a similar situation with a basic technique using pandas. First, load the CSV or text file using pandas. It’s pretty simple:

```
import pandas as pd

# read_csv handles CSV and other delimited text files
data = pd.read_csv('link to the file')
```

Then set the index of the data to the respective column that needs to be cleaned. For example, if your data has ID as one attribute or column, then set the index to ID.

```
data = data.set_index("ID")
```

Then delete all the rows that have “id” as the value instead of a number, using the following command:

```
data = data.drop("id", axis=0)
```

Hope this helps.

For a Pandas dataframe with a column of numbers with commas, use this:

```
df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]
```

So values like `4,200.42` would be converted to `4200.42` as a float.

Bonus 1: This is *fast*.

Bonus 2: More space efficient if saving that dataframe in something like Apache Parquet format.

Shortest way, if `,` is the problem:

`df["id"] = df['id'].str.replace(',', '').astype(float)`

If a blank space is the problem:

`df["id"] = df['id'].str.replace(' ', '').astype(float)`

Replace empty string values with 0.0: if you know the possible non-float values, update them before converting the column.

```
df.loc[df['score'] == '', 'score'] = 0.0
df['score']=df['score'].astype(float)
```
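When the bad values are not known in advance, one option is to let pandas find them. The sketch below uses `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into `NaN`. The sample column is made up, and filling with `0.0` simply mirrors the snippet above; whether zero is an appropriate replacement depends on the data.

```python
import pandas as pd

# Made-up sample column mixing valid numbers, an empty string, text and a thousands separator
df = pd.DataFrame({"score": ["1.5", "", "id", "4,200.42", "7"]})

# Strip thousands separators first, then coerce whatever still fails to NaN
cleaned = df["score"].str.replace(",", "", regex=False)
numeric = pd.to_numeric(cleaned, errors="coerce")

# Inspect the original values that could not be converted
bad_values = df.loc[numeric.isna(), "score"].tolist()
print(bad_values)

# Replace the unparseable entries with 0.0 (an assumption; dropping them is another option)
df["score"] = numeric.fillna(0.0)
```

Looking at `bad_values` before filling shows exactly which strings caused the conversion to fail.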