How to solve the "simulations array must contain numerical values" error when my CSV files are already in the proper format? (Jupyter Notebook)

Question

I am trying to evaluate a temperature dataset that I extracted from a GCM against my observed data. I used exactly the same script for the precipitation files and it worked well, but when I run it on the temperature files I get an error. I prepared the temperature input files in exactly the same format as the precipitation files, so it should work. The error is:
"TypeError: simulations array must contain numerical values"
All my input files (the simulated file and the station file), as well as the output file I am supposed to get, are in .CSV format.
I am sharing the script below; please have a look and help me out. My simulated and observed files are here: https://drive.google.com/drive/folders/1u5kgCSVbReDzv1bgh1l_YJjmv1iwatlh?usp=sharing

Edit: The script is quite long, so I am sharing only the relevant parts. Please let me know if it's not clear enough.

for file in os.listdir("input/sim"):
    if not file.endswith(".csv"):
        continue  # skip non-CSV files instead of reusing the previous file's data
    simulated_data=pd.read_csv(os.path.join("input/sim", file))
    simulated_data['Date']=pd.to_datetime(simulated_data['Date'])
    simulated_data.index=simulated_data['Date']
    simulated_data.drop(['Date'],axis=1,inplace=True)
       
    for s in lat_lon['Stations']:
        Ob_data=pd.DataFrame(Observed_data[str(s)])
        sim_data=pd.DataFrame(simulated_data[str(s)])
        for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
        nse = he.evaluator(he.nse,Ob_data, sim_data)
        nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
        R2=my_rsquare(Ob_data, sim_data)
        R22=pd.DataFrame(R2, columns=['R2'], index=[str(s)])
        MSE=mean_squared_error(Ob_data, sim_data)
        MSE1=pd.DataFrame(MSE, columns=['MSE'], index=[str(s)])
        RMSE=math.sqrt(MSE)
        RMSE1=pd.DataFrame(RMSE, columns=['RMSE'], index=[str(s)])
        corr = for_cor.corr()
        corr1=pd.DataFrame(corr.iloc[0,1], columns=['Pearson_R'], index=[str(s)])
        mae=mean_absolute_error(Ob_data, sim_data)
        mae1=pd.DataFrame(mae, columns=['MAE'], index=[str(s)])
        kge, r, alpha, beta = he.evaluator(he.kge, Ob_data, sim_data)
        kge_results=pd.DataFrame([kge], columns=['kge'],index=[str(s)])
        globals()['kge_'+str(s)]=kge_results
        Perf=pd.concat([nse1,R22,MSE1,corr1,kge_results,mae1,RMSE1],axis=1,copy=True)
        globals()['perform_'+str(s)]=Perf
    
    # the combined table does not depend on s, so build and save it once per file
    All_stations=pd.concat([globals()['perform_'+str(s)] for s in lat_lon['Stations']],axis=0,copy=True)
    result=All_stations

    final=pd.concat([result,lat_lon],axis=1,copy=True,sort=True)
    final.drop('Stations',axis=1,inplace=True)

    new_path = os.path.join("output/", file)
    final.to_csv(new_path)

    result['Stations']=result.index
    result.index=result['Stations']
    result.drop('Stations',axis=1,inplace=True)
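The loop above stores each station's results through `globals()` and then gathers them again by name. A minimal sketch of the same idea that collects one row per station in a plain list and builds the table once is shown below. It is an illustration under stated assumptions, not the asker's script: it computes a subset of the metrics (NSE, MSE, RMSE, MAE, Pearson r) with plain NumPy formulas instead of hydroeval and scikit-learn so it runs on its own, `evaluate_stations` is a hypothetical helper, and the station names and values are made up.

```python
import numpy as np
import pandas as pd


def evaluate_stations(observed: pd.DataFrame, simulated: pd.DataFrame, stations) -> pd.DataFrame:
    """Return one row of goodness-of-fit metrics per station (hypothetical helper)."""
    rows = []
    for s in stations:
        # dtype=float raises ValueError if a cell holds text such as '#VALUE!'
        obs = observed[str(s)].to_numpy(dtype=float)
        sim = simulated[str(s)].to_numpy(dtype=float)
        err = sim - obs
        mse = np.mean(err ** 2)
        rows.append({
            "Station": str(s),
            # Nash-Sutcliffe efficiency, with the observations as the reference
            "NSE": 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "MAE": np.mean(np.abs(err)),
            "Pearson_R": np.corrcoef(obs, sim)[0, 1],
        })
    # build the whole table once instead of one global variable per station
    return pd.DataFrame(rows).set_index("Station")


# made-up data for two hypothetical stations
observed = pd.DataFrame({"101": [10.0, 12.0, 14.0, 16.0], "102": [20.0, 21.0, 19.0, 22.0]})
simulated = pd.DataFrame({"101": [10.0, 12.0, 14.0, 16.0], "102": [21.0, 20.0, 19.5, 23.0]})
table = evaluate_stations(observed, simulated, ["101", "102"])
print(table)
```

Keeping the results in a local list makes the table easy to concatenate with other per-file information and avoids creating one global name per station.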

And the error log:

TypeError                                 Traceback (most recent call last)
Input In [49], in <cell line: 1>()
     11 sim_data=pd.DataFrame(simulated_data[str(s)])
     12 for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
---> 13 nse = he.evaluator(he.nse,Ob_data, sim_data)
     14 nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
     15 R2=my_rsquare(Ob_data, sim_data)

File D:\Program Files\Python\ANACONDA\lib\site-packages\hydroeval\hydroeval.py:158, in evaluator(obj_fn, simulations, evaluation, axis, transform, epsilon)
    156     raise TypeError('simulations must be an array')
    157 if not np.issubdtype(simulations.dtype, np.number):
--> 158     raise TypeError('simulations array must contain numerical values')
    159 evaluation = np.asarray(evaluation)
    160 if not evaluation.shape:

TypeError: simulations array must contain numerical values
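The check that fails is `np.issubdtype(simulations.dtype, np.number)`, visible in the traceback. The sketch below shows how one stray text cell triggers it: when a CSV column contains something like `#VALUE!`, pandas reads the whole column as text (`object` dtype), which is not a NumPy number type. Note also, from the signature in the traceback, that `evaluator` takes `simulations` before `evaluation`. The snippet passes `Ob_data` first, so the "simulations" array the message complains about is actually the observed data here; since NSE is not symmetric, the call would normally be `he.evaluator(he.nse, sim_data, Ob_data)` (and likewise for KGE). The CSV text and the column name `S1` are invented for illustration.

```python
import io

import numpy as np
import pandas as pd

# invented CSV: a date column and one station column containing a spreadsheet error string
csv_text = """Date,S1
2000-01-01,12.5
2000-01-02,#VALUE!
2000-01-03,13.1
"""

data = pd.read_csv(io.StringIO(csv_text), index_col="Date")
column = data[["S1"]]  # one-column DataFrame, shaped like Ob_data in the question

arr = np.asarray(column)
print(arr.dtype)                            # object
print(np.issubdtype(arr.dtype, np.number))  # False, so hydroeval raises the TypeError

# reading the same text with '#VALUE!' declared as a missing-value marker
clean = pd.read_csv(io.StringIO(csv_text), index_col="Date", na_values=["#VALUE!"])
print(clean["S1"].dtype)                    # float64
```

Checking `df.dtypes` right after `pd.read_csv` is a quick way to spot a station column that did not come in as a number type.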

And this is how I have defined the NSE function:

#NSE Function

import statistics
import pandas as pd

def my_nse(arr1,arr2):
    
    numsum=densum=0
    
    my_new=pd.DataFrame()
    my_new['Observed_Discharge']=arr1
    my_new['Simulated_Discharge']=arr2
    
    mean_val_obs=statistics.mean(my_new['Observed_Discharge'])

    i=0
    while i<len(my_new['Simulated_Discharge'].values):
        
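        # squared difference between observed and simulated value at position i
        num=(my_new['Observed_Discharge'].iloc[i])-(my_new['Simulated_Discharge'].iloc[i])
        num=num*num

        # squared deviation of the observed value from the observed mean
        den=(my_new['Observed_Discharge'].iloc[i])-mean_val_obs
        den=den*den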

        numsum=numsum+num
        densum=densum+den

        i=i+1

    cons=numsum/densum
    nse=1-cons
    
    return nse
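The loop in `my_nse` computes the Nash-Sutcliffe efficiency, NSE = 1 - Σ(O_i - S_i)² / Σ(O_i - Ō)², where O is the observed series and S the simulated one. A minimal sketch of the same formula without an explicit loop, using NumPy, is below; `nse_vectorized` is a hypothetical name. One difference worth knowing: this version pairs values by position, whereas `my_nse`, when given pandas Series, builds a DataFrame and therefore pairs them by index label. Converting with `dtype=float` up front also means a stray text value fails immediately with a `ValueError` at the conversion.

```python
import numpy as np


def nse_vectorized(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)**2) / sum((obs - mean(obs))**2)."""
    # ravel() also accepts one-column DataFrames such as Ob_data / sim_data
    obs = np.asarray(observed, dtype=float).ravel()  # ValueError here if any value is text
    sim = np.asarray(simulated, dtype=float).ravel()
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same length")
    numerator = np.sum((obs - sim) ** 2)
    denominator = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - numerator / denominator


print(nse_vectorized([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(nse_vectorized([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

The second call illustrates the usual reading of NSE: a simulation that only reproduces the observed mean scores 0, and a perfect simulation scores 1.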

Thank you

Asked By: Harith S


Answer 1

In cases like this you can try forcing the conversion of your DataFrame to float32, since you want floating-point numbers.

I’ve created this script to do so:

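import pandas as pd

# read the observed data and drop the non-numeric Date column
df = pd.read_csv("Observed_data.csv")
df = df.drop(labels="Date", axis=1)
# force every remaining column to 32-bit floats; this raises if any cell is not a number
df = df.astype('float32')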

Actually, while doing so I encountered this error:

ValueError: could not convert string to float: '#VALUE!'

I had a look at the data you provided and saw that there is a '#VALUE!' string on line 204, column V429. It’s a mistake that you have to fix manually; once you handle it you should be good. Having a string among float data, even by mistake, makes pandas read the whole column as non-numeric (object dtype), which is exactly what hydroeval's check rejects.
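Rather than searching the file by eye, a short sketch like the following can list every cell that pandas cannot parse as a number. It assumes the dates are used as the index (so the `Date` column itself is not checked); `find_non_numeric` is a hypothetical helper, and the sample CSV is invented.

```python
import io

import numpy as np
import pandas as pd


def find_non_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """List cells that are present but cannot be parsed as numbers (hypothetical helper)."""
    # errors="coerce" turns every unparseable value into NaN
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # NaN after coercion but not missing in the original means the cell holds text
    bad = numeric.isna() & df.notna()
    rows, cols = np.nonzero(bad.to_numpy())
    return pd.DataFrame({
        "row": [df.index[r] for r in rows],
        "column": [df.columns[c] for c in cols],
        "value": [df.iat[r, c] for r, c in zip(rows, cols)],
    })


csv_text = """Date,S1,S2
2000-01-01,12.5,8.0
2000-01-02,#VALUE!,8.4
2000-01-03,13.1,9.0
"""
data = pd.read_csv(io.StringIO(csv_text), index_col="Date")
print(find_non_numeric(data))
```

Once located, the cell can be corrected in the source file. Alternatively, `pd.read_csv(..., na_values=["#VALUE!"])` reads such cells as NaN, but those rows then typically need to be dropped from both the observed and simulated series before computing the metrics, since scikit-learn's metric functions, for example, reject NaN input.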

Answered By: ClaudiaR
