ValueError: could not convert string to float – machine learning

Question:

I’m working on a machine learning project to identify whether a PCAP is an attack or not. I have to process the PCAP files, train a model, and then predict.
Part of my code looks like this:

train['is_train'] = np.random.uniform(0, 1, len(train)) <= .75
Train, Validate = train[train['is_train']==True], train[train['is_train']==False]
features = list(set(list(dataset.columns))-set(ID_col)-set(target_col)-set(other_col))

x_train = Train[list(features)].values
y_train = Train["class"].values
x_validate = Validate[list(features)].values
y_validate = Validate["class"].values
x_test = test[list(features)].values


random.seed(100)
rf = RandomForestClassifier(n_estimators=1000)
rf.fit(x_train, y_train)

This is what my x_train array contains:

[['172.27.224.250' 16 'TCP' ... 1532299481617 60 54200]
 ['172.27.224.251' 24 'TCP' ... 1532299483068 60 502]
 ['172.27.224.251' 24 'TCP' ... 1532299483069 60 502]
 ...
 ['172.27.224.251' 24 'TCP' ... 1532301279315 60 502]
 ['172.27.224.250' 16 'TCP' ... 1532301279324 60 49713]
 ['172.27.224.250' 24 'TCP' ... 1532301279335 66 49713]]

I get the error ValueError: could not convert string to float: '172.27.224.250' at rf.fit(x_train, y_train).

Which classifier should I use, and how can I solve this problem?

Asked By: Farzaneh Jouyandeh

Answers:

You need to encode your categorical features as numeric values. The error is not specific to the classifier: scikit-learn estimators such as RandomForestClassifier expect numeric input, and columns like the IP address ('172.27.224.250') and the protocol ('TCP') are strings. Techniques such as label encoding and one-hot encoding are available in the sklearn.preprocessing module (LabelEncoder and OneHotEncoder). First identify which columns in your training set are categorical, encode them, and then call .fit().
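
As a rough illustration of the one-hot approach, here is a minimal sketch. The DataFrame contents, the column names (src_ip, flags, protocol, length, port), and the class labels are made up for the example and are not taken from the question's dataset; the split into categorical and numeric columns is likewise an assumption about which features are strings.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the parsed PCAP rows; column names and labels are hypothetical.
train_df = pd.DataFrame({
    "src_ip": ["172.27.224.250", "172.27.224.251", "172.27.224.251", "172.27.224.250"],
    "flags": [16, 24, 24, 16],
    "protocol": ["TCP", "TCP", "UDP", "TCP"],
    "length": [60, 60, 66, 60],
    "port": [54200, 502, 502, 49713],
    "class": [0, 1, 1, 0],
})

categorical_cols = ["src_ip", "protocol"]   # string-valued columns (assumed)
numeric_cols = ["flags", "length", "port"]  # already numeric, passed through as-is

# One-hot encode only the categorical columns; leave the rest untouched.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
)

# Chaining the encoder and the classifier applies the same encoding
# at fit time and at predict time.
model = Pipeline([
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=100)),
])

X_train = train_df[categorical_cols + numeric_cols]
y_train = train_df["class"]
model.fit(X_train, y_train)

predictions = model.predict(X_train)
```

With handle_unknown="ignore", a category that appears only in the validation or test data (for example a new IP address) is encoded as all zeros instead of raising an error. Passing random_state to the classifier is what makes the forest reproducible; Python's random.seed does not control scikit-learn's randomness.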

For more implementation details, see "Label encoder vs one hot encoder".