Keras / NN – Handling NaN, missing input

Question:

These days I’m trying to teach myself machine learning, and I’m running into some issues with my dataset.

Some of my rows are empty (I work with CSV files that I generate with a JS script, since I feel more confident doing that in JS). That is expected, as I’m trying to build a guessing model, but it results in NaN values in my training set.

My NN was not training, so I added a piece of code to remove the NaN values from my set, but now my model can’t work with inputs of different sizes.

So my question is: how do I handle missing data? (I basically have 2 rows and can only have the value from one of them, and I can’t merge them because that would not give good results.)

I could remove the missing data from my set, but that would reduce the accuracy of my model in the end.

PS: if needed, I’ll post some code when I get back home.

Asked By: Halt

Answers:

You need to have the same input size during training and inference. If only a few values are missing (a few percent), you can replace them with 0 or with the average of the column. If most of a column is missing (more than 50%), you are probably better off ignoring the column completely. Note that this is theoretical; the best way to make it work is to try different strategies on your data.
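
To make the strategies above concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the helper names `fit_imputer` and `transform`, the `drop_above` parameter, and the sample rows are made up for this example and do not come from the question. Missing values are assumed to be represented as `math.nan`. The key point is that the kept columns and fill values are computed once from the training set and then reused at inference, so the input size stays the same.

```python
import math

NAN = math.nan  # marker for a missing value (assumption: the CSV loader yields NaN)


def fit_imputer(rows, strategy="mean", drop_above=0.5):
    """Decide, from the training rows only, which columns to keep and how to fill them.

    rows: list of equal-length lists of floats; math.nan marks a missing value.
    Returns (kept, fills): kept column indices and one fill value per kept column.
    """
    n_rows = len(rows)
    kept, fills = [], []
    for col in range(len(rows[0])):
        present = [r[col] for r in rows if not math.isnan(r[col])]
        missing_ratio = 1 - len(present) / n_rows
        if not present or missing_ratio > drop_above:
            continue  # mostly (or entirely) empty: ignore the column completely
        kept.append(col)
        # "mean" fills with the column average; any other strategy fills with 0
        fills.append(sum(present) / len(present) if strategy == "mean" else 0.0)
    return kept, fills


def transform(rows, kept, fills):
    """Apply the same column selection and fill values to training or inference rows."""
    return [
        [fill if math.isnan(r[col]) else r[col] for col, fill in zip(kept, fills)]
        for r in rows
    ]


# Hypothetical training data: column 1 is 75% missing, columns 0 and 2 are 25% missing.
train = [
    [1.0, NAN, 10.0],
    [3.0, NAN, NAN],
    [NAN, NAN, 30.0],
    [5.0, 2.0, 20.0],
]

kept, fills = fit_imputer(train)            # learned from training data only
x_train = transform(train, kept, fills)     # every row now has len(kept) values
x_new = transform([[NAN, 7.0, NAN]], kept, fills)  # same shape at inference
```

In practice the same idea is usually written with pandas (`DataFrame.fillna`) or scikit-learn (`SimpleImputer`), fitting on the training set and reusing the fitted values on new inputs. Which fill value works best is something to test on your own data.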

Answered By: Benjamin Breton