How to complete missing data in a DataFrame

Question

I am using an API to download live stock market data.
The data it returns is often incomplete.
For example:

                                 Open        High         Low       Close   Adj Close   Volume
Datetime
2022-02-16 15:00:00-05:00  172.872101  173.029999  172.839996  172.910004  172.910004        0
2022-02-16 15:01:00-05:00  172.899994  172.949997  172.779999  172.815002  172.815002   160249
2022-02-16 15:04:00-05:00  173.089996  173.320007  173.030106  173.315002  173.315002   311095
2022-02-16 15:05:00-05:00  173.320007  173.339996  173.164993  173.214996  173.214996   174639
2022-02-16 15:07:00-05:00  173.139999  173.179993  173.089996  173.160004  173.160004   135559

As the timestamps show, several minutes are skipped (15:02, 15:03 and 15:06 are missing).

My question is:
Is there a way to fill in the missing rows to get something like this?

                                 Open        High         Low       Close   Adj Close   Volume
Datetime
2022-02-16 15:00:00-05:00  172.872101  173.029999  172.839996  172.910004  172.910004        0
2022-02-16 15:01:00-05:00  172.899994  172.949997  172.779999  172.815002  172.815002   160249
2022-02-16 15:02:00-05:00  172.809998  172.990005  172.809998  172.979996  172.979996   119117
2022-02-16 15:03:00-05:00  172.970001  173.169998  172.964996  173.080093  173.080093   264624
2022-02-16 15:04:00-05:00  173.089996  173.320007  173.030106  173.315002  173.315002   311095
2022-02-16 15:05:00-05:00  173.320007  173.339996  173.164993  173.214996  173.214996   174639
2022-02-16 15:06:00-05:00  173.220001  173.220001  173.080002  173.139999  173.139999   124707
2022-02-16 15:07:00-05:00  173.139999  173.179993  173.089996  173.160004  173.160004   135559

Asked By: Ariel Tarayants

Answer 1

There are many ways to handle missing values. This article walks through six imputation techniques with examples:
https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779

If you have enough data for training, you can simply drop the missing rows.
Otherwise, fill them in using one of the techniques from the article.
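
Before choosing between dropping and imputing, it can help to see which minutes are actually absent. The following is a minimal sketch, assuming a timezone-aware DatetimeIndex like the one in the question; the timestamps are copied from the question and the variable names are illustrative only.

```python
import pandas as pd

# Timestamps present in the question's incomplete frame.
idx = pd.to_datetime(
    [
        "2022-02-16 15:00:00-05:00",
        "2022-02-16 15:01:00-05:00",
        "2022-02-16 15:04:00-05:00",
        "2022-02-16 15:05:00-05:00",
        "2022-02-16 15:07:00-05:00",
    ]
)

# Build the full one-minute grid between the first and last timestamp,
# then keep only the grid points that are not in the data.
expected = pd.date_range(idx.min(), idx.max(), freq="1min")
missing = expected.difference(idx)

print(missing)
```

If only a handful of minutes are missing, imputing them may be reasonable; if long stretches are missing, dropping or re-downloading is probably safer.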

Answered By: Haider Ali

Answer 2

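Resample to 1-minute periods, then interpolate to fill the resulting NaN values. Note that the filled values are linear estimates, not the actual trades.

# '1min' is the current frequency alias; older pandas versions also accept '1T'
df = df.resample('1min').interpolate(method='linear', limit_direction='forward', axis=0)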
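
As a slightly fuller sketch of the same idea: the frame below is a hypothetical subset of the question's data (only Close and Volume), and the `was_missing` column and the choice to set missing Volume to 0 are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical subset of the question's data; 15:02, 15:03 and 15:06 are absent.
idx = pd.to_datetime(
    [
        "2022-02-16 15:00:00-05:00",
        "2022-02-16 15:01:00-05:00",
        "2022-02-16 15:04:00-05:00",
        "2022-02-16 15:05:00-05:00",
        "2022-02-16 15:07:00-05:00",
    ]
)
df = pd.DataFrame(
    {
        "Close": [172.910004, 172.815002, 173.315002, 173.214996, 173.160004],
        "Volume": [0, 160249, 311095, 174639, 135559],
    },
    index=idx,
)
df.index.name = "Datetime"

# asfreq() inserts the missing minutes as all-NaN rows without aggregating.
filled = df.resample("1min").asfreq()

# Remember which rows were synthesized so they can be excluded later.
filled["was_missing"] = filled["Close"].isna()

# Prices: straight-line estimate between the neighbouring known values.
filled["Close"] = filled["Close"].interpolate(method="linear")

# Volume: an assumption here is to treat unknown minutes as 0 rather than
# interpolating a traded quantity that was never reported.
filled["Volume"] = filled["Volume"].fillna(0).astype("int64")

print(filled)
```

Keeping a flag column makes it easy to tell estimated rows apart from rows the API actually returned.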

Answered By: jomavera

Answer 3

How can I fill in the missing data with a simple arithmetic mean, taking NaN into account? The column to be filled is "VILLARTEAGA". Sorry, I'm new to this.

import numpy as np
from sklearn.impute import SimpleImputer

# SimpleImputer expects a 2-D array, so select the column with a list ([2]) to keep it 2-D
X = dfTDia.iloc[:, [2]].values
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit() learns the column mean (ignoring NaN); transform() replaces NaN with it
X = imputer.fit_transform(X)
X
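
The same mean fill can also be sketched with pandas alone. The frame below is a hypothetical stand-in for `dfTDia`; only the column name "VILLARTEAGA" comes from the post, and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfTDia with two missing entries.
dfTDia = pd.DataFrame({"VILLARTEAGA": [10.0, np.nan, 14.0, np.nan, 12.0]})

# Series.mean() skips NaN by default, so this is the mean of 10, 14 and 12.
col_mean = dfTDia["VILLARTEAGA"].mean()

# Replace every NaN in the column with that mean.
dfTDia["VILLARTEAGA"] = dfTDia["VILLARTEAGA"].fillna(col_mean)

print(dfTDia)
```

This avoids the 2-D array requirement entirely, at the cost of not having a fitted imputer to reuse on new data.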

Answered By: faquimbayal
