InfluxDB: How to deal with missing data?

Question:

Question Description

We run a lot of time-series queries, usually through an API (Python). These queries sometimes run into problems, and occasionally they fail completely because data is missing.

Because of this, we are not sure where to learn about the underlying question: how should we deal with missing data in our time-series (InfluxDB) database?

Example

To illustrate the problem with an example:

Say we have time-series data measuring the temperature of a room. We have many rooms, and sometimes a sensor dies or stops working for a week or two until we replace it. During that period the data is missing.

Now we try to perform certain calculations and they fail. For example, if we want to calculate the average temperature for each day, the calculation fails because on some days the sensors recorded no measurements.
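To make the failure mode concrete, here is a minimal stdlib-only sketch, assuming the readings for one sensor have already been fetched into Python as (timestamp, value) pairs. The helper `daily_means` is hypothetical and not part of any InfluxDB client. It reports None for a day with no readings instead of raising an error, so the gap stays visible without inventing a value.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import fmean
from typing import Optional


def daily_means(
    readings: list[tuple[datetime, float]], start: date, end: date
) -> dict[date, Optional[float]]:
    """Average readings per calendar day; days without readings map to None.

    Hypothetical helper for illustration, not an InfluxDB API.
    """
    by_day: dict[date, list[float]] = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)

    result: dict[date, Optional[float]] = {}
    day = start
    while day <= end:
        values = by_day.get(day)  # .get() does not create empty entries
        # An empty day is reported as None rather than failing or being faked.
        result[day] = fmean(values) if values else None
        day += timedelta(days=1)
    return result


readings = [
    (datetime(2023, 1, 1, 9), 20.0),
    (datetime(2023, 1, 1, 15), 22.0),
    (datetime(2023, 1, 3, 12), 19.0),
]
means = daily_means(readings, date(2023, 1, 1), date(2023, 1, 3))
# means[date(2023, 1, 2)] is None: the sensor recorded nothing that day.
```

Returning None keeps the calendar complete, so later steps can decide explicitly whether to skip, fill, or flag the missing days.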

One approach we considered is to interpolate the data for those days: take the last value available before the gap and the first value available after it, and use them to fill in the days that have no data.
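A minimal sketch of the interpolation idea, assuming the daily values are held in a Python list with None marking days without data. `interpolate_gaps` is a hypothetical helper that draws a straight line between the last value before a gap and the first value after it; gaps at the very start or end are left as None because there is no known value on one side.

```python
from typing import Optional


def interpolate_gaps(values: list[Optional[float]]) -> list[Optional[float]]:
    """Linearly interpolate interior runs of None between two known values.

    Hypothetical helper for illustration. Leading or trailing gaps stay None.
    """
    filled = list(values)
    last_known: Optional[int] = None  # index of the most recent non-None value
    for i, v in enumerate(values):
        if v is None:
            continue
        if last_known is not None and i - last_known > 1:
            start, end = values[last_known], v
            span = i - last_known
            for j in range(last_known + 1, i):
                # Point on the line from the value before the gap to the value after it.
                filled[j] = start + (end - start) * (j - last_known) / span
        last_known = i
    return filled


daily = [20.0, None, None, 23.0, None]
filled = interpolate_gaps(daily)  # interior gap becomes 21.0, 22.0; trailing None stays
```

Nothing in the returned list marks which entries were interpolated, so downstream code cannot tell real readings from invented ones.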

This has many downsides, the major one being that the result is fake data that can't be trusted. For our more critical processes we would prefer not to store fake (interpolated) data.

We would like to know what alternatives exist and where we can find resources to learn about this topic.

Asked By: innicoder


Answers:

Answer

The idea is to fill the missing values, the gaps, with null (None in Python) values rather than with invented data. This way we can use InfluxDB's built-in fill() function.
https://docs.influxdata.com/influxdb/cloud/query-data/flux/fill/
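In Flux, empty windows produced by `aggregateWindow()` (for example `aggregateWindow(every: 1d, fn: mean, createEmpty: true)`; `createEmpty` defaults to true) come out as null values, and `fill()` then replaces those nulls either with a fixed `value` or, with `usePrevious: true`, with the previous non-null value. As a rough plain-Python analogue for results already pulled into Python, assuming missing points arrive as None, the two strategies can be sketched as follows. `fill_value` and `fill_previous` are hypothetical helpers, not functions of the InfluxDB client.

```python
from typing import Optional


def fill_value(values: list[Optional[float]], value: float) -> list[float]:
    """Replace every None with a fixed value (analogous to Flux fill(value: ...))."""
    return [value if v is None else v for v in values]


def fill_previous(values: list[Optional[float]]) -> list[Optional[float]]:
    """Carry the last non-None value forward (analogous to fill(usePrevious: true)).

    In this sketch, None values before the first real reading stay None.
    """
    filled: list[Optional[float]] = []
    previous: Optional[float] = None
    for v in values:
        if v is not None:
            previous = v
        filled.append(previous)  # equals v when v is a real reading
    return filled


daily = [None, 21.0, None, None, 19.0]
carried = fill_previous(daily)  # leading None stays, later gaps take 21.0
zeroed = fill_value(daily, 0.0)  # every gap becomes 0.0
```

Because the fill is applied to the query result, the raw measurements stored in InfluxDB are left untouched, which avoids persisting fake data.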


As the documentation shows, once the gaps are null values we can fill them and then run further queries and analysis on the data.

The linked documentation describes the available options for filling in missing values, such as using the previous non-null value or a fixed value.

Answered By: innicoder