How can I fill in a missing values in range with Pandas?
Question:
I have a dataset with a number of values like below.
>>> a.head()
value freq
3 9 1
2 11 1
0 12 4
1 15 2
I need to fill in the values between the integers in the value column. For example, I need to insert one new row between 9 & 11 filled with zeroes, then another two between 12-15. The end result should be the dataset with 9-15 with ‘missing’ rows as zeroes across the board.
Is there anyway to insert a new row at an specific location without replacing data? The only methods I’ve found involve slicing the dataframe at a location then appending a new row and concatenating the remainder.
UPDATE: The index is completely irrelevant so don’t worry about that.
Answers:
You didn’t say what should happen to your Index, so I’m assuming it’s unimportant.
In [12]: df.index = df['value']
In [15]: df.reindex(np.arange(df.value.min(), df.value.max() + 1)).fillna(0)
Out[15]:
value freq
value
9 9 1
10 0 0
11 11 1
12 12 4
13 0 0
14 0 0
15 15 2
Another option is to create a second dataframe with values from min to max, and outer join this to your dataframe:
import pandas as pd
a = pd.DataFrame({'value':[9,11,12,15], 'freq':[1,1,4,2]})
# value freq
#0 9 1
#1 11 1
#2 12 4
#3 15 2
b = pd.DataFrame({'value':[x for x in range(a.value.min(), a.value.max()+1)]})
value
0 9
1 10
2 11
3 12
4 13
5 14
6 15
a = pd.merge(left=a, right=b, on='value', how='outer').fillna(0).sort_values(by='value')
# value freq
#0 9 1.0
#4 10 0.0
#1 11 1.0
#2 12 4.0
#5 13 0.0
#6 14 0.0
#3 15 2.0
I have a dataset with a number of values like below.
>>> a.head()
value freq
3 9 1
2 11 1
0 12 4
1 15 2
I need to fill in the values between the integers in the value column. For example, I need to insert one new row between 9 & 11 filled with zeroes, then another two between 12-15. The end result should be the dataset with 9-15 with ‘missing’ rows as zeroes across the board.
Is there anyway to insert a new row at an specific location without replacing data? The only methods I’ve found involve slicing the dataframe at a location then appending a new row and concatenating the remainder.
UPDATE: The index is completely irrelevant so don’t worry about that.
You didn’t say what should happen to your Index, so I’m assuming it’s unimportant.
In [12]: df.index = df['value']
In [15]: df.reindex(np.arange(df.value.min(), df.value.max() + 1)).fillna(0)
Out[15]:
value freq
value
9 9 1
10 0 0
11 11 1
12 12 4
13 0 0
14 0 0
15 15 2
Another option is to create a second dataframe with values from min to max, and outer join this to your dataframe:
import pandas as pd
a = pd.DataFrame({'value':[9,11,12,15], 'freq':[1,1,4,2]})
# value freq
#0 9 1
#1 11 1
#2 12 4
#3 15 2
b = pd.DataFrame({'value':[x for x in range(a.value.min(), a.value.max()+1)]})
value
0 9
1 10
2 11
3 12
4 13
5 14
6 15
a = pd.merge(left=a, right=b, on='value', how='outer').fillna(0).sort_values(by='value')
# value freq
#0 9 1.0
#4 10 0.0
#1 11 1.0
#2 12 4.0
#5 13 0.0
#6 14 0.0
#3 15 2.0