pandas.Series.interpolate() does nothing. Why?

Question:

I have a dataframe with DatetimeIndex. This is one of columns:

>>> y.out_brd
2013-01-01 11:25:00     0.04464286
2013-01-01 11:30:00            NaN
2013-01-01 11:35:00            NaN
2013-01-01 11:40:00    0.005952381
2013-01-01 11:45:00     0.01785714
2013-01-01 11:50:00    0.008928571
Freq: 5T, Name: out_brd, dtype: object

When I try to use interpolate() on the column, absolutely nothing changes:

>>> y.out_brd.interpolate(method='time')
2013-01-01 11:25:00     0.04464286
2013-01-01 11:30:00            NaN
2013-01-01 11:35:00            NaN
2013-01-01 11:40:00    0.005952381
2013-01-01 11:45:00     0.01785714
2013-01-01 11:50:00    0.008928571
Freq: 5T, Name: out_brd, dtype: object

How to make it work?

Update:
Here is the code for generating such a dataframe:

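import pandas as pd
from datetime import datetime

time_index = pd.date_range(start=datetime(2013, 1, 1, 3),
                           end=datetime(2013, 1, 2, 2, 59),
                           freq='5min')  # same frequency as '5T'
grid_columns = [u'in_brd', u'in_alt', u'out_brd', u'out_alt']

# No data and no dtype given, so every column is created with object dtype
df = pd.DataFrame(index=time_index, columns=grid_columns)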
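A minimal sketch of why this frame ends up with object columns, assuming a reasonably recent pandas (the '5min' alias is the same frequency as '5T'). Without data or a dtype, every column is object; passing dtype=float instead gives float64 columns full of NaN, and later .loc assignments of floats keep them float64. The names df_object and df_float are illustrative only.

```python
from datetime import datetime

import pandas as pd

# Same index and columns as in the question.
time_index = pd.date_range(start=datetime(2013, 1, 1, 3),
                           end=datetime(2013, 1, 2, 2, 59),
                           freq='5min')
grid_columns = ['in_brd', 'in_alt', 'out_brd', 'out_alt']

# No data and no dtype: every column is created as object dtype.
df_object = pd.DataFrame(index=time_index, columns=grid_columns)

# dtype=float: every column is float64, initially all NaN.
df_float = pd.DataFrame(index=time_index, columns=grid_columns, dtype=float)

# Writing a float into a float64 column with .loc keeps it float64,
# so interpolate() will work on it later.
df_float.loc[time_index[0], 'out_brd'] = 0.04464286

print(df_object.dtypes)
print(df_float.dtypes)
```

Creating the frame as float up front avoids the conversion step entirely; converting after filling (with astype(float)) works just as well.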

After that I fill cells with some data.

I have a dataframe field_data with survey data about boarding and alighting on a railroad, and a station variable.
I also have an interval_end function defined like this:

from datetime import timedelta

# Parentheses let the expression continue onto the next lines
interval_end = lambda index, prec_lvl: (index.to_pydatetime()
                                        + timedelta(minutes=prec_lvl - 1,
                                                    seconds=59))
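A usage sketch of this helper, written as a def rather than a lambda. It assumes prec_lvl is the bucket length in minutes (5, matching the 5T frequency); that reading is an assumption, not something the question states. Under it, interval_end returns the last second of the bucket that starts at index, which is what the loop compares arrive_time against.

```python
from datetime import timedelta

import pandas as pd


def interval_end(index, prec_lvl):
    """Return the last second of the bucket starting at `index`.

    Assumption: prec_lvl is the bucket length in minutes.
    """
    return index.to_pydatetime() + timedelta(minutes=prec_lvl - 1, seconds=59)


start = pd.Timestamp('2013-01-01 11:25:00')
end = interval_end(start, 5)
print(start.time(), end.time())  # 11:25:00 11:29:59
```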

The code:

for index, row in df.iterrows():
    recs = field_data[(field_data.station_name == station)
                      & (field_data.arrive_time >= index.time())
                      & (field_data.arrive_time <= interval_end(
                                          index, prec_lvl).time())]
    in_recs_num = recs[recs.orientation == u'in'][u'train_number'].count()
    out_recs_num = recs[recs.orientation == u'out'][u'train_number'].count()

    # Each right-hand side is wrapped in parentheses so the division
    # can continue onto the next line
    if in_recs_num:
        df.loc[index, u'in_brd'] = (recs[
                recs.orientation == u'in'][u'boarding'].sum() /
                (in_recs_num * CAR_CAPACITY))
        df.loc[index, u'in_alt'] = (recs[
                recs.orientation == u'in'][u'alighting'].sum() /
                (in_recs_num * CAR_CAPACITY))
    if out_recs_num:
        df.loc[index, u'out_brd'] = (recs[
                recs.orientation == u'out'][u'boarding'].sum() /
                (out_recs_num * CAR_CAPACITY))
        df.loc[index, u'out_alt'] = (recs[
                recs.orientation == u'out'][u'alighting'].sum() /
                (out_recs_num * CAR_CAPACITY))
Asked By: Mikhail Elizarev


Answers:

You need to convert your Series to float64 dtype; it currently has object dtype. A DataFrame created with only an index and column names, as in your update, starts out with object columns, and filling it cell by cell with df.loc keeps them that way. Here's an example to illustrate the difference. In general, object-dtype Series are of limited use, the most common case being a Series containing strings. Beyond that, they are very slow because pandas cannot take advantage of any data type information.

In [9]: s = Series(randn(6), index=pd.date_range('2013-01-01 11:25:00', freq='5T', periods=6), dtype=object)

In [10]: s.iloc[1:3] = nan

In [11]: s
Out[11]:
2013-01-01 11:25:00   -0.69522
2013-01-01 11:30:00        NaN
2013-01-01 11:35:00        NaN
2013-01-01 11:40:00   -0.70308
2013-01-01 11:45:00    -1.5653
2013-01-01 11:50:00    0.95893
Freq: 5T, dtype: object

In [12]: s.interpolate(method='time')
Out[12]:
2013-01-01 11:25:00   -0.69522
2013-01-01 11:30:00        NaN
2013-01-01 11:35:00        NaN
2013-01-01 11:40:00   -0.70308
2013-01-01 11:45:00    -1.5653
2013-01-01 11:50:00    0.95893
Freq: 5T, dtype: object

In [13]: s.astype(float).interpolate(method='time')
Out[13]:
2013-01-01 11:25:00   -0.6952
2013-01-01 11:30:00   -0.6978
2013-01-01 11:35:00   -0.7005
2013-01-01 11:40:00   -0.7031
2013-01-01 11:45:00   -1.5653
2013-01-01 11:50:00    0.9589
Freq: 5T, dtype: float64
Answered By: Phillip Cloud
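A self-contained version of the session above, as a sketch: fixed values (rounded from that output) replace randn so the result is reproducible, and the imports the session relies on are spelled out. pd.to_numeric is shown as an alternative to astype(float); with errors='coerce' it would also turn unparseable strings into NaN, which astype(float) would reject.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01 11:25:00', freq='5min', periods=6)

# Fixed values instead of randn, deliberately stored as object dtype.
s = pd.Series([-0.69522, np.nan, np.nan, -0.70308, -1.5653, 0.95893],
              index=idx, dtype=object)

# Convert to float64 first, then interpolate weighted by the time index.
fixed = s.astype(float).interpolate(method='time')

# Alternative conversion that also yields float64 here.
also_fixed = pd.to_numeric(s).interpolate(method='time')

print(fixed.dtype)  # float64
print(fixed.round(4))
```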

I'm late, but this solved my problem: you need to assign the result to a variable, or back to the column itself.

y['out_brd'] = y.out_brd.interpolate(method='time')
Answered By: Dimanjan
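A sketch combining the two points made so far, using a hypothetical frame shaped like the question's y (values copied from the question's output). interpolate() returns a new Series and leaves y untouched, so the result has to be assigned back. Assigning alone is not enough for object data, though: the output printed in the question is the returned Series, and it still contains the NaN values. Converting with astype(float) first is what makes the interpolation happen.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01 11:25:00', freq='5min', periods=6)

# Hypothetical frame shaped like the question's `y`, out_brd stored as object.
y = pd.DataFrame({'out_brd': [0.04464286, np.nan, np.nan,
                              0.005952381, 0.01785714, 0.008928571]},
                 index=idx, dtype=object)

# interpolate() returns a new Series; y itself is not modified.
result = y['out_brd'].astype(float).interpolate(method='time')
print(y['out_brd'].isna().sum())  # 2

# Assign back to the column (not to y) so the rest of the frame is kept.
y['out_brd'] = result
print(y['out_brd'].dtype)  # float64
```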

You could also do this without reassigning, by passing inplace=True:

y.out_brd.interpolate(method='time', inplace=True)
Answered By: Santi Gil
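A sketch of what inplace=True does, again with a hypothetical object-dtype frame. With inplace=True, interpolate() modifies the Series it is called on and returns None; it does not change the dtype problem, so the data still has to be converted first. Because a chained call such as y.out_brd.interpolate(..., inplace=True) may not update y in pandas versions with copy-on-write behavior, this sketch fills a standalone Series and writes it back explicitly.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01 11:25:00', freq='5min', periods=6)
y = pd.DataFrame({'out_brd': [0.04464286, np.nan, np.nan,
                              0.005952381, 0.01785714, 0.008928571]},
                 index=idx, dtype=object)

# Convert first: inplace=True alone does not fix object dtype.
s = y['out_brd'].astype(float)

# Fills s in place and returns None.
returned = s.interpolate(method='time', inplace=True)

# Write the filled Series back into the frame explicitly.
y['out_brd'] = s
print(returned)  # None
```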

Short version of Phillip's answer, which I missed the first time and came back to post:

You need to have a float series:

s.astype(float).interpolate(method='time')
Answered By: Rivet174
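The question's frame has four such columns, and they can be converted together. A minimal sketch with a hypothetical toy frame (values made up for illustration): astype(float) on the whole DataFrame followed by DataFrame.interpolate(method='time') fills interior gaps in every column. A leading NaN, like the first out_alt entry, stays NaN because there is no earlier value to interpolate from.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01 11:25:00', freq='5min', periods=4)

# Hypothetical toy frame with the question's four columns, all object dtype.
df = pd.DataFrame({'in_brd': [0.1, np.nan, 0.3, 0.4],
                   'in_alt': [0.2, 0.2, np.nan, 0.5],
                   'out_brd': [0.04, np.nan, np.nan, 0.01],
                   'out_alt': [np.nan, 0.1, 0.2, 0.3]},
                  index=idx, dtype=object)

# One conversion for the whole frame, then time-weighted interpolation.
filled = df.astype(float).interpolate(method='time')
print(filled)
```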