Pandas groupby transform yields Series instead of DataFrame on empty DataFrames

Question:

Running this:

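import numpy
import pandas

for periods in [8, 4, 0]:
  print(f'--- periods {periods}')

  df = pandas.DataFrame(dict(
    v1=numpy.arange(periods),
    v2=numpy.arange(periods) * 2),
    index=pandas.date_range('2023-01-01', periods=periods, freq='6H'))
  # keep rows between 00:00 and 06:00, then restore the full index (other rows become NaN)
  dft = df.between_time('00:00', '06:00')
  dft = dft.reindex_like(df)
  # True where v1 > 3; NaN rows compare as False
  dfc = dft['v1'] > 3
  # keep every row of a day on which at least one row is flagged
  df = df[dfc.groupby(dfc.index.date).transform(any)]
  print(df)
  print(df.dtypes)
  print(df.index)
  print()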

results in:

--- periods 8
                     v1  v2
2023-01-02 00:00:00   4   8
2023-01-02 06:00:00   5  10
2023-01-02 12:00:00   6  12
2023-01-02 18:00:00   7  14
v1    int64
v2    int64
dtype: object
DatetimeIndex(['2023-01-02 00:00:00', '2023-01-02 06:00:00',
               '2023-01-02 12:00:00', '2023-01-02 18:00:00'],
              dtype='datetime64[ns]', freq='6H')

--- periods 4
Empty DataFrame
Columns: [v1, v2]
Index: []
v1    int64
v2    int64
dtype: object
DatetimeIndex([], dtype='datetime64[ns]', freq='6H')

--- periods 0
Empty DataFrame
Columns: []
Index: []
Series([], dtype: object)
DatetimeIndex([], dtype='datetime64[ns]', freq='6H')

Why is the result for periods = 0 (i.e. empty DataFrame) a Series and not a DataFrame with columns v1 and v2?

Aside from checking whether df is empty beforehand, is there a way to return a DataFrame with both v1 and v2?

Asked By: levant pied

Answers:

Why is the result for periods = 0 (i.e. empty DataFrame) a Series and not a DataFrame with columns v1 and v2?

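Because the mask is now an empty Series. It no longer contains any values, and judging by the output it is no longer boolean either, so df[...] does not treat it as a row mask. A non-boolean key is read as a list of column labels, and an empty list selects no columns. Note that the result is still a DataFrame (the output says "Empty DataFrame" with Columns: []); the "Series([], dtype: object)" line is the output of df.dtypes, which is empty because no columns are left.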
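
The difference between a boolean and a non-boolean empty key can be reproduced in isolation. This is a minimal sketch, not the exact pipeline from the question: the object-dtype mask is an assumed stand-in for whatever the empty transform returns, and the names bool_mask and non_bool_mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Empty frame with the same columns as in the question
df = pd.DataFrame(
    {"v1": np.arange(0), "v2": np.arange(0) * 2},
    index=pd.DatetimeIndex([]),
)

# Assumption: stand-ins for the mask, one boolean and one not
bool_mask = pd.Series([], index=df.index, dtype=bool)
non_bool_mask = pd.Series([], index=df.index, dtype=object)

# A boolean key filters rows, so the columns survive
print(list(df[bool_mask].columns))      # ['v1', 'v2']

# A non-boolean key is read as column labels; an empty one selects none
print(list(df[non_bool_mask].columns))  # []
```

Both results are DataFrames; only the second one has lost its columns.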

Aside from checking whether df is empty beforehand, is there a way to return a DataFrame with both v1 and v2?

Here is one way to get what you are asking for. With .loc, the first indexer always refers to rows, so an empty mask selects no rows and the columns are kept:

df = df.loc[dfc.groupby(dfc.index.date).transform(any), ["v1", "v2"]]

This also works, without naming the columns explicitly:

df = df.loc[dfc.groupby(dfc.index.date).transform(any), :]
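
Another option, offered as a sketch rather than a tested fix for the exact code above, is to force the mask to boolean dtype before indexing, so that plain df[...] keeps treating it as a row filter. filter_rows is a hypothetical helper, the object-dtype empty mask is an assumed stand-in for the empty transform result, and the cast assumes the mask has no missing values.

```python
import numpy as np
import pandas as pd


def filter_rows(df, mask):
    """Hypothetical helper: cast the mask to bool so df[...] filters rows."""
    return df[mask.astype(bool)]


index = pd.date_range("2023-01-01", periods=4, freq="D")
df = pd.DataFrame({"v1": np.arange(4), "v2": np.arange(4) * 2}, index=index)

# Non-empty case: behaves like ordinary boolean indexing
kept = filter_rows(df, df["v1"] > 1)

# Empty case: the mask is non-boolean before the cast (assumed stand-in)
empty_df = df.iloc[:0]
empty_mask = pd.Series([], index=empty_df.index, dtype=object)
empty_kept = filter_rows(empty_df, empty_mask)

print(kept)
print(list(empty_kept.columns))  # ['v1', 'v2']
```

The .loc form above does not need the cast; this variant is only useful if you prefer to keep the df[mask] spelling.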
Answered By: marwan