pandas.core.indexing.IndexingError: Too many indexers
Question:
I want to extract electricity consumption for Site 2
>>> df4 = pd.read_excel(xls, 'Elec Monthly Cons')
>>> df4
    Site Unnamed: 1 2014-01-01 00:00:00 2014-02-01 00:00:00 2014-03-01 00:00:00 ... 2017-08-01 00:00:00 2017-09-01 00:00:00 2017-10-01 00:00:00 2017-11-01 00:00:00 2017-12-01 00:00:00
0   Site    Profile            JAN 2014            FEB 2014            MAR 2014 ...            AUG 2017            SEP 2017            OCT 2017            NOV 2017            DEC 2017
1 Site 1        NHH               10344                 NaN                 NaN ...                 NaN                 NaN                 NaN                 NaN                 NaN
2 Site 2         HH              258351              229513              239379 ...                 NaN                 NaN                 NaN                 NaN                 NaN
>>> type(df4)
<class 'pandas.core.frame.DataFrame'>
My goal is to extract the numerical values, but I do not know how to set the index properly. What I have tried so far does not work at all:
df1 = df.loc[idx[:,1:2],:]
But this raises:
raise IndexingError('Too many indexers')
pandas.core.indexing.IndexingError: Too many indexers
It seems that I do not understand indexing. Does the series type play any role?
df.head
<bound method NDFrame.head of Site Site 2
Unnamed: 1 HH
EDIT
print (df.index)
Index([ 'Site', 'Unnamed: 1', 2014-01-01 00:00:00,
2014-02-01 00:00:00, 2014-03-01 00:00:00, 2014-04-01 00:00:00,
2014-05-01 00:00:00, 2014-06-01 00:00:00, 2014-07-01 00:00:00,
How can I solve this?
Answers:
In my opinion, it is necessary to remove the trailing :, because it means select all columns, but a Series has no columns.
Also, there seems to be no MultiIndex, so you need:
df1 = df.iloc[1:2]
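To illustrate the point about a Series accepting only one indexer, here is a minimal sketch. The Series `s` is a hypothetical stand-in for the asker's `df` (a few values copied from the Site 2 row), not the real data, and `IndexingError` is imported from the module named in the traceback.

```python
import pandas as pd
from pandas.core.indexing import IndexingError  # module named in the traceback

# Hypothetical stand-in for the asker's `df`, which is a Series, not a DataFrame
s = pd.Series(
    ["Site 2", "HH", 258351, 229513, 239379],
    index=["Site", "Unnamed: 1", "JAN 2014", "FEB 2014", "MAR 2014"],
)

# Two indexers (rows, columns) on a one-dimensional object fail
try:
    s.iloc[:, 1]
    error_message = None
except IndexingError as exc:
    error_message = str(exc)

# A single indexer works: a positional slice, as in the answer above
second_entry = s.iloc[1:2]

# The numeric part follows the two label entries
consumption = s.iloc[2:]
```

With one indexer the slice behaves as expected; the comma is what triggers the error.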
The problem is that the first 2 rows are headers, so for a MultiIndex DataFrame you need:
df4 = pd.read_excel(xls, 'Elec Monthly Cons', header=[0,1], index_col=[0,1])
And then, to select, use:
idx = pd.IndexSlice
df1 = df.loc[:, idx[:,'FEB 2014':'MAR 2014']]
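Since the Excel file is not available here, the following sketch builds a small frame by hand that is assumed to resemble what read_excel returns with header=[0,1] and index_col=[0,1]. The values are copied from the question; the level names "Site" and "Profile" are assumptions. It selects by an explicit list of labels on the second column level, which does not depend on how that level is sorted.

```python
import pandas as pd

# Hypothetical in-memory equivalent of the two-header-row sheet
columns = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2014-01-01"), "JAN 2014"),
        (pd.Timestamp("2014-02-01"), "FEB 2014"),
        (pd.Timestamp("2014-03-01"), "MAR 2014"),
    ]
)
index = pd.MultiIndex.from_tuples(
    [("Site 1", "NHH"), ("Site 2", "HH")], names=["Site", "Profile"]
)
nan = float("nan")
df4 = pd.DataFrame(
    [[10344, nan, nan], [258351, 229513, 239379]], index=index, columns=columns
)

idx = pd.IndexSlice

# All rows; any first-level column label; only these second-level labels
feb_mar = df4.loc[:, idx[:, ["FEB 2014", "MAR 2014"]]]

# Row for Site 2 / HH from the selection above
site2 = feb_mar.loc[("Site 2", "HH")]
```

Here idx[...] is the column indexer, so it goes after the comma; the row indexer before the comma stays a plain :.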
I got this error after using DataFrame.apply incorrectly (with the axis option): when the function returns a tuple for each row, apply returns a single pandas Series holding one tuple per row, and a Series has no columns.
Example:
# Before apply: df is still a DataFrame
print(df.iloc[:, 1])  # OK
# The lambda returns a tuple per row, so apply returns a Series of tuples
df = df.apply(lambda row: (tokenizer(row.iloc[0]).input_ids, tokenizer(row.iloc[1]).input_ids), axis=1)
print(df.iloc[:, 1])  # NOT OK, raises pandas.core.indexing.IndexingError: Too many indexers
You can use iloc to select a particular row of the result and then get the nth element of that row's value with [] notation.
But you cannot do something like df.iloc[:,1], that is, select all rows while keeping only the values of the second column.
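A small sketch of that situation, with some assumptions: `fake_tokenize` is a hypothetical stand-in for the tokenizer (it just returns word lengths), and the column names "a" and "b" are made up. It shows the Series of tuples that apply produces, the element access that works, and one possible way to get the second item for every row.

```python
import pandas as pd
from pandas.core.indexing import IndexingError  # module named in the traceback


def fake_tokenize(text):
    """Hypothetical stand-in for the tokenizer: one integer per word."""
    return [len(word) for word in text.split()]


df = pd.DataFrame({"a": ["hello world", "foo"], "b": ["bar baz qux", "x y"]})

# Returning a tuple per row gives a Series of tuples, not a DataFrame
result = df.apply(
    lambda row: (fake_tokenize(row["a"]), fake_tokenize(row["b"])), axis=1
)

# Two indexers no longer work on the one-dimensional result
try:
    result.iloc[:, 1]
    error_message = None
except IndexingError as exc:
    error_message = str(exc)

# One indexer picks a row; [] then picks the nth item of that row's tuple
first_row_second_item = result.iloc[0][1]

# The second item of every row, taken element by element
second_items = result.map(lambda pair: pair[1])
```

If a two-dimensional result is wanted instead, the result_type parameter of DataFrame.apply may be worth looking at.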