Assigning values to Pandas Multiindex DataFrame by index level
Question:
I have a Pandas multiindex dataframe and I need to assign values to one of the columns from a series. The series shares its index with the first level of the index of the dataframe.
import pandas as pd
import numpy as np

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))
print(df)
print(s)
out:
A B
bar one NaN NaN
two NaN NaN
three NaN NaN
baz one NaN NaN
foo one NaN NaN
two NaN NaN
bar True
baz False
foo True
dtype: bool
These don’t work:
df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error
expected output:
A B
bar one True NaN
two True NaN
three True NaN
baz one False NaN
foo one True NaN
two True NaN
Answers:
A Series (or a dictionary) can be used just like a function with map and apply (thanks to @normanius for improving the syntax):
df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values
Or similarly:
df['A'] = df.reset_index(level=0)['level_0'].map(s).values
Results:
A B
bar one True NaN
two True NaN
three True NaN
baz one False NaN
foo one True NaN
two True NaN
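In recent pandas versions, Index.map accepts a Series directly, so the intermediate pd.Series(...) wrapper can be dropped. A minimal self-contained sketch using the same toy data as the question:

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Index.map looks up each first-level label in s and returns an Index
# of mapped values, which is assigned to the column positionally.
df['A'] = df.index.get_level_values(0).map(s)
```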
df.A = s # does not raise an error, but does nothing

Indeed, this should have worked; the reason it does not is an index-type mismatch, explained below.

The workaround:
>>> s.index = pd.Index((c,) for c in s.index)  # wrap each label in a 1-tuple
>>> df.A = s
>>> df
A B
bar one True NaN
two True NaN
three True NaN
baz one False NaN
foo one True NaN
two True NaN
Why does the above work?
Because a direct df.A = s asks pandas to align the labels of s, held in a plain pandas.Index, against df's pandas.MultiIndex. The two index types do not align here, even though MultiIndex is a subclass of Index, which reads like a violation of the Liskov substitution principle. See for yourself:
>>> type(s.index).__name__
'Index'
whereas
>>> type(df.index).__name__
'MultiIndex'
Hence the workaround: turn s's index into a one-level pandas.MultiIndex.
>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'
and the contents of the series are perceptibly unchanged:
>>> s
bar True
baz False
foo True
dtype: bool
A thought: from several points of view (mathematical, ontological), all this suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, not the other way around as it currently is.
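If re-typing the index of s feels too magical, an equivalent and version-robust approach (not in the original answer) is to reindex s against the first-level labels and assign positionally:

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# s.reindex repeats each value of s once per matching first-level
# label, in df's row order; .values then bypasses index alignment.
df['A'] = s.reindex(df.index.get_level_values(0)).values
```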
You can use the join method on the df DataFrame, but you first need to name the index levels and the series accordingly:
>>> df.index.names = ('lvl0', 'lvl1')
>>> s.index.name = 'lvl0'
>>> s.name = 'new_col'
Then the join method creates a new column in the DataFrame:
>>> df.join(s)
A B new_col
lvl0 lvl1
bar one NaN NaN True
two NaN NaN True
three NaN NaN True
baz one NaN NaN False
foo one NaN NaN True
two NaN NaN True
To assign it to an existing column:
>>> df['A'] = df.join(s)['new_col']
>>> df
A B
lvl0 lvl1
bar one True NaN
two True NaN
three True NaN
baz one False NaN
foo one True NaN
two True NaN
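For reference, here is the join approach above end to end as a self-contained sketch, using the same toy data as the question:

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Name the levels and the series so join can match s's index
# against level 'lvl0' of df's MultiIndex.
df.index.names = ('lvl0', 'lvl1')
s.index.name = 'lvl0'
s.name = 'new_col'

# join aligns s on the 'lvl0' level; the result shares df's index,
# so the new column can be assigned back to an existing one.
df['A'] = df.join(s)['new_col']
```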