How to split an index into a MultiIndex without a delimiter in pandas
Question:
I have this data frame:
index 0
idxaa1cx1 some_text
idxbb2cx2 some_text
idxcc3cx3 some_text
I want to split the index into a multi index like so:
idx_1 idx_2 0
idxa a1cx1 some_text
idxb b2cx2 some_text
idxc c3cx3 some_text
I’ve tried this:
df.index = pd.MultiIndex.from_tuples([tuple(idx.split(idx[:3][-5:])) for idx in df.index])
which returns:
idx_1 idx_2 0
a1cx1 some_text
b2cx2 some_text
c3cx3 some_text
but the idx_1 column is blank. And I’ve also tried:
df.index = pd.MultiIndex.from_tuples([tuple({idx[:3]:idx[-5:]}) for idx in df.index])
which only returns:
idx_1 0
idxa some_text
idxb some_text
idxc some_text
and doesn’t return the dictionary’s “value”. My question is how can I split the index by an arbitrary length and get multiple columns?
Answers:
You were very close.
Take the first four and the last five characters of each label and pass the pairs as tuples:
df.index = pd.MultiIndex.from_tuples([(idx[:4], idx[-5:]) for idx in df.index])
Result:
>>> df.index
MultiIndex(levels=[[u'idxa', u'idxb', u'idxc'], [u'a1cx1', u'b2cx2', u'c3cx3']],
labels=[[0, 1, 2], [0, 1, 2]])
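To see why the two attempts in the question fell short, here is a small sketch. It assumes the frame looks like the one in the question; the level names idx_1 and idx_2 are taken from the desired output, and the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question ("some_text" is the question's placeholder).
df = pd.DataFrame({0: ["some_text"] * 3},
                  index=["idxaa1cx1", "idxbb2cx2", "idxcc3cx3"])

label = df.index[0]

# Attempt 1: label[:3][-5:] is just "idx", so the label is split ON "idx",
# which leaves an empty string as the first piece.
attempt_1 = tuple(label.split(label[:3][-5:]))

# Attempt 2: converting a dict to a tuple iterates over its keys only,
# so the value never makes it into the tuple.
attempt_2 = tuple({label[:3]: label[-5:]})

# Slicing directly gives both pieces; names= labels the two levels.
df.index = pd.MultiIndex.from_tuples(
    [(idx[:4], idx[-5:]) for idx in df.index],
    names=["idx_1", "idx_2"],
)
```

Here attempt_1 is ('', 'aa1cx1') and attempt_2 is ('idx',), which matches the blank first level and the missing "value" described in the question.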
The minimalist approach
df.index = [df.index.str[:4], df.index.str[-5:]]
df
0
index index
idxa a1cx1 some_text
idxb b2cx2 some_text
idxc c3cx3 some_text
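Both levels come out named "index" above because each slice inherits the name of the original index. If you want the names from the question, one possible variation (a sketch, assuming the same frame as in the question) is to build the MultiIndex explicitly and pass names:

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame({0: ["some_text"] * 3},
                  index=["idxaa1cx1", "idxbb2cx2", "idxcc3cx3"])

# Vectorised string slices of the index, combined into a named MultiIndex.
df.index = pd.MultiIndex.from_arrays(
    [df.index.str[:4], df.index.str[-5:]],
    names=["idx_1", "idx_2"],
)
```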
On the other hand, if there is a delimiter to split on (to help others), with delim being whatever separator string your labels use:
newIndex = pd.MultiIndex.from_arrays(list(zip(*df.index.str.split(delim))))
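For the delimiter case, a runnable sketch might look like this. The sample labels and the "_" separator are made up for illustration; they are not from the question. Passing expand=True to str.split on an Index returns a MultiIndex directly, which avoids the zip step.

```python
import pandas as pd

# Hypothetical labels that do contain a delimiter.
delim = "_"
df = pd.DataFrame({0: ["some_text"] * 3},
                  index=["idxa_a1cx1", "idxb_b2cx2", "idxc_c3cx3"])

# Variant 1: split each label, transpose the pieces with zip, build the MultiIndex.
via_zip = pd.MultiIndex.from_arrays(list(zip(*df.index.str.split(delim))))

# Variant 2: let pandas expand the pieces into levels itself.
via_expand = df.index.str.split(delim, expand=True)

df.index = via_expand
```

Both variants assume every label splits into the same number of pieces.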