quickest way to swap index with values
Question:
Consider the pd.Series s:
s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))
s
A a
B b
C c
D d
E e
F f
G g
H h
I i
J j
dtype: object
What is the quickest way to swap the index and the values and get the following?
a A
b B
c C
d D
e E
f F
g G
h H
i I
j J
dtype: object
Answers:
One possible solution is to swap keys and values through a dict (on pandas 2.0 and later use items(), since iteritems() was removed):
s1 = pd.Series(dict((v, k) for k, v in s.items()))
print (s1)
a A
b B
c C
d D
e E
f F
g G
h H
i I
j J
dtype: object
Another option, and the fastest, is to pass the old index as data and the old values as the new index:
print (pd.Series(s.index.values, index=s ))
a A
b B
c C
d D
e E
f F
g G
h H
i I
j J
dtype: object
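For reference, here is a minimal self-contained sketch of both approaches side by side. It assumes a pandas version where `Series.items()` is available; the names `via_dict` and `via_ctor` are illustrative and not from the answer above.

```python
import pandas as pd

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))

# Approach 1: build a dict of value -> index label, then a Series from it.
via_dict = pd.Series({v: k for k, v in s.items()})

# Approach 2: old index becomes the data, old values become the new index.
via_ctor = pd.Series(s.index.values, index=s)

# Both should map 'a' -> 'A', 'b' -> 'B', and so on.
print(via_ctor)
```

On this small example, where every value is unique, the two results hold the same labels and values.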
Timings:
In [63]: %timeit pd.Series(dict((v,k) for k,v in s.iteritems()))
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 146 µs per loop
In [71]: %timeit (pd.Series(s.index.values, index=s ))
The slowest run took 7.42 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 102 µs per loop
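Outside IPython, a similar comparison can be approximated with the standard-library `timeit` module. This is only a rough sketch: the helper names `swap_dict` and `swap_ctor` are made up here, and absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))


def swap_dict(ser):
    # Dict-based swap: value -> index label.
    return pd.Series({v: k for k, v in ser.items()})


def swap_ctor(ser):
    # Constructor-based swap: old index as data, old values as index.
    return pd.Series(ser.index.values, index=ser)


runs = 1000
for fn in (swap_dict, swap_ctor):
    seconds = timeit.timeit(lambda: fn(s), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```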
If the Series is much longer (the code below repeats the 10 values 1,000,000 times, giving 10M rows):
s = pd.Series(list('abcdefghij'), list('ABCDEFGHIJ'))
s = pd.concat([s]*1000000).reset_index(drop=True)
print (s)
In [72]: %timeit (pd.Series(s.index, index=s ))
10000 loops, best of 3: 106 µs per loop
In [229]: %timeit pd.Series(dict((v,k) for k,v in s.iteritems()))
1 loop, best of 3: 1.77 s per loop
In [230]: %timeit (pd.Series(s.index, index=s ))
10 loops, best of 3: 130 ms per loop
In [231]: %timeit (pd.Series(s.index.values, index=s ))
10 loops, best of 3: 26.5 ms per loop
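One thing worth keeping in mind when reading the large-Series timings: the two approaches do not return the same result once values repeat. Dict keys are unique, so the dict-based swap keeps one entry per distinct value, while the constructor-based swap keeps every row. A small sketch of the difference, using a hypothetical three-row Series:

```python
import pandas as pd

# 'a' appears twice, so it will be a duplicated label after the swap.
s = pd.Series(['a', 'b', 'a'], index=[0, 1, 2])

via_dict = pd.Series({v: k for k, v in s.items()})
via_ctor = pd.Series(s.index.values, index=s)

print(len(via_dict))  # 2: later 'a' overwrote the earlier one
print(len(via_ctor))  # 3: both 'a' rows are kept
```

If the swapped Series is meant to be used as a lookup table, a duplicated index may or may not be what you want, so it is probably worth checking `s.is_unique` first.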
# my_df is a placeholder for an existing Series that maps a -> b
a2b = my_df
b2a = pd.Series(data=a2b.index, index=a2b.values)
If the series and index have names and you want to swap them as well:
srs_1 = pd.Series(list('ABC'), list('abc'), name='upper').rename_axis('lower')
# lower
# a A
# b B
# c C
# Name: upper, dtype: object
srs_2 = pd.Series(srs_1.index, index=srs_1)
# upper
# A a
# B b
# C c
# Name: lower, dtype: object
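A short sketch that checks the name swap explicitly and applies the same construction a second time to get back to the original layout. The variable `srs_3` is added here for illustration only.

```python
import pandas as pd

srs_1 = pd.Series(list('ABC'), list('abc'), name='upper').rename_axis('lower')

# Swap: the index (named 'lower') becomes the data,
# the values (named 'upper') become the index.
srs_2 = pd.Series(srs_1.index, index=srs_1)
print(srs_2.name, srs_2.index.name)

# Swapping again restores the original labels, values and names.
srs_3 = pd.Series(srs_2.index, index=srs_2)
print(srs_3.name, srs_3.index.name)
```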