dataframe, set index from list
Question:
When creating a DataFrame from a list, is it possible to set the index to one of the values?
import pandas as pd
tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"])
  First Second
0     a     a1
1     b     b1
And how I’d like it to look:
  First Second
a     a     a1
b     b     b1
Answers:
>>> pd.DataFrame(tmp, columns=["First", "Second"]).set_index('First', drop=False)
      First Second
First
a         a     a1
b         b     b1
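A self-contained sketch of the set_index approach above, assuming a recent pandas version. With drop=False the First column stays in the frame, and the index takes the column name "First" as its own name; rename_axis(None) is one way to clear that name afterwards.

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1']]

# Use the "First" column as the index, but keep it as a regular column too (drop=False)
df = pd.DataFrame(tmp, columns=["First", "Second"]).set_index("First", drop=False)

print(df.index.name)     # First  (the index is named after its source column)
print(list(df.index))    # ['a', 'b']
print(list(df.columns))  # ['First', 'Second']

# One way to clear the index name afterwards
unnamed = df.rename_axis(None)
print(unnamed.index.name)  # None
```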
If you don’t want the index to have a name:
df = pd.DataFrame(tmp, columns=["First", "Second"], index=[i[0] for i in tmp])
Result:
  First Second
a     a     a1
b     b     b1
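Because the labels are simply the first element of each inner list, the same comprehension works for any number of rows. A minimal sketch with a third, illustrative row added to the data:

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1'], ['c', 'c1']]

# The first element of every inner list becomes that row's index label
df = pd.DataFrame(tmp, columns=["First", "Second"], index=[row[0] for row in tmp])

print(list(df.index))         # ['a', 'b', 'c']
print(df.index.name)          # None: labels passed via index= are unnamed
print(df.loc['b', 'Second'])  # b1
```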
Alternatively, convert the column to a list before assigning it to the index:
df.index = list(df["First"])
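Converting to a list affects the index name: a Series carries its own name, so assigning df["First"] directly produces an index named "First", while a plain list leaves the index unnamed. A quick check, assuming a recent pandas version:

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1']]

from_list = pd.DataFrame(tmp, columns=["First", "Second"])
from_list.index = list(from_list["First"])   # plain list: no name is attached

from_series = pd.DataFrame(tmp, columns=["First", "Second"])
from_series.index = from_series["First"]     # Series: its name "First" is kept

print(from_list.index.name)    # None
print(from_series.index.name)  # First
```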
set_axis
To set arbitrary values as the index, best practice is to use set_axis:
df = df.set_axis(['idx1', 'idx2'])
#      First Second
# idx1     a     a1
# idx2     b     b1
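In pandas 1.0 and later, set_axis returns a new DataFrame rather than modifying the original, and the same method can relabel the columns with axis=1. A short sketch with illustrative labels:

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"])

# Relabel the rows; df itself is left untouched
relabeled = df.set_axis(['idx1', 'idx2'])

# Relabel the columns instead of the rows
renamed_cols = df.set_axis(['col1', 'col2'], axis=1)

print(list(df.index))              # [0, 1]
print(list(relabeled.index))       # ['idx1', 'idx2']
print(list(renamed_cols.columns))  # ['col1', 'col2']
```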
set_index (list vs array)
It’s also possible to pass arbitrary values to set_index, but note the difference between passing a list and passing an array:
- list: set_index treats the entries as column labels and uses those columns as the index:
df.set_index(['First', 'First'])
#             Second
# First First
# a     a         a1
# b     b         b1
- array (Series/Index/ndarray): set_index uses the values themselves as the index:
df = df.set_index(pd.Series(['First', 'First']))
#       First Second
# First     a     a1
# First     b     b1
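To make the list vs array distinction concrete, here is a sketch that passes a single column label in the list case and a pd.Index in the array case (the labels 'x' and 'y' are arbitrary):

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"])

# List: entries are column labels; that column becomes the index (and is dropped by default)
by_column = df.set_index(['First'])

# Array-like (here a pd.Index): the values themselves become the index; all columns stay
by_values = df.set_index(pd.Index(['x', 'y']))

print(list(by_column.index), list(by_column.columns))  # ['a', 'b'] ['Second']
print(list(by_values.index), list(by_values.columns))  # ['x', 'y'] ['First', 'Second']
```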
Note that passing arrays to set_index is contentious among the pandas developers, and this usage may be deprecated in the future.
Why not just modify df.index directly?
Directly modifying attributes is fine and is used often, but using methods has its advantages:
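- Methods provide better error checking. Both set_axis and direct assignment reject a label list of the wrong length, but a misspelled method name fails immediately, whereas a misspelled attribute is silently created (pandas only emits a UserWarning) and the index is left unchanged:
df.set_axs(['idx1', 'idx2'])
# AttributeError: 'DataFrame' object has no attribute 'set_axs'
df.indx = ['idx1', 'idx2']
# No error despite the typo; df.index is unchanged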
- Methods can be chained, e.g.:
df.sort_values('Second').set_axis(['idx1', 'idx2']).rename_axis('id')
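set_axis checks the number of labels against the number of rows, so a wrong-length list surfaces as a ValueError instead of a mislabeled frame. A short sketch that catches the error (the exact message wording may differ between pandas versions):

```python
import pandas as pd

tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"])

# Three labels for a two-row frame: set_axis raises instead of relabeling
try:
    df.set_axis(['idx1', 'idx2', 'idx3'])
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
print(error)
```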
Alternatively, call set_axis right after constructing the DataFrame:
import pandas as pd
tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"]).set_axis([row[0] for row in tmp])
df