start index at 1 for Pandas DataFrame
Question:
I need the index to start at 1 rather than 0 when writing a Pandas DataFrame to CSV.
Here’s an example:
In [1]: import pandas as pd
In [2]: result = pd.DataFrame({'Count': [83, 19, 20]})
In [3]: result.to_csv('result.csv', index_label='Event_id')
Which produces the following output:
In [4]: !cat result.csv
Event_id,Count
0,83
1,19
2,20
But my desired output is this:
In [5]: !cat result2.csv
Event_id,Count
1,83
2,19
3,20
I realize that this could be done by adding a sequence of integers shifted by 1 as a column to my data frame, but I’m new to Pandas and I’m wondering if a cleaner way exists.
Answers:
Just set the index before writing to CSV.
df.index = np.arange(1, len(df) + 1)
And then write it normally.
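For completeness, here is a minimal end-to-end sketch of that approach using the DataFrame from the question. It assumes numpy is imported as np, and it calls to_csv without a path so the CSV text comes back as a string rather than being written to result.csv.

```python
import numpy as np
import pandas as pd

result = pd.DataFrame({"Count": [83, 19, 20]})

# Replace the default 0-based index with the labels 1..len(result).
result.index = np.arange(1, len(result) + 1)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(index_label="Event_id")
print(csv_text)
```

Passing a filename as the first argument writes the same content to disk.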
Index is an object, and the default index starts from 0:
>>> result.index
Int64Index([0, 1, 2], dtype=int64)
You can shift this index by 1 with:
>>> result.index += 1
>>> result.index
Int64Index([1, 2, 3], dtype=int64)
source: In Python pandas, start row index from 1 instead of zero without creating additional column
Working example:
import pandas as pdas
dframe = pdas.read_csv(open(input_file))
dframe.index = dframe.index + 1
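Another way in one line. Note that this moves the data down by one row and then drops the first row, so the last row is lost and integer columns become float:
df.shift()[1:]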
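A small sketch comparing the shift one-liner with simply incrementing the index, using the question's data. The names shifted and reindexed are only illustrative.

```python
import pandas as pd

result = pd.DataFrame({"Count": [83, 19, 20]})

# shift() moves every value down one row (row 0 becomes NaN);
# slicing with [1:] then discards that NaN row.
shifted = result.shift()[1:]

# Incrementing the index only relabels the rows; the data is untouched.
reindexed = result.copy()
reindexed.index += 1

print(shifted)    # two rows, labels 1 and 2, values stored as float
print(reindexed)  # three rows, labels 1 to 3, values still integers
```

The labels of shifted do start at 1, but only because the original last row (20) has been pushed out of the frame.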
This worked for me
df.index = np.arange(1, len(df)+1)
You can use this one:
import pandas as pd
result = pd.DataFrame({'Count': [83, 19, 20]})
result.index += 1
print(result)
Or this one, with the help of the numpy library:
import pandas as pd
import numpy as np
result = pd.DataFrame({'Count': [83, 19, 20]})
result.index = np.arange(1, len(result)+1)
print(result)
np.arange creates a numpy array of evenly spaced values in the half-open interval [1, len(result)+1), that is 1 through len(result), and you then assign that array to result.index.
Forking from the original answer to add my two cents:
- If I’m not mistaken, starting from version 0.23, the default index object is of type RangeIndex.
From the official doc:
RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed.
For a huge index range this makes sense: the index is stored as a compact description (start, stop, step) instead of holding every label at once, which saves memory.
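To make the memory point concrete, here is a rough sketch comparing a RangeIndex with an index built from a full numpy array. The size n and the names lazy and dense are arbitrary choices for illustration, and the exact byte counts depend on the pandas version and platform.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# Stores only start, stop and step.
lazy = pd.RangeIndex(start=1, stop=n + 1)

# Stores every one of the n labels in an array.
dense = pd.Index(np.arange(1, n + 1))

print(type(lazy).__name__, lazy.memory_usage())
print(type(dense).__name__, dense.memory_usage())
```

Both indexes label the rows identically; only the internal representation differs.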
Therefore, an example (using Series, but it applies to DataFrame also):
>>> import pandas as pd
>>>
>>> countries = ['China', 'India', 'USA']
>>> ds = pd.Series(countries)
>>>
>>>
>>> type(ds.index)
<class 'pandas.core.indexes.range.RangeIndex'>
>>> ds.index
RangeIndex(start=0, stop=3, step=1)
>>>
>>> ds.index += 1
>>>
>>> ds.index
RangeIndex(start=1, stop=4, step=1)
>>>
>>> ds
1    China
2    India
3      USA
dtype: object
>>>
As you can see, incrementing the index object changes its start and stop parameters.
Use this:
df.index = np.arange(1, len(df)+1)
In my opinion, best practice is to set the index with a RangeIndex:
import pandas as pd
result = pd.DataFrame(
{'Count': [83, 19, 20]},
index=pd.RangeIndex(start=1, stop=4, name='index')
)
>>> result
       Count
index
1         83
2         19
3         20
I prefer this because you can define the range, an optional step, and a name for the index in one line.
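The same idea can be applied to a DataFrame that already exists, deriving stop from its length instead of hard-coding it. This is only a sketch: using 'Event_id' as the index name is taken from the question, and it relies on to_csv using the index name as the header of the index column when no index_label is given.

```python
import pandas as pd

result = pd.DataFrame({"Count": [83, 19, 20]})

# stop is exclusive, so len(result) + 1 yields the labels 1..len(result).
result.index = pd.RangeIndex(start=1, stop=len(result) + 1, name="Event_id")

# The index name becomes the header of the index column in the CSV text.
csv_text = result.to_csv()
print(csv_text)
```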
This adds a column that accomplishes what you want
df.insert(0,"Column Name", np.arange(1,len(df)+1))
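Add ".shift()[1:]" while creating the data frame. Be aware that this shifts the data rather than relabeling it, so the last row of the file is dropped: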
data = pd.read_csv(r"C:\Users\user\path\data.csv").shift()[1:]
Following on from TomAugspurger’s answer, we could use a list comprehension rather than np.arange(), which removes the need to import numpy. You can use the following instead:
df.index = [i+1 for i in range(len(df))]
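As a variation on the same no-numpy idea, a plain range object that starts at 1 can be assigned directly, which avoids the i+1 arithmetic. A minimal sketch, assuming df is the DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame({"Count": [83, 19, 20]})

# range(1, len(df) + 1) produces the labels 1..len(df) without numpy.
df.index = range(1, len(df) + 1)
print(df)
```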