X-axis year ticks get transformed to floats
Question:
I am trying to plot my data grouped by year, and for each year I want to count the number of users. Before plotting, I transformed the date column from float to integer.
Looking at the x-axis, my year ticks seem to have become floats, and the ticks are 0.5 apart.
How do I make the ticks purely integers?
Changing the groupby has the same result.
The ticks are still two spaces apart after converting the year column to a string:
df['year'] = df['year'].astype(str)
Answers:
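import matplotlib.pyplot as plt
# Use min and max to get the range of years to use in axis ticks;
# int() guards against a float year column (e.g. 2014.0)
year_min = int(df['year'].min())
year_max = int(df['year'].max())
# Keep 'year' numeric for plotting: after astype(str) the x-axis becomes categorical,
# and numeric tick positions like 2010 would no longer line up with the data
# range() excludes its stop value, so add 1 to include the last year
plt.xticks(range(year_min, year_max + 1))  # one tick per year within the range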
Hope this helps!
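A self-contained sketch of this approach, assuming the same small sample DataFrame used in the other answers (the data is illustrative, not the asker's). The year column stays numeric so the tick positions line up with the plotted points, and the upper bound gets +1 because range() excludes its stop value:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative sample data (assumption: one row per year)
df = pd.DataFrame({"year": [2010, 2011, 2012, 2013, 2014],
                   "count": [1000, 2200, 3890, 5600, 8000]})

ax = df.plot(x="year", y="count")

# int() guards against float years; +1 makes the last year part of the range
year_min = int(df["year"].min())
year_max = int(df["year"].max())
plt.xticks(range(year_min, year_max + 1))  # acts on the current axes, i.e. ax
plt.show()
```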
The expectation that using integer data will make a matplotlib axis show only integers is not justified. In the end, every axis is a numeric float axis.
The ticks and labels are determined by locators and formatters, and matplotlib does not know that you want to show only integer ticks.
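To see this in isolation, you can ask a locator directly which tick positions it would choose for a given range. A minimal sketch, assuming view limits of exactly 2010 to 2014; with no axis attached the locator falls back to its default bin count, so the ticks on a real plot (which has margins and a specific size) may differ slightly:

```python
from matplotlib.ticker import AutoLocator

locator = AutoLocator()  # the default major locator for linear axes

# Tick positions the default locator picks for a 2010-2014 span
default_ticks = locator.tick_values(2010, 2014)
print(default_ticks)  # includes half-year positions such as 2010.5

# Same locator, but only whole-number tick positions are allowed
locator.set_params(integer=True)
integer_ticks = locator.tick_values(2010, 2014)
print(integer_ticks)
```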
Some possible solutions:
Tell the default locator to use integers
The default locator is an AutoLocator, which accepts an integer parameter. So you can set this parameter to True:
ax.locator_params(integer=True)
Example:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({"year" : [2010,2011,2012,2013,2014],
"count" :[1000,2200,3890,5600,8000] })
ax = data.plot(x="year",y="count")
ax.locator_params(integer=True)
plt.show()
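A more explicit variant installs a MaxNLocator with integer=True as the x-axis major locator. AutoLocator is a subclass of MaxNLocator, so this swaps in a locator from the same family with MaxNLocator's own defaults. Unlike ax.locator_params(), which by default applies to both axes, this only touches the x-axis. A sketch on the same sample data:

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator

data = pd.DataFrame({"year": [2010, 2011, 2012, 2013, 2014],
                     "count": [1000, 2200, 3890, 5600, 8000]})

ax = data.plot(x="year", y="count")
# Restrict only the x-axis to whole-number tick positions
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
plt.show()
```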
Using a fixed locator
You can place ticks only at the years present in the dataframe by using ax.set_xticks():
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({"year" : [2010,2011,2012,2013,2014],
"count" :[1000,2200,3890,5600,8000] })
data.plot(x="year",y="count")
plt.gca().set_xticks(data["year"].unique())
plt.show()
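Applied to the setup described in the question, the tick positions can be taken directly from the grouped result. A minimal sketch, assuming a hypothetical user-level DataFrame with a float year column and a user_id column (both names and values are illustrative); nunique() counts the distinct users per year:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical user-level data; 'year' is stored as float, as in the question
df = pd.DataFrame({"year": [2010.0, 2010.0, 2011.0, 2012.0, 2012.0, 2012.0],
                   "user_id": [1, 2, 1, 3, 4, 5]})

# Count distinct users per year; the years become the index of the result
users_per_year = df.groupby("year")["user_id"].nunique()

ax = users_per_year.plot()
# One tick at each year that actually occurs in the data
ax.set_xticks(users_per_year.index)
plt.show()
```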
Convert year to date
You may convert the year column to a date. For dates, much nicer tick labeling happens automatically.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({"year" : [2010,2011,2012,2013,2014],
"count" :[1000,2200,3890,5600,8000] })
data["year"] = pd.to_datetime(data["year"].astype(str), format="%Y")
ax = data.plot(x="year",y="count")
plt.show()
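If you want to control the date ticks yourself, matplotlib.dates provides YearLocator and DateFormatter. A sketch that plots with matplotlib directly; this is an assumption made so that matplotlib's own date handling applies, since pandas' time-series plotting may use its own tick logic for regularly spaced dates:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"year": [2010, 2011, 2012, 2013, 2014],
                     "count": [1000, 2200, 3890, 5600, 8000]})
# format="%Y" parses each year as January 1 of that year
data["year"] = pd.to_datetime(data["year"].astype(str), format="%Y")

fig, ax = plt.subplots()
ax.plot(data["year"], data["count"])
# One major tick per year, labeled with the four-digit year only
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
plt.show()
```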
In all cases the x-axis ends up with ticks only at whole years.
A solution that worked for me was to first convert the column to int and then, in a second step, to a string:
df['year'] = df['year'].astype(int)  # e.g. 2010.0 -> 2010
df['year'] = df['year'].astype(str)  # e.g. 2010 -> '2010'
This is more or less a "quick and dirty" alternative to using a locator.
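The int step matters because converting a float column straight to str keeps the decimal part. A small sketch with hypothetical float years (the values are illustrative); once the years are strings, matplotlib plots them on a categorical axis with exactly one tick per distinct label:

```python
import matplotlib.pyplot as plt
import pandas as pd

years = pd.Series([2010.0, 2011.0, 2012.0])  # hypothetical float year column

direct = years.astype(str)               # keeps the decimal part: "2010.0", ...
via_int = years.astype(int).astype(str)  # drops it first: "2010", ...
print(direct.tolist())
print(via_int.tolist())

# String values go on a categorical axis: one tick per distinct label
fig, ax = plt.subplots()
ax.plot(via_int, [1000, 2200, 3890])
plt.show()
```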