strange year values on X axis

Question:

If I use the vega dataset “disasters” and make a straightforward chart, I get some weird values for year.

In Altair the code is:

import altair as alt
from vega_datasets import data

dis = data.disasters()

alt.Chart(dis).mark_bar().encode(
    x=alt.X('Year:T'),
    y=alt.Y('Deaths'),
    color='Entity'
)

(vega editor link)

Asked By: campo

Answers:

An integer year is not a standard time value.

In Vega-Lite, you can add "format": {"parse": {"Year": "date:'%Y'"}} to the data block to specify a custom date parsing format for the field "Year".

See a working spec

In Altair, you can similarly specify the format property of a *Data class (e.g., NamedData).

Answered By: kanitw

Adding to @kanitw’s answer: when you convert an integer to a datetime, pandas treats the integer as a count of nanoseconds since the Unix epoch (1970-01-01). You can see this in pandas by executing the following:

>>> import pandas as pd
>>> pd.to_datetime(dis.Year).head()
0   1970-01-01 00:00:00.000001900
1   1970-01-01 00:00:00.000001901
2   1970-01-01 00:00:00.000001902
3   1970-01-01 00:00:00.000001903
4   1970-01-01 00:00:00.000001905
Name: Year, dtype: datetime64[ns]

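Altair/Vega-Lite uses a similar convention: a bare integer in a temporal field is read as an offset from the epoch, not as a calendar year, which is why the x axis shows unexpected values instead of years.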
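
To make the difference concrete, here is a small standard-library sketch contrasting the two interpretations of the integer 1900. The helper names are hypothetical, and milliseconds are used only for illustration (JavaScript timestamps count milliseconds, and `timedelta` has no nanosecond resolution); pandas uses nanoseconds, as shown above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def as_epoch_offset(value):
    """Read an integer as milliseconds elapsed since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=value)

def as_calendar_year(value):
    """Read an integer as a four-digit calendar year."""
    return datetime.strptime(str(value), "%Y")

print(as_epoch_offset(1900))   # 1970-01-01 00:00:01.900000
print(as_calendar_year(1900))  # 1900-01-01 00:00:00
```

The first result sits a couple of seconds after the epoch, while the second is the date the data actually means, so the column needs an explicit year format before it is plotted as temporal.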

If you would like to parse the year as a date when loading the data, and then plot the year with Altair, you can do the following:

import altair as alt
from vega_datasets import data

dis = data.disasters(parse_dates=['Year'])

alt.Chart(dis).mark_bar().encode(
    x=alt.X('year(Year):T'),
    y=alt.Y('Deaths'),
    color='Entity'
)

example chart

First we parse the year column as a date by passing the appropriate pandas.read_csv argument to the loading function, and then use the year timeUnit to extract just the year from the full datetime.

If you are plotting data from a CSV URL rather than a pandas dataframe, Vega-Lite is smart enough to parse the CSV file based on the encoding you specify in the Chart, which means the following will give the same result:

dis = data.disasters.url

alt.Chart(dis).mark_bar().encode(
    x=alt.X('year(Year):T'),
    y=alt.Y('Deaths:Q'),
    color='Entity:N'
)

example chart

Answered By: jakevdp