Fixing Python Bokeh Datetime Import Formatting

Question:

I’m exporting datasets from equipment logging software and am trying to use Bokeh (Python) as an interactive visual aide during analysis. Everything is working fine, except for the date/time which refuses to be imported in its current format (24/08/2022 01:40:32). I have data for every second for at least a month’s worth (So dropping the date wouldn’t work).

I’ve been playing about with Bokeh for a while now by simply ignoring the date/time and replacing it by a consecutive series (1,2,3…) and plotting it as such, but the time has come to fix my temporary solution and I just cant seem to figure out how to define the formatting or how to convert it. (Bokeh documentation)

Example code:

from bokeh.io import output_file, show # OUTPUT_FILE FOR EXPORT (NOT USED)
from bokeh.layouts import gridplot # MULTIPLOT
from bokeh.plotting import figure
from bokeh.palettes import Spectral4 # COLOUR PALETTE
import pandas as pd
import external_tags as tags # TAG DEFINITIONS USED FOR CSV IMPORTING

# import csv
df = pd.read_csv("AUGUST_PS_1MIN.csv")      # testset with 1 min intervals

# TOOLS
TOOLS = "box_zoom, box_select, crosshair, reset, hover"
Figure_Title = "TESTING AUTOMATING IMPORT WITHOUT MANUAL TWEAKING"

line_width = 1.5
alpha = 1
height = 500
x = df[tags.Date_Time_UTC[0]]

# These just redirect to my imported tag definitions TAG = ["column name", "friendly name"]
fig1a = tags.PS_MH_LOAD
fig1b = tags.PS_MH_WINCH_PWR
fig1c = tags.PS_PWR_MSB1
fig1d = tags.PS_PWR_MSB2

# FIGURE A (TOP LEFT)
s1 = figure(sizing_mode="stretch_width", height=height, title="LOAD", tools=TOOLS, x_axis_type='datetime')
s1.line(x, df[fig1a[0]], color=Spectral4[0], alpha=alpha, line_width=line_width, legend_label=fig1a[1])
s1.line(x, df[fig1b[0]], color=Spectral4[1], alpha=alpha, line_width=line_width, legend_label=fig1b[1])
s1.line(x, df[fig1c[0]], color=Spectral4[2], alpha=alpha, line_width=line_width, legend_label=fig1c[1])
s1.line(x, df[fig1d[0]], color=Spectral4[3], alpha=alpha, line_width=line_width, legend_label=fig1d[1])

#### some repetitive code has been omitted here for brevity

# Define the grid
# p = gridplot([[s1, s2],[s3, s4]])

# show the results
show(s1)

Example of a dataset

2022-08-26 04:03:52.000,0,30,30,894.70751953125,-63.785041809082,-0.497732371091843,2.14258599281311,0.0307948496192694,355.496154785156,0,0,0,2.38387619901914E-05,0,102.844131469727,0.040388036519289,0.703329265117645,0,0,0.0244150012731552,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106834815815091
2022-08-26 04:03:53.000,0,30,30,895.21142578125,-63.6380615234375,-0.550026297569275,2.14223098754883,0.0307948496192694,355.496154785156,0,0,0,1.45306594276917E-05,0,102.827079772949,0.0610153041779995,0.733967423439026,0,0,0.0245136469602585,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106870988383889
2022-08-26 04:03:54.000,0,30,30,895.726196289063,-63.6465072631836,-0.533430516719818,2.1423876285553,0.0307948496192694,355.496154785156,0,0,0,8.71746851771604E-06,0,102.834602355957,0.0816425681114197,0.764605581760406,0,0,0.0246122926473618,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106907160952687
2022-08-26 04:03:55.000,0,30,30,896.1552734375,-63.0882987976074,-0.534056782722473,2.14190745353699,0.0307948496192694,355.496154785156,0,0,0,5.21722904522903E-06,0,102.811561584473,0.10226983577013,0.795243740081787,0,0,0.024710938334465,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106943333521485
2022-08-26 04:03:56.000,0,30,30,895.727600097656,-63.0707931518555,-0.515181064605713,2.14224052429199,0.0307948496192694,355.496154785156,0,0,0,3.12787688017124E-06,0,102.827545166016,0.122897103428841,0.825881898403168,0,0,0.0248095821589231,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0106979506090283
2022-08-26 04:03:57.000,0,30,30,895.690246582031,-63.511173248291,-0.49309903383255,2.14326453208923,0.0307948496192694,355.496154785156,0,0,0,7.10703216100228E-06,0,102.876693725586,0.143524378538132,0.856520056724548,0,0,0.0249082278460264,0,0,0,0,0,0,0,0,0,0,1455.31506347656,0.0107015678659081

Any help would be appreciated. 🙂

tl;dr: how do I import/use the date and time in Bokeh when the source is formatted as follows: "2022-08-26 04:03:57"

UPDATE
I got it to be recognized as datetime! Still some kinks and formatting to figure out, but this is what did the trick for me:

x = df[tags.Date_Time_UTC[0]]
x = pd.to_datetime(x)

I also manually removed the trailing decimals from the seconds.

  2022-08-26 04:03:56.000 ->   2022-08-26 04:03:56

Further answers and tips are, of course, welcome. But I can continue for now!

Thanks for the help!

Asked By: MariTort

||

Answers:

bokeh version 2.4.3 seems to parse your second example date: bokeh.core.properties.Datetime().is_valid("2022-08-26 04:03:57") returns True. However, it doesn’t think your first example, "24/08/2022 01:40:32" is valid. This answer might help with that one, though? Using Bokeh datetime with Pandas

Answered By: DraftyHat

Because you have imported pandas the easiest way to parse a string to a datetime object is pd.to_datetime(). This function can also parse multiple formats using %f-string notation.

For example

pd.to_datetime('2022-01-01', format='%Y-%m-%d')

and

pd.to_datetime("01/01/2022 00:00:00", format='%d/%m/%Y %H:%M:%S')

will both result in the same datetime object.

If you want to parse a complete column of a pandas DataFrame you could use the .iloc method. Let’s say you want to parse the first column (zero based index).

df.iloc[:,0] = pd.to_datetime(df.iloc[:,0], format="%Y-%m-%d")

should work.

The example below is copied from here and if you want to read the bokeh tutorial, there is one which shows how to enable datetime axes.

Example

import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource

sample={'A':[pd.to_datetime(x, format='%Y-%m') for x in ['2012-01','2012-02','2012-03']],'B':[7,8,9]}
source = ColumnDataSource(sample)
p = figure(width=400, height=400,  x_axis_type='datetime')
p.line(x='A', y='B', source=source, line_width=2)
output_notebook()
show(p)

Output

line plot with datetime x-axis

FYI: The function pd.read_csv() has a argument parse_dates which calls pd.to_datetime while parsing the csv-file. But there are multiple options and the usage depends on the data. So you have to read the documentation because this would make this post really long.

Answered By: mosc9575
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.