Get Bokeh's selection in notebook

Question:

I’d like to select some points on a plot (e.g. from box_select or lasso_select) and retrieve them in a Jupyter notebook for further data exploration. How can I do that?

For instance, in the code below, how can I export the selection from Bokeh to the notebook? If I need a Bokeh server, that is fine too (I saw in the docs that I could add “two-way communication” with a server, but I did not manage to adapt the example to reach my goal).

from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource

output_notebook()

x = [random() for x in range(1000)]
y = [random() for y in range(1000)]

s = ColumnDataSource(data=dict(x=x, y=y))
fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)

show(fig)
# Select on the plot
# Get selection in a ColumnDataSource, or index list, or pandas object, or etc.?

Notes

  • I saw some related questions on SO, but most answers are for outdated versions of Bokeh (0.x or 1.x); I’m looking for an answer for v>=2.
  • I am open to solutions with other visualization libraries like Altair, etc.
Asked By: Keldorn

||

Answers:

To select some points on a plot and retrieve them in a Jupyter notebook, you can use a CustomJS callback.

Within the CustomJS callback's JavaScript code, you can access the Jupyter notebook kernel using IPython.notebook.kernel (a global provided by the classic Jupyter Notebook front end). Then, you can use kernel.execute(python_code) to run Python code and (for example) export data from the JavaScript call to the Jupyter notebook.

So, a Bokeh server is not necessary for two-way communication between the Bokeh plot and the Jupyter notebook.

Below, I have extended your example code to include a CustomJS callback that triggers on a selection geometry event in the figure. Whenever a selection is made, the callback runs and exports the indices of the selected data points to a variable within the Jupyter notebook called selected_indices.

To obtain a ColumnDataSource that contains the selected data points, selected_indices is looped through to create lists of the selected x and y values, which are then passed to the ColumnDataSource constructor.

from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource
from bokeh.models.callbacks import CustomJS

output_notebook()

x = [random() for x in range(1000)]
y = [random() for y in range(1000)]

s = ColumnDataSource(data=dict(x=x, y=y))

fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)

# make a custom javascript callback that exports the indices of the selected points to the Jupyter notebook
callback = CustomJS(args=dict(s=s), 
                    code="""
                         console.log('Running CustomJS callback now.');
                         var indices = s.selected.indices;
                         var kernel = IPython.notebook.kernel;
                         // wrap in brackets so Python always receives a list, even for zero or one index
                         kernel.execute("selected_indices = [" + indices + "]");
                         """)

# set the callback to run when a selection geometry event occurs in the figure
fig.js_on_event('selectiongeometry', callback)

show(fig)
# make a selection using a selection tool 

# inspect the selected indices
selected_indices

# use the indices to create lists of the selected values
x_selected = [s.data['x'][i] for i in selected_indices]
y_selected = [s.data['y'][i] for i in selected_indices]

# make a column data source containing the selected values
selected = ColumnDataSource(data=dict(x=x_selected, y=y_selected))

# inspect the selected data
selected.data
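
If the source has more than two columns, the same lookup can be written once for all of them. The following is only a sketch: select_rows is a hypothetical helper (not part of Bokeh), the small data dict stands in for s.data, and selected_indices is assumed to be an iterable of integers as exported by the callback.

```python
def select_rows(data, indices):
    """Return a new column dict holding only the rows at the given indices."""
    return {name: [column[i] for i in indices] for name, column in data.items()}

# Stand-ins for s.data and for the selected_indices exported by the callback
data = {"x": [0.1, 0.2, 0.3, 0.4], "y": [1.0, 2.0, 3.0, 4.0]}
selected_indices = [0, 2]

selected_data = select_rows(data, selected_indices)
print(selected_data)
```

The resulting dict can be passed to ColumnDataSource(data=...) or to pandas.DataFrame(...) for further exploration.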
Answered By: Steve

If you have a Bokeh server running, you can access the selection indices of a data source via datasource.selected.indices. The following is an example of how you would do this (modified from the official Embed a Bokeh Server Into Jupyter example):

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import show, output_notebook

from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

output_notebook()

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)

def bkapp(doc):

    plot = figure(x_axis_type='datetime', y_range=(0, 25), tools="lasso_select",
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.circle('time', 'temperature', source=source)

    doc.add_root(plot)

show(bkapp)

After you have selected something, you can get the selected data as follows:

selected_data = df.iloc[source.selected.indices]
print(selected_data)

This should print the selected rows.

While out of scope for this question, note that there is a disconnect between Jupyter notebooks and the interactive nature of Bokeh apps: this solution introduces state that is not saved by the notebook, so restarting the kernel and executing all cells does not give the same results. One way to tackle this would be to persist the selection with pickle:

import os
import pickle

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
if os.path.isfile("selection.pickle"):
    with open("selection.pickle", mode="rb") as f:
        source.selected.indices = pickle.load(f)

... # interactive part

with open("selection.pickle", mode="wb") as f:
    pickle.dump(source.selected.indices, f)
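
The same idea can be wrapped in two small helpers. This is a sketch under stated assumptions: save_selection and load_selection are hypothetical names, and JSON is swapped in for pickle here because the indices are plain integers and the file stays human-readable. A temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

def save_selection(indices, path):
    """Write the selected indices to a JSON file."""
    with open(path, "w") as f:
        json.dump([int(i) for i in indices], f)

def load_selection(path):
    """Read the indices back, or return an empty list if nothing was saved yet."""
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        return json.load(f)

# Demonstration with a throwaway file; in the notebook the list would come
# from source.selected.indices
path = os.path.join(tempfile.mkdtemp(), "selection.json")
before = load_selection(path)      # nothing saved yet
save_selection([3, 7, 42], path)
restored = load_selection(path)
print(before, restored)
```

In the notebook, the restored list would be assigned to source.selected.indices before the plot is shown.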
Answered By: syntonym

The ColumnDataSource object has an on_change method, with which you can register a Python callback. Thus, the JS callback is not necessary. An example can be found here.

As for @syntonym’s answer, it is also not necessary to embed a Bokeh server manually. One can wrap the Bokeh figure inside a HoloViz Panel pane, which will manage the server for you. Panel is in fact built on top of Bokeh.

Let’s reuse @syntonym’s example:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature

import panel as pn

# output_notebook() is not necessary with Panel
pn.extension()  # load the Panel notebook extension

source = ColumnDataSource(
  data = sea_surface_temperature.copy()[:100]
)

plot = figure(x_axis_type='datetime', y_range=(0, 25), tools="lasso_select",
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
plot.circle('time', 'temperature', source=source)

This looks quite cumbersome because you have to construct the ColumnDataSource object yourself.
A better way would be to let HoloViews handle this for you.

import holoviews as hv
hv.extension('bokeh') # ask holoviews to use Bokeh

pt  = hv.Points( 
  data  = sea_surface_temperature.copy()[:100],  
  kdims = ['time', 'temperature' ] 
)

plot   = hv.render(pt) # HoloViews renders the Bokeh figure
source = plot.select_one({'type': ColumnDataSource}) # get the single ColumnDataSource object created by HoloViews

Finally, you can register the callback and display the figure:

def selection_cb(attr, old, new):
    """ callback for changed selection """
    print('The',attr,'of selection changed from', old, 'to', new)

source.selected.on_change('indices', selection_cb)

pn.pane.Bokeh( plot ) # show the figure
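
Printing is enough to see that the callback fires, but for further exploration it is usually handier to keep the indices in a variable that later cells can read. A minimal sketch, assuming the standard (attr, old, new) callback signature; selection_log, remember_selection and latest_selection are hypothetical names, and the two direct calls only simulate selection events so the snippet runs without a plot.

```python
selection_log = []  # hypothetical store, one entry per selection event

def remember_selection(attr, old, new):
    """Callback with the (attr, old, new) signature that records each selection."""
    selection_log.append(list(new))

def latest_selection():
    """Return the most recent selection, or an empty list if none was made."""
    return selection_log[-1] if selection_log else []

# In the notebook, the callback would be registered with:
#   source.selected.on_change('indices', remember_selection)
# Here two selection events are simulated by calling it directly.
remember_selection('indices', [], [1, 2, 3])
remember_selection('indices', [1, 2, 3], [5, 8])
print(latest_selection())
```

A later cell can then call latest_selection() and use the result to index the original DataFrame.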
Answered By: gdlmx