Why is plotly express so much more performant than plotly graph_objects?

Question:

I’m visualizing scatterplots with between 400K and 2.5M points. I expected to need to downsample before visualizing, but to see just how much, I ran a pilot test with a 400K dataset in plotly express, and the plot popped up quickly, beautifully, and responsively.

To make the interactive figure I really need to use plotly.graph_objects, since I need multiple traces with different colorscales. So I made basically the same graph with graph_objects, and it wasn’t just slower, it crashed my computer.

I’d really like to downsample as little as possible, and I’m surprised by the sheer performance difference between these two approaches, so it boils down to my question:

Why is there such a performance difference, and is it possible to change layout/figure/whatever parameters in graph_objects to close the gap?

Here is a snippet to show what I mean by basically the same graph:

graph_objects

        import plotly.graph_objects as go

        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x=x_values,
            y=y_values,
            opacity=opacity,
            marker={'size': size, 'color': community, 'colorscale': colorscale},
        ))

express

        import plotly.express as px

        pacmap_map = px.scatter(x=x_values, y=y_values, color=community,
                                color_continuous_scale=colorscale, opacity=opacity)
        pacmap_map.update_traces(marker={'size': size})

I would have expected performance to be identical, or at least in the same ballpark, but express works like a dream while graph_objects crashes the Jupyter kernel and whatever IDE it is running from.

Asked By: psychicesp


Answers:

Running the following simple example:

import numpy as np
import plotly.graph_objects as go
import plotly.express as px

x = np.linspace(-2, 2, 100000)
y = np.cos(x)

fig = go.Figure(data=[go.Scatter(x=x, y=y)])
fig2 = px.scatter(x=x, y=y)

type(fig.data[0]), type(fig2.data[0])
# out: (plotly.graph_objs._scatter.Scatter, plotly.graph_objs._scattergl.Scattergl)

As you can see, plotly express switches to Scattergl when the number of points is higher than some threshold (its render_mode argument defaults to "auto"), whereas graph_objects gives you exactly the trace type you ask for, here Scatter. Scattergl draws with WebGL on an HTML5 canvas, so rendering is GPU-accelerated and therefore efficient. Scatter instead creates SVG elements that get inserted into the current document, which consumes much more memory.

Answered By: Davide_sd