create bokeh vbar but error " (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value"

Question:

I’m trying to create vbar graph like from bokeh documentation on graph :mean MPG by #cylinders and manufacturer (https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#userguide-categorical), but keep getting error (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value".

Here is my csv file:

 id,name,value,vendorname
 1, contract abc, "2,000,500.00", company x
 2, contract bcd, "1,300,500.00", company y
 3, contract cde, "1,344,000.00", company x
 4, contract def, "400,000.00", company z
 5, contract efg, "566,000.00", company s

and so on….

here is the code:

 from bokeh.io import show, output_file
 from bokeh.palettes import Viridis256
 from bokeh.transform import factor_cmap
 import pandas as pd
 from bokeh.core.properties import value
 from bokeh.models import FactorRange, ColumnDataSource
 from bokeh.palettes import Spectral5
 from bokeh.plotting import figure, show
 from bokeh.embed import components  

 df = pd.read_csv('contract.csv')
 group = df.groupby(['vendorname', 'name'])
 index_cmap = factor_cmap('vendorname_name', palette=Viridis256, factors=sorted(df.vendorname.unique()), end=1)
 p = figure(plot_width=1000, plot_height=1500, title="Value Contract by # Contract and Vendor", x_range=group, toolbar_location=None, 
       tooltips=[("Value", "@value"), ("vendorname, name", "@vendorname_name")])
 p.vbar(x='vendorname_name', top='value', width=1, source=group, line_color="white", fill_color=index_cmap, )
 
 p.y_range.start = 0
 p.x_range.range_padding = 0.05
 p.xgrid.grid_line_color = None
 p.xaxis.axis_label = "Contract grouped by # Vendor"
 p.xaxis.major_label_orientation = 1.2
 p.outline_line_color = None
 output_file("contract.html")
 show(p)
Asked By: ji.sein

||

Answers:

When you perform a groupby on a dataframe, there are only aggregates for the group available:

In [3]: source = ColumnDataSource(group)

In [4]: source.data
Out[4]:
{'id_count': array([1, 1, 1], dtype=object),
 'id_unique': array([1, 1, 1], dtype=object),
 'id_top': array([' "2', ' "1', ' "1'], dtype=object),
 'id_freq': array([1, 1, 1], dtype=object),
 'value_count': array([1, 1, 1], dtype=object),
 'value_unique': array([1, 1, 1], dtype=object),
 'value_top': array(['500.00"', '000.00"', '500.00"'], dtype=object),
 'value_freq': array([1, 1, 1], dtype=object),
 'vendorname_name': array([(' company x', '000'), (' company x', '344'),
        (' company y', '300')], dtype=object)}

But notice these values are junk, because your read_csv is also not working. In your data, the value column is an array of strings of poorly formatted numbers. The commas in the numbers confuse pandas with the basic code you have above. You need to follow the advice in https://stackoverflow.com/a/22137890/3406693 to read in that column as actual numbers. However, your CSV is malformed, with extra space around the commas, this also confuses pandas. I was not able to get pandas to read in the dataframe correctly until i fixed up the CSV to remove the spaces:

id,name,value,vendorname
1,contract abc,"2,000,500.00",company x
2,contract bcd,"1,300,500.00",company y
3,contract cde,"1,344,000.00",company x
4,contract def,"400,000.00",company z
5,contract efg,"566,000.00",company s

Then, this read_csv command

df = pd.read_csv('contract.csv', thousands=",", quotechar='"', quoting=1)

Yields an intelligible dataframe:

In [3]: df
Out[3]:
   id          name      value vendorname
0   1  contract abc  2000500.0  company x
1   2  contract bcd  1300500.0  company y
2   3  contract cde  1344000.0  company x
3   4  contract def   400000.0  company z
4   5  contract efg   566000.0  company s

It’s possible you could get around having to fix up the CSV by telling Pandas explicitly what the column types are.

Anyway after all this you can plot using value_mean from the group:

 p.vbar(x='vendorname_name', top='value_meamn', ...) # use value_mean

Which yields:

enter image description here

Answered By: bigreddot
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.