Power BI Python visual doesn't plot all available datapoints

Question:

I am getting a strange result in a Power BI Python visual. I am working with the diamonds dataset (sns.load_dataset('diamonds')). I have this code in the Python visual editor:

import seaborn as sns
import matplotlib.pyplot as plt
sns.histplot(dataset['carat'], bins = 50)
plt.show()

However, I am getting the visual below. It is truncated for most of the values; it should be roughly bell-shaped, with the tallest bar reaching about 11,000:

enter image description here

I have tried a seaborn swarmplot and that looks OK, so it does not seem to be a data type issue. The dataset has 53,940 rows, well below the 150,000-row maximum. Matplotlib's plt.hist(dataset['carat']) also returns the truncated visual, so it does not look like a seaborn issue.

Asked By: GivenX

Answers:

The Python visual warns you that it will drop duplicate rows, and it also shows the code it uses to build the dataframe your plot is actually based on:

enter image description here
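
To see why this truncates the histogram, here is a minimal sketch of the effect outside Power BI. The small frame is a made-up stand-in for the carat column, and the drop_duplicates() call is my assumption of what the visual's preamble amounts to when carat is the only field:

```python
import pandas as pd

# Hypothetical stand-in for the diamonds 'carat' column: many repeated values.
carat = pd.DataFrame({"carat": [0.3] * 5 + [0.7] * 3 + [1.0] * 2})

# Assumed equivalent of the visual's preamble when 'carat' is the only field:
# identical rows are collapsed into one.
dataset = carat.drop_duplicates()

print(len(carat))    # number of rows before de-duplication
print(len(dataset))  # number of rows the plotting code actually sees
```

With only one row left per distinct carat value, the histogram counts distinct values rather than diamonds, which is why the bars come out far too short.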

If you add an index column in Power Query before loading the data, and then add both the index column (set to not summarize) and the carat column to the visual, every row is unique and no rows are removed as duplicates.

Here I have used your exact code, but the visual evaluates an incoming data frame with all the rows instead of only distinct carat values:

enter image description here
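
As a rough illustration of why the index column helps, the sketch below repeats the same de-duplication on a frame that also carries an index. The column name Index and the sample values are assumptions for the example, not something Power BI requires:

```python
import pandas as pd

# Hypothetical stand-in for the frame the visual receives once an index
# column is added next to 'carat'. 'Index' is an assumed column name.
raw = pd.DataFrame({
    "Index": range(10),
    "carat": [0.3] * 5 + [0.7] * 3 + [1.0] * 2,
})

# Every (Index, carat) pair is unique, so de-duplication removes nothing.
dataset = raw.drop_duplicates()

# Plot only the carat column; the index exists just to keep rows distinct.
values = dataset["carat"]
print(len(values))
```

Inside the visual itself nothing needs to change: sns.histplot(dataset['carat'], bins = 50) still reads only the carat column and simply ignores the extra index column.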

Answered By: Marcus