How to set the scope of data displayed in a simple plotly bar graph
Question:
I'm working my way through understanding plotly/dash and immediately hit a problem I can't find an answer to. I broke my code down into its simplest form to isolate the problem.
There are 10 float values for x, and 10 for y:
import plotly.express as px
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
1548.03]
y = [36.9467, 2.7585, 4.5658, 7.5905, 18.9993, 3.6085, 4.3028, 0.02, 29.7094, 0.2]
fig = px.bar(x=x,y=y)
fig.show()
This yields: [image not included]
This is obviously wrong. Is feeding it float values that aren't perfectly sequential causing plotly to misinterpret the range over which to draw its bars? I don't even know how to phrase my question here. I just want to see 10 bars next to each other, scaled to fit my screen.
Right now it looks like it's plotting my float values on a sequential timeline, and since my numbers aren't sequential I wind up with these gaps.
Answers:
I'm posting this as an answer rather than a comment so that I can add a plot and the sorted data.
When the x values are scaled to a reasonable range, the plot looks as expected:
import plotly.express as px
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
1548.03]
y = [36.9467, 2.7585, 4.5658, 7.5905, 18.9993, 3.6085, 4.3028, 0.02, 29.7094, 0.2]
x_scaled = [(i - 1548)*100 for i in x]
print(f"scaled x:\n{x_scaled}")
fig = px.bar(x=x_scaled,y=y)
fig.show()
x_scaled.sort()
print(f"scaled and sorted x:\n{x_scaled}")
scaled x:
[35.999999999989996, 34.999999999990905, 31.999999999993634, 30.999999999994543,
29.999999999995453, 25.99999999999909, 25.0, 17.000000000007276, 11.999999999989086,
2.9999999999972715]
scaled and sorted x:
[2.9999999999972715, 11.999999999989086, 17.000000000007276, 25.0, 25.99999999999909,
29.999999999995453, 30.999999999994543, 31.999999999993634, 34.999999999990905,
35.999999999989996]
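The trailing digits in the output above (35.999999999989996 rather than 36) are ordinary binary floating-point rounding error, not a plotting problem. A minimal sketch, assuming whole-number positions are what you want after scaling, rounds each value; `x_scaled_clean` is a name introduced here for illustration only:

```python
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
     1548.03]

# Shift and scale as before, then round away the floating-point noise.
x_scaled_clean = [round((i - 1548) * 100) for i in x]
print(x_scaled_clean)
```

The rounded list can be passed to px.bar in place of x_scaled.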
To sort x together with y, a pandas DataFrame can be used:
import pandas as pd
zipped = list(zip(x, y))
df = pd.DataFrame(zipped, columns=['X', 'Y'])
df_sorted = df.sort_values(by='X')
df_sorted
         X        Y
9  1548.03   0.2000
8  1548.12  29.7094
7  1548.17   0.0200
6  1548.25   4.3028
5  1548.26   3.6085
4  1548.30  18.9993
3  1548.31   7.5905
2  1548.32   4.5658
1  1548.35   2.7585
0  1548.36  36.9467
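If pandas is not available, the same pairing can probably be done with the standard library alone. This is only a sketch of an alternative, not the approach used in this answer; `pairs`, `x_sorted` and `y_sorted` are illustrative names:

```python
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
     1548.03]
y = [36.9467, 2.7585, 4.5658, 7.5905, 18.9993, 3.6085, 4.3028, 0.02, 29.7094, 0.2]

# Pair each x with its y, sort the pairs by x, then split them apart again.
pairs = sorted(zip(x, y), key=lambda pair: pair[0])
x_sorted, y_sorted = (list(values) for values in zip(*pairs))

print(x_sorted)
print(y_sorted)
```

Because the pairs are sorted as a unit, every y value stays attached to its original x value.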
If that's still not what you expect, and the answer from u1234x1234 doesn't fit either, you may want to describe in more detail what you are looking for.
Here is another interpretation that may match "I just want to see 10 bars next to each other, scaled to fit my screen.", including some background.
Converting the x values to string before plotting gives:
import plotly.express as px
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
1548.03]
x_string = list(map(str, x))
print(x_string)
y = [36.9467, 2.7585, 4.5658, 7.5905, 18.9993, 3.6085, 4.3028, 0.02, 29.7094, 0.2]
fig = px.bar(x=x_string,y=y)
fig.show()
['1548.36', '1548.35', '1548.32', '1548.31', '1548.3', '1548.26', '1548.25',
'1548.17', '1548.12', '1548.03']
After the conversion to string, the x axis list carries the same information for plotting as e.g. ['Joe','Jane','Julia','Alfons',...], so the numerical information is "removed".
Without the numerical information, the common (and sensible) way to plot is simply to place the strings on the x axis one after the other, in the order given.
You can then even mix string "numbers" with "normal" strings; try the following as the x axis:
x = ['1548.36', 'Julia', '1548.32', 'Zaphod', 'Joe', '1548.26', '1548.25',
'1548.17', '1548.12', '1548.03']
Since the numerical information is "removed" by the string conversion you have to take care of any sorting in advance if that’s intended.
If you want the x axis to still be sorted according to the initial numeric sequence, a pandas DataFrame, for example, can be used for that:
import plotly.express as px
import pandas as pd
x = [1548.36, 1548.35, 1548.32, 1548.31, 1548.3, 1548.26, 1548.25, 1548.17, 1548.12,
1548.03]
y = [36.9467, 2.7585, 4.5658, 7.5905, 18.9993, 3.6085, 4.3028, 0.02, 29.7094, 0.2]
zipped = list(zip(x, y))
df = pd.DataFrame(zipped, columns=['X', 'Y'])
df_sorted = df.sort_values(by='X')
df_sorted['X'] = df_sorted['X'].astype(str)
fig = px.bar(df_sorted, x='X', y='Y')
fig.show()
Here is another angle on the explanation: let's plot a sine function.
Note, however, that it is not actually a continuous sine function but a sampled one, with the sample points connected by a line.
Code for the line and scatter plots:
import numpy as np
import random
import plotly.express as px
import plotly.graph_objects as go
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig = px.line(x=x,y=y)
fig.show()
fig = px.scatter(x=x,y=y)
fig.show()
Note: Every plot is sampled. You can think of it like this:
- To "plot" anything, a colored dot has to be placed.
- A continuous function has infinitely many points, so actually plotting it would require placing infinitely many dots.
- For plots that look like a continuous line, the samples are simply connected. Given the display resolution (dpi) and the way the human eye and brain work, the result looks continuous when the sampling rate is high enough.
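To put a rough number on "high enough", the sketch below measures how far the straight segments between neighbouring samples stray from the true sine at the midpoint of each segment. `max_midpoint_error` is a hypothetical helper written for this illustration, and the 0 to 10 range simply mirrors the np.linspace call above:

```python
import math

def max_midpoint_error(n_samples, start=0.0, stop=10.0):
    """Largest distance between sin(x) and the connecting line, checked at segment midpoints."""
    step = (stop - start) / (n_samples - 1)
    xs = [start + k * step for k in range(n_samples)]
    worst = 0.0
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        # Height of the straight line between the two samples at the midpoint.
        line_value = (math.sin(left) + math.sin(right)) / 2
        worst = max(worst, abs(math.sin(mid) - line_value))
    return worst

coarse = max_midpoint_error(10)   # 10 samples: visibly angular
fine = max_midpoint_error(100)    # 100 samples, as in the plot above
print(coarse, fine)
```

With 100 samples the deviation is far smaller than with 10, which is why the connected line reads as a smooth curve.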
To mess with the sampling let’s remove some of those sampling points and plot again:
random.seed(1424836)
random_index = random.sample(range(0,100), 75)
x_reduced = np.delete(x, random_index)
y_reduced = np.delete(y, random_index)
fig1 = px.scatter(x=x_reduced,y=y_reduced)
fig2 = px.line(x=x_reduced,y=y_reduced)
fig = go.Figure(data = fig1.data + fig2.data)
fig.show()
Because the x values keep their numeric spacing (their 'distribution'), that still looks roughly like a sine function.
But when we now do the string conversion and plot again (so no distribution, just one "number string" after the next), it looks quite different:
# numbers to strings, floats with 2 digits after the decimal point
# (resolution doesn't matter, but the x axis labels are more readable)
x_reduced_string = ["%.2f" % i for i in x_reduced]
print(x_reduced_string)
fig1 = px.scatter(x=x_reduced_string,y=y_reduced)
fig2 = px.line(x=x_reduced_string,y=y_reduced)
fig = go.Figure(data = fig1.data + fig2.data)
fig.show()
['0.20', '0.30', '0.81', '1.21', '1.31', '1.52', '1.62', '2.42', '2.63', '3.33', '4.55', '5.45', '5.76', '5.86', '5.96', '6.67', '7.88', '8.38', '8.48', '8.59', '8.69', '9.09', '9.39', '9.49', '9.80']
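The distortion comes from the uneven gaps between the remaining x values, which a string axis replaces with equal steps. A small sketch using the labels printed above makes this visible; `gaps` and `category_positions` are illustrative names, and the equal-step positions are an assumption about how a categorical axis lays out its labels:

```python
labels = ['0.20', '0.30', '0.81', '1.21', '1.31', '1.52', '1.62', '2.42', '2.63',
          '3.33', '4.55', '5.45', '5.76', '5.86', '5.96', '6.67', '7.88', '8.38',
          '8.48', '8.59', '8.69', '9.09', '9.39', '9.49', '9.80']

# Numeric axis: neighbouring points are separated by their real distance.
values = [float(label) for label in labels]
gaps = [round(b - a, 2) for a, b in zip(values, values[1:])]

# String axis (assumed): each label gets the next slot, one step apart.
category_positions = list(range(len(labels)))

print("smallest gap:", min(gaps), "largest gap:", max(gaps))
print("category positions:", category_positions)
```

The real gaps differ by more than a factor of ten, while the category positions are all exactly one step apart, so the shape of the curve is stretched and squeezed accordingly.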
The same effect can be shown with a series of bar plots of the above data (plots not included here).
So keep that in mind when generating the bar plot as you intended in your question.
When you do this, I'd recommend explicitly highlighting or mentioning it, because others who see numbers on the x axis will probably expect the default numeric plot.