Best fit line semilog scale with linear y-axis and log x-axis
Question:
My question is closely related to the following topic on SO:
Fit straight line on semi-log scale with Matplotlib
However, I want to create a best fit line in a chart where the X-axis is logarithmic and the Y-axis is linear.
import matplotlib.pyplot as plt
import numpy as np
plt.scatter(players['AB'], players['Average'], c='black', alpha=0.5)
p = np.polyfit(players['AB'], players['Average'], 1)
plt.plot(players['AB'], p[0] + p[1] * np.log(players['AB']), color='r', linestyle='dashed', alpha=0.7)
plt.xscale('log')
plt.xlim(1, 25000)
plt.ylim(-0.05, 0.60)
plt.xlabel('Number of at-bats (AB)')
plt.ylabel('Batting Average')
plt.show()
This gives me the following:
What am I doing wrong? Thanks
EDIT
p = np.polyfit(np.log(players['AB']), players['Average'], 1)
plt.plot(players['AB'], p[0] + p[1] * np.log(players['AB']), color='r', linestyle='dashed', alpha=0.7)
Answers:
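I believe you need to fit against the logarithm of x and then evaluate the line with the coefficients in the order np.polyfit returns them (slope first, then intercept):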
p = np.polyfit(np.log(players['AB']), players['Average'], 1)
plt.plot(players['AB'], p[0] * np.log(players['AB']) + p[1])
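This gives a straight line when plotted with a logarithmic x-axis. np.polyfit returns coefficients from highest degree to lowest, so p[0] is the slope with respect to log(x) and p[1] is the intercept. Here is a complete example demonstrating this: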
import matplotlib.pyplot as plt
import numpy as np
n = 100
np.random.seed(1)
x = np.linspace(1,10000,n)
y = np.zeros(n)
rand = np.random.randn(n)
for ii in range(1, n):
    x[ii] = 10**(float(ii)/10.0)  # Create semi-log linear data
    y[ii] = rand[ii]*10 + float(ii)  # with some noise in the y values
plt.scatter(x,y)
p = np.polyfit(np.log(x), y, 1)
plt.semilogx(x, p[0] * np.log(x) + p[1], 'g--')
plt.xscale('log')
plt.show()
For the sample data generated, this gives a dashed green best-fit line that appears straight on the logarithmic x-axis.
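If you want to reuse this, the same idea can be wrapped in a small helper. The following is only a sketch under some assumptions: the name `fit_semilogx`, the synthetic data, and the choice of `log10` are mine and not from the question. The base of the logarithm only rescales the slope, so any base works as long as the fit and the evaluation use the same one. Points with x <= 0 are dropped because their logarithm is undefined.

```python
import numpy as np

def fit_semilogx(x, y):
    """Least-squares fit of y = slope * log10(x) + intercept.

    Hypothetical helper for illustration; points with x <= 0 are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = x > 0  # log is undefined for x <= 0
    # polyfit returns the highest-degree coefficient first: (slope, intercept)
    slope, intercept = np.polyfit(np.log10(x[mask]), y[mask], 1)
    return slope, intercept

# Synthetic data, made up for illustration: y = 2.5 * log10(x) + 1 plus small noise
rng = np.random.default_rng(0)
x = np.logspace(0, 4, 50)
y = 2.5 * np.log10(x) + 1.0 + rng.normal(scale=0.05, size=x.size)

slope, intercept = fit_semilogx(x, y)

# Evaluate the fitted line on a sorted, log-spaced grid so it can be drawn
# as a single clean line regardless of the order of the original data
x_line = np.logspace(np.log10(x.min()), np.log10(x.max()), 200)
y_line = slope * np.log10(x_line) + intercept
```

To draw it, you could pass `x_line` and `y_line` to `plt.semilogx(x_line, y_line, 'g--')` after the scatter call. Evaluating on a separate grid rather than on the raw x values is a design choice that avoids depending on how the data happens to be ordered.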