Best fit line semilog scale with linear y-axis and log x-axis

Question

My question is closely related to the following topic on SO:
Fit straight line on semi-log scale with Matplotlib

However, I want to create a best fit line in a chart where the X-axis is logarithmic and the Y-axis is linear.

import matplotlib.pyplot as plt
import numpy as np

plt.scatter(players['AB'], players['Average'], c='black', alpha=0.5)

p = np.polyfit(players['AB'], players['Average'], 1)
plt.plot(players['AB'], p[0] + p[1] * np.log(players['AB']), color='r', linestyle='dashed', alpha=0.7)

plt.xscale('log')
plt.xlim(1, 25000)
plt.ylim(-0.05, 0.60)
plt.xlabel('Number of at-bats (AB)')
plt.ylabel('Batting Average')
plt.show()

This gives me the following:

What am I doing wrong? Thanks

EDIT

  p = np.polyfit(np.log(players['AB']), players['Average'], 1)
  plt.plot(players['AB'], p[0] + p[1] * np.log(players['AB']), color='r', linestyle='dashed', alpha=0.7)

This still gives me the wrong best fit:

Asked By: HJA24


Answer 1

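np.polyfit returns the coefficients from the highest degree down, so for a degree-1 fit p[0] is the slope and p[1] is the intercept. I believe you need to do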

p = np.polyfit(np.log(players['AB']), players['Average'], 1)
plt.plot(players['AB'], p[0] * np.log(players['AB']) + p[1])

This gives a fit of the form y = p[0] * log(x) + p[1], which plots as a straight line when the x-axis is logarithmic and the y-axis is linear. Here is a complete example demonstrating this:

import matplotlib.pyplot as plt
import numpy as np

n = 100
np.random.seed(1)                     # reproducible noise
x = np.linspace(1, 10000, n)          # placeholder; x[1:] is overwritten below
y = np.zeros(n)
rand = np.random.randn(n)
for ii in range(1, n):
    x[ii] = 10**(float(ii)/10.0)      # Create semi-log linear data
    y[ii] = rand[ii]*10 + float(ii)   # with some noise in the y values

plt.scatter(x, y)
p = np.polyfit(np.log(x), y, 1)       # p[0] is the slope, p[1] the intercept
plt.semilogx(x, p[0] * np.log(x) + p[1], 'g--')   # semilogx also sets the log x-axis

plt.show()

For the sample data generated this gives you

Answered By: William Miller
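
As a quick cross-check of the coefficient order used in the answer above, here is a minimal sketch that needs no plotting. The synthetic data, `true_slope` and `true_intercept` are assumptions made up for illustration (they stand in for players['AB'] and players['Average']). It also uses np.polyval, which takes its coefficients in the same highest-degree-first order as np.polyfit, so the slope and intercept cannot be swapped by hand.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration only):
# y is exactly linear in log(x).
true_slope, true_intercept = 0.03, 0.10
x = np.logspace(0, 4, 50)                 # 1 ... 10000, evenly spaced in log
y = true_slope * np.log(x) + true_intercept

# Degree-1 fit against log(x); coefficients come back highest degree first.
p = np.polyfit(np.log(x), y, 1)
slope, intercept = p

# np.polyval expects the same ordering as np.polyfit returns.
y_fit = np.polyval(p, np.log(x))

print(np.allclose([slope, intercept], [true_slope, true_intercept]))
print(np.allclose(y_fit, slope * np.log(x) + intercept))
```

Both checks should print True. In the plotting code, plt.plot(x, np.polyval(p, np.log(x))) would then draw the same line as p[0] * np.log(x) + p[1].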
