What does "KeyError(f"None of [{key}] are in the [{axis_name}]")" mean?
Question:
I have a sample data frame created from the columns of two different data frames.
The code for that looks like this:
import pandas as pd
pvgis_df = pd.read_csv(pvgis_file)
month = pd.Series(pvgis_df["Month"],)
pvgis_generated = pd.Series(pvgis_df["Avg Monthly Energy Production"],)
pvoutput_generated = pd.Series(pvoutput_df["Generated (KWh)"],)
frame = {
"Month": month, "PVGIS Generated": pvgis_generated,
"PVOUTPUT Generated": pvoutput_generated
}
joined_df = pd.DataFrame(frame)
And the output looks like this:
Month PVGIS Generated PVOUTPUT Generated
0 1.0 107434.69 80608.001709
1 2.0 112428.41 106485.000610
2 3.0 153701.40 132772.003174
3 4.0 179380.47 148830.993652
4 5.0 200402.90 177705.001831
5 6.0 211507.83 173893.005371
6 7.0 233932.95 182261.993408
7 8.0 223986.41 174046.005249
8 9.0 178682.94 142970.993042
9 10.0 142141.02 107087.997437
10 11.0 108498.34 73358.001709
11 12.0 101886.06 73003.997803
Now I want to plot the other columns against Month, and my code looks like this:
from matplotlib import pyplot as plt
label = [
df["Month"], df["PVGIS Generated"],
df["PVOUTPUT Generated"]
]
figure_title = f"{plt.xlabel} VS {plt.ylabel}"
fig = plt.figure(figure_title)
fig.set_size_inches(13.6, 7.06)
plot_no = df.shape
filename = f"{folder}_joined"
color="blue"
plt.legend()
plt.xlabel("Month")
plt.ylabel("Generated")
plt.grid()
plt.margins(x=0)
plt.ticklabel_format(useOffset=False, axis="y", style="plain")
plt.bar(df[label[0]], df[label[1]])
plt.bar(df[label[0]], df[label[2]])
plt.show()
plt.close()
When I run it, I get a KeyError:
KeyError: "None of [Float64Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], dtype='float64')] are in the [columns]"
I have tried reindexing and making the Month column the index, but I keep running into different versions of KeyError.
What am I missing?
Does this mean the column is missing from the dataframe? If so, how come?
Answers:
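The error occurs because label holds the dataframe's Series themselves rather than the column names. df[label[0]] therefore asks pandas to look up every month value (1.0, 2.0, ...) as a column label, and none of them is one. List only the column names instead: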
label = ["Month", "PVGIS Generated", "PVOUTPUT Generated"]
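To see why a Series used as a key produces this error, here is a minimal sketch. The three-row frame is made up for illustration (values borrowed from the question's table), and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Small illustrative frame; values copied from the first rows of the question's table.
df = pd.DataFrame({
    "Month": [1.0, 2.0, 3.0],
    "PVGIS Generated": [107434.69, 112428.41, 153701.40],
})

month = df["Month"]  # a Series of values, not a column name

# Indexing with a Series makes pandas treat each value (1.0, 2.0, 3.0)
# as a column label, and no column has such a label.
raised = False
try:
    df[month]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Indexing with the column name returns the column as expected.
print(df["Month"].tolist())  # [1.0, 2.0, 3.0]
```

So the column is not missing from the dataframe; the key passed to df[...] was simply the column's contents rather than its name.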
In any case, I suggest using matplotlib's object-oriented interface to draw the plots.
Complete Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Synthetic stand-in data with the same column names as the question
np.random.seed(42)
df = pd.DataFrame({'Month': np.arange(1, 13, 1)})
df['PVGIS Generated'] = 150000 + 30000*np.random.randn(len(df))
df['PVOUTPUT Generated'] = 120000 + 40000*np.random.randn(len(df))
fig, ax = plt.subplots()
fig.set_size_inches(13.6, 7.06)
# Shift each series by half a bar width so the two bars sit side by side
width = 0.3
ax.bar(x = df['Month'] - width/2, height = df['PVGIS Generated'], width = width, align = 'center', label = 'PVGIS Generated')
ax.bar(x = df['Month'] + width/2, height = df['PVOUTPUT Generated'], width = width, align = 'center', label = 'PVOUTPUT Generated')
ax.set_xlabel('Month')
ax.set_ylabel('Generated')
ax.set_title('Month VS Generated')
# Show plain numbers on the y axis instead of offset/scientific notation
ax.ticklabel_format(useOffset = False, axis = 'y', style = 'plain')
ax.legend()
ax.grid()
plt.show()
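As an alternative that is not part of the answer above, pandas' own plotting wrapper can draw the same kind of grouped bar chart when the columns are passed by name. This is only a sketch: the three rows are typed in by hand from the question's table, and it assumes matplotlib is installed as the plotting backend.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of the question's table, entered by hand for illustration.
df = pd.DataFrame({
    "Month": [1, 2, 3],
    "PVGIS Generated": [107434.69, 112428.41, 153701.40],
    "PVOUTPUT Generated": [80608.001709, 106485.000610, 132772.003174],
})

fig, ax = plt.subplots(figsize=(13.6, 7.06))

# Columns are referred to by name; pandas draws one group of bars per Month value.
df.plot.bar(x="Month", y=["PVGIS Generated", "PVOUTPUT Generated"], ax=ax, rot=0)

ax.set_ylabel("Generated")
# Plain numbers on the y axis instead of offset/scientific notation.
ax.ticklabel_format(useOffset=False, axis="y", style="plain")
plt.show()
```

Here the bar offsets and the legend are handled by pandas, at the cost of less control over the placement of individual bars than the explicit ax.bar calls give.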