What does "KeyError(f"None of [{key}] are in the [{axis_name}]")" mean?

Question:

I have a sample data frame created from the columns of two different data frames.
The code for that looks like this:

import pandas as pd
pvgis_df = pd.read_csv(pvgis_file)

month = pd.Series(pvgis_df["Month"],)

pvgis_generated = pd.Series(pvgis_df["Avg Monthly Energy Production"],)

pvoutput_generated = pd.Series(pvoutput_df["Generated (KWh)"],)

frame = {
   "Month": month, "PVGIS Generated": pvgis_generated, 
   "PVOUTPUT Generated": pvoutput_generated
}
joined_df = pd.DataFrame(frame)

And the output looks like this:

    Month  PVGIS Generated  PVOUTPUT Generated
0     1.0        107434.69        80608.001709
1     2.0        112428.41       106485.000610
2     3.0        153701.40       132772.003174
3     4.0        179380.47       148830.993652
4     5.0        200402.90       177705.001831
5     6.0        211507.83       173893.005371
6     7.0        233932.95       182261.993408
7     8.0        223986.41       174046.005249
8     9.0        178682.94       142970.993042
9    10.0        142141.02       107087.997437
10   11.0        108498.34        73358.001709
11   12.0        101886.06        73003.997803

Now I want to plot the other columns against Month, and my code looks like this:

from matplotlib import pyplot as plt

label = [
  df["Month"], df["PVGIS Generated"], 
  df["PVOUTPUT Generated"]
]

figure_title = f"{plt.xlabel} VS {plt.ylabel}"
fig = plt.figure(figure_title)
fig.set_size_inches(13.6, 7.06) 
plot_no = df.shape
filename = f"{folder}_joined"
color="blue"
plt.legend()
plt.xlabel("Month")
plt.ylabel("Generated")
plt.grid()
plt.margins(x=0)
plt.ticklabel_format(useOffset=False, axis="y", style="plain")
plt.bar(df[label[0]], df[label[1]])
plt.bar(df[label[0]], df[label[2]])

plt.show()
plt.close()

When I run it, I get a KeyError:

KeyError: "None of [Float64Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0], dtype='float64')] are in the [columns]"

I have tried reindexing and making the Month column the index, but I keep running into different versions of KeyError.

What am I missing?
Does this mean the column is missing from the DataFrame? If so, how?

Asked By: Nnaobi


Answers:

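The error is caused by the fact that label contains the DataFrame columns themselves (Series objects) instead of just the column names. When you then call df[label[0]], pandas treats the values of the Month series (1.0, 2.0, ..., 12.0) as a list of column names to select, and since no column has any of those names, it raises the KeyError. So the Month column is not missing; the lookup is simply using the wrong keys. Use the column names instead: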

label = ["Month", "PVGIS Generated", "PVOUTPUT Generated"]
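To see the mechanism in isolation, here is a minimal sketch that reproduces the error on a three-row stand-in for the question's joined_df (the values are copied from the question's output; the names bad_label and error_message are only for illustration). The exact wording of the message depends on the pandas version: older versions print Float64Index, newer ones print Index.

```python
import pandas as pd

# Three-row stand-in for the question's joined_df (values copied from the question's output)
df = pd.DataFrame({
    "Month": [1.0, 2.0, 3.0],
    "PVGIS Generated": [107434.69, 112428.41, 153701.40],
    "PVOUTPUT Generated": [80608.001709, 106485.000610, 132772.003174],
})

# Original approach: the list holds Series objects, not column names
bad_label = [df["Month"], df["PVGIS Generated"], df["PVOUTPUT Generated"]]

error_message = None
try:
    # df[Series] treats the Series *values* (1.0, 2.0, 3.0) as column labels
    df[bad_label[0]]
except KeyError as exc:
    error_message = str(exc)
print(error_message)

# Fixed approach: the list holds column names, so df[...] selects columns
label = ["Month", "PVGIS Generated", "PVOUTPUT Generated"]
months = df[label[0]]
pvgis = df[label[1]]
print(months.tolist())  # [1.0, 2.0, 3.0]
```

With the names in label, plt.bar(df[label[0]], df[label[1]]) receives the actual Month and PVGIS columns.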

In any case, I suggest using matplotlib's object-oriented interface (fig, ax = plt.subplots()) to draw plots.

Complete Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# Random sample data standing in for the real PVGIS/PVOutput values (months 1-12, as in the question)
np.random.seed(42)
df = pd.DataFrame({'Month': np.arange(1, 13)})
df['PVGIS Generated'] = 150000 + 30000*np.random.randn(len(df))
df['PVOUTPUT Generated'] = 120000 + 40000*np.random.randn(len(df))

fig, ax = plt.subplots()
fig.set_size_inches(13.6, 7.06)

# Shift each series by half a bar width so the two bars for a month sit side by side
width = 0.3
ax.bar(x=df['Month'] - width/2, height=df['PVGIS Generated'], width=width, align='center', label='PVGIS Generated')
ax.bar(x=df['Month'] + width/2, height=df['PVOUTPUT Generated'], width=width, align='center', label='PVOUTPUT Generated')

ax.set_xlabel('Month')
ax.set_ylabel('Generated')
ax.set_title('Month VS Generated')

# Show plain numbers on the y axis instead of an offset or scientific notation
ax.ticklabel_format(useOffset=False, axis='y', style='plain')

# legend() is called after the bars exist, so it picks up their labels
ax.legend()
ax.grid()

plt.show()
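If you would rather keep Month as the index (which the question mentions trying), one option is to let pandas draw the grouped bars itself with DataFrame.plot.bar, selecting the columns by name. This is a swapped-in alternative, not the method used above; it is a sketch that reuses the same random stand-in data, and the matplotlib.use("Agg") line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove it to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data, as in the answer's code (not the real PVGIS/PVOutput values)
np.random.seed(42)
df = pd.DataFrame({"Month": np.arange(1, 13)})
df["PVGIS Generated"] = 150000 + 30000 * np.random.randn(len(df))
df["PVOUTPUT Generated"] = 120000 + 40000 * np.random.randn(len(df))

# Make Month the index; pandas then uses the index for the x axis
monthly = df.set_index("Month")

fig, ax = plt.subplots(figsize=(13.6, 7.06))
# Columns are selected by name, so there is no KeyError; pandas groups the bars per month
monthly.plot.bar(y=["PVGIS Generated", "PVOUTPUT Generated"], ax=ax, rot=0)
ax.set_ylabel("Generated")
ax.set_title("Month VS Generated")
ax.ticklabel_format(useOffset=False, axis="y", style="plain")
ax.grid(axis="y")

plt.show()
```

pandas offsets the bars for each column automatically, so the manual width/2 shifts are not needed; the trade-off is less direct control over bar positions than calling ax.bar yourself.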


Answered By: Zephyr