Looping over a data frame fails: Overriding existing column values

Question:

I am using a for loop to reuse existing data frames.

Sample Code:

for i in range(0, 5, 1):
    RGU_TT_TempX = pd.DataFrame()
    RGU_TT_TempX = RGU_TT_Temp
    #Merging Regular Ambulance TT with MSUs TT
    #Updating MSUs TT according to the Formula
    RGU_TT_TempX["MSU_X_DURATION"] = 0.05 + df_temp_MSU1["MSU_X_DURATION"].values + 0.25 + 0.25
    RGU_TT_TempX["MSU_Y_DURATION"] = 0.05 + df_temp_MSU2["MSU_Y_DURATION"].values + 0.25 + 0.25
    RGU_TT_TempX["MSU_Z_DURATION"] = 0.05 + df_temp_MSU3["MSU_Z_DURATION"].values + 0.25 + 0.25

This gives me the error:

---> 44 RGU_TT_TempX["MSU_X_DURATION"] = 0.05 + df_temp_MSU1["MSU_X_DURATION"].values + 0.25 + 0.25

ValueError: Length of values (0) does not match length of index (16622)

In each data frame, I have 16622 values. Still, this gives me the length of the index error.

Full Error Traceback:

ValueError                                Traceback (most recent call last)
Input In [21], in <cell line: 16>()
     41 RGU_TT_TempX = RGU_TT_Temp
     42 #Merging Regular Ambulance TT with MSUs TT
     43 #Updating MSUs TT according to the Formula
---> 44 RGU_TT_TempX["MSU_X_DURATION"] = 0.05 + df_temp_MSU1["MSU_X_DURATION"].values + 0.25 + 0.25
     45 RGU_TT_TempX["MSU_Y_DURATION"] = 0.05 + df_temp_MSU2["MSU_Y_DURATION"].values + 0.25 + 0.25
     46 RGU_TT_TempX["MSU_Z_DURATION"] = 0.05 + df_temp_MSU3["MSU_Z_DURATION"].values + 0.25 + 0.25

File ~/opt/anaconda3/envs/geo_env/lib/python3.10/site-packages/pandas/core/frame.py:3977, in DataFrame.__setitem__(self, key, value)
   3974     self._setitem_array([key], value)
   3975 else:
   3976     # set column
-> 3977     self._set_item(key, value)

File ~/opt/anaconda3/envs/geo_env/lib/python3.10/site-packages/pandas/core/frame.py:4171, in DataFrame._set_item(self, key, value)
   4161 def _set_item(self, key, value) -> None:
   4162     """
   4163     Add series to DataFrame in specified column.
   4164 
   (...)
   4169     ensure homogeneity.
   4170     """
-> 4171     value = self._sanitize_column(value)
   4173     if (
   4174         key in self.columns
   4175         and value.ndim == 1
   4176         and not is_extension_array_dtype(value)
   4177     ):
   4178         # broadcast across multiple columns if necessary
   4179         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File ~/opt/anaconda3/envs/geo_env/lib/python3.10/site-packages/pandas/core/frame.py:4904, in DataFrame._sanitize_column(self, value)
   4901     return _reindex_for_setitem(Series(value), self.index)
   4903 if is_list_like(value):
-> 4904     com.require_length_match(value, self.index)
   4905 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File ~/opt/anaconda3/envs/geo_env/lib/python3.10/site-packages/pandas/core/common.py:561, in require_length_match(data, index)
    557 """
    558 Check the length of data matches the length of the index.
    559 """
    560 if len(data) != len(index):
--> 561     raise ValueError(
    562         "Length of values "
    563         f"({len(data)}) "
    564         "does not match length of index "
    565         f"({len(index)})"
    566     )

ValueError: Length of values (0) does not match length of index (16622)

I am really stuck here. Any suggestions will be highly appreciated.

Data Frame (MSU_TT_Temp) Samples:

FROM_ID  TO_ID  DURATION_H   DIST_KM
      1      7    0.528556  38.43980
      1     26    0.512511  37.38515
      1     71    0.432453  32.57571
      1     83    0.599486  39.26188
      1     98    0.590517  35.53107

Data Frame (RGU_TT_Temp) Samples:

Ambulance_ID  Centroid_ID  Hospital_ID  Regular_Ambu_TT
          37            1            6         1.871885
          39            2           13         1.599971
           6            3            6         1.307165
          42            4           12         1.411554
          37            5           14         1.968138

The problem is, if I iterate my loop once, the code works absolutely fine.

Sample Code:

for i in range(0, 1, 1):
    s = my_chrome_list[i]
    MSU_X,MSU_Y,MSU_Z = s
    #print (MSU_X,MSU_Y,MSU_Z)

    #Three scenario 
    df_temp_MSU1 = pd.DataFrame()
    df_temp_MSU2 = pd.DataFrame()
    df_temp_MSU3 = pd.DataFrame()


    df_temp_MSU1 = MSU_TT_Temp.loc[(MSU_TT_Temp['FROM_ID'] == MSU_X)]
    df_temp_MSU1.rename(columns = {'DURATION_H':'MSU_X_DURATION'}, inplace = True)
    #df_temp_MSU1


    df_temp_MSU2 = MSU_TT_Temp.loc[(MSU_TT_Temp['FROM_ID'] == MSU_Y)]
    df_temp_MSU2.rename(columns = {'DURATION_H':'MSU_Y_DURATION'}, inplace = True)
    #df_temp_MSU2

    df_temp_MSU3 = MSU_TT_Temp.loc[(MSU_TT_Temp['FROM_ID'] == MSU_Z)]
    df_temp_MSU3.rename(columns = {'DURATION_H':'MSU_Z_DURATION'}, inplace = True)
    #df_temp_MSU3


    RGU_TT_TempX = pd.DataFrame()
    RGU_TT_TempX = RGU_TT_Temp
    #Merging Regular Ambulance TT with MSUs TT
    #Updating MSUs TT according to the Formula
    RGU_TT_TempX["MSU_X_DURATION"] = 0.05 + df_temp_MSU1["MSU_X_DURATION"].values + 0.25 + 0.25
    RGU_TT_TempX["MSU_Y_DURATION"] = 0.05 + df_temp_MSU2["MSU_Y_DURATION"].values + 0.25 + 0.25
    RGU_TT_TempX["MSU_Z_DURATION"] = 0.05 + df_temp_MSU3["MSU_Z_DURATION"].values + 0.25 + 0.25

    #RGU_TT_TempX

    #MSUs Average Time to Treatment
    MSU1=RGU_TT_TempX["MSU_X_DURATION"].mean()
    MSU2=RGU_TT_TempX["MSU_Y_DURATION"].mean()
    MSU3=RGU_TT_TempX["MSU_Z_DURATION"].mean()

    MSU_AVG_TT = (MSU1+MSU2+MSU3)/3
    parents_chromosomes_list.append(MSU_AVG_TT)

Output:

[2.0241383927258387]

Note: The three data frames contain the same number of rows, so their indexes are the same length.

Loop for multiple iterations: Error

for i in range(0, 5, 1):

What is the problem?

Asked By: Oceans

||

Answers:

This error occurs when you attempt to assign a NumPy array of values to a new column in a pandas DataFrame, yet the array’s length does not match the current length of the index.
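As a minimal sketch of how this error arises, the snippet below assigns an empty NumPy array to a column of a three-row frame. The frame and its values are hypothetical stand-ins for RGU_TT_Temp, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for RGU_TT_Temp: a frame with three rows.
frame = pd.DataFrame({"Regular_Ambu_TT": [1.87, 1.60, 1.31]})

# An empty array, like the one .values returns for an empty filter result.
empty_values = np.array([], dtype=float)

try:
    # Same shape of expression as in the question; the array has 0 elements.
    frame["MSU_X_DURATION"] = 0.05 + empty_values + 0.25 + 0.25
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The message reports the length of the assigned values first and the length of the frame's index second, so "(0)" in the question points at an empty right-hand side rather than at RGU_TT_TempX.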

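The easiest way to fix this error is to create the new column from a pandas Series instead of a NumPy array. A Series is aligned on the index when it is assigned, so rows without a matching label become NaN instead of raising an error.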
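A small sketch of what assigning a Series does, using hypothetical values: pandas matches the Series to the frame by index label, so a length mismatch does not raise, but unmatched rows end up as NaN. Whether that is the desired result depends on the data, so this is an illustration rather than a recommended fix for the question's loop.

```python
import pandas as pd

# Hypothetical frame with the default index 0, 1, 2.
frame = pd.DataFrame({"Regular_Ambu_TT": [1.87, 1.60, 1.31]})

# A Series whose index only partly overlaps the frame's index.
durations = pd.Series([0.53, 0.51], index=[1, 5])

# No ValueError here: the Series is aligned on the index labels.
frame["MSU_X_DURATION"] = 0.05 + durations + 0.25 + 0.25

# Rows 0 and 2 have no matching label, so they become NaN.
missing = frame["MSU_X_DURATION"].isna().tolist()

# An empty Series is accepted as well and yields a column of NaN.
frame["EMPTY"] = pd.Series([], dtype=float)

print(frame)
```

Note that the silent NaN values can hide the same underlying problem (an empty or mismatched source frame), so checking the row counts first is still worthwhile.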

You can take a closer look here: How to Fix: Length of values does not match the length of index

Answered By: Hisham

I think you need to debug your code. It looks like MSU_TT_Temp has no rows for at least one of the values of MSU_X, MSU_Y, MSU_Z.

Most probably the problem shows up in the second iteration. Check the value of MSU_X there, since it is used in df_temp_MSU1 = MSU_TT_Temp.loc[(MSU_TT_Temp['FROM_ID'] == MSU_X)].

Moreover, you can use len(df_temp_MSU1) to confirm that the filtered data frame is empty. The reason could be that no row in MSU_TT_Temp has FROM_ID equal to MSU_X.
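The check described above could be sketched like this. The miniature MSU_TT_Temp, the contents of my_chrome_list, and the expected_rows value are all made up for illustration; in the question the expected count would be 16622.

```python
import pandas as pd

# Hypothetical miniature of MSU_TT_Temp.
MSU_TT_Temp = pd.DataFrame({
    "FROM_ID": [1, 1, 2, 2, 3, 3],
    "TO_ID": [7, 26, 7, 26, 7, 26],
    "DURATION_H": [0.53, 0.51, 0.43, 0.60, 0.59, 0.48],
})

# Hypothetical chromosome list; 9 does not occur as a FROM_ID.
my_chrome_list = [(1, 2, 3), (1, 2, 9)]

expected_rows = 2  # assumed number of rows each FROM_ID should contribute

# Collect (iteration, FROM_ID, row count) for every filter that comes up short.
problems = []
for i, ids in enumerate(my_chrome_list):
    for from_id in ids:
        n_rows = len(MSU_TT_Temp.loc[MSU_TT_Temp["FROM_ID"] == from_id])
        if n_rows != expected_rows:
            problems.append((i, from_id, n_rows))

print(problems)  # [(1, 9, 0)]
```

Running a check like this before the assignment shows which iteration and which ID produce an empty filter, which is where the "Length of values (0)" error would come from.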

Good luck!

Answered By: LearningLogic