How to returns the number of records that have passed since EXPR in Python

Question

I’m trying to replicate the IBM SPSS function @SINCE using Python and Pandas, but unfortunately, I’ve got stuck in a part of my process.
Is there a direct function that replicates IBM SPSS CLEM @SINCE using python?

Here is the link for more info: Link

IMB @SINCE function description

"This function returns the offset of the last record where this condition was true–that is, the number of records before this one in which the condition was true. If the condition has never been true, @SINCE returns @INDEX + 1." (IBM, 2020)

I’ve been trying to replicate this function from scratch, but I haven’t find the right way to do it:

How do I do this with pandas and Python?

Here the problem,

My data looks like this:

Original Data

+------+----------+
| Type | Flag     |
+------+----------+
| d    |          |
+------+----------+
| A    | myStatus |
+------+----------+
| c    |          |
+------+----------+
| B    | myStatus |
+------+----------+
| c    |          |
+------+----------+
| c    | myStatus |
+------+----------+
| c    |          |
+------+----------+
| d    |          |
+------+----------+
| d    |          |
+------+----------+
| A    | myStatus |
+------+----------+

In IBM SPSS I use this formula to get this data:

if Type = 'A' or Type = 'B' then @SINCE(Flag = 'myStatus') else -1 endif

And this is the output:

+------+----------+----------------+
| Type | Flag     | Expected Count |
+------+----------+----------------+
| d    |          | -1             |
+------+----------+----------------+
| A    | myStatus | 0              |
+------+----------+----------------+
| c    |          | -1             |
+------+----------+----------------+
| B    | myStatus | 2              |
+------+----------+----------------+
| c    |          | -1             |
+------+----------+----------------+
| c    | myStatus | -1             |
+------+----------+----------------+
| c    |          | -1             |
+------+----------+----------------+
| d    |          | -1             |
+------+----------+----------------+
| d    |          | -1             |
+------+----------+----------------+
| A    | myStatus | 4              |
+------+----------+----------------+

Asked By: Jorge Enrique Gamboa Fuentes

||

Source

Answer 1

So, I found the way to fix this problem: here is the code:

df = pd.DataFrame({"Type":["d", "A", "c", "B", "c", "c", "c", "d", "d", "A"],
               "Flag":[np.nan, "myStatus", np.nan, "myStatus", np.nan, "myStatus", np.nan, np.nan, np.nan, "myStatus"]})

Function to solve the problem:

def spssSince(df):
   df_temp = df
   df_temp = df[df.Flag=="myStatus"]
   df_temp['last_ind'] = df_temp.index
   df_temp['last_ind'] = df_temp.last_ind.shift(1)
   df_temp['last_ind'] = df_temp['last_ind'].fillna(1)
   df_temp["Expected Count"] = df_temp.index - df_temp.last_ind
   df_temp.loc[~df_temp.Type.isin(["A", "B"]), "Expected Count"] = -1
   DFreturn = pd.merge(left=df, right=df_temp.drop(['Type', 'Flag', 'last_ind'], axis=1), how="left", left_index=True, right_index=True)
   DFreturn["Expected Count"] = DFreturn["Expected Count"].fillna(-1)
   return DFreturn

Basically the function calculates the last SINCE value from a condition, calculating the actual index among the index that has the validation (using shift()).

Answered By: Jorge Enrique Gamboa Fuentes

How to returns the number of records that have passed since EXPR in Python

Question:

Answers: