Trying to predict probability score using test data

Question:

I’m trying to test how individual features affect the probability score from a regression model we’ve built. Specifically, I want to see how age affects the score, to decide whether we need to retrain the model. I’m using the parameters from our model as PARAM_COLLECTION, together with test data for age, sex, and cc_list. I expected the code to work, but I can’t figure out why y comes back empty. Given the final if/else statement, I would expect to see a score even when it is not above the threshold.

import numpy as np

# Building test data where member has PULL
x_test = {
    "cc_list": ["PULL"],
    "age": 38,
    "sex": "M"
}

# Defining parameters from training data from previous model
PARAM_COLLECTION = {
    "PULL": {
        "auc": 0.8202432743081695,
        "coef": [-0.01853237366699478, 0.14359336438414397, 3.0070029131017155, 1.4999028794882714, 0.2499927123452168, 0.00869006612608888, -0.17741710091314503],
        "features_sltd": ["CARM", "GIL", "PULL", "PULM", "SKCVL", "age", "sex"],
        "intercept": -3.066213895858403,
        "model_name": "l1-reg",
        "regularization_param": 100000.0,
        "threshold": 0.5277152026373001
    }
}

# Trying to predict the probability score here
y = {}
coll_name = "PULL"
param_coll = PARAM_COLLECTION[coll_name]

for cc in x_test["cc_list"]:
    if cc not in param_coll:
        continue
    param = param_coll[cc]
    if param["model_name"] == "none":
        continue
    features_sltd = param["features_sltd"]
    features_efft = []
    x_vec = np.zeros(len(features_sltd))
    for i, f in enumerate(features_sltd):
        if f in x_test["cc_list"]:
            x_vec[i] = 1.0
            features_efft.append((f, param["coef"][i]))
    features_efft = sorted(features_efft, key=lambda x: -x[1])
    features_efft = [f[0] for f in features_efft if f[1] > 0.1]   
    if len(features_efft)==0:
        continue
    x_vec[features_sltd.index("age")] = x_test["age"] 
    x_vec[features_sltd.index("sex")] = int(x_test["sex"]=="M")
    beta = np.dot(np.array(param["coef"]), x_vec) + param["intercept"]
    proba = 1.0/(1.0 + np.exp(-beta))
    if proba > param["threshold"]:
        y[cc] = {"score": np.clip(proba, 0.0, 1.0), "features": features_efft}
    else:
        y[cc] = {"score": 0.0, "features": []}

# Print the output
print(y)
Asked By: btk666

Answers:

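param_coll is already PARAM_COLLECTION["PULL"], so its keys are auc, coef, features_sltd, and so on. The only cc value you have is PULL, which is never a key of param_coll, so the first if statement (cc not in param_coll) is always true and the loop hits continue before anything is written to y. That is why y prints as an empty dict.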
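As a rough illustration of the point above, here is a minimal sketch that looks cc up in PARAM_COLLECTION itself rather than in the inner dict. The helper name predict_proba and the above_threshold key are hypothetical, not from the question, and the sketch leaves out the features_efft filtering to keep the focus on the lookup. It also returns the raw probability instead of replacing it with 0.0 below the threshold, which is an assumption about what is wanted when testing the effect of age.

```python
import numpy as np

# Parameters copied from the question.
PARAM_COLLECTION = {
    "PULL": {
        "auc": 0.8202432743081695,
        "coef": [-0.01853237366699478, 0.14359336438414397, 3.0070029131017155,
                 1.4999028794882714, 0.2499927123452168, 0.00869006612608888,
                 -0.17741710091314503],
        "features_sltd": ["CARM", "GIL", "PULL", "PULM", "SKCVL", "age", "sex"],
        "intercept": -3.066213895858403,
        "model_name": "l1-reg",
        "regularization_param": 100000.0,
        "threshold": 0.5277152026373001,
    }
}


def predict_proba(x, param):
    """Hypothetical helper: raw logistic score for one parameter dict."""
    features = param["features_sltd"]
    x_vec = np.zeros(len(features))
    # One-hot encode the condition codes present in this record.
    for i, f in enumerate(features):
        if f in x["cc_list"]:
            x_vec[i] = 1.0
    x_vec[features.index("age")] = x["age"]
    x_vec[features.index("sex")] = int(x["sex"] == "M")
    beta = np.dot(param["coef"], x_vec) + param["intercept"]
    return float(1.0 / (1.0 + np.exp(-beta)))


x_test = {"cc_list": ["PULL"], "age": 38, "sex": "M"}

# The membership test from the question, made explicit.
print("PULL" in PARAM_COLLECTION["PULL"])  # False: keys are auc, coef, ...
print("PULL" in PARAM_COLLECTION)          # True

y = {}
for cc in x_test["cc_list"]:
    if cc not in PARAM_COLLECTION:  # check the outer dict, not the inner one
        continue
    param = PARAM_COLLECTION[cc]
    if param["model_name"] == "none":
        continue
    proba = predict_proba(x_test, param)
    y[cc] = {"score": proba, "above_threshold": proba > param["threshold"]}

print(y)
```

With these inputs the raw score for age 38 comes out at roughly 0.523, just under the threshold of about 0.528. So even with the lookup fixed, the else branch in the question's code would report a score of 0.0 rather than the raw probability. To check the effect of age, the same helper can be called with different age values in x_test.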

Answered By: Michael Cao