How to assign a string variable to a numeric value for confusion matrix in Python?

Question:

In Python, I am trying to map "Negative" to 0 and "Positive" to 1 in a large dataset. The labels are currently stored as strings. My goal is to build a confusion matrix from these values to find the true positives, false positives, true negatives, and false negatives. This is my code so far:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# from sklearn.datasets import make_classification

# df is my dataset (loaded elsewhere); assuming the last column holds the labels
X = df.iloc[:, :-1]  # features: every column except the last
y = df.iloc[:, -1]   # target: the last column only
# generate 2 class dataset
# X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
logmodel = LogisticRegression()
print(logmodel.fit(X_train, y_train))
y_pred = logmodel.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # separate name so the function is not overwritten
print(cm)
print(classification_report(y_test, y_pred))  # compare true labels with predictions
Asked By: mresh31

||

Answers:

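If you have a list, you can use a list comprehension that compares each item with "Positive" and converts the boolean result to an integer: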

list_1 = ["Positive", "Negative", "Positive", "Positive"]
[int(x=="Positive") for x in list_1]

Output:
[1, 0, 1, 1]
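
One caveat with the comparison above is that any value other than "Positive" (a typo, different casing, a missing value) silently becomes 0. Below is a minimal sketch of a stricter alternative, assuming the only valid labels are exactly "Negative" and "Positive". The name label_to_int is illustrative and not part of the original answer:

```python
# Explicit lookup table; label_to_int is a hypothetical name chosen for this sketch.
label_to_int = {"Negative": 0, "Positive": 1}

list_1 = ["Positive", "Negative", "Positive", "Positive"]

# Indexing the dict raises KeyError on any label that is not in the table,
# so unexpected values are caught instead of being counted as 0.
encoded = [label_to_int[x] for x in list_1]
print(encoded)  # [1, 0, 1, 1]
```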

If you have a dataframe, you can compare the column with "Positive" and cast the boolean result to int:

import pandas as pd
df = pd.DataFrame({"value":["Positive", "Negative", "Positive", "Positive"]})
df.value.eq("Positive").astype(int)

Output:

0    1
1    0
2    1
3    1
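
To connect this back to the confusion matrix in the question, here is a minimal sketch that encodes both the true and the predicted labels, then unpacks the four counts. The toy data and the names actual and predicted are assumptions made for illustration; in practice they would be the real label column and the model's predictions:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels standing in for the real data and model output.
actual = pd.Series(["Positive", "Negative", "Positive", "Positive", "Negative"])
predicted = pd.Series(["Positive", "Positive", "Negative", "Positive", "Negative"])

# "Positive" -> 1, anything else -> 0.
y_true = actual.eq("Positive").astype(int)
y_pred = predicted.eq("Positive").astype(int)

# With labels=[0, 1], rows are the actual classes and columns the predicted
# classes, so ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```

Passing labels=[0, 1] explicitly keeps the matrix 2x2 and in a fixed order even if one class happens to be absent from a particular test split.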
Answered By: Michael S.