Calculate group fairness metrics with AIF360
Question:
I want to calculate group fairness metrics using AIF360. This is a sample dataset and model, in which gender is the protected attribute and income is the target.
import pandas as pd
from sklearn.svm import SVC
from aif360.sklearn import metrics
df = pd.DataFrame({'gender': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                   'experience': [0, 0.1, 0.2, 0.4, 0.5, 0.6, 0, 0.1, 0.2, 0.4, 0.5, 0.6],
                   'income': [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]})
clf = SVC(random_state=0).fit(df[['gender', 'experience']], df['income'])
y_pred = clf.predict(df[['gender', 'experience']])
metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)
It throws:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-609692e52b2a> in <module>
11 y_pred = clf.predict(X)
12
---> 13 metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)
TypeError: statistical_parity_difference() got an unexpected keyword argument 'y_true'
A similar error occurs for disparate_impact_ratio. It seems the data needs to be passed in differently, but I have not been able to figure out how.
Answers:
Remove the y_true= and y_pred= prefixes from the function call and retry. As the documentation shows, *y in the function signature collects an arbitrary number of positional arguments (see this post), so this is the most logical guess. In other words, y_true and y_pred are NOT keyword arguments here and cannot be passed by name; arbitrary keyword arguments would instead appear as **kwargs in the signature.
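To see why the keyword call fails, here is a minimal sketch (not the real AIF360 code) of a function whose signature starts with *y: positional calls succeed, while named y_true=/y_pred= arguments raise exactly the TypeError from the question.

```python
# Sketch of a *y-style signature: *y collects positional arguments only,
# so names like y_true are rejected as unexpected keyword arguments.
def statistical_parity_sketch(*y, prot_attr=None, priv_group=1, pos_label=1):
    y_true, y_pred = y  # expects exactly two positional arrays
    return len(y_true), len(y_pred)

# Positional call works:
statistical_parity_sketch([0, 1, 1], [1, 1, 0], prot_attr='gender')  # → (3, 3)

# Keyword call fails, as in the question:
try:
    statistical_parity_sketch(y_true=[0, 1, 1], y_pred=[1, 1, 0], prot_attr='gender')
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'y_true'
```

With the actual library, the fix is therefore to pass the two arrays positionally, e.g. metrics.statistical_parity_difference(y_true, y_pred, prot_attr='gender', priv_group=1, pos_label=1).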
This can be done by converting the data to a StandardDataset and then calling the fair_metrics function below:
from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

dataset = StandardDataset(df,
                          label_name='income',
                          favorable_classes=[1],
                          protected_attribute_names=['gender'],
                          privileged_classes=[[1]])

def fair_metrics(dataset, y_pred):
    dataset_pred = dataset.copy()
    dataset_pred.labels = y_pred

    attr = dataset_pred.protected_attribute_names[0]
    idx = dataset_pred.protected_attribute_names.index(attr)
    privileged_groups = [{attr: dataset_pred.privileged_protected_attributes[idx][0]}]
    unprivileged_groups = [{attr: dataset_pred.unprivileged_protected_attributes[idx][0]}]

    classified_metric = ClassificationMetric(dataset, dataset_pred,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
    metric_pred = BinaryLabelDatasetMetric(dataset_pred,
                                           unprivileged_groups=unprivileged_groups,
                                           privileged_groups=privileged_groups)

    return {'statistical_parity_difference': metric_pred.statistical_parity_difference(),
            'disparate_impact': metric_pred.disparate_impact(),
            'equal_opportunity_difference': classified_metric.equal_opportunity_difference()}

fair_metrics(dataset, y_pred)
which returns the correct results:
{'statistical_parity_difference': -0.6666666666666667,
'disparate_impact': 0.3333333333333333,
'equal_opportunity_difference': 0.0}
I had the same problem. y_pred_default was a NumPy array, while the dataset was a DataFrame. If you convert y_pred_default to a DataFrame directly, the row order is lost and the merged dataset ends up with NaN values. So instead I converted the dataset to a NumPy array, concatenated it with the y_pred_default array, and converted the result back to a DataFrame. You also have to restore the original column names, because after the round trip the columns are just numbers. This gives you exactly what you want: a DataFrame with your X values and the corresponding predicted y values in order, from which you can compute the SPD metric.
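The concatenation described above can be sketched as follows; df and y_pred_default are stand-in names for the feature DataFrame and prediction array in this answer, and 'y_pred' is an illustrative column name.

```python
import numpy as np
import pandas as pd

# Stand-in data: a feature DataFrame and a prediction array of matching length.
df = pd.DataFrame({'gender': [0, 0, 1, 1],
                   'experience': [0.1, 0.5, 0.2, 0.6]})
y_pred_default = np.array([0, 1, 0, 1])

# Convert the DataFrame to an array, column-stack the predictions onto it,
# then rebuild the DataFrame and restore meaningful column names.
combined = np.concatenate([df.to_numpy(), y_pred_default.reshape(-1, 1)], axis=1)
df_pred = pd.DataFrame(combined, columns=list(df.columns) + ['y_pred'])

print(df_pred.columns.tolist())  # ['gender', 'experience', 'y_pred']
```

Because the predictions are appended positionally rather than joined on an index, the row order is preserved and no NaN values are introduced.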