Calculate group fairness metrics with AIF360

Question:

I want to calculate group fairness metrics using AIF360. This is a sample dataset and model, in which gender is the protected attribute and income is the target.

import pandas as pd
from sklearn.svm import SVC
from aif360.sklearn import metrics

df = pd.DataFrame({'gender': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
                   'experience': [0, 0.1, 0.2, 0.4, 0.5, 0.6, 0, 0.1, 0.2, 0.4, 0.5, 0.6],
                   'income': [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]})

clf = SVC(random_state=0).fit(df[['gender', 'experience']], df['income'])

y_pred = clf.predict(df[['gender', 'experience']])

metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

It throws:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-609692e52b2a> in <module>
     11 y_pred = clf.predict(X)
     12 
---> 13 metrics.statistical_parity_difference(y_true=df['income'], y_pred=y_pred, prot_attr='gender', priv_group=1, pos_label=1)

TypeError: statistical_parity_difference() got an unexpected keyword argument 'y_true'

I get a similar error for disparate_impact_ratio. It seems the data needs to be passed differently, but I have not been able to figure out how.

Asked By: Reveille

Answers:

Remove the y_true= and y_pred= keywords from the function call and pass both values positionally. As one can see in the documentation, *y in the function signature collects an arbitrary number of positional arguments (see this post), so this is the most logical guess.

In other words, y_true and y_pred are NOT keyword arguments, so they cannot be passed by name. A function only accepts arbitrary keyword arguments if its signature contains **kwargs, which this one does not.
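
A minimal sketch of the corrected call, using df, clf, and y_pred from the question. Note that aif360's scikit-learn-compatible API looks up prot_attr in the index of y_true, so gender is moved into the index first:

# prot_attr='gender' is resolved from the index of y_true
y_true = df.set_index('gender')['income']

# y_true and y_pred are passed positionally; the remaining
# parameters are keyword-only
metrics.statistical_parity_difference(y_true, y_pred,
                                      prot_attr='gender',
                                      priv_group=1, pos_label=1)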

Answered By: Bill Huang

This can be done by transforming the data into a StandardDataset and then calling the fair_metrics function below:

from aif360.datasets import StandardDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric

# Wrap the DataFrame in a StandardDataset: income=1 is the favorable
# label and gender=1 is the privileged class.
dataset = StandardDataset(df,
                          label_name='income',
                          favorable_classes=[1],
                          protected_attribute_names=['gender'],
                          privileged_classes=[[1]])

def fair_metrics(dataset, y_pred):
    # Copy the dataset and overwrite its labels with the predictions.
    # AIF360 stores labels as an (n, 1) column vector, hence the reshape.
    dataset_pred = dataset.copy()
    dataset_pred.labels = y_pred.reshape(-1, 1)

    # Build the privileged/unprivileged group definitions from the
    # (single) protected attribute.
    attr = dataset_pred.protected_attribute_names[0]
    idx = dataset_pred.protected_attribute_names.index(attr)
    privileged_groups = [{attr: dataset_pred.privileged_protected_attributes[idx][0]}]
    unprivileged_groups = [{attr: dataset_pred.unprivileged_protected_attributes[idx][0]}]

    # ClassificationMetric compares true vs. predicted labels;
    # BinaryLabelDatasetMetric looks at the predictions alone.
    classified_metric = ClassificationMetric(dataset, dataset_pred,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
    metric_pred = BinaryLabelDatasetMetric(dataset_pred,
                                           unprivileged_groups=unprivileged_groups,
                                           privileged_groups=privileged_groups)

    return {'statistical_parity_difference': metric_pred.statistical_parity_difference(),
            'disparate_impact': metric_pred.disparate_impact(),
            'equal_opportunity_difference': classified_metric.equal_opportunity_difference()}


fair_metrics(dataset, y_pred)

which returns the correct results:

{'statistical_parity_difference': -0.6666666666666667,
 'disparate_impact': 0.3333333333333333,
 'equal_opportunity_difference': 0.0}
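
As a sanity check, the first two numbers are mutually consistent: solving rate_unpriv − rate_priv = −2/3 and rate_unpriv / rate_priv = 1/3 gives a predicted selection rate of 1.0 for the privileged group (gender=1) and 1/3 for the unprivileged group (gender=0).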

Answered By: Reveille

I had the same problem. My y_pred_default was a NumPy array while the dataset was a DataFrame. Converting y_pred_default directly to a DataFrame loses the alignment of the values, which shows up as NaN values in the combined dataset. So I converted the dataset to a NumPy array, concatenated it with the y_pred_default array, and converted the result back to a DataFrame. You also have to restore the original column names, because after the round trip the columns are just numbers. This gives exactly what you want: a DataFrame with your x values and the corresponding predicted y values in order, ready for computing the statistical parity difference (SPD) metric.
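
A minimal sketch of that round trip, using the df and y_pred from the question (the 'y_pred' column name is just illustrative):

import numpy as np

# Concatenate features and predictions as raw arrays to preserve row
# order, then rebuild the DataFrame and restore the column names.
combined = np.concatenate([df.to_numpy(), y_pred.reshape(-1, 1)], axis=1)
df_pred = pd.DataFrame(combined, columns=list(df.columns) + ['y_pred'])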

Answered By: Thanos Tompras