DataFrame Multiobjective Sort to Define Pareto Boundary

Question:

Are there any multiobjective sorting algorithms built into Pandas?

I have found this, which is an NSGA-II algorithm (which is what I want), but it requires passing the objective functions in as separate files. In an ideal world, I would keep all of the data in a DataFrame, call a method like multi_of_sort on it while specifying the objective function columns (and any other required parameters), and get back another DataFrame containing the Pareto-optimal rows.

This seems like it should be trivial with Pandas, but I could be wrong.

Asked By: mitchute


Answers:

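As it turns out, the pareto package referenced above can work with DataFrame data: convert the rows to a list of tuples, pass them to eps_sort, and rebuild a DataFrame from the result.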

import pareto
import pandas as pd

# load the data
df = pd.read_csv('data.csv')

# define the objective function column indices
# optional. default is ALL columns
of_cols = [4, 5]

# define the epsilon (resolution) for each objective function,
# one value per entry in of_cols
# optional. default is 1e-9
eps_tols = [1, 2]

# sort: eps_sort takes a list of tables, each table being a list of rows
nondominated = pareto.eps_sort([list(df.itertuples(index=False))], of_cols, eps_tols)

# convert the list of nondominated rows back to a DataFrame
df_pareto = pd.DataFrame.from_records(nondominated, columns=list(df.columns))
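
If installing pareto is not an option, a plain nondominated filter can be sketched with pandas and NumPy alone. Treat this as a rough sketch under a few assumptions: pareto_front is a hypothetical helper (it is not part of pandas or the pareto package), it takes column labels rather than the integer indices used above, it assumes every objective is minimized unless minimize=False is passed, and it does a simple O(n^2) pairwise comparison with no epsilon tolerance. It is therefore not equivalent to eps_sort or to a full NSGA-II ranking.

```python
import numpy as np
import pandas as pd


def pareto_front(df, of_cols, minimize=True):
    """Return the rows of df that are not dominated on the of_cols columns.

    Hypothetical helper for illustration only. A row is dominated when
    some other row is at least as good on every objective and strictly
    better on at least one. Exact duplicates do not dominate each other,
    so all copies are kept.
    """
    values = df[of_cols].to_numpy(dtype=float)
    if not minimize:
        # maximizing is the same as minimizing the negated objectives
        values = -values

    keep = np.ones(len(values), dtype=bool)
    for i, row in enumerate(values):
        no_worse = (values <= row).all(axis=1)  # at least as good everywhere
        better = (values < row).any(axis=1)     # strictly better somewhere
        if (no_worse & better).any():
            keep[i] = False
    return df[keep]


# toy data with made-up column names
demo = pd.DataFrame({"cost": [1, 2, 3, 2], "weight": [4, 2, 1, 3]})
front = pareto_front(demo, ["cost", "weight"])
```

On the toy data, the row (2, 3) is dropped because (2, 2) is no worse on cost and better on weight; the other three rows remain.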
Answered By: mitchute