Pulp Python problem setting constraints when summing values in a column

Question:

Hi this is my first question here so go easy on me if I format things incorrectly.

I’m trying to model a table where each value is either 1 or 0.
I’d like to determine whether the sum of a column is 0 or not 0, then check how many columns are > 0.
The underlying problem I’m trying to solve is appointment scheduling, where each column represents one appointment. I’ve simplified it here; in the original I’m using a dataframe to match clinician competencies to patient needs (each row is a patient need). My problem started when I tried to ensure that variables could only be equal to 1 if they were in one of 2 columns, hence this simplified code to work out where I am going wrong.

I’ve set up a pulp variable dictionary with ROWS and COLS as the keys, and value == 0 or 1.

In the problem definition I’m trying to assign a value of 1 to each column if the sum of the row values in that column is >= 1 and 0 otherwise, then sum those values. This should allow me to fix the number of columns that sum to >= 1, for example so that only 2 columns contain non-zero variables.

In the code below my aim is for the total sum of all variables to be minimised BUT there should be 2 columns that contain a variable 1 i.e. 2 columns sum to >=1.

Thanks in advance.

import pulp as Pulp
ROWS = range(1, 6)
COLS = range(1,5)

prob = Pulp.LpProblem("Fewestcolumns", Pulp.LpMinimize)
choices = Pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Integer", lowBound=0, upBound=1)
prob += Pulp.lpSum([choices[row][col] for row in ROWS for col in COLS])
prob += Pulp.lpSum([1 if Pulp.lpSum([choices[row][col] for row in ROWS]) >= 1 else 0 for col in COLS]) == 2



prob.solve()

print("Status:", Pulp.LpStatus[prob.status])
for v in prob.variables():
    print(v.name, "=", v.varValue)

My results:

C:\Users\xxx\Computing\LinearProgramming\Scripts\python.exe C:/Users/xxx/Computing/LinearProgramming/LinearProgTest.py
Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - C:\Users\xxxx\Computing\LinearProgramming\lib\site-packages\pulp\solverdir\cbc\win\64\cbc.exe C:\Users\simon\AppData\Local\Temp\4f8ff67726844bde8abe98316b6338c4-pulp.mps timeMode elapsed branch printingOptions all solution C:\Users\simon\AppData\Local\Temp\4f8ff67726844bde8abe98316b6338c4-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 6 COLUMNS
At line 67 RHS
At line 69 BOUNDS
At line 90 ENDATA
Problem MODEL has 1 rows, 20 columns and 0 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Problem is infeasible - 0.00 seconds
Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.01

Status: Infeasible
Choice_1_1 = 0.0
Choice_1_2 = 0.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0

Process finished with exit code 0

I was expecting a list of variables a bit like this, with a possible solution:

Status: Optimal
Choice_1_1 = 1.0
Choice_1_2 = 1.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0

Edits:
Many thanks AirSquid for pointing me in the right direction. I’m still struggling with big M constraints.

I tried this:

import pulp as Pulp
ROWS = range(1, 6)
COLS = range(1,5)

prob = Pulp.LpProblem("Fewestcolumns", Pulp.LpMaximize)
choices = Pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Integer", lowBound=0, upBound=1)
used = Pulp.LpVariable.dicts("used", COLS, cat="Binary")
b = Pulp.LpVariable.dicts("b", COLS, cat="Binary")

prob += Pulp.lpSum([choices[row][col] for row in ROWS for col in COLS])
for rows, items in choices.items():
    prob += Pulp.lpSum(cols for cols in items.values()) == 1

M = 20
for col in COLS:
    prob += b[col] >= (Pulp.lpSum([choices[row][col] for row in ROWS]) - 1) / M
    prob += used[col] >= M * (b[col] - 1)

prob += Pulp.lpSum([used[col] for col in COLS]) == 2
prob.solve()

print("Status:", Pulp.LpStatus[prob.status])
for v in prob.variables():
    print(v.name, "=", v.varValue)

I got the following results:

 Result - Optimal solution found

Objective value:                5.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.01   (Wallclock seconds):       0.02

Status: Optimal
Choice_1_1 = 0.0
Choice_1_2 = 0.0
Choice_1_3 = 0.0
Choice_1_4 = 1.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 1.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 1.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 1.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 1.0
b_1 = 1.0
b_2 = 1.0
b_3 = 1.0
b_4 = 1.0
used_1 = 1.0
used_2 = 1.0
used_3 = 0.0
used_4 = 0.0

Process finished with exit code 0

Not sure what I did wrong – I was hoping for some 1.0s in columns that aren’t column 4. Any more hints please?

Asked By: ZachCope

Answers:

Your question is clear, but the setup of your LP isn’t very clear. We can come back to that.

You are getting the error because you used an if statement inside your summation. That isn’t legal. When pulp builds the math model to solve, the values of the variables are not yet known, so Python evaluates the if once at build time instead of leaving the decision to the solver; we cannot use if statements in the formulation. It sounds like you want to use a “big M” constraint here to see if anything was selected within the column. (Google it or look on this site; it is a fundamental LP concept and I have posted several answers with it.) You will need to introduce another binary variable indexed by column and then minimize that. In pseudocode:

used[col] = a binary variable, indexed by col
M = some suitably large constant (a max).  In your case the number of rows would be appropriate.

Then:

sum(choices[row, col] for row in rows) <= used[col] * M    

If desired, you could then minimize the variable used to minimize columns used.
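
As a rough illustration, the pseudocode above might look like this in PuLP. This is only a sketch: the rule that every row must be placed in exactly one column is an assumption borrowed from the edit in the question so that minimizing has something to push against, and the lowercase pulp import and msg=False are just choices made for the example.

```python
import pulp

ROWS = range(1, 6)
COLS = range(1, 5)
M = len(ROWS)  # a column can never hold more than len(ROWS) ones

prob = pulp.LpProblem("FewestColumns", pulp.LpMinimize)
choices = pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Binary")
used = pulp.LpVariable.dicts("used", COLS, cat="Binary")

# Objective: use as few columns as possible.
prob += pulp.lpSum(used[col] for col in COLS)

# Assumption for this sketch: each row is placed in exactly one column.
for row in ROWS:
    prob += pulp.lpSum(choices[row][col] for col in COLS) == 1

# Big-M link: any 1 in a column forces used[col] up to 1.
for col in COLS:
    prob += pulp.lpSum(choices[row][col] for row in ROWS) <= M * used[col]

prob.solve(pulp.PULP_CBC_CMD(msg=False))

status = pulp.LpStatus[prob.status]
columns_used = sum(round(used[col].varValue) for col in COLS)
print("Status:", status)
print("Columns used:", columns_used)
```

Note that this constraint only works in one direction: it stops a column from holding a 1 while used[col] is 0, but it does not stop used[col] from being 1 on an empty column. Minimizing used takes care of that here.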

Answered By: AirSquid

I had trouble with > causing errors but >= didn’t force the used_ variable to be either 1 or 0.

I ended up adding a very small number to the formula to ensure that if my decision variable b_ == 1 then my used_ variable would definitely be 1:

for col in COLS:
    prob += b[col] >= 0.001 + ((Pulp.lpSum([choices[row][col] for row in ROWS]) - 1) / M)
    prob += used[col] >= (M * (b[col] - 1)) + 0.001

prob += Pulp.lpSum([used[col] for col in COLS]) == 2

This gave the following result:

Status: Optimal
Choice_1_1 = 0.0
Choice_1_2 = 1.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 1.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 1.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 1.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 1.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0
b_1 = 1.0
b_2 = 1.0
b_3 = 0.0
b_4 = 0.0
used_1 = 1.0
used_2 = 1.0
used_3 = 0.0
used_4 = 0.0

This seems to work with different numbers of columns to give acceptable answers.

I’m not sure how hacky this solution is so if there is a more elegant way please feel free to answer!
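
One possible alternative that avoids the small constant is to link used[col] to the column sum from both sides, so that used[col] is 1 exactly when the column contains a 1. The sketch below is an illustration under assumptions, not the code I actually ran: it uses the objective from the original question (minimise the total number of ones), drops the b variables, and the lower link (column sum >= used[col]) is an addition that does not appear in either answer.

```python
import pulp

ROWS = range(1, 6)
COLS = range(1, 5)

prob = pulp.LpProblem("ExactlyTwoColumns", pulp.LpMinimize)
choices = pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Binary")
used = pulp.LpVariable.dicts("used", COLS, cat="Binary")

# Objective from the original question: minimise the total number of ones.
prob += pulp.lpSum(choices[row][col] for row in ROWS for col in COLS)

for col in COLS:
    column_sum = pulp.lpSum(choices[row][col] for row in ROWS)
    # Upper link (big M): a 1 anywhere in the column forces used[col] = 1.
    prob += column_sum <= len(ROWS) * used[col]
    # Lower link: used[col] = 1 forces at least one 1 into the column.
    prob += column_sum >= used[col]

# Exactly two columns must contain a 1.
prob += pulp.lpSum(used[col] for col in COLS) == 2

prob.solve(pulp.PULP_CBC_CMD(msg=False))

status = pulp.LpStatus[prob.status]
column_sums = {
    col: sum(round(choices[row][col].varValue) for row in ROWS) for col in COLS
}
nonzero_columns = [col for col in COLS if column_sums[col] >= 1]
print("Status:", status)
print("Columns containing a 1:", len(nonzero_columns))
```

Because every variable is binary, both links are ordinary >= and <= constraints, so no strict inequality or tolerance value is needed.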

Thanks for the help AirSquid, just what I was hoping for!

Answered By: ZachCope