# Pulp Python problem setting constraints when summing values in a column

## Question:

Hi, this is my first question here, so go easy on me if I format things incorrectly.

I’m trying to model a table where each value is either 1 or 0.

I’d like to determine whether the sum of a column is 0 or not 0, then check how many columns are > 0.

The underlying problem I’m trying to solve is appointment scheduling, where each column represents one appointment. I’ve simplified it here: in the original I’m using a dataframe to match clinician competencies to patient needs (each row is a patient need). My problem started when I tried to ensure that variables could only be equal to 1 if they were in one of 2 columns, hence the simplified code here to try to work out where I am going wrong.

I’ve set up a PuLP variable dictionary with ROWS and COLS as the keys, where each value is 0 or 1.

In the problem definition I’m trying to assign each column a value of 1 if the sum of the row values in that column is >= 1 and 0 otherwise, then sum those values. This should let me set the total number of columns that sum to >= 1, for example requiring that only 2 columns contain non-zero variables.
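For a table whose values are already known, the quantity described above is easy to compute in plain Python. The sketch below uses a hypothetical helper `count_used_columns` (not part of PuLP) on the hoped-for solution shown further down, stored as a nested dict. The catch is that inside a PuLP model the cell values are unknown until the solver runs, so this kind of per-column test cannot be written directly into the model.

```python
# Hypothetical helper: count the columns of a fixed 0/1 table whose sum is >= 1.
ROWS = range(1, 6)
COLS = range(1, 5)

def count_used_columns(table):
    """table[row][col] is 0 or 1; returns how many columns contain at least one 1."""
    return sum(1 for col in COLS if sum(table[row][col] for row in ROWS) >= 1)

# The hoped-for solution: Choice_1_1 = Choice_1_2 = 1, everything else 0.
expected = {row: {col: 0 for col in COLS} for row in ROWS}
expected[1][1] = 1
expected[1][2] = 1

print(count_used_columns(expected))  # 2
```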

In the code below my aim is for the total sum of all variables to be minimised, BUT there should be 2 columns that contain a 1, i.e. 2 columns sum to >= 1.

Thanks in advance.

```
import pulp as Pulp
ROWS = range(1, 6)
COLS = range(1,5)
prob = Pulp.LpProblem("Fewestcolumns", Pulp.LpMinimize)
choices = Pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Integer", lowBound=0, upBound=1)
prob += Pulp.lpSum([choices[row][col] for row in ROWS for col in COLS])
prob += Pulp.lpSum([1 if Pulp.lpSum([choices[row][col] for row in ROWS]) >= 1 else 0 for col in COLS]) == 2
prob.solve()
print("Status:", Pulp.LpStatus[prob.status])
for v in prob.variables():
    print(v.name, "=", v.varValue)
```

My results:

```
C:\Users\xxx\Computing\LinearProgramming\Scripts\python.exe C:/Users/xxx/Computing/LinearProgramming/LinearProgTest.py
Welcome to the CBC MILP Solver
Version: 2.10.3
Build Date: Dec 15 2019
command line - C:\Users\xxxx\Computing\LinearProgramming\lib\site-packages\pulp\solverdir\cbc\win64\cbc.exe C:\Users\simon\AppData\Local\Temp\4f8ff67726844bde8abe98316b6338c4-pulp.mps timeMode elapsed branch printingOptions all solution C:\Users\simon\AppData\Local\Temp\4f8ff67726844bde8abe98316b6338c4-pulp.sol (default strategy 1)
At line 2 NAME MODEL
At line 3 ROWS
At line 6 COLUMNS
At line 67 RHS
At line 69 BOUNDS
At line 90 ENDATA
Problem MODEL has 1 rows, 20 columns and 0 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Problem is infeasible - 0.00 seconds
Option for printingOptions changed from normal to all
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.01
Status: Infeasible
Choice_1_1 = 0.0
Choice_1_2 = 0.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0
Process finished with exit code 0
```

I was expecting a list of variables a bit like this, with a possible solution:

```
Status: Optimal
Choice_1_1 = 1.0
Choice_1_2 = 1.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0
```

Edits:

Many thanks, AirSquid, for pointing me in the right direction. I’m still struggling with big-M constraints.

I tried this:

```
import pulp as Pulp
ROWS = range(1, 6)
COLS = range(1,5)
prob = Pulp.LpProblem("Fewestcolumns", Pulp.LpMaximize)
choices = Pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Integer", lowBound=0, upBound=1)
used = Pulp.LpVariable.dicts("used", COLS, cat="Binary")
b = Pulp.LpVariable.dicts("b", COLS, cat="Binary")
prob += Pulp.lpSum([choices[row][col] for row in ROWS for col in COLS])
for rows, items in choices.items():
    prob += Pulp.lpSum(cols for cols in items.values()) == 1
M = 20
for col in COLS:
    prob += b[col] >= (Pulp.lpSum([choices[row][col] for row in ROWS]) - 1) / M
    prob += used[col] >= M * (b[col] - 1)
prob += Pulp.lpSum([used[col] for col in COLS]) == 2
prob.solve()
print("Status:", Pulp.LpStatus[prob.status])
for v in prob.variables():
    print(v.name, "=", v.varValue)
```

I got the following results:

```
Result - Optimal solution found
Objective value: 5.00000000
Enumerated nodes: 0
Total iterations: 0
Time (CPU seconds): 0.00
Time (Wallclock seconds): 0.00
Option for printingOptions changed from normal to all
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.02
Status: Optimal
Choice_1_1 = 0.0
Choice_1_2 = 0.0
Choice_1_3 = 0.0
Choice_1_4 = 1.0
Choice_2_1 = 0.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 1.0
Choice_3_1 = 0.0
Choice_3_2 = 0.0
Choice_3_3 = 0.0
Choice_3_4 = 1.0
Choice_4_1 = 0.0
Choice_4_2 = 0.0
Choice_4_3 = 0.0
Choice_4_4 = 1.0
Choice_5_1 = 0.0
Choice_5_2 = 0.0
Choice_5_3 = 0.0
Choice_5_4 = 1.0
b_1 = 1.0
b_2 = 1.0
b_3 = 1.0
b_4 = 1.0
used_1 = 1.0
used_2 = 1.0
used_3 = 0.0
used_4 = 0.0
Process finished with exit code 0
```

Not sure what I did wrong – I was hoping for some 1.0s in columns that aren’t column 4. Any more hints please?

## Answers:

Your question is clear, but the setup of your LP isn’t really clear. We can come back to that.

You are getting the infeasible result because you used an `if` statement in your summation. That isn’t legal. When `pulp` makes the math model to solve, the values of the variables are not known, so we cannot use `if` statements in the formulation: Python evaluates the `if` once, while the model is being built, not inside the solver. It sounds like you want to use a “big M” constraint here to see if anything was selected within the column. (Google it or look on this site; it is a fundamental LP concept and I have posted several answers with it.) You will need to introduce another binary variable indexed by column and then minimize that. In pseudocode:

```
used[col] a binary variable, indexed by Col
M = some suitably large constant (an upper bound on a column sum). In your case the number of rows would be appropriate.
```

Then:

```
sum(choices[row, col] for row in rows) <= used[col] * M
```

If desired, you could then minimize the sum of the `used` variables to minimize the number of columns used.
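The pseudocode above can be turned into a complete model. The sketch below is one way to do it, assuming the objective from the question (minimise the number of 1s) and the requirement that exactly 2 columns are used. Besides the answer's `col_sum <= M * used[col]` link, it adds a reverse link `col_sum >= used[col]` (an addition, not part of the pseudocode) so that `used[col]` can only be 1 when the column actually contains a 1. Together the two constraints make `used[col]` equal 1 exactly when the column sum is at least 1, with no small-epsilon terms.

```python
import pulp

ROWS = range(1, 6)
COLS = range(1, 5)
M = len(ROWS)  # big M: a column can hold at most len(ROWS) ones

prob = pulp.LpProblem("Fewestcolumns", pulp.LpMinimize)
choices = pulp.LpVariable.dicts("Choice", (ROWS, COLS), cat="Binary")
used = pulp.LpVariable.dicts("used", COLS, cat="Binary")  # 1 if the column is used

# Objective from the question: minimise the total number of 1s in the table.
prob += pulp.lpSum(choices[row][col] for row in ROWS for col in COLS)

for col in COLS:
    col_sum = pulp.lpSum(choices[row][col] for row in ROWS)
    # Big-M link from the answer: any 1 in the column forces used[col] = 1.
    prob += col_sum <= M * used[col]
    # Added reverse link: used[col] = 1 only if the column really contains a 1.
    prob += col_sum >= used[col]

# Exactly two columns must be used.
prob += pulp.lpSum(used[col] for col in COLS) == 2

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("Status:", pulp.LpStatus[prob.status])
for v in prob.variables():
    print(v.name, "=", v.varValue)
```

With this objective the optimum puts a single 1 in each of two columns (objective value 2), the same shape as the hoped-for output in the question, although CBC may choose different rows or columns.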

I had trouble with `>` causing errors, but `>=` didn’t force the `used_` variables to be either 1 or 0.

I ended up adding a very small number to the formulas to ensure that if my decision variable `b_` == 1, then my `used_` variable would definitely be 1:

```
for col in COLS:
    prob += b[col] >= 0.001 + ((Pulp.lpSum([choices[row][col] for row in ROWS]) - 1) / M)
    prob += used[col] >= (M * (b[col] - 1)) + 0.001
prob += Pulp.lpSum([used[col] for col in COLS]) == 2
```

This gave the following result:

```
Status: Optimal
Choice_1_1 = 0.0
Choice_1_2 = 1.0
Choice_1_3 = 0.0
Choice_1_4 = 0.0
Choice_2_1 = 1.0
Choice_2_2 = 0.0
Choice_2_3 = 0.0
Choice_2_4 = 0.0
Choice_3_1 = 0.0
Choice_3_2 = 1.0
Choice_3_3 = 0.0
Choice_3_4 = 0.0
Choice_4_1 = 0.0
Choice_4_2 = 1.0
Choice_4_3 = 0.0
Choice_4_4 = 0.0
Choice_5_1 = 0.0
Choice_5_2 = 1.0
Choice_5_3 = 0.0
Choice_5_4 = 0.0
b_1 = 1.0
b_2 = 1.0
b_3 = 0.0
b_4 = 0.0
used_1 = 1.0
used_2 = 1.0
used_3 = 0.0
used_4 = 0.0
```

This seems to work with different numbers of columns to give acceptable answers.
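The effect of the `0.001` terms can be checked by hand. The plain-Python sketch below (no solver; `min_b_and_used` is a hypothetical helper) computes, for a column sum `s` and `M = 20` as in the earlier attempt, the smallest binary values of `b` and `used` that the two constraints allow. With `eps = 0.0` (the earlier attempt) `used` is never forced to 1, because `M * (b - 1)` is at most 0 for a binary `b`; with `eps = 0.001` both `b` and `used` are forced to 1 exactly when `s >= 1`. This relies on `0.001` not exceeding `1 / M` (0.05 here); otherwise an empty column would also force `b = 1`.

```python
import math

M = 20  # same big-M value as in the earlier attempt

def min_b_and_used(s, eps):
    """Smallest binary b and used allowed, for column sum s, by
    b >= eps + (s - 1) / M   and   used >= M * (b - 1) + eps."""
    b = max(0, math.ceil(eps + (s - 1) / M))  # lowest integer >= bound, clipped at 0
    used = max(0, math.ceil(M * (b - 1) + eps))
    return b, used

for s in range(6):  # a column of 5 rows can sum to 0..5
    print("s =", s, "eps=0:", min_b_and_used(s, 0.0), "eps=0.001:", min_b_and_used(s, 0.001))
```

The answer's `sum(...) <= used[col] * M` form forces `used[col]` to 1 directly whenever the column sum is positive, so it does not need an epsilon at all.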

I’m not sure how hacky this solution is, so if there is a more elegant way, please feel free to answer!

Thanks for the help, AirSquid, just what I was hoping for!