Solving larger problems - the limits of brute force

Brute force - evaluating every possible combination - can be suitable for smaller problems. However, it very quickly gets out of hand. Let’s explore that here.

We’ll start by defining a simple problem. Note that we don’t have any geometry for this one - only the demand and travel matrices. You can read more about this in the no geometry example page.

from lokigi.site import SiteProblem
import time
import pickle
import pandas as pd

First, let’s initialise our SiteProblem() and add the demand and travel data.

problem = SiteProblem()
problem.add_demand(
    "https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/sh_demand.csv",
    demand_col="n_patients", location_id_col="sector")
problem.add_travel_matrix(
    "https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/clinic_car_travel_time.csv",
    source_col="sector")
problem.show_demand()
sector n_patients
0 PS1 3375
1 PS2 3338
2 PS3 2922
3 PS4 3191
4 PS5 3134
... ... ...
273 PS274 216
274 PS275 261
275 PS276 342
276 PS277 0
277 PS278 0

278 rows × 2 columns

problem.show_travel_matrix()
sector clinic_1 clinic_2 clinic_3 clinic_4 clinic_5 clinic_6 clinic_7 clinic_8 clinic_9 ... clinic_19 clinic_20 clinic_21 clinic_22 clinic_23 clinic_24 clinic_25 clinic_26 clinic_27 clinic_28
0 PS158 33.17 40.15 38.17 37.93 29.35 51.48 53.28 48.00 53.82 ... 12.10 12.27 15.83 53.27 53.98 29.75 34.22 32.68 19.62 39.25
1 PS159 31.42 36.55 36.42 34.53 27.60 47.88 49.68 44.40 50.22 ... 11.75 11.92 10.62 49.68 50.38 26.15 30.62 32.35 19.28 35.65
2 PS160 31.82 38.80 36.82 36.58 28.00 50.13 51.95 46.65 52.47 ... 10.75 10.92 14.35 51.93 52.65 28.40 32.87 31.35 18.27 37.90
3 PS161 31.68 38.65 36.67 36.43 27.87 49.98 51.80 46.50 52.32 ... 10.32 10.77 16.38 51.78 52.50 28.27 32.73 31.20 17.82 37.75
4 PS162 29.55 36.53 34.55 34.32 25.73 47.87 49.67 44.38 50.20 ... 6.77 7.28 17.18 49.65 50.37 26.13 30.60 29.07 14.27 35.63
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 PS273 34.13 21.83 39.23 35.90 34.70 49.25 49.08 45.78 51.58 ... 54.62 53.95 43.67 51.05 3.85 28.45 28.67 42.50 52.72 22.15
274 PS157 29.60 17.30 34.70 31.37 30.17 44.72 44.55 41.25 47.05 ... 51.27 50.60 40.32 46.52 6.22 27.22 24.13 37.95 49.38 17.62
275 PS274 31.17 18.38 36.28 32.95 31.73 46.30 46.13 42.82 48.63 ... 49.03 48.35 38.08 48.10 8.25 22.10 25.72 37.57 47.13 18.70
276 PS275 34.40 23.73 39.40 37.52 30.58 50.87 52.68 47.38 53.20 ... 47.82 47.13 36.87 52.67 12.30 22.65 32.75 36.35 45.92 25.22
277 PS276 37.80 25.52 42.90 39.58 38.37 52.93 52.77 49.45 55.27 ... 58.32 57.65 47.38 54.73 11.17 32.48 32.35 46.17 56.43 25.82

278 rows × 29 columns

3 sites, 28 possible locations

Now, we’ll solve this for 3 possible sites. We’ll also add some extra code to time how long this takes.

start = time.perf_counter()
solution_3 = problem.solve(p=3, objectives="p_median", show_brute_force_progress=True)
C:\lokigi\lokigi\site.py:1115: UserWarning: No candidate site dataframe was given.
Sites names have been taken from the columns of your travel matrix: clinic_1, clinic_2, clinic_3, clinic_4, clinic_5, clinic_6, clinic_7, clinic_8, clinic_9, clinic_10, clinic_11, clinic_12, clinic_13, clinic_14, clinic_15, clinic_16, clinic_17, clinic_18, clinic_19, clinic_20, clinic_21, clinic_22, clinic_23, clinic_24, clinic_25, clinic_26, clinic_27, clinic_28.
If you wish to override this, run .add_sites() to add your site dataframe before running .solve() again.
You can use the .show_sites_format() to see the expected format beforehand.
  warn(
end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_3))

We’ll also estimate how big the resulting solution object is.

We can see that lokigi has warned us that we didn’t pass in a candidate site dataframe. As we didn’t have anything extra we wanted to pass in for our sites, like their exact locations or whether certain sites had to be included in the solution, this is fine.

If we want, we can see what site dataframe lokigi has automatically created.

problem.show_sites()
index site
0 0 clinic_1
1 1 clinic_2
2 2 clinic_3
3 3 clinic_4
4 4 clinic_5
5 5 clinic_6
6 6 clinic_7
7 7 clinic_8
8 8 clinic_9
9 9 clinic_10
10 10 clinic_11
11 11 clinic_12
12 12 clinic_13
13 13 clinic_14
14 14 clinic_15
15 15 clinic_16
16 16 clinic_17
17 17 clinic_18
18 18 clinic_19
19 19 clinic_20
20 20 clinic_21
21 21 clinic_22
22 22 clinic_23
23 23 clinic_24
24 24 clinic_25
25 25 clinic_26
26 26 clinic_27
27 27 clinic_28

Now let’s look at our solutions.

solution_3.show_solutions()
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
0 None [0, 7, 11] None 11.12 14.57 26.78 36.18 0.0 sector sector_x  clinic_1  clinic_8  clini...
1 None [0, 9, 11] None 11.37 14.63 26.89 36.18 0.0 sector sector_x  clinic_1  clinic_10  clin...
2 None [4, 7, 11] None 11.51 14.45 25.52 32.37 0.0 sector sector_x  clinic_5  clinic_8  clini...
3 None [0, 11, 21] None 11.56 15.17 27.11 36.18 0.0 sector sector_x  clinic_1  clinic_12  clin...
4 None [0, 8, 11] None 11.58 15.23 27.25 36.18 0.0 sector sector_x  clinic_1  clinic_9  clini...
... ... ... ... ... ... ... ... ... ...
3271 None [13, 17, 19] None 35.23 29.30 45.23 54.35 0.0 sector sector_x  clinic_14  clinic_18  cli...
3272 None [13, 17, 18] None 35.51 29.53 45.70 54.35 0.0 sector sector_x  clinic_14  clinic_18  cli...
3273 None [17, 18, 19] None 35.83 30.34 45.74 54.35 0.0 sector sector_x  clinic_18  clinic_19  cli...
3274 None [13, 18, 19] None 36.13 30.03 46.66 57.93 0.0 sector sector_x  clinic_14  clinic_19  cli...
3275 None [18, 19, 26] None 42.30 41.74 61.69 68.77 0.0 sector sector_x  clinic_19  clinic_20  cli...

3276 rows × 9 columns

print(f"Evaluated combinations: {len(solution_3.show_solutions())}")
Evaluated combinations: 3276

We can see that even with 20475 possible combinations, lokigi didn’t take that long to run.

print(f"Elapsed time: {elapsed_time:0.1f} seconds")
Elapsed time: 8.9 seconds
print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")
The solution object is 65.2mb in size

4 sites, 28 possible locations

Now let’s repeat it for a much larger number of combinations.

start = time.perf_counter()
solution_4 = problem.solve(p=4, objectives="p_median", show_brute_force_progress=True)
end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_4))
print(f"Evaluated combinations: {len(solution_4.show_solutions())}")
Evaluated combinations: 20475
print(f"Elapsed time: {elapsed_time:0.1f} seconds")
Elapsed time: 55.1 seconds

Lokigi will warn you if you’re going to exceed the recommended limit of {python} lokigi.site.BRUTE_FORCE_WARN_THRESHOLD and will refuse to run if {python} lokigi.site.BRUTE_FORCE_LIMIT is reached, trying to limit the risk of you locking up your computer with an output that will be too large and long-running for it to handle. However, you can pass ignore_brute_force_limit=True to the .solve() method to overrule this.

We can see that this object is now much larger.

print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")
The solution object is 452.5mb in size

6 sites, 28 possible locations

Let’s repeat this one more time for a very large number of combinations.

start = time.perf_counter()
solution_6 = problem.solve(p=6, objectives="p_median", show_brute_force_progress=True)
C:\lokigi\lokigi\mixins\site_solvers.py:60: UserWarning: You are trying to evaluate 376,740 combinations via brute force. The recommended maximum is 75,000. This could take a while! You may wish to try a different solver.
  warn(
end = time.perf_counter()
elapsed_time = end - start

Calculating the size of the solution object takes another 10 minutes so we won’t do it here - but from previous experiments, it comes in at over 8 gigabytes!

print(f"Evaluated combinations: {len(solution_6.show_solutions())}")
print(f"Elapsed time: {elapsed_time:0.1f} seconds")
Evaluated combinations: 376740
Elapsed time: 1035.5 seconds

Keeping only the best and worst solutions

We can see that this isn’t really a sustainable option!

We will run this again, but this time keeping only the best 100 and worst 1 instance - though for the sake of speed, we’ll return to evaluating all possible combinations of 3 sites here.

solution_3_top_100 = problem.solve(
    p=3, objectives="p_median",
    show_brute_force_progress=True,
    brute_force_keep_best_n=100,
    brute_force_keep_worst_n=None
    )
solution_3_top_100.show_solutions()
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
0 None [0, 7, 11] None 11.12 14.57 26.78 36.18 0.0 sector sector_x  clinic_1  clinic_8  clini...
1 None [0, 9, 11] None 11.37 14.63 26.89 36.18 0.0 sector sector_x  clinic_1  clinic_10  clin...
2 None [4, 7, 11] None 11.51 14.45 25.52 32.37 0.0 sector sector_x  clinic_5  clinic_8  clini...
3 None [0, 11, 21] None 11.56 15.17 27.11 36.18 0.0 sector sector_x  clinic_1  clinic_12  clin...
4 None [0, 8, 11] None 11.58 15.23 27.25 36.18 0.0 sector sector_x  clinic_1  clinic_9  clini...
... ... ... ... ... ... ... ... ... ...
95 None [4, 17, 21] None 13.17 16.47 29.08 42.92 0.0 sector sector_x  clinic_5  clinic_18  clin...
96 None [3, 21, 27] None 13.18 18.14 32.86 40.95 0.0 sector sector_x  clinic_4  clinic_22  clin...
97 None [4, 8, 17] None 13.19 16.54 29.08 42.92 0.0 sector sector_x  clinic_5  clinic_9  clini...
98 None [3, 9, 15] None 13.20 17.48 33.35 44.13 0.0 sector sector_x  clinic_4  clinic_10  clin...
99 None [0, 1, 21] None 13.20 18.04 32.37 39.75 0.0 sector sector_x  clinic_1  clinic_2  clini...

100 rows × 9 columns

We could also opt to keep some of the worst solutions for comparison.

solution_3_top_1_bottom_1 = problem.solve(
    p=3, objectives="p_median",
    show_brute_force_progress=True,
    brute_force_keep_best_n=1, brute_force_keep_worst_n=1
    )
solution_3_top_1_bottom_1.show_solutions()
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
0 None [0, 7, 11] None 11.12 14.57 26.78 36.18 0.0 sector sector_x  clinic_1  clinic_8  clini...
1 None [18, 19, 26] None 42.30 41.74 61.69 68.77 0.0 sector sector_x  clinic_19  clinic_20  cli...

We can confirm that these are the same rows as we got when we kept everything for the 3-site problem before, back at the start of this example.

solution_3.solution_df.head(1)
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
0 None [0, 7, 11] None 11.118049 14.572518 26.775 36.18 0.0 sector sector_x  clinic_1  clinic_8  clini...
solution_3.solution_df.tail(1)
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
3275 None [18, 19, 26] None 42.295713 41.738597 61.691 68.77 0.0 sector sector_x  clinic_19  clinic_20  cli...

This is much more efficient for larger sets. While it doesn’t significantly shorten the duration of the solving, the output object remains a much more manageable size, speeding up subsequent operations.

Let’s compare the best and worst solutions.

solution_3_top_1_bottom_1.plot_travel_time_distribution(
    top_n=2, title="Distribution of travel times - best and worst solution"
    )
Unable to display output for mime type(s): application/vnd.plotly.v1+json

Showing the top and bottom solution in this way really helps to highlight how much of a different it can make to have an optimal solution.

Back to top