from lokigi.site import SiteProblem
import time
import pickle
import pandas as pdSolving larger problems - the limits of brute force
Brute force - evaluating every possible combination - can be suitable for smaller problems. However, it very quickly gets out of hand. Let’s explore that here.
We’ll start by defining a simple problem. Note that we don’t have any geometry for this one - only the demand and travel matrices. You can read more about this in the no geometry example page.
First, let’s initialise our SiteProblem() and add the demand and travel data.
problem = SiteProblem()
problem.add_demand(
"https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/sh_demand.csv",
demand_col="n_patients", location_id_col="sector")
problem.add_travel_matrix(
"https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/clinic_car_travel_time.csv",
source_col="sector")problem.show_demand()| sector | n_patients | |
|---|---|---|
| 0 | PS1 | 3375 |
| 1 | PS2 | 3338 |
| 2 | PS3 | 2922 |
| 3 | PS4 | 3191 |
| 4 | PS5 | 3134 |
| ... | ... | ... |
| 273 | PS274 | 216 |
| 274 | PS275 | 261 |
| 275 | PS276 | 342 |
| 276 | PS277 | 0 |
| 277 | PS278 | 0 |
278 rows × 2 columns
problem.show_travel_matrix()| sector | clinic_1 | clinic_2 | clinic_3 | clinic_4 | clinic_5 | clinic_6 | clinic_7 | clinic_8 | clinic_9 | ... | clinic_19 | clinic_20 | clinic_21 | clinic_22 | clinic_23 | clinic_24 | clinic_25 | clinic_26 | clinic_27 | clinic_28 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PS158 | 33.17 | 40.15 | 38.17 | 37.93 | 29.35 | 51.48 | 53.28 | 48.00 | 53.82 | ... | 12.10 | 12.27 | 15.83 | 53.27 | 53.98 | 29.75 | 34.22 | 32.68 | 19.62 | 39.25 |
| 1 | PS159 | 31.42 | 36.55 | 36.42 | 34.53 | 27.60 | 47.88 | 49.68 | 44.40 | 50.22 | ... | 11.75 | 11.92 | 10.62 | 49.68 | 50.38 | 26.15 | 30.62 | 32.35 | 19.28 | 35.65 |
| 2 | PS160 | 31.82 | 38.80 | 36.82 | 36.58 | 28.00 | 50.13 | 51.95 | 46.65 | 52.47 | ... | 10.75 | 10.92 | 14.35 | 51.93 | 52.65 | 28.40 | 32.87 | 31.35 | 18.27 | 37.90 |
| 3 | PS161 | 31.68 | 38.65 | 36.67 | 36.43 | 27.87 | 49.98 | 51.80 | 46.50 | 52.32 | ... | 10.32 | 10.77 | 16.38 | 51.78 | 52.50 | 28.27 | 32.73 | 31.20 | 17.82 | 37.75 |
| 4 | PS162 | 29.55 | 36.53 | 34.55 | 34.32 | 25.73 | 47.87 | 49.67 | 44.38 | 50.20 | ... | 6.77 | 7.28 | 17.18 | 49.65 | 50.37 | 26.13 | 30.60 | 29.07 | 14.27 | 35.63 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 273 | PS273 | 34.13 | 21.83 | 39.23 | 35.90 | 34.70 | 49.25 | 49.08 | 45.78 | 51.58 | ... | 54.62 | 53.95 | 43.67 | 51.05 | 3.85 | 28.45 | 28.67 | 42.50 | 52.72 | 22.15 |
| 274 | PS157 | 29.60 | 17.30 | 34.70 | 31.37 | 30.17 | 44.72 | 44.55 | 41.25 | 47.05 | ... | 51.27 | 50.60 | 40.32 | 46.52 | 6.22 | 27.22 | 24.13 | 37.95 | 49.38 | 17.62 |
| 275 | PS274 | 31.17 | 18.38 | 36.28 | 32.95 | 31.73 | 46.30 | 46.13 | 42.82 | 48.63 | ... | 49.03 | 48.35 | 38.08 | 48.10 | 8.25 | 22.10 | 25.72 | 37.57 | 47.13 | 18.70 |
| 276 | PS275 | 34.40 | 23.73 | 39.40 | 37.52 | 30.58 | 50.87 | 52.68 | 47.38 | 53.20 | ... | 47.82 | 47.13 | 36.87 | 52.67 | 12.30 | 22.65 | 32.75 | 36.35 | 45.92 | 25.22 |
| 277 | PS276 | 37.80 | 25.52 | 42.90 | 39.58 | 38.37 | 52.93 | 52.77 | 49.45 | 55.27 | ... | 58.32 | 57.65 | 47.38 | 54.73 | 11.17 | 32.48 | 32.35 | 46.17 | 56.43 | 25.82 |
278 rows × 29 columns
3 sites, 28 possible locations
Now, we’ll solve this for 3 possible sites. We’ll also add some extra code to time how long this takes.
start = time.perf_counter()
solution_3 = problem.solve(p=3, objectives="p_median", show_brute_force_progress=True)C:\lokigi\lokigi\site.py:1115: UserWarning: No candidate site dataframe was given.
Sites names have been taken from the columns of your travel matrix: clinic_1, clinic_2, clinic_3, clinic_4, clinic_5, clinic_6, clinic_7, clinic_8, clinic_9, clinic_10, clinic_11, clinic_12, clinic_13, clinic_14, clinic_15, clinic_16, clinic_17, clinic_18, clinic_19, clinic_20, clinic_21, clinic_22, clinic_23, clinic_24, clinic_25, clinic_26, clinic_27, clinic_28.
If you wish to override this, run .add_sites() to add your site dataframe before running .solve() again.
You can use the .show_sites_format() to see the expected format beforehand.
warn(
end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_3))We’ll also estimate how big the resulting solution object is.
We can see that lokigi has warned us that we didn’t pass in a candidate site dataframe. As we didn’t have anything extra we wanted to pass in for our sites, like their exact locations or whether certain sites had to be included in the solution, this is fine.
If we want, we can see what site dataframe lokigi has automatically created.
problem.show_sites()| index | site | |
|---|---|---|
| 0 | 0 | clinic_1 |
| 1 | 1 | clinic_2 |
| 2 | 2 | clinic_3 |
| 3 | 3 | clinic_4 |
| 4 | 4 | clinic_5 |
| 5 | 5 | clinic_6 |
| 6 | 6 | clinic_7 |
| 7 | 7 | clinic_8 |
| 8 | 8 | clinic_9 |
| 9 | 9 | clinic_10 |
| 10 | 10 | clinic_11 |
| 11 | 11 | clinic_12 |
| 12 | 12 | clinic_13 |
| 13 | 13 | clinic_14 |
| 14 | 14 | clinic_15 |
| 15 | 15 | clinic_16 |
| 16 | 16 | clinic_17 |
| 17 | 17 | clinic_18 |
| 18 | 18 | clinic_19 |
| 19 | 19 | clinic_20 |
| 20 | 20 | clinic_21 |
| 21 | 21 | clinic_22 |
| 22 | 22 | clinic_23 |
| 23 | 23 | clinic_24 |
| 24 | 24 | clinic_25 |
| 25 | 25 | clinic_26 |
| 26 | 26 | clinic_27 |
| 27 | 27 | clinic_28 |
Now let’s look at our solutions.
solution_3.show_solutions()| site_names | site_indices | coverage_threshold | weighted_average | unweighted_average | 90th_percentile | max | proportion_within_coverage_threshold | problem_df | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | None | [0, 7, 11] | None | 11.12 | 14.57 | 26.78 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_8 clini... |
| 1 | None | [0, 9, 11] | None | 11.37 | 14.63 | 26.89 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_10 clin... |
| 2 | None | [4, 7, 11] | None | 11.51 | 14.45 | 25.52 | 32.37 | 0.0 | sector sector_x clinic_5 clinic_8 clini... |
| 3 | None | [0, 11, 21] | None | 11.56 | 15.17 | 27.11 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_12 clin... |
| 4 | None | [0, 8, 11] | None | 11.58 | 15.23 | 27.25 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_9 clini... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3271 | None | [13, 17, 19] | None | 35.23 | 29.30 | 45.23 | 54.35 | 0.0 | sector sector_x clinic_14 clinic_18 cli... |
| 3272 | None | [13, 17, 18] | None | 35.51 | 29.53 | 45.70 | 54.35 | 0.0 | sector sector_x clinic_14 clinic_18 cli... |
| 3273 | None | [17, 18, 19] | None | 35.83 | 30.34 | 45.74 | 54.35 | 0.0 | sector sector_x clinic_18 clinic_19 cli... |
| 3274 | None | [13, 18, 19] | None | 36.13 | 30.03 | 46.66 | 57.93 | 0.0 | sector sector_x clinic_14 clinic_19 cli... |
| 3275 | None | [18, 19, 26] | None | 42.30 | 41.74 | 61.69 | 68.77 | 0.0 | sector sector_x clinic_19 clinic_20 cli... |
3276 rows × 9 columns
print(f"Evaluated combinations: {len(solution_3.show_solutions())}")Evaluated combinations: 3276
We can see that even with 20475 possible combinations, lokigi didn’t take that long to run.
print(f"Elapsed time: {elapsed_time:0.1f} seconds")Elapsed time: 8.9 seconds
print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")The solution object is 65.2mb in size
4 sites, 28 possible locations
Now let’s repeat it for a much larger number of combinations.
start = time.perf_counter()
solution_4 = problem.solve(p=4, objectives="p_median", show_brute_force_progress=True)end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_4))print(f"Evaluated combinations: {len(solution_4.show_solutions())}")Evaluated combinations: 20475
print(f"Elapsed time: {elapsed_time:0.1f} seconds")Elapsed time: 55.1 seconds
Lokigi will warn you if you’re going to exceed the recommended limit of {python} lokigi.site.BRUTE_FORCE_WARN_THRESHOLD and will refuse to run if {python} lokigi.site.BRUTE_FORCE_LIMIT is reached, trying to limit the risk of you locking up your computer with an output that will be too large and long-running for it to handle. However, you can pass ignore_brute_force_limit=True to the .solve() method to overrule this.
We can see that this object is now much larger.
print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")The solution object is 452.5mb in size
6 sites, 28 possible locations
Let’s repeat this one more time for a very large number of combinations.
start = time.perf_counter()
solution_6 = problem.solve(p=6, objectives="p_median", show_brute_force_progress=True)C:\lokigi\lokigi\mixins\site_solvers.py:60: UserWarning: You are trying to evaluate 376,740 combinations via brute force. The recommended maximum is 75,000. This could take a while! You may wish to try a different solver.
warn(
end = time.perf_counter()
elapsed_time = end - startCalculating the size of the solution object takes another 10 minutes so we won’t do it here - but from previous experiments, it comes in at over 8 gigabytes!
print(f"Evaluated combinations: {len(solution_6.show_solutions())}")
print(f"Elapsed time: {elapsed_time:0.1f} seconds")Evaluated combinations: 376740
Elapsed time: 1035.5 seconds
Keeping only the best and worst solutions
We can see that this isn’t really a sustainable option!
We will run this again, but this time keeping only the best 100 and worst 1 instance - though for the sake of speed, we’ll return to evaluating all possible combinations of 3 sites here.
solution_3_top_100 = problem.solve(
p=3, objectives="p_median",
show_brute_force_progress=True,
brute_force_keep_best_n=100,
brute_force_keep_worst_n=None
)solution_3_top_100.show_solutions()| site_names | site_indices | coverage_threshold | weighted_average | unweighted_average | 90th_percentile | max | proportion_within_coverage_threshold | problem_df | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | None | [0, 7, 11] | None | 11.12 | 14.57 | 26.78 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_8 clini... |
| 1 | None | [0, 9, 11] | None | 11.37 | 14.63 | 26.89 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_10 clin... |
| 2 | None | [4, 7, 11] | None | 11.51 | 14.45 | 25.52 | 32.37 | 0.0 | sector sector_x clinic_5 clinic_8 clini... |
| 3 | None | [0, 11, 21] | None | 11.56 | 15.17 | 27.11 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_12 clin... |
| 4 | None | [0, 8, 11] | None | 11.58 | 15.23 | 27.25 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_9 clini... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | None | [4, 17, 21] | None | 13.17 | 16.47 | 29.08 | 42.92 | 0.0 | sector sector_x clinic_5 clinic_18 clin... |
| 96 | None | [3, 21, 27] | None | 13.18 | 18.14 | 32.86 | 40.95 | 0.0 | sector sector_x clinic_4 clinic_22 clin... |
| 97 | None | [4, 8, 17] | None | 13.19 | 16.54 | 29.08 | 42.92 | 0.0 | sector sector_x clinic_5 clinic_9 clini... |
| 98 | None | [3, 9, 15] | None | 13.20 | 17.48 | 33.35 | 44.13 | 0.0 | sector sector_x clinic_4 clinic_10 clin... |
| 99 | None | [0, 1, 21] | None | 13.20 | 18.04 | 32.37 | 39.75 | 0.0 | sector sector_x clinic_1 clinic_2 clini... |
100 rows × 9 columns
We could also opt to keep some of the worst solutions for comparison.
solution_3_top_1_bottom_1 = problem.solve(
p=3, objectives="p_median",
show_brute_force_progress=True,
brute_force_keep_best_n=1, brute_force_keep_worst_n=1
)solution_3_top_1_bottom_1.show_solutions()| site_names | site_indices | coverage_threshold | weighted_average | unweighted_average | 90th_percentile | max | proportion_within_coverage_threshold | problem_df | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | None | [0, 7, 11] | None | 11.12 | 14.57 | 26.78 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_8 clini... |
| 1 | None | [18, 19, 26] | None | 42.30 | 41.74 | 61.69 | 68.77 | 0.0 | sector sector_x clinic_19 clinic_20 cli... |
We can confirm that these are the same rows as we got when we kept everything for the 3-site problem before, back at the start of this example.
solution_3.solution_df.head(1)| site_names | site_indices | coverage_threshold | weighted_average | unweighted_average | 90th_percentile | max | proportion_within_coverage_threshold | problem_df | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | None | [0, 7, 11] | None | 11.118049 | 14.572518 | 26.775 | 36.18 | 0.0 | sector sector_x clinic_1 clinic_8 clini... |
solution_3.solution_df.tail(1)| site_names | site_indices | coverage_threshold | weighted_average | unweighted_average | 90th_percentile | max | proportion_within_coverage_threshold | problem_df | |
|---|---|---|---|---|---|---|---|---|---|
| 3275 | None | [18, 19, 26] | None | 42.295713 | 41.738597 | 61.691 | 68.77 | 0.0 | sector sector_x clinic_19 clinic_20 cli... |
This is much more efficient for larger sets. While it doesn’t significantly shorten the duration of the solving, the output object remains a much more manageable size, speeding up subsequent operations.
Let’s compare the best and worst solutions.
solution_3_top_1_bottom_1.plot_travel_time_distribution(
top_n=2, title="Distribution of travel times - best and worst solution"
)Unable to display output for mime type(s): application/vnd.plotly.v1+json
Showing the top and bottom solution in this way really helps to highlight how much of a different it can make to have an optimal solution.