Solving larger problems - the limits of brute force

Brute force - evaluating every possible combination - can be suitable for smaller problems. However, it very quickly gets out of hand. Let’s explore that here.

We’ll start by defining a simple problem. Note that we don’t have any geometry for this one - only the demand and travel matrices. You can read more about this in the no geometry example page.

from lokigi.site import SiteProblem
import time
import pickle
import pandas as pd

First, let’s initialise our SiteProblem() and add the demand and travel data.

problem = SiteProblem()
problem.add_demand(
    "https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/sh_demand.csv",
    demand_col="n_patients", location_id_col="sector")
problem.add_travel_matrix(
    "https://github.com/health-data-science-OR/healthcare-logistics/blob/master/optimisation/data/clinic_car_travel_time.csv",
    source_col="sector")

problem.show_demand()

	sector	n_patients
0	PS1	3375
1	PS2	3338
2	PS3	2922
3	PS4	3191
4	PS5	3134
...	...	...
273	PS274	216
274	PS275	261
275	PS276	342
276	PS277	0
277	PS278	0

278 rows × 2 columns

problem.show_travel_matrix()

	sector	clinic_1	clinic_2	clinic_3	clinic_4	clinic_5	clinic_6	clinic_7	clinic_8	clinic_9	...	clinic_19	clinic_20	clinic_21	clinic_22	clinic_23	clinic_24	clinic_25	clinic_26	clinic_27	clinic_28
0	PS158	33.17	40.15	38.17	37.93	29.35	51.48	53.28	48.00	53.82	...	12.10	12.27	15.83	53.27	53.98	29.75	34.22	32.68	19.62	39.25
1	PS159	31.42	36.55	36.42	34.53	27.60	47.88	49.68	44.40	50.22	...	11.75	11.92	10.62	49.68	50.38	26.15	30.62	32.35	19.28	35.65
2	PS160	31.82	38.80	36.82	36.58	28.00	50.13	51.95	46.65	52.47	...	10.75	10.92	14.35	51.93	52.65	28.40	32.87	31.35	18.27	37.90
3	PS161	31.68	38.65	36.67	36.43	27.87	49.98	51.80	46.50	52.32	...	10.32	10.77	16.38	51.78	52.50	28.27	32.73	31.20	17.82	37.75
4	PS162	29.55	36.53	34.55	34.32	25.73	47.87	49.67	44.38	50.20	...	6.77	7.28	17.18	49.65	50.37	26.13	30.60	29.07	14.27	35.63
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
273	PS273	34.13	21.83	39.23	35.90	34.70	49.25	49.08	45.78	51.58	...	54.62	53.95	43.67	51.05	3.85	28.45	28.67	42.50	52.72	22.15
274	PS157	29.60	17.30	34.70	31.37	30.17	44.72	44.55	41.25	47.05	...	51.27	50.60	40.32	46.52	6.22	27.22	24.13	37.95	49.38	17.62
275	PS274	31.17	18.38	36.28	32.95	31.73	46.30	46.13	42.82	48.63	...	49.03	48.35	38.08	48.10	8.25	22.10	25.72	37.57	47.13	18.70
276	PS275	34.40	23.73	39.40	37.52	30.58	50.87	52.68	47.38	53.20	...	47.82	47.13	36.87	52.67	12.30	22.65	32.75	36.35	45.92	25.22
277	PS276	37.80	25.52	42.90	39.58	38.37	52.93	52.77	49.45	55.27	...	58.32	57.65	47.38	54.73	11.17	32.48	32.35	46.17	56.43	25.82

278 rows × 29 columns

3 sites, 28 possible locations

Now, we’ll solve this for 3 possible sites. We’ll also add some extra code to time how long this takes.

start = time.perf_counter()
solution_3 = problem.solve(p=3, objectives="p_median", show_brute_force_progress=True)

C:\lokigi\lokigi\site.py:1115: UserWarning: No candidate site dataframe was given.
Sites names have been taken from the columns of your travel matrix: clinic_1, clinic_2, clinic_3, clinic_4, clinic_5, clinic_6, clinic_7, clinic_8, clinic_9, clinic_10, clinic_11, clinic_12, clinic_13, clinic_14, clinic_15, clinic_16, clinic_17, clinic_18, clinic_19, clinic_20, clinic_21, clinic_22, clinic_23, clinic_24, clinic_25, clinic_26, clinic_27, clinic_28.
If you wish to override this, run .add_sites() to add your site dataframe before running .solve() again.
You can use the .show_sites_format() to see the expected format beforehand.
  warn(

end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_3))

We’ll also estimate how big the resulting solution object is.

We can see that lokigi has warned us that we didn’t pass in a candidate site dataframe. As we didn’t have anything extra we wanted to pass in for our sites, like their exact locations or whether certain sites had to be included in the solution, this is fine.

If we want, we can see what site dataframe lokigi has automatically created.

problem.show_sites()

	index	site
0	0	clinic_1
1	1	clinic_2
2	2	clinic_3
3	3	clinic_4
4	4	clinic_5
5	5	clinic_6
6	6	clinic_7
7	7	clinic_8
8	8	clinic_9
9	9	clinic_10
10	10	clinic_11
11	11	clinic_12
12	12	clinic_13
13	13	clinic_14
14	14	clinic_15
15	15	clinic_16
16	16	clinic_17
17	17	clinic_18
18	18	clinic_19
19	19	clinic_20
20	20	clinic_21
21	21	clinic_22
22	22	clinic_23
23	23	clinic_24
24	24	clinic_25
25	25	clinic_26
26	26	clinic_27
27	27	clinic_28

Now let’s look at our solutions.

solution_3.show_solutions()

	site_names	site_indices	coverage_threshold	weighted_average	unweighted_average	90th_percentile	max	proportion_within_coverage_threshold	problem_df
0	None	[0, 7, 11]	None	11.12	14.57	26.78	36.18	0.0	sector sector_x clinic_1 clinic_8 clini...
1	None	[0, 9, 11]	None	11.37	14.63	26.89	36.18	0.0	sector sector_x clinic_1 clinic_10 clin...
2	None	[4, 7, 11]	None	11.51	14.45	25.52	32.37	0.0	sector sector_x clinic_5 clinic_8 clini...
3	None	[0, 11, 21]	None	11.56	15.17	27.11	36.18	0.0	sector sector_x clinic_1 clinic_12 clin...
4	None	[0, 8, 11]	None	11.58	15.23	27.25	36.18	0.0	sector sector_x clinic_1 clinic_9 clini...
...	...	...	...	...	...	...	...	...	...
3271	None	[13, 17, 19]	None	35.23	29.30	45.23	54.35	0.0	sector sector_x clinic_14 clinic_18 cli...
3272	None	[13, 17, 18]	None	35.51	29.53	45.70	54.35	0.0	sector sector_x clinic_14 clinic_18 cli...
3273	None	[17, 18, 19]	None	35.83	30.34	45.74	54.35	0.0	sector sector_x clinic_18 clinic_19 cli...
3274	None	[13, 18, 19]	None	36.13	30.03	46.66	57.93	0.0	sector sector_x clinic_14 clinic_19 cli...
3275	None	[18, 19, 26]	None	42.30	41.74	61.69	68.77	0.0	sector sector_x clinic_19 clinic_20 cli...

3276 rows × 9 columns

print(f"Evaluated combinations: {len(solution_3.show_solutions())}")

Evaluated combinations: 3276

We can see that even with 20475 possible combinations, lokigi didn’t take that long to run.

print(f"Elapsed time: {elapsed_time:0.1f} seconds")

Elapsed time: 8.9 seconds

print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")

The solution object is 65.2mb in size

4 sites, 28 possible locations

Now let’s repeat it for a much larger number of combinations.

start = time.perf_counter()
solution_4 = problem.solve(p=4, objectives="p_median", show_brute_force_progress=True)

end = time.perf_counter()
elapsed_time = end - start
size_estimate = len(pickle.dumps(solution_4))

print(f"Evaluated combinations: {len(solution_4.show_solutions())}")

Evaluated combinations: 20475

print(f"Elapsed time: {elapsed_time:0.1f} seconds")

Elapsed time: 55.1 seconds

Lokigi will warn you if you’re going to exceed the recommended limit of {python} lokigi.site.BRUTE_FORCE_WARN_THRESHOLD and will refuse to run if {python} lokigi.site.BRUTE_FORCE_LIMIT is reached, trying to limit the risk of you locking up your computer with an output that will be too large and long-running for it to handle. However, you can pass ignore_brute_force_limit=True to the .solve() method to overrule this.

We can see that this object is now much larger.

print(f"The solution object is {(size_estimate/1_000_000):.1f}mb in size")

The solution object is 452.5mb in size

6 sites, 28 possible locations

Let’s repeat this one more time for a very large number of combinations.

start = time.perf_counter()
solution_6 = problem.solve(p=6, objectives="p_median", show_brute_force_progress=True)

C:\lokigi\lokigi\mixins\site_solvers.py:60: UserWarning: You are trying to evaluate 376,740 combinations via brute force. The recommended maximum is 75,000. This could take a while! You may wish to try a different solver.
  warn(

end = time.perf_counter()
elapsed_time = end - start

Calculating the size of the solution object takes another 10 minutes so we won’t do it here - but from previous experiments, it comes in at over 8 gigabytes!

print(f"Evaluated combinations: {len(solution_6.show_solutions())}")
print(f"Elapsed time: {elapsed_time:0.1f} seconds")

Evaluated combinations: 376740
Elapsed time: 1035.5 seconds

Keeping only the best and worst solutions

We can see that this isn’t really a sustainable option!

We will run this again, but this time keeping only the best 100 and worst 1 instance - though for the sake of speed, we’ll return to evaluating all possible combinations of 3 sites here.

solution_3_top_100 = problem.solve(
    p=3, objectives="p_median",
    show_brute_force_progress=True,
    brute_force_keep_best_n=100,
    brute_force_keep_worst_n=None
    )

solution_3_top_100.show_solutions()

	site_names	site_indices	coverage_threshold	weighted_average	unweighted_average	90th_percentile	max	proportion_within_coverage_threshold	problem_df
0	None	[0, 7, 11]	None	11.12	14.57	26.78	36.18	0.0	sector sector_x clinic_1 clinic_8 clini...
1	None	[0, 9, 11]	None	11.37	14.63	26.89	36.18	0.0	sector sector_x clinic_1 clinic_10 clin...
2	None	[4, 7, 11]	None	11.51	14.45	25.52	32.37	0.0	sector sector_x clinic_5 clinic_8 clini...
3	None	[0, 11, 21]	None	11.56	15.17	27.11	36.18	0.0	sector sector_x clinic_1 clinic_12 clin...
4	None	[0, 8, 11]	None	11.58	15.23	27.25	36.18	0.0	sector sector_x clinic_1 clinic_9 clini...
...	...	...	...	...	...	...	...	...	...
95	None	[4, 17, 21]	None	13.17	16.47	29.08	42.92	0.0	sector sector_x clinic_5 clinic_18 clin...
96	None	[3, 21, 27]	None	13.18	18.14	32.86	40.95	0.0	sector sector_x clinic_4 clinic_22 clin...
97	None	[4, 8, 17]	None	13.19	16.54	29.08	42.92	0.0	sector sector_x clinic_5 clinic_9 clini...
98	None	[3, 9, 15]	None	13.20	17.48	33.35	44.13	0.0	sector sector_x clinic_4 clinic_10 clin...
99	None	[0, 1, 21]	None	13.20	18.04	32.37	39.75	0.0	sector sector_x clinic_1 clinic_2 clini...

100 rows × 9 columns

We could also opt to keep some of the worst solutions for comparison.

solution_3_top_1_bottom_1 = problem.solve(
    p=3, objectives="p_median",
    show_brute_force_progress=True,
    brute_force_keep_best_n=1, brute_force_keep_worst_n=1
    )

solution_3_top_1_bottom_1.show_solutions()

	site_names	site_indices	coverage_threshold	weighted_average	unweighted_average	90th_percentile	max	proportion_within_coverage_threshold	problem_df
0	None	[0, 7, 11]	None	11.12	14.57	26.78	36.18	0.0	sector sector_x clinic_1 clinic_8 clini...
1	None	[18, 19, 26]	None	42.30	41.74	61.69	68.77	0.0	sector sector_x clinic_19 clinic_20 cli...

We can confirm that these are the same rows as we got when we kept everything for the 3-site problem before, back at the start of this example.

solution_3.solution_df.head(1)

	site_names	site_indices	coverage_threshold	weighted_average	unweighted_average	90th_percentile	max	proportion_within_coverage_threshold	problem_df
0	None	[0, 7, 11]	None	11.118049	14.572518	26.775	36.18	0.0	sector sector_x clinic_1 clinic_8 clini...

solution_3.solution_df.tail(1)

	site_names	site_indices	coverage_threshold	weighted_average	unweighted_average	90th_percentile	max	proportion_within_coverage_threshold	problem_df
3275	None	[18, 19, 26]	None	42.295713	41.738597	61.691	68.77	0.0	sector sector_x clinic_19 clinic_20 cli...

This is much more efficient for larger sets. While it doesn’t significantly shorten the duration of the solving, the output object remains a much more manageable size, speeding up subsequent operations.

Let’s compare the best and worst solutions.

solution_3_top_1_bottom_1.plot_travel_time_distribution(
    top_n=2, title="Distribution of travel times - best and worst solution"
    )

Unable to display output for mime type(s): application/vnd.plotly.v1+json

Showing the top and bottom solution in this way really helps to highlight how much of a different it can make to have an optimal solution.