Getting Started

The lokigi package is designed to make it easier to undertake location analysis problems.

In many organisations - but particularly in healthcare - there is a need to be able to find a near-optimum combination of possible sites to minimize an objective. For example, you may be aiming to open an additional site to deliver healthcare services from, wanting to achieve a goal like minimizing the average travel time for each individual.

Excellent packages exist for finding optimum solutions, such as spopt. However, in healthcare scenarios in particular, providing only the optimum solution is often not ideal as these decisions are extremely complex and costly, and stakeholders need to understand the pros and cons of a range of near-optimal solutions to make an informed decision.

Enter lokigi.

Setting up a site location problem

First, we need to import the SiteProblem class from the lokigi.site module.

from lokigi.site import SiteProblem

We will set up an instance of the SiteProblem class. Let’s just call this problem - but you could call it anything.

problem = SiteProblem()

There are four core types of data we can add to our problem object:

  • a travel time (or other ‘cost’) matrix
  • site locations *
  • per-region demand *
  • a geographic dataset representing the region being explored *

* indicates optional datasets

The data you provide should take the following format.

Remember - only the travel matrix is completely necessary. You can include any combination of the remaining data.

The columns in blue can contain any unit of geographical area. In the UK, this might be LSOA, OA, MSOA, postcode, postcode sector, and so on. The column names do not need to be consistent across the three datasets, as you will specify which column contains that data in each case. In this example, two datasets use the name “LSOA” while the third uses “LSOA11NM”.

Similarly, the columns in orange do not need to use the format shown. Your sites may have any names, as long as the names used in the site dataframe (if provided) are consistent with the names used for the columns in your travel matrix.

Adding a travel time (or other ‘cost’) matrix

The bare minimum we need to provide to be able to undertake a location problem is a travel matrix.

This is a grid of travel times - or another ‘cost’ parameter that you want to optimize for, like the distance.

The rows should represent where people would start their journey from.

The first column should contain the names of the places people would start their journey from. In location problems in a UK context, we often use something like LSOA or MSOA.

The remaining columns should represent where people would end their journey - i.e. all of your candidate sites. Here we will use names to represent each site; we can pass in exact coordinates or other details in a different step.

The values in each cell within your grid would be the cost parameter: the travel time, distance, or another option like CO2 emissions.

Here is an example dataset in the correct format.

LSOA Site 1 Site 2 Site 3
0 Brighton and Hove 027E 773.93 527.69 444.29
1 Brighton and Hove 027F 757.39 499.11 517.57
2 Brighton and Hove 027A 763.24 601.38 517.98
3 Brighton and Hove 029E 743.62 651.72 660.36
4 Brighton and Hove 029D 665.85 664.65 658.20

The values in this example are in seconds. Someone travelling from the LSOA Brighton and Hove 027E to Site 3 would have a travel time of 444.29 seconds (about 7.4 minutes), versus 773.93 seconds (about 12.9 minutes) if they went to Site 1.
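To make the layout concrete, here is a minimal sketch that reads a travel matrix in this shape and converts seconds to minutes. It uses only the Python standard library rather than lokigi; the raw_csv text and the read_travel_matrix helper are hypothetical names made up for illustration, with values copied from the first two rows of the example.

```python
import csv
import io

# Hypothetical CSV text mirroring the example layout: one source column,
# then one column per candidate site. Values are travel times in seconds.
raw_csv = """LSOA,Site 1,Site 2,Site 3
Brighton and Hove 027E,773.93,527.69,444.29
Brighton and Hove 027F,757.39,499.11,517.57
"""


def read_travel_matrix(text, source_col="LSOA"):
    """Return {source: {site: cost}} from CSV text in the layout above."""
    matrix = {}
    for row in csv.DictReader(io.StringIO(text)):
        source = row.pop(source_col)  # every remaining column is a destination
        matrix[source] = {site: float(value) for site, value in row.items()}
    return matrix


travel_seconds = read_travel_matrix(raw_csv)

# Convert every cell from seconds to minutes.
travel_minutes = {
    source: {site: seconds / 60 for site, seconds in sites.items()}
    for source, sites in travel_seconds.items()
}

print(round(travel_minutes["Brighton and Hove 027E"]["Site 3"], 1))
print(round(travel_minutes["Brighton and Hove 027E"]["Site 1"], 1))
```

In practice you would not need to do this by hand, as the unit conversion can be requested when the matrix is added to the problem.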

Tip

Want to know about different ways of organising geographic information in the UK?

Check out the “What other ways do we determine areas within the UK” subsection of the HSMA geographic book.

We don’t have to use LSOAs for our rows, or site names for our columns - these could be anything! You might have postcode sectors for your rows and lat/long pairs for your columns, for example.

To add this data to our problem, we use the .add_travel_matrix() method of our SiteProblem instance.

problem.add_travel_matrix(
    travel_matrix_df="../sample_data/brighton_travel_matrix_driving.csv",  # (1)
    source_col="LSOA",  # (2)
    from_unit="seconds",  # (3)
    to_unit="minutes"  # (4)
    )

(1) We can pass in either a pandas dataframe or a file path (either on your computer or on the web).
(2) We need to tell lokigi which column contains our ‘source’ data - i.e. the column that has the names of the locations people will be travelling from. It will then assume all other columns in the dataframe relate to the destinations; if this is not the case, the skip_cols parameter must be provided so that any irrelevant columns can be ignored.
(3) We can optionally convert the values in our travel matrix. This travel matrix records travel times in seconds, so we use that as our ‘from_unit’…
(4) … and we would prefer to display our times in minutes, so we provide that as the ‘to_unit’.

Tip

If we aren’t doing a conversion, we can just pass the ‘unit’ parameter to describe the unit used in our dataset. This is picked up when we plot our solution, so it’s worth specifying here.

If we’re doing a conversion, we don’t need to pass the ‘unit’ parameter too as lokigi will automatically set this using the ‘to_unit’.

Let’s now view our dataset.

problem.show_travel_matrix().head()
LSOA Site 1 Site 2 Site 3 Site 4 Site 5 Site 6
0 Brighton and Hove 027E 12.898833 8.794833 7.404833 8.197500 10.125667 9.248500
1 Brighton and Hove 027F 12.623167 8.318500 8.626167 9.351167 9.649500 8.972833
2 Brighton and Hove 027A 12.720667 10.023000 8.633000 6.840000 11.353833 9.289167
3 Brighton and Hove 029E 12.393667 10.862000 11.006000 6.328667 12.193000 9.293000
4 Brighton and Hove 029D 11.097500 11.077500 10.970000 5.216667 12.408333 9.508500

We can see that we have the travel times from every LSOA to 6 possible sites.

Note

At present, lokigi assumes you always want to minimize the cost parameter - i.e. you want people to travel the shortest distance or have the shortest travel time, or undertake journeys that emit the least amount of CO2.

If you need lokigi to be able to support maximising a cost objective, please raise an issue on the repository: https://github.com/hsma-tools/lokigi/issues

Tip

Lokigi doesn’t handle generating travel/cost matrices. It assumes you have already got this data to feed into it.

Interested in generating your own travel matrices in this format?

Check out the “Lookup Up Travel Times Using APIs” chapter of the HSMA geographic book.

Solving a simple problem

If we don’t have any other requirements, we can solve our problem right now!

Let’s solve by minimizing the average travel time from each ‘source’ to its nearest selected ‘destination’ - which we can call a “simple p-median” problem.

We’ll assume we want to find the best possible combination of any 3 sites.

solutions = problem.solve(p=3, objectives="simple_p_median")
/__w/lokigi/lokigi/lokigi/site.py:1370: UserWarning: No demand data was provided. Demand from all regions has been assumed to be equal.If you wish to override this, run .add_demand() to add your site dataframe before running .solve() again.You can use the .show_demand_format() to see the expected format beforehand.
  warn(
/__w/lokigi/lokigi/lokigi/site.py:1378: UserWarning: No candidate site dataframe was given.
Sites names have been taken from the columns of your travel matrix: Site 1, Site 2, Site 3, Site 4, Site 5, Site 6.
If you wish to override this, run .add_sites() to add your site dataframe before running .solve() again.
You can use the .show_sites_format() to see the expected format beforehand.
  warn(

This returns an object with the lokigi class SiteSolutionSet.

solutions
<lokigi.site_solutions.SiteSolutionSet at 0x7f894a3d3d10>
solutions.show_solutions()
site_names site_indices coverage_threshold weighted_average unweighted_average 90th_percentile max proportion_within_coverage_threshold problem_df
0 None [2, 3, 5] None 5.36 5.36 8.06 16.69 0.0 LSOA                  L...
1 None [2, 3, 4] None 5.45 5.45 8.50 16.69 0.0 LSOA                  L...
2 None [1, 2, 3] None 5.59 5.59 9.00 16.69 0.0 LSOA                  L...
3 None [0, 2, 3] None 5.67 5.67 9.36 16.69 0.0 LSOA                  L...
4 None [0, 2, 5] None 6.21 6.21 9.33 16.69 0.0 LSOA                  L...
5 None [2, 4, 5] None 6.29 6.29 9.26 16.69 0.0 LSOA                  L...
6 None [1, 2, 5] None 6.31 6.31 9.45 16.69 0.0 LSOA                  L...
7 None [0, 2, 4] None 6.32 6.32 9.70 16.69 0.0 LSOA                  L...
8 None [0, 1, 2] None 6.39 6.39 9.77 16.69 0.0 LSOA                  L...
9 None [1, 3, 5] None 6.64 6.64 11.58 22.86 0.0 LSOA                  L...
10 None [3, 4, 5] None 6.73 6.73 12.26 21.71 0.0 LSOA                  L...
11 None [1, 3, 4] None 6.76 6.76 11.32 21.71 0.0 LSOA                  L...
12 None [0, 3, 4] None 6.81 6.81 12.26 21.71 0.0 LSOA                  L...
13 None [0, 1, 3] None 6.92 6.92 11.93 23.92 0.0 LSOA                  L...
14 None [1, 2, 4] None 7.46 7.46 11.57 16.69 0.0 LSOA                  L...
15 None [0, 1, 5] None 7.54 7.54 11.90 22.86 0.0 LSOA                  L...
16 None [1, 4, 5] None 7.65 7.65 11.67 21.71 0.0 LSOA                  L...
17 None [0, 1, 4] None 7.66 7.66 11.67 21.71 0.0 LSOA                  L...
18 None [0, 3, 5] None 7.69 7.69 13.72 22.86 0.0 LSOA                  L...
19 None [0, 4, 5] None 7.70 7.70 12.65 21.71 0.0 LSOA                  L...
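To give a feel for what a ranked table like this represents, here is a minimal brute-force sketch of a simple p-median ranking. It is not lokigi’s implementation: it assumes each region travels to the nearest of the selected sites (the textbook p-median formulation), and the rank_combinations helper is a hypothetical name. The data is just the five rows shown in the travel matrix preview, so the ranking will not match the full results above.

```python
from itertools import combinations

site_names = ["Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6"]

# Travel times in minutes for the five LSOAs shown in the preview above.
travel_minutes = [
    [12.898833, 8.794833, 7.404833, 8.197500, 10.125667, 9.248500],
    [12.623167, 8.318500, 8.626167, 9.351167, 9.649500, 8.972833],
    [12.720667, 10.023000, 8.633000, 6.840000, 11.353833, 9.289167],
    [12.393667, 10.862000, 11.006000, 6.328667, 12.193000, 9.293000],
    [11.097500, 11.077500, 10.970000, 5.216667, 12.408333, 9.508500],
]


def rank_combinations(matrix, p):
    """Score every combination of p sites and return them best-first.

    Each entry is (unweighted average, maximum, site indices), where every
    region is assumed to use whichever selected site is quickest to reach.
    """
    n_sites = len(matrix[0])
    ranked = []
    for combo in combinations(range(n_sites), p):
        nearest = [min(row[i] for i in combo) for row in matrix]
        ranked.append((sum(nearest) / len(nearest), max(nearest), list(combo)))
    return sorted(ranked)


ranking = rank_combinations(travel_minutes, p=3)
best_average, best_max, best_indices = ranking[0]

# Indices are zero-based positions of the site columns, so 0 is "Site 1".
print([site_names[i] for i in best_indices], round(best_average, 2))
```

With 6 candidate sites and p=3 there are 20 possible combinations, which matches the 20 rows in the table above. Keeping every combination, rather than only the winner, is what allows near-optimal options to be compared side by side.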

We can produce a bar plot of the solutions, showing the variation between the best and worst.

solutions.plot_n_best_combinations_bar()

Adding in geographic data

If we have a geojson, shp or geopackage file that represents the areas we are looking at, we can pass this in as well.

Here, we are passing in a geojson that contains the 2011 LSOA boundaries.

problem.add_region_geometry_layer(
    region_geometry_df="https://github.com/hsma-programme/h6_3d_facility_location_problems/raw/refs/heads/main/h6_3d_facility_location_problems/example_code/LSOA_2011_Boundaries_Super_Generalised_Clipped_BSC_EW_V4.geojson",  # (1)
    common_col="LSOA11NM"  # (2)
    )

(1) We pass in the path to a geojson, shp or geopackage file. This can be located locally or on the web. If provided with a standard github link, it will automatically try to create the appropriate ‘raw’ link that it can download the file directly from.
(2) We pass in the name of the column in this geojson that should be used when trying to join to the other datasets - i.e. the column that acts as a bridge between the rows in our travel matrix and this geographic data. In our travel matrix, recall we have a column called “LSOA”. This contains the LSOA names in the same format as the “LSOA11NM” column in our geographic dataset - so this is what we provide as our ‘common_col’.
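Because this join relies on names matching exactly between the two columns, a quick sanity check before adding the layer can save some confusion. The sketch below uses plain Python lists rather than lokigi; the two lists and the unmatched_sources helper are hypothetical stand-ins for the “LSOA” column of the travel matrix and the “LSOA11NM” column of the geographic data.

```python
# Hypothetical values standing in for the "LSOA" column of the travel matrix.
travel_matrix_sources = [
    "Brighton and Hove 027E",
    "Brighton and Hove 027F",
    "Brighton and Hove 027A",
]

# Hypothetical values standing in for the "LSOA11NM" column of the geojson.
geometry_names = [
    "City of London 001A",
    "Brighton and Hove 027E",
    "Brighton and Hove 027F",
]


def unmatched_sources(sources, region_names):
    """Return the sources that have no matching region, in their original order."""
    known = set(region_names)
    return [name for name in sources if name not in known]


missing = unmatched_sources(travel_matrix_sources, geometry_names)
print(missing)
```

Any names listed here would have no polygon to draw, so they are worth investigating (for example, a trailing space or a different boundary year) before plotting.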

Let’s take a look at the first few rows of this geographic data.

problem.show_region_geometry_layer().head(5)
FID LSOA11CD LSOA11NM LSOA11NMW BNG_E BNG_N LONG LAT GlobalID geometry
0 1 E01000001 City of London 001A City of London 001A 532123 181632 -0.097140 51.51816 a758442e-7679-45d0-95a8-ed4c968ecdaa POLYGON ((532282.629 181906.496, 532248.25 181...
1 2 E01000002 City of London 001B City of London 001B 532480 181715 -0.091970 51.51882 861dbb53-dfaf-4f57-be96-4527e2ec511f POLYGON ((532746.814 181786.892, 532248.25 181...
2 3 E01000003 City of London 001C City of London 001C 532239 182033 -0.095320 51.52174 9f765b55-2061-484a-862b-fa0325991616 POLYGON ((532293.068 182068.422, 532419.592 18...
3 4 E01000005 City of London 001E City of London 001E 533581 181283 -0.076270 51.51468 a55c4c31-ef1c-42fc-bfa9-07c8f2025928 POLYGON ((533604.245 181418.129, 533743.689 18...
4 5 E01000006 Barking and Dagenham 016A Barking and Dagenham 016A 544994 184274 0.089317 51.53875 9cdabaa8-d9bd-4a94-bb3b-98a933ceedad POLYGON ((545271.918 184183.948, 545296.314 18...

However, what’s more useful is plotting it.

problem.plot_region_geometry_layer()

This becomes more useful when we then solve the problem again.

solutions = problem.solve(p=3, objectives="simple_p_median")

We can now plot the solution.

solutions.plot_best_combination()

Adding in site data

However, it would be more useful if we could see the sites on the map.

Let’s load in a dataset containing the locations of our sites.

problem.add_sites(
    candidate_site_df="../sample_data/brighton_sites.geojson",
    candidate_id_col="site"
    )
Note

As we’re passing in a dataset in a recognised geographic data format, it will look for a ‘geometry’ column automatically. Alternatively, we could pass in a pandas dataframe or csv that contains lat/long or eastings/northings, for example, specifying the ‘vertical’ geometry column and the ‘horizontal’ geometry column.

Let’s take a look at this data.

problem.show_sites()
index site geometry
0 0 Site 1 POINT (527142.275 106616.053)
1 1 Site 2 POINT (531493.995 106639.488)
2 2 Site 3 POINT (533356.778 105476.782)
3 3 Site 4 POINT (528513.424 105052.43)
4 4 Site 5 POINT (532421.163 109069.196)
5 5 Site 6 POINT (528716.452 108042.794)
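Since the site names need to line up with the destination columns of the travel matrix, it can be worth comparing the two sets of names in both directions. This is a minimal sketch in plain Python, not a lokigi feature; compare_site_names is a hypothetical helper, and the names are copied from the outputs shown on this page.

```python
# Destination column names from the travel matrix (source column excluded).
travel_matrix_sites = ["Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6"]

# Values of the "site" column from the site dataset shown above.
site_dataset_names = ["Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6"]


def compare_site_names(matrix_sites, dataset_sites):
    """Report names that appear in one dataset but not the other."""
    matrix_set, dataset_set = set(matrix_sites), set(dataset_sites)
    return {
        "only_in_travel_matrix": sorted(matrix_set - dataset_set),
        "only_in_site_dataset": sorted(dataset_set - matrix_set),
    }


report = compare_site_names(travel_matrix_sites, site_dataset_names)
print(report)
```

Both lists in the report are empty here, which is what we want. A difference in capitalisation, such as “site 6” in one dataset, would show up in both lists.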

Let’s also plot this data.

problem.plot_sites()

Now let’s solve the problem again. This time, the site locations can be shown when we plot the results.

solutions = problem.solve(p=3, objectives="simple_p_median")

We can now plot the solution. This time, we’ll plot the 5 best solutions.

solutions.plot_n_best_combinations(n_best=5)
(<Figure size 2880x576 with 6 Axes>,
 array([<Axes: title={'center': 'Unweighted Average: 5.4 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Unweighted Average: 5.4 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Unweighted Average: 5.6 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Unweighted Average: 5.7 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Unweighted Average: 6.2 minutes \nMaximum: 16.7 minutes'}>],
       dtype=object))

Demand Data

Finally, we can add in demand data.

With this, we can start exploring the ‘standard’ p-median problems, where the travel time is weighted by the number of people travelling from each region, which can support a more equitable solution.

Lokigi requires the demand data to contain a row per source region (i.e. where people will travel from).

problem.add_demand(
    demand_df="../sample_data/brighton_demand.csv",
    demand_col="demand",
    location_id_col="LSOA"
    )
problem.show_demand()
LSOA demand
0 Brighton and Hove 027E 3627
1 Brighton and Hove 027F 2323
2 Brighton and Hove 027A 2596
3 Brighton and Hove 029E 3132
4 Brighton and Hove 029D 2883
... ... ...
160 Brighton and Hove 012A 2497
161 Brighton and Hove 005C 2570
162 Brighton and Hove 012B 2051
163 Brighton and Hove 005A 1164
164 Brighton and Hove 005B 1097

165 rows × 2 columns

When we plot our region, this time we can look at the demand.

problem.plot_region_geometry_layer(plot_demand=True)

Now we can solve for a true p-median problem rather than a simplified one. This means that the weighted travel time - the travel time adjusted by the number of people travelling from each place - will be considered.

solutions = problem.solve(
    p=3,
    objectives="p_median"  # (1)
    )

(1) Note that we have changed our objective from “simple_p_median” to “p_median” here. We haven’t had to change any other parts of our problem object, so we can easily run multiple different types of solver from a single problem object.

Tip

Want a more detailed explanation of weighted travel time? Take a look at the p-median problems subsection of the HSMA geographic book.
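As a rough illustration of what the weighting does, the sketch below compares an unweighted and a demand-weighted average for five regions. It is plain Python, not lokigi’s code. The travel times are the quickest times from the first five LSOAs in the travel matrix preview to any of Site 3, Site 4 or Site 6, the demand figures are the first five rows shown above, and both helper functions are hypothetical names.

```python
# Quickest travel time (minutes) from each of five LSOAs to Site 3, 4 or 6.
nearest_minutes = [7.404833, 8.626167, 6.840000, 6.328667, 5.216667]

# Demand for the same five LSOAs, in the same order.
demand = [3627, 2323, 2596, 3132, 2883]


def unweighted_average(times):
    """Every region counts equally, regardless of how many people live there."""
    return sum(times) / len(times)


def weighted_average(times, weights):
    """Each region's travel time counts in proportion to its demand."""
    total = sum(time * weight for time, weight in zip(times, weights))
    return total / sum(weights)


simple = unweighted_average(nearest_minutes)
weighted = weighted_average(nearest_minutes, demand)
print(round(simple, 2), round(weighted, 2))
```

The two figures differ because regions with more demand pull the weighted average towards their own travel time. This is why the ranking of site combinations can change once demand data is added.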

Let’s plot the solutions again.

solutions.plot_n_best_combinations(n_best=5)
(<Figure size 2880x576 with 6 Axes>,
 array([<Axes: title={'center': 'Weighted Average: 5.4 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Weighted Average: 5.4 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Weighted Average: 5.5 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Weighted Average: 5.5 minutes \nMaximum: 16.7 minutes'}>,
        <Axes: title={'center': 'Weighted Average: 6.3 minutes \nMaximum: 16.7 minutes'}>],
       dtype=object))

Recap - the data to add to your problem class

A more detailed breakdown of the allowed data is provided below.

Travel/Cost Matrix

  • Relevant functions + mandatory parameters: .add_travel_matrix(travel_matrix_df, source_col), .show_travel_matrix()
  • Optional/Mandatory: Mandatory
  • Accepted formats: Pandas dataframe, or path to csv or parquet
  • Required columns: One column should represent the ‘sources’ (regions people are travelling from). The remaining columns should be the ‘destinations’ (places people are travelling to).
  • Required rows: One row per ‘source’ (region people are travelling from)
  • Cell contents: A figure to optimize on - e.g. travel time, distance, CO2 emissions
  • Commonalities: The ‘sources’ column should have regions in the same format as the region geometry (if provided). The ‘destination’ column names should match the site names used in the site locations dataset (if provided).

Region Geometry

  • Relevant functions + mandatory parameters: .add_region_geometry_layer(region_geometry_df, common_col), .show_region_geometry_layer(), .plot_region_geometry_layer()
  • Optional/Mandatory: Optional
  • Accepted formats: Geopandas dataframe, or path to geojson, shp or geopackage
  • Required columns: ‘Geometry’, containing polygons or multipolygons of the regions of interest
  • Required rows: There should be a row for each ‘source’ region
  • Commonalities: The rows in this dataframe should overlap with the rows in the ‘sources’ column of the travel/cost matrix

Site Locations

  • Relevant functions + mandatory parameters: .add_sites(candidate_site_df, candidate_id_col), .show_sites(), .plot_sites()
  • Optional/Mandatory: Optional
  • Accepted formats: Pandas dataframe OR geopandas dataframe, or path to csv, parquet, geojson, shp or geopackage
  • Required columns: A column representing the names of candidate sites AND either geometry containing points representing locations OR columns representing the position of your sites. Defaults to looking for lat/long.
  • Required rows: One row per candidate site
  • Cell contents: Site names and the locations of sites
  • Commonalities: Site names should match the names of the columns provided in the travel/cost matrix

Demand

  • Relevant functions + mandatory parameters: .add_demand(demand_df, demand_col, location_id_col), .show_demand()
  • Optional/Mandatory: Optional
  • Accepted formats: Pandas dataframe, or path to csv or parquet
  • Required columns: A column representing the regions, and another column representing the demand or other weighting factor
  • Required rows: One row per region
  • Cell contents: A demand figure or another value that should be used to weight travel times
  • Commonalities: Row names should match the rows used for the travel matrix and region geometry

Summary - the key object types

SiteProblem

The SiteProblem object is used to store all relevant data for the problem.

You initialise an empty SiteProblem, then run the methods detailed above to add the relevant datasets to the object.

You then call the .solve() method to create the SiteSolutionSet containing the ranked solutions.

SiteSolutionSet

A SiteSolutionSet is created by calling .solve() on your SiteProblem.

This object contains a lot of methods for plotting the solutions.

EvaluatedCombination

Developer use only

During the process of generating solutions, an EvaluatedCombination object will be created for every evaluated set of sites. This is not directly available to you after you have solved your problem.
