Danny Dorling and the GeoPandas

I recently picked up a copy of Danny Dorling’s So You Think You Know About Britain? (Dorling, 2011) from a second-hand bookshop, and it inspired me to explore some demographic data using Python and, in particular, GeoPandas.

The book discusses topics such as life expectancy, gender imbalances in the population, the concept of “optimum population”, migration and an ageing population, and aims to dispel the many myths that surround these matters.

I downloaded household deprivation data for Greater Manchester (GM) for the 2011 Census from the InFuse website. Using QGIS, I joined the tabular data to the Middle Super Output Area (MSOA) geometry downloaded from the same site. The output is a Shapefile of MSOAs in GM, with deprivation attributes.

GeoPandas extends Pandas with spatial data types and operations. I installed it following the instructions in a blog article. The aim was to produce maps of deprivation for each of the ten districts in Greater Manchester. The script is as follows:

import geopandas as gpd
import matplotlib.pyplot as plt
import pysal  # provides the 'fisher_jenks' classifier used when plotting
sourceDataPath = "C:/test/"
sourceDataFile = "GM_MSOA_deprivation_2011.shp"
# create a geopandas geodataframe...
sourceTable = gpd.read_file(sourceDataPath + sourceDataFile)
attributes = {'Total households': 'F996', 'Household is deprived in 4 dimensions': 'F1001'}
print(sourceDataFile + " has " + str(len(sourceTable)) + " records")
sourceTable['area'] = sourceTable['geometry'].area / 10**6  # area in km squared (assumes coordinates in metres)
for attribute in attributes:
    sourceTable[attributes[attribute] + '_density'] = sourceTable[attributes[attribute]] / sourceTable['area']
sourceTable['district'] = sourceTable['geo_label'].str.split(" ").str[0]
districts = sourceTable['district'].unique()
plotNum = 1
for district in districts:
    # get data for this district only
    districtTable = sourceTable[sourceTable['district'] == district]
    for attribute in attributes:
        districtTable.plot(column=attributes[attribute] + '_density', cmap='Reds', scheme='fisher_jenks', edgecolor='black')
        title = attribute + " in " + district
        plt.savefig(sourceDataPath + title + ".png")
        plotNum += 1
plt.show()  # NB: call this just once

Among other things, GeoPandas can read a Shapefile into a dataframe (line 7). The Shapefile has two attributes: total households, and the number of households deprived in 4 dimensions (related to Employment, Health and Disability, Overcrowding and Education).

GeoPandas can also perform spatial calculations, such as calculating the areas of polygons (line 10). This capability was used to calculate a density per square kilometre for each of the two attributes (lines 11 and 12).
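As a minimal sketch of the area and density step, the snippet below builds a tiny GeoDataFrame by hand instead of reading the Shapefile. The two square polygons and the household counts are invented for illustration, and the coordinates are assumed to be in metres; only the F996 column name comes from the script.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square zones; coordinates are assumed to be in metres.
zones = gpd.GeoDataFrame(
    {
        "geo_label": ["Oldham 001", "Oldham 002"],
        "F996": [3000, 2000],  # total households (invented values)
    },
    geometry=[
        Polygon([(0, 0), (2000, 0), (2000, 2000), (0, 2000)]),        # 2 km x 2 km
        Polygon([(2000, 0), (3000, 0), (3000, 1000), (2000, 1000)]),  # 1 km x 1 km
    ],
)

# .area is in squared coordinate units (square metres here); convert to square km.
zones["area"] = zones.geometry.area / 10**6

# Households per square km.
zones["F996_density"] = zones["F996"] / zones["area"]

print(zones[["geo_label", "area", "F996_density"]])
```

Mapping a density rather than a raw count stops large zones looking more populous simply because they cover more ground.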

The next step was to slice the dataset by district, since we want one set of maps for each of the ten districts in GM. This is not entirely straightforward, since the district name has to be extracted by splitting the geo_label attribute, which looks like this:

Manchester 030
Salford 015
Salford 020

This is done in line 13, which adds an extra column to the dataframe. Note that this technique would not work for local authority names with multiple words, e.g. Newcastle upon Tyne, because only the first word is kept.
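One possible way around the multi-word limitation, sketched here with pandas only, is to split once from the right so that only the trailing zone number is dropped. The label "Newcastle upon Tyne 001" is a hypothetical example and does not appear in the Greater Manchester data.

```python
import pandas as pd

# Two labels from the post plus one hypothetical multi-word label.
labels = pd.Series(["Manchester 030", "Salford 015", "Newcastle upon Tyne 001"])

# Split at most once, starting from the right, then keep the left-hand part.
district = labels.str.rsplit(" ", n=1).str[0]

print(district.tolist())
```

This relies on every label ending in a space followed by the zone number, which holds for the three sample labels shown in the post.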

The 10 unique district names are finally extracted in line 14.

The districts are then cycled through. First we extract only those rows for the current district (line 18). We then loop through the two census attributes (line 19), producing a plot of each using the matplotlib library (lines 20 to 23) and saving it as a PNG file (line 22). As an extra, we use the pysal library to classify the census data using natural breaks (the Fisher-Jenks technique) rather than the default equal intervals (line 20).
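To illustrate the difference between natural breaks and equal intervals, here is a small pure-Python sketch. It is not the pysal implementation: the function names and the density values are made up for illustration, and the dynamic programme is a simple, slow version of the idea behind Fisher-Jenks, which picks class boundaries that minimise the squared deviation within each class.

```python
def ssd(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, k):
    """Upper bounds of k classes that minimise the total within-class ssd."""
    xs = sorted(values)
    n = len(xs)
    inf = float("inf")
    # best[m][j] = (lowest cost of splitting xs[:j] into m classes,
    #               start index of the last class in that split)
    best = [[(inf, 0)] * (n + 1) for _ in range(k + 1)]
    best[0][0] = (0.0, 0)
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # candidate last class is xs[i:j]
                cost = best[m - 1][i][0] + ssd(xs[i:j])
                if cost < best[m][j][0]:
                    best[m][j] = (cost, i)
    # Walk back from the full list to recover each class's upper bound.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = best[m][j][1]
    return bounds[::-1]


def equal_intervals(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


# Invented, skewed density values (households per square km).
densities = [10, 12, 14, 50, 52, 55, 200, 210]

print("natural breaks: ", natural_breaks(densities, 3))
print("equal intervals:", equal_intervals(densities, 3))
```

With these made-up values, natural breaks places the class boundaries at 14, 55 and 210, following the three clusters in the data. Equal intervals places them at roughly 76.7, 143.3 and 210, which puts six of the eight values in the first class and leaves the middle class empty.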

Finally, we send the plots to the screen (line 24). The following two images show the outputs for just one of the districts, Oldham, but the script will produce plots for all 10 districts.

Total households in Oldham
Household is deprived in 4 dimensions in Oldham

As you can see, GeoPandas and the matplotlib library allow us to produce maps from a Shapefile very efficiently, with just a few lines of code. One thing missing is place names to provide context (but that’s a subject for another post).

The last word goes to Professor Dorling, who writes in the chapter on life expectancy that he was told by a Cambridge academic that what matters most is not your life expectancy but “how much time you might get to share with those you love” (p37).

Dorling, D. (2011) So you think you know about Britain? Constable, UK (paperback).

Census data: Office for National Statistics (2017): 2011 Census aggregate data. UK Data Service (Edition: February 2017). DOI
