The Irish Property Price Register – Geocoded to Small Areas

Data Download Links

In this post, I’ve added GPS coordinates to the Property Price Register (PPR) data from years 2012-2017 (approx 220k property sales). Read below to find the method used to generate these results, or just download the files here:

An example data extract is shown below. For visualisations in this post, I use the R ggplot2 library, the Google Fusion Table Maps, and the Power Map plugin for Excel!

Please let me know if the data is useful, and if you end up building anything great out of it! I’ll update the dataset when I can with the latest properties listed.

Data Example from geocoded property price register (PPR) data showing selected columns. There are 217k houses in the data set, and for each property, the sale date, address, latitude, longitude, small area ID, electoral district name, sale price, and other variables are available. The file can be downloaded above.

Note that the geocoding process introduces some errors. These are documented below, and that section is required reading for anyone performing analysis on this dataset.

Google Fusion Table visualisation of geocoded property prices. Circles are prices below €1 million in value. You can explore the data and map here.
Number of sales recorded per month in the property price register from 2012-2017
Median house sale price in Ireland per month from 2012 – 2017. Ireland has experienced large rises in property prices over the last number of years.

This video was created with the remarkable Power Map plugin for Excel (of all the software!). This plugin is well worth a look if you want to make an impression in a quick and easy fashion, and you have data with location information well formatted.

The Property Price Register

An interesting data set for all Irish data scientists is the Irish Property Price Register. The property price register (PPR) records the price, address and date of sale of all residential properties which have been purchased in Ireland since 1 January 2010. The data is searchable and freely downloadable by the public, forming, at this point, a repository of pricing information for 7 years' worth of property sales.

List of properties as displayed on the Irish property price register.

The purpose of the register when launched was to “provide, on an ongoing basis, accurate prices of residential properties purchased at a particular date”. It is not without its criticisms on accuracy: because the data is manually loaded, the prices and addresses can be error prone.

One of the major limitations of the register is the information on each house sold: houses are described only at a basic level, with categories such as “second hand dwelling” and very rough size descriptions such as “greater than or equal to 38 sq metres and less than 125 sq metres”.

Sample entry on the property price register in Ireland. Detail on house types is limited to a price, date, address, and simple description.

Geocoding the PPR data

Google geocoding process

The PPR data is made much more useful with the addition of geocoded GPS coordinates, and furthermore with matching of these coordinates to CSO census small area and electoral district. With these matched, information on average house size, population demographics, family sizes and more can be approximated for each property sold.

The geocoding process was performed in Python, using the Google geocoding script detailed previously on this blog. The script was run on an Amazon instance with a free Google API key, and allowed to geocode 2,500 addresses a day, for a couple of weeks.
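The daily quota shapes how a job like this has to be run. As a rough sketch only (this is not the script used for this post), the loop below geocodes at most a fixed number of new addresses per run and hands back everything done so far, so the next day's run can pick up where it left off. `geocode_fn` is a hypothetical stand-in for whatever function actually calls the geocoding service, and the default of 2,500 is simply the figure quoted above.

```python
import math


def geocode_in_daily_batches(addresses, geocode_fn, already_done=None, daily_limit=2500):
    """Geocode at most `daily_limit` not-yet-processed addresses in one run.

    `geocode_fn` is a hypothetical callable (address -> result) standing in
    for the real geocoding request; it is not part of any library.
    Returns the updated dict of results so it can be saved and passed back
    in as `already_done` on the next day's run.
    """
    results = dict(already_done or {})
    used_today = 0
    for address in addresses:
        if address in results:
            continue  # finished on a previous run, costs no quota
        if used_today >= daily_limit:
            break  # quota exhausted; resume tomorrow
        results[address] = geocode_fn(address)
        used_today += 1
    return results


def days_needed(n_addresses, daily_limit=2500):
    """Number of daily runs needed to cover every address with one key."""
    return math.ceil(n_addresses / daily_limit)
```

Saving the returned dictionary to disk between runs (as CSV or JSON, say) is what makes the job resumable; `days_needed(10000)` gives 4 runs at the default limit.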

Geocode results

Over the entire 2012 – 2017 dataset, there was a 93.4% match rate; that is, for 6.6% of PPR addresses Google found no matching address and so returned no geocoded result. To improve this match rate, various methods of address correction or augmentation can be applied before feeding the data into the geocoding script.
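To give a flavour of what address correction might look like, here is a minimal sketch. The rules are my own illustrative assumptions and were not applied to the published dataset: collapse stray whitespace, drop empty comma-separated parts, and append the country name when it is missing.

```python
import re


def clean_address(raw, country="Ireland"):
    """Tidy a raw PPR address string before sending it to a geocoder.

    These rules are illustrative assumptions, not the post's actual cleaning
    steps: collapse whitespace, drop empty comma-separated parts, and append
    the country name when it is missing.
    """
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", raw).strip()
    # Split on commas, trim each part, and drop parts that are empty.
    parts = [p.strip() for p in text.split(",") if p.strip()]
    # Append the country so the geocoder is less likely to look abroad.
    if not parts or parts[-1].lower() != country.lower():
        parts.append(country)
    return ", ".join(parts)
```

For example, `clean_address("  10  Main Street ,, Navan,Co. Meath ")` gives `"10 Main Street, Navan, Co. Meath, Ireland"`.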

Strangely, the rate of error is not constant from year to year, with a larger proportion of errors occurring in the more recent 2016 and 2017 sales.

Overall however, with some cleansing (see the problems below), an excellent data set for exploration of the spatial distribution of house pricing in Ireland becomes available.

Number of sales found per geocoded county result in the Property Price Register (PPR) data.
House and apartment prices in Ireland vary between city and countryside properties, with cities experiencing higher and faster rising prices.

Geocoding problems

The geocoding process is not perfect, and inaccuracies in encoding are intensified by Ireland’s baffling address formats and peculiarities, mainly outside of the main cities.

Note that the Google Geocoding API returns different results to simply typing the address into the Google Maps search.

Some of the issues found in this dataset include:

  • Some addresses in the PPR are not well formatted, and no results were returned at all by Google (approx 6% of cases).
  • Many addresses in Ireland are not unique, identified with just “<housename>, <town>”, and these result in only “approximate” location matches.
Accuracy of property price register (PPR) geocoding results as reported by Google. On examination, rooftop results are not always entirely precise, but should provide a good match to Electoral District.
  • Many houses are returned at the centre of cities / towns, rather than at their exact location. An example of this is the address “10 Washington Street, South Circular Road, Dublin 8”, which is actually an invalid address, but gets geocoded to “Dublin, Ireland”. In this case, the results can be removed, along with the 2705 other addresses mapped to “Dublin, Ireland”. In some cases, however, approximate matches will still align at an electoral district level.
  • Badly formatted addresses and non-specific addresses are problems that plague anyone using location and address data in Ireland, which is screaming out for a functional postcode (Eircode tries its best, but is not without its issues and critics). In the geocoded data set, 95% of the input addresses are unique, but only 63% of the resulting output addresses are.
  • Google thought that the locations of approximately 400 houses were outside the borders of Ireland, returning addresses globally. The diagrams below show the extent of this issue:
Some of the addresses in the Property Price Register result in non-Irish GPS points when passed through the Google geocoder.
Errors in Irish addresses correspond to the very edge of small area boundaries – coastal or border locations. These are mainly accuracy issues and are relatively infrequent.
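Two of the issues above (points outside Ireland, and results that collapse to “Dublin, Ireland”) are easy to screen out before analysis. The sketch below is one hedged way to do it. The bounding box values are a rough assumption of mine, and the dictionary keys `latitude`, `longitude` and `formatted_address` are hypothetical names that may not match the columns in the downloadable file.

```python
# Rough bounding box around the island of Ireland (an assumption for this
# sketch; adjust to taste). Latitude and longitude are in decimal degrees.
LAT_MIN, LAT_MAX = 51.3, 55.5
LON_MIN, LON_MAX = -10.8, -5.3

# Formatted addresses that mean "the geocoder only found the city".
TOO_VAGUE = {"Dublin, Ireland"}


def is_usable(record):
    """Return True if a geocoded record looks safe to keep.

    `record` is assumed to be a dict with hypothetical keys `latitude`,
    `longitude` and `formatted_address`; the real column names may differ.
    """
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is None or lon is None:
        return False  # no geocoded result at all
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return False  # point falls outside the bounding box
    if record.get("formatted_address") in TOO_VAGUE:
        return False  # matched only to the city centre
    return True
```

A bounding box is a blunt instrument: it will keep points that fall in the sea or across the border, so the small area matching described in the next section is still the better test of whether a point really lies in an Irish small area.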

Matching to small area and electoral district

Electoral Divisions (EDs) are legally defined administrative areas in Ireland for which Small Area Population Statistics (SAPS) are published from the Census. There are 3,440 defined EDs in the State. A smaller division, “Small Areas”, are areas of population generally comprising between 80 and 120 dwellings, designed as the lowest level of geography for the compilation of statistics in line with data protection. See the CSO website for more, and the picture below from the SAPMAP application for the ED divisions in Dublin City.

Ireland is divided into Electoral Divisions and Small Areas. This diagram shows Electoral Divisions (EDs) for Dublin city; there are statistics available for 3409 EDs in Ireland.

Once a GPS latitude and longitude is determined for each property sale, an R script was used to determine the relevant small area and electoral district for each GPS point. There are a few steps to this process:

  1. The polygon SHP files for small areas, downloaded from the Central Statistics Office, are specified in Irish Grid coordinates. These maps can be converted to WGS84 GPS format using a projection in the open-source QGIS software (or download the GPS shp files from the links at the top of this post).
  2. The R library for spatial data (sp) is used to create a SpatialPointsDataFrame.
  3. The same library has a function, “over()”, that can align spatial points to a SHP dataset containing polygons provided the projections are the same.
  4. Once the correctly overlapping polygons are found, relevant names for the small areas and electoral districts in Ireland can be assigned.
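If you are curious what the point-to-polygon matching in steps 3 and 4 boils down to, the idea can be sketched in a few lines of plain Python. This is an illustration of the classic ray-casting test, not the post's R script and not how sp implements over() internally. The `areas` dictionary is a hypothetical structure of my own, and the sketch ignores polygons with holes.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point (lon, lat) inside `polygon`?

    `polygon` is a list of (lon, lat) vertices in order; the last vertex is
    implicitly joined back to the first. Points exactly on an edge may be
    classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_area(lon, lat, areas):
    """Return the name of the first area containing the point, else None.

    `areas` maps an area name to its polygon (a hypothetical structure for
    this sketch; real small-area shapes would come from the SHP file).
    """
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Looping over every polygon for every point is fine for a demonstration, but for roughly 200k points against all the small areas in the country a spatial library with proper indexing is the sensible choice.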

The script used to combine the datasets, load the SHP files, and to match the GPS coordinates to the SHP file polygons can be found on GitHub.

Geocoding, processing and visualising scripts

The entire process to generate these results, and potentially add additional sale data, uses two main scripts.

  1. Start with the Python geocoding script to get raw GPS coordinates from the addresses in the PPR.
  2. The assignment of small areas and electoral divisions is achieved by loading the small area SHP files into R, and using the over() function in the sp library. See the code extract below, and use the process.r file in GitHub.
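# Libraries used below: sp (spatial classes and over()), maptools (readShapePoly),
# data.table (the := and .() syntax used on ppr_data) and stringr (str_replace).
library(sp)
library(maptools)
library(data.table)
library(stringr)

# Now overlay the small areas from the census data.
# Load the small area files - remember these need to be in GPS (WGS84) form for matching.
map_data <- readShapePoly('Census2011_Small_Areas_generalised20m/small_areas_gps.shp')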

# Assign a small area and electoral district to each property with a GPS coordinate.
# The assignment of points to polygons is done using the sp::over() function.
# Inputs are a SpatialPoints (house locations) set, and SpatialPolygons (boundary shapes)
spatial_points <- SpatialPointsDataFrame(coords = ppr_data[!is.na(latitude),.(longitude,latitude)], data=ppr_data[!is.na(latitude), .(input_string, postcode)])
polygon_overlap <- over(spatial_points, map_data)

# Now we can merge the Small Area / Electoral District IDs back onto the ppr_data.
ppr_data[!is.na(latitude), geo_county:=polygon_overlap$COUNTYNAME]
ppr_data$geo_county = str_replace(ppr_data$geo_county, pattern = " County", replacement = "")
ppr_data[!is.na(latitude), electoral_district:=polygon_overlap$EDNAME]
ppr_data[!is.na(latitude), electoral_district_id:=polygon_overlap$CSOED]
ppr_data[!is.na(latitude), region:=polygon_overlap$NUTS3NAME]
ppr_data[!is.na(latitude), small_area:=polygon_overlap$SMALL_AREA]

Visualisations in this post were created primarily with the R ggplot2 library; the full scripts to create them are given in the GitHub repository.

Dublin area with property sale prices and electoral division boundaries marked. To create in R, you must use the fortify() function on your SHP files before using ggmap and ggplot2.
Property sale prices for the major cities Dublin, Cork, Galway, Limerick, and Waterford in Ireland.
The 10 Electoral Divisions with the highest median sale prices over the previous 5 years!

 

Other links

There has been some other geocoding and visualisation work published on the property price register data, but some of the links have fallen behind / are quite old. However – have a look at the details below if you are down a rabbit hole of PPR data!

 

17 Comments

Brilliant write up and learned a lot. Definitely plenty of food for thought, especially the Python and R portions. Thanks!

Great piece of work and easy to follow. One thing I was interested in knowing is whether it's compliant with the terms and conditions set out by Google? By aggregating the data to the Census geography, does that get around any issues of storing the lat and long data, created by geo-coding the original addresses?

Thank you very much for adding the GPS coordinates to the Property Price Register (PPR) data for 2012-2017. It is true that the geo-coding process is not perfect, mainly due to Ireland’s baffling address formats.

Wow so impressive. I’m a former data curator and weep at the “Irish Property Price Register”, what a mess.

The eircodes in the dataset are not fully formed where used, has anyone tried using the eircode lookup to build up this field as well?

Hi Shane, very impressive. I have searched for weeks now for an SAB-to-GPS mapping and I think I have just found it! I would like to use one of your outputs in my current research project and trust that I have your permission to do so.

How do you want me to cite your work? GitHub, this website, other?

Thanks again.

Thanks Shane. Great work. Yes, the OSI don’t seem to use GPS!

Hello Shane, very impressive work. However, I have a question on your use of Google API. You have around 200k properties, this means 200k unique addresses to geocode. With the free limit of 2,5k each day, wouldn’t this take 80 days? I have a similar task of batch geocoding more than 200k addresses, I was wondering, what is the best way to go around this? I am not experienced in API use since geocoding is instrumental to my research. I do not seem to understand the pricing on Google either. What do you think this would cost if you were to do it at once? Or in a couple of days.

Best,

Hi Shane, thank you a lot for sharing your geocoding results! I am working on visualizing the same data set using folium and was looking for a government API to geolocate the addresses, but instead I already found the data 🙂 Thanks a lot, this saved a lot of my time

Hi Shane,

This is really powerful stuff. With regards to the dataset, is there any chance you can run the script for all of 2017, 2018 and 2019, please?

I tried following the steps for your Python script, however I have only started coding in the last 8 months and it is quite above my proficiency (some day I will get there!). I would like to use the data for a college project @ TUI for predicting house prices in the Dublin Area.

Thank you and keep up the great publications.

Kr,
David

Hey Shane, I am also looking to use this for a project. Please let me know if you get a chance to update it.

Hi Shane,

Thanks for the data, not sure if it's of any interest. I used the data and carried out k-means clustering per year in 100k price brackets, only for Dublin. Image attached for 2017, 300k to 400k, 9 clusters. Not sure how the image will appear. Thanks again for the data
