Geocode your addresses for free with Python and Google
For a recent project, I ported the “batch geocoding in R” script over to Python. The script allows geocoding of large numbers of string addresses to latitude and longitude values using the Google Maps Geocoding API. The Google Geocoding API is one of the most accurate geocoding APIs out there at the moment.
The script geocodes addresses up to the daily limit each day, and then waits until Google refills your allowance before continuing. You can leave the script running on a remote server (I use Digital Ocean, where you can get a free $10 server with my referral link) and, over the course of a week, geocode nearly 20,000 addresses.
For Ireland, the Google Geocoder is also a sneaky way to get a large list of Eircode codes for string addresses that you may have. Google integrated Eircode information with their mapping data in Ireland in September 2016.
Jump straight to the script here.
Geocoding API Limits
There are a few options with respect to Google and your API key, depending on whether you want results fast and are willing to pay, or are in no rush and want to geocode for free:
- With no API key, the script will work at a slow rate of approx 1 request per second, up to the free limit of 2,500 geocoded addresses per day.
- Get a free API key to up the rate to 50 requests per second, though still limited to 2,500 per day. API keys are easily generated at the Google Developers API Console: find the “Google Maps Geocoding API”, press enable, and then look under “credentials”.
- Associate a payment method or billing account with Google and your API key, and have limitless fast geocoding results at a rate of $0.50 per 1000 additional addresses beyond the free 2,500 per day.
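To make the trade-off concrete, here is a minimal sketch of the arithmetic behind these options. The limit and price are the figures quoted above; the function names are made up for illustration and are not part of the script.

```python
import math

FREE_DAILY_LIMIT = 2500   # free geocodes per day, as quoted above
PRICE_PER_1000 = 0.50     # USD per 1,000 additional addresses, as quoted above


def days_on_free_tier(total_addresses):
    """Days needed when you only ever use the free daily allowance."""
    return math.ceil(total_addresses / FREE_DAILY_LIMIT)


def cost_in_one_day(total_addresses):
    """USD cost of geocoding everything in one day with billing enabled."""
    billable = max(0, total_addresses - FREE_DAILY_LIMIT)
    return billable / 1000 * PRICE_PER_1000


print(days_on_free_tier(20000))  # 8
print(cost_in_one_day(20000))    # 8.75
```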
Python Geocoding Script
The script uses Python 3, and can be downloaded, along with some demonstration data, from the “python batch geocoding” project on Github. There’s a requirements.txt file that will allow you to construct a virtualenv around the script, and you can run it indefinitely over ssh on a remote server using the “screen” command (“screen” allows you to keep terminal commands running while logged out of an ssh session, which is really useful if you use cloud servers).
Input Data
The script expects an input CSV file with a column that contains addresses. The default column name is “Address”, but you can specify a different name in the configuration section of the script. You can create CSV files from Excel using “Save As”->CSV. The sample data in the repository is the 2015 Property Price Register data for Ireland. Some additional preprocessing is performed on the addresses to improve accuracy, adding County and Country level information. Remove or change these lines in the script as necessary!
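As a rough illustration of this input step, the sketch below reads an “Address” column and appends country context. It uses the standard library csv module and invented sample rows; the function name and the exact preprocessing are assumptions, not the script's own code.

```python
import csv
import io


def load_addresses(csv_file, column="Address", suffix=", Ireland"):
    """Read one column from an open CSV file and append country context."""
    addresses = []
    for row in csv.DictReader(csv_file):
        address = row[column].strip()
        if not address:
            continue  # skip blank rows
        # Only append the suffix when the address does not already end with it.
        if not address.lower().endswith(suffix.lower()):
            address += suffix
        addresses.append(address)
    return addresses


# Invented sample data standing in for a real input file.
sample = io.StringIO(
    'Address,Price\n'
    '"1 Main Street, Naas",250000\n'
    '"5 High Road, Cork, Ireland",180000\n'
)
addresses = load_addresses(sample)
print(addresses)
```

With a real file you would pass `open("input.csv", newline="", encoding="utf-8")` instead of the in-memory sample.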
Output Data
The script will take each address and geocode it using the Google APIs, returning:
- the matching latitude and longitude,
- the cleaned and formatted address from Google,
- the postcode of the matched address (Eircode in Ireland),
- accuracy of the match,
- the “type” of the location – “street, neighbourhood, locality”,
- the Google place ID,
- the number of results returned,
- the entire JSON response (see example below) from Google can be requested if there’s additional information that you’d like to parse yourself. Change this in the configuration section of the script.
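If you do request the full JSON, a sketch along these lines shows how the fields listed above can be pulled out of a decoded response. The key names (`results`, `geometry`, `location_type`, `address_components`, `place_id`) follow the Geocoding API response format; the function name, the output keys and the sample values are invented for illustration.

```python
def parse_geocode_response(response):
    """Flatten a decoded Geocoding API response into one flat record."""
    results = response.get("results", [])
    record = {
        "status": response.get("status"),
        "number_of_results": len(results),
        "formatted_address": None,
        "latitude": None,
        "longitude": None,
        "accuracy": None,
        "type": None,
        "postcode": None,
        "google_place_id": None,
    }
    if not results:
        return record

    first = results[0]  # this sketch keeps only the first result
    geometry = first.get("geometry", {})
    location = geometry.get("location", {})
    postcodes = [
        component.get("long_name")
        for component in first.get("address_components", [])
        if "postal_code" in component.get("types", [])
    ]
    record.update({
        "formatted_address": first.get("formatted_address"),
        "latitude": location.get("lat"),
        "longitude": location.get("lng"),
        "accuracy": geometry.get("location_type"),
        "type": ",".join(first.get("types", [])),
        "postcode": postcodes[0] if postcodes else None,
        "google_place_id": first.get("place_id"),
    })
    return record


# Hand-written, trimmed-down response with placeholder values.
sample_response = {
    "status": "OK",
    "results": [{
        "formatted_address": "1 Example Street, Dublin, D01 XXXX, Ireland",
        "geometry": {
            "location": {"lat": 53.35, "lng": -6.26},
            "location_type": "ROOFTOP",
        },
        "place_id": "EXAMPLE_PLACE_ID",
        "types": ["street_address"],
        "address_components": [
            {"long_name": "D01 XXXX", "short_name": "D01 XXXX",
             "types": ["postal_code"]},
        ],
    }],
}
record = parse_geocode_response(sample_response)
print(record["latitude"], record["longitude"], record["postcode"])
```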
Script Setup
To set up the script, optionally insert your API key, then set your input file name, input column name, and output file name. Then simply run the code with “python3 python_batch_geocode.py” and come back in (<total addresses>/2500) days! Each time the script hits the geocoding limit, it backs off for 30 minutes before trying again with Google.
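The back-off behaviour can be sketched roughly as follows. This is an illustration rather than the script's actual loop: `geocode` is a stand-in for whatever function performs the request, the `sleep` parameter exists only so the example can run without waiting, and `max_attempts` is an invented safety cap. `OVER_QUERY_LIMIT` is the status the Geocoding API returns when the quota is used up.

```python
import time

BACKOFF_SECONDS = 30 * 60  # 30 minutes, as described above


def geocode_with_backoff(address, geocode, sleep=time.sleep, max_attempts=48):
    """Call geocode(address), pausing whenever the query limit is hit."""
    for attempt in range(1, max_attempts + 1):
        result = geocode(address)
        if result.get("status") != "OVER_QUERY_LIMIT":
            return result
        if attempt < max_attempts:
            sleep(BACKOFF_SECONDS)  # wait for the allowance to refill
    raise RuntimeError(
        "Still over the query limit after %d attempts" % max_attempts
    )


# Demo with stand-ins: two limit responses, then a success.
responses = iter([
    {"status": "OVER_QUERY_LIMIT"},
    {"status": "OVER_QUERY_LIMIT"},
    {"status": "OK"},
])
waits = []  # records the requested pauses instead of really sleeping
result = geocode_with_backoff(
    "London, England",
    geocode=lambda address: next(responses),
    sleep=waits.append,
)
print(result["status"], waits)  # OK [1800, 1800]
```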
Python Code Function
The script functionality is simple: there’s a central function “get_google_results()” that actually requests data from the Google API using the Python requests library, and then a wrapper around that, starting at line 133, to handle data backup and geocoding query limits.
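For orientation, here is a dependency-free sketch of what such a request involves. The script itself uses the requests library; this version swaps in the standard library so it runs without extra packages, and the function names are illustrative rather than the script's own. Letting `urlencode` build the query string takes care of escaping spaces and commas in the address.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key=None):
    """Build the request URL, adding the key parameter only when one is given."""
    params = {"address": address}
    if api_key:
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def fetch_geocode(address, api_key=None):
    """Request the address from Google and return the decoded JSON response."""
    with urlopen(build_geocode_url(address, api_key)) as response:
        return json.loads(response.read().decode("utf-8"))


# "ABC" is a placeholder, not a real key.
url = build_geocode_url("London, England", api_key="ABC")
print(url)
```

Only the URL is built here; calling `fetch_geocode()` needs an internet connection and, depending on your account, a valid key.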
Reverse Geocoding
Reverse geocoding is the process of going from a set of GPS co-ordinates, and working out what the text address is. Google also provides an API for reverse geocoding, and the above script, with some edits, can be used with that API.
One of the blog’s readers, Joseph, has kindly put together this Github snippet with the adjustments!
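Independently of that snippet, the core change for reverse geocoding is small: the same endpoint accepts a `latlng` parameter in place of `address`. A minimal sketch, with an invented function name and a placeholder key:

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_reverse_geocode_url(latitude, longitude, api_key=None):
    """Build a reverse geocoding URL from a latitude/longitude pair."""
    params = {"latlng": "{},{}".format(latitude, longitude)}
    if api_key:
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


# Approximate co-ordinates for central Dublin; "ABC" is a placeholder key.
reverse_url = build_reverse_geocode_url(53.3498, -6.2603, api_key="ABC")
print(reverse_url)
```

The response has the same overall shape as a forward geocoding response, so the text address can be read from the `formatted_address` field of each result.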
Improving Geocoding Accuracy
There are a number of tips and tricks to improve the accuracy of your geocoding efforts.
- Append additional contextual information if you have it. For instance, in the example here, I appended “, Ireland” to the address since I knew all of my addresses were Irish locations. I could have also done some work with the “County” field in my input data.
- Simple text processing before upload can improve accuracy. Replacing st. with Street, removing unusual characters, replacing brdg with bridge, sq with Square, etc., all leaves Google with less to guess and can help.
- Try and ensure your addresses are as well formed as possible, with commas in the right places separating “lines” of the address.
- You can parse and repair your address strings with specialised address parsing libraries in Python: have a look at postal-address, us-address (for US addresses), and pyaddress, which might help out.
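The simple text processing tip above could be sketched like this. The abbreviation table and the set of characters to keep are assumptions to adapt to your own data, and blanket replacement has pitfalls: “St.” can also mean “Saint”, which this naive version would get wrong.

```python
import re

# Illustrative abbreviation table; extend it for your own data.
ABBREVIATIONS = {"st": "Street", "brdg": "Bridge", "sq": "Square"}

# Match each abbreviation as a whole word, with an optional trailing full stop.
ABBREVIATION_PATTERN = re.compile(
    r"\b(" + "|".join(ABBREVIATIONS) + r")\b\.?", flags=re.IGNORECASE
)


def clean_address(address):
    """Expand common abbreviations and strip unusual characters."""
    address = ABBREVIATION_PATTERN.sub(
        lambda match: ABBREVIATIONS[match.group(1).lower()], address
    )
    # Keep letters, digits, whitespace and basic address punctuation.
    address = re.sub(r"[^\w\s,.'-]", "", address)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", address).strip()


print(clean_address("12 Main St., Dublin"))  # 12 Main Street, Dublin
print(clean_address("5  Market sq #2"))      # 5 Market Square 2
```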
Thanks for this. I was using the https://postcodes.io/ to do some geocoding in the UK. Out of ~24000 only 78 couldn’t be found so the Google API and this awesome code helped to get the rest. Thanks!
Thanks. What if I want to scrape the website and phone number as well?
Instead of using a CSV file, could you connect to a MariaDB / MySQL?
Hi, I am trying to get this script to work, however, have some problems.
Failure Message: “ConnectionError: Problem with test results from Google Geocode – check your API key and internet connection.”
I think, both API key and internet connection are ok.
What is wrong?
Thanks so much for your support.
Regards
Alexander
Try enable billing: ” To use the Geocoding API, you must include an API key with all API requests and you must enable billing on each of your projects.”
I cannot thank you enough. Awesome!
I love you. This is amazing!
Also, if anyone encounters same issue as Alexander, try changing:
geocode_url = “https://maps.googleapis.com/maps/api/geocode/json?address={}”.format(address)
to:
geocode_url = “https://maps.googleapis.com/maps/api/geocode/json?address={}&key=[PUT_YOUR_API_KEY_HERE]”.format(address)
No problem Bobika – glad that I could help and that you found the geocoding script useful! I will update the blog post to incorporate your comments actually! thanks!
Hi, I would like to enter a postcode/Eircode and get back lat/long co-ordinates. Is this possible? The link to Joseph’s reverse geocoding script is broken.
Very nice. Thank you for this, it helped me load coordinates quickly for medical centers to map online.
Please update the link to Joseph’s repo for reverse geocoding: https://github.com/jdeferio/Reverse_Geocode/blob/master/Code.py
I am very new to python and GIS, but learning for my professional growth, any idea why I am getting an error on line 127:
test_result = get_google_results(“London, England”, API_KEY, RETURN_FULL_RESULTS)
The error is name ‘get_google_results’ is not defined
I guess because you didn’t run the function definition before. You need to run the whole script.
HI, I am finding the error as &key=[‘xyz’]”.format(address)
^
SyntaxError: invalid character in identifier in my key . when changed the url address suggested by Bobika.
eocode_url = “https://maps.googleapis.com/maps/api/geocode/json?address={}&key=[PASTE HERE YOUR API KEY]”.format(address)
hi, I get problem with my key i dont know the error is SyntaxError: invalid character in identifier
Thanks so much Shane, this has been a real godsend!!
No problem Niamh, I’m glad it’s useful!
Great intro to geocoding in python! Thanks for sharing
This is wonderful, thank you. Was able to do it following your code.
[…] to GPS co-ordinates). These APIs are used by people who are building websites or apps that need to change location strings into GPS co-ordinates. On one of these sites, you can imagine filling in your address, and in the background […]
Has anything changed regarding the free 2,500 addresses limit per day without API key?
I just ran the script and am getting this error:
Could not get the addresses: Your request was denied.
why is give me this error
Code no longer works. Returns this error every time.