Data Science: Parsing Location Input Fields

For the recent #DesignInTech Report survey, I needed to clean up the city/country responses, which I had collected through a free-text input field.

It took me an hour to find a way to make the task happen. Here is the initial quick-and-dirty result:

[Image: designintech heatmap of the world.jpg]
This map shows where all the survey participants of the 2018 #DesignInTech Report generally came from.

I’ve figured out how to make the heatmap myself (from scratch), but for today's quick sketch I used this nice service, feeding it the lat/long coordinates I pulled out after processing the data.
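
A from-scratch heatmap usually starts by counting how many points fall into each cell of a lat/long grid. The sketch below shows that one step under stated assumptions; it is not necessarily the method behind the map above. The `bin_points` name, the 10-degree cell size, and the sample coordinates are all made up for illustration.

```python
import math
from collections import Counter


def bin_points(points, cell_deg=10.0):
    """Count (lat, lng) points per grid cell of cell_deg x cell_deg degrees."""
    counts = Counter()
    for lat, lng in points:
        # Flooring the scaled coordinate gives the index of the cell it falls in.
        row = math.floor(lat / cell_deg)
        col = math.floor(lng / cell_deg)
        counts[(row, col)] += 1
    return counts


# Hypothetical sample coordinates (roughly Boston, New York, Tokyo).
sample = [(42.36, -71.06), (40.71, -74.01), (35.68, 139.69)]
cells = bin_points(sample)
```

Each count can then be mapped to a color or opacity when drawing the cell; a smaller `cell_deg` gives a finer grid at the cost of sparser cells.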

The trick to doing it is:

  1. Install the Python module for googlemaps (pip install googlemaps).
  2. Get a Google API key for Maps.
  3. Pop the key into the code below, which uses this doc for reference.
  4. Feed the code your list of location strings in ‘cities.txt’, one per line.
  5. Use the resulting ‘allcities_tabs.txt’ tab-delimited output file any way you like.
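
As one example of using the output, the sketch below reads the tab-delimited lines back in and keeps the coordinates and country for each row. It assumes the five-column layout the script writes (lat, lng, country, level-1 region, city) and that rows which failed geocoding appear as a single "CHECK THIS" field. The `read_points` name and the sample data are hypothetical.

```python
import csv
import io


def read_points(fp):
    """Return (lat, lng, country) tuples, skipping rows flagged for review."""
    points = []
    for row in csv.reader(fp, delimiter="\t"):
        # Failed lookups and blank lines do not have five fields, so skip them.
        if len(row) != 5:
            continue
        lat, lng, country, _level1, _city = row
        points.append((float(lat), float(lng), country))
    return points


# Hypothetical sample standing in for the real output file.
sample = io.StringIO(
    "42.3600825\t-71.0588801\tUnited States\tMassachusetts\tBoston\n"
    "*********CHECK THIS: asdfgh\n"
)
points = read_points(sample)
```

With the real file, pass an open file handle instead of the `io.StringIO` sample. The skipped rows are the ones worth fixing by hand in the input list before a second run.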

Enjoy! —JM

[code language="python"]

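import googlemaps

# Paste your Google Maps API key here.
gmaps = googlemaps.Client(key='YOUR API KEY OVER HERE')

# Read the raw location strings, one per line.
with open("cities.txt", "r", encoding="utf-8") as fp:
    kk = fp.readlines()

# Strip newlines and drop blank lines.
mm = []
for k in kk:
    k = k.strip()
    if k:
        mm.append(k)


def getcountry(ss):
    # Return the country name from a list of address components.
    for s in ss:
        if 'country' in s['types']:
            return s['long_name']
    return 'UNKNOWN'


def getlocality(ss):
    # Return the city (locality) name from a list of address components.
    for s in ss:
        if 'locality' in s['types']:
            return s['long_name']
    return 'UNKNOWN'


def getlevel1(ss):
    # Return the state/province name from a list of address components.
    for s in ss:
        if 'administrative_area_level_1' in s['types']:
            return s['long_name']
    return 'UNKNOWN'


def getinfo(m):
    # Geocode one free-text location and return a tab-delimited line.
    try:
        geocode_result = gmaps.geocode(m)
        g = geocode_result[0]
        addr = g['formatted_address']
        country = getcountry(g['address_components'])
        city = getlocality(g['address_components'])
        level1 = getlevel1(g['address_components'])
        loclat = g['geometry']['location']['lat']
        loclng = g['geometry']['location']['lng']
        ss = "%s\t%s\t%s\t%s\t%s\n" % (loclat, loclng, country, level1, city)
        print(ss)
        return ss
    except Exception:
        # An empty result list or an API error ends up here.
        print("got an error")
        return "*********CHECK THIS: %s\n" % m


# Geocode every entry and write one line per entry to the output file.
with open("allcities_tabs.txt", "w", encoding="utf-8") as fp:
    i = 1
    for m in mm:
        print(i, m)
        s = getinfo(m)
        fp.write(s)
        i += 1

print("wrote file")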