Data Science: Parsing Location Input Fields

For the recent #DesignInTech Report survey, I needed to clean up the city/country field, which I had set up as a free-text input field.

It took me an hour to find a way to make it happen. Here is the initial quick-and-dirty result:

[Image: designintech heatmap of the world]
This map shows where all the survey participants of the 2018 #DesignInTech Report generally came from.

I’ve figured out how to make the heatmap myself (from scratch), but today I made a quick sketch with this nice service, using the lat/long coordinates I pulled out after processing the data.
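For anyone curious what a from-scratch heatmap might start with, here is a minimal sketch that bins lat/long pairs into a coarse grid and counts the points per cell. This is not necessarily how I built mine; the function name `bin_points`, the 10-degree cell size, and the sample coordinates are all assumptions made up for illustration.

```python
from collections import Counter

def bin_points(points, cell_deg=10):
    """Count (lat, lng) points per grid cell that is cell_deg degrees wide."""
    counts = Counter()
    for lat, lng in points:
        # Floor-divide so each point maps to the south-west corner of its cell.
        cell = (int(lat // cell_deg) * cell_deg, int(lng // cell_deg) * cell_deg)
        counts[cell] += 1
    return counts

# Hypothetical sample coordinates (two near Boston, one near Tokyo).
sample = [(42.36, -71.06), (42.35, -71.07), (35.68, 139.69)]
heat = bin_points(sample)
for cell, n in sorted(heat.items()):
    print(cell, n)
```

Each cell count can then be mapped to a color or dot size when drawing the map; a smaller `cell_deg` gives a finer grid.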

The trick to doing it is:

  1. Install the Python module for googlemaps.
  2. Get a Google API key for Maps.
  3. Pop the key into the code below, which uses this doc for reference.
  4. Feed the code a list of free-text entries in your ‘cities.txt’, one per line.
  5. Use the resulting ‘allcities_tabs.txt’ tab-delimited output file any way you like.
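To make it clearer what the code is picking out of each geocoding result, here is a small offline sketch. The `sample_result` dictionary is hand-written to mimic the shape of one result (it is not real API output), and `first_component` is a hypothetical helper name used only for this illustration.

```python
# Hand-written sample in the shape of one geocoding result (not real API output).
sample_result = {
    "formatted_address": "Boston, MA, USA",
    "address_components": [
        {"long_name": "Boston", "short_name": "Boston",
         "types": ["locality", "political"]},
        {"long_name": "Massachusetts", "short_name": "MA",
         "types": ["administrative_area_level_1", "political"]},
        {"long_name": "United States", "short_name": "US",
         "types": ["country", "political"]},
    ],
    "geometry": {"location": {"lat": 42.3601, "lng": -71.0589}},
}

def first_component(components, kind, default="UNKNOWN"):
    """Return the long_name of the first component tagged with `kind`."""
    for c in components:
        if kind in c["types"]:
            return c["long_name"]
    return default

parts = sample_result["address_components"]
country = first_component(parts, "country")
level1 = first_component(parts, "administrative_area_level_1")
city = first_component(parts, "locality")
missing = first_component(parts, "postal_code")  # no such component in the sample
print(country, level1, city, missing)
```

Free-text entries that only name a country will typically have no locality component, which is why a fallback value such as 'UNKNOWN' is useful.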

Enjoy! —JM


import googlemaps

# Replace the placeholder with your own Google Maps API key.
gmaps = googlemaps.Client(key='YOUR API KEY OVER HERE')

# Read one free-text location per line.
with open("cities.txt", "r", encoding="utf8") as fp:
    mm = [k.strip() for k in fp]

def getcomponent(ss, kind):
    # Return the long name of the first address component of the given type.
    for s in ss:
        if kind in s['types']:
            return s['long_name']
    return 'UNKNOWN'

def getcountry(ss):
    return getcomponent(ss, 'country')

def getlocality(ss):
    return getcomponent(ss, 'locality')

def getlevel1(ss):
    return getcomponent(ss, 'administrative_area_level_1')

def getinfo(m):
    # Geocode one entry and return a tab-delimited line, or a flagged
    # line if the lookup fails or comes back empty.
    try:
        geocode_result = gmaps.geocode(m)
        g = geocode_result[0]
        country = getcountry(g['address_components'])
        city = getlocality(g['address_components'])
        level1 = getlevel1(g['address_components'])
        loclat = g['geometry']['location']['lat']
        loclng = g['geometry']['location']['lng']
        ss = "%s\t%s\t%s\t%s\t%s\n" % (loclat, loclng, country, level1, city)
        print(ss)
        return ss
    except Exception:
        print("got an error")
        return "*********CHECK THIS: %s\n" % m

# Write one output line per input line, in the same order.
with open("allcities_tabs.txt", "w", encoding="utf8") as fp:
    for i, m in enumerate(mm, 1):
        print(i, m)
        fp.write(getinfo(m))

print("wrote file")
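Once the output file exists, a short script along these lines could separate the clean rows from the ones flagged for a manual check and tally participants by country. It is only a sketch: `parse_rows` is a hypothetical helper, and the sample lines are made up to match the five-column layout (lat, lng, country, level 1 area, city) that the script above writes.

```python
from collections import Counter

def parse_rows(lines):
    """Split output lines into parsed rows and flagged lines needing review."""
    rows, flagged = [], []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 5:
            lat, lng, country, level1, city = fields
            rows.append((float(lat), float(lng), country, level1, city))
        else:
            # Anything else is a "CHECK THIS" line from a failed lookup.
            flagged.append(line.strip())
    return rows, flagged

# Made-up sample lines in the same layout as the output file.
sample_lines = [
    "42.3601\t-71.0589\tUnited States\tMassachusetts\tBoston\n",
    "35.6762\t139.6503\tJapan\tTokyo\tUNKNOWN\n",
    "*********CHECK THIS: asdfgh\n",
]
rows, flagged = parse_rows(sample_lines)
by_country = Counter(r[2] for r in rows)
print(by_country.most_common())
print(flagged)
```

To run it on the real data, pass an open file instead of the sample list, for example `parse_rows(open("allcities_tabs.txt", encoding="utf8"))`.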
