Project:

Wilderness Inquiry Mapping and Spatial Analysis System

INTRODUCTION

For more than 35 years, Wilderness Inquiry, a local non-profit organization, has been coordinating canoe, kayak, horse-pack, raft, hike, dogsled and safari experiences on public lands throughout the world. Wilderness Inquiry’s mission is to make the outdoors accessible to all–it is noble effort that I feel privileged to be apart of. Over the years I have served as trail guide, Webmaster, Director of Operations, and my current title is Technology Director. I now spend most of my time behind a desk developing and maintaining Wilderness Inquiry’s database but am rewarded every day living vicariously through hundreds of amazing trips and stories from people who have life changing experiences on the trail.

When I started at Wilderness Inquiry more than 10 years ago there was a National Geographic map of the world mounted on corkboard near the front door. The map had hundreds of different colored pushpins stuck in it. The pins were meticulously placed to represent where participants, donors, volunteers, and staff came from. It was an impactful map that showed Wilderness Inquiry’s extensive reach to people and places all over the world. I recall being very impressed by it until one day I asked my boss about it. “Take it down,” he said. “No one has maintained that thing in years, we’ve served 10,000 people since the last pin was updated.” He was right, and the reality was there wasn’t enough physical space to fit a majority of the pins we would need anyway. So the map came down... until now.

Wilderness Inquiry has a database burgeoning with information on destinations, itineraries, participants, donors, communications, and more but none of it is spatially referenced. Sure you can click a button to get directions to someone’s office and yes we provide Google Map links to event locations but while practical, none of these features provide real spatial insight into Wilderness Inquiry’s datasets. This year, I started to see examples of complex digital maps and began to learn the capacity of GIS programming­, and the conservations started. What if we could recreate that pin map electronically? What if we could generate it on the fly to examine subsets of data? What if we could show who travels to which destinations and where they come from? We could look for trends that would help focus future marketing efforts. Could we query everyone living within 50 miles from an event so we know who to send invitations to? Could we identify past participants who live in zip codes with high median household incomes to see if they would become donors and support our scholarship funds? The questions kept coming and the idea for a dynamic mapping interface and spatial analysis tool was born.

I set an initial goal of producing 3 types of maps based on subsets of people:

  1. All Contacts Pin Map: a map showing where donors, trip participants, staff, and volunteers come from.
  2. Zip Code Marketability Map: Thematic map showing marketability by zip code. The marketability will be calculated using a geoprocessing operation such as intersection of participant point features and zip code areas with ancillary demographic data.
  3. Where From, Where Go Map: Thematic map illustrating where people live correlated with where they chose to travel.

THE DATA

There are four categories of data required to build this mapping tool.  I will explore each one in this section. They are:

  • Proprietary data on the people and places of Wilderness Inquiry
  • Geocoded coordinates representing people and places (must be generated)
  • Google Earth interactive mapping platform
  • Census bureau demographic data

Currently Wilderness Inquiry has an extensive database built in FileMaker Pro. FileMaker is a powerful database platform that allows for rapid development of custom business solutions. Wilderness Inquiry’s FileMaker implementation is a homegrown customer relationship management system used to maintain relationships with about 40,000 people. There are trip participants, volunteers, donors, vendors, etc. from all over the country. The database is used for many other things–everything from donations to registrations, destinations to disability information, and images to equipment inventory but for the purpose this project we’ll focus on three tables: Contacts, Registrations, and Destinations. Information needed to produce the desired maps is or will be housed in one of these three (related) tables.

At the time of starting this project none of the information was georeferenced. The contacts table contains names, street addresses, and activity information for people that details who they are and how they have been involved. The registration table joins trip participants (from the contacts table) and dates of specific events and trips. It contains information on when a given trip occurred. The Destinations table contains information on where the trip occurred and what type of trip it was.

Latitude and longitude coordinates needed to be stored for contacts and destinations. I created new fields in each of these tables and wrote a Python script to accomplish the necessary geocoding (details in the next section). The next step was choosing a platform to map the newly georeferenced data. Ease of use, ease of deployment, data privacy, speed and cost were key factors in the selection process. Given that there was no money budgeted for this project, commercial products like ArcGIS were out of the question. I examined three open source alternatives: Quantum GIS, GRASS GIS, and Google Earth. Quantum and GRASS both play well with Python and offer extensive code libraries, PyQGIS and PyGRASS respectively. There are a lot of plug-ins available including one that allows you to combine Quantum and GRASS. While powerful, what I found when reading the documentation and attempting an install, is that Quantum was simply more than what was needed. The tasks at hand require a simple, yet engaging, informative mapping tool for an audience with little to no GIS experience. Google Earth, which happens to already be installed on most of my coworkers’ machines, fit the bill perfectly. Google Earth’s interface is intuitive, easy to use, and fast. It offers a robust application developer interface that is relatively easy to work with using Keyhole Markup Language (KML), a proprietary form of XML. It does not handle Python scripting directly (one could use Common Gateway Interface - CGI), but I quickly discovered writing KML using Python could be easily accomplished and the decision to use Google Earth was made.

Several questions to be answered using this new tool require demographic information in addition to proprietary and spatial data. For example, finding potential new donors by examining the median household incomes of the zip code areas in which they live or deciding if someone is deserving of a needs-based scholarship by examining the number of people on food-stamps or subsided housing in their neighborhood. This data is readily available from the US Census Bureau. Using http://factfinder2.census.gov I was able to customize and download a table by zip code tabulation area that contained all sorts of household income, poverty, and other demographic data from the 2010 census.

While not exactly GIS data, the following open source Python modules also played a critical role in the solution: PyFileMaker, GeoPy, and SimpleKML. PyFileMaker is an XML wrapper module that allows read and write access to FileMaker tables, records, and field values from within Python scripts in much the same way you would query, read, and write to a dBASE table with a cursor in ArcPy. GeoPy is a Python module that facilitates a couple basic geoprocessing functions, namely geocoding of street addresses and calculating Great-circle distances between points.

METHODS EXPLAINED

The implementation of this mapping tool first uses Python to geocode postal addresses and store the results in FileMaker Pro. It then queries and writes data to KML files which are ultimately displayed in Google Earth. In this section we’ll explore each one of these steps in detail.

Step 1) Geocoding

Geocoding a handful of addresses can be accomplished using one of dozens of websites; however, when you need to geocode more than 40,000 addresses options become much more limited. Research into free and paid geocoding services with application programming interfaces (API) led me GeoPy, an open source Python toolbox, which aids sending requests and interpreting responses from six different services. They are Google Maps, Yahoo! Maps, Microsoft’s Bing Maps, Geocoder.us, GeoNames, and MediaWiki pages. Each service has different requirements, limitations, capabilities, and end user license agreements (EULA). For example, Google Maps allows only 2000 requests per day with its free service and its EULA requires geocoded results to be displayed in its Google maps service–making it not an option for this project. After researching and testing all of the other options, Microsoft’s Bing Maps and Geocoder.us came out on top. Bing’s EULA requires creating an account but allows for 50,000 requests per day when used non-commercially (i.e. for education or non-profit under which Wilderness Inquiry qualifies). Another benefit of Bing maps is that it gracefully handles most PO Box addresses while other services often failed. As a backup option, Geocoder.us fit the bill with its only limitation being that requests must be throttled to one per 15 seconds.

The geocoder script (See Appendix A: FileMaker Geocoder Script Source Code) I wrote will first attempt to geocode the address using Bing Maps. If Bing fails then Geocoder.us is tried. If both fail then the script strips the street address and tries just the zip code with Bing which returns the zip code area centroid. Wilderness Inquiry has their mailing list scrubbed a couple times and because of this almost all addresses were correctly geocoded. A remaining 1% didn’t match usually because of typos or missing information–these records will be ignored until they can be manually cleaned up.

Lastly, in order to complete the Where From, Where Go Map, it was necessary to geocode Wilderness Inquiry’s trip destinations. Since there are relatively few active destinations, and they don’t change that often, writing a script wasn’t warranted. I manually geocoded the starting point for each destination using Google Earth.

Step 2) Constructing the Pin Map

Figure 1: The Contacts Pin Map (Color-coded pins represent different contact types)

With everything geocoded it was time to display the information in Google Earth using KML. The SimpleKML Python library played a critical role in helping to write standardized, usable KML files for Google Earth. The script (See Appendix B: Build Pin Map KML Script Source Code) queries the FileMaker database for all contacts with valid, geocoded addresses, and then loops through every record using SimpleKML to create a point feature for each person’s home location. The script takes about 90 seconds to build a map with 25,000 points.

Figure 2: Organizing of pins by folder, allowing layers to be turned on and off.

Organizing and styling the data was not easy. After much trial and error I figured out how to create a folder and shared style for each contact type. This allows each type of contact to be displayed using a unique color of pin and the ability to be turned on and off as a group (See Figure 2). To increase usability, the script uses additional data from the contact’s record to build a description linked to each pin. When clicked a call-out appears containing the contact’s name, type, address, and a hyperlink that takes you directly to their record in FileMaker Pro (See Figure 3).

Figure 3: Sample contact description including link to their record in FileMaker.

Step 3) Constructing the Marketability Map

The Marketability Map is a variation of the Pin Map described above that uses ancillary data to query the database for individuals living in areas of generalized wealth. To produce this map it was first necessary to obtain household income data from the US Census bureau and then relate this data to Wilderness Inquiry’s contacts using a relationship with the zip code as the primary key. Python code from the pin map script is modified to display participants only if they live in a zip code with an average household income greater than a user specified value. Finally, to see the zip code areas, the Google Earth layer More > US Government > Postal Code Boundaries must be turned on (See Figure 4)

Figure 4: Marketability map zoomed into an area of the twin cities metro showing only past participants who live in zip codes with mean household income levels above $100,000/year

Step 4) Constructing the Where From, Where Go Map

The Where From, Where Go Map is a powerfully illustrative map linking participants from Wilderness Inquiry trips to the destinations they chose to travel. The script which builds this map (See Appendix C: Build Where From, Where Go Map Source Code) is quite different than the previous two pin maps. First it queries Wilderness Inquiry’s registration table instead of the contacts table. The registration table links location data for both the participant and the destination. Using techniques analogous to the pin maps, points are plotted for the destination and the participant. Then, in a separate folder, linestring features are created linking the points together.

Figure 5: Where From, Where Go map displaying all trips for the year 2012

Figure 6: Yellowstone Backcountry trip, last 5 years.


Step 5) Query and map results based on great-circle distance from an address

This important feature allows analysis of contacts based on their spatial proximity to a user specified point. A common use case would be the need to send an invitation to everyone in our database that lives within 50 miles of a scheduled canoe workshop or promotional event. Since a majority of these events are held at our headquarters I decided to write a script that would calculate the distance from Wilderness Inquiry to every contact and permanently store the results in their record. I chose to store these distances to allow for faster queries. The second part allows the user to specify a starting address and distance threshold. In this script the distances must be calculated on the fly so it takes a bit longer but still produces results in less than a minute. I utilized the Python geocoder function to get the coordinates of the starting address but opted not to use GeoPy’s distance function. I was able to speed things up by using a custom function within FileMaker to compute great-circle distance between points using the Haversine formula. The Haversine formula is less accurate than the GeoPy distance function, because it is based on a spheroid model of earth as opposed to a geoid. The difference is not significant over small areas and since the function will primarily be used to examine close proximity (less than 100 miles) the Haversine function is plenty adequate. (See Appendix E: Distance from Wilderness Inquiry and Distance from User-specified Address Source Code)

RESULTS ANALYSIS

The maps resulting from the tools described above will prove extremely valuable to marketing, operations, fundraising and human resource efforts at Wilderness Inquiry. In this section, I’ll examine just a few of the ways this mapping tool will help benefit the mission of Wilderness Inquiry.

At most non-profits, marketing dollars are scarce, and Wilderness Inquiry is no exception. Marketing staff are frequently needing to make critical decisions regarding where to invest these limited resources. Using the contacts pin map and marketability pin map, staff can now see where existing customers live, where new customers are coming from, basic demographic info like household income levels, and more. They can determine what markets might be untapped and what markets might be worth investing more in. This information will become critical to choosing direct mail and email lists to focus on. Online and other media advertising can also be geographically focused to optimize spending on markets with higher probability of returns. For the wow factor, our comprehensive pin map can be printed and posted in our lobby again–or even displayed on our website as a version with personal data removed.

Operations and marketing staff will benefit from understanding patterns and trends that emerge on the Where From, Where Go map. For example, when looking at Yellowstone Backcountry trips over the last 5 years it is evident that no one living West of the Rockies signed up for this trip (See Figure 5). You can tell by significantly high numbers of participants from Washington, Oregon, and California that there are other trips popular with folks living in Western states but this isn’t one of them... don’t waste resources sending Yellowstone postcards to these places. Examining different years of trips as different layers is also possible by simply running the script for the different datasets and saving them to “My Places” in Google maps where you can hide and show them as layers. This will dramatically aid operations staff in making decisions about where and when to offer trips.

The biggest benefit of all may come from development staff being able to view, isolate, and target participants with means that could become donors. So often we have to take the shotgun approach and solicit so many people to get so few returns but with this tool we now have a high-powered sight on that shotgun. Excuse the gun metaphor but with this sight we can take fewer shots and hit more targets. This tool can easily be extended to include other demographic information such as poverty levels or race and ethnic background information, which would allow us to target our outreach and mission efforts. Human resources can also examine trends in where our staff members come from year after year and channel recruiting efforts accordingly.

The reality is the aforementioned uses are just the beginning. Once this mapping system is put into production more and more uses will arise. Additional demographic and proprietary data can be incorporated to help the number effective uses grow. 

PRESENTATION OF RESULTS

The preceding sections of this paper have attempted to describe the results this mapping system is capable of producing. It is, however, very difficult to describe an interactive mapping system in words...even screen captures cannot do it justice. Google Earth allows map users to explore data in a stunning 3-dimensional, spatially accurate context. Patterns, trends, and clusters emerge and inform. Custom layers allow maps to be filtered and refined. Google Earth’s default layers, which include political boundaries, photos, roads, buildings, and more, increase the utility of the maps for planning and understanding mapped phenomena. Meta data from Wilderness Inquiry’s database displayed in feature descriptions allows users to hone in on actual individuals and destinations instead of just generalized symbols (See Figure 8). In addition, I was able to make a legend describing the symbols used in Adobe Photoshop and place it in the lower left corner of the Pin Map in Google Earth using a KML overlay (See Figure 1).

Text Box: Figure 7: New Menu in FileMaker GUIFileMaker makes it easy to produce robust graphical user interfaces for relational databases quite easily. I built a new section, appropriately located within our contact table’s main menu, which allows users to initiate any of the aforementioned scripts with the click on the button. To visually entice users, and provide a slight insight to the type of map that can be produced, I used a screenshot of the Where From, Where Go map at the base of the menu (See Figure 7).

 

Figure 8: Examples of pin call-outs (Descriptive labels are constructed for features providing a rich user experience when interacting with the map including hyperlinks to take you to the record itself in FileMaker Pro and even picture thumbnails reinforcing the nature of the trip)

CHALLENGES, PROBLEMS, AND LESSONS LEARNED

The biggest implementation challenge was integrating FileMaker Pro and Python. The PyFileMaker module was essential but poorly documented and not widely used. It is limited in its ability to sort on more than one field and chokes on some advanced queries. There is no easy way to find records in FileMaker and pass the results to Python. The best I could do initially was create buttons to trigger the Python scripts from FileMaker with a few parameters (if desired). This means Python has to do or potentially redo all queries and does not allow for manual manipulation of the found set through omissions and additions of records as FileMaker does. I eventually found a workaround using EXtensible Stylesheet Language (XSL) to structure data exported directly from FileMaker as valid KML. FileMaker allows data to be easily output as XML, and XSL can translate XML to the desired KML format. I had hoped this method might be faster than using Python but it performs about the same and is limited in its ability to organize layers in the KML output. The sole advantage is allowing users to perform queries in FileMaker using built-in functionality they are used to. Given this, I deployed this method for only the Pin Maps and continued with Python for the other maps.

With more than one family member from the same household traveling to the same destination there are often multiple point and line features with exact overlap. Google handles this elegantly by exploding the points when you click on the top one. This forces you to see there are multiple and choose between them to see detailed information. The drawback is that it only does this when clicked so some under-representation may occur on the resulting maps sans interaction with the features. Future development could include marginal offsets for overlapping features that would better represent these multiples at commonly viewed map extents.

Geocoding with Geocoder.us is painfully slow given the mandatory 15 second delay...a better backup geocoding service may exist that could eliminate this bottle neck. For now, the geocoder script must be scheduled to run on a regular basis to keep up with new contacts being entered. Developing a script to geocode new addresses as they are entered in real-time could be beneficial especially if combined with the data entry process to highlight typos (i.e. address not found could indicate spelling errors, etc).

Powerful as it is, I learned that Google Earth does have limitations. When it comes to geoprocessing you get what you pay for and Google wants you to pay for its $400/yr Google Earth Pro. I had hoped to find more built-in functionality in the free version to use geoprocessing function like intersections, unions, clips, and build choropleth and other more advanced thematic maps but it isn’t possible unless you prepare the data using other software first. Google Earth can be thought of a robust map viewer, something like what the Acrobat Reader is to Adobe Acrobat for PDFs.

Even given its limitations, Python programming allowed integration of FileMaker Pro and Google Earth. Wilderness Inquiry now has a powerful mapping system that allows patterns and trends to be visualized in ways that data only can simply not convey. Given its dynamic nature, this tool will be of scalable, long-term benefit to Wilderness Inquiry’s mission with potential uses limited only by the imaginations of the people who use it.

References

Bing Maps Application Programmer Interface. Retrieved from http://www.microsoft.com/maps/developers/

Geocoder.us geocoding service. Retrieved from http://geocoder.us/

Google Earth. Retrieved from http://www.google.com/earth/

GeoPy Geocoding Toolbox for Python. Retrieved from https://code.google.com/p/geopy/

Simple KML – A Python package from create basic KML. Retrieved from https://code.google.com/p/simplekml/

PyFileMaker - Python Object Wrapper for FileMaker Server XML Interface. Retrieved from https://code.google.com/p/pyfilemaker/

XSL EXtensible Stylesheet Language tutorial. Retrieved from http://www.w3schools.com/Xsl/

KML Reference – Keyhole Markup Language – Google Developers. Retrieved from https://developers.google.com/kml/documentation/kmlreference

Haversine Formula – Wikepedia. Retrieved from http://en.wikipedia.org/wiki/Haversine_formula

Python Programming Languange Official Website. Retrieved from http://www.python.org/

United States Census Bureau American Fact Finder. Retrieved from http://factfinder2.census.gov

Appendix A: FileMaker Geocoder Script Source Code


Appendix B: Build Pin Map KML Script Source Code

Continued on next page…


Appendix C: Build Where From, Where Go Map Source Code
Where From, Where Go continued …

 


Appendix D: Distance from WI Source Code

 

<< Back