Evan Savage

Persistent Location Tracking: Looking For A Few Good Data Points

In this post, I revisit the question of whether Google Latitude meets my
persistent location tracking needs. In my previous post, I compared
Google Latitude to InstaMapper and concluded that the latter is too
battery-intensive. Using maps and some first-pass analysis of the data,
I suggest that Google Latitude optimizes for battery life at the expense of
data quality.

Exhibit A: Some Maps #

I started gathering data on Oct. 3, 2012:

Since then, Valkyrie Savage and I have travelled to Boston and Chicago.
Our stopover in Phoenix is clearly visible at this scale. You can barely make
out our day trip to Mount Monadnock, NH over near Boston. Here's a
closer look at that trip:

Ouch. The data is noisy in some areas, sparse in others. It's fairly clear that
we took Hwy 2 over, but some of the GPS readings are miles off. Let's zoom in
on that hike:

Only five data points actually lie within the park/mountain boundaries. That's
five data points for a four-hour hike. Our Boston data is somewhat more
accurate:

Still, the red line cuts through city blocks with reckless abandon. Either
we're flying, or we're packing some incredibly efficient demolition equipment.

Here's the map for one of my more itinerant Bay Area days:

I cycled to a doctor's appointment, visited
BiD to hear Mary Czerwinski speak about emotion tracking, worked
from home for a bit, went into San Francisco to meet up with
Lev Popov, and finally dragged myself home again.

The BART ride into San Francisco is understandably sparse: most of it is
separated from cell towers and GPS satellites by rock and/or water.

Most of my travel is on foot, by bike, or via public transit. Not content
with the Mount Monadnock hike data, I tried another quick drive up into
Tilden:

Google Latitude captured just four points during the 20-minute drive.

Exhibit B: Some Analysis #

You can see the code for this analysis
here
and here.

After trudging through several lackluster map views, I'm left with a
nagging impression:

This data isn't that useful.

This impression deserves further analysis, so I grab the KML to answer some
of my questions. First off: how often is Google Latitude checking my location?

About every two minutes. GPS is a huge battery drain;
increasing the time between updates can help by allowing the GPS radio to
enter an idle state. How are those location readings scheduled?

Google Latitude really likes spacing its readings out by a whole number of
minutes.
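To check this, here's a minimal sketch of the interval analysis. It assumes the KML export wraps each fix's timestamp in an ISO 8601 <when> element; the exact timestamp format may differ from yours.

import re
import sys
from datetime import datetime

# Pull ISO 8601 timestamps out of the KML export. This assumes each fix
# carries a <when>...</when> element; adjust the pattern if your export
# differs.
WHEN = re.compile(r'<when>([^<]+)</when>')

def parseWhen(s):
    # Assumed timestamp formats: UTC, with or without fractional seconds.
    for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass
    raise ValueError('unrecognized timestamp: %r' % s)

times = sorted(parseWhen(m) for m in WHEN.findall(sys.stdin.read()))
deltas = sorted((b - a).total_seconds() for a, b in zip(times, times[1:]))
wholeMinutes = sum(1 for d in deltas if d % 60 == 0)
print('median interval: %ds' % deltas[len(deltas) // 2])
print('%d of %d intervals are whole minutes' % (wholeMinutes, len(deltas)))

Piped KML on stdin, e.g. python intervals.py < history.kml (the filename is hypothetical), this reports both the typical polling interval and how strongly the readings snap to minute boundaries.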

How accurate is the data? The KML doesn't provide accuracy estimates
for its locations. Fortunately, the Google Latitude API does, so
I retrieve my data using this script and look at the accuracy readings:

Actually, the readings have fairly high accuracy. Only 7% of readings have a
reported error radius greater than 100m.
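The tally itself is short once the dump is parsed. A sketch, assuming the dump is a flat JSON array of readings with an optional "accuracy" field in meters (the actual shape depends on the retrieval script):

import json
import sys

# Count readings whose reported error radius exceeds 100m. The flat JSON
# array and the "accuracy" field name are assumptions about the dump's
# shape; adjust to match your data.
readings = json.load(sys.stdin)
radii = [r['accuracy'] for r in readings if 'accuracy' in r]
coarse = sum(1 for a in radii if a > 100)
print('%d of %d readings have error radius > 100m (%.1f%%)' % (
    coarse, len(radii), 100.0 * coarse / len(radii)))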

The maps above suggest that location readings are less accurate while
travelling at high speed. Is that true? The API provides speed estimates
for some readings, but this data is kind of sparse:

$ python speed.py < history.api
found 7429 speed values among 20898 readings

I try a different method: the Haversine distance formula, which
gives me the distance between two points on the Earth's surface:

import math

def haversineDistance(A, B):
    """
    Distance (in meters) between two Locations. Uses the Haversine formula.

    See http://www.movable-type.co.uk/scripts/latlong.html for corresponding
    JavaScript implementation.
    """
    # Earth's mean radius in meters
    R = 6371009
    dLat = math.radians(B.lat - A.lat)
    dLon = math.radians(B.lng - A.lng)
    lat1 = math.radians(A.lat)
    lat2 = math.radians(B.lat)
    sLat = math.sin(dLat / 2.0)
    sLon = math.sin(dLon / 2.0)
    a = sLat * sLat + sLon * sLon * math.cos(lat1) * math.cos(lat2)
    c = 2.0 * math.atan2(math.sqrt(a), math.sqrt(1.0 - a))
    return R * c
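To turn these distances into speeds, I divide by the elapsed time between consecutive readings. A minimal sketch, assuming each Location also carries a UTC datetime in a .time attribute (that attribute name is mine, not the API's):

def estimatedSpeed(A, B):
    """
    Speed (in meters/second) between consecutive Locations, estimated as
    Haversine distance over elapsed time. Returns None for readings with
    identical or out-of-order timestamps.
    """
    dt = (B.time - A.time).total_seconds()
    if dt <= 0:
        return None
    return haversineDistance(A, B) / dt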

I use these speed estimates to get a plot of accuracy versus travelling speed:

No clear correlation here; there are low-quality readings at both low and high
speeds. There are several possible explanations: accuracy may genuinely be
independent of speed; the reported accuracy values may themselves be
unreliable; or the sampling interval, rather than the per-reading accuracy,
may be what varies with speed.

To test this last hypothesis, I also plot sampling interval versus speed:

Nothing conclusive there.

Conclusion #

The problem appears to be sampling frequency. To reduce battery usage, Google
Latitude polls about once every two minutes.
While it has some mechanism for
polling more often in periods of high activity, it's unclear how that works.

Reliance on fixes from cell towers and WiFi may be reducing location quality
in more remote areas. Testing this hypothesis is difficult: how do you
quantify "remote"? One possibility is to compute nearest-neighbor distance
against a database of cities. Another confounding factor is the
reliability of those accuracy values. Improving upon that would likely
involve manual labelling.
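Given haversineDistance from above, the nearest-neighbor computation itself is simple; the hard part is assembling a good city database. A sketch, where cities is assumed to be a list of Locations built from some external gazetteer:

def remoteness(reading, cities):
    """
    Distance (in meters) from a reading to the nearest city center.
    Larger values suggest more remote readings.
    """
    return min(haversineDistance(reading, city) for city in cities)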

Why Do This? #

Accuracy is not binary.

In Quantified Self applications, we use personal data to drive changes in our
lives.
We put a lot of trust in the accuracy and relevance of that data, and
we extend that trust to the tools and services that collect it.
We trust Fitbit to track our fitness.
We trust Zeo to improve our sleep.
We trust Lumosity to train our perception and attentiveness.

In giving so much trust to these tools, we sometimes forget that data are not
infallible.

Physics guarantees that there is no such thing as perfect data. All
data contain error.
As a system consisting of satellites in medium Earth orbit that
travel at relativistically significant speeds and beam data
through our multilayered atmosphere to tiny chip radios sandwiched between
layers of dense circuitry, GPS is understandably error-prone.
When your chosen tools and services add noise on top of that, it's reasonable
to ask:

How much trust should I place in the output?