Evan Savage

Track Your Happiness: An Adventure In Data Extraction

In this post, I go over my first report from
Track Your Happiness, a tool that uses
the Experience Sampling Method for mood

My Report #

Charts #

My happiness is relatively constant across days of the week.


I'm happiest at the gym or in parks, with vacations and restaurants close
behind. "At Home" is mid-pack, with "At Work" near the bottom.


Fun, exercise, and food generate the most happiness. Passive actions such as
watching TV, commuting, and waiting rank much lower. Work is least

What are you doing?

Whether I want to perform a task is a much stronger determinant of happiness
than whether I have to:

Want to / Have to

I'm happier when outside.


I'm happier when alone.


Given that, it seems counterintuitive that I'm happier when interacting with
multiple people.

Number of people interacting with

I was also surprised by this one: I'm happiest when talking with acquaintances
or friends, and least happy when talking with family.

Who talking with

What Does It Mean? #

Even without considering the specific results, I have a few unanswered

There's also the issue of those surprising findings. Am I really less happy
when talking with Valkyrie Savage? To me, the most likely explanation is trust:
around her, I feel free to discuss negative aspects of my life. Doing so would
necessarily involve fixating on those aspects, which could account for some
happiness reduction.

During this period, I was confronting doubt and frustration in my job.
According to my personal data, I was also drinking heavily, possibly as a means
of coping with that negative emotion. (It doesn't help.) Guilt is a potential
factor; perhaps I felt that I was always offloading that doubt and frustration
onto her.

The problem, though, is that none of these explanations are testable. They seem
reasonable to me, but from a scientific standpoint they fail a simple criterion:

Upon viewing only my data, would an impartial stranger reach similar conclusions?

I can't see how they would, since my explanations involve intricate
self-knowledge that is not represented in the data.

A Further Note On Significance #

Let's take a more critical look at this chart:

Who talking with

I mentioned that this data was drawn from a total of 50 samples. I'm assuming
that these bars represent average reported happiness in each category. But:

This leads to an important point:

Never present uncertain information as certain.
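
One common way to make that uncertainty visible is to report each bar's mean together with its standard error and sample count. The sketch below is only an illustration: the function name `mean_with_standard_error` is hypothetical, and the ratings are made up, since the report does not expose per-category samples.

```python
import math
import statistics

def mean_with_standard_error(samples):
    """Return (mean, standard error of the mean) for a list of ratings."""
    n = len(samples)
    if n < 2:
        raise ValueError('need at least two samples to estimate spread')
    mean = statistics.mean(samples)
    # Standard error shrinks with sqrt(n), so small categories stay uncertain.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem

# Hypothetical ratings for a single category; NOT taken from the report.
family = [55, 70, 40, 65, 60]
mean, sem = mean_with_standard_error(family)
print('{0:.1f} +/- {1:.1f} (n={2})'.format(mean, sem, len(family)))
```

With only a handful of samples per category, the error bars can easily be wider than the differences between bars.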

Digging Deeper #

Consider this chart:


Am I happier when I'm more focused? It's hard to tell from looking at this
chart. This is a prime use case for linear regression, but I don't have the
data! The Track Your Happiness team claims to have plans for data export, but
I haven't seen those come to fruition. What now?

Data Extraction #

Fortunately, the chart was generated using the
(now deprecated) Image Charts functionality of the
Google Charts API. With Image Charts, you make requests to specially
encoded URLs:


You can see what all those parameters do here,
but the one I really care about is chd. This encodes the chart data
in the Simple Encoding Format. I'll walk through how to decode
this data.

As it stands, the value of chd is URL-encoded.
We need to decode those %3A and %2C escape sequences.

try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2
params = parse_qs('chd=s%3AaUXKPnmomsWw0tSQnXaVrk%2CslrjjuZtXvZotualhrmepm')
chd = params['chd'][0] # 's:aUXKPnmomsWw0tSQnXaVrk,slrjjuZtXvZotualhrmepm'
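
In practice the starting point is a full chart URL rather than a bare query string. Here is a minimal Python 3 sketch of that step. Only the chd value comes from this post; the host, path, and the cht and chs parameters are placeholders for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical chart URL: everything except the chd value is a placeholder.
url = ('https://chart.example.com/chart?cht=s&chs=300x200'
       '&chd=s%3AaUXKPnmomsWw0tSQnXaVrk%2CslrjjuZtXvZotualhrmepm')

query = urlsplit(url).query   # everything after the '?'
params = parse_qs(query)      # maps each parameter name to a LIST of values
chd = params['chd'][0]        # take the first (and only) chd value
print(chd)  # s:aUXKPnmomsWw0tSQnXaVrk,slrjjuZtXvZotualhrmepm
```

Note that parse_qs returns a list per key because a query string may repeat a parameter, hence the `[0]`.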

The s: at the front means use the simple encoding. In that encoding, the
characters A-Za-z0-9 are mapped to values 0-61 in a, well, simple manner:

def _get_simple_value(c):
    if c == '_':
        return None
    if 'A' <= c <= 'Z':
        return ord(c) - ord('A')
    if 'a' <= c <= 'z':
        return 26 + ord(c) - ord('a')
    if '0' <= c <= '9':
        return 52 + ord(c) - ord('0')
    raise ValueError('invalid character for simple encoding: {0}'.format(c))

Here the underscores _ indicate missing or null values. With this function,
recovering the original data from the chd param is a quick one-liner:

data = [[_get_simple_value(c) for c in s] for s in chd[2:].split(',')]
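
As a self-contained check of the decoding step, here is a sketch that uses an alphabet lookup instead of character arithmetic. The names `SIMPLE_ALPHABET` and `decode_simple` are my own for this illustration and are not necessarily how chdecode is organized.

```python
import string

# Position in this 62-character string is the encoded value (0-61).
SIMPLE_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits

def decode_simple(chd):
    """Decode an 's:'-prefixed chd value into lists of ints (None for '_')."""
    if not chd.startswith('s:'):
        raise ValueError('not a simple-encoded chd value')
    series = []
    for s in chd[2:].split(','):
        # str.index raises ValueError on characters outside the alphabet.
        series.append([None if c == '_' else SIMPLE_ALPHABET.index(c) for c in s])
    return series

xs, ys = decode_simple('s:aUXKPnmomsWw0tSQnXaVrk,slrjjuZtXvZotualhrmepm')
print(len(xs), len(ys))  # 22 22
print(xs[:5])            # [26, 20, 23, 10, 15]
```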

By default, the simple encoding maps onto an effective range of 0-100, so the
last step is to normalize this and zip() the lists into pairs:

def fitSimpleToRange(x, xmin, xmax):
    if x is None:
        return None
    nx = x / 61.0
    return (1.0 - nx) * xmin + nx * xmax

points = list(zip(
    [fitSimpleToRange(x, 0, 100) for x in data[0]],
    [fitSimpleToRange(y, 0, 100) for y in data[1]]
))
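
Pairs that contain a missing value cannot be fed to a regression, so they need to be dropped first. The following sketch shows the scaling endpoints and that filtering step. The helper names `scale_simple` and `complete_pairs` are hypothetical, and the 0-100 range is an assumption about the chart's axes.

```python
def scale_simple(value, lo=0.0, hi=100.0):
    """Map a decoded simple-encoding value (0-61) linearly onto [lo, hi]."""
    if value is None:
        return None
    return lo + (value / 61.0) * (hi - lo)

def complete_pairs(xs, ys):
    """Pair up two series, skipping any pair with a missing coordinate."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

print(scale_simple(0), scale_simple(61))           # 0.0 100.0
print(complete_pairs([1, None, 3], [4, 5, None]))  # [(1, 4)]
```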

Done! I've packaged this up as chdecode, which also deals with the Basic Text
and Extended Encoding formats.

Let's See Those Charts Again #

You can see the code for this analysis here.

Focus, productivity, and sleep quality all have minor positive correlations
with happiness:

Focus Productivity Sleep Quality

The strongest of these is focus, but with $p = 0.0927$ it falls short of the
5% significance threshold.
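
For readers who want to reproduce this kind of fit on the extracted points, here is a minimal standard-library sketch of ordinary least squares. It is not the analysis code linked above, and the function names are my own. It stops at the t statistic: turning that into a p-value needs a t-distribution CDF, which `scipy.stats.linregress` reports directly as `pvalue` if SciPy is available.

```python
import math

def linear_fit(points):
    """Least-squares fit of y on x. Returns (slope, intercept, pearson_r)."""
    n = len(points)
    if n < 3:
        raise ValueError('need at least three points')
    mean_x = sum(x for x, _ in points) / float(n)
    mean_y = sum(y for _, y in points) / float(n)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError('x and y must both vary')
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    # Undefined for a perfect correlation (|r| == 1).
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Tiny synthetic example (not report data): points near the line y = 2x + 1.
slope, intercept, r = linear_fit([(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)])
print(slope, intercept, r, t_statistic(r, 4))
```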

Up Next #

This ends my series of posts on data collection and analysis for dealing
with panic disorder. In my next few posts, I'll talk about my plans for future