Methodology

At the core of We Feel Fine is a data collection engine that automatically scours the Internet every ten minutes, harvesting human feelings from a large number of blogs. Blog data comes from a variety of online sources, including LiveJournal, MSN Spaces, MySpace, Blogger, Flickr, Technorati, Feedster, Ice Rocket, and Google.

We Feel Fine scans blog posts for occurrences of the phrases "I feel" and "I am feeling". This approach was inspired by techniques used in Listening Post, a wonderful project by Ben Rubin and Mark Hansen.

Once a sentence containing "I feel" or "I am feeling" is found, the system looks backward to the beginning of the sentence, and forward to the end of the sentence, and then saves the full sentence in a database.
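
The backward and forward scan described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name is hypothetical, and it assumes a sentence ends at a period, exclamation mark, or question mark, which the text does not specify.

```python
import re

# The two trigger phrases named in the text.
TRIGGER = re.compile(r"\bI (?:feel|am feeling)\b", re.IGNORECASE)
# Assumption: these characters mark the end of a sentence.
SENTENCE_END = ".!?"

def extract_feeling_sentences(post_text):
    """Return each full sentence that contains a trigger phrase."""
    sentences = []
    last_end = -1
    for match in TRIGGER.finditer(post_text):
        if match.start() < last_end:
            continue  # trigger lies inside a sentence that was already saved
        # Look backward to the nearest sentence terminator (or start of text).
        start = max(post_text.rfind(ch, 0, match.start()) for ch in SENTENCE_END) + 1
        # Look forward to the nearest sentence terminator (or end of text).
        ends = [i for i in (post_text.find(ch, match.end()) for ch in SENTENCE_END) if i != -1]
        end = min(ends) + 1 if ends else len(post_text)
        sentences.append(post_text[start:end].strip())
        last_end = end
    return sentences
```

A sentence with two trigger phrases is saved once, since the second trigger falls inside the span already captured.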

Once saved, the sentence is scanned to see if it includes one of about 5,000 pre-identified "feelings". This list of valid feelings was constructed by hand and consists mostly of adjectives, along with some adverbs. The full list of valid feelings, along with the total count of each feeling, and the color assigned to each feeling, is here.

If a valid feeling is found, the sentence is said to represent one person who feels that way.
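
As a rough sketch of the matching step, the saved sentence can be split into words and checked against the feelings list. The five-word set below is a stand-in for the real list of about 5,000 entries, and returning the first match is an assumption, since the text does not say what happens when a sentence contains several valid feelings.

```python
import re

# Hypothetical subset; the real list has about 5,000 hand-picked entries.
VALID_FEELINGS = {"happy", "sad", "better", "cold", "magnificent"}

def find_feeling(sentence):
    """Return the first valid feeling word in the sentence, or None."""
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in VALID_FEELINGS:
            return word
    return None
```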

If an image is found in the post, the image is saved along with the sentence, and the image is said to represent one person who feels the feeling expressed in the sentence.

Because a high percentage of all blogs are hosted by one of several large blogging companies (Blogger, MySpace, MSN Spaces, LiveJournal, etc.), the URL format of many blog posts can be used to extract the username of the post's author. Given the author's username, we can automatically traverse the given blogging site to find that user's profile page. From the profile page, we can often extract the age, gender, country, state, and city of the blog's owner. Given the country, state, and city, we can then retrieve the local weather conditions for that city at the time the post was written. We extract and save as much of this information as we can, along with the post.
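
To illustrate the first step, extracting a username from a post URL, here is a minimal sketch. It assumes hosts where the blog name appears as a subdomain; the two host suffixes and the function name are illustrative assumptions, and each real service would need its own rule.

```python
from urllib.parse import urlparse

# Assumed host suffixes where the username is the subdomain.
SUBDOMAIN_HOSTS = (".livejournal.com", ".blogspot.com")

def extract_username(post_url):
    """Return the username encoded in the URL's host, or None if unknown."""
    host = (urlparse(post_url).hostname or "").lower()
    for suffix in SUBDOMAIN_HOSTS:
        if host.endswith(suffix):
            name = host[: -len(suffix)]
            if name and name != "www":
                return name
    return None
```

URLs that match no known pattern return None, which fits the text's note that only as much information as possible is saved.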

This process is repeated automatically every ten minutes, generally identifying and saving between 15,000 and 20,000 feelings per day.

We Feel Fine's data is stored in a database, and can be queried in any number of ways by people using the We Feel Fine applet.

When the applet is first opened, the initial dataset consists of the most recent 1,500 feelings collected by our system. The applet's panel can then be used to arbitrarily specify different populations, constrained by any combination of:

Feeling (happy, sad, depressed, etc.)
Age (in ten-year increments: 20s, 30s, etc.)
Gender (male or female)
Weather (sunny, cloudy, rainy, or snowy)
Location (country, state, and/or city)
Date (year, month, and/or day)
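
Selecting a population from the constraints listed above amounts to filtering records on whichever fields were specified. The sketch below is an assumption about how that could look, not the applet's actual query code: the record class and its field names are hypothetical, and the date constraint is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feeling:
    # Hypothetical record layout; any field may be unknown (None).
    feeling: str
    age: Optional[int] = None
    gender: Optional[str] = None
    weather: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None

def decade(age):
    """Map an age to its ten-year bucket, e.g. 24 -> 20."""
    return None if age is None else (age // 10) * 10

def select_population(records, feeling=None, age_decade=None, gender=None,
                      weather=None, country=None, city=None):
    """Keep records matching every constraint that is not None."""
    wanted = {"feeling": feeling, "gender": gender, "weather": weather,
              "country": country, "city": city}
    selected = []
    for record in records:
        if age_decade is not None and decade(record.age) != age_decade:
            continue
        if all(v is None or getattr(record, k) == v for k, v in wanted.items()):
            selected.append(record)
    return selected
```

A record with an unknown value for a constrained field is excluded, so each added constraint can only shrink the population.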

Obviously, the more specific the population, the fewer feelings it will contain, and the less significant any associated statistical computations will be. For example, asking for feelings from "20-year-old males in Baghdad, Iraq when it's rainy" might yield few or no feelings, whereas asking for feelings from "20-year-olds in New York City" would yield a larger number of feelings.

For any given population, the applet presents a number of different statistical views, offering insights into the traits of the specified population.

The "Mobs" movement of the piece shows distribution breakdowns of the chosen population along: feeling, gender, age, weather, and location. Mobs expresses the notion of "Most Common".
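
A "Most Common" breakdown along one trait is essentially a frequency count. As a minimal sketch, assuming each feeling is a dictionary keyed by trait name (a hypothetical layout), it could be computed like this:

```python
from collections import Counter

def breakdown(records, trait):
    """Return (value, count, share) tuples for one trait, most common first.

    Records with no value for the trait are skipped.
    """
    counts = Counter(r[trait] for r in records if r.get(trait) is not None)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]
```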

The "Metrics" movement of the piece shows the most representative traits of the chosen population along: feeling, gender, age, weather, and location. Metrics expresses the notion of "Most Salient".

"Most Common" is different from "Most Salient" in the following way:

"Most Common" will be more or less the same across different populations. For example, "better" is the most common feeling overall, so in most populations, "better" will be the most common feeling.
"Most Salient" expresses the ways in which a given population differs from the global average. For example, if most people feel "cold" .02% of the time, but Canadians feel "cold" 1.2% of the time, we claim "cold" to be especially salient among Canadians, because "cold" occurs among Canadians at 60 times the normal rate.

In making our salience computations, we are careful to avoid falsely claiming statistical significance. Salience computations count each individual blogger once and only once. For example, if there is one blogger in North Dakota who feels "magnificent" over and over again, it would be misleading to conclude that North Dakota as a state feels particularly "magnificent" just because of a single prolific blogger. So our magnificent North Dakotan would only be counted once. Similarly, we impose a threshold of at least four occurrences for a given trait to be considered salient. For example, in a population of 100 feelings, say a very obscure feeling like "downtrodden" occurs twice, representing 2% of the total feelings in that population, and say "downtrodden" usually occurs only .0003% of the time. It would be misleading to claim that this population feels particularly "downtrodden" just because two people out of 100 happened to feel that way, so the four-occurrence minimum excludes it.
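
The safeguards above can be combined with the rate comparison in a short sketch. This is an illustration under stated assumptions rather than the project's real computation: rates are taken per distinct blogger, the four-occurrence threshold is applied after duplicates are collapsed, and all function and parameter names are hypothetical.

```python
from collections import defaultdict

MIN_OCCURRENCES = 4  # threshold stated in the text

def count_distinct_bloggers(records):
    """Map each feeling to the number of distinct bloggers who expressed it.

    records: list of (blogger_id, feeling) pairs.
    """
    bloggers = defaultdict(set)
    for blogger_id, feeling in records:
        bloggers[feeling].add(blogger_id)  # repeat posts collapse into one entry
    return {feeling: len(ids) for feeling, ids in bloggers.items()}

def salience(population, global_counts, global_total):
    """Return {feeling: population rate divided by global rate}.

    population: list of (blogger_id, feeling) pairs for the chosen population.
    global_counts: {feeling: count across all data}; global_total: overall total.
    """
    pop_counts = count_distinct_bloggers(population)
    pop_total = len({blogger_id for blogger_id, _ in population})
    result = {}
    for feeling, n in pop_counts.items():
        g = global_counts.get(feeling, 0)
        if n < MIN_OCCURRENCES or g == 0:
            continue  # too few occurrences, or no global baseline to compare with
        # (n / pop_total) / (g / global_total), rearranged into one division
        result[feeling] = (n * global_total) / (g * pop_total)
    return result
```

With this arrangement, one blogger who posts "magnificent" fifty times contributes a count of one and never reaches the threshold, while a feeling shared by four or more distinct bloggers is scored by how far its rate exceeds the global rate.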

Furthermore, whenever possible the applet makes clear exactly what data, and how much, was used in making any salience claims, so viewers can discern for themselves how statistically significant the findings are.

The "Mounds" movement of the piece displays every valid feeling in our system, ordered and scaled to represent each feeling's frequency. This list is independent of the selected population, and is updated periodically as our database grows.

We Feel Fine only collects and displays data that was already posted publicly on the World Wide Web. We Feel Fine never associates individual human names with the feelings it displays, though it always provides a link to the blog from which any displayed sentence or picture was collected. Also, bloggers may make a blog post invisible to the We Feel Fine crawler by including the following code somewhere in the post: <script>nofeelings</script>.
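
A crawler could honor this opt-out with a simple check before saving anything from a post. The sketch below assumes an exact, case-sensitive match on the tag; whether the real crawler tolerates variations in case or whitespace is not stated.

```python
# The opt-out marker given in the text.
OPT_OUT_TAG = "<script>nofeelings</script>"

def is_opted_out(post_html):
    """Return True if the post contains the opt-out marker."""
    return OPT_OUT_TAG in post_html
```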

The top 200 feelings were manually assigned colors that loosely correspond to the tone of the feeling. Happy positive feelings are bright yellow. Sad negative feelings are dark blue. Angry feelings are red. Calm feelings are green. And so on. A full list of all valid feelings, along with their counts and colors, is here.

There is no human involvement in We Feel Fine. The system runs autonomously, collecting and presenting data about human feelings.

We Feel Fine's data collection engine uses custom software written by Jonathan Harris and Sep Kamvar, using Java, Perl, MySQL and Apache. The applet was created using the excellent Processing software, by Ben Fry and Casey Reas. PHP is used for various housekeeping tasks on the server.

For the time being, We Feel Fine is closed source. However, the data is freely available through our public API.

We'd like to thank Ian Spiro, whose past and continuing contributions to the project have been invaluable, and Chris Miller, whose systems support and advice have allowed us to keep our stress level at a minimum.

We Feel Fine is an independent project conceived and created by Jonathan Harris and Sep Kamvar. It bears no affiliation to any company or organization.