Calculus is Impossible on Rainy Days

Monday, November 24, 2014 Lauren Nicolaisen 12 Comments

And other spurious correlations...

One of the true joys of being a data scientist is digging into a new data set -- exploring a new field, figuring out how different things interact and discovering correlations. Each field has its own unique quirks -- different factors that end up having enormous influence on what you see in the data. And there’s one particularly enjoyable way to learn about these quirks: making the most absurd conclusions you possibly can.

Today, we’ll forget for a moment that correlation doesn’t imply causation, and discover some of the most baffling things that affect how difficult math is.

Disclaimer: None of the things I’m about to say are truly causal -- all of these statements are merely a result of confounding factors and spurious correlations -- studying math on rainy days is excellent for you, I promise.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


We all know that rainy and cold days feel dreary, dark, and more frustrating. But did you know that math is actually more difficult the colder it gets? Yep:

Slide1.png


If you look at accuracy across all math problems on Khan Academy, you’ll see that accuracy is almost 5% lower on the coldest days than the warmest days. This is a mind-bogglingly huge effect. Why does it happen? Is math really more difficult when it’s cold?

Of course not. What we’re really seeing is that seasonality has a huge effect on who is doing math problems. If we look at accuracy throughout the year, we see:

Slide5.png

The reason for these huge shifts is that there are many different motivations for using Khan Academy. Some folks use it for their own enrichment, enthusiastic about learning new things and reviewing things they have learned in the past; these users are likely to stay active throughout the entire year, including the summer and the holidays. A less motivated user, however, may be less inclined to stay active when they’re not currently in school. So the mix of users shifts with the school calendar, and average accuracy shifts with it.
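To see how a shifting mix of users can move the overall average all by itself, here’s a minimal sketch. Every number in it is made up for illustration (none of it is Khan Academy data), and the two group names are hypothetical. Each group’s accuracy is held fixed all year; only the share of problems coming from each group changes.

```python
# Hypothetical numbers, chosen only for illustration (not Khan Academy data).
# Each group's accuracy is identical in every season; only the mix changes.
GROUP_ACCURACY = {"self_motivated": 0.80, "school_assigned": 0.65}

# Assumed share of answered problems coming from each group, by season.
SEASON_MIX = {
    "winter (school in session)": {"self_motivated": 0.30, "school_assigned": 0.70},
    "summer (school out)": {"self_motivated": 0.60, "school_assigned": 0.40},
}


def overall_accuracy(mix):
    """Weighted average of per-group accuracy, weighted by each group's share."""
    return sum(share * GROUP_ACCURACY[group] for group, share in mix.items())


for season, mix in SEASON_MIX.items():
    print(f"{season}: overall accuracy {overall_accuracy(mix):.3f}")
```

With these assumed shares, the overall number moves by several percentage points between seasons even though nobody got better or worse at math.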

Here’s another fun fact: did you know that people are noticeably more accurate during football games? Afternoons during which there is a nationally-televised NFL game have an almost 1.5% higher accuracy rate:

Slide4.png
Fun Fact: If you zoom in far enough, all two-bar plots look extremely impressive.

Of course, as before, this is just because afternoon NFL games are all on Sunday (or Saturday in January!), and accuracy is far higher on the weekends than on weekdays:

Slide2.png
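One way to check a claim like this is to compare like with like: look at game vs. no-game accuracy within weekends only. The sketch below uses made-up aggregate counts (not real data), and the row layout and helper names are hypothetical. It shows a football "effect" in the naive comparison that vanishes once weekdays are taken out of the picture.

```python
# Hypothetical aggregated counts: (day_type, nfl_game_on, correct, attempted).
# All numbers are invented for illustration only.
ROWS = [
    ("weekday", False, 6800, 10000),
    ("weekend", False, 1400, 2000),
    ("weekend", True, 2100, 3000),
]


def accuracy(rows):
    """Pooled accuracy: total correct answers divided by total attempts."""
    correct = sum(row[2] for row in rows)
    attempted = sum(row[3] for row in rows)
    return correct / attempted


def with_game(rows, game_on):
    """Keep only the rows whose nfl_game_on flag matches game_on."""
    return [row for row in rows if row[1] == game_on]


# Naive comparison: game afternoons vs. everything else, weekdays included.
naive_gap = accuracy(with_game(ROWS, True)) - accuracy(with_game(ROWS, False))

# Stratified comparison: the same question, restricted to weekends.
weekend_rows = [row for row in ROWS if row[0] == "weekend"]
weekend_gap = (
    accuracy(with_game(weekend_rows, True))
    - accuracy(with_game(weekend_rows, False))
)

print(f"naive gap:        {naive_gap:+.4f}")
print(f"weekend-only gap: {weekend_gap:+.4f}")
```

In this toy data the naive gap is positive only because the no-game bucket is dominated by weekdays; within weekends, the game makes no difference at all.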

Similarly, users are more accurate during baseball games than basketball games (summer vs. winter), ice cream is absolutely awesome for your math abilities, ice skating is disastrous, and holidays are fantastic.

This ends up having significant implications for data science -- it’s very easy to reach highly misleading conclusions whenever you do anything that involves time. A new feature that affects more engaged and less engaged users differently can produce wildly different results depending on the time of day, time of week, or even time of year that you launch it.

This might be obvious in any field when you launch something around the holidays or late at night, but for education in particular, the timing of back-to-school and school breaks is hugely important.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Quick question: What age group do you think is the most accurate on Khan Academy? The answer is 97-year-olds. In fact, 97-year-olds tend to answer over 85% of questions correctly, which is vastly higher than the average accuracy.

Why is this? It’s the same reason that the ‘best’ and ‘worst’ states in the U.S. are also the smallest ones -- smaller sample sizes have far higher variance. Only 17 users claim to be 97 years old, while younger ages typically have hundreds of thousands of users each. Thus, while younger ages tend to be very close to the overall average, higher ages can vary wildly. Incidentally, the least accurate users are 99-year-olds.
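For a rough sense of scale, here’s a back-of-the-envelope sketch using the standard error of a sample proportion, sqrt(p(1 - p) / n). It rests on loud assumptions: the 70% baseline accuracy is hypothetical, 200,000 stands in for "hundreds of thousands", and each user is treated as a single independent yes/no observation, which is a simplification of how per-age accuracy would really be computed.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion from n independent yes/no observations."""
    return math.sqrt(p * (1 - p) / n)


P = 0.70  # hypothetical overall accuracy, not a figure from the post

# 17 is the user count quoted above; 200_000 is an assumed stand-in
# for "hundreds of thousands".
for label, n in [("age 97", 17), ("a typical younger age", 200_000)]:
    print(f"{label}: n = {n}, standard error ~ {standard_error(P, n):.4f}")
```

Under these assumptions the 17-user group’s average can easily land ten or more percentage points away from the true value, while the large group’s average barely moves.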

Another fun question: Which city has the highest mission completion rate in the world? You’re probably thinking this is another sample size trick, so let’s change it up slightly and ask: of cities with at least 100 purported users, which city has the highest mission completion rate?  

That would be Antarctic Great Wall Station, Antarctica. The average user from Antarctica has completed a staggering 2.3 entire missions.

BlogPost1.jpg

What’s causing this? Well, we’re all liars. When you select a city on Khan Academy, you choose from a dropdown menu of real cities -- so if you want to pick something ‘fun’, your options are somewhat limited. Antarctica is a pretty great choice.

In fact, 132 users claim to be from Antarctic Great Wall Station, Antarctica, which is pretty interesting when you consider that the fount of all true knowledge, Wikipedia, claims that the summer population is only 40 (winter: 14).

Users who choose this location also happen to be far more engaged, and far more accurate, than the average user. Other cities come pretty close: Nowhere Else, Tasmania, Australia is strangely popular too. In fact, since selecting a city is purely optional (and requires deliberately editing your profile), merely choosing one at all makes you far more accurate.

In conclusion,

  • Calculus is impossible on rainy days.
  • Watching football makes you far more accurate.
  • Antarcticans are math experts.
  • 97-year-olds are excellent at math, 99-year-olds not as much.

Have any good spurious correlations you’d like to share, or curious about this data and how it was collected? Leave a comment below! For more about me, check out my personal blog: laurenatphysics.com

12 comments:

  1. You're my hero, Lauren! Correlation ≠ causation :)

  2. Fun article! I look forward to more!

  3. Hey, I enjoyed a lot this blog. I hope someday become a Data Scientist, it's absolutely amazing the mysterious work you do guys. Please keep submitting stuff amazing like this.
    I wonder how many people has complete all skills on Khan, do you know this? I'll be one more soon! :D

  4. Hey, I enjoyed a lot this entry. I hope someday be a Data Scientist, it's absolutely amazing the mysterious work you do guys.
    Please keep submitting stuff amazing like this. I wonder how many people have complete all skills on Khan, do you know this? I'll be one more soon! :D

  5. On a related topic of spurious correlations, this website : http://www.tylervigen.com/

  6. Great article! I loved it. It is so interesting how making the wrong correlation can lead to really wrong conclusions.
    -Vera
    http://theflashwindow.weebly.com/

  7. In fact, in addition to "negative" knowledge, such statistics may provide some positive results.
    From these data we may conclude, that people who want to be more original (such as coming up that they live in Antarctica) better solve mathematical problems.

  8. Creative article! I have a slight grammar nitpick: 97-year-olds, etc., should be hyphenated.
