tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Aug 3, 2005 12:57pm

announcing vox delicii

I switched the data source for the News project from Google News to Del.icio.us Popular, and called the resulting piece Vox Delicii.

Here's why:

Since August 2004, Google's algorithm for choosing items for the "In The News" short list has become much more opaque, and a little less meaningful. Before that date, I believe it was based on objective popularity: George Bush was always in the news, and major events such as the Richard Clarke / Condi Rice hearings in spring 2004 were very well represented. About a year ago, Google switched to an algorithm that appears to be based on the first derivative of popularity: major newsmakers such as George Bush stopped showing up, while lots of "flash in the pan" names did. It was a sensible decision on Google's part, but hell on anyone trying to do visual analysis. The heat map became significantly more chaotic, and patterns became harder to see. The map above already shows change over time, so feeding it data that itself measures change over time produces a map of acceleration, which is harder to grasp quickly and less interesting to watch.
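
A rough way to see the acceleration problem, assuming a simple model in which each name has a daily mention count and the heat map plots the period-to-period change of whatever the source reports. Google's actual algorithm isn't public; the functions `diff`, `source_signal`, and `heat_map_signal` are hypothetical illustrations of the two ranking styles described above, not Google's method:

```python
from typing import List


def diff(series: List[int]) -> List[int]:
    """Discrete first difference: change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def source_signal(mentions: List[int], derivative_based: bool) -> List[int]:
    """What the news source ranks by, under an assumed model.

    A popularity-based source reports raw mention counts; a
    derivative-based source reports how much those counts changed.
    """
    return diff(mentions) if derivative_based else list(mentions)


def heat_map_signal(mentions: List[int], derivative_based: bool) -> List[int]:
    """The heat map plots change over time of whatever the source reports."""
    return diff(source_signal(mentions, derivative_based))


# Constructed case: a perennial newsmaker mentioned equally often every day.
steady = [100, 100, 100, 100]
print(source_signal(steady, derivative_based=False))  # [100, 100, 100, 100]
print(source_signal(steady, derivative_based=True))   # [0, 0, 0]

# Constructed case: a name whose coverage keeps growing.
growing = [0, 10, 30, 60]
print(heat_map_signal(growing, derivative_based=False))  # [10, 20, 30] (velocity)
print(heat_map_signal(growing, derivative_based=True))   # [10, 10] (acceleration)
```

Under this model a steady newsmaker reports zero change and drops off a derivative-based list, and the heat map built on derivative data ends up showing a second difference rather than a first.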

Google's process for deciding what counts as a "proper noun" is similarly opaque. When Sun and Microsoft settled their long-standing legal disputes, Sun Microsystems appeared here while Microsoft did not. I don't know why; maybe Google only picks up pairs of capitalized words surrounded by non-capitalized words.
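
To make that guess concrete, here is a minimal sketch of the hypothesized rule: keep only runs of exactly two capitalized words. This is purely an illustration of the speculation above; `capitalized_pairs` is a hypothetical helper, and nothing here is known about how Google actually extracts names:

```python
import re
from typing import List

# Hypothetical heuristic: a guess at the behavior, not Google's documented method.
WORD = re.compile(r"[A-Za-z]+")


def capitalized_pairs(text: str) -> List[str]:
    """Return runs of exactly two capitalized words, ignoring punctuation."""
    words = WORD.findall(text)
    found = []
    i = 0
    while i < len(words):
        if not words[i][0].isupper():
            i += 1
            continue
        # Advance j to the end of this run of capitalized words.
        j = i
        while j < len(words) and words[j][0].isupper():
            j += 1
        # Keep the run only if it is a pair bounded by non-capitalized words
        # (or by the start/end of the text).
        if j - i == 2:
            found.append(" ".join(words[i:j]))
        i = j
    return found


print(capitalized_pairs("Sun Microsystems and Microsoft settled their legal disputes."))
# ['Sun Microsystems']
```

A lone "Microsoft" is a run of one, so this rule skips it, which would match what showed up on the map. As a simplification, the regex drops punctuation, so capitalized words on either side of a sentence break get merged into a single run.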

Del.icio.us Popular, by contrast, is completely transparent. A quick glance at the list of popular items shows that they are ordered by number of recent posts. A little digging suggests that this number is probably the count of posts in the last 24 hours, so right away there's an objective way to understand where the data comes from.
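
A sketch of that kind of ranking, assuming the list is simply a count of posts per URL within a trailing 24-hour window (the window length is inferred above, not confirmed). The `popular` function, the `(url, timestamp)` post format, and the example.com URLs are hypothetical placeholders, not the Del.icio.us API:

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def popular(posts: Iterable[Tuple[str, datetime]], now: datetime,
            window: timedelta = timedelta(hours=24),
            limit: int = 10) -> List[Tuple[str, int]]:
    """Rank URLs by how many times they were posted within the trailing window."""
    cutoff = now - window
    # Count only posts that fall inside (cutoff, now].
    counts = Counter(url for url, posted_at in posts if cutoff < posted_at <= now)
    return counts.most_common(limit)


now = datetime(2005, 8, 3, 12, 0)
posts = [
    ("http://example.com/a", now - timedelta(hours=1)),
    ("http://example.com/a", now - timedelta(hours=5)),
    ("http://example.com/b", now - timedelta(hours=2)),
    ("http://example.com/b", now - timedelta(hours=30)),  # outside the window
]
print(popular(posts, now))
# [('http://example.com/a', 2), ('http://example.com/b', 1)]
```

Because the ranking is just a count over a fixed window, anyone can reason about why an item is on the list, which is the transparency that the Google data lacked.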

In some ways, the Del.icio.us data is also more interesting for what it represents. The News data was based on newsroom memes, strongly influenced by the Associated Press and by the general tendency of news sources to reprint each other's stories. So it wasn't really graphing what the general public found interesting, but what professional journalists and their employers and stockholders did. Del.icio.us popularity, meanwhile, is a much more bottom-up affair, tracking the oddball tastes of the geek set as it flocks to stories linked from Slashdot or BoingBoing. This is real, honest-to-goodness attention data, and it should be fun to watch and analyze as the set grows.
