Michal Migurski's notebook, listening post, and soapbox.

Aug 3, 2005 4:57pm

announcing vox delicii

I switched the data source for the News project from Google News to Del.icio.us Popular, and called the resulting piece Vox Delicii.

Here's why:

Since August 2004, Google's algorithm for choosing which news items make the "In The News" short list has become much more opaque, and a little less meaningful. Before that date, I believe it was based on objective popularity: George Bush was always in the news, and major events such as the Richard Clarke / Condi Rice hearings in Spring 2004 were very well represented. About a year ago, Google switched to an algorithm that appears to be based on the first derivative of popularity. Major newsmakers such as George Bush no longer showed up, while lots of "flash in the pan" names did. It was a sensible decision on Google's part, but hell on anyone trying to do a visual analysis. The heat map became significantly more chaotic, and patterns became harder to see. In effect, the map above shows change over time, and feeding it information that already represents change over time gives you a map of acceleration, which is harder to comprehend quickly and less interesting to watch.
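To make the level-versus-derivative distinction concrete, here is a minimal sketch. The mention counts are invented for illustration, and the function names are hypothetical. Nothing here reflects how Google actually ranks names, which remains unknown. It only shows how ranking by raw count favors a steady newsmaker while ranking by day-over-day change favors a spike, and how differencing a second time yields the "acceleration" signal described above.

```python
# Hypothetical daily mention counts, invented for illustration only.
mentions = {
    "steady newsmaker": [100, 102, 101, 103, 102],
    "flash in the pan": [0, 5, 40, 10, 0],
}

def diff(series):
    """Discrete first difference: change between consecutive samples."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def top_by_level(data, day):
    """Name with the highest raw count on the given day."""
    return max(data, key=lambda name: data[name][day])

def top_by_change(data, day):
    """Name whose count rose the most since the previous day (day >= 1)."""
    return max(data, key=lambda name: data[name][day] - data[name][day - 1])

# Compare the two rankings on the day of the spike (index 2).
level_winner = top_by_level(mentions, 2)
change_winner = top_by_change(mentions, 2)

# A display that plots change over time, fed a signal that is already a
# rate of change, ends up showing the second difference.
acceleration = {name: diff(diff(counts)) for name, counts in mentions.items()}
```

The second difference swings in sign from day to day even for the steady series, which is one plausible reason a heat map built on it looks chaotic.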

Google's process for determining what constitutes a "proper noun" is similarly opaque. When Sun and Microsoft settled their long-standing legal disputes, Sun Microsystems appeared here while Microsoft did not. I don't know why - maybe Google is only interested in pairs of capitalized words surrounded by non-capitalized words.
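The guess above can be written down as a small sketch. This is purely an illustration of the speculated heuristic, not Google's actual method; the function name and the sample sentence are made up. It keeps only runs of exactly two capitalized words, which would explain why a two-word name gets picked up while a one-word name does not.

```python
import re
from itertools import groupby

def capitalized_pairs(text):
    """Return runs of exactly two capitalized words.

    Simplification: punctuation is ignored, so a run can span a sentence
    boundary, and sentence-initial words count as capitalized.
    """
    words = re.findall(r"[A-Za-z]+", text)
    pairs = []
    # groupby splits the word list into alternating runs of
    # capitalized and non-capitalized words.
    for is_capitalized, run in groupby(words, key=lambda w: w[0].isupper()):
        run = list(run)
        if is_capitalized and len(run) == 2:
            pairs.append(" ".join(run))
    return pairs

# Made-up sample sentence, lowercase at the start to avoid the
# sentence-initial capital.
sample = "today Sun Microsystems and Microsoft settled their dispute"
found = capitalized_pairs(sample)
```

Under this rule, "Microsoft" alone is a run of length one and is dropped, matching the behavior observed above.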

Meanwhile, Del.icio.us Popular is completely transparent. A quick glance at the list of popular items shows that they are ordered by number of recent posts. A little digging suggests that this number is probably the count of posts from the last 24 hours, so right away there's an objective method for understanding the source of the data.
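A ranking of this kind is simple enough to sketch in a few lines. The 24-hour window is the guess stated above, and the URLs, timestamps, and function name below are hypothetical. This is not Del.icio.us code, only an illustration of counting posts per URL inside a trailing window.

```python
from collections import Counter
from datetime import datetime, timedelta

def popular(posts, now, window=timedelta(hours=24)):
    """Rank URLs by how many times they were posted within the window.

    `posts` is an iterable of (url, posted_at) pairs. The 24-hour default
    is an assumption taken from the text, not a documented value.
    """
    cutoff = now - window
    counts = Counter(
        url for url, posted_at in posts if cutoff < posted_at <= now
    )
    return counts.most_common()

# Hypothetical sample data.
now = datetime(2005, 8, 3, 17, 0)
posts = [
    ("http://example.com/a", now - timedelta(hours=1)),
    ("http://example.com/a", now - timedelta(hours=5)),
    ("http://example.com/b", now - timedelta(hours=2)),
    ("http://example.com/b", now - timedelta(hours=30)),  # too old, ignored
    ("http://example.com/c", now - timedelta(hours=48)),  # too old, ignored
]
ranking = popular(posts, now)
```

Because the input is a raw count over a fixed window, the result is a level rather than a rate of change, which is what makes it easier to chart over time.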

In some ways, the Del.icio.us data is also more interesting for what it represents. The News information was based on newsroom memes, and strongly influenced by the Associated Press and the general tendency of news sources to reprint each other's stories. Thus, it wasn't really graphing the mindshare of information deemed interesting by the general public, but rather by professional journalists and their employers and stockholders. Del.icio.us popularity, by contrast, is a significantly more bottom-up affair, tracking the oddball tastes of the geek set as it flocks to stories linked from Slashdot or BoingBoing. This is real, honest-to-goodness attention data, and it should be fun to watch and analyze as the set grows.

