tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Apr 13, 2008 2:24am

index supercuts

Andy has a collection of fanboy supercuts, a "genre of video meme, where some obsessive-compulsive superfan collects every phrase/action/cliche from an episode (or entire series) of their favorite show/film/game into a single massive video montage." His collection includes some of the excellent and bizarre Lovelines isolation studies by Chuck Jones.

I'm reminded of how these constitute a kind of search index, a concept first introduced to me 11 years ago via Brian Slesinsky's Webmonkey article, Roll Your Own Search Engine. That was the first of many demystifications of big, web-scale technology for me. The thread running through all these fan cuts is the inverted index, the same concept introduced in that ancient article. An inverted index maps elements such as words to their source locations in a data corpus. Each of the pieces Andy links to is a kind of inverted index, pointing to locations of obscenities, audible inhalations, Wilhelm screams, and so on.
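For the curious, here's roughly what that looks like in code. This is a minimal sketch in Python, not anything taken from Slesinsky's article: the `build_inverted_index` function and the two transcripts are made up for illustration, and a real search engine would add stemming, stop words, and some kind of ranking.

```python
import re
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each lowercased word to a list of (doc_id, position) pairs.

    `corpus` is a dict of document id -> text. Positions count words
    from zero within each document.
    """
    index = defaultdict(list)
    for doc_id, text in corpus.items():
        words = re.findall(r"[a-z']+", text.lower())
        for position, word in enumerate(words):
            index[word].append((doc_id, position))
    return dict(index)


# Hypothetical transcripts, standing in for a season of some show.
transcripts = {
    "episode-1": "Dude, where is the car? Dude!",
    "episode-2": "The car is gone, dude.",
}

index = build_inverted_index(transcripts)

# Every "dude" in the corpus, by episode and word offset.
print(index["dude"])
```

Swap word offsets for timestamps and you have the edit list for a supercut: look up one term, get back every place it occurs.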

The other thing it reminded me of was Simon Winchester's excellent book, The Professor And The Madman, an account of W.C. Minor's assistance in constructing the first edition of the Oxford English Dictionary. Minor was a confined lunatic with an extensive personal library, and the OED required that every sense of a word in its definition be traceable to an original, printed quotation. These were crowd-sourced from literate Englishmen of the time, but Minor's contribution went above and beyond because he noted interesting words as he read, constructing an inverted index of his library for OED-worthy terms. When dictionary editor James Murray needed a quotation for a particular word, there was a good chance Minor had already encountered and indexed it.

The works pointed to by Andy's blog post (and additions in the comments) are a special form of indexing, made possible by cheap communication and digital media. Let's hope the RIAA/MPAA don't fuck everything up for an emerging form of media consumption.
