tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Apr 13, 2008 2:24am

index supercuts

Andy has a collection of fanboy supercuts, a "genre of video meme, where some obsessive-compulsive superfan collects every phrase/action/cliche from an episode (or entire series) of their favorite show/film/game into a single massive video montage." His collection includes some of the excellent and bizarre Lovelines isolation studies by Chuck Jones.

I'm reminded of how these constitute a kind of search index, a concept first introduced to me 11 years ago via Brian Slesinsky's Webmonkey article, Roll Your Own Search Engine. That was the first of many demystifications of big, web-scale technology for me. The thread running through all these fan cuts is the inverted index, the same structure that ancient article described. An inverted index maps elements such as words to the locations where they occur in a data corpus. Each of the pieces Andy links to is a kind of inverted index, pointing to the locations of obscenities, audible inhalations, Wilhelm screams, and so on.
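For anyone who hasn't seen one, here is a minimal sketch of an inverted index in Python. It is not the code from the Webmonkey article; the episode names and transcript lines are made up for illustration, and the tokenizing (lowercase, split on whitespace, trim a little punctuation) is about as naive as it gets.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each word to a list of (source, position) pairs where it occurs."""
    index = defaultdict(list)
    for source, text in corpus.items():
        for position, token in enumerate(text.lower().split()):
            word = token.strip(".,!?\"'")  # naive punctuation trimming
            if word:
                index[word].append((source, position))
    return index


# Hypothetical transcripts, standing in for a show's dialogue.
episodes = {
    "episode-1": "Previously on the show, we have to go back.",
    "episode-2": "We have to go back to the island!",
}

index = build_inverted_index(episodes)
print(index["back"])  # [('episode-1', 8), ('episode-2', 4)]
```

Swap word positions for video timestamps and each entry in the index is more or less the edit list for a supercut.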

The other thing it reminded me of was Simon Winchester's excellent book, The Professor And The Madman, an account of W.C. Minor's assistance in constructing the first edition of the Oxford English Dictionary. Minor was a confined lunatic with an extensive personal library, and the OED required that every sense of a word in its definition be traceable to an original, printed quotation. These were crowd-sourced from literate Englishmen of the time, but Minor's contribution went above and beyond because he noted interesting words as he read, constructing an inverted index of his library for OED-worthy terms. When dictionary editor James Murray needed a quotation for a particular word, there was a good chance Minor had already encountered and indexed it.

The works pointed to by Andy's blog post (and additions in the comments) are a special form of indexing, made possible by cheap communication and digital media. Let's hope the RIAA/MPAA don't fuck everything for an emerging form of media consumption.
