tecznotes

Michal Migurski's notebook, listening post, and soapbox.

Feb 14, 2005 8:07am

indie rock pete

From an essay on UltraGleeper, a Codecon presentation on web page recommendations, which I missed:

If you want best-sellers you can check the best-seller list. In my opinion, a good recommendation engine must have an element of anti-popularity. I implemented this in the Ultra Gleeper with something I call the Indie Rock Peter Principle, named after the comic strip character Indie Rock Pete, who has a kind of sadomasochistic relationship with popularity. The IRPP stipulates that beyond a certain point, additional recommendations (i.e. incoming links) decrease a page's score instead of increasing it. Not only does this clear the ground for real surprises, it acts as an additional, probabilistic filter for pages you've seen in casual browsing. Anything it misses will probably show up within a couple days in the weblogs you read.

Brilliant. (emphasis mine)
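
The quoted essay doesn't say what the scoring curve actually looks like, so here is a minimal sketch of one way to read it: score climbs with incoming links up to a threshold, then falls. The function name `irpp_score`, the `peak` threshold of 20 links, and the symmetric "tent" shape are all assumptions made for illustration, not the Ultra Gleeper's real implementation.

```python
def irpp_score(inbound_links: int, peak: int = 20) -> float:
    """Toy anti-popularity score (hypothetical, not the Ultra Gleeper's code).

    Each incoming link adds one point up to `peak`; past that point,
    each additional link subtracts one point, with a floor at zero.
    """
    if inbound_links <= peak:
        return float(inbound_links)
    # Past the peak, popularity starts counting against the page.
    return max(0.0, float(2 * peak - inbound_links))


if __name__ == "__main__":
    for links in (5, 20, 30, 100):
        print(links, irpp_score(links))
```

With these assumed numbers, a page with 30 incoming links scores the same as one with 10, and anything past 40 links drops out entirely.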

Feb 14, 2005 2:42am

codecon 2005 recap

We presented Mappr at Codecon 2005 yesterday, and I think it went pretty well. I wanted to go check out a few of today's sessions (notably Wheat and Incoherence), but circumstances prevailed against me, and I was only able to attend the Google reception Friday evening and most of Saturday's talks.

We were asked a few great questions, which I'll summarize here:

A few participants asked about the possibility of open-sourcing Mappr, or at the very least clarifying the licensing terms on the website, so decompilers would know what they were permitted to do, before doing it anyway. Clearly, we'll need to be more up-front about the terms for using the client-side map code, though this is a question probably best asked of Tomas, who wrote it.

The open-source question is an interesting one, as it applies to web services. The GPL has addressed the licensing issues around machine-readable computer code, and Creative Commons has done the same for creative works, where derivative works and attribution are an issue. Web services are still a fairly major gray area, especially in the case of Mappr:

  • We don't actually own most of Mappr's data, since the photographs you see on the site are all there courtesy of the original photographer, subject to their terms, and served up via Flickr. If Flickr were to shut their doors, Mappr would disappear as well.
  • The geographical data we use to make our location guesses comes from US Government sources. It is public-domain, but it isn't "ours" in the strictest sense either.

The value Mappr has (and there must be value, for something to be licensable) is in the work it does, the service it performs. In a sense, a webservice is like the blue-collar worker of the 2.0 web, paid hourly for work done rather than salary for ideas generated; it's a role that resists easy intellectual property pigeonholing. What aspect of Mappr could be open-sourced? The client code, obviously, but this is in some ways the easiest part to reverse-engineer. The server-side location matching heuristics could be open sourced, but that wouldn't be worth much without the formatted place information culled from various sources. The photos themselves aren't ours to license.

One participant asked about explicitly geo-tagged images. This has been brought up before, and Flickr has just made this a lot easier by opening up EXIF data in their API. They still don't provide a way to search on EXIF, and I strongly suspect that not very many photos in the system have geo data in them.

I would approach the task of using explicit geo-data by merging it with existing place-name data: I would define a new type of relationship in Mappr ("Mappr believes this photo is near...") based on the closest named place to the photo's latitude and longitude. I would also allow for alternate ways to specify this information: EXIF is voodoo to a lot of users, and inaccessible once the image has been uploaded. It should be possible to expand our range of mappr-recognized tags to include something like "mappr:latitude:12.34". This has been brewing for a while, and Flickr's EXIF hooks are one additional fire under my ass to get it done.
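
A rough sketch of that idea, under stated assumptions: the "mappr:latitude:12.34" tag format comes from the paragraph above, but the matching "mappr:longitude:..." tag, the tiny `PLACES` gazetteer with approximate coordinates, and the function names are hypothetical. Distance is great-circle (haversine), which is one reasonable choice for "closest named place", not necessarily what Mappr would use.

```python
import math

# Hypothetical gazetteer: (name, latitude, longitude), coordinates approximate.
PLACES = [
    ("San Francisco, CA", 37.77, -122.42),
    ("Oakland, CA", 37.80, -122.27),
    ("Concrete, WA", 48.54, -121.75),
]


def parse_geo_tags(tags):
    """Return (lat, lon) from tags like 'mappr:latitude:12.34', or None."""
    coords = {}
    for tag in tags:
        parts = tag.split(":")
        if len(parts) != 3 or parts[0] != "mappr":
            continue
        if parts[1] not in ("latitude", "longitude"):
            continue
        try:
            coords[parts[1]] = float(parts[2])
        except ValueError:
            continue  # ignore tags whose value is not a number
    if "latitude" in coords and "longitude" in coords:
        return coords["latitude"], coords["longitude"]
    return None


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places=PLACES):
    """Name of the gazetteer entry closest to (lat, lon)."""
    return min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))[0]


if __name__ == "__main__":
    tags = ["sunset", "mappr:latitude:37.78", "mappr:longitude:-122.40"]
    point = parse_geo_tags(tags)
    if point is not None:
        print("Mappr believes this photo is near", nearest_place(*point))
```

A linear scan is fine for a toy list like this; a real gazetteer would want a spatial index.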

Alon Salant's Photospace project, presented right after ours, handles the explicit geolocation issue nicely by providing for local maps and using RDF to link GPS tracklogs to terraserver maps. It's a little technically hardcore for Flickr users, I think, but definitely the approach of choice if you are interested in rigorously-specified locations for images.

There were questions about our learning and feedback mechanisms. For example, if photos tagged with "Concrete" are consistently misplaced into Concrete, WA (ironically, the photo we currently have for that location is actually intended to be in Concrete. Try duck), how can the system learn to disassociate these tags? Right now it's a manual process: I mark placenames such as "Gary" or "Reading" as common words or names that may not actually be placenames. Implementing any type of automated learning system is pretty far out of my reach right now, though I do have a copy of AIMA on my bookshelf that may help with this.
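
The manual process amounts to a hand-maintained stoplist. A minimal sketch, assuming a dictionary-style gazetteer: the names `GAZETTEER`, `AMBIGUOUS`, and `guess_places` and the specific entries are made up for illustration and don't reflect Mappr's actual data structures.

```python
# Hypothetical gazetteer mapping lowercase tag text to a place name.
GAZETTEER = {
    "concrete": "Concrete, WA",
    "gary": "Gary, IN",
    "reading": "Reading, PA",
    "oakland": "Oakland, CA",
}

# Hand-flagged names that are usually ordinary words or people's names.
AMBIGUOUS = {"gary", "reading"}


def guess_places(tags):
    """Return place guesses for a photo's tags, skipping flagged names."""
    guesses = []
    for tag in tags:
        key = tag.strip().lower()
        if key in AMBIGUOUS:
            continue  # flagged by hand: too often not a place
        if key in GAZETTEER:
            guesses.append(GAZETTEER[key])
    return guesses


if __name__ == "__main__":
    print(guess_places(["Reading", "oakland", "book"]))
    print(guess_places(["concrete", "sidewalk"]))
```

In this sketch "concrete" still maps to Concrete, WA until someone adds it to the flagged set, which is exactly the gap an automated learning step would have to close.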

Er, that's it. Go Codecon!
