The Craigslist problem

Prying data away from interfaces, and other notes on the public good.

Not everyone likes Craigslist’s interface. This article by Slate points to some rather obvious issues in case you needed reminding:

Craigslist is, to borrow a favorite phrase of Silicon Valley, ripe for disruption. Its bare-bones website looks like a relic of the early 2000s—and its user interface has hardly changed since then, as a trip through the Internet’s Wayback Machine indicates. Slate – Craigslist

The rest of the article goes on to detail a number of successors to the throne, each with independent ideas, funding and more. That part did not interest me as much. The reason why will become clear in a moment.

First, let’s ask the question: why is Craigslist so successful anyway? Because everyone (well, almost) uses it. Craigslist is a classic example of what people call network effects and bandwagon effects. Craigslist becomes more and more useful as more people use it, and other sites find it very hard to compete. The UsedEverywhere network in Greater Victoria comes close to competing on other classifieds, but for apartment rentals, Craigslist is still the big deal. Craigslist has a data monopoly, and fights hard to prevent competitors from accessing the data.

Padmapper screengrab

But it does not have to be so. The availability of open access rental data (the same argument applies to job postings too) is a public good. It is also a free market good. It saves users time and reduces uncertainty if all their access points to information read from the same source. Or sources. They would no longer have to worry about looking up Craigslist, UsedEverywhere, Kijiji, Facebook, astrologers or tea leaves for their information. One site gives them all the information they need. Open data also lets the people providing services compete on site design, value-added features, and well, service.

So, if I think a site should be entirely map based, I can design one. If I only wanted two-bedroom apartments within walking distance of the 14 bus route, or hell, within tumbling distance of my favourite ice-cream store, I should still be able to pull from the same database. Open rental information can become just another map layer. This is just one implementation idea; there are many others.1
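To make the "two bedrooms near the bus route" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Listing` fields, the sample listings, the stop coordinates and the 500 m walking radius are all made up, and no real open rental database or API is implied. It only shows that, once the data is open, a filter like this takes a few lines.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Listing:
    # Hypothetical minimal record; a real open database would carry more fields.
    title: str
    bedrooms: int
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def find_listings(listings, stops, bedrooms, max_km=0.5):
    """Keep listings with the wanted bedroom count within max_km of any stop."""
    return [
        item
        for item in listings
        if item.bedrooms == bedrooms
        and any(distance_km(item.lat, item.lon, s_lat, s_lon) <= max_km for s_lat, s_lon in stops)
    ]


# Made-up sample data: one bus stop and three listings.
BUS_STOPS = [(48.4284, -123.3656)]
LISTINGS = [
    Listing("Two-bed near the stop", 2, 48.4290, -123.3650),
    Listing("Two-bed far away", 2, 48.4700, -123.3000),
    Listing("One-bed near the stop", 1, 48.4290, -123.3650),
]

matches = find_listings(LISTINGS, BUS_STOPS, bedrooms=2)
print([item.title for item in matches])  # ['Two-bed near the stop']
```

Swapping the bus stops for ice-cream stores, or the distance check for a map bounding box, changes the front end without touching the shared data, which is the point.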

Another advantage of centralized and open information is that it enables reviews and other persistent information. Right now, each listing for a suite/apartment is a discrete entity. The user has no idea if a suite comes up every few months because the landlord is terrible, or because there are raccoons in the basement.

Really, we just need to jedi-mind-trick Craig into dispensing with the proprietary monopoly on data.

Is it difficult? I don’t think so. Designing a rental database is not that difficult. The same standard information goes into all listings: location, size, rooms, amenities, images, reviews and other media. It’s not the technological challenges that hold open housing data back.
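As a rough illustration of how small that standard record is, here is a sketch of a listing as a Python dataclass that serializes to JSON. The class and field names (`RentalListing`, `size_sq_m`, `image_urls` and so on) and the example values are my own assumptions, not an existing standard or anyone's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Review:
    author: str
    text: str


@dataclass
class RentalListing:
    # Hypothetical field names covering location, size, rooms,
    # amenities, images and reviews.
    address: str
    lat: float
    lon: float
    size_sq_m: float
    rooms: int
    amenities: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)


def to_json(listing):
    """Serialize a listing, including nested reviews, to a JSON string."""
    return json.dumps(asdict(listing), sort_keys=True)


# Made-up example record.
EXAMPLE = RentalListing(
    address="123 Example St",
    lat=48.4284,
    lon=-123.3656,
    size_sq_m=65.0,
    rooms=2,
    amenities=["laundry", "bike storage"],
    image_urls=["https://example.org/img/1.jpg"],
    reviews=[Review(author="former tenant", text="Quiet building.")],
)

print(to_json(EXAMPLE))
```

Because reviews hang off the unit rather than off a single posting, they would persist across relistings, which is what makes the landlord-or-raccoons question answerable.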

The hurdles are social and regulatory.

Some (not at all insurmountable) challenges

  1. Who owns the data? Open government data is an easy argument to make. The notion that residents who pay the government to provide services actually “own” the data that is generated by government seems fair and acceptable. However, the default for peer-created and moderated data is that it is owned by the site that provides the data portal. So Craigslist, which wouldn’t exist without millions of people spending billions of hours creating and flagging data for free, somehow “owns” the data. A public and open database needs to shift this ownership paradigm toward commons stewardship, based perhaps on a Creative Commons license.
  2. Who pays for the infrastructure? Right now, sites like Craigslist enable free rental and classifieds data by making money off other parts of their site, like job listings for example. This model may need to shift, maybe to some combination of a small listing fee from landlords, advertising, a license fee for apps to access the database, or a similar model to Craigslist where an open jobs listing database subsidizes the open rental database.
  3. Who moderates the content? Craigslist has extensive community moderation, and it seems to work well enough. The Wikipedia community (for all its issues) points to a robust, mostly effective (if non-diverse) community moderation practice that is scalable, yet decentralized.

Some directions and ideas

  1. The open data movement needs to think about privately owned community generated data the same way it thinks about government data. This is going to take a while to accomplish, but there needs to be a bigger groundswell of people committed to liberating non-government community data. This wider acceptance is critical to liberating private data.
  2. One possible path: the more I think about this, the more I realize that Craigslist is already mostly there with the infrastructure and community. It just needs a community takeover. Someone could buy out Craigslist and their data, either directly on the market, or through some kind of mass appropriation. Or perhaps convince the Wikimedia Foundation or a similarly large organization to partner and share the costs. Ads?
  3. Decide on moderation and API policies, freely copying from Wikipedia and other massive open communities as necessary.

This model applies to classifieds, job postings and to that most egregious skimmer of them all, real estate.2

Until something happens, the Craigslist problem continues to be our problem.

  1. Padmapper has already done a lot of the work on mapping.
  2. Also, it seems likely that this data challenge is very similar to the one that journalism faces. The data that flows from journalism, after all, is a public good. But it takes time and energy to acquire it. Food for thought.