You are currently browsing the Bogle’s Blog weblog archives.

Louis Gray: Mergelab Emerges to Streamline Friends’ Web Updates

Louis Gray, author of the Silicon Valley Blog, has written a good preview of Mergelab.

The Merge Fairy: automated merging of Subversion revisions

The Merge Fairy is a Python script that automates the process of merging changes from one Subversion branch to another, based on an XML configuration file that describes branches and their dependencies. For example, you might want bug fixes from the release branch to be automatically merged to the trunk branch.

In the event of a merge conflict or a build failure after merging, the Merge Fairy sends email asking a human to perform a manual merge, and resumes automated merging once this is done.
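
To make the idea concrete, here is a minimal sketch of how a configuration of this kind might be read and acted on. It is not the Merge Fairy's actual code or file format: the element names (`branches`, `branch`, `merge-from`), the attributes, and the helpers `load_merge_pairs` and `plan_merge` are all assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration, NOT the Merge Fairy's real schema.
# It says: changes on "release" should be merged into "trunk".
CONFIG = """
<branches>
  <branch name="release" path="branches/release"/>
  <branch name="trunk" path="trunk">
    <merge-from branch="release"/>
  </branch>
</branches>
"""


def load_merge_pairs(xml_text):
    """Return (source, target) branch-name pairs declared in the config."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for branch in root.findall("branch"):
        for dep in branch.findall("merge-from"):
            pairs.append((dep.get("branch"), branch.get("name")))
    return pairs


def plan_merge(source, target, conflict):
    """Describe the next step after a merge attempt.

    `conflict` is True when the merge (or the build after it) failed,
    in which case a human is asked to step in.
    """
    if conflict:
        return f"email: manual merge needed from {source} to {target}"
    return f"commit: merged {source} into {target}"


if __name__ == "__main__":
    for source, target in load_merge_pairs(CONFIG):
        print(plan_merge(source, target, conflict=False))
```

A real tool would run the Subversion merge and the build between these two steps; the sketch only shows the shape of the configuration and the decision that follows.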

I originally wrote the Merge Fairy at Jobster, where it has been used for the past few years. We also use it at Mergelab. The Merge Fairy is now being released under an open source MIT license for general use.

If you are already familiar with how it works, you can now download mergefairy.zip. I will be reposting full documentation shortly; during my server migration, I unfortunately lost the documentation page I had written. (I realize this sounds like the internet equivalent of “the dog ate my homework”, but it’s true.)

Update:
The Merge Fairy and its associated documentation are now hosted as a Google Code project at http://code.google.com/p/svn-merge-fairy/

Newspapers must become a News Web

Much has been made of the sinking fortunes of newspapers in the face of competition from the web. I know little about the news industry, but that won’t stop me from putting in my technological two cents here.

The only way for the newspapers to survive and flourish on the internet is to become a news web. Becoming a web is much more than simply putting pages on the web, which is what newspapers have done so far.

Most papers today toss up a web site with an article archive and a superficial full-text search engine, and call it a day. (It’s considered the height of innovation not to hide the archive behind a pay wall.) Google eats the newspapers’ lunch because Google can offer cross-cutting (though equally superficial) keyword search of all the newspapers.

Google enjoys a financial advantage because it catches users at the monetizable moment when they are searching. By the time the user clicks through to the article they are too deeply absorbed to click offsite and generate revenue for the paper.

This is ironic, because locked within the heads of reporters and editors is an understanding of the news that Google couldn’t possibly hope to recreate or even recapture through automated means.

What the newspapers need is technology that captures their unique understanding of the news, so that they, rather than Google, become the vertical search engine of choice for news. This is a tricky problem not only in information architecture but also in usability. How do you create a system that mere mortals can use, that allows semantically rich querying and browsing across events, people, and time? How do you coax and incent reporters and editors into capturing their implicit knowledge?

By saying that newspapers must “become a web”, I mean that they must convert their effectively flat archives into a meaningfully interlinked and semantically searchable web of news. I also mean to imply that newspapers must give up their walled-silo approach to information and support cross-cutting search and interlinking of content from all publishers; otherwise, Google and other search engines will continue to win out.

Here’s a concrete example. Suppose I want to find out what speeches John McCain made on Iraq in 2002. Here are the results from the New York Times archive for “John McCain Iraq Speech”, and here is the Google News search for “John McCain Iraq Speech”, both restricted to 2002.

Because both Google and the NYT offer only full-text search, relevance is poor, as is the quality of the summary snippets. For example, in cases like this one, the speeches on Iraq are actually by Bush, and McCain is quoted only in passing.

Imagine instead that the reporter who wrote the article had tagged it in the internal news database as a speech and recorded that the speaker was Senator John McCain. Then the news site could offer a search that precisely answered my query. Furthermore, it could easily offer a browseable web of related links, such as other speeches by John McCain and speeches by other politicians on Iraq.
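
As a rough sketch of what that tagging buys you, here is a toy version in Python. Everything in it is an assumption made for illustration: the `Article` fields, the helper names, and especially the archive records, which are invented placeholders rather than real articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    headline: str
    year: int
    kind: str          # e.g. "speech" or "report", assigned by the reporter
    speaker: str       # who gave the speech, if kind == "speech"
    topics: frozenset  # subjects the reporter tagged


# Placeholder records only; these are not real articles.
ARCHIVE = [
    Article("Placeholder A", 2002, "speech", "John McCain", frozenset({"Iraq"})),
    Article("Placeholder B", 2002, "speech", "George W. Bush", frozenset({"Iraq"})),
    Article("Placeholder C", 2002, "report", "", frozenset({"Iraq"})),
    Article("Placeholder D", 2003, "speech", "John McCain", frozenset({"Taxes"})),
]


def find_speeches(archive, speaker, topic, year):
    """Exact match on the reporter's tags instead of keyword search."""
    return [
        a for a in archive
        if a.kind == "speech" and a.speaker == speaker
        and topic in a.topics and a.year == year
    ]


def related_speeches(archive, article):
    """Speeches by other people that share a topic with `article`."""
    return [
        a for a in archive
        if a.kind == "speech" and a.speaker != article.speaker
        and a.topics & article.topics
    ]


if __name__ == "__main__":
    hits = find_speeches(ARCHIVE, "John McCain", "Iraq", 2002)
    print([a.headline for a in hits])
    print([a.headline for a in related_speeches(ARCHIVE, hits[0])])
```

The query never touches the article text. An article in which McCain is merely quoted would not match, because the reporter tagged someone else as the speaker.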

By capturing the implicit knowledge of the reporter, the newspaper would not only make its information asset more valuable but also create a semantic web that is difficult for Google to compete with. On the internet as a whole, such a semantic web is perhaps a pipe dream, but within the specialized domain of professional news gathering it should be attainable.

Newspapers need to realize that their futures lie in neither news nor paper, but in capturing and organizing online the meaning of a complex web of events.

Server Migration complete for berry411.com and thebogles.com

I am up and running on my new dual core server for thebogles.com and berry411.com.

It was a longer and bumpier transition than I had hoped for, with a fair amount of downtime, but I believe the outcome will be a faster and more stable server.

Please let me know if you encounter any issues.

Update: Several readers pointed out that Berry Bloglines was returning 503 errors; this is now fixed. I also improved performance using dnsmasq, a local DNS cache, since I discovered that ServerPronto’s servers periodically become very slow.