
Adam Bosworth: Database Requirements in the Age of Scalable Services

IT Conversations: Adam Bosworth - MySQL Users Conference: “Building a system that is capable of handling one billion transactions a day is easier than it sounds. That is Adam Bosworth’s view and he should know because he works for a company that has managed to achieve this level of scale on a simple architecture based on commodity hardware and simple brute force algorithms. Adam covers a lot of ground in this presentation that focuses on the success of the web, the scalability of simplicity and the emergence of the information server.

It’s not always about finding a simple solution to a complex problem, occasionally it’s about simplifying the problem. Whereas complex solutions are brittle and break, simple solutions just tend to work. HTML, HTTP, RSS and ATOM all fall in this category; simple solutions that have been widely adopted and work well. Adam believes it is time for database vendors to reflect on how they can provide an open, simple data model to easily serve up information similar to the way a web server delivers content to the browser. Delivering an information server that is capable of federating information across the web, intelligently caching and scaling linearly is the next big database challenge.”

I was originally slated to be on a team led by Adam Bosworth when I started at Microsoft. Alas, a Microsoft reorg intervened. He’s an extremely smart, yet grounded, thinker.

Adam Bosworth believes we need enabling technology and protocols to allow massively scalable queries of distributed data on the web: “the HTTP and HTML of data”, in Adam’s words.

To do that, he proposes a simple protocol for distributed queries, akin in simplicity to HTML, for talking to decentralized databases. A query engine and presentation layer could then combine results from a number of different query sources.
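To make the "combine results from a number of different query sources" idea concrete, here is a minimal sketch in Python. It is not the protocol Adam described. The names `make_source` and `federated_query` are made up for illustration, and each "source" is just an in-memory list standing in for a remote, independently run database.

```python
from concurrent.futures import ThreadPoolExecutor

def make_source(items):
    """Hypothetical stand-in for a remote data source.

    A real source would sit behind a network protocol; this one
    simply filters its own local items with the given predicate.
    """
    def run(predicate):
        return [item for item in items if predicate(item)]
    return run

def federated_query(sources, predicate):
    """Send the same item-level query to every source, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map keeps results in the same order as `sources`.
        partials = list(pool.map(lambda source: source(predicate), sources))
    # The "query engine" layer only concatenates the partial results.
    return [item for part in partials for item in part]

sources = [
    make_source([{"title": "a", "price": 5}, {"title": "b", "price": 50}]),
    make_source([{"title": "c", "price": 7}]),
]
cheap = federated_query(sources, lambda item: item["price"] < 10)
print([item["title"] for item in cheap])  # prints ['a', 'c']
```

Each source answers using only its own data, so adding another source adds capacity without any coordination between sources.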

Most approaches to the distributed query problem have tended to focus on exposing data in a structured format that can then be crawled, indexed, and queried in a central location. For example, Google solves the massive scaling problems by having a massive farm of servers in-house, and building a software layer that allows queries to be efficiently distributed across those machines.

Adam’s proposal would allow, in essence, a federated cloud of machines equal in power to Google. Ironically (considering Adam’s affiliation with Google) this would dilute the power of Google.

To allow the queries to scale linearly across machines, the queries need to be able to run at the item level, which means no joins.

An extended RSS/Atom format is used to describe queries and their results. The proposed query format is an extremely simple item-only query language. There are no joins, because to make the queries scale, each machine must be able to handle the query on its own.
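As a rough illustration of an item-only query over a feed, the sketch below filters the entries of a small Atom document one at a time. The talk's actual extensions to RSS/Atom are not described here, so this uses only the standard Atom `category` element as a stand-in. The sample feed and the `query_entries` function are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287), in ElementTree's {uri}tag form.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample data, not taken from the talk.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example listings</title>
  <entry><title>Red bicycle</title><category term="bikes"/></entry>
  <entry><title>Oak desk</title><category term="furniture"/></entry>
  <entry><title>Blue bicycle</title><category term="bikes"/></entry>
</feed>"""

def query_entries(feed_xml, category):
    """Return titles of entries tagged with `category`.

    Each entry is tested on its own fields only: no entry is ever
    compared with another, which is what "no joins" means here.
    """
    root = ET.fromstring(feed_xml)
    matches = []
    for entry in root.findall(ATOM + "entry"):
        terms = {c.get("term") for c in entry.findall(ATOM + "category")}
        if category in terms:
            matches.append(entry.findtext(ATOM + "title"))
    return matches

print(query_entries(FEED, "bikes"))  # prints ['Red bicycle', 'Blue bicycle']
```

Because the test for each entry never looks at any other entry, the same query can be handed unchanged to any machine holding any slice of the data.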

Adam ran out of time during his talk, so he didn’t get to the hard issues. One of them is the economics and ownership of the data. Why should I run queries on behalf of anyone who asks, when my data may be sliced and diced beyond recognition by the time they’re done with it?

On the other hand, if restrictive terms and conditions are attached to the use of distributed query engines, it’s going to be a damper on the growth of the federated data network.
