Snellspace: Critique of the Google Contacts Atom format

Good critique of the Google Contacts API over on Snellspace:

… as is typical of GData, all of the application specific data is contained in extension elements. While this is technically not a bad thing, it does mean that Google has effectively invented Yet Another Way of representing contact information. We already have vCard and hCard, both of which have been extremely effective and have a ton of existing application support. However, none of that existing code can be used with GData contacts. I understand why Google is using the extension elements but I disagree that the tradeoff is worth it.

Using extension elements basically means that “dumb clients” — Atom clients that do not understand the GData API, will not be able to do anything at all with the feed. There’s not even any displayable content that a human can use to extract information. Clients have to be written specifically against the GData API. Assuming that Microsoft also puts together a Contacts API, and assuming they also use extension elements to represent the data, client implementors end up having to write vendor specific code that does exactly the same thing. That’s bad.

Lotus, on the other hand, uses the hCard microformat in the payload, allowing standardized parsing and useful rendering even by downlevel clients.

The entry itself is generally just an envelope for an hCard and a linked vCard. Because the content element contains XHTML, dumb clients can render the feed content regardless of the fact that our hCard uses a number of Profiles-specific fields; and the linked vCard makes it significantly easier to integrate this feed with existing contact management systems. For instance, we have one customer who has used the Profiles feed to integrate directly into mobile device contact databases using the vcard links.

The main advantage the Profiles approach has over GData Contacts is that client applications do not have to write Profiles-specific code to integrate and interoperate. If the client understands Atom, XHTML, hCard and vCard, they can use the feed.

Standards are a good thing.

libxml_helper: a more delightful approach to parsing XML in Ruby

Libxml-Ruby is a blindingly fast way to parse XML in Ruby. However, many have complained that the interface is verbose and not especially Rubyish, and that the documentation lacks details and examples.

Thanks to the flexibility of Ruby, it’s easy to remedy this situation.

libxml_helper.rb, described here, opens up the XML::Node class from Libxml and adds helper functions that make it easier to use. The helpers also make it easier to use xpath with nodes that include default namespaces. (I intend to continue adding to the helpers to smooth rough edges as I encounter them.)

The interface was inspired by the fantastic HPricot, a “fast and delightful HTML parser”, and many HPricot examples will work unmodified.

Convenience functions

You can call to_xml_doc on any string to convert it into an XML::Document:

>> s = '<foo><author>p. bogle</author><bar>content</bar><bar>cont2</bar></foo>'
>> root = s.to_xml_doc.root

The at() method returns the first Node matching the given xpath:

>> root.at("author")
=> <author>p. bogle</author>

The search() method returns a list of Nodes matching the given xpath:

>> root.search("bar")
=> [<bar>content</bar>, <bar>cont2</bar>]

search() can also be called with a block to iterate through each of the matching nodes:

>>  root.search("bar") do |bar| puts bar.xpath; end
/foo/bar[1]
/foo/bar[2]

The helper also improves the handling of default namespaces…
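
The idea of reopening a class to add at() and search() can be sketched in a few lines. This is only a rough illustration, not the actual libxml_helper.rb: it assumes Ruby's bundled REXML library as a stand-in for Libxml-Ruby (so it runs without native extensions), and the method bodies are guesses at how such helpers might look.

```ruby
require 'rexml/document'

# Assumption: REXML stands in for Libxml-Ruby here; the real helper
# reopens XML::Node instead of REXML::Element.
class String
  # Parse the string into an XML document.
  def to_xml_doc
    REXML::Document.new(self)
  end
end

class REXML::Element
  # Return the first node matching the XPath, or nil if none match.
  def at(xpath)
    REXML::XPath.first(self, xpath)
  end

  # Return all nodes matching the XPath; yield each one if a block is given.
  def search(xpath, &block)
    results = REXML::XPath.match(self, xpath)
    results.each(&block) if block
    results
  end
end
```

With these definitions, s.to_xml_doc.root.at("author") and root.search("bar") behave much like the examples above, though REXML will be far slower than Libxml on large documents.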


Urbanspoon adds the Bay Area and Los Angeles

Ethan writes on the Urbanspoon blog:

We’ve just started covering two more very important cities - or depending on how you look at it, about a hundred new cities.

Urbanspoon Bay Area - from Marin in the North to San Jose in the South, Oakland in the East to San Francisco in the West, with a combined total of 13,528 restaurants.

Urbanspoon Los Angeles - including a swath of other cities stretching from San Fernando to Long Beach, with 18,833 restaurants.

Urbanspoon is a cool search engine for restaurant information and reviews, combining links to newspaper reviews and other online sources with user contributed reviews.

(The founders of Urbanspoon are good buddies of mine, as you might guess from several points of integration.)

Open Search Platforms take root

Google’s announcement that they are discontinuing support for their Search API has added urgency to the development of open search platforms. 

Alexa and Nutch approach the problem from different directions, each of them interesting. 

Alexa provides a hosted, scalable search service that you pay for: you can access their full-text index of the web, distribute computations across their cloud, and create new indices to capture the results of those computations. Costs are modest.

Our goal is to give unparalleled and unlimited access to search. Just think of it… where else can you:

  • Take the reins of a Web crawler and direct it to crawl specific pages on specific domains and collect specific document types
  • Mine the documents in the crawl and generate custom indices
  • Reorder search results and create custom verticals
  • Use your own advertising solution

This is by no means a complete list. I just put it together to illustrate a point.

Where other search engines may give you access to their search results, they will tie your hands. You won’t be able to access the raw documents in their crawl, create your own index, reorder the results or even use your own advertising solution. In some extreme cases they will only provide results if you give over part of your page to their ajax script. Why would these search giants create search solutions that are obviously limited and of little use to inventors? Because they are not interested in helping to create their next competitor.

Alexa on the other hand… that’s exactly what we are here to do. We are here to build a platform for you. We are designing our services to be consumed and manipulated by developers and inventors. We fully expect that the next great search engine will be unimaginable to us and won’t be based on a plain vanilla search index from one of the big boys. It will be built and based on a new idea and it will require the kind of access that only Alexa can provide.

You can get started in the new and revamped Developer’s Corner.

Nutch, on the other hand, is a fully open source web search engine; the work of creating a cloud to run Nutch on is up to you or a partner:

Web search is a basic requirement for internet navigation, yet the number of web search engines is decreasing. Today’s oligopoly could soon be a monopoly, with a single company controlling nearly all web search for its commercial gain. That would not be good for users of the internet.

Nutch provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias. (Or at least their bias is public.) All existing major search engines have proprietary ranking formulas, and will not explain why a given page ranks as it does. Additionally, some search engines determine which sites to index based on payments, rather than on the merits of the sites themselves. Nutch, on the other hand, has nothing to hide and no motive to bias its results or its crawler in any way other than to try to give each user the best results possible.

What is Evil 2.0?

Evil 1.0 was Microsoft.  Evil 2.0 is Apple and Google. 

I know Evil 1.0 well. I worked for it. I recognize Evil 2.0 easily by its signs.  I know the bad– and the good– that Evil can accomplish.

Evil 2.0 must be understood.

Evil 1.0 was geeky and asocial. 

Evil 2.0 is cool and charismatic.

Evil 2.0 seduces.  We lust to touch the iPhone, even at the cost of our ability to choose software, carriers, and media formats.  No Skype or Ogg here, unless Apple and Cingular want them.

Evil 2.0 is lock-in at the grandest scale.

Evil 2.0 is smarter than you are.

Evil 2.0 misleads through paradox.

It is the evil that says “Don’t be evil”.  It’s the crawler that can’t be crawled, the datamining blackhole.

It’s the privately owned judge of who gets noticed on the internet.

Evil 2.0 is the Google search API that goes straight from beta to oblivion.

Evil 2.0 is all-knowing. No one knows exactly how much Google knows about you– except Google.

Take the thesis of Evil 1.0, synthesize with Evil 2.0, and toss in a good dash of Homeland Security, and we have a grave threat to freedom of software, thought and privacy. And we will greet our new overlords with flowers and iPhones.

Google more closed with data than Microsoft is with code?

Google is taking an approach to search and datamining that is diametrically opposed to the web search platform and Elastic Compute Cloud of Amazon/Alexa.

Even as Amazon works to democratize access to web indexes and scalable web services, Google is increasingly becoming a data black hole, collecting and mining data from everywhere for its own purposes while exposing little of what is mined to other web services.

Google’s Search API was always more of a toy than a real platform– the limit of 1000 queries per day was uselessly small, as was the limitation of only being able to access the first 1000 results of any query.

Recently, Google discontinued support for their Search API, and is no longer issuing new API keys.

What would people say if Microsoft stopped issuing “API keys” and documentation for Windows developers?

A project called EvilAPI aims to be a replacement for the SOAP Search API, and is an expression of the frustration felt by developers left out in the cold.

In response to Google’s discontinuation of support for their SOAP Search API, we have created the EvilAPI. The EvilAPI supports most of the same SOAP calls that Google’s SOAP Search API supports — it just doesn’t use their deprecated API to get the data. Instead, it uses page scraping. Evil? Maybe. But not nearly as evil as providing a powerful development tool for people who are loyal to Google and then discontinuing it without any warning or regard to their users.

Of course, Google disallows EvilAPI in their terms of service and can break it any time they choose.

There’s a clear parallel between Google’s closed approach to data and Microsoft’s closed approach to software, and a clear need for more open marketplaces for data as pioneered by Amazon. Fortunately, developers do have a choice and a chance to exert market pressure on Google.

The best Rails is a virtual Rails: Virtualization for Mac and Windows

I’ve noticed an interesting trend towards mainstream developer use of virtual machines:  Those of us who develop in Rails at Jobster increasingly do so on Linux, either real or virtualized.

Linux tends to be better supported than either Windows or OSX when it comes to Rails gems– the libxml gem for instance, is critical for fast XML parsing, but is difficult if not impossible to run on Windows. Linux is also better supported than Intel OSX for things like the closed source (boo hiss) Oracle OCI8 drivers.

There are even performance advantages. Tools like Subversion perform much faster in virtualized Linux than they do in native Windows, and even Rails startup time feels faster.

Virtualized Linux is now fast and affordable– VMware Server is a free download for Windows, and Parallels workstation for the Mac is only $49.

A typical setup shares out the virtualized drive with the Windows or Mac box so that the native development environment can continue to be used.  (The virtual machine host allocates a new dynamic IP address for the virtual machine and transparently bridges traffic to the virtual MAC address, so it truly appears as a separate machine to the local network.)

The last compelling aspect of virtualization is the ability to create a system image with all of the tools and gems a developer needs to be productive.  In a few minutes this image can be copied to a developer’s machine, mounted, and running.

(If you use virtualized Ubuntu and want to share system images, you should definitely read this thread about a script to update the cached MAC address when you copy an image, otherwise your eth0 ethernet device won’t work in a cloned machine.)

It’s clearly only a matter of time before virtualization becomes mainstream for all users and not just developers. 

S3 + Rake = Easy Rails Backups

Peter Cooper blogs on Easy Backups for SVN Repositories, Databases, and Code using Amazon’s S3 service. 

Adam Greene has put together a great set of Rake tasks that use the Amazon S3 file storage service (and Amazon’s own Ruby API) to make backing up your Rails application’s code and databases easy. All it takes is a single call to Rake and you’re backed up on Amazon’s redundant, secure systems.

(Discovered via the Ruby Advent Calendar.) 

acts_as_ferret: easy, efficient full-text search for Rails applications

Based on a tip from Russell Williams, I’ve played a little bit with acts_as_ferret and like what I see so far.   

Ferret is a port of the Lucene full-text engine for Ruby, and acts_as_ferret is a plugin for Rails that makes it easy to make any ActiveRecord model full-text searchable. 

Roman Mackovcak provides a set of ferret recipes, including how to do pagination, which is not supported out of the box. 

Ferret supports the same rich query syntax that Lucene does and can read files created using standard Lucene without conversions. Some of the supported query syntax options include wildcard searches (te?t matches test), fuzzy matching based on Levenshtein distance (dictonary~ matches dictionary), proximity searches, range searches, keyword weight, and boolean operators.
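
To make the wildcard and fuzzy operators concrete, here is a small plain-Ruby sketch. It does not use Ferret at all; the method names levenshtein and wildcard_to_regexp are made up for illustration, and Ferret’s own matching and scoring will differ in detail.

```ruby
# Classic dynamic-programming edit distance: the minimum number of
# single-character insertions, deletions, and substitutions needed
# to turn string a into string b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: turn a ?/* wildcard pattern into an anchored Regexp,
# where ? matches exactly one character and * matches any run of characters.
def wildcard_to_regexp(pattern)
  escaped = Regexp.escape(pattern).gsub('\?', '.').gsub('\*', '.*')
  Regexp.new("\\A#{escaped}\\z")
end

puts levenshtein("dictonary", "dictionary")       # one missing letter
puts wildcard_to_regexp("te?t").match?("test")
```

The misspelling dictonary is a single insertion away from dictionary, which is why a fuzzy query can still find it; a fuzzy search effectively accepts terms whose edit distance from the query term is small relative to its length.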

Update: Dion notes that Ferret no longer uses the Lucene file format, unfortunately. The file format was changed in order to improve performance.

Hibernate loves Spring?

To accompany the rumored renewal of affections between Brad and Jen, eWeek is reporting a thaw in the chill between Hibernate and Spring:

Despite an often tense relationship between leaders in their respective open-source communities, the Spring and JBoss leaders are now talking about a truce.

Rod Johnson, chief executive of Interface21, the company that maintains the Spring Framework, told eWEEK that he would welcome an opportunity to work with JBoss. Johnson spoke with eWEEK just weeks after JBoss leader Marc Fleury told eWEEK he was open to working with the Spring community in some fashion.

The apparent thaw in the often chilly relationship could signal a big boon to Java developers who use the Spring Framework with JBoss’ Hibernate technology. Spring is a lightweight Java application framework that helps developers avoid the complexity of the Java 2 Platform Enterprise Edition (J2EE), while Hibernate is an object/relational persistence and query service for Java.

Reporting the conflict from the frontlines was Jobster’s senior war correspondent Scott Haug:

The height of the friction between the two camps was perhaps best captured in a blog post from last year by Scott Haug, a developer at Jobster, entitled “Hibernate Hates Spring.”

However, many developers who posted comments to Haug’s post said they use both Spring and Hibernate, and many called for a truce.