ClickAider

An HPricot-style interface to LibXml-Ruby

Libxml-Ruby is a blindingly fast way to parse XML in Ruby. However, many have complained that the interface is verbose and not especially Rubyish, and that the documentation lacks details and examples.

Thanks to the flexibility of Ruby, it's easy to remedy this situation.

libxml_helper.rb, described here, opens up the XML::Node class from Libxml and adds a set of helper functions that make it easier to use. The helpers also make it easier to use XPath with default namespaces. (I intend to continue adding to the helpers to smooth rough edges as I encounter them.)

The interface was inspired by the fantastic HPricot, a "fast and delightful HTML parser", and many HPricot examples will work unmodified.

Convenience functions

You can call to_libxml_doc on any string to convert it into an XML::Document:

>> s = '<foo id="1"><author>p. bogle</author><bar>content</bar><bar>cont2</bar></foo>'
>> root = s.to_libxml_doc.root

The at() method returns the first Node matching the given xpath:

>> root.at("author")
=> <author>p. bogle</author>

The search() method returns a list of Nodes matching the given xpath:

>> root.search("bar")
=> [<bar>content</bar>, <bar>cont2</bar>]

search() can also be called with a block to iterate through each of the matching nodes:

>> root.search("bar") { |bar| puts bar.xpath }
/foo/bar[1]
/foo/bar[2]
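
The block form does not replace the return value: search() yields each match and then still returns the full array. A minimal pure-Ruby sketch of that contract follows. StubSearchNode is a hypothetical stand-in for XML::Node (its find() just returns canned strings), so the snippet runs without libxml installed.

```ruby
# Hypothetical stand-in for XML::Node, used only to illustrate the
# search()/block contract. It is not part of libxml-ruby.
class StubSearchNode
  def initialize(matches)
    @matches = matches
  end

  # Stand-in for XML::Node#find: ignores the xpath and returns canned matches.
  def find(_xpath)
    @matches
  end

  # Same shape as the helper's search(): collect matches, yield each one
  # if a block was given, and return the array either way.
  def search(xpath)
    results = find(xpath).to_a
    results.each { |result| yield result } if block_given?
    results
  end

  # Alias for search, so that `node / "bar"` works as in Hpricot.
  def /(xpath)
    search(xpath)
  end
end

node = StubSearchNode.new(["<bar>content</bar>", "<bar>cont2</bar>"])

seen = []
returned = node.search("bar") { |bar| seen << bar }
via_slash = node / "bar"

puts seen.length
puts returned.inspect
puts via_slash.inspect
```

Because the array is always returned, the block form can be chained or assigned just like the plain form.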

Namespace helpers

The handling of default namespaces in libxml-ruby is awkward: you have to remember to pass an array of namespace strings to every find() call, and each of those strings repeats the href of the default namespace that the document already declares.

The helpers add a register_default_namespace function that makes this simpler.

Suppose you had XML like the following:

XML:
  <?xml version="1.0" encoding="UTF-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
        xmlns:gContact="http://schemas.google.com/contact/2008"
        xmlns:gd="http://schemas.google.com/g/2005">
    <title type="text">Phil Bogle's Contacts</title>
    ...
  </feed>

Then you could say the following and have it work as expected:

root.register_default_namespace("atom")
root.search("atom:title")
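
What the registration step does is fairly small: it looks for the namespace declaration that has no prefix and stores a "name:href" string, which is later handed to find() as its namespace argument. The sketch below mimics that with stand-in objects so it can run without libxml. StubNS and StubFeedNode are hypothetical names, and the prefix/href pair is only assumed to mirror what a namespace declaration exposes.

```ruby
# Hypothetical stand-ins, not libxml-ruby classes: StubNS models a namespace
# declaration (prefix is nil for the default namespace), and StubFeedNode
# models just enough of a node to show the registration logic.
StubNS = Struct.new(:prefix, :href)

class StubFeedNode
  attr_reader :default_namespaces

  def initialize(namespaces)
    @namespaces = namespaces
  end

  # Find the declaration without a prefix and register it under `name`.
  def register_default_namespace(name)
    default = @namespaces.find { |ns| ns.prefix.nil? }
    raise "No default namespace found" unless default
    register_namespace("#{name}:#{default.href}")
  end

  # Store a namespace string of the form "foo:http://example.com/ns".
  def register_namespace(name_and_href)
    (@default_namespaces ||= []) << name_and_href
  end
end

feed = StubFeedNode.new([
  StubNS.new(nil, "http://www.w3.org/2005/Atom"),
  StubNS.new("gd", "http://schemas.google.com/g/2005")
])
feed.register_default_namespace("atom")

registered = feed.default_namespaces
puts registered.inspect
```

The stored string is the same one you would otherwise have to write out by hand on every call, along the lines of root.find("atom:title", "atom:http://www.w3.org/2005/Atom").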

Code for libxml_helper.rb

Here is the code for the helper. Just make sure to require this file somewhere in your app.

RUBY:
  require "xml/libxml"

  class XML::Node
    ##
    # Open up XML::Node from libxml and add convenience methods inspired
    # by hpricot.
    # (http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics)
    # Also:
    #  * provide better handling of default namespaces

    # an array of default namespaces to pass into find() and find_first()
    attr_accessor :default_namespaces

    # find the first child node matching the given xpath
    def at(xpath)
      find_first(xpath)
    end

    # find the array of child nodes matching the given xpath;
    # if a block is given, each match is also yielded to it
    def search(xpath)
      results = find(xpath).to_a
      results.each { |result| yield result } if block_given?
      results
    end

    # alias for search
    def /(xpath)
      search(xpath)
    end

    # return the inner contents of this node as a string
    # (walks every child; child.to_s alone serializes only the first one)
    def inner_xml
      parts = []
      node = child
      while node
        parts << node.to_s
        node = node.next
      end
      parts.join
    end

    # alias for inner_xml
    def inner_html
      inner_xml
    end

    # return this node and its contents as an xml string
    def to_xml
      to_s
    end

    # alias for path
    def xpath
      path
    end

    # provide a name for the default namespace
    def register_default_namespace(name)
      namespace.each do |ns|
        # the default namespace is the declaration with no prefix
        if ns.prefix.nil?
          register_namespace("#{name}:#{ns.href}")
          return
        end
      end
      raise "No default namespace found"
    end

    # register a namespace, of the form "foo:http://example.com/ns"
    def register_namespace(name_and_href)
      (@default_namespaces ||= []) << name_and_href
    end

    # find() that falls back to the registered namespaces
    def find_with_default_ns(xpath_expr, namespace = nil)
      find_base(xpath_expr, namespace || default_namespaces)
    end

    # find_first() that falls back to the registered namespaces
    def find_first_with_default_ns(xpath_expr, namespace = nil)
      find_first_base(xpath_expr, namespace || default_namespaces)
    end

    # keep the originals as *_base; the guards stop a second load of this
    # file from aliasing the wrappers to themselves
    alias_method :find_base, :find unless method_defined?(:find_base)
    alias_method :find, :find_with_default_ns

    alias_method :find_first_base, :find_first unless method_defined?(:find_first_base)
    alias_method :find_first, :find_first_with_default_ns
  end

  class String
    # parse this string into an XML::Document
    def to_libxml_doc
      xp = XML::Parser.new
      xp.string = self
      xp.parse
    end
  end
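
The part of the helper that is easiest to get wrong is the alias_method chaining around find and find_first. Here is a small pure-Ruby sketch of the same pattern, showing why the method_defined? guard matters if the file is loaded more than once. StubFindNode and patch_stub_find_node are hypothetical names used only for illustration, and the stub's find() just echoes its arguments.

```ruby
# Hypothetical stand-in for XML::Node. Its find() reports the arguments it
# received, so we can see what the wrapper passes through.
class StubFindNode
  def find(xpath, namespaces = nil)
    "find(#{xpath.inspect}, #{namespaces.inspect})"
  end
end

# Wrap find() the same way the helper does. Calling this method twice
# simulates the helper file being loaded twice.
def patch_stub_find_node
  StubFindNode.class_eval do
    attr_accessor :default_namespaces

    def find_with_default_ns(xpath, namespaces = nil)
      find_base(xpath, namespaces || default_namespaces)
    end

    # Save the original only once. On a second load, find already points at
    # the wrapper, so re-aliasing would make the wrapper call itself forever.
    alias_method :find_base, :find unless method_defined?(:find_base)
    alias_method :find, :find_with_default_ns
  end
end

2.times { patch_stub_find_node }

node = StubFindNode.new
node.default_namespaces = ["atom:http://www.w3.org/2005/Atom"]

with_default = node.find("atom:title")                    # falls back to defaults
with_explicit = node.find("x:title", ["x:urn:example"])   # explicit list wins

puts with_default
puts with_explicit
```

An explicitly passed namespace list always takes precedence, so existing calls that already supply one keep their behavior after the helper is loaded.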
  93. end