An HPricot-style interface to LibXml-Ruby
Libxml-Ruby is a blindingly fast way to parse XML in Ruby. However, many have complained that the interface is verbose and not especially Rubyish, and that the documentation lacks details and examples.
Thanks to the flexibility of Ruby, it's easy to remedy this situation.
libxml_helper.rb, described here, opens up the XML::Node class from Libxml and adds a set of helper functions that make it easier to use. The helpers also make it easier to use xpath with default namespaces. (I intend to continue adding to the helpers to smooth rough edges as I encounter them.)
The interface was inspired by the fantastic HPricot, a "fast and delightful HTML parser", and many HPricot examples will work unmodified.
Convenience functions
You can call to_libxml_doc on any string to convert it into an XML::Document:
>> s = '<foo id="1"><author>p. bogle</author><bar>content</bar><bar>cont2</bar></foo>'
>> root = s.to_libxml_doc.root
The at() method returns the first Node matching the given xpath:
>> root.at("author")
=> <author>p. bogle</author>
The search() method returns a list of Nodes matching the given xpath:
>> root.search("bar")
=> [<bar>content</bar>, <bar>cont2</bar>]
search() can also be called with a block to iterate through each of the matching nodes:
>> root.search("bar") do |bar| puts bar.xpath; end
/foo/bar[1]
/foo/bar[2]
Namespace helpers
The handling of default namespaces in libxml-ruby is awkward: you have to remember to pass an array of namespace strings to every find() call, and you have to repeat the href of the default namespace each time.
The helpers add a register_default_namespace function that makes this simpler.
Suppose you had XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
      xmlns:gContact="http://schemas.google.com/contact/2008"
      xmlns:gd="http://schemas.google.com/g/2005">
  <title type="text">Phil Bogle's Contacts</title>
  ...
</feed>
Then you could say the following and have it work as expected:
root.register_default_namespace("atom")
root.search("atom:title")
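The namespace strings passed to find() have the form "prefix:href", and the helper builds one from whichever namespace on the node has no prefix. Here is a minimal sketch of that lookup. Ns is a hypothetical stand-in for libxml's namespace objects and default_namespace_spec is a name made up for illustration, so the sketch runs without the library installed:

```ruby
# Hypothetical stand-in for a libxml namespace object. The default
# namespace (xmlns="...") is the one declared without a prefix.
Ns = Struct.new(:prefix, :href)

# Build the "prefix:href" string for the default namespace, giving it
# the caller's chosen name. Raises if no default namespace is declared.
def default_namespace_spec(namespaces, name)
  default = namespaces.find { |ns| ns.prefix.nil? }
  raise "No default namespace found" unless default
  "#{name}:#{default.href}"
end

# The namespaces declared on the <feed> element above (abbreviated).
feed_namespaces = [
  Ns.new(nil, "http://www.w3.org/2005/Atom"),
  Ns.new("gd", "http://schemas.google.com/g/2005")
]

puts default_namespace_spec(feed_namespaces, "atom")
```

The resulting string is what would otherwise have to be typed out by hand in each find() call.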
Code for libxml_helper.rb
Here is the code for the helper. Just make sure to require this file somewhere in your app.
require "xml/libxml"

class XML::Node
  ##
  # Open up XML::Node from libxml and add convenience methods inspired
  # by hpricot.
  # (http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics)
  # Also:
  # * provide better handling of default namespaces

  # an array of default namespaces to pass into find() and find_first()
  attr_accessor :default_namespaces

  # find the first child node matching the given xpath
  def at(xpath)
    self.find_first(xpath)
  end

  # find the array of child nodes matching the given xpath
  def search(xpath)
    results = self.find(xpath).to_a
    if block_given?
      results.each do |result|
        yield result
      end
    end
    return results
  end

  # alias for search
  def /(xpath)
    search(xpath)
  end

  # return the inner contents of this node as a string
  def inner_xml
    child.to_s
  end

  # alias for inner_xml
  def inner_html
    inner_xml
  end

  # return this node and its contents as an xml string
  def to_xml
    self.to_s
  end

  # alias for path
  def xpath
    self.path
  end

  # provide a name for the default namespace
  def register_default_namespace(name)
    self.namespace.each do |n|
      # the default namespace is the one declared without a prefix
      if n.prefix.nil?
        register_namespace("#{name}:#{n.href}")
        return
      end
    end
    raise "No default namespace found"
  end

  # register a namespace, of the form "foo:http://example.com/ns"
  def register_namespace(name_and_href)
    (@default_namespaces ||= []) << name_and_href
  end

  # like find, but fall back to the registered namespaces when none are given
  def find_with_default_ns(xpath_expr, namespace=nil)
    find_base(xpath_expr, namespace || default_namespaces)
  end

  # like find_first, but fall back to the registered namespaces when none are given
  def find_first_with_default_ns(xpath_expr, namespace=nil)
    find_first_base(xpath_expr, namespace || default_namespaces)
  end

  # keep the originals as find_base and find_first_base, then route
  # find and find_first through the wrappers above
  alias_method :find_base, :find unless method_defined?(:find_base)
  alias_method :find, :find_with_default_ns

  alias_method :find_first_base, :find_first unless method_defined?(:find_first_base)
  alias_method :find_first, :find_first_with_default_ns
end

class String
  # parse this string into an XML::Document
  def to_libxml_doc
    xp = XML::Parser.new
    xp.string = self
    return xp.parse
  end
end
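The find and find_first overrides at the end of the class rely on an alias-and-wrap pattern: the original method is kept under a new name, and the old name is pointed at a wrapper that fills in a default argument. Here is a minimal sketch of the same pattern on a hypothetical Finder class (a stand-in for XML::Node, not part of libxml) whose find simply echoes its arguments:

```ruby
# Hypothetical class standing in for XML::Node.
class Finder
  attr_accessor :default_namespaces

  # Stand-in for the library's find: return the arguments it received.
  def find(xpath, namespaces = nil)
    [xpath, namespaces]
  end

  # Wrapper: fall back to the registered defaults when none are given.
  def find_with_default_ns(xpath, namespaces = nil)
    find_base(xpath, namespaces || default_namespaces)
  end

  # Keep the original under find_base. The guard stops a second load of
  # the file from aliasing the wrapper onto itself.
  alias_method :find_base, :find unless method_defined?(:find_base)
  alias_method :find, :find_with_default_ns
end

finder = Finder.new
finder.default_namespaces = ["atom:http://www.w3.org/2005/Atom"]

# No namespace argument: the registered defaults are used.
p finder.find("atom:title")

# An explicit namespace argument still takes precedence.
p finder.find("gd:email", "gd:http://schemas.google.com/g/2005")
```

Because the wrapper only substitutes the defaults when the namespace argument is nil, existing callers that pass their own namespaces behave as before.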