Rubyful Soup: A scraping tool for Ruby
Beautiful Soup is my favorite Python parsing and scraping tool. I was delighted to discover that the author has created a Ruby port called Rubyful Soup.
1. Rubyful Soup won’t choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and then run away.
2. Rubyful Soup provides a few simple methods and Ruby-like idioms for navigating and searching a parse tree: a toolkit for dissecting a document and extracting what you need. You don’t have to create a custom parser for each application. It’s more flexible and easier to learn than XPath.
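To make the first point concrete, here is a minimal sketch of forgiving parsing. It deliberately uses only Python's standard-library html.parser rather than Rubyful Soup or Beautiful Soup, so it runs without extra installs; it illustrates the idea and is not either library's API. The LinkCollector class, the extract_links helper, and the sample markup are all hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from anchors, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()   # an unclosed <a> ends when the next one starts
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def _flush(self):
        # Record the open anchor, if there is one, and reset the state.
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(markup):
    """Hypothetical helper: return every (href, text) pair found in markup."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    parser._flush()         # pick up an anchor left open at end of input
    return parser.links


# Unclosed <p>, <b>, <li> and a missing </a>: bad markup, but still usable.
BAD_MARKUP = '<p><b>Links:<ul><li><a href="/one">One</a><li><a href="/two">Two</ul>'
links = extract_links(BAD_MARKUP)
print(links)
```

A real soup-style library goes further and builds a navigable tree out of the same mess, which is what makes the second point possible; this sketch only shows that the data can be collected without the parser giving up.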
I enjoy the author’s sense of humor. This comment block is from the original Python source:
# Enterprise class names! It has come to our attention that some people
# think the names of the Beautiful Soup parser classes are too silly
# and "unprofessional" for use in enterprise screen-scraping. We feel
# your pain! For such-minded folk, the Beautiful Soup Consortium And
# All-Night Kosher Bakery recommends renaming this file to
# "RobustParser.py" (or, in cases of extreme enterprisitude,
# "RobustParserBeanInterface.class") and using the following
# enterprise-friendly class aliases:
class RobustXMLParser(BeautifulStoneSoup):
    pass
class RobustHTMLParser(BeautifulSoup):
    pass
class RobustWackAssHTMLParser(ICantBelieveItsBeautifulSoup):
    pass
class SimplifyingSOAPParser(BeautifulSOAP):
    pass