ScrubyT 0.2.6 released
The authors of scrubyt just announced the release of scrubyt 0.2.6.
scRUBYt! is a Ruby scraping framework built on top of Hpricot and Mechanize; its most interesting feature is the ability to automatically derive XPath extraction expressions from a “training” parser that includes specific examples of phrases to extract from pages. The new version includes some valuable improvements, including automatic crawling of detail pages and regex-based specification of example data.
We have just released the new version, 0.2.6, with some great new features, tons of bugfixes and a lot of changes overall, which should greatly improve the reliability of the system.
A lot of long-awaited features have been added: most notably, automatic crawling to the detail pages, which was the most requested feature in scRUBYt!’s history. I will add a tutorial and detailed example on how to use this feature, which enables you to easily crawl a whole site.
Another great addition is the improved example generation: you no longer have to use the whole text of the element you would like to match. It is enough to specify a substring, and the first element that contains the string will be returned. Moreover, you can also use regular expressions, in which case the first element with text content matching the regexp will be returned. If this is still not enough, it is possible to create a compound example like this:
flight :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00'
I guess it’s quite intuitive how this should work.
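To make the matching rules described above concrete, here is a minimal sketch in plain Ruby. It is not scRUBYt!'s actual implementation or API: the `first_match` helper and the `TEXTS` sample data are hypothetical, and the sketch assumes the candidate elements have already been reduced to their text content. It only illustrates the "first element that satisfies the example" behaviour for substrings, regular expressions and compound examples.

```ruby
# Hypothetical helper, not part of scRUBYt!: returns the first text that
# satisfies the given example (substring, regexp, or compound hash).
def first_match(texts, example)
  texts.find do |text|
    case example
    when String then text.include?(example)   # substring example
    when Regexp then example.match?(text)     # regexp example (Ruby 2.4+)
    when Hash                                 # compound example: all parts must hold
      example.all? do |kind, pattern|
        case kind
        when :begins_with then text.start_with?(pattern)
        when :ends_with   then text.end_with?(pattern)
        when :contains
          pattern.is_a?(Regexp) ? pattern.match?(text) : text.include?(pattern)
        else raise ArgumentError, "unknown constraint: #{kind}"
        end
      end
    else raise ArgumentError, "unsupported example: #{example.inspect}"
    end
  end
end

# Made-up sample data standing in for the text content of page elements.
TEXTS = [
  "Departure 2008 18:00",
  "Arrival gate B 20:00",
  "Arrival 2008-12-07 at 20:00"
].freeze

puts first_match(TEXTS, "gate")
puts first_match(TEXTS, /\d{4}-\d{2}/)
puts first_match(TEXTS, begins_with: 'Arrival', contains: /\d{4}/, ends_with: '20:00')
```

In the compound case every constraint has to hold for the same element, so a text that begins with 'Arrival' and ends with '20:00' is still skipped if it contains no four-digit run.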
We have fixed an enormous number of bugs and tested the whole system thoroughly, so the overall reliability should be much better than in previous releases.
If you have any comments, questions, suggestions, please visit the brand new forum!