You are currently browsing the Bogle’s Blog weblog archives.

ScrubyT 0.2.6 released

The authors of scrubyt just announced the release of scrubyt 0.2.6.

scrubyT is a Ruby scraping framework built on top of Hpricot and Mechanize; its most interesting feature is the ability to automatically derive XPath extraction expressions based on a “training” parser that includes specific examples of phrases to extract from pages.  The new version includes some valuable improvements, including automatic crawling of detail pages and regex-based specification of example data.

We have just released the new version, 0.2.6 with some great new features, tons of bugfixes and lot of changes overall which should greatly affect the reliability of the system.

A lot of long-awaited features have been added: most notably, automatic crawling to the detail pages, which was the most requested feature in scRUBYt!’s history ever. I will add a tutorial and detailed example on how to use this feature, which enables you to easily crawl a whole site.

Another great addition is the improved example generation - you don’t have to use the whole text of the element you would like to match anymore - it is enough to specify a substring, and the first element that contains the string will be returned. Moreover, you can use also regular expressions, in which case the first element with text content matching the regexp will be returned. If this still won’t be enough, it is possible to create a compound example like this:

flight :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00'

I guess it’s quite intuitive how should this work.

We have finished to fix an enormous amount of bugs and tested the whole system thoroughly, so the overall reliability should be improved a lot as opposed to the previous releases.

If you have any comments, questions, suggestions, please visit the brand new forum!
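The matching rules described in the announcement (substring, regular expression, and compound examples) can be sketched in plain Ruby. This is only an illustration of the described behavior, not scrubyt's implementation: the helpers example_matches? and first_matching_text are hypothetical names, and the "elements" here are plain strings rather than parsed HTML nodes.

```ruby
# Hypothetical helper (not part of scrubyt): does the text satisfy the example?
def example_matches?(text, example)
  case example
  when String then text.include?(example)        # substring example
  when Regexp then !(text =~ example).nil?       # regexp example
  when Hash                                      # compound example
    example.all? do |kind, value|
      case kind
      when :begins_with then text.start_with?(value)
      when :ends_with   then text.end_with?(value)
      when :contains    then example_matches?(text, value)
      else raise ArgumentError, "unknown constraint: #{kind.inspect}"
      end
    end
  else
    raise ArgumentError, "unsupported example: #{example.inspect}"
  end
end

# Returns the first text that satisfies the example, or nil if none does.
def first_matching_text(texts, example)
  texts.find { |text| example_matches?(text, example) }
end

texts = [
  'Departure 2007 gate 4 at 18:30',
  'Arrival 2007 gate 7 at 20:00',
  'Arrival 2008 gate 2 at 20:00'
]

by_substring = first_matching_text(texts, 'gate 7')
by_regexp    = first_matching_text(texts, /\d{4}/)
by_compound  = first_matching_text(
  texts, :begins_with => 'Arrival', :contains => /\d{4}/, :ends_with => '20:00'
)

puts by_substring
puts by_regexp
puts by_compound
```

In each case the first candidate that satisfies every constraint wins, which mirrors the "first element that contains the string will be returned" rule from the announcement.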

Detecting common Ruby errors at definition time using ParseTree and method_added

Mark Aiken pointed out a really bad Ruby gotcha-- Ruby doesn't automatically call the base class initializer when you use inheritance, nor does it warn you if you forget to call super. This behavior is different from just about every other object oriented language, and can lead to partially initialized objects and really hard to diagnose errors. (In his case, he was subclassing ActiveRecord::Base.)
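A minimal sketch of the gotcha, using made-up class names (Account, SavingsAccount, and CheckedSavingsAccount are purely illustrative and have nothing to do with ActiveRecord):

```ruby
class Account
  attr_reader :entries

  def initialize
    @entries = []
  end
end

class SavingsAccount < Account
  attr_reader :rate

  def initialize(rate)
    # No call to super, so Account#initialize never runs
    # and @entries is never set.
    @rate = rate
  end
end

class CheckedSavingsAccount < Account
  attr_reader :rate

  def initialize(rate)
    super()  # empty parentheses: call Account#initialize with no arguments
    @rate = rate
  end
end

broken = SavingsAccount.new(0.02)
fixed  = CheckedSavingsAccount.new(0.02)

puts broken.entries.inspect  # prints nil
puts fixed.entries.inspect   # prints []
```

Ruby raises no error and prints no warning when SavingsAccount is defined; the problem only surfaces later, when something calls a method on the nil entries.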

Perhaps future versions of Ruby will fix this misfeature, but in the meanwhile Ruby is dynamic enough that it's possible to detect and warn specifically about such errors as soon as classes are loaded.

The following approach is really nothing more than a proof of concept, but hopefully it could be the basis for a real verifier that could be loaded in development mode to spot common errors. The idea is to write a verify_class function that uses ParseTree to convert a class into a set of sexps and then looks for common errors statically, like forgetting to call super in the initialize function of a class that has a superclass. (In addition, the verifier could potentially wrap methods to add additional runtime checks.) The method_added method is wrapped to call the verify_class method whenever a new method is added to the class.

RUBY:
require 'ruby2ruby'  # the original relies on this to make ParseTree available

class Class
  alias_method :method_added_orig, :method_added

  # Walks the sexps for clazz and raises if a subclass defines
  # initialize without calling super.
  def verify_class(clazz)
    for node in ParseTree.new.parse_tree(clazz)
      # Only check classes whose superclass is something other than Object.
      if node[0] == :class && node[2] != [:const, :Object]
        for method in node[3..-1]
          if method[0..1] == [:defn, :initialize]
            statements = method[2][1]
            # Only an exact [:zsuper] node (a bare super) counts as a call to super.
            if statements.is_a?(Array) &&
              !statements.detect {|s| s == [:zsuper]}
              raise "Error, initialize defined in subclass without call to super"
            end
          end
        end
      end
    end
    true
  end

  # Re-verify the class whenever initialize is defined or redefined.
  def method_added(p)
    method_added_orig(p)
    # Compare with ==; the original "p = ..." was an assignment and always true.
    if p.to_s == "initialize"
      puts "Verifying #{p}"
      verify_class(self)
    end
  end

end

The following irb session shows the verifier in action:

RUBY:
irb(main):002:0> class Foo; def initialize; end; end
Verifying initialize
=> nil
irb(main):003:0> class Good < Foo; def initialize; super; end; end
Verifying initialize
=> nil
irb(main):004:0> class Bad < Foo; def initialize;  end; end
Verifying initialize
RuntimeError: Error, initialize defined in subclass without call to super
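Where ParseTree is not available, the same static check can be sketched against source text using Ripper from the Ruby standard library. This swaps in a different technique from the one above: it parses a string of source code rather than a loaded class, and the helpers contains_super? and initializers_missing_super are hypothetical names introduced only for this illustration. It assumes the sexp layout produced by Ripper.sexp, where a class node is [:class, name, superclass, body] and a method is [:def, name, params, body].

```ruby
require 'ripper'

# True if the sexp contains a call to super anywhere inside it.
# Ripper emits :zsuper for a bare super and :super for super with arguments.
def contains_super?(node)
  return false unless node.is_a?(Array)
  return true if node[0] == :zsuper || node[0] == :super
  node.any? { |child| contains_super?(child) }
end

# Returns the names of classes in the source that declare a superclass
# and define initialize without calling super.
def initializers_missing_super(source)
  offenders = []
  visit = lambda do |node|
    next unless node.is_a?(Array)
    # node[2] is nil when the class has no explicit superclass.
    if node[0] == :class && node[2]
      class_name = node[1].flatten.grep(String).last
      statements = node[3][1]
      statements.each do |stmt|
        next unless stmt.is_a?(Array) && stmt[0] == :def && stmt[1][1] == 'initialize'
        offenders << class_name unless contains_super?(stmt[3])
      end
    end
    node.each { |child| visit.call(child) }
  end
  visit.call(Ripper.sexp(source))
  offenders
end

source = <<~RUBY
  class Foo; def initialize; end; end
  class Good < Foo; def initialize; super; end; end
  class Bad < Foo; def initialize; end; end
RUBY

missing = initializers_missing_super(source)
puts missing.inspect  # prints ["Bad"]
```

Unlike the method_added hook, this version has to be pointed at source files explicitly, so it is closer to a lint step than to a load-time check.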

Urbanspoon adds the Bay Area and Los Angeles

Ethan writes on the Urbanspoon blog:

We’ve just started covering two more very important cities - or depending on how you look at it, about a hundred new cities.

Urbanspoon Bay Area - from Marin in the North to San Jose in the South, Oakland in the East to San Francisco in the West, with a combined total of 13,528 restaurants.

Urbanspoon Los Angeles - including a swath of other cities stretching from San Fernando to Long Beach, with 18,833 restaurants.

Urbanspoon is a cool search engine for restaurant information and reviews, combining links to newspaper reviews and other online sources with user contributed reviews.

(The founders of Urbanspoon are good buddies of mine, as you might guess from several points of integration.)

Announcing Beyond411 v3.9 (aka Berry411)

Beyond411 v3.9 is now available for OTA install; you can upgrade using the "About/Upgrades" menu item.  The new features are a much cleaner look and feel and the restoration of the address book integration feature in a way that should be compatible across all phones.

Update: 3.91 fixes an autocompletion bug and is a recommended update.

Free TaxCut Premium Download

It's tax season again and that means... procrastination!

If you're like me and have procrastinated this far, you can download H&R Block Taxcut for free courtesy of Travelocity. (Some report that you need to be using IE rather than Firefox for the download to work.)

Sounds too good to be true but I've downloaded it and it actually works. DeductionPro is included as well.

GoogleBase ineffectiveness

Jason posts some interesting statistics on GoogleBase effectiveness (or lack thereof):

Since July 27, 2006 Jobster's customers have posted 12,197 jobs via Jobster's hiring tools onto Googlebase. Those 12,197 jobs have received a grand total of 124 candidates from Googlebase. (No, that is not a typo. 124 is the actual number). Only 13 of those jobs -- 13 out of 12,197 -- have ever received more than 1 candidate. And only 2 jobs have received more than 2 candidates. So, yes, there's plenty of room for improvement.
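To put those figures in proportion, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the variable names are mine.

```ruby
jobs                    = 12_197
candidates              = 124
jobs_with_more_than_one = 13

# Average number of candidates per 100 posted jobs.
candidates_per_100_jobs = (candidates.to_f / jobs * 100).round(2)

# Percentage of jobs that received more than one candidate.
share_more_than_one = (jobs_with_more_than_one.to_f / jobs * 100).round(3)

puts "Candidates per 100 jobs: #{candidates_per_100_jobs}"          # 1.02
puts "Jobs with more than one candidate: #{share_more_than_one}%"   # 0.107%
```

That works out to roughly one candidate for every hundred jobs posted.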

When GoogleBase came out, I had high hopes that it represented a shift in the way jobs were advertised on the internet, though I never went as far as some did in imagining it to be an eBay killer.

Why hasn't GoogleBase achieved more traction, even with Google linking directly to it from their search results? I think part of it is that Google directs all traffic to a geeky, one-size-fits-all UI that isn't particularly appealing or optimized for any particular vertical. (The UI certainly isn't up to the level of polish of other Google applications like GMail.)

They would be better off focusing on the core data platform and partnering with specialists in particular verticals to design a great user experience.

Red Mill Burgers makes the Wall Street Journal

In this Saturday's WSJ, Red Mill Burgers was highlighted along with eight other restaurants in a nationwide survey of the best burgers.

A bunch of us make a weekly pilgrimage to the Interbay branch and love it- I'm a particular fan of the Verde Veg - so I was happy to see the acclaim. 

Digital Face Beautification

Digital Face Beautification, from SIGGRAPH 2006, presents fully automatic algorithms for face beautification.  (Via John

It says something that a machine can so easily capture our notions of physical beauty, and that beauty in music and language is so much harder to capture.

My First Experiment with Google Co-Op

Google Co-Op is quite interesting. I put together a simple career advice search engine on top of it.  The page also shows the ability to brand the results and embed them in a site of your choice.

The sites searched include Jobster career community, Wetfeet Guides, the Fortune Top 100 best places to work, Hoover's, and others.

Learning from your users

Jobster just launched a very early version of our Career Center on Facebook. Shipping early and incorporating what we learn from early adopters is going to help us make the final version much stronger.

I'm impressed by the intelligence of the suggestions Facebook members have posted on the discussion board.

This prescient suggestion is from Neil Cauldwell, a consultant and student in Leeds, UK:

I'd recommend implementing contextual ads for the featured partners on the front page. For example, I'm in the UK, yet I'm seeing ads for the NYPD career opportunities. Why not make these more relevant? You'll get a much wider adoption of Jobster profiles through Facebook if visitors to the CC see jobs that match them. Just look at what Amazon are doing in regards to matching content to users.

Why not even create an algorithm that pulls up a job advert for each visitor based on their friend connections and networks. Say for example I have 50 friends who work for Royal Dutch Shell Plc. If I've just visited some Facebook friends in this network, the CC could track this and pull up the jobs which would keep me in touch with my friends.

This suggestion comes from Gwen, who runs a campus career center.

Hi! I had a quick question. While encouraging students to use Facebook to network for jobs is great, we also want them to use the resources close at hand (On campus interviewing and research opportunities, for example.)

How can we make our campus resources visible to our students?
Also, who are the people in the Career Center group? Mostly recruiters? Students? Other?

It's great to start developing a dialog with the Facebook community so that we can provide value both to them and our employer customers.