You are currently browsing the Bogle’s Blog weblog archives for the day Thursday, November 16th, 2006.

Ruby Module for Searching using the Alexa Web Services API

This post includes an improved version of the query example in Ruby for hitting the Alexa Web Search web service. Sample usage is as follows.

require 'alexa'
Alexa.search('bogle').each {|hit| puts hit.title}

For convenience, you might want to edit the code below to replace INSERT_YOUR_ACCESS_KEY_HERE with your actual access key and INSERT_YOUR_SECRET_ACCESS_KEY with your secret access key; otherwise these will need to be passed in as the AWSAccessKeyId and SecretAccessKey options.


Ruby Threading

Ruby threads are useful, but it’s important to understand the limitations and pitfalls of the current implementation. This post is an attempt to pull together some of the key information from various places on the web that helped me get up to speed.

Limitations of Ruby Threads

As explained in the Rubyspec wiki, all Ruby threads are serviced by a single native OS thread. The VM schedules threads by timeslicing at well-defined points in the VM. This means that a single misbehaving Ruby thread could starve all other Ruby threads (although the timeslicing points are many and hard to avoid), and that Ruby threads don’t take advantage of multiple processors or scale to high-end hardware.

For this reason, applications that care about scale typically create multiple Ruby processes. For instance, a web server might spawn multiple FastCGI worker processes rather than relying upon a single multithreaded server, as might happen in Java.
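
To make the multi-process idea concrete, here is a minimal sketch that forks one child process per work item and reads each result back over a pipe. The parallel_map helper is a hypothetical name of my own, not part of any library, and fork is only available on Unix-like platforms.

```ruby
# Hypothetical helper: runs the block once per item, each in its own
# forked Ruby process, and returns the results in the original order.
def parallel_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close                           # the child only writes
      writer.write(Marshal.dump(yield(item)))
      writer.close
      exit!(0)                               # skip at_exit handlers in the child
    end
    writer.close                             # the parent only reads
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)       # read until the child closes its end
    reader.close
    Process.wait(pid)                        # reap the child process
    result
  end
end

squares = parallel_map([1, 2, 3]) { |x| x * x }
puts squares.inspect
```

Each child is a separate interpreter, so the operating system is free to schedule the children on different processors. The cost is that results must be serialized to cross the process boundary, which is what Marshal does here.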

Using Threads with ActiveRecord

If you want to use ActiveRecord from multiple threads, you must include the following call in your code, as described here:

ActiveRecord::Base.allow_concurrency = true

Exceptions in Threads

As with other languages, be careful with unhandled exceptions raised within threads. The default behavior is to silently kill the thread that caused the exception. The Threads and Exceptions section has helpful background. When I started coding a multithreaded crawler, I failed to handle an exception, and my threads died one by one, leaving me scratching my head about what had happened to them.
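
One way to avoid losing threads like this is to rescue inside each thread and record the failure somewhere the main thread can see it. The sketch below is only an illustration: double_all and its inputs are made up for this example, with the string 'bad' standing in for an item that cannot be processed.

```ruby
require 'thread'

# Hypothetical helper: doubles each item in its own thread and returns
# [results, errors], so a failing item is reported instead of vanishing.
def double_all(items)
  results = Queue.new   # thread-safe collection of successes
  errors  = Queue.new   # thread-safe collection of failures

  threads = items.map do |item|
    Thread.new(item) do |x|
      begin
        results << Integer(x) * 2          # Integer('bad') raises ArgumentError
      rescue StandardError => e
        errors << [x, e.class]             # record the failure, keep going
      end
    end
  end

  threads.each(&:join)                      # wait for every worker to finish
  [Array.new(results.size) { results.pop },
   Array.new(errors.size)  { errors.pop }]
end

ok, failed = double_all([1, 2, 'bad', 4])
puts "#{ok.size} succeeded, #{failed.size} failed: #{failed.inspect}"
```

Joining a thread is also worth doing for its own sake: Thread#join re-raises an exception that killed the thread, so a failure surfaces at the join call rather than disappearing.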

If you set Thread.abort_on_exception = true, an unhandled exception in any thread is re-raised in the main thread, which normally terminates the whole program and every other thread with it. This makes unhandled exceptions really obvious and can be helpful in debugging.
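
A small sketch of what this setting does in practice: with it turned on, the failure in the background thread interrupts the main thread, where it can be observed or simply left to stop the program. The method name main_thread_sees_failure? is hypothetical, and the two-second sleep is just a generous window for the exception to arrive.

```ruby
# Hypothetical helper: returns true if an exception raised in a background
# thread shows up in the main thread while abort_on_exception is enabled.
def main_thread_sees_failure?
  previous = Thread.abort_on_exception
  Thread.abort_on_exception = true
  begin
    Thread.new { raise 'boom' }   # unhandled exception in a worker thread
    sleep 2                        # the re-raised exception interrupts this sleep
    false                          # only reached if nothing was raised
  rescue RuntimeError => e
    e.message == 'boom'
  ensure
    Thread.abort_on_exception = previous   # restore the earlier setting
  end
end

puts main_thread_sees_failure?
```

In real code you would usually not rescue it at all while debugging, so that the program stops at the first unhandled failure.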