
Ferret 0.11.8

Posted on August 06, 2010

Just in case you didn’t notice: Ferret development has continued over the last year.

Unfortunately, no new gem has been released for a while, so in case you want to give it a try, here’s how to build and install your very own Ferret 0.11.8.1:

For me the new version fixed some rare crashes occurring when running Ferret along with acts_as_ferret under Ruby Enterprise Edition 1.8.7. Judging by the commit messages, this version also looks ready to compile and run with Ruby 1.9.

If you’re feeling lazy you can also download the Ferret 0.11.8.1 gem from this blog.

Just to prove I’m not dead

Posted on February 18, 2010

I moved the ActsAsFerret wiki and issue tracker to GitHub and Lighthouse.

It’s not that I don’t like Redmine any more, but unfortunately it kept bringing the whole server to a halt every few weeks. I could probably have fixed that somehow, but I feel that hosting Redmine isn’t one of my core competencies…

Update: Here’s the script I used to import the existing Redmine tickets into Lighthouse.

Passenger versus safe_erb

Posted on November 07, 2008

Today I had some fun tracking down a weird problem with safe_erb. While everything worked fine with Mongrel in development mode, safe_erb complained about tainted strings in every link generated by Rails’ link_to and URL helpers when running on mod_rails in production mode.

Some digging around led me to the root of the problem: in production my app lives inside a subdirectory, so I used Passenger’s RailsBaseURI directive to tell it so. For some reason, the value configured this way ends up tainted in AbstractRequest’s relative_url_root, which in turn taints every URL generated by Rails.

Solution


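class ActionController::AbstractRequest
  # Wrap the original relative_url_root and untaint the value when it is
  # blank or a single path segment like "/myapp". \A and \z anchor the
  # whole string; ^ and $ would only anchor a single line.
  def relative_url_root_with_untaint
    relroot = relative_url_root_without_untaint
    relroot.untaint if relroot.blank? || relroot =~ /\A\/[a-zA-Z0-9]*\z/
    relroot
  end
  alias_method_chain :relative_url_root, :untaint
end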

This untaints the relative_url_root value if it is blank or matches the regexp. Put it into application.rb, or any other file that is loaded during application startup, to fix the problem. I’m still not sure whether this is a bug and, if so, whose bug it is: should mod_rails untaint this value in the first place (if that is possible at all), or is it a bug in Rails not escaping something somewhere?
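The anchors in that regexp are worth a second look. In Ruby, `^` and `$` match at line boundaries, so a value with an embedded newline can pass a line-anchored check, while `\A` and `\z` anchor the whole string. The plain-Ruby sketch below demonstrates this outside of Rails; the constant and method names are made up for illustration, and `blank?` is approximated with `nil?`/`empty?` because ActiveSupport isn't loaded.

```ruby
# Hypothetical names, for illustration only.
LOOSE_ROOT  = /^\/[a-zA-Z0-9]*$/    # ^ and $ anchor at line boundaries
STRICT_ROOT = /\A\/[a-zA-Z0-9]*\z/  # \A and \z anchor the whole string

# Plain-Ruby approximation of the check: nil or "" count as blank here
# (ActiveSupport's blank? would also accept whitespace-only strings).
def safe_relative_url_root?(root)
  root.nil? || root.empty? || !(root =~ STRICT_ROOT).nil?
end

evil = "/app\n<script>"
p(evil =~ LOOSE_ROOT)                # 0: the first line matches
p(evil =~ STRICT_ROOT)               # nil: the whole string does not
p safe_relative_url_root?("/myapp")  # true
```

Note that both variants only accept a single path segment, so a nested root like "/my/app" would stay tainted.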

The fact that URLs used with Rails’ form helpers didn’t trigger safe_erb errors, while those passed to link_to did, makes me think that there’s at least some inconsistency in the way Rails’ helpers treat URLs…

Get a new bike, powered by Rails

Posted on June 25, 2008

Working with webit!, I recently built the dynamic parts of the new website of Fahrrad-XXL, a fairly large group of bike dealers here in Germany. Besides the product catalog, which is maintained via a separate Rails app, there’s also lots of static content managed with Bricolage, a Perl CMS that generates static HTML pages.

I originally intended to name this post Find a new bike, powered by Ferret, because the full-text search is one of the coolest features of the site. But maybe I’m a bit biased here ;-). In fact, I’ll save the Ferret stuff for a later post and instead tell you about another interesting aspect of this project.

Integration of CMS driven static html with Rails

As long as the static pages are completely static, that’s no big deal, and we’ve done it in other projects before: just have the CMS generate your Rails layouts to ensure visual consistency across the site, and let it publish its files into public/. But what do you do if your static pages aren’t really that static?

At www.fahrrad-xxl.de you have a watch list where you can save bikes you want to revisit later on. The number of bikes currently on this list is shown on every page, regardless of whether the page stems from the CMS or comes out of the Rails application.

So how can you do this? The first solution that comes to mind is of course to embed some ERb that renders the watch list status, and to pipe each and every page through Rails. This might have worked, but why take on the whole overhead of Rails, instead of having the web server serve the file directly, just for that tiny little number?

I wanted to do better, and remembered a blog post about nginx server side includes (SSI) that I had stumbled across a while ago. The SSI feature of nginx is really cool because it lets you include dynamically generated content, retrieved via HTTP, anywhere in your page.

So we built an action that renders just the tiny snippet showing the status of your watch list, and put an SSI include directive pointing to this action on every page.

Before serving such a page to the user, nginx parses it for SSI directives, retrieves the content from the location each directive specifies, and replaces the directive with that content.
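To make the mechanism concrete, here is a toy Ruby model of what nginx does with an `include virtual` directive (SSI processing itself has to be switched on in nginx with `ssi on;`). It is only an illustration: nginx performs the expansion itself via subrequests, the `/watch_list/status` URI is a made-up example, and the `fetch` lambda stands in for the HTTP request to the Rails action.

```ruby
# Toy model of nginx SSI "include virtual" expansion (illustration only).
SSI_INCLUDE = /<!--#\s*include\s+virtual="([^"]+)"\s*-->/

# Replace every include directive with whatever fetch returns for its URI.
def expand_ssi(page, fetch)
  page.gsub(SSI_INCLUDE) { fetch.call(Regexp.last_match(1)) }
end

# A "static" page as the CMS might publish it; the URI is hypothetical.
page = <<~HTML
  <div id="header">
    Watch list: <!--# include virtual="/watch_list/status" -->
  </div>
HTML

# Stand-in for the tiny Rails action that renders the watch list status.
fetch = ->(uri) { uri == "/watch_list/status" ? "3 bikes" : "" }

puts expand_ssi(page, fetch)
# <div id="header">
#   Watch list: 3 bikes
# </div>
```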

The good

So what have we gained? It’s still one request to the Rails application per page, but a substantially faster one, because it only needs to output a few bytes of text instead of the whole page.

In fact, we started using nginx’s SSI in several other places, too, which revealed another benefit of this approach: a page containing multiple SSI directives results in multiple concurrent requests hitting the Rails app. That’s a good thing, because splitting the rendering of a page into several smaller tasks allows them to be distributed across multiple CPU cores and/or physical servers.

Having pages composed of multiple small snippets also eases caching: deciding whether a particular piece of content is eligible for caching, and when it has to be expired, is far easier for a small and focused snippet than for a whole page potentially containing data with different life cycles. In our case, for example, we could easily page cache the watch list status snippet and expire it once the user modifies his watch list. Or, even better, use memcached to store the rendered snippet and have nginx retrieve it directly from there, as shown in the blog post I mentioned above.
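The caching argument can be sketched in a few lines of plain Ruby. The `SnippetCache` class below is a hypothetical in-memory stand-in for page caching or memcached, and the watch list data is made up; it only illustrates why a small snippet with a single, clear expiry trigger is easy to cache.

```ruby
# Hypothetical in-memory stand-in for a snippet cache.
class SnippetCache
  def initialize
    @store = {}
  end

  # Return the cached snippet, rendering and storing it on a miss.
  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end

  # Drop the snippet so the next request re-renders it.
  def expire(key)
    @store.delete(key)
  end
end

cache = SnippetCache.new
watch_list = { "user-1" => [101, 102] }   # hypothetical watch list data
render = -> { "#{watch_list["user-1"].size} bikes" }

cache.fetch("watch_list/user-1") { render.call }  # miss: renders "2 bikes"
watch_list["user-1"] << 103                        # user modifies the list...
cache.fetch("watch_list/user-1") { render.call }  # ...hit: still "2 bikes"
cache.expire("watch_list/user-1")                  # so expire on modification
cache.fetch("watch_list/user-1") { render.call }  # miss: renders "3 bikes"
```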

The ugly

There are some things you should be aware of when trying this approach out:

No setting of cookies

Actions that render content included via SSI cannot set cookies, because their response is received by nginx, not by the client. And since it wouldn’t make any sense for nginx to merge headers from multiple SSI responses into the single response that gets sent back to the client, it silently drops all those headers. So, no cookies, and especially no modification of session data if you use the cookie-based session store. Which leads us to the next point:

Don’t use Rails’ stock session stores

At least not if you intend to have multiple SSI directives on a single page and the corresponding actions are session-aware (i.e. you don’t call session :off for them). After processing a request, Rails by default writes the session data back to the configured session store, even if you didn’t touch the session at all. That doesn’t hurt much as long as there are no concurrent requests for the same session, or none of those requests modifies the session state.

With nginx and SSI you will have concurrent requests for the same session. If at least one of them changes session data, there’s a good chance the change gets lost: another request that finishes a bit later overwrites the updated session with stale data it read from the session store before the first request saved its change. Fun stuff, it cost me a day to debug ;-)

To get around this issue we’re using the SmartSessionStore plugin. You can find out more about the issue and the plugin in this fine blog post at texperts.com.
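The lost update can be reproduced without Rails or nginx. The simulation below is a sketch under simplifying assumptions: a Hash plays the session store, and the two "requests" are interleaved by hand rather than run concurrently. It also shows one possible mitigation, writing back only the keys a request actually changed; this illustrates the general idea and is not a claim about how SmartSessionStore is implemented.

```ruby
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Naive strategy: write the whole session back, whether it changed or not.
def write_whole(store, sid, _original, current)
  store[sid] = current
end

# Mitigation sketch: write back only the keys this request changed.
def write_changed(store, sid, original, current)
  changed = current.reject { |key, value| original[key] == value }
  store[sid] = store[sid].merge(changed)
end

def simulate(strategy)
  store = { "sid" => { watch_list: [] } }
  # Both requests load the session before either one writes it back.
  orig_a = deep_copy(store["sid"])
  orig_b = deep_copy(store["sid"])
  sess_a = deep_copy(orig_a)
  sess_b = deep_copy(orig_b)
  sess_a[:watch_list] << 42                     # request A modifies its copy
  send(strategy, store, "sid", orig_a, sess_a)  # A finishes first
  send(strategy, store, "sid", orig_b, sess_b)  # B finishes later
  store["sid"][:watch_list]
end

p simulate(:write_whole)    # [] : A's change is overwritten by B
p simulate(:write_changed)  # [42] : A's change survives
```

The merge approach is still not bulletproof: two requests changing the same key will race, and deleted keys are not propagated.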

Acts_as_ferret moved to Git

Posted on May 17, 2008

I’ve been using git-svn with nearly all my svn-based projects for a while now, so moving acts_as_ferret to git was the logical step now that Rubyforge has been offering git hosting for a while.

So now you can use

git clone git://rubyforge.org/actsasferret.git

to get your local copy of acts_as_ferret.

I plan to keep the svn repository’s trunk in sync with the master branch, so svn users aren’t left out in the cold.

Saxon government's press releases now powered by JRuby on Rails

Posted on April 07, 2008

Last week the Medienservice, the platform through which the Saxon government publishes its press releases to journalists and to the public, was relaunched. It now runs on a cluster of JBoss servers that are part of the official Saxon e-government platform. While the public web frontend might look like just another blog-like application to you, I assure you that what happens in the background is anything but simple: deferred publishing, publishing press releases only to subscribed journalists, and sending out press releases in four different formats including PDF and XML, to name only a few.

As far as I know, this is the first public German JRuby on Rails application, one more reason for me to be proud of being part of the team at webit! that built this baby.

RDig 0.3.5

Posted on February 26, 2008

RDig is a tiny web and file system crawler built on top of the Ferret search engine. It’s one of my less active side projects and, from what I can tell, doesn’t have a very large user base. However, there are some people out there who actually use it, and some of them even tell me so and suggest new features from time to time :-)

Limit crawling depth

You can now configure a maximum crawling depth to restrict RDig to indexing pages up to that level. For example, setting config.crawler.max_depth = 1 makes RDig index only the configured start pages and the pages they link to directly. You get the picture, I guess.

This option is especially useful if restricting RDig to a pre-defined number of hosts is not an option for your use case, but you still don’t intend to have it crawl the whole web.
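The depth semantics can be sketched as a breadth-first traversal in which start pages have depth 0 and links are only followed from pages below the limit. This is a hypothetical illustration over an in-memory link graph (`LINKS` is made-up sample data), not RDig's actual crawler code, which fetches pages over HTTP or from the file system.

```ruby
# Made-up link graph: page => pages it links to.
LINKS = {
  "/"      => ["/a", "/b"],
  "/a"     => ["/a/1"],
  "/b"     => ["/"],
  "/a/1"   => ["/a/1/x"],
  "/a/1/x" => []
}

# Breadth-first crawl; returns the pages that would be indexed.
def crawl(start_pages, max_depth, links = LINKS)
  seen  = {}
  queue = start_pages.map { |url| [url, 0] }
  until queue.empty?
    url, depth = queue.shift
    next if seen.key?(url)
    seen[url] = depth
    next if depth >= max_depth   # index this page, but don't follow its links
    links.fetch(url, []).each { |link| queue << [link, depth + 1] }
  end
  seen.keys
end

p crawl(["/"], 1)  # ["/", "/a", "/b"]
```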

HTTP proxy auth support

If you are behind a proxy that requires HTTP Basic Authentication, you can specify the proxy URL, user name and password:

cfg.crawler.http_proxy = "http://yourproxy:8080"
cfg.crawler.http_proxy_user = "username"
cfg.crawler.http_proxy_pass = "secret"
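For reference, Ruby's standard library accepts the same three pieces of information when creating a Net::HTTP connection through a proxy. The helper below is a hypothetical sketch of that mapping, not how RDig handles these settings internally; no connection is opened until a request is actually made.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a Net::HTTP object that goes through an
# authenticating proxy, using settings like the ones above.
def http_via_proxy(target_url, proxy_url, proxy_user, proxy_pass)
  target = URI.parse(target_url)
  proxy  = URI.parse(proxy_url)
  Net::HTTP.new(target.host, target.port,
                proxy.host, proxy.port, proxy_user, proxy_pass)
end

http = http_via_proxy("http://example.com/", "http://yourproxy:8080",
                      "username", "secret")
# http.get("/") would now send the request via yourproxy:8080.
```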

Under the hood

I put some work into refactoring parts of RDig in order to make integration with acts_as_ferret easier. I’ll write more about that in another post.

Get it!

RDig is available as a gem via Rubyforge.

Regexps on steroids with Ruby 1.8.x

Posted on January 27, 2008

Ruby 1.9 comes with a powerful new regular expression engine called Oniguruma. It sports better handling of UTF-8 encoded content, plus goodies like positive and negative look-behind and named groups. Here’s a good overview of these and some more of Oniguruma’s new features.

There are two ways to get Oniguruma into a pre-1.9 Ruby: you can patch the Ruby source tree with Oniguruma and build your own Ruby, or use the Oniguruma gem, which makes it fairly easy to use the new-style regular expressions in any Ruby 1.8.x project. Here’s how to do the latter:

$ wget http://www.geocities.jp/kosako3/oniguruma/archive/onig-4.7.1.tar.gz
$ tar xzf onig-4.7.1.tar.gz
$ cd onig-4.7.1
$ ./configure --prefix=/usr
$ make
$ sudo make install
$ sudo gem install oniguruma

Note the prefix argument in the call to configure: it should point to the location of your current Ruby installation. So if your ruby executable is located in /usr/bin, you’ll have to use /usr here, as shown above.

If everything went well so far, try it out in irb:

require 'rubygems'
require 'oniguruma'
reg = Oniguruma::ORegexp.new '(?<before>.*)(a)(?<after>.*)'
match = reg.match( 'terraforming' )
puts match[0]         # prints terraforming
puts match[:before]   # prints terr
puts match[:after]    # prints forming

The downside of not having Oniguruma patched into a self-compiled version of Ruby is that something like

'terraforming' =~ /(?<before>.*)(a)(?<after>.*)/

won’t work, because it will be handled by your Ruby version’s built-in regexp engine.
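For comparison, on Ruby 1.9 and later the built-in engine understands this syntax natively. The two helper methods below (names made up for illustration) show named groups and a positive look-behind. When a regexp literal with named groups sits on the left-hand side of `=~`, Ruby also assigns the captures to local variables of the same name.

```ruby
# Ruby 1.9+ only: named groups in a built-in Regexp literal.
def split_at_a(word)
  # With the literal on the left, =~ creates the locals before and after.
  return nil unless /(?<before>.*)(a)(?<after>.*)/ =~ word
  [before, after]
end

# Positive look-behind: digits directly preceded by a dollar sign.
def dollar_amounts(text)
  text.scan(/(?<=\$)\d+/)
end

p split_at_a("terraforming")             # ["terr", "forming"]
p dollar_amounts("costs $42 or 17 EUR")  # ["42"]
```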

Encrypted root and swap plus suspend to disk with Gutsy

Posted on January 24, 2008

To give my trusty T42 a slight speed-up, I decided to replace the built-in 5400 rpm hdd with something faster. I went with Seagate’s ST910021A, which seems to be a great choice from what I can tell so far: it’s noticeably faster, and despite its 7200 rpm it’s nearly as quiet as the original 80GB disk from Hitachi.

But I digress. Initially I just wanted to copy over all my stuff and be done with it, but then I took the chance to do a fresh install of Ubuntu so I could try out the hard disk encryption setup introduced in the alternate installer of 7.10. Until then I only had an encrypted /home, which was pretty useless: most of the time my notebook isn’t shut down but hibernated, so I never needed to type in my passphrase upon resume…

Grails project begging for attention

Posted on January 14, 2008

Sorry, but I can’t think of any other reason why Graeme Rocher might write such crap.

Among the points he makes to convince his readers to choose Grails over Rails, at most two or three are somewhat reasonable, namely those dealing with integrating your application with external J2EE-based services. I completely agree that these are valid points when comparing Grails running inside a fully-fledged J2EE container to Rails running in, say, Mongrel. But since Rails runs fine in a J2EE environment as well, that’s an unfair and misleading comparison.

Using container-managed database connections via JNDI in a JRuby on Rails web app is no problem at all, neither is using Quartz to schedule Rails background jobs, just to name a few examples. I don’t see anything stopping people from using the whole range of J2EE container features in their JRuby/Rails applications once they need to do so.

The whole ‘Grails is more enterprisey than Rails’ argument falls apart once you stop comparing apples to oranges and slap JRuby + Rails onto that damn app server.

Grails 1.0 coming out within the month

Uh cool, yeah. Until now I thought statements like this were more a specialty of closed-source vendors trying to convince their potential clients not to check out the competition. Looks like they can’t wait to finally attach that decision-maker-friendly 1.0 label to Grails ;-)

Anyway, I feel we’re all going to have an interesting time in the future watching how the competition between JRuby/Rails and Groovy/Grails goes on. After all, competition tends to lead to better products in the end.