
Passenger versus safe_erb

Posted on November 07, 2008

Today I had some fun tracking down a weird problem with safe_erb. While everything worked fine running Mongrel in development mode, safe_erb complained about outputting tainted strings for every link generated by Rails’ link_to and URL helpers running on mod_rails in production mode.

Some digging around led me to the root of the problem - in production my app needs to live inside a subdirectory, so I used Passenger’s RailsBaseURI directive to tell it so. The value configured this way ends up tainted in AbstractRequest’s relative_url_root for some reason, which in turn makes every URL generated by Rails tainted.

Solution


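class ActionController::AbstractRequest
  # Untaint the configured base URI, but only if it looks harmless.
  def relative_url_root_with_untaint
    relroot = relative_url_root_without_untaint
    # \A and \z anchor the whole string; ^ and $ would also accept one matching line of a multi-line value.
    relroot.untaint if relroot.blank? || relroot =~ /\A\/[a-zA-Z0-9]*\z/
    relroot
  end
  alias_method_chain :relative_url_root, :untaint
end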

This untaints the relative_url_root value if it matches the regexp. Place into application.rb or some file that is required during application startup to fix the problem. I’m still not sure whether this is a bug and if so, whose bug it is - should (if possible at all) mod_rails untaint this value in the first place, or is it a bug with Rails not escaping something somewhere?

The fact that URLs used with Rails’ form helpers didn’t yield safe_erb errors, but those supplied to link_to did makes me think that there’s at least some inconsistency in the way URLs are treated by Rails’ helpers…

Get a new bike, powered by Rails

Posted on June 25, 2008

Working with webit! I recently built the dynamic parts of the new website of Fahrrad-XXL, a quite large group of bike dealers here in Germany. Besides the product catalog, which is maintained via a separate Rails app, there’s also lots of static content which is managed with the help of Bricolage, a Perl CMS generating static html pages.

I originally intended to name this post Find a new bike, powered by Ferret, because the full text search is one of the coolest features of the site. But maybe I’m a bit biased here ;-). In fact, I’ll delay the Ferret stuff to a later post and instead tell you about another interesting aspect of this project.

Integration of CMS driven static html with Rails

As long as the static pages are completely static that’s no big deal, and we already did that in other projects: just have the CMS generate your Rails layouts to ensure visual consistency across the site, and let it publish its files into public/. But what do you do if your static pages aren’t really that static?

At www.fahrrad-xxl.de you have a watch list where you can remember bikes you want to revisit later on. The number of bikes currently on this list is shown on every page, regardless of whether it stems from the CMS or comes out of the Rails application.

So how can you do this? The first solution that comes to mind is of course to embed some ERb rendering the watch list status, and pipe each and every page through Rails. This might have worked but why take on the whole overhead of Rails compared to serving the file directly through the web server just for that tiny little number?

I wanted to do better, and remembered a blog post talking about nginx server side includes I stumbled across a while ago. The SSI feature of nginx is really cool because it allows you to include dynamically generated content retrieved via HTTP somewhere in your page.

So what we did was build an action that renders just that tiny snippet showing the status of your watch list, and put an SSI include directive pointing to this action on every page.

Before serving such a page to the user, nginx parses it for SSI directives, retrieves the content from the location specified in each one, and replaces the directive with that content.
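To make the mechanics concrete, here’s a toy Ruby model of that substitution step. It’s only a sketch: the directive syntax is nginx’s, but the /watch_list/status path, expand_ssi and BACKEND are made-up names, and the real work of course happens inside nginx via HTTP subrequests, not in Ruby.

```ruby
# Toy model of SSI include expansion. The directive syntax is nginx's;
# the path and the BACKEND lambda are hypothetical, for illustration only.
SSI_INCLUDE = /<!--#\s*include\s+virtual="([^"]+)"\s*-->/

# Stand-in for the Rails action that renders the watch list status.
BACKEND = lambda do |path|
  path == '/watch_list/status' ? '3 bikes on your watch list' : ''
end

STATIC_PAGE = <<~HTML
  <h1>Mountain bikes</h1>
  <div id="watchlist"><!--# include virtual="/watch_list/status" --></div>
HTML

# Replaces every include directive in page with whatever fetch returns
# for the directive's path (nginx would issue a subrequest here).
def expand_ssi(page, fetch)
  page.gsub(SSI_INCLUDE) { fetch.call(Regexp.last_match(1)) }
end

puts expand_ssi(STATIC_PAGE, BACKEND)
```

The static page stays a plain file on disk; only the part inside the directive ever touches the application.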

The good

So what have we gained? It’s still one request to the Rails application per page, but it’s one that’s substantially faster because it only needs to output a few bytes of text instead of the whole page.

In fact we started using nginx’ SSI in several other places, too, which yielded another benefit of this approach: with pages containing multiple SSI directives you will experience multiple concurrent requests hitting the Rails app, which is a good thing because by splitting the task of rendering a page into several smaller tasks these may be distributed across multiple CPU cores and/or physical servers.

Having pages composed of multiple small snippets also eases caching: the decision whether a particular piece of content is eligible for caching and when it has to be expired is far easier to make for a small and focused snippet, than for a whole page potentially containing data with different life cycles. For example in our case we could easily page cache the watch list status snippet and expire it once the user modifies his watch list. Or, even better, use memcached to store the rendered snippet and have nginx retrieve it directly from there as shown in the blog post I mentioned above.

The ugly

There are some things you should be aware of when trying this approach out:

No setting of cookies

Your actions rendering stuff included via SSI cannot set cookies because the response you send is received by nginx, and not by the client. And since it wouldn’t make any sense for nginx to try and merge headers from multiple SSI responses into the single response that gets sent back to the client, it silently drops all those headers. So, no cookies, and especially no modification of session data if you use the cookie based session store. Which leads us to the next point:

Don’t use Rails’ stock session stores

At least if you intend to have multiple SSI-directives on a single page and the corresponding actions are session-aware (i.e. you don’t call session :off for them). After processing a request Rails by default writes back the session data to the configured session store, even if you didn’t touch the session at all. That doesn’t hurt much as long as you don’t have concurrent requests for the same session, or none of those requests modifies the session state.

With nginx and SSI you will have concurrent requests for the same session, so now if at least one of these requests changes session data, there’s a good chance it won’t end up stored correctly just because another request finished a bit later, overwriting the updated session data with stale data read from the session store before the change has been saved by the first request. Fun stuff, cost me a day to debug ;-)
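The race is easier to see when replayed step by step. The following is a deterministic sketch, not Rails’ actual session code: SessionStore and replay_race are hypothetical names, and the store simply saves the whole session on every request, which is the behaviour described above.

```ruby
# Hypothetical stand-in for a session store that always writes back the
# complete session at the end of a request.
class SessionStore
  def initialize
    @data = {}
  end

  # Every request works on its own deep copy, as if freshly unmarshalled.
  def load(sid)
    Marshal.load(Marshal.dump(@data.fetch(sid, {})))
  end

  def save(sid, session)
    @data[sid] = session
  end
end

def replay_race(store, sid)
  store.save(sid, { 'watch_list' => [] })

  # Two SSI subrequests start together and load the same state.
  request_a = store.load(sid)
  request_b = store.load(sid)

  # Request A adds a bike and finishes first.
  request_a['watch_list'] << 'bike-42'
  store.save(sid, request_a)

  # Request B never touched the session, but still writes back its stale copy.
  store.save(sid, request_b)

  store.load(sid)
end

final = replay_race(SessionStore.new, 'abc')
puts "watch list after both requests: #{final['watch_list'].inspect}"
```

The bike added by request A is gone at the end, although no request ever removed it. Writing back only what a request actually changed is one way to avoid this.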

To get around this issue we’re using the SmartSessionStore plugin. You can find out more about the issue and the plugin in this fine blog post at texperts.com.

Acts_as_ferret moved to Git

Posted on May 17, 2008

I’ve been using git-svn for a while now with nearly all my svn-based projects, so this was a logical step when Rubyforge started offering git hosting a while ago.

So now you can use

git clone git://rubyforge.org/actsasferret.git

to get your local copy of acts_as_ferret.

I plan to keep the svn repository’s trunk in sync with the master branch, so svn users aren’t left out in the cold.

Saxon government's press releases now powered by JRuby on Rails

Posted on April 07, 2008

Last week the Medienservice, the platform through which the Saxon government publishes its press releases to journalists and the public, was relaunched. It now runs on a cluster of JBoss servers that are part of the official Saxon e-government platform. While the public web frontend might look like just another blog-like application to you, I assure you that what happens in the background is anything but simple - there’s a lot going on, like deferred publishing, publishing press releases only to subscribed journalists, and sending out press releases in four different formats including PDF and XML, to name only a few.

As far as I know this is the first public German JRuby on Rails application - one more reason for me to be proud of being part of the team at webit! that built this baby.

RDig 0.3.5

Posted on February 26, 2008

RDig is a tiny web and file system crawler built on top of the Ferret search engine. It’s one of my less active side projects and from what I can tell doesn’t have a very large user base. However there are some people out there who actually use it, and some of those people even tell me so and suggest new features from time to time :-)

Limit crawling depth

You can now configure a maximum crawling depth to restrict RDig to only index pages up to this level. For example, setting config.crawler.max_depth = 1 will make RDig only index the configured start pages, and pages the start pages directly link to. You get the picture I guess.

This option is especially useful if restricting RDig to a pre-defined number of hosts is not an option for your use case, but you still don’t intend to have it crawl the whole web.
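In case the picture isn’t that clear after all, here’s a small sketch of depth-limited crawling over an in-memory link graph. This is not RDig’s code; LINKS and crawl are made-up names, and start pages are assumed to sit at depth 0.

```ruby
# Hypothetical link graph: page name => pages it links to.
LINKS = {
  'start' => ['a', 'b'],
  'a'     => ['a1'],
  'b'     => ['start', 'b1'],
  'a1'    => ['deep']
}.freeze

# Breadth-first traversal that indexes pages up to max_depth and never
# follows links found on pages that are already at max_depth.
def crawl(start_pages, max_depth, links = LINKS)
  seen = {}
  queue = start_pages.map { |page| [page, 0] }
  until queue.empty?
    page, depth = queue.shift
    next if seen.key?(page)      # already indexed via a shorter path
    seen[page] = depth
    next if depth >= max_depth   # index this page, but stop here
    links.fetch(page, []).each { |target| queue << [target, depth + 1] }
  end
  seen.keys
end

p crawl(['start'], 1)  # => ["start", "a", "b"]
```

With a depth of 1 only the start page and the pages it links to directly get indexed; everything further away is never fetched.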

HTTP proxy auth support

If you are behind a proxy and have to use HTTP Basic Authentication to get through it, you can specify the proxy URL, user name and password:

cfg.crawler.http_proxy = "http://yourproxy:8080"
cfg.crawler.http_proxy_user = "username"
cfg.crawler.http_proxy_pass = "secret"

Under the hood

I put some work into refactoring parts of RDig in order to make integration with acts_as_ferret easier. I’ll write more about that in another post.

Get it!

RDig is available as a gem via Rubyforge.

Regexps on steroids with Ruby 1.8.x

Posted on January 27, 2008

Ruby 1.9 comes with a powerful new regular expression engine called Oniguruma. It sports better handling of UTF-8 encoded content, plus goodies like positive and negative look-behind or named matches. Here’s a good overview of these and some more of the new features of Oniguruma.
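A quick taste of what that looks like, as a minimal sketch that assumes Ruby 1.9 or later, where these features are built into the regular Regexp class. The constant names and sample strings are made up.

```ruby
# Positive look-behind: digits only if preceded by a dollar sign.
PRICE = /(?<=\$)\d+/

# Negative look-behind: "bar" only if NOT preceded by "foo".
LONE_BAR = /(?<!foo)bar/

# Named groups: captures are accessible by name on the MatchData.
DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

p 'costs $42 today'[PRICE]                      # => "42"
p LONE_BAR =~ 'foobar'                          # => nil
p LONE_BAR =~ 'a bar'                           # => 2
p DATE.match('Posted on 2008-01-27')[:year]     # => "2008"
```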

There are two ways to get Oniguruma into a pre-1.9 Ruby: You can patch the Ruby source tree with Oniguruma and build your own Ruby, or use the Oniguruma gem, which makes it fairly easy to use the new style regular expressions in any Ruby 1.8.x project. Here’s how:

$ wget http://www.geocities.jp/kosako3/oniguruma/archive/onig-4.7.1.tar.gz
$ tar xzf onig-4.7.1.tar.gz
$ cd onig-4.7.1
$ ./configure --prefix=/usr
$ make
$ sudo make install
$ sudo gem install oniguruma

Note the prefix argument in the call to configure - it should point to the location of your current ruby installation. So if your ruby executable is located in /usr/bin, you’ll have to use /usr here as shown above.

If everything went well so far, try it out in irb:

require 'rubygems'
require 'oniguruma'
reg = Oniguruma::ORegexp.new '(?<before>.*)(a)(?<after>.*)'
match = reg.match( 'terraforming' )
puts match[0]         # => 'terraforming'
puts match[:before]   # => 'terr'
puts match[:after]    # => 'forming'

The downside of not having Oniguruma patched into a self-compiled version of Ruby is that something like

'terraforming' =~ /(?<before>.*)(a)(?<after>.*)/

won’t work because it will be handled by your Ruby version’s built-in regexp engine.

Encrypted root and swap plus suspend to disk with Gutsy

Posted on January 24, 2008

In order to give my trusted T42 a slight speed-up I decided to replace the built-in 5400 rpm HDD with something faster. I went with Seagate’s ST910021A, which seems to be a great choice from what I can tell so far - it’s noticeably faster, and despite its 7200 rpm it’s nearly as quiet as the original 80GB disk from Hitachi.

But I digress. Initially I just wanted to copy over all the stuff and be done with it, but then I took the chance to do a fresh install of Ubuntu so I could try out the hard disk encryption setup that has been introduced in the alternate installer of 7.10. Until then I only had an encrypted /home, which was pretty useless since most of the time my notebook isn’t shut down but hibernated, and I never needed to type in my passphrase upon resume…

Grails project begging for attention

Posted on January 14, 2008

Sorry, but I can’t think of any other reason why Graeme Rocher might write such crap.

Among the points he makes when trying to convince his readers why they should choose Grails over Rails, there are at most two or three which are somewhat reasonable, namely those dealing with integrating your application with external J2EE based services. I completely agree that these are valid points when comparing Grails running inside a fully-fledged J2EE container to Rails running in, say, Mongrel. But since Rails runs fine in a J2EE environment as well, that’s an unfair and misleading comparison.

Using container-managed database connections via JNDI in a JRuby on Rails web app is no problem at all, neither is using Quartz to schedule Rails background jobs, just to name a few examples. I don’t see anything stopping people from using the whole range of J2EE container features in their JRuby/Rails applications once they need to do so.

The whole ‘Grails is more enterprisey than Rails’ argument falls apart once you stop comparing apples to oranges and slap JRuby + Rails onto that damn app server.

Grails 1.0 coming out within the month

Uh cool, yeah. Until now I thought statements like this were more a specialty of closed source vendors trying to convince their potential clients not to check out the competition. Looks like they can’t wait to finally attach that decision maker friendly 1.0 label to Grails ;-)

Anyway, I feel we’re all going to have an interesting time in the future watching how the competition between JRuby/Rails and Groovy/Grails goes on. After all, competition tends to lead to better products in the end.

Job scheduling with JRuby and Rails

Posted on January 12, 2008

As promised earlier, here’s the first of several articles I’m planning to write about running Rails on JRuby. Originally I wanted to start this little series with some kind of ‘getting started with JRuby on Rails’ guide. Since I haven’t found the time (or, say, motivation) to write one for weeks now, I decided to skip right through to some more advanced topics. So for this post, I’m assuming you already got your hello world JRuby on Rails project up and running and deployed to the application server of your choice. If this is not the case, have a look in the Wiki for documentation about getting started. It’s also worth following the various relevant blogs, as well as the jruby and jruby-extras mailing lists.

Ok, this post is titled Job scheduling with JRuby and Rails for a reason, so let’s get started with this now.

While Rails itself nowadays runs quite flawlessly inside application servers like JBoss or Glassfish, the usual way to handle background jobs (push them to some external daemon) doesn’t fit a J2EE application server environment particularly well. Besides the fact that I couldn’t think of an easy way to get BackgrounDRb running from my application’s WAR file, it simply doesn’t feel right to have any extra daemons like BackgrounDRb running next to that fat application server.

As it turns out, there are several ways to do better.

The GoldSpike solution

The kind folks from the jruby-extras team provide two servlets, RailsTaskServlet, and RailsPeriodicalTaskServlet as part of the GoldSpike plugin. These servlets can be used to run arbitrary Ruby code inside the context of your Rails application either once or periodically every n seconds. To schedule a job running YourJobClass.do_stuff every minute, you would place the following declaration into your web.xml:

<servlet>
  <servlet-name>periodicalTask</servlet-name>
  <servlet-class>org.jruby.webapp.RailsPeriodicalTaskServlet</servlet-class>
  <init-param>
    <param-name>interval</param-name>
    <param-value>60</param-value>
  </init-param>
  <init-param>
    <param-name>script</param-name>
    <param-value>YourJobClass.do_stuff</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
</servlet>

Oh the joy of XML configuration files ;-)

While this works great, I had to run a job not every few seconds, but once a week on a defined day and time. Besides that, declaring a separate servlet for every single background job seems like overkill. Back from my Java days I knew that the Quartz library provided exactly what I needed - support for cron patterns. So the challenge was to get Quartz to run my Ruby script inside the context of my application.

The rails_quartz plugin

Our target platform was JBoss, which already includes the Quartz library. I’m not sure about other app servers; if yours doesn’t have it already, just download Quartz and put the jar, including any dependencies, somewhere in your application so it ends up inside the WEB-INF/lib folder of your WAR file. With Warbler, which is my preferred way to package JRuby on Rails applications, RAILS_ROOT/lib is a good place, since it will pick up any jars from there automatically.

How does it work?

The plugin provides a ContextListener, which, when declared in web.xml, looks for any job declarations and tells the Quartz Scheduler about them. Here’s an example web.xml snippet scheduling a job to run every Friday at 10 am:

<context-param>
  <param-name>yourJobCommand</param-name>
  <param-value>YourJobClass.do_stuff</param-value>
</context-param>
<context-param>
  <param-name>yourJobCronPattern</param-name>
  <param-value>0 0 10 ? * 6</param-value>
</context-param>

<listener>
  <listener-class>org.jruby.webapp.quartz.QuartzContextListener</listener-class>
</listener>

As you see, I use context parameters to configure the command to run, and the cron pattern to use. You can declare any number of jobs you want, just stick to the naming scheme for the parameters: <jobName>Command and <jobName>CronPattern so the listener can find out which pattern belongs to which job.
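To illustrate the naming scheme, here’s a rough Ruby sketch of how such parameter pairs could be matched up. It is not the listener’s actual code (that lives in the plugin); collect_jobs, PARAMS and the orphanCommand entry are made-up names for this example.

```ruby
# Groups context parameters into jobs based on the <jobName>Command /
# <jobName>CronPattern naming scheme. Illustration only.
def collect_jobs(context_params)
  jobs = Hash.new { |hash, name| hash[name] = {} }
  context_params.each do |key, value|
    case key
    when /\A(.+)Command\z/     then jobs[$1][:command] = value
    when /\A(.+)CronPattern\z/ then jobs[$1][:cron] = value
    end
  end
  # Keep only jobs that have both a command and a cron pattern.
  jobs.select { |_name, job| job.key?(:command) && job.key?(:cron) }
end

PARAMS = {
  'yourJobCommand'     => 'YourJobClass.do_stuff',
  'yourJobCronPattern' => '0 0 10 ? * 6',
  'orphanCommand'      => 'Never.scheduled' # no pattern, so it is skipped
}.freeze

collect_jobs(PARAMS).each do |name, job|
  puts "#{name}: #{job[:command]} at '#{job[:cron]}'"
end
```

As for the pattern itself: Quartz cron expressions read seconds, minutes, hours, day of month, month, day of week. Quartz counts the days of the week starting with 1 for Sunday, so the 6 above means Friday.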

You can get the plugin here: https://projects.jkraemer.net/svn/plugins/jruby/quartz_rails/ . As always, any feedback is welcome.

Strange Mongrel 1.1 error (solved)

Posted on January 07, 2008

Right after updating Mongrel, gem_plugin and some other (probably unrelated) gems today, Mongrel didn’t like me any more, refusing to start up any Rails application. Instead, I got this nice message complaining about a missing init.rb in the activerdf gem:

** Rails loaded.
** Loading any Rails specific GemPlugins
Exiting
/usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:27:in 'gem_original_require': no such file to load -- /usr/lib/ruby/gems/1.8/gems/activerdf-1.6.1/lib/activerdf/init.rb (MissingSourceFile)

It took me a while to find the fix on RubyForge via Google, so maybe this helps somebody else having the same problem.