
RDig 0.3.5

Posted on February 26, 2008

RDig is a tiny web and file system crawler built on top of the Ferret search engine. It’s one of my less active side projects and, from what I can tell, doesn’t have a very large user base. However, there are some people out there who actually use it, and some of them even tell me so and suggest new features from time to time :-)

Limit crawling depth

You can now configure a maximum crawling depth to restrict RDig to indexing only pages up to that depth. For example, setting config.crawler.max_depth = 1 will make RDig index only the configured start pages and the pages they link to directly. You get the picture, I guess.

This option is especially useful if restricting RDig to a predefined set of hosts is not an option for your use case, but you still don’t want it to crawl the whole web.
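To make the depth semantics concrete, here is a minimal breadth-first sketch in plain Ruby. It is not RDig's crawler: the hypothetical `LINKS` hash stands in for fetched pages and their links, start pages count as depth 0, and links are only followed from pages that are shallower than `max_depth`.

```ruby
# Hypothetical link graph standing in for real HTTP fetches.
LINKS = {
  "http://example.com/"  => ["http://example.com/a", "http://example.com/b"],
  "http://example.com/a" => ["http://example.com/c"],
  "http://example.com/b" => [],
  "http://example.com/c" => ["http://example.com/d"]
}.freeze

# Breadth-first crawl that stops following links beyond max_depth.
# Start pages are depth 0, pages they link to are depth 1, and so on.
def crawl(start_urls, max_depth, links = LINKS)
  seen = {}
  queue = start_urls.map { |url| [url, 0] }
  indexed = []
  until queue.empty?
    url, depth = queue.shift
    next if seen[url]
    seen[url] = true
    indexed << url                  # "index" the page
    next if depth >= max_depth      # don't follow links any deeper
    links.fetch(url, []).each { |link| queue << [link, depth + 1] }
  end
  indexed
end

crawl(["http://example.com/"], 1)
```

With `max_depth = 1` this returns the start page plus `/a` and `/b`, but not `/c`. Crawling breadth-first means every page is first reached along its shortest path from a start page, so the depth check never cuts off a page that is actually close to a start page.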

HTTP proxy auth support

If you are behind a proxy that requires HTTP Basic Authentication, you can specify the proxy URL, user name and password:

cfg.crawler.http_proxy = "http://yourproxy:8080"
cfg.crawler.http_proxy_user = "username"
cfg.crawler.http_proxy_pass = "secret"
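For the curious, this is roughly what the three settings amount to at the HTTP level. The sketch uses Ruby's standard `Net::HTTP` and is not taken from RDig; `http_via_proxy` and `proxy_authorization` are hypothetical helper names. Basic authentication simply sends `user:password`, Base64-encoded, in the `Proxy-Authorization` header.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a Net::HTTP object for target_url that goes
# through the given proxy and authenticates against it with HTTP Basic auth.
# Creating the object does not open a connection yet.
def http_via_proxy(target_url, proxy_url, proxy_user, proxy_pass)
  target = URI.parse(target_url)
  proxy  = URI.parse(proxy_url)
  Net::HTTP.new(target.host, target.port,
                proxy.host, proxy.port, proxy_user, proxy_pass)
end

# The Proxy-Authorization header value Basic auth produces:
# "user:password", Base64-encoded (encoding, not encryption).
def proxy_authorization(user, pass)
  "Basic " + ["#{user}:#{pass}"].pack("m0")
end

http = http_via_proxy("http://example.com/", "http://yourproxy:8080",
                      "username", "secret")
http.proxy_address                        # => "yourproxy"
proxy_authorization("username", "secret") # => "Basic dXNlcm5hbWU6c2VjcmV0"
```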

Under the hood

I put some work into refactoring parts of RDig in order to make integration with acts_as_ferret easier. I’ll write more about that in another post.

Get it!

RDig is available as a gem via RubyForge.

acts_as_ferret 0.4.3

Posted on November 18, 2007

It’s been a long time since the last release (not counting the short-lived 0.4.2 …), and I guess most people already use trunk anyway, but for the faint of heart, here’s the new stable version of your favourite Rails full-text search plugin.

As always, get it via svn from svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret. More installation information can be found on the acts_as_ferret Trac site.

No big news feature-wise; I already wrote about the more important features when I added them to trunk:

Going through the timeline looking for cool features I hadn’t written about yet, I found several smaller things worth mentioning:

Dynamic document specific boosts

This comes in handy if you want search results ranked automatically by a criterion that differs for each record, e.g. the popularity of an article in your shop:


class Article < ActiveRecord::Base
  acts_as_ferret :boost => :popularity
  def popularity
    # return dynamic boost value for this document
  end
end

You may also apply the dynamic boost to a specific field (or even different boosts to different fields), so that it is only applied when a hit occurs in the boosted field. This way you can choose at query time whether you want the boosting applied or not. Just query either the boosted fields or the normal ones:


class Article < ActiveRecord::Base
  acts_as_ferret :fields => {
    :title         => {},
    :boosted_title => { :boost => :rating }
  }

  def rating
    # return rating of this article
  end

  # value for the boosted title field
  def boosted_title
    title
  end
end
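Why querying the boosted field changes the ranking can be shown with a toy model. It assumes, roughly as in Lucene (which Ferret is modeled on), that a field boost acts as a multiplier on that field's score; the `Doc` struct and the relevance numbers are made up, and this is not how Ferret computes scores internally.

```ruby
# Toy model: a hit on the plain title field scores its base relevance,
# a hit on boosted_title scores base relevance multiplied by the rating.
Doc = Struct.new(:title, :relevance, :rating)

def rank(docs, field)
  scored = docs.map do |doc|
    score = field == :boosted_title ? doc.relevance * doc.rating : doc.relevance
    [doc.title, score]
  end
  scored.sort_by { |_title, score| -score }.map(&:first)
end

DOCS = [
  Doc.new("Ruby basics",   1.0, 1.0),
  Doc.new("Ruby in depth", 0.8, 2.0)
]

rank(DOCS, :title)         # => ["Ruby basics", "Ruby in depth"]
rank(DOCS, :boosted_title) # => ["Ruby in depth", "Ruby basics"]
```

The same two articles come back in a different order depending only on which field the query targets, which is what lets you decide at query time whether the rating should matter.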

New and better start/stop scripts

The DRb server now has a unified start/stop script, and it ships with scripts for running it as a Windows system service. Thanks to Peter Jones and Herryanto Siatono for contributing these.

The acts_as_ferret gem now also comes with an installer that copies the server script and a sample config into your Rails project:


$ gem install acts_as_ferret
$ rails test
$ cd test/
$ aaf_install
$ script/ferret_server -e production start

And your DRb server is up and running. Easy, isn’t it?

No more :remote => true

Last but not least, aaf is now a bit smarter and switches to remote mode automatically if the DRb server is configured for the current environment. If for whatever reason you don’t want that, use :remote => false.
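The decision rule reads roughly like the following sketch. It is not aaf's code: `remote_mode?`, the `SERVER_CONFIG` hash and its keys are hypothetical stand-ins for the DRb server configuration, used only to show that an explicit `:remote` option wins over auto-detection.

```ruby
# Hypothetical sketch of the mode decision; not acts_as_ferret's code.
# server_config maps environment names to DRb server settings.
def remote_mode?(options, server_config, env)
  # An explicit :remote option (true or false) always wins.
  return options[:remote] if options.key?(:remote)
  # Otherwise go remote only if a server is configured for this environment.
  server_config.key?(env)
end

SERVER_CONFIG = { "production" => { "host" => "localhost", "port" => 9010 } }

remote_mode?({}, SERVER_CONFIG, "production")                   # => true
remote_mode?({}, SERVER_CONFIG, "development")                  # => false
remote_mode?({ :remote => false }, SERVER_CONFIG, "production") # => false
```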

Keep an eye on your DRb server with Monit

Posted on October 21, 2007

Many people nowadays seem to use monit to ensure their Rails application is always up and running, and maybe even to get notified of problems like unusually high load or memory usage.

Since acts_as_ferret doesn’t really like it when the DRb server has gone away, it’s a good idea to monitor not only your Mongrels but also the DRb server itself. So here is a small monit configuration snippet, derived from one I’m using elsewhere:

# monit configuration snippet to watch the Ferret DRb server shipped with
# acts_as_ferret
check process ferret with pidfile /path/to/ferret.pid

    # username is the user the drb server should be running as (It's good practice
    # to run such services as a non-privileged user)
    start program = "/bin/su -c 'cd /path/to/your/app/current/ && RAILS_ENV=production script/ferret_start' username"
    stop program = "/bin/su -c 'cd /path/to/your/app/current/ && RAILS_ENV=production script/ferret_stop' username"

    # cpu usage boundaries
    if cpu > 60% for 2 cycles then alert
    if cpu > 90% for 5 cycles then restart

    # memory usage varies with index size and usage scenarios, so check how
    # much memory your DRb server uses up usually and add some spare to that
    # before enabling this rule:
    # if totalmem > 50.0 MB for 5 cycles then restart

    # adjust port numbers according to your setup:
    if failed port 9010 then alert
    if failed port 9010 for 2 cycles then restart
    group ferret

As you can see, it’s pretty straightforward, except maybe for the start/stop commands, which took me a few iterations to get right. I also added this to the acts_as_ferret distribution: monit-example.
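In its simplest form, the `if failed port 9010` rule checks whether a TCP connection to that port succeeds. If you want the same quick check from Ruby, say in a deployment script, a minimal sketch using only the standard library could look like this; `ferret_port_open?` is a hypothetical name, and host and port have to match your own DRb server setup.

```ruby
require "socket"

# Returns true if a TCP connection to host:port can be opened within
# timeout seconds; false if it is refused, times out or the host is unknown.
def ferret_port_open?(host = "localhost", port = 9010, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

This only tells you that something accepts connections on the port; like monit's plain port test, it does not verify that the process behind it is actually a working Ferret server.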

Raise your hand please if you're using Ferret

Posted on September 09, 2007

Having worked on acts_as_ferret for more than a year and a half now, I’m really interested in why and how people are using it in their applications. Also, a list of Ferret-powered projects would be a good starting point for people who are still looking for a search solution for their Rails app.

So if you’re using Ferret, please drop me a line, comment here, or even add your application to the Powered by Ferret page over at the Ferret project’s Trac. Ideally you would also post some facts like index size, your production environment or even performance numbers. Personally I’d also like to know if you’re using acts_as_ferret, or, if that’s not the case, why you decided to go with pure Ferret.

I’ll also mention some sites that use Ferret in my talk at RailsConf Europe next week. So if you’re looking for some publicity, what are you waiting for?

Faster indexing with acts_as_ferret

Posted on September 02, 2007

Does your application operate on large chunks of records that are indexed by acts_as_ferret? If the answer is yes, then this is for you:

By combining two brand-new features of acts_as_ferret, you can now speed up batch operations like this: first, disable acts_as_ferret indexing for the model class in question. Then do your updates, making sure to remember the primary keys of the modified records. After that, re-enable acts_as_ferret and index all modified or created records at once:

id_array = []
Model.disable_ferret
# create or modify records here, collect ids in id_array
Model.enable_ferret
Model.bulk_index(id_array)

You may also use the block syntax to have aaf re-enabled automatically:

id_array = []
Model.disable_ferret do
  # create or modify records here, collect ids in id_array
end
Model.bulk_index(id_array)
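The block form is handy because re-enabling can happen in an `ensure` clause, so indexing is switched back on even if the batch code raises. The class below is a plain-Ruby stand-in, not aaf's implementation: `FakeModel`, its in-memory `INDEX` and `save` are hypothetical, and only the method names `disable_ferret`, `enable_ferret` and `bulk_index` come from the post.

```ruby
# Plain-Ruby stand-in for an indexed model; not acts_as_ferret's code.
class FakeModel
  INDEX = [] # ids written to the (fake) index
  @ferret_enabled = true

  class << self
    def ferret_enabled?
      @ferret_enabled
    end

    def enable_ferret
      @ferret_enabled = true
    end

    # Without a block: just switch indexing off. With a block: switch it off,
    # run the block, and switch it back on afterwards, even if the block raises.
    def disable_ferret
      @ferret_enabled = false
      return unless block_given?
      begin
        yield
      ensure
        enable_ferret
      end
    end

    # Index all given ids in one go instead of once per save.
    def bulk_index(ids)
      INDEX.concat(ids)
    end

    # Simulated save: indexes the record right away only if indexing is on.
    def save(id)
      INDEX << id if ferret_enabled?
      id
    end
  end
end

id_array = []
FakeModel.disable_ferret do
  [1, 2, 3].each { |id| id_array << FakeModel.save(id) }
end
FakeModel.bulk_index(id_array)
FakeModel::INDEX # => [1, 2, 3], each record indexed exactly once
```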

Pagination goodness

Posted on August 27, 2007

Finally, implementing pagination for your acts_as_ferret search results is as easy as it should be (at least if you’re using aaf trunk; everybody else will have to wait for the soon-to-be-released 0.4.2):


@results = Model.find_with_ferret params[:query], :page => params[:page], 
                                                  :per_page => 10

Acts_as_ferret’s SearchResults class now gives you all you need to implement a helper rendering your pagination links:


@results.page           # => current page
@results.page_count     # => total number of pages
@results.previous_page  # => index of previous page or nil if on the first page
@results.next_page      # => index of next page or nil if on the last page

Best of all, this even works when you combine your query with ActiveRecord conditions.

Hint: really lazy people install the will_paginate plugin and use its will_paginate helper method to get their pagination links for free!
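For those writing their own helper, the numbers behind the four SearchResults accessors follow from the total hit count and `per_page`. This is a hypothetical stand-in (`PageInfo` is not aaf's SearchResults class) showing the arithmetic with 1-based pages, plus a minimal text helper in place of real link rendering.

```ruby
# Hypothetical stand-in for the pagination accessors; not aaf's SearchResults.
class PageInfo
  attr_reader :page, :total_hits, :per_page

  def initialize(page, total_hits, per_page)
    @page = page
    @total_hits = total_hits
    @per_page = per_page
  end

  # Round up: 25 hits at 10 per page need 3 pages.
  def page_count
    (total_hits + per_page - 1) / per_page
  end

  def previous_page
    page > 1 ? page - 1 : nil
  end

  def next_page
    page < page_count ? page + 1 : nil
  end
end

# Minimal text "helper": prev, numbered pages (current one in brackets), next.
def pagination_links(results)
  links = []
  links << "prev" if results.previous_page
  (1..results.page_count).each do |n|
    links << (n == results.page ? "[#{n}]" : n.to_s)
  end
  links << "next" if results.next_page
  links.join(" ")
end

pagination_links(PageInfo.new(2, 25, 10)) # => "prev 1 [2] 3 next"
```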

acts_as_ferret 0.4.1

Posted on July 17, 2007

Besides several small tweaks and fixes, this release introduces index versioning to the DRb server. Now every time you call MyModel.rebuild_index, a new index is built in the background. During that process the original index stays in place, so ideally nobody will notice. Once the rebuild is done, acts_as_ferret switches over to the new index.

Disclaimer: for now, any modifications made to model instances while the rebuild runs will go into the old index, so you’ll have to keep track of these yourself.
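The rebuild-and-switch idea can be sketched in a few lines: build the new index off to the side, keep serving queries from the old one, then swap a single reference. This is a simplified in-memory illustration, not the DRb server's actual code; `VersionedIndex` and its hash-based "index" are hypothetical.

```ruby
# Hypothetical in-memory illustration of versioned index rebuilds.
class VersionedIndex
  attr_reader :version

  def initialize(records)
    @lock = Mutex.new
    @version = 1
    @current = build(records)
  end

  # Queries always go to whatever index is current right now.
  def search(word)
    index = @lock.synchronize { @current }
    index.fetch(word, [])
  end

  # Build the replacement in a background thread; the old index keeps
  # serving searches until the finished new one is swapped in under the lock.
  def rebuild(records)
    Thread.new do
      fresh = build(records)
      @lock.synchronize do
        @current = fresh
        @version += 1
      end
    end
  end

  private

  # Maps each lowercase word to the ids of the records containing it.
  def build(records)
    records.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(id, text), index|
      text.downcase.split.uniq.each { |word| index[word] << id }
    end
  end
end

idx = VersionedIndex.new({ 1 => "Ferret search" })
idx.rebuild({ 1 => "Ferret search", 2 => "Solr search" }).join
idx.search("search") # => [1, 2]
```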

MySQL users with a large number of records may also notice faster index rebuilds, thanks to this clever exploitation of several MySQL specifics.

By the way, if you’re looking for Capistrano recipes for managing your Ferret server, have a look at the palmtree project.

Rails-Konferenz 2007

Posted on June 26, 2007

Rails-Konferenz 2007 is over. It’s been a great day full of interesting talks and interesting people.

Pictures from the conference are here, there and there.

Here’s my (German) talk about Ferret and acts_as_ferret. It’s done with S5; you can reach the outline (which has more text) by moving the mouse to the lower right corner and clicking Ø.

Update: Slides from the conference are available as PDF now, too.

Next is RailsConf Europe.

Random Notes

Posted on May 14, 2007
  • Acts_as_ferret’s DRb server now handles index rebuilds nicely. Every rebuild creates an entirely new index, and while reindexing runs, your application continues to use the old one. That should bring us a big step closer to the next release of aaf.

  • Online chat site Lingr is using acts_as_ferret to provide its users with full-text search across everything that has ever been said there. They built a neat analyzer that handles multilingual content (including non-Latin languages like Chinese or Japanese) in both documents and queries, which should be open-sourced soon.

  • Last but not least, I’ll talk about Ferret and acts_as_ferret at Rails-Konferenz.

Acts_as_ferret DRb server benchmarks

Posted on April 04, 2007

Caleb Jones did some in-depth benchmarking of acts_as_ferret (using the built-in DRb server) and acts_as_solr (accessing the Java/Lucene-based Solr server).

The results were really close; however, I’m glad to report that acts_as_ferret won in 3 out of 4 disciplines :-)

What's new in acts_as_ferret Part 2 - Lazy loading

Posted on March 26, 2007

Besides the integrated DRb server, the lazy loading of AR objects is the other big new feature in acts_as_ferret.

Until now, aaf used the Ferret index only to retrieve the primary keys of records matching a query. These were then used to load the relevant objects from the database via ActiveRecord. But Ferret can also store and retrieve the contents of any other field you put into the index.

This especially comes in handy for all kinds of live searches. Just store the relevant fields directly in the index and get them back from your search without a single database query. Unfortunately you’d now have to deal with two different kinds of result objects in your views - those from the index and those from your db. Not so with acts_as_ferret.

Acts_as_ferret makes the whole thing completely transparent: you can use your aaf query results as if they were normal AR model instances. Behind the scenes, aaf only fetches the record from your database when it needs to, that is, when you ask for an attribute that could not be loaded from the index.

Read on for a short introduction with some sample code.
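The lazy loading idea itself can be sketched with `method_missing`: answer from the fields stored in the index, and only go to the database for anything else. This is an illustration under stated assumptions, not aaf's implementation; `LazyResult`, the `Article` struct and the `loader` lambda (standing in for an ActiveRecord `find`) are hypothetical.

```ruby
# Hypothetical lazy result: stored index fields are answered directly, any
# other attribute triggers a single load of the full record via the loader.
class LazyResult
  attr_reader :id

  def initialize(id, stored_fields, loader)
    @id = id
    @stored = stored_fields # fields that came back from the index
    @loader = loader        # callable that fetches the full record by id
    @record = nil
  end

  def loaded?
    !@record.nil?
  end

  def method_missing(name, *args, &block)
    return @stored[name] if args.empty? && @stored.key?(name)
    @record ||= @loader.call(@id) # hit the database at most once
    @record.public_send(name, *args, &block)
  end

  # Only the stored fields are known without loading the record.
  def respond_to_missing?(name, include_private = false)
    @stored.key?(name) || super
  end
end

Article = Struct.new(:id, :title, :body) # stand-in for an AR model
DB_CALLS = []
loader = lambda do |id|
  DB_CALLS << id
  Article.new(id, "Hello Ferret", "Full text of the article")
end

hit = LazyResult.new(42, { :title => "Hello Ferret" }, loader)
hit.title # answered from the index, no database call
hit.body  # first non-stored attribute: loads the record once
```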

acts_as_ferret 0.4.0 RIE

Posted on March 24, 2007

It’s been some time since the last release, but finally here it is - the DRb powered Remote Indexing Edition of acts_as_ferret.

As you might already have guessed, besides several bug fixes, the major new feature is the built-in DRb server that allows any number of processes, or even physical machines, to access a single central Ferret index instance, saving you memory and possible locking hassles.

To install this release, use

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

A RubyGem is available, too. For more info about the DRb server, please read on.
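The architecture boils down to one process owning the index and every other process talking to it over DRb. A minimal sketch with Ruby's standard `drb` library follows; `CentralIndex` is a hypothetical toy (a hash behind a mutex), not aaf's server, and port 0 simply asks the operating system for a free port.

```ruby
require "drb/drb"

# Hypothetical central index: the only object that touches the index data.
# The mutex serializes access, so concurrent clients can't corrupt it.
class CentralIndex
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  def add(id, text)
    @lock.synchronize { @docs[id] = text.downcase }
    nil
  end

  def search(word)
    @lock.synchronize do
      @docs.select { |_id, text| text.split.include?(word.downcase) }.keys.sort
    end
  end
end

# Server side: publish the index object (here on a free local port).
DRb.start_service("druby://127.0.0.1:0", CentralIndex.new)
server_uri = DRb.uri

# Client side (normally another process or machine): talk to it by URI.
index = DRbObject.new_with_uri(server_uri)
index.add(1, "Remote indexing with DRb")
index.add(2, "Local indexing")
```

Because every client goes through the one server object, there is only a single index in memory and only one writer, which is where the memory savings and the absence of locking trouble come from.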

Great acts_as_ferret tutorial

Posted on February 20, 2007

Gregg Pollack has written an extensive tutorial about using acts_as_ferret. It covers nearly everything you might ever need to know to do fast full-text search in your Rails app.

acts_as_ferret 0.3.1

Posted on January 21, 2007

With several minor fixes and some small extensions, like the various ways to conditionally disable automatic index updates, I would not call this a big release.

However, it’s the first release that is also available as a RubyGem for system-wide installation. After installing the gem, all you need to do is add the line

require 'acts_as_ferret'

to your environment.rb.

The API docs for the latest release can now be found at RubyForge as well. I’m still unsure whether I should move the Subversion repository to RubyForge, too. With the move I’d lose the cool Trac/Subversion integration features, but on the other hand, Trac has a real spam problem anyway. Maybe I’ll look for an alternative wiki/bug-tracking platform that is more spam-proof to replace Trac entirely. Ideas, anybody?

acts_as_ferret covered in a book!

Posted on December 19, 2006

Christian Hellsten and Jarkko Laine feature Ferret and acts_as_ferret in their book Beginning Ruby on Rails E-Commerce.

As part of their example project, an online book store, they show how to implement full-text search with acts_as_ferret, from the acts_as_ferret statement in the model up to the controller action and corresponding view. They even cover how to index content from related objects (book authors in this case) through an indexed instance method.