
RDig 0.3.0

Posted on April 28, 2006

In addition to crawling web sites, RDig can now index local documents. Just give it one or more file:/ URLs pointing to the directories to index, optionally define some filename inclusion/exclusion patterns, and there you go.
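To illustrate how inclusion/exclusion patterns typically narrow down a set of files, here is a minimal sketch in plain Ruby. It is not RDig's implementation and does not use RDig's configuration API; the helper name indexable?, the sample paths and the patterns are all made up for the example.

```ruby
# Hypothetical filter (not part of RDig): keep a filename when it matches
# at least one inclusion pattern (or none are given) and no exclusion pattern.
def indexable?(filename, include_patterns: [], exclude_patterns: [])
  included = include_patterns.empty? || include_patterns.any? { |re| filename =~ re }
  excluded = exclude_patterns.any? { |re| filename =~ re }
  included && !excluded
end

# Made-up sample paths, standing in for files found below a file:/ start URL.
candidates = %w[/base/manual.pdf /base/notes.txt /base/backup/old.pdf /base/logo.png]

selected = candidates.select do |name|
  indexable?(name,
             include_patterns: [/\.(pdf|txt)\z/],  # only PDF and text files
             exclude_patterns: [%r{/backup/}])     # skip anything under backup/
end

puts selected
```

Exclusion wins over inclusion in this sketch, so a PDF below backup/ is skipped even though its extension matches.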

Document locations can be rewritten to make it easier to link to them from a web-based search frontend. To rewrite all file:/base/* URIs to http://www.mydomain.com/virtual_dir/*, you say

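cfg.index.rewrite_uri = lambda do |uri|
  # map the local base directory to the virtual directory on the web server
  uri.path.gsub!(/^\/base\//, '/virtual_dir/')
  # turn the file: URI into an http: URI on the public host
  uri.scheme = 'http'
  uri.host = 'www.mydomain.com'
end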

in your RDig config file.
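To see what such a rewrite rule does to a single URI, the same logic can be tried outside RDig with Ruby's standard uri library. This is only a sketch: the sample document path is made up, and the lambda returns the URI purely so the result can be printed here (whether RDig uses the return value is not stated above).

```ruby
require 'uri'

# Same rewrite rule as in the config snippet, as a standalone lambda.
# Assigning a new path string instead of calling gsub! avoids depending on
# in-place mutation of the URI's internal string.
rewrite_uri = lambda do |uri|
  uri.path = uri.path.sub(%r{\A/base/}, '/virtual_dir/')
  uri.scheme = 'http'
  uri.host = 'www.mydomain.com'
  uri # returned only for convenience in this example
end

# Made-up local document location.
local = URI.parse('file:/base/docs/manual.pdf')
rewritten = rewrite_uri.call(local).to_s

puts rewritten
```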

There is also a new feature for PDF content extraction: titles are now extracted from PDF metadata with the help of the pdfinfo utility.
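For the curious, pdfinfo prints one "Key: value" line per metadata field, so pulling out the title is a small parsing job. The following is a rough sketch of the idea, not RDig's actual code; the helper names and the sample output are invented for illustration.

```ruby
# Hypothetical helper (not RDig's API): find the Title field in pdfinfo's
# "Key:   value" output. Returns nil if there is no non-empty title.
def pdf_title_from_info(info_text)
  info_text.each_line do |line|
    key, value = line.split(':', 2)
    next unless key && value
    title = value.strip
    return title if key.strip == 'Title' && !title.empty?
  end
  nil
end

# Hypothetical wrapper: runs pdfinfo on a file and parses its output.
# Assumes the pdfinfo binary is installed and on the PATH.
def pdf_title(path)
  pdf_title_from_info(IO.popen(['pdfinfo', path], &:read))
end

# Made-up sample in the format pdfinfo prints.
sample = <<~INFO
  Title:          Quarterly Report: Q1
  Author:         Jane Doe
  Pages:          12
INFO

title = pdf_title_from_info(sample)
puts title
```

Splitting on the first colon only keeps titles that themselves contain a colon intact.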

Have fun!