Crawl Anywhere 3.0.4 available

Version 3.0.4 of Crawl Anywhere is now available.

New features

Crawler

  • Monitor crawl progress on the administration status page (processed pages, estimated remaining pages, elapsed time, estimated remaining time)
  • Choose an alternate crawler output queue for a target

Pipeline

  • New MetaExtractor stage

Bug fixes

MongoDB related fixes

  • compatibility issue with MongoDB 2.4.4
  • too many open connections

HTML page cleaning

  • Snacktory method enhancement (less aggressive)

Upgrade from v3.0.3 to v3.0.4

Replace the following directories with those from the 3.0.4 distribution:

  • bin
  • lib
  • web
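
As a rough sketch, the directory update could be scripted as follows. The function name update_dirs, the ".v3-0-3" suffix for the saved copies, and both paths in the usage line are assumptions made for illustration; they are not part of the Crawl Anywhere distribution.

```shell
# Hypothetical helper: replaces bin, lib and web in an existing install
# with the copies from an unpacked 3.0.4 distribution.
# $1 = directory where 3.0.4 was unpacked (assumed layout: bin/ lib/ web/)
# $2 = directory of the existing 3.0.3 installation
update_dirs() {
  src="$1"
  dest="$2"
  for dir in bin lib web; do
    # Keep the old directory aside so the upgrade can be rolled back
    if [ -d "$dest/$dir" ]; then
      mv "$dest/$dir" "$dest/$dir.v3-0-3" || return 1
    fi
    cp -R "$src/$dir" "$dest/$dir" || return 1
  done
}

# Example call (both paths are placeholders):
#   update_dirs /tmp/crawl-anywhere-3.0.4 /opt/crawl-anywhere
```

The config directory is deliberately left out of the loop, since configuration files are handled separately.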

About configuration files

In the "config/pipeline/simplepipeline.xml" file, the new setting for the LanguageDetector stage is:

Database

Back up your crawler database first!
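
One possible way to take that backup is with mysqldump, assuming the database is named "crawler" as in the upgrade command. The function name backup_crawler_db and the output file name are assumptions; xxx stands for your own MySQL user and password.

```shell
# Hypothetical helper: dumps the crawler database to a SQL file.
# $1 = MySQL user, $2 = MySQL password, $3 = output file (all placeholders)
backup_crawler_db() {
  mysqldump -u"$1" -p"$2" crawler > "$3"
}

# Example call (credentials and file name are placeholders):
#   backup_crawler_db xxx xxx crawler-before-v3-0-4.sql
```

If the upgrade goes wrong, the dump can be loaded back with the same mysql client used for the upgrade script.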

Then run the upgrade script:

mysql -uxxx -pxxx crawler < install/crawler/mysql/upgrade/v3-0-4.sql
