Crawl Anywhere 1.2.1 available

The new version 1.2.1 of Crawl Anywhere is now available. This release includes pipeline enhancements and bug fixes.

Pipeline

The pipeline is now multi-threaded. With a crawler configured to crawl more than 8 web sites simultaneously, the single-threaded pipeline had become the bottleneck.
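To illustrate the general idea of a multi-threaded pipeline, here is a minimal sketch in Python. It is not Crawl Anywhere's actual code: `process_document` and `run_pipeline` are hypothetical names, and the processing stage is a placeholder. A fixed number of worker threads take documents from a shared queue, so a slow document no longer holds up the ones behind it.

```python
import queue
import threading


def process_document(doc: str) -> str:
    # Hypothetical stage: stands in for the real pipeline's work
    # (text extraction, language detection, indexing, ...).
    return doc.strip().lower()


def run_pipeline(documents, num_threads=4):
    """Process documents with a fixed number of worker threads."""
    todo = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            doc = todo.get()
            if doc is None:  # sentinel: no more work for this worker
                break
            out = process_document(doc)
            with results_lock:  # protect the shared result list
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for doc in documents:
        todo.put(doc)
    for _ in threads:  # one sentinel per worker
        todo.put(None)
    for t in threads:
        t.join()
    return results
```

Results come back in completion order, not input order. Note that in CPython the global interpreter lock limits the speedup for CPU-bound work, so this sketch shows the structure only, not the throughput figures reported in this post.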

On an 8-core processor, the benchmarks give:

Threads   Documents processed per hour
1         32,000
2         55,000
4         90,000
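The scaling behind these figures can be worked out with a few lines of Python. This is only arithmetic on the numbers above; the names `benchmarks` and `scaling` are illustrative.

```python
# Documents processed per hour, keyed by number of pipeline threads.
benchmarks = {1: 32_000, 2: 55_000, 4: 90_000}


def scaling(benchmarks):
    """Return (threads, speedup, efficiency) relative to the 1-thread run."""
    base = benchmarks[1]
    rows = []
    for threads, docs in sorted(benchmarks.items()):
        speedup = docs / base           # throughput relative to 1 thread
        efficiency = speedup / threads  # fraction of ideal linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


for threads, speedup, efficiency in scaling(benchmarks):
    print(f"{threads} thread(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Two threads give about 1.72x the single-thread throughput (86% of linear scaling), and four threads give about 2.81x (70%), so each added thread helps but with diminishing returns.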

 

For Hurisearch, we now crawl 32 web sites simultaneously and the pipeline uses 4 threads.
