Crawl-Anywhere release 4.0.0 final available

We are pleased to announce the final 4.0.0 release of Crawl-Anywhere. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere/tree/4.0.0

This release is the last one before the final release delivery.

4.0.0 release highlights :

Download page : http://www.crawl-anywhere.com/download-crawl-anywhere/

Crawl-Anywhere 4.0.0-release-candidate available

We are pleased to announce the Crawl-Anywhere 4.0.0 release candidate. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere/tree/4.0.0-release-candidate

4.0.0 release candidate highlights :
  • Fix some installation documentation errors
  • Provide Solr 4.3.0 binary
  • Issue #33 – Admin UI doubles backslash
  • Issue #35 – Allow wildcards in Host aliases
  • Issue #36 – Recrawling does not start
  • Issue #40 – Crawl web site with basic authentication scheme

As requested on the Crawl-Anywhere forum, a virtual appliance is now available on the download page.

Crawl-Anywhere 4.0.0-beta-1 available

We are pleased to announce the Crawl-Anywhere 4.0.0 beta-1 release. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere/tree/4.0.0-beta-1

4.0.0 beta-1 release highlights :
  • issue #21 – Indexer – optimize for Solr 4.x and remove the dependency on Apache Commons HttpClient 3.x
  • issue #22 – Indexer – remove the Elasticsearch 0.20 dependency and, with it, the Lucene 3.6.2 dependencies
  • issue #24 – Crawler admin UI is looking for crawler.properties
  • issue #26 – Crawler admin UI error while reading log
  • issue #28 – Scripts tools

Crawl-Anywhere 4.0.0 Alpha-4 available

We are pleased to announce the Crawl-Anywhere 4.0.0 alpha-4 release. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere/tree/4.0.0-alpha-4

4.0.0 alpha-4 release highlights :
  • issue #17 – Add a maximum fetch rate per minute
  • issue #18 – Use base href element

Crawl-Anywhere 4.0.0 Alpha-3 available

We are pleased to announce the Crawl-Anywhere 4.0.0 alpha-3 release. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere

4.0.0 alpha-3 release highlights :
  • issue #09 – Provide version information
  • issue #10 – Enable disabled v3.0.3 features
  • issue #15 – Allow alternate Solr UpdateRequestHandler in the indexer 

Crawl Anywhere 3.0.4 available

The new version 3.0.4 of Crawl Anywhere is now available. 

New features

Crawler

  • Monitor crawl progress on the administration status page (processed pages, estimated remaining pages, elapsed time, estimated remaining time)
  • Choose an alternate crawler output queue for a target

Pipeline

  • New MetaExtractor stage

Bug fixes

MongoDB related fixes

  • compatibility issue with MongoDB 2.4.4
  • too many open connections

HTML page cleaning

  • Snacktory method enhancement (less aggressive)

Upgrade from v3.0.3 to v3.0.4

Update the following directories :

  • bin
  • lib
  • web
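
The directory update can be scripted. The following is only a minimal sketch: the function name `update_dirs`, both paths, and the assumption that bin, lib and web sit at the top level of the extracted archive are illustrative and not taken from the release itself.

```shell
# Hypothetical helper, not part of Crawl Anywhere.
# $1 = directory where the 3.0.4 archive was extracted (assumed layout:
#      bin, lib and web at its top level)
# $2 = existing installation directory
update_dirs() {
    src="$1"
    dest="$2"
    for d in bin lib web; do
        # Refuse to start if a new copy is missing or an old backup exists.
        [ -d "$src/$d" ] || { echo "missing $src/$d" >&2; return 1; }
        [ ! -e "$dest/$d.bak" ] || { echo "$dest/$d.bak already exists" >&2; return 1; }
    done
    for d in bin lib web; do
        # Keep the previous copy so the update can be rolled back.
        if [ -d "$dest/$d" ]; then
            mv "$dest/$d" "$dest/$d.bak" || return 1
        fi
        cp -R "$src/$d" "$dest/$d" || return 1
    done
}

# Example (both paths are placeholders):
# update_dirs /tmp/crawl-anywhere-3.0.4 /opt/crawler
```

Keeping a `.bak` copy of each directory is a design choice of this sketch; it makes it possible to return to v3.0.3 by moving the directories back.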

About configuration files

In the "config/pipeline/simplepipeline.xml" file, the new setting for the LanguageDetector stage is:

Database

Back up your crawler database first!
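
One possible way to take that backup is with `mysqldump`. This is a sketch only: the helper name `backup_db` and the output file name are assumptions, and the credentials are the same `xxx` placeholders used in the upgrade command.

```shell
# Hypothetical helper: dump one MySQL database to a file before upgrading.
# $1 = user, $2 = password, $3 = database name, $4 = output file
backup_db() {
    mysqldump -u"$1" -p"$2" "$3" > "$4"
}

# Example with placeholder credentials and an arbitrary file name:
# backup_db xxx xxx crawler crawler-before-upgrade.sql
```

If the upgrade goes wrong, the dump can be loaded back with the `mysql` client in the same way the upgrade script is applied.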

Execute the upgrade script

mysql -uxxx -pxxx crawler < install/crawler/mysql/upgrade/v3-0-4.sql

Crawl-Anywhere 4.0.0 Alpha-2 available

We are pleased to announce the Crawl-Anywhere 4.0.0 alpha-2 release. This release is tagged on GitHub: https://github.com/bejean/crawl-anywhere

4.0.0 alpha-2 release highlights :
  • Bug fixes in the crawler and pipeline modules
  • Multilingual Solr analyzer discontinued. We now use one field per language and edismax for searching. A patched version of Solr is no longer needed.
  • Solr 4.3.0 configuration files provided (required for the tag cloud feature)
  • New improved search interface
  • Updated installation instructions : http://www.crawl-anywhere.com/installation-v400/
  • Updated build process
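
To illustrate the "one field per language and edismax" approach mentioned above, here is a hedged sketch of a query sent to Solr with `curl`. The Solr URL, the helper name `search_all_languages`, and the field names `content_en` and `content_fr` are assumptions made for the example; the actual field names are defined in the Solr configuration files shipped with the release.

```shell
# Hypothetical sketch: search several per-language fields in one request.
# $1 = base URL of the Solr core (assumption), $2 = user query
search_all_languages() {
    # defType=edismax selects the extended DisMax query parser;
    # qf lists the fields to search, here one field per language.
    curl -s -G "$1/select" \
        --data-urlencode "q=$2" \
        --data-urlencode "defType=edismax" \
        --data-urlencode "qf=content_en content_fr" \
        --data-urlencode "wt=json"
}

# Example (URL is a placeholder):
# search_all_languages http://localhost:8983/solr/collection1 "open source crawler"
```

Because each language has its own field with its own analysis chain, a stock Solr can handle the query.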

Crawl-Anywhere is now open source

A lot of users asked for it, so starting with version 4, Crawl-Anywhere becomes an open-source project.

More information here : https://github.com/bejean/crawl-anywhere#readme

Any feedback or promotion (blog posts, tweets, …) is welcome.

Crawl Anywhere 3.0.3 available

The new version 3.0.3 of Crawl Anywhere is now available. This is a bug fix release.

The issue was in the administration web service (crawlerws.war). A dependency was missing.

Upgrade from v3.0.2 to v3.0.3

You just need to update your webapps/crawlerws/crawlerws.war file.

Extract the crawl-anywhere-3.0.3.tar.gz archive somewhere.
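
The two steps above could be combined as follows. This is a sketch under assumptions: the helper name `update_war` and the installation path are hypothetical, and since the location of the war file inside the archive is not stated here, the script searches for it instead of assuming a path.

```shell
# Hypothetical helper: extract the release archive to a temporary directory
# and copy crawlerws.war over the deployed one.
# $1 = path to crawl-anywhere-3.0.3.tar.gz, $2 = installation directory
update_war() {
    archive="$1"
    install_dir="$2"
    work=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$work" || { rm -rf "$work"; return 1; }
    # Locate the war file wherever it sits in the extracted tree.
    war=$(find "$work" -type f -name crawlerws.war | head -n 1)
    if [ -z "$war" ]; then
        echo "crawlerws.war not found in $archive" >&2
        rm -rf "$work"
        return 1
    fi
    cp "$war" "$install_dir/webapps/crawlerws/crawlerws.war"
    status=$?
    rm -rf "$work"
    return "$status"
}

# Example (installation path is a placeholder):
# update_war crawl-anywhere-3.0.3.tar.gz /opt/crawler
```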

Crawl Anywhere 3.0.2 available

The new version 3.0.2 of Crawl Anywhere is now available. This is a bug fix release.

The issue was in the search application: a problem with facets.

Upgrade from v3.0.1 to v3.0.2

You just need to update your web/search directory
