Crawl Anywhere 1.2.0 available

Version 1.2.0 of Crawl Anywhere is now available.

This version introduces the following new concepts:

  • Account

Web sites to be crawled are created under an account

You can create several accounts

Each account manages a set of web sites

You can create users to manage a specific account

  • Target

A target is a Solr core.

You can associate several targets with an account

You can specify a default target for an account

A web site from an account can be indexed in any target (Solr core) associated with that account

  • Engine

An engine is an instance of crawler/pipeline/indexer

You can associate one engine with an account

Several accounts can share the same engine

You can deploy several engines on several servers

An engine crawls and indexes the web sites of the accounts it is associated with
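
The relationships between accounts, targets and engines described above can be sketched as a small data model. This is only an illustration, not Crawl Anywhere's actual code or API: the class names, the field names and the helper `sites_for_engine` are all hypothetical, and the rule that a web site without an explicit target falls back to the account's default target is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Target:
    name: str       # hypothetical label for the target
    solr_core: str  # the Solr core this target stands for


@dataclass
class Engine:
    name: str  # one crawler/pipeline/indexer instance


@dataclass
class Account:
    name: str
    engine: Engine  # one engine per account; engines may be shared
    targets: List[Target] = field(default_factory=list)
    default_target: Optional[Target] = None
    # site URL -> explicitly chosen target, or None to use the default
    sites: Dict[str, Optional[Target]] = field(default_factory=dict)

    def add_site(self, url: str, target: Optional[Target] = None) -> None:
        # A site may only be indexed in a target associated with its account.
        if target is not None and target not in self.targets:
            raise ValueError(f"target {target.name!r} is not associated with account {self.name!r}")
        self.sites[url] = target

    def target_for(self, url: str) -> Optional[Target]:
        # Assumption: sites without an explicit target use the account default.
        chosen = self.sites[url]
        return chosen if chosen is not None else self.default_target


def sites_for_engine(engine: Engine, accounts: List[Account]) -> List[str]:
    # An engine handles the sites of every account associated with it.
    return [url for account in accounts if account.engine is engine for url in account.sites]
```

With this sketch, two accounts that share one `Engine` object would both have their web sites returned by `sites_for_engine`, while each site is still indexed in a target belonging to its own account.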

  • Web site settings

New way to define starting URLs (web site, RSS feed, page of links)

New way to specify crawling rules

 
