Roadmap

This page provides information about the Crawl-Anywhere roadmap.

Version 4.0 – expected in October 2013

This version will include major improvements and refactoring. The main objective is to make Crawl-Anywhere an open-source project available on GitHub.

Crawler module refactoring

  • use Apache HttpClient 4.x
  • use the Java Executor framework
  • remove MySQL, use only MongoDB
  • Indexer (medium)
  • Search interface (big)
Remove MySQL dependency and use only MongoDB  
Add REST WS API
  • administration
  • monitoring
Make Crawl-Anywhere an open-source project
  • general source refactoring
  • better source documentation

Crawl-Anywhere is now an open-source project hosted on GitHub. You can get and test the current alpha version at https://github.com/bejean/crawl-anywhere

Version 4.1 – expected in December 2013

This version should include improvements such as Elasticsearch integration.

Elasticsearch integration

This integration affects nearly all modules:

  • Web administration (light)
  • Web crawler (light)
  • Pipeline (medium)
  • Indexer (medium)
  • Search interface (big)