I'm personally happy moving ahead with Datapark. The other candidates
fail for various (sometimes multiple) reasons.
If it turns out that it doesn't work well or otherwise sucks, we can
just retire it. I think it's well worth pursuing for now.
Once we have a package, it should be easy to set up an instance, run a
full crawl, and see how well it works out.
Steps I see:
- Package it and get that reviewed (I'm happy to review).
- Set up a test instance.
- Identify all the resources we want it to crawl and crawl them.
  (We will need to tune thread counts and the like here, and may also
  need to adjust robots.txt to allow our crawler more access.) Ideally,
  after a full crawl it can re-check things pretty quickly.
- Adjust results
* May need to look at tagging pages or resources so they are
* May need to fix it so CSRF tokens aren't saved in results.
* May need to teach it the language (LANG) of some resources and
  favor results that match the user's current LANG.
* May need to drop some results/sites out.
- Theme the search page (sounds like there's a good start, possibly
  already done).
- Change or add search fields
* Change the wiki to call this.
* possibly add search field to all apps?
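For the robots.txt adjustment mentioned above, something like the
following might work. "DataparkBot" is a placeholder; check what
User-agent string the Datapark crawler is actually configured to send.

```
# Hypothetical robots.txt entry -- "DataparkBot" is a placeholder
# for whatever User-agent our crawler is configured to send.
User-agent: DataparkBot
Disallow:

# Everyone else keeps the existing restrictions.
User-agent: *
Disallow: /private/
```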
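On keeping CSRF tokens out of results, a minimal sketch (not an
existing Datapark feature) of normalizing URLs before they are stored;
the parameter names in DROP_PARAMS are assumptions about what our apps
use and would need checking:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that should never end up in
# the search index; the exact names are assumptions.
DROP_PARAMS = {"csrf_token", "_csrf", "csrftoken"}

def strip_csrf(url: str) -> str:
    """Return url with known CSRF-token query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in DROP_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_csrf("https://apps.example.org/wiki?title=Main&csrf_token=abc"))
# → https://apps.example.org/wiki?title=Main
```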
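On favoring the user's LANG, a toy sketch of the kind of re-ranking
meant; it assumes each result already carries a language tag, which
the crawler would have to detect or be told:

```python
def rank_with_lang(results, user_lang):
    """Stable sort: results tagged with the user's language float to
    the top. `results` is a list of (url, lang) pairs; the lang tag
    is an assumption, not something Datapark provides out of the box."""
    return sorted(results, key=lambda r: r[1] != user_lang)

print(rank_with_lang([("a.html", "de"), ("b.html", "en")], "en"))
# → [('b.html', 'en'), ('a.html', 'de')]
```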