This change upgrades Heritrix 3 to Java 8 (and then back to Java 7; see later comments) and updates almost all dependencies under heritrix-commons to their latest versions, including Kryo and the Berkeley DB JE used for state management. These later versions appear to be less buggy and more robust, but because the changes run deep into the core of the crawler, they will require a lot of testing (i.e. don't merge this, at least not yet!).
The new BDB-JE has better support for alerting on GC issues. It also watches for Thread.interrupt() events, after which it considers the DB environment tainted. Interrupts should therefore be avoided in the code, and threads should be shut down properly instead.
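As a rough illustration of what "shut down properly" could look like, here is a minimal sketch of a worker that stops on a flag rather than on Thread.interrupt(). The class CooperativeWorker and its methods are hypothetical and are not part of Heritrix or BDB-JE; the loop body is only a stand-in for real work. It avoids lambdas so it compiles on Java 7 as well as Java 8.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of Heritrix.
public class CooperativeWorker implements Runnable {
    // volatile so the worker thread sees the stop request promptly
    private volatile boolean stopRequested = false;
    // released by the worker once its loop has exited
    private final CountDownLatch finished = new CountDownLatch(1);

    @Override
    public void run() {
        try {
            while (!stopRequested) {
                // Stand-in for one unit of work (e.g. a DB operation).
                // The worker is never interrupted mid-operation; it only
                // checks the flag between units of work.
                Thread.yield();
            }
        } finally {
            finished.countDown();
        }
    }

    // Ask the worker to stop after its current unit of work.
    public void requestStop() {
        stopRequested = true;
    }

    // Wait for the worker loop to exit; returns false on timeout.
    public boolean awaitStopped(long timeoutMillis) {
        try {
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // The *waiting* thread was interrupted; restore its flag.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would start the worker on a thread, call requestStop() when shutting down, and then use awaitStopped(...) to wait for it, instead of calling interrupt() on the worker thread. The trade-off is that a worker blocked in a long operation will not notice the request until that operation returns.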
The OneLineSimpleLayout class has also been moved into heritrix-commons so it can be used.
For reasons that are not clear, the FetchStats class needed a new constructor so Kryo could rebuild it, passing in a comparator.
The main outstanding issue is that the org.archive.crawler.selftest.StatisticsSelfTest test class fails (blocking any later tests).
This question comes from the open-source project: internetarchive/heritrix3