1. message:Value of illegal type: 'org.archive.crawler.settings.ModuleType', 'org.archive.crawler.framework.Frontier' was expected.: Value of illegal type: 'org.archive.crawler.settings.ModuleType', 'org.archive.crawler.framework.Frontier' was expected. Exception:No associated exception.
2. message:On crawl: question Unable to setup crawl modules exception:java.lang.ClassCastException: org.archive.crawler.settings.ModuleType cannot be cast to org.archive.crawler.framework.Frontier Stacktrace: java.lang.ClassCastException: org.archive.crawler.settings.ModuleType cannot be cast to org.archive.crawler.framework.Frontier at org.archive.crawler.framework.CrawlController.setupCrawlModules(CrawlController.java:675) at org.archive.crawler.framework.CrawlController.initialize(CrawlController.java:381) at org.archive.crawler.admin.CrawlJob.setupForCrawlStart(CrawlJob.java:853) at org.archive.crawler.admin.CrawlJobHandler.startNextJobInternal(CrawlJobHandler.java:1144) at org.archive.crawler.admin.CrawlJobHandler$3.run(CrawlJobHandler.java:1127) at java.lang.Thread.run(Thread.java:619)
3. message:Wrong document type 'crawl-order' in 'file:/c:/heritrix/jobs/question-20141005032127804/order.xml', line: 1, column: 160 exception:No associated exception.
[b]Solution[/b]: These errors are usually caused by a processor chain that is not configured correctly. For example, a Writer was set in the slot where a Prefetcher belongs, which triggers the errors above. Configure the modules strictly as follows:
1. frontier
   org.archive.crawler.frontier.BdbFrontier
2. scope
   org.archive.crawler.scope.BroadScope
3. Prefetcher
   org.archive.crawler.prefetch.Preselector
   org.archive.crawler.prefetch.PreconditionEnforcer
4. Fetcher
   org.archive.crawler.fetcher.FetchDNS
   org.archive.crawler.fetcher.FetchHTTP
5. Extractor
   org.archive.crawler.extractor.ExtractorHTTP
   org.archive.crawler.extractor.ExtractorHTML
   (You can add more here as needed, such as ExtractorSWF or ExtractorJS, but the first two are required.)
6. Writer
   Either MirrorWriter or ARCWriter; MirrorWriter is generally recommended.
7. PostProcessor
   org.archive.crawler.postprocessor.CrawlStateUpdater
   org.archive.crawler.postprocessor.LinksScoper
   org.archive.crawler.postprocessor.FrontierScheduler
   (FrontierScheduler can be extended yourself, following the method in the book.)
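To make the "right class in the right slot" rule concrete, here is a rough sketch of a checker that flags a class placed in the wrong slot by comparing its package against the slot it was assigned to. It is not part of Heritrix: the class name ChainCheck, the method misplaced, and the slot labels are all made up for illustration, and the org.archive.crawler.writer. prefix for the Writer slot is my assumption, since the list above does not give the writers' package.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Heritrix: flags classes placed in the wrong slot.
class ChainCheck {
    // Expected package prefix per slot, derived from the class names in the list above.
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put("frontier", "org.archive.crawler.frontier.");
        EXPECTED.put("scope", "org.archive.crawler.scope.");
        EXPECTED.put("Prefetcher", "org.archive.crawler.prefetch.");
        EXPECTED.put("Fetcher", "org.archive.crawler.fetcher.");
        EXPECTED.put("Extractor", "org.archive.crawler.extractor.");
        // Assumption: the post does not state the package of the writer classes.
        EXPECTED.put("Writer", "org.archive.crawler.writer.");
        EXPECTED.put("PostProcessor", "org.archive.crawler.postprocessor.");
    }

    // Returns one message per class whose package does not match its slot,
    // plus one message per slot name that is not in EXPECTED.
    static List<String> misplaced(Map<String, List<String>> config) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : config.entrySet()) {
            String prefix = EXPECTED.get(entry.getKey());
            if (prefix == null) {
                problems.add("unknown slot: " + entry.getKey());
                continue;
            }
            for (String className : entry.getValue()) {
                if (!className.startsWith(prefix)) {
                    problems.add(entry.getKey() + ": " + className);
                }
            }
        }
        return problems;
    }
}
```

Passing it a map from slot name to the configured class names returns an empty list when every class sits in a matching slot. A package match only catches gross mix-ups like a Writer in the Prefetcher slot; it does not check that the required classes are present or in the right order.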