Problem: running the crawl command fails with the following error:
#./bin/nutch crawl urls.test/iniurl -depth 1
crawl started in: crawl-20110504193156
rootUrlDir = urls.test
threads = 10
depth = 1
indexer=lucene
Injector: starting at 2011-05-04 19:31:56
Injector: crawlDb: crawl-20110504193156/crawldb
Injector: urlDir: urls.test
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
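Cause: the directory unpacked from apache-nutch-1.2-src.tar.gz has no plugins folder, so Nutch cannot load the plugins (such as URL normalizers and filters) that the Injector job needs.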
Fix: copy the plugins folder from apache-nutch-1.2-bin.tar.gz into the unpacked source directory.
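The copy step can be scripted. What follows is a minimal POSIX sh sketch, not from the original post: the function name `copy_nutch_plugins` and the example paths are hypothetical. It assumes the bin tarball unpacks to a single top-level directory that contains `plugins/` (the exact directory name is not assumed; a glob picks it up). The tarball is unpacked into a scratch directory so that nothing in the source tree is overwritten except for the added `plugins/`.

```shell
#!/bin/sh
# Sketch: restore the missing plugins/ directory in an unpacked
# apache-nutch-1.2-src tree by copying it from the binary release.
# Hypothetical helper; adjust the paths to your own layout.

copy_nutch_plugins() {
  src_dir=$1       # unpacked source tree, e.g. ./apache-nutch-1.2
  bin_tarball=$2   # e.g. ./apache-nutch-1.2-bin.tar.gz

  # Nothing to do if the source tree already has plugins/.
  if [ -d "$src_dir/plugins" ]; then
    echo "plugins/ already present in $src_dir"
    return 0
  fi

  # Unpack the binary release into a scratch directory so that it
  # cannot clobber files in the source tree.
  tmp_dir=$(mktemp -d) || return 1
  if ! tar -xzf "$bin_tarball" -C "$tmp_dir"; then
    rm -rf "$tmp_dir"
    return 1
  fi

  # Copy only plugins/ from the tarball's top-level directory;
  # cp fails (and so does this function) if no plugins/ was found.
  cp -R "$tmp_dir"/*/plugins "$src_dir/"
  status=$?
  rm -rf "$tmp_dir"
  return $status
}
```

For example, run `copy_nutch_plugins ./apache-nutch-1.2 ./apache-nutch-1.2-bin.tar.gz`, then rerun the crawl command from the source directory.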
Reposted from: http://mkavi.baihongyu.com/