Apache Nutch 1.2 Released
时间:2010-09-25
来源:红薯
Nutch 是一个开源Java 实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。
Apache Nutch 1.2 包含了不少的改进和bug修复,详情请看 CHANGES 文件。
你可以通过下面地址下载最新版的 Apache Nutch:
http://www.apache.org/dyn/closer.cgi/nutch/
CHANGES:
* NUTCH-901 Make index-more plug-in configurable (Markus Jelsma via mattmann)
* NUTCH-908 Infinite Loop and Null Pointer Bugs in Searching (kubes via mattmann)
* NUTCH-906 Nutch OpenSearch sometimes raises DOMExceptions (Asheesh Laroia via ab)
* NUTCH-862 HttpClient null pointer exception (Sebastian Nagel via ab)
* NUTCH-905 Configurable file protocol parent directory crawling (Thorsten Scherler, mattmann, ab)
* NUTCH-877 Allow setting of slop values for non-quote phrase queries on query-basic plugin (kubes via jnioche)
* NUTCH-716 Make subcollection index filed multivalued (Dmitry Lihachev via jnioche)
* NUTCH-878 ScoringFilters should not override the injected score
* NUTCH-870 Injector should add the metadata before calling injectedScore (jnioche via mattmann)
* NUTCH-858 No longer able to set per-field boosts on lucene documents (ab)
* NUTCH-869 Add parse-html back (jnioche)
* NUTCH-871 MoreIndexingFilter missing date format (Max Lynch via mattmann)
* NUTCH-696 Timeout for Parser (ab, jnioche)
* NUTCH-857 DistributedBeans should not close their RPC counterparts (kubes)
* NUTCH-855 ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags and their subsequent indexing (Scott Gonyea via mattmann)
* NUTCH-677 Segment merge filering based on segment content (Marcin Okraszewski via mattmann)
* NUTCH-774 Retry interval in crawl date is set to 0 (Reinhard Schwab via mattmann)
* NUTCH-697 Generate log output for solr indexer and dedup (Dmitry Lihachev, Jeroen van Vianen via mattmann)
* NUTCH-850 SolrDeleteDuplicates needs to clone the SolrRecord objects (jnioche)
* NUTCH-838 Add timing information to all Tool classes (Jeroen van Vianen, mattmann)
* NUTCH-835 Document deduplication failed using MD5Signature (Sebastian Nagel via ab)
* NUTCH-831 Allow configuration of how fields crawled by Nutch are stored / indexed / tokenized (Jeroen van Vianen via mattmann)
* NUTCH-278 Fetcher-status might need clarification: kbit/s instead of kb/s shown (Alex McLintock via mattmann)
* NUTCH-833 Website is still Lucene branded (mattmann, Alex McLintock)
* NUTCH-832 Website menu has lots of broken links - in particular the API docs (Alex McLintock via mattmann)
热门阅读
-
office 2019专业增强版最新2021版激活秘钥/序列号/激活码推荐 附激活工具
阅读:74
-
如何安装mysql8.0
阅读:31
-
Word快速设置标题样式步骤详解
阅读:28
-
20+道必知必会的Vue面试题(附答案解析)
阅读:37
-
HTML如何制作表单
阅读:22
-
百词斩可以改天数吗?当然可以,4个步骤轻松修改天数!
阅读:31
-
ET文件格式和XLS格式文件之间如何转化?
阅读:24
-
react和vue的区别及优缺点是什么
阅读:121
-
支付宝人脸识别如何关闭?
阅读:21
-
腾讯微云怎么修改照片或视频备份路径?
阅读:28