The Core Mechanism of Google's Next-Generation Real-Time Search System
Date: 2010-10-07
Source: cnblogs
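Google recently published a paper on the core mechanism behind its next-generation real-time search system, "Large-scale Incremental Processing Using Distributed Transactions and Notifications". The paper introduces Percolator, a system built on top of BigTable. Functionally, Percolator closely resembles a trigger in a traditional database, but it is designed to scale to data volumes and update rates that traditional databases cannot handle. Below are the paper's abstract, a download link, and a related article.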
Abstract
Updating an index of the web as documents are crawled requires continuously transforming a large repository of existing documents as new documents arrive. This task is one example of a class of data processing tasks that transform a large repository of data via small, independent mutations. These tasks lie in a gap between the capabilities of existing infrastructure. Databases do not meet the storage or throughput requirements of these tasks: Google’s indexing system stores tens of petabytes of data and processes billions of updates per day on thousands of machines. MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency.
We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.
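The trigger-like behavior described above, and the abstract's idea of processing small, independent updates instead of large batches, can be illustrated with a toy, single-process sketch. It is only loosely modeled on Percolator's observer/notification idea and is not Google's implementation or API: `ToyTable`, `register_observer`, `write`, `run_observers`, the column names, and the sample URL are all hypothetical. It deliberately omits what makes the real system hard, namely distributed transactions with snapshot isolation, locking, timestamps, and finding pending notifications across thousands of machines.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Tuple

# Hypothetical types for this sketch: a cell is addressed by (row, column).
Cell = Tuple[str, str]
Observer = Callable[["ToyTable", str], None]


class ToyTable:
    """In-memory stand-in for a BigTable-like table (an assumption, not Percolator's API)."""

    def __init__(self) -> None:
        self.data: Dict[Cell, str] = {}
        self.observers: Dict[str, list] = defaultdict(list)
        self.dirty: deque = deque()  # queue of cells whose observed column changed
        self.pending: set = set()    # same cells, for O(1) duplicate checks

    def register_observer(self, column: str, fn: Observer) -> None:
        self.observers[column].append(fn)

    def write(self, row: str, column: str, value: str) -> None:
        self.data[(row, column)] = value
        # Only observed columns produce notifications. Repeated writes to the
        # same cell before it is processed collapse into one notification.
        if column in self.observers and (row, column) not in self.pending:
            self.pending.add((row, column))
            self.dirty.append((row, column))

    def run_observers(self) -> int:
        """Drain notifications, running each observer once per pending cell.

        Observers may write other observed columns, which enqueues more work,
        so one small update cascades through the pipeline incrementally."""
        runs = 0
        while self.dirty:
            row, column = self.dirty.popleft()
            self.pending.discard((row, column))
            for fn in self.observers[column]:
                fn(self, row)
                runs += 1
        return runs


def parse_document(table: ToyTable, row: str) -> None:
    # Hypothetical first stage: naive title extraction, purely illustrative.
    html = table.data[(row, "raw:html")]
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    table.write(row, "parsed:title", html[start:end])


def index_title(table: ToyTable, row: str) -> None:
    # Hypothetical second stage: map a lowercased title to the document URL.
    title = table.data[(row, "parsed:title")]
    table.write("index:" + title.lower(), "docs", row)


table = ToyTable()
table.register_observer("raw:html", parse_document)
table.register_observer("parsed:title", index_title)

page = "<html><title>Percolator</title></html>"
table.write("http://a.example/", "raw:html", page)
table.write("http://a.example/", "raw:html", page)  # collapses into the pending notification
runs = table.run_observers()
print(runs)  # 2
print(table.data[("index:percolator", "docs")])  # http://a.example/
```

Re-crawling one page touches only that page's cells and the observers downstream of them, rather than re-running a batch job over the whole repository. That contrast is the one the abstract draws between Percolator and MapReduce-style indexing.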
Download
http://research.google.com/pubs/archive/36726.pdf
Related Article
Google’s Colossus Makes Search Real-Time By Dumping MapReduce