[1] YANG Rui, HU Hong-si, ZHANG Wen-bo, et al. Design and Implementation of a Distributed Web Crawler [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2013, (04): 382-386.

Design and Implementation of a Distributed Web Crawler

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2013, No. 04
Pages:
382-386
Column:
Publication date:
2013-09-01

Article Info

Title:
Design and Implementation of a Distributed Web Crawler
Author(s):
YANG Rui; HU Hong-si; ZHANG Wen-bo; YAO Tian-fang
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240
Keywords:
distributed system; web crawler; design
Classification number:
TP391
Document code:
A
Abstract:
URL seeds are generated from user-specified keywords by means of a search engine, and a distributed web crawler then extracts the web pages that match the user's requirements to serve as a research corpus. Experimental results show that the distributed web crawler can satisfactorily meet the need to extract a large corpus in a short time.

Memo

Memo:
National Natural Science Foundation of China (60773087)
Last Update: 1900-01-01