Abstract
As the Internet continues to grow, search engine technology has developed rapidly. Search engines have become an indispensable tool for finding information on the vast web, and choosing a strategy that accesses web resources efficiently has become the central research problem for web crawlers in specialized search engines. This thesis introduces the categories of search engines and their working principles, describes the search strategies used by web crawlers, and discusses development trends for the next generation of search engines.
This thesis implements a focused (topic-specific) crawler that collects image-related information and stores it in a database. The collected information is displayed by category in a web front end, which also provides information retrieval, user login and registration, and commenting. The crawler uses the vector space model to judge topic relevance and an enhanced PageRank algorithm (the EPR algorithm) to filter URLs.
Keywords: image, crawler, retrieval
Table of Contents