
    Date: 2005-01-20
     

    Pre-emptive Multithreading Web Spider


    This article was contributed by Sim Ayers.


    The Win32 API supports applications that are pre-emptively multithreaded. This is a very useful and powerful feature of Win32 when writing MFC Internet spiders. The Spider project is an example of how to use pre-emptive multithreading to gather information on the web using a spider/robot built with the MFC WinInet classes.

    This project produces a spider program that checks web sites for broken URL links. Link verification is done only on href links. The program displays a continuously updated list of URLs in a CListView that reports the status of each href link. The project could be used as a template for gathering and indexing information to be stored in a database file for queries.
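
    As a rough illustration of the first step, finding the href links to verify, here is a minimal sketch in portable C++. The function name ExtractHrefLinks is hypothetical and does not come from the Spider project, and the regular expression is an assumption that only handles quoted href values; real-world HTML would need a more tolerant parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the Spider project): collect the values of
// href attributes written with double or single quotes.
std::vector<std::string> ExtractHrefLinks(const std::string& html)
{
    // Matches href="..." or href='...', ignoring case.
    static const std::regex hrefPattern(
        R"re(href\s*=\s*(?:"([^"]*)"|'([^']*)'))re",
        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefPattern), end;
         it != end; ++it)
    {
        const std::smatch& match = *it;
        // Exactly one of the two capture groups takes part in each match.
        links.push_back(match[1].matched ? match[1].str() : match[2].str());
    }
    return links;
}
```

    Each returned string would then be handed to whatever code performs the actual link check.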

    Search engines gather information on the web using programs called robots. Robots (also called web crawlers, spiders, worms, web wanderers, and scooters) automatically gather and index information from around the web, and then put that information into databases. (Note that a robot will index a page, and then follow the links on that page as a source for new URLs to index.) Users can then construct queries to search these databases to find the information they want.

    By using pre-emptive multithreading, you can index a web page of URL links and start a new thread to follow each new URL link as a new source of URLs to index.
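
    The one-thread-per-link idea can be sketched as follows. This is not the project's code: it swaps in std::thread for CWinThread so that it compiles outside MFC, and LinkStatus, CheckLinksInParallel, and the check callback are all hypothetical names. The callback stands in for the real HTTP request.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical result record; the project's own data structures are not
// described in the text.
struct LinkStatus {
    std::string url;
    bool ok = false;
};

// Start one thread per URL, run `check` on each URL, and collect the results
// in the same order as the input.
std::vector<LinkStatus> CheckLinksInParallel(
    const std::vector<std::string>& urls,
    const std::function<bool(const std::string&)>& check)
{
    std::vector<LinkStatus> results(urls.size());
    std::vector<std::thread> workers;
    workers.reserve(urls.size());

    for (std::size_t i = 0; i < urls.size(); ++i) {
        // Each worker writes only to its own slot, so no lock is needed.
        workers.emplace_back([&, i] {
            results[i].url = urls[i];
            results[i].ok = check(urls[i]);
        });
    }
    // Wait for every worker before returning the results.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return results;
}
```

    Giving each thread its own result slot keeps the sketch free of shared mutable state; a real spider that feeds newly found URLs back into a common queue would need synchronization around that queue.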

    The project uses the MDI CDocument with a custom MDI child frame to display a CEditView when downloading web pages and a CListView when checking URL links. The project also uses the CObArray, CInternetSession, CHttpConnection, CHttpFile, and CWinThread MFC classes. The CWinThread class is used to produce multiple threads instead of using the asynchronous mode in CInternetSession, which is really left over from Winsock on the 16-bit Windows platform.

    The Spider project uses simple worker threads to check URL links or download a web page. The CSpiderThread class is derived from the CWinThread class so each CSpiderThread object can use the CWinThread MESSAGE_MAP() function. Because the CSpiderThread class declares DECLARE_MESSAGE_MAP(), the user interface stays responsive to user input. This means you can check the URL links on one web server and at the same time download and open a web page from another web server. The only time the user interface becomes unresponsive to user input is when the thread count exceeds MAXIMUM_WAIT_OBJECTS, which is defined as 64.
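
    One way to keep the number of simultaneous threads at or below that limit is to process the URLs in batches. The text does not say the Spider project does this; the sketch below is only an assumption about how it could be done, and kMaxWaitObjects and SplitIntoBatches are hypothetical names. The constant mirrors the Win32 value of MAXIMUM_WAIT_OBJECTS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the Win32 constant MAXIMUM_WAIT_OBJECTS (64).
constexpr std::size_t kMaxWaitObjects = 64;

// Hypothetical helper: split the URL list into consecutive batches of at most
// `batchSize` entries, so that each batch can be served by one group of threads.
std::vector<std::vector<std::string>> SplitIntoBatches(
    const std::vector<std::string>& urls,
    std::size_t batchSize = kMaxWaitObjects)
{
    assert(batchSize > 0);
    std::vector<std::vector<std::string>> batches;
    for (std::size_t start = 0; start < urls.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, urls.size());
        batches.emplace_back(
            urls.begin() + static_cast<std::ptrdiff_t>(start),
            urls.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

    The caller would start the threads for one batch, wait for them to finish, and only then move on to the next batch.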

    In the constructor for each new CSpiderThread object, we supply the ThreadProc function and the thread parameters to be passed to the ThreadProc function.
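
    MFC's CWinThread offers a constructor that takes an AFX_THREADPROC and an LPVOID parameter, which is the pattern described above. The sketch below imitates that shape in portable C++ with std::thread so it can be compiled without MFC. SpiderThreadSketch, ThreadParams, and CheckUrlProc are hypothetical names, and the thread function only marks its parameter block as processed instead of contacting a server.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Same shape as MFC's AFX_THREADPROC: an untyped parameter in, an exit code out.
using ThreadProc = unsigned int (*)(void*);

// Hypothetical parameter block passed to the thread function.
struct ThreadParams {
    std::string url;
    bool processed = false;
};

// Portable stand-in for a CWinThread-derived class: the constructor receives
// the thread function and the parameter that will be passed to it.
class SpiderThreadSketch {
public:
    SpiderThreadSketch(ThreadProc proc, void* param)
        : proc_(proc), param_(param) {}

    ~SpiderThreadSketch() {
        if (worker_.joinable()) worker_.join();
    }

    // Run the thread function on a new thread.
    void Start() {
        assert(!worker_.joinable());  // Start() may be called only once.
        worker_ = std::thread([this] { exitCode_ = proc_(param_); });
    }

    // Wait for the thread to finish and return its exit code.
    unsigned int Join() {
        if (worker_.joinable()) worker_.join();
        return exitCode_;
    }

private:
    ThreadProc proc_;
    void* param_;
    unsigned int exitCode_ = 0;
    std::thread worker_;
};

// Example thread function: a placeholder that performs no network access.
unsigned int CheckUrlProc(void* rawParam) {
    ThreadParams* params = static_cast<ThreadParams*>(rawParam);
    params->processed = true;
    return 0;
}
```

    The parameter block must outlive the thread, which is why the caller owns the ThreadParams object and joins the thread before reading from it.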

      
    
    
    
