An open solution to discover the graph structure of World Wide Web

dc.contributor.advisor Vincent W. Freeh, Committee Chair en_US
dc.contributor.advisor Frank Mueller, Committee Member en_US
dc.contributor.advisor Xuxian Jiang, Committee Member en_US
dc.contributor.author Chen, Kunsheng en_US
dc.date.accessioned 2010-04-02T18:01:41Z
dc.date.available 2010-04-02T18:01:41Z
dc.date.issued 2009-12-23 en_US
dc.identifier.other etd-12222009-192226 en_US
dc.description.abstract The World Wide Web is a large, complex network of interlinked web pages. Understanding this structure is of immense benefit both economically and socially. Currently, only incomplete or sparse information about the graph structure of the Web is available in the public domain; the full data is closely guarded by a handful of corporations. Nevertheless, studies of the topological structure of the World Wide Web benefit not only scientists and e-commerce merchants but also ordinary users. A better understanding of this structure helps scientists develop new technologies to improve the Internet. It also helps companies build optimal e-commerce solutions to meet their business needs. The goal of this thesis is to evaluate an open source solution for mapping the structure of the Web. In support of this thesis, we have implemented a prototype using existing open source software, including the volunteer computing library BOINC (Berkeley Open Infrastructure for Network Computing) and the Hadoop MapReduce framework. We use the computing power and disk space provided by BOINC to perform data collection, and the Hadoop MapReduce framework to perform data analysis on a large data set. The contributions of our research include a low-cost open solution for a distributed web crawling system using BOINC and a URL ranking system using the Hadoop MapReduce framework. We also provide a feasibility study of crawling the web with this solution and present experimental results. en_US
dc.rights I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. en_US
dc.subject graph structure of World Wide Web en_US
dc.subject distributed web crawler en_US
dc.title An open solution to discover the graph structure of World Wide Web en_US
dc.degree.name MS en_US
dc.degree.level thesis en_US
dc.degree.discipline Computer Science en_US

Files in this item

Files Size Format View
etd.pdf 247.5Kb PDF View/Open