An open solution to discover the graph structure of World Wide Web
| dc.contributor.advisor | Vincent W. Freeh, Committee Chair | en_US |
| dc.contributor.advisor | Frank Mueller, Committee Member | en_US |
| dc.contributor.advisor | Xuxian Jiang, Committee Member | en_US |
| dc.contributor.author | Chen, Kunsheng | en_US |
| dc.date.accessioned | 2010-04-02T18:01:41Z | |
| dc.date.available | 2010-04-02T18:01:41Z | |
| dc.date.issued | 2009-12-23 | en_US |
| dc.degree.discipline | Computer Science | en_US |
| dc.degree.level | thesis | en_US |
| dc.degree.name | MS | en_US |
| dc.description.abstract | The World Wide Web is a large, complex network of interlinked web pages. Understanding its structure is of immense economic and social benefit. Currently, only incomplete or sparse information about the graph structure of the Web is available in the public domain; the full data is closely guarded by a handful of corporations. Nevertheless, studies of the topological structure of the World Wide Web benefit not only scientists and e-commerce merchants but also ordinary users. A better understanding of this structure helps scientists develop new technologies to improve the Internet. It also helps companies build e-commerce solutions that meet their business needs. The goal of this thesis is to evaluate an open source solution for mapping the structure of the Web. In support of this thesis, we have implemented a prototype using existing open source software, including the volunteer computing platform BOINC (Berkeley Open Infrastructure for Network Computing) and the Hadoop MapReduce framework. We use the computing power and disk space provided by BOINC to perform data collection, and the Hadoop MapReduce framework to perform data analysis on a large data set. The contributions of our research include a low-cost, open, distributed web crawling system built on BOINC and a URL ranking system built on the Hadoop MapReduce framework. We also provide a feasibility study on crawling the Web with this solution and present experimental results. | en_US |
| dc.identifier.other | etd-12222009-192226 | en_US |
| dc.identifier.uri | http://www.lib.ncsu.edu/resolver/1840.16/1176 | |
| dc.rights | I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. | en_US |
| dc.subject | graph structure of World Wide Web | en_US |
| dc.subject | distributed web crawler | en_US |
| dc.title | An open solution to discover the graph structure of World Wide Web | en_US |
