Web Information Retrieval using Web Document Structures.

Title: Web Information Retrieval using Web Document Structures.
Author: Namjoshi, Nihar
Advisors: Dr. Robert St. Amant, Committee Chair
Dr. Christopher Healey, Committee Member
Dr. James Lester, Committee Member
Abstract: Information domains such as the World Wide Web hold an enormous amount of content. Extracting the information relevant to a particular topic, or predicting what sort of information a user is seeking, is not a trivial task. For a user, finding information relevant to a particular area of interest can be inconvenient and sometimes frustrating. Studies have shown that users faced with such a task may become bored easily and leave a Web site. Traditional Information Retrieval techniques rely on measures such as the frequency of a word in a given document or the hyperlink connectivity of that document. These measures do not necessarily bring out the important words or terms in a document and can therefore be less effective when returning search results for queries. Our approach relies not only on the actual text of the document but also on the formatting elements inherent in Web pages, derived from HyperText Markup Language (HTML) syntax, to support information extraction. We use rules to assign measures to important terms in a document in order to facilitate the extraction of relevant information. We evaluated our system by asking users to test it and, in addition, by comparing our results with those from a conventional search engine.
Date: 2004-01-08
Degree: MS
Discipline: Computer Science
URI: http://www.lib.ncsu.edu/resolver/1840.16/896

