PROJECT DESCRIPTION
This interactive web application demonstrates an end-to-end process for measuring document similarity using Euclidean distance and cosine similarity. Users can create simple documents or select predefined ones, construct a weighted query, and visualize the similarity between the query and the documents in 2D and 3D document space side by side, which makes the relationship between the query and the documents easy to understand.
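As a rough illustration of this process, here is a minimal sketch. It assumes each document is reduced to raw term counts over a small, fixed vocabulary and that the weighted query assigns a user-chosen weight to each term. The vocabulary, the sample documents and the names `doc_vector` and `query_vector` are hypothetical and are not taken from the application itself. The sketch scores every document against the query with both measures side by side.

```python
import math
from collections import Counter

# Hypothetical vocabulary: one axis per term, so every vector is a point in 3D
# space (keeping only two of the terms would give a 2D view).
VOCABULARY = ["apple", "banana", "cherry"]


def doc_vector(text: str) -> list[float]:
    """Raw counts of each vocabulary term in a whitespace-tokenized document."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in VOCABULARY]


def query_vector(weights: dict[str, float]) -> list[float]:
    """User-assigned weight per term; terms the user leaves out get weight 0."""
    return [float(weights.get(term, 0.0)) for term in VOCABULARY]


def euclidean(p: list[float], q: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine(p: list[float], q: list[float]) -> float:
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norms if norms else 0.0  # assumed convention: zero vector -> 0


docs = {
    "D1": "apple apple banana",
    "D2": "cherry cherry cherry banana",
}
query = query_vector({"apple": 2.0, "banana": 1.0})

# Print each document's vector, its distance to the query, and its cosine similarity.
for name, text in docs.items():
    vec = doc_vector(text)
    print(name, vec, round(euclidean(query, vec), 3), round(cosine(query, vec), 3))
```

With these inputs D1 has exactly the same vector as the query, so it is at distance 0 with cosine 1. D2 lies farther away and is nearly orthogonal to the query.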
PROJECT OBJECTIVE
Document Similarity in Space is a web application developed as the final project for the Information Storage and Retrieval class at the University of Pittsburgh.
ABOUT EUCLIDEAN DISTANCE AND COSINE SIMILARITY
The Euclidean distance between points p and q is the length of the line segment connecting them.
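For n-dimensional points this length is d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2). A minimal Python sketch of the formula follows. The function name `euclidean_distance` is illustrative only.

```python
import math


def euclidean_distance(p: list[float], q: list[float]) -> float:
    """Length of the line segment between points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# 2D example: the 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Python 3.8 and later also provide `math.dist`, which computes the same value.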
Cosine similarity measures how similar two vectors are by the cosine of the angle between them, so it depends on the directions of the vectors rather than on their lengths.
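In formula form, cos(theta) = (p . q) / (|p| |q|), the dot product divided by the product of the vector lengths. The sketch below (the function name `cosine_similarity` and the sample vectors are illustrative) also shows why the two measures can disagree. A document with the same term proportions but twice the length points in the same direction, so its cosine similarity is 1, yet its Euclidean distance from the shorter document is not 0.

```python
import math


def cosine_similarity(p: list[float], q: list[float]) -> float:
    """Cosine of the angle between p and q: (p . q) / (|p| * |q|)."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_p * norm_q)


short_doc = [1.0, 2.0]
long_doc = [2.0, 4.0]   # same term proportions, twice as long
other_doc = [2.0, 0.0]

print(cosine_similarity(short_doc, long_doc))   # same direction: 1.0 up to rounding
print(math.dist(short_doc, long_doc))           # but the points are sqrt(5) apart
print(cosine_similarity(short_doc, other_doc))  # about 0.447
```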
APPLICATION END-TO-END PROCESS
To visualize document similarity in space, users first add a query and documents to the system through HTML forms. When a form is submitted, the JavaScript embedded in the page processes the form data and passes the resulting document vectors to a Java applet, which displays the similarity between the query and the documents in 2D and 3D space.
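The form-to-applet hand-off can be sketched in Python as follows. This is not the project's JavaScript. The field names (`query`, `doc1`, `doc2`) are hypothetical, and so is the `name=x,y,z` string format joined by `;`. The original application may have passed its vectors to the applet in a different way. In the sketch, each distinct term becomes one axis, each form entry becomes a term-count vector on those axes, and everything is encoded as a single string that an applet could split apart again.

```python
from collections import Counter

# Hypothetical form submission: one query field plus one field per document.
form = {
    "query": "apple banana",
    "doc1": "apple apple banana",
    "doc2": "cherry banana cherry",
}


def build_vocabulary(texts):
    """Sorted list of distinct terms; each term becomes one axis of the space."""
    return sorted({term for text in texts for term in text.lower().split()})


def to_vector(text, vocabulary):
    """Term counts of `text` in vocabulary (axis) order."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def serialize(form):
    """Encode the axes and every vector as 'name=x,y,z' joined by ';' (assumed format)."""
    vocabulary = build_vocabulary(form.values())
    parts = ["axes=" + ",".join(vocabulary)]
    for name, text in form.items():
        parts.append(name + "=" + ",".join(str(v) for v in to_vector(text, vocabulary)))
    return ";".join(parts)


print(serialize(form))
```

Sending the axis labels along with the vectors lets the receiving side label the 2D or 3D plot without recomputing the vocabulary.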
PROJECT CONTRIBUTION
Leading and Forming a Team; Researching; Graphical User Interface Design and Coding; Web Application Integration (needed because each team member was assigned to develop a different piece of the software)
PROJECT CREDITS
My team members: Shruti Parikh, Sueyeon Syn and Zhiwen Yu
PROJECT ARTIFACTS
Document Similarity in Space Project Website
Demo Page
C. Faloutsos, K. Lin, FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets