Group Project: Visualization of Document Space

Document Selection

The web page lists ten documents. When the user clicks on a document, its content is shown in a text area beside the list, and the user can select the documents they wish to include. There are twenty query words and 10 weights that can be associated with them, so the user can choose query words together with their weights.

In our model, each document and the query are treated as vectors, and the distances between them are calculated using the cosine measure and Euclidean distance. First the frequency of each term is found, and then each vector is normalized using the formula below:
v_k = f_k / sqrt( f_1² + f_2² + ... + f_n² )
Here f_k is the frequency of term k and n is the number of terms. Every coordinate of each document vector and of the query vector is calculated in the same way.
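
The normalization and the two measures can be illustrated with a short Python sketch. It is not the project's own code: the function names and the sample frequencies are made up for illustration, and the cosine function assumes both vectors have already been normalized, so that the dot product alone gives the cosine.

```python
import math

def normalize(freqs):
    """Divide each term frequency by the length of the whole vector."""
    norm = math.sqrt(sum(f * f for f in freqs))
    if norm == 0:
        # A vector with no matching terms stays all zeros.
        return [0.0] * len(freqs)
    return [f / norm for f in freqs]

def cosine_measure(u, v):
    """Cosine of the angle between two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical frequencies for three terms.
doc = normalize([3, 0, 4])     # length of [3, 0, 4] is 5
query = normalize([1, 0, 0])

print(doc)
print(cosine_measure(doc, query))
print(euclidean_distance(doc, query))
```

For unit-length vectors the two measures are linked: the squared Euclidean distance equals 2 minus twice the cosine, so both rank documents in the same order.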

We wrote functions that count how often each query word occurs in every document; the vector values are calculated from these frequencies. The counting logic is as follows: each query word is searched for in the document, and every match increments a counter (the substr function is used for this purpose). There are 20 such functions for the documents (one per query word) and another 20 for the query, since the query is also treated as a document. The query functions use different logic, however: they only check whether the query word is one the user selected and, if it is, assign the frequency according to the weight the user chose.
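
The counting logic described above can be sketched in Python as follows. This is only an illustration, not the project's implementation: it uses one parameterized function instead of 20 separate ones, the names are hypothetical, and case-insensitive matching is an assumption.

```python
def count_occurrences(text, term):
    """Count non-overlapping substring matches of term in text (case-insensitive)."""
    return text.lower().count(term.lower())

def document_frequencies(text, query_terms):
    """Frequency of every query word in one document."""
    return [count_occurrences(text, term) for term in query_terms]

def query_frequencies(query_terms, chosen_weights):
    """For the query, the frequency of a term is the weight the user chose.

    chosen_weights maps each selected term to its weight; terms the user
    did not select get a frequency of 0.
    """
    return [chosen_weights.get(term, 0) for term in query_terms]

# Hypothetical query words and document text.
terms = ["data", "map", "vector"]
text = "Data maps and data vectors map the data space."

doc_freqs = document_frequencies(text, terms)
query_freqs = query_frequencies(terms, {"data": 5, "vector": 2})
print(doc_freqs)
print(query_freqs)
```

Because matching is done on substrings, a query word is also counted inside longer words; here "map" matches both "maps" and "map".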

The vector values are then calculated from these frequencies with the formula above and stored as arrays. Finally, the document and query vectors are passed to the FastMap algorithm, which produces the positioning numbers (coordinates) for each of them.
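
For orientation, here is a minimal Python sketch of the general FastMap idea. It assumes Euclidean distance between the vectors and a simple "farthest object" pivot heuristic. It is not the project's FastMap code, and the function and variable names are hypothetical.

```python
import math

def fastmap(vectors, k=2):
    """Map each vector to k coordinates that approximately preserve distances."""
    n = len(vectors)
    coords = [[0.0] * k for _ in range(n)]

    def dist2(i, j, dim):
        # Squared distance between objects i and j that is still left
        # after the first `dim` coordinates have been assigned.
        d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # guard against tiny negative rounding errors

    for dim in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object,
        # then jump once more to the object farthest from that one.
        a = 0
        b = max(range(n), key=lambda j: dist2(a, j, dim))
        a = max(range(n), key=lambda j: dist2(b, j, dim))
        dab2 = dist2(a, b, dim)
        if dab2 == 0:
            break  # nothing left to separate; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Projection of object i onto the line through the pivots a and b.
            coords[i][dim] = (dist2(a, i, dim) + dab2 - dist2(b, i, dim)) / (2 * dab)
    return coords

# Hypothetical example: three points forming a 3-4-5 right triangle.
points = [[0, 0], [3, 0], [0, 4]]
positions = fastmap(points, k=2)
print(positions)
```

With k=2 the resulting coordinates can be used directly as x and y positions on the screen. In this small example the pairwise distances are preserved exactly, because the input is already two-dimensional.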

Created By: Shruti Parikh, Sueyeon Syn, Kittipong Techapanichgul, Zhiwen Yu

December 16, 2004