Document Selection
A web page presents ten documents. When the user clicks on any
of the documents, its content is shown in a text area beside
the document list, and the user can select whichever documents
they wish. Twenty query words are available, along with 10 weights
associated with them, so the user can choose query words together
with their weights.
In the model that we used, each document and the query are treated
as vectors, and the distance between them is calculated using
the Cosine Measure and the Euclidean Distance. First the frequency
of each term is found, and then the document vectors are normalized
using the formula below:
vector coordinate k
= ( term k frequency ) / sqrt [ ( term 1 freq )^2
+ ( term 2 freq )^2 + ... + ( term n freq )^2 ]
All the other coordinates of the document and query vectors are
calculated in the same way.
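
The normalization above can be sketched in a few lines of Python.
This is only an illustration, not the project's own code: the
function name normalize and the handling of an all-zero vector
(returned unchanged as zeros) are assumptions.

```python
import math


def normalize(frequencies):
    """Scale a list of term frequencies to unit length.

    Each coordinate k becomes freq_k / sqrt(freq_1^2 + ... + freq_n^2).
    """
    norm = math.sqrt(sum(f * f for f in frequencies))
    if norm == 0:
        # Assumption: a vector with no matching terms stays all zeros.
        return [0.0] * len(frequencies)
    return [f / norm for f in frequencies]
```

For example, the frequencies [3, 4] have length 5, so the normalized
coordinates are 3/5 and 4/5.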
We wrote functions that calculate the frequency of the query words
in every document; the vector values are then computed from these
frequencies. To count how many times a query term is present
in a document, the function searches the document for the query
word and increments a counter on every match (the substr function
is used for this purpose). There are 20 such functions (one for
each of the 20 query words) to check the documents and another
20 functions (one for each of the 20 query words) to check the
query, since the query is also treated as a document. In the
functions for the query, however, the logic is different: we
only have to see whether the query word is the same and, if it
is, take the weight that the user chose and assign the frequency
accordingly.
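
The counting logic described above might look roughly like the
following Python sketch. It is not the project's code: it uses
one parameterized function instead of 20 separate ones, the names
count_occurrences, document_frequencies and query_frequencies
are hypothetical, and case-insensitive matching is an assumption.

```python
def count_occurrences(word, text):
    """Count non-overlapping substring matches of word in text.

    Assumption: matching ignores case. Because this is plain substring
    matching, "cat" is also counted inside "category".
    """
    return text.lower().count(word.lower())


def document_frequencies(query_words, text):
    """Return one frequency per query word for a single document."""
    return [count_occurrences(word, text) for word in query_words]


def query_frequencies(query_words, chosen_weights):
    """Return one frequency per query word for the query itself.

    chosen_weights maps each word the user picked to its weight;
    words the user did not pick get a frequency of 0.
    """
    return [chosen_weights.get(word, 0) for word in query_words]
```

Both functions return frequencies in the same word order, so a
document vector and the query vector line up coordinate by
coordinate.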
Based on these frequencies, the vector values are then calculated
with the formula mentioned above and stored as an array. The
document and query vectors are then passed to the FastMap algorithm
part to get the positioning numbers.
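
FastMap works from distances between pairs of objects, so the
two measures named earlier can be sketched as below. This is an
illustration under assumptions: the function names are hypothetical,
and the text does not say how the cosine value is turned into
a distance, so only the similarity itself is shown.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

For vectors that are already normalized to unit length, the cosine
reduces to the plain dot product of the two vectors.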