To see the similarity distance between two documents, click on both documents. To see the distance values within a set of documents (a subset of the documents in the space), select more than two documents.
To see the distances between one document and all other documents in the space, simply click on that document once.
To see all similarity distance values in the space at once, select "show similarity distance table" in the View menu. The values are then laid out in a matrix for easier reading.
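All three viewing options can be pictured as lookups in a symmetric table of pairwise distances. The Python sketch below is a hypothetical illustration, not the tool's code: the dictionary layout and the function names `distance`, `distances_from`, `distances_within` and `distance_table` are assumptions.

```python
from itertools import combinations

# Hypothetical in-memory form of the loaded data: each unordered pair of
# documents is stored once, keyed by (name_a, name_b).
Distances = dict[tuple[str, str], float]


def distance(d: Distances, a: str, b: str) -> float:
    """Distance between two documents, whichever order the pair was stored in."""
    if a == b:
        return 0.0
    return d[(a, b)] if (a, b) in d else d[(b, a)]


def distances_from(d: Distances, doc: str, all_docs: list[str]) -> dict[str, float]:
    """Single click: distances from one document to every other document."""
    return {other: distance(d, doc, other) for other in all_docs if other != doc}


def distances_within(d: Distances, selected: list[str]) -> dict[tuple[str, str], float]:
    """Multiple selection: every pairwise distance inside the selected subset."""
    return {(a, b): distance(d, a, b) for a, b in combinations(selected, 2)}


def distance_table(d: Distances, names: list[str]) -> str:
    """Tab-separated matrix of all distances, one row per document."""
    header = "\t" + "\t".join(names)
    rows = [
        name + "\t" + "\t".join(f"{distance(d, name, other):g}" for other in names)
        for name in names
    ]
    return "\n".join([header, *rows])


d = {("A", "B"): 10.0, ("A", "C"): 20.0, ("B", "C"): 10.0}
print(distances_within(d, ["A", "C"]))  # {('A', 'C'): 20.0}
print(distance_table(d, ["A", "B", "C"]))
```

Storing each pair once keeps the data consistent; the lookup simply tries both orders.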
There are two ways to import similarity distance data into the tool. The first is to give the URL of an XML data file as an applet parameter; in this case the tool opens in service mode. Simply add a "param" tag named datasource inside the "applet" tag, and the data will be loaded into the tool. If the file is incorrectly formatted, the tool opens in standalone mode instead. An example of specifying the datasource is shown below.
<applet code="SpaceDisplay.class" codebase="bin/" name="3dspace" width="700" height="500">
<param name="datasource" value="http://www.sis.pitt.edu/~ktech/XML/dist1.xml">
</applet>
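The choice between service mode and the standalone fallback can be sketched as a small check on the datasource file. This is a hypothetical Python helper (`choose_mode` is not part of the tool). It assumes the file has already been fetched from the datasource URL and treats anything that is not a well-formed `<documents>` file as incorrectly formatted. The tool's real format check is not documented here and may be stricter.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def choose_mode(xml_text: Optional[str]) -> str:
    """Pick the start-up mode from the (already downloaded) datasource file.

    Hypothetical helper: returns "service" when the text looks like a valid
    distance file and "standalone" otherwise, mirroring the documented fallback.
    """
    if not xml_text:
        return "standalone"  # no datasource parameter was supplied
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "standalone"  # not well-formed XML
    # Assumed minimal format check: a <documents> root with <document> children.
    if root.tag != "documents" or root.find("document") is None:
        return "standalone"
    return "service"
```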
The other way is to run the tool in standalone mode and then open the similarity distance data file; only text (.txt) and XML (.xml) files are accepted. You can also drag a data file, which must be a .txt or .xml file, and drop it onto the applet.
[Figure: Step 1, Step 2, Step 3]
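The file-type restriction for opening or dropping a file amounts to an extension check. The Python sketch below is illustrative only; `detect_format` is a hypothetical name, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path


def detect_format(path: str) -> str:
    """Classify a data file opened in, or dropped onto, the tool by its extension.

    Only .txt and .xml are accepted; matching the extension case-insensitively
    is an assumption of this sketch.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return "text"
    if suffix == ".xml":
        return "xml"
    raise ValueError(f"unsupported data file (only .txt or .xml): {path}")


print(detect_format("distancetest.txt"))  # text
```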
The tool accepts two file formats: text (.txt) and XML (.xml). The structure of a text file is shown below.
A,B,C,D,E
0, 10, 20, 30, 40
10, 0, 10, 20, 30
20, 10, 0, 10, 20
30, 20, 10, 0, 10
40, 30, 20, 10, 0
The first row gives each document a unique name. The second row gives the distances between document A and every document in the space, in the same order as the names: document A is 10 units from document B, 20 units from document C, and so on (its distance to itself is 0). Each following row does the same for the next document. Because there are 5 documents in the space, the distance part has 5 rows of 5 values each. You can download an example text input file here: distancetest.txt
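A minimal Python sketch of reading this text format, assuming comma-separated values with optional spaces as in the example. `parse_text_distances` is a hypothetical helper, not the tool's own parser. Rejecting duplicate names and non-square matrices is an assumption about what the tool treats as invalid.

```python
def parse_text_distances(text: str) -> tuple[list[str], list[list[float]]]:
    """Parse the text format into (names, matrix).

    Row 1 holds the unique document names; each following row holds the
    distances from one document to every document, in the same order.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty distance file")
    names = [name.strip() for name in lines[0].split(",")]
    matrix = [[float(value) for value in line.split(",")] for line in lines[1:]]

    if len(set(names)) != len(names):
        raise ValueError("document names must be unique")
    # One row per document and one value per document in every row.
    if len(matrix) != len(names) or any(len(row) != len(names) for row in matrix):
        raise ValueError("expected a square matrix with one row per document")
    return names, matrix


sample = """A,B,C,D,E
0, 10, 20, 30, 40
10, 0, 10, 20, 30
20, 10, 0, 10, 20
30, 20, 10, 0, 10
40, 30, 20, 10, 0"""

names, matrix = parse_text_distances(sample)
print(matrix[names.index("A")][names.index("C")])  # 20.0
```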
However, I recommend creating the similarity distance data file in XML format, because it is more expressive: besides the distances, it can describe each document. For the XML format, the tool requires two main parts of information about each document to display the space.
The first part is the individual information of the document. It must contain the document's id, title, size, content, and type, given as the id, title, sizeMB, content, and fileType attributes of the document tag:
<document id="1" title="Document Title A" sizeMB="1.0" content="Test Content" fileType="txt">
The second part, called the "similaritydistance" part, defines the similarity distance between the document and each other document. These similaritydistance elements must be placed inside the document tag:
<similaritydistance id="2" distance="10.0" />
<similaritydistance id="3" distance="100.0" />
<similaritydistance id="4" distance="30.0" />
<similaritydistance id="5" distance="100.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="20.0" />
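The two parts of a document entry map onto an element's attributes and its child elements. The following Python sketch uses the standard `xml.etree.ElementTree` module to read one `<document>` element. The helper name `parse_document` and the choice to reject a similaritydistance id listed twice are assumptions, not documented behaviour of the tool.

```python
import xml.etree.ElementTree as ET

# Attributes every <document> element must carry, per the format description.
REQUIRED_ATTRIBUTES = ("id", "title", "sizeMB", "content", "fileType")


def parse_document(element: ET.Element) -> dict:
    """Read one <document>: its own information plus its similaritydistance list."""
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in element.attrib]
    if missing:
        raise ValueError(f"<document> is missing attributes: {missing}")
    info = {name: element.attrib[name] for name in REQUIRED_ATTRIBUTES}
    info["sizeMB"] = float(info["sizeMB"])

    distances = {}
    for entry in element.findall("similaritydistance"):
        other_id = entry.attrib["id"]
        # Assumed rule: each other document may be listed only once.
        if other_id in distances:
            raise ValueError(f"document {info['id']} lists id {other_id} twice")
        distances[other_id] = float(entry.attrib["distance"])
    info["distances"] = distances
    return info


doc = ET.fromstring(
    '<document id="1" title="Document Title A" sizeMB="1.0" content="Test Content" fileType="txt">'
    '<similaritydistance id="2" distance="10.0" />'
    '<similaritydistance id="3" distance="100.0" />'
    "</document>"
)
print(parse_document(doc)["distances"])  # {'2': 10.0, '3': 100.0}
```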
Users assign a unique id to each document. Keep in mind that the same ids are used in the similaritydistance elements to refer to the other documents, so make sure that no id is duplicated; otherwise the space will be displayed incorrectly. Each document can be described in the XML file as shown below.
<document id="1" title="Document Title A" sizeMB="1.0" content="Test Content" fileType="txt">
<similaritydistance id="2" distance="10.0" />
<similaritydistance id="3" distance="100.0" />
<similaritydistance id="4" distance="30.0" />
<similaritydistance id="5" distance="100.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="20.0" />
</document>
According to this entry, the distance between document 1 and document 2 is 10 units, and likewise for the other pairs. An example of a complete XML file with 10 documents in the space follows; the root documents element holds all document entries, and its documentcount attribute gives their number.
<?xml version="1.0" encoding="UTF-8" ?>
<documents documentcount="10">
<document id="1" title="Document Title A" sizeMB="1.0" content="Test Content" fileType="txt">
<similaritydistance id="2" distance="10.0" />
<similaritydistance id="3" distance="100.0" />
<similaritydistance id="4" distance="30.0" />
<similaritydistance id="5" distance="100.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="20.0" />
</document>
<document id="2" title="Document Title B" sizeMB="1.0" content="Test Content" fileType="doc">
<similaritydistance id="1" distance="10.0" />
<similaritydistance id="3" distance="75.0" />
<similaritydistance id="4" distance="85.0" />
<similaritydistance id="5" distance="15.0" />
<similaritydistance id="6" distance="150.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="20.0" />
</document>
<document id="3" title="Document Title C" sizeMB="1.0" content="Test Content" fileType="htm">
<similaritydistance id="1" distance="100.0" />
<similaritydistance id="2" distance="75.0" />
<similaritydistance id="4" distance="60.0" />
<similaritydistance id="5" distance="100.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="15.0" />
<similaritydistance id="8" distance="10.0" />
<similaritydistance id="9" distance="5.0" />
<similaritydistance id="10" distance="40.0" />
</document>
<document id="4" title="Document Title D" sizeMB="1.0" content="Test Content" fileType="txt">
<similaritydistance id="1" distance="30.0" />
<similaritydistance id="2" distance="85.0" />
<similaritydistance id="3" distance="60.0" />
<similaritydistance id="5" distance="20.0" />
<similaritydistance id="6" distance="20.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="140.0" />
<similaritydistance id="10" distance="40.0" />
</document>
<document id="5" title="Document Title E" sizeMB="1.0" content="Test Content" fileType="html">
<similaritydistance id="1" distance="100.0" />
<similaritydistance id="2" distance="15.0" />
<similaritydistance id="3" distance="100.0" />
<similaritydistance id="4" distance="20.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="100.0" />
<similaritydistance id="8" distance="40.0" />
<similaritydistance id="9" distance="40.0" />
<similaritydistance id="10" distance="60.0" />
</document>
<document id="6" title="Document Title F" sizeMB="1.0" content="Test Content" fileType="N/A">
<similaritydistance id="1" distance="200.0" />
<similaritydistance id="2" distance="150.0" />
<similaritydistance id="3" distance="200.0" />
<similaritydistance id="4" distance="20.0" />
<similaritydistance id="5" distance="200.0" />
<similaritydistance id="7" distance="40.0" />
<similaritydistance id="8" distance="50.0" />
<similaritydistance id="9" distance="60.0" />
<similaritydistance id="10" distance="200.0" />
</document>
<document id="7" title="Document Title G" sizeMB="1.0" content="Test Content" fileType="zip">
<similaritydistance id="1" distance="20.0" />
<similaritydistance id="2" distance="20.0" />
<similaritydistance id="3" distance="15.0" />
<similaritydistance id="4" distance="20.0" />
<similaritydistance id="5" distance="100.0" />
<similaritydistance id="6" distance="40.0" />
<similaritydistance id="8" distance="100.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="20.0" />
</document>
<document id="8" title="Document Title H" sizeMB="1.0" content="Test Content" fileType="office">
<similaritydistance id="1" distance="40.0" />
<similaritydistance id="2" distance="40.0" />
<similaritydistance id="3" distance="10.0" />
<similaritydistance id="4" distance="40.0" />
<similaritydistance id="5" distance="40.0" />
<similaritydistance id="6" distance="50.0" />
<similaritydistance id="7" distance="100.0" />
<similaritydistance id="9" distance="50.0" />
<similaritydistance id="10" distance="15.0" />
</document>
<document id="9" title="Document Title I" sizeMB="1.0" content="Test Content" fileType="xls">
<similaritydistance id="1" distance="50.0" />
<similaritydistance id="2" distance="50.0" />
<similaritydistance id="3" distance="5.0" />
<similaritydistance id="4" distance="140.0" />
<similaritydistance id="5" distance="40.0" />
<similaritydistance id="6" distance="60.0" />
<similaritydistance id="7" distance="50.0" />
<similaritydistance id="8" distance="50.0" />
<similaritydistance id="10" distance="50.0" />
</document>
<document id="10" title="Document Title J" sizeMB="1.0" content="Test Content" fileType="N/A">
<similaritydistance id="1" distance="20.0" />
<similaritydistance id="2" distance="20.0" />
<similaritydistance id="3" distance="40.0" />
<similaritydistance id="4" distance="40.0" />
<similaritydistance id="5" distance="60.0" />
<similaritydistance id="6" distance="200.0" />
<similaritydistance id="7" distance="20.0" />
<similaritydistance id="8" distance="15.0" />
<similaritydistance id="9" distance="50.0" />
</document>
</documents>
You can download example XML input files here: distancetest.xml or icontest.xml
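To check a complete file before loading it, you can parse the whole `<documents>` element into a table of pair distances. This Python sketch is hypothetical (`load_distance_xml` is not part of the tool). It stores each unordered pair once and applies validation rules that are assumptions drawn from the format description and the example file: unique ids, references only to existing ids, a `documentcount` that matches the number of documents, and matching values when both documents in a pair list it.

```python
import xml.etree.ElementTree as ET


def load_distance_xml(xml_text: str) -> dict[frozenset, float]:
    """Parse a complete <documents> file into {frozenset({id_a, id_b}): distance}.

    The validation rules below are assumptions of this sketch, not
    documented behaviour of the tool.
    """
    root = ET.fromstring(xml_text)
    documents = root.findall("document")
    ids = [doc.attrib["id"] for doc in documents]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate document ids")

    declared = root.get("documentcount")
    if declared is not None and int(declared) != len(documents):
        raise ValueError(f"documentcount={declared} but {len(documents)} documents found")

    known = set(ids)
    pairs: dict[frozenset, float] = {}
    for doc in documents:
        doc_id = doc.attrib["id"]
        for entry in doc.findall("similaritydistance"):
            other_id = entry.attrib["id"]
            if other_id not in known:
                raise ValueError(f"document {doc_id} refers to unknown id {other_id}")
            key = frozenset((doc_id, other_id))
            value = float(entry.attrib["distance"])
            # In the example file each pair is listed by both documents;
            # this sketch requires the two values to agree.
            if key in pairs and pairs[key] != value:
                raise ValueError(f"conflicting distances for documents {doc_id} and {other_id}")
            pairs[key] = value
    return pairs
```

Passing the ten-document example above as `xml_text` gives 45 pairs, with `pairs[frozenset(("1", "2"))]` equal to 10.0.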