DISCOWordSpaces
by Peter Kolb
DISCO Word Spaces is a tab widget for retrieving semantically similar words and collocations
for a query word. The similarities have been computed on the basis of large text corpora for different
languages and domains. DISCO Word Spaces supports ontology building by suggesting similar and related words.
Download
DISCO and DISCO Word Spaces are freely available and open source. They are licensed under the Apache License.
Download the plug-in: http://www.linguatools.de/disco/DISCO4Protege3-v1.0.zip
Installation
- Download DISCO4Protege3-v1.0.zip to the plugins subdirectory of your Protege directory and unzip it.
- Start Protege. In the menu select Project, then Configure. A list of tab widgets will be displayed. Select DISCOWordSpaces and click the OK button. A new tab entitled DISCOWordSpaces should now appear in the Protege window.
- Before you can query a word space with the Protege plug-in, you have to download a word space. A list of the currently available word spaces is at http://www.linguatools.de/disco/disco-download_en.html. In the table, click on a word space name (in the column "Packet Name") and follow the download instructions on the page that opens.
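The first installation step can also be scripted. The following is a minimal Python sketch, assuming the archive has already been downloaded. The function name `install_plugin` and the paths in the usage comment are hypothetical and are not part of DISCO or Protege.

```python
import zipfile
from pathlib import Path


def install_plugin(archive: Path, protege_dir: Path) -> list[str]:
    """Unpack a plug-in archive into the plugins subdirectory of Protege.

    Returns the names of the archive members that were extracted.
    """
    plugins_dir = protege_dir / "plugins"
    if not plugins_dir.is_dir():
        # Refuse to guess: the Protege directory must already contain plugins/.
        raise FileNotFoundError(f"no plugins directory in {protege_dir}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(plugins_dir)
        return zf.namelist()


# Usage (hypothetical paths, adjust them to your own installation):
# install_plugin(Path("DISCO4Protege3-v1.0.zip"), Path("/opt/Protege_3.4.4"))
```

After running it, continue with the second step and enable the tab in Protege.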
The DISCO Word Spaces plug-in has been tested with Protege 3.4.4 on Linux (Ubuntu 8.04, SuSe 9.1), Windows XP, and Mac OS X.
Using the DISCO Word Spaces plug-in
Open Word Space Directory
The first thing to do is to select a word space. On the upper right of the DISCO Word Spaces tab there is an entry field labeled "word space directory". Click the "Browse" button on the right of the entry field. Then select a word space that you have downloaded beforehand.
Query the Word Space
Once you have opened the desired word space directory, you can query the word space. Enter a word in the entry field labeled "enter query word" in the upper left of the tab, then click the search button or press Enter. The result of the query is shown in the table on the lower left of the tab.
The result table has six columns. In the second column the collocations for the search word are shown, in the fifth column the similar words are shown. Both columns are ordered by decreasing significance or similarity value, respectively. The significance value of the collocations is displayed in the third column, the similarity score in the sixth. Columns one and four contain the ranks.
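The six-column layout can be pictured with a short Python sketch. It only illustrates the column order described here and is not the plug-in's own code. The function name `build_result_rows` is hypothetical, and leaving cells blank when one list is shorter than the other is an assumption.

```python
from itertools import zip_longest


def build_result_rows(collocations, similar_words):
    """Arrange two ranked lists side by side in six columns.

    Each argument is a list of (word, score) pairs. Every row has the
    column order: rank, collocation, significance, rank, similar word,
    similarity.
    """
    # Sort each list by decreasing score, as in the result table.
    colls = sorted(collocations, key=lambda pair: pair[1], reverse=True)
    sims = sorted(similar_words, key=lambda pair: pair[1], reverse=True)

    rows = []
    for rank, (coll, sim) in enumerate(zip_longest(colls, sims), start=1):
        # Assumption: a missing entry is shown as three blank cells.
        coll_cells = (rank, coll[0], coll[1]) if coll else ("", "", "")
        sim_cells = (rank, sim[0], sim[1]) if sim else ("", "", "")
        rows.append(coll_cells + sim_cells)
    return rows
```

Called with made-up data such as `build_result_rows([("red", 2.0), ("ripe", 5.0)], [("pear", 0.8)])`, the first row pairs the top collocation with the top similar word, and the second row holds only a collocation.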
Between the search entry field and the result table there is a field that shows the corpus frequency of the search word.
If an error occurs during the search, an error message (in red) is shown in the middle right of the tab.
Copy & Paste
In the result table you can select an entry by clicking the respective table cell. Copy the selected word to the clipboard using CTRL+C. You can then paste the copied word, for instance, into an entry field in Protege's classes tab using CTRL+V.
Specify the Number of Results
You can specify the maximum number of results returned by a search using the entry field labeled "maximum number of results". The default value is 50. You can enter any whole number greater than 0. However, the number of collocations and similar words stored in a word space database is limited, so a search may return fewer results than the specified number.
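The rule for this field can be sketched in Python as follows. This is an illustration under stated assumptions, not the plug-in's implementation: the function names `parse_max_results` and `limit_results` are hypothetical, and falling back to the default for an empty field is an assumption.

```python
def parse_max_results(text, default=50):
    """Parse the content of the 'maximum number of results' field.

    Raises ValueError if the input is not a whole number greater than 0.
    """
    text = text.strip()
    if not text:
        # Assumption: an empty field means "use the default of 50".
        return default
    value = int(text)  # raises ValueError for input such as "3.5" or "abc"
    if value <= 0:
        raise ValueError("maximum number of results must be greater than 0")
    return value


def limit_results(entries, max_results):
    """Return at most max_results entries.

    If the word space holds fewer entries than requested, all of them
    are returned, so the result may be shorter than max_results.
    """
    return entries[:max_results]
```

For a word with only three stored similar words, `limit_results` with the default of 50 returns all three, which matches the behavior described in this section.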
Compute Similarity
On the right side below the "maximum number of results" entry field there is a panel where you can compute the semantic similarity score between two input words. You can learn more about the similarity measures "DISCO1" and "DISCO2" in the following paper: Peter Kolb (2008). DISCO: A Multilingual Database of Distributionally Similar Words. In A. Storrer et al. (Eds.), KONVENS 2008 - Ergänzungsband: Textressourcen und lexikalisches Wissen, Berlin 2008.
Contact
If you have any questions about DISCO Word Spaces, send an email to peter.kolb@linguatools.org.