DISCOWordSpaces

by Peter Kolb

Type: Tab Widget
Author(s): Peter Kolb
Last Update: 2010/06/19
License: Apache License Version 2.0
Homepage: DISCOWordSpaces website

DISCO Word Spaces is a tab widget for retrieving semantically similar words and collocations for a query word. The similarities have been computed from large text corpora for different languages and domains. DISCO Word Spaces supports ontology building by suggesting similar and related words.

Versions & Compatibility

This section lists available versions of DISCOWordSpaces.

No version information available.

Changelog

No version information available.


Download

DISCO and DISCO Word Spaces are freely available and open source. They are licensed under the Apache License.

Download the plug-in: http://www.linguatools.de/disco/DISCO4Protege3-v1.0.zip

Installation

  • Download DISCO4Protege3-v1.0.zip to the plugins subdirectory of your Protege directory and unzip it there.
  • Start Protege. From the menu, select Project, then Configure. A list of tab widgets is displayed. Select DISCOWordSpaces and click the OK button. A new tab titled DISCOWordSpaces should now appear in the Protege window.
  • Before you can query a word space with the Protege plug-in, you have to download one. The currently available word spaces are listed at http://www.linguatools.de/disco/disco-download_en.html. In the table, click a word space name (in the column "Packet Name") and follow the download instructions on the page that opens.
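The first installation step can also be scripted. The following is only a minimal sketch in Python, not part of the plug-in: the function name install_plugin is hypothetical, and it assumes the archive has already been downloaded and that the Protege directory contains a plugins subdirectory.

```python
import zipfile
from pathlib import Path


def install_plugin(zip_path, protege_dir):
    """Unpack a plug-in archive into <protege_dir>/plugins.

    Hypothetical helper for illustration; returns the plugins directory.
    """
    plugins_dir = Path(protege_dir) / "plugins"
    if not plugins_dir.is_dir():
        # Refuse to guess: the plugins subdirectory must already exist.
        raise FileNotFoundError(f"no plugins directory in {protege_dir}")
    with zipfile.ZipFile(zip_path) as archive:
        # Extract all archive members below the plugins directory.
        archive.extractall(plugins_dir)
    return plugins_dir
```

A call could look like install_plugin("DISCO4Protege3-v1.0.zip", "/path/to/Protege"), where the second argument is a placeholder for your own Protege directory. The tab still has to be enabled in Protege via Project, then Configure.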

The DISCO Word Spaces plug-in has been tested with Protege 3.4.4 on Linux (Ubuntu 8.04, SuSE 9.1), Windows XP, and Mac OS X.

Using the DISCO Word Spaces plug-in

Open Word Space Directory

First, select a word space. In the upper right of the DISCO Word Spaces tab there is an entry field labeled "word space directory". Click the "Browse" button to the right of the entry field, then select a word space that you have downloaded beforehand.

Query the Word Space

Once you have opened the desired word space directory, you can query the word space. Enter a word in the entry field labeled "enter query word" in the upper left of the tab, then click the search button or press Enter. The result of the query is shown in the table in the lower left of the tab.

The result table has six columns. Columns one to three show the rank, the collocation, and its significance value. Columns four to six show the rank, the similar word, and its similarity score. Collocations are ordered by decreasing significance, similar words by decreasing similarity.
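To make the column layout concrete, here is a small Python sketch that arranges two scored word lists into six-column rows. It only illustrates the layout described above and is not the plug-in's code. The function name build_result_rows and the (word, score) input format are assumptions.

```python
from itertools import zip_longest


def build_result_rows(collocations, similar_words):
    """Arrange two lists of (word, score) pairs into six-column rows.

    Columns: rank, collocation, significance, rank, similar word, similarity.
    Hypothetical helper; the input format is an assumption.
    """
    # Order each list by decreasing score, as in the result table.
    colls = sorted(collocations, key=lambda pair: pair[1], reverse=True)
    sims = sorted(similar_words, key=lambda pair: pair[1], reverse=True)
    rows = []
    for rank, (coll, sim) in enumerate(zip_longest(colls, sims), start=1):
        # If one list is shorter, its three cells stay empty in later rows.
        coll_cells = (rank, coll[0], coll[1]) if coll else ("", "", "")
        sim_cells = (rank, sim[0], sim[1]) if sim else ("", "", "")
        rows.append(coll_cells + sim_cells)
    return rows
```

The two lists are ranked independently, so the collocation and the similar word that share a row are not related to each other beyond having the same rank.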

Between the search entry field and the result table there is a field that shows the corpus frequency of the search word.

If an error occurs during the search, an error message (in red) is shown in the middle right of the tab.

Copy & Paste

In the result table you can select an entry by clicking the respective table cell. Copy the selected word to the clipboard using CTRL+C. You can then paste it with CTRL+V, for instance into an entry field in Protege's classes tab.

Specify the Number of Results

You can specify the maximum number of results returned by a search in the entry field labeled "maximum number of results". The default value is 50. You can enter any whole number greater than 0. However, the number of collocations and similar words stored in a word space database is limited, so a search may return fewer results than the specified number.
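The rule for this field can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plug-in's implementation: the names parse_max_results and limit_results are hypothetical, and falling back to the default for an empty field is an assumption.

```python
DEFAULT_MAX_RESULTS = 50  # default value stated in the documentation


def parse_max_results(text):
    """Turn the entry field's text into a whole number greater than 0."""
    text = text.strip()
    if not text:
        # Assumption: an empty field falls back to the default.
        return DEFAULT_MAX_RESULTS
    value = int(text)  # raises ValueError for input that is not a whole number
    if value <= 0:
        raise ValueError("maximum number of results must be greater than 0")
    return value


def limit_results(available, max_results):
    """Return at most max_results entries; fewer if fewer are available."""
    return available[:max_results]
```

With three stored entries and a maximum of 50, limit_results returns all three, which mirrors the case where a search yields fewer results than requested.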

Compute Similarity

On the right side below the "maximum number of results" entry field there is a panel where you can compute the semantic similarity score between two input words. You can learn more about the similarity measures "DISCO1" and "DISCO2" in the following paper:
Peter Kolb (2008). DISCO: A Multilingual Database of Distributionally Similar Words. In A. Storrer et al. (Eds.), KONVENS 2008 - Ergänzungsband: Textressourcen und lexikalisches Wissen, Berlin 2008.

Contact

If you have any questions about DISCO Word Spaces, send an email to peter.kolb@linguatools.org.