This section is a short guide to using the Extracted Concepts list in corpus management.
The Extracted Concepts list contains all concepts from the thesaurus that have been detected in the uploaded documents.
In your opened corpus, click the Extracted Concepts (1) tab to open the Extracted Concepts list. The available options are described below.
- Use Search Concepts (2) to filter for concepts. Click Search to start the search; a list of results is displayed. Click Reset to display the whole list again.
- In the Search Criteria drop-down (3), you can choose to filter for concepts found in the corpus or, alternatively, for concepts not found in the corpus.
- Use the Show Matching Terms icon (4) to display a list of extracted terms that are similar to the respective concept. This helps you identify synonyms (alternative labels) and new narrower concepts, which can be added to a concept with a few mouse clicks.
- Use the Add to Blacklist icon (5) to blacklist a concept and exclude it from the extraction results.
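The effect of the search, the Search Criteria filter, and the blacklist can be pictured with a small sketch. It is not the application's implementation or API: the `Concept` class, the `search_concepts` function, the substring matching, and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical stand-in for a thesaurus concept."""
    pref_label: str
    frequency: int  # assumption: 0 means "not found in the corpus"


def search_concepts(concepts, query="", found_in_corpus=True, blacklist=frozenset()):
    """Filter concepts by label and by found / not found in the corpus.

    Assumption: the search is modelled as a case-insensitive substring match.
    Blacklisted labels are dropped from the result.
    """
    needle = query.casefold()
    return [
        c for c in concepts
        if needle in c.pref_label.casefold()
        and (c.frequency > 0) == found_in_corpus
        and c.pref_label not in blacklist
    ]


# Made-up sample data, for illustration only.
thesaurus = [
    Concept("Wine", 12),
    Concept("Red wine", 5),
    Concept("White wine", 0),
    Concept("Beer", 3),
]

# Search Criteria: concepts found in the corpus.
found = search_concepts(thesaurus, "wine", found_in_corpus=True)
# Search Criteria: concepts not found in the corpus.
not_found = search_concepts(thesaurus, "wine", found_in_corpus=False)
# An empty query corresponds to Reset; "Beer" is blacklisted here.
without_blacklisted = search_concepts(thesaurus, blacklist={"Beer"})
```

In this model, an empty query returns the whole list for the selected search criterion, which mirrors what Reset does in the user interface.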
The Extracted Concepts list provides an overview of how often the concepts were found in the document corpus. It also lists the most frequently used label of each concept. In addition, the broader concepts and concept schemes of the concepts are displayed.
You can sort the first three table columns by clicking the table headers. The image shows a table sorted by the Relevance column:
The table in the Extracted Concepts tab can be sorted by Preferred Label, Frequency and Most Frequent Label.
The columns and their content provide the following information:
- Preferred Label: the preferred label, as stored in the thesaurus, of the concept that has been extracted from the corpus documents.
- Frequency: the total number of times a concept has been found in the corpus documents.
- Relevance: displays the scores of the concepts found in the corpus during the Corpus Analysis. Use these scores as an indication of the validity of the extracted concepts. The relevance of concepts is calculated in a similar way to the relevance of terms in the Extracted Terms list.
- Most Frequent Label: displays the label of the concept that was found most frequently in the corpus documents, either as part of a term or as a phrase.
- Broader Concepts: displays the broader concepts (skos:broader) of the concept in that table row.
- Concept Scheme: displays the concept scheme that the concept is part of.
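To make the column descriptions above concrete, the following sketch models one table row as a record and shows what sorting by a column amounts to. The `ExtractedConceptRow` class, its field names, the `sort_rows` helper, and all values (including the relevance scores) are hypothetical and chosen for illustration; they do not reflect the application's data model.

```python
from dataclasses import dataclass, field
from operator import attrgetter


@dataclass
class ExtractedConceptRow:
    """Hypothetical model of one row in the Extracted Concepts table."""
    preferred_label: str        # label of the concept in the thesaurus
    frequency: int              # total number of matches in the corpus documents
    relevance: float            # score from the Corpus Analysis (made-up scale)
    most_frequent_label: str    # label found most often in the documents
    broader_concepts: list = field(default_factory=list)  # skos:broader
    concept_scheme: str = ""


def sort_rows(rows, column, descending=False):
    """Return the rows sorted by the given column (an attribute name)."""
    return sorted(rows, key=attrgetter(column), reverse=descending)


# Made-up sample rows, for illustration only.
rows = [
    ExtractedConceptRow("Red wine", 5, 42.0, "red wines", ["Wine"], "Beverages"),
    ExtractedConceptRow("Wine", 12, 17.5, "wine", [], "Beverages"),
    ExtractedConceptRow("Beer", 3, 63.2, "beers", [], "Beverages"),
]

by_frequency = sort_rows(rows, "frequency", descending=True)
by_label = sort_rows(rows, "preferred_label")
```

Clicking a table header in the user interface corresponds to choosing the `column` argument in this sketch.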