3 Terms and definitions
For the purposes of ISO 24611, the following terms and definitions
apply.
1
associative relation
relation by which a linguistic unit is associated with other units.
It is a virtual association which does not require their effective
presence and differs from a paradigmatic relation in that the latter
only refers to linguistic units associated by substitutability.
2
closed data category
data category whose content is constrained by a list
of permissible values which comprise its conceptual domain
NOTE: A typical closed data category might be /grammatical
number/, which can have as its content the values:
/singular/, /plural/ or /dual/.
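NOTE: As an informal illustration, a closed data category can be modelled as a finite set of permissible values against which candidate content is checked. The following Python sketch assumes this modelling; the names GRAMMATICAL_NUMBER and is_permissible are hypothetical and are not defined by this document.

```python
# Hypothetical model: the conceptual domain of the closed data category
# /grammatical number/ is a finite set of permissible values.
GRAMMATICAL_NUMBER = frozenset({"singular", "plural", "dual"})


def is_permissible(value, conceptual_domain):
    """Return True if value belongs to the given conceptual domain."""
    return value in conceptual_domain
```

A value outside the list, such as "paucal", would be rejected by this check.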
3
conceptual domain
finite list of simple data categories that may
be the values of a complex data category
4
data category
result of the specification of a given data field or the content of
a closed data field
NOTE: A data category is to be used as an elementary descriptor in a
linguistic structure or an annotation scheme. Examples are:
/term/, /definition/, /part of speech/ and
/grammatical gender/. Data categories for the management of
lexical resources and terminology are comparable to data element
concepts in ISO/IEC 11179-3:2003.
5
directed acyclic graph
DAG
graph with directed edges and no cycle
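NOTE: The "no cycle" condition can be checked mechanically. The following Python sketch uses Kahn's algorithm, chosen here only for illustration; the function name is_acyclic and the representation of edges as (source, target) pairs are assumptions, not part of this document.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no cycle.

    Kahn's algorithm: repeatedly remove nodes with no incoming edge.
    The graph is acyclic exactly when every node can be removed.
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)
```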
6
discourse
7
feature specification
the assignment of a value to a feature. In MAF, a feature shall
denote a morpho-syntactic feature of a
linguistic unit, such as the mood or tense of a verb.
8
feature structure
a set of feature specifications, used in MAF to
express morpho-syntactic content.
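NOTE: As an informal illustration, a feature structure can be pictured as a mapping from feature names to values, each key/value pair being one feature specification. The following Python sketch assumes this plain mapping; it is not the serialization used by MAF, and the name specify is hypothetical.

```python
# Hypothetical feature structure for a verb form: each key/value pair
# is one feature specification.
feature_structure = {"mood": "indicative", "tense": "present", "person": "third"}


def specify(fs, feature, value):
    """Return a new feature structure with one more feature specification."""
    updated = dict(fs)  # copy so the original structure is left unchanged
    updated[feature] = value
    return updated
```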
9
finite state automaton
FSA
finite set of transitions from state to state, with an initial state
and a final one
See also DAG.
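NOTE: The following Python sketch illustrates the definition under simplifying assumptions: the automaton is deterministic, transitions are stored as a mapping from (state, symbol) to the next state, and there is a single final state. The name accepts and this representation are hypothetical.

```python
def accepts(transitions, initial, final, symbols):
    """Return True if reading symbols from the initial state ends in final.

    transitions maps (state, symbol) to the next state; a missing entry
    means the input is rejected.
    """
    state = initial
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state == final


# Hypothetical automaton recognizing the two-symbol sequence "a", "b".
AB = {(0, "a"): 1, (1, "b"): 2}
```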
10
form
any sequence of letters, pictograms and numerals used to write or
pronounce a word
11
grammatical category
See also part of speech
12
inflection
modification or marking of a word so that it reflects
grammatical (i.e. relational) information, such as grammatical
gender, tense, person, etc.
13
inflection paradigm
a table illustrating the forms of an inflected
word
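NOTE: As an informal illustration, an inflection paradigm can be stored as a table keyed by feature values. The following Python sketch assumes a key of (person, number) for the present tense of the English verb "be"; the names PARADIGM_BE_PRESENT and inflected_form are hypothetical.

```python
# Hypothetical paradigm table: (person, number) -> inflected form.
PARADIGM_BE_PRESENT = {
    ("first", "singular"): "am",
    ("second", "singular"): "are",
    ("third", "singular"): "is",
    ("first", "plural"): "are",
    ("second", "plural"): "are",
    ("third", "plural"): "are",
}


def inflected_form(paradigm, person, number):
    """Look up the inflected form for the given person and number."""
    return paradigm[(person, number)]
```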
14
inflected form
form that a word can take when used in a sentence or a phrase
NOTE: An inflected form of a word is associated with a combination
of morphological features, such as grammatical number or case.
15
lattice
term often used in the NLP community to denote (with some slight
confusion with the notion of algebraic lattice) a directed
acyclic graph with an initial node and a final node.
See also DAG
See also FSA
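NOTE: A lattice in this sense is commonly used to hold alternative analyses of the same input, each path from the initial node to the final node being one alternative. The following Python sketch enumerates these paths; the name paths, the adjacency representation and the sample data are assumptions made for illustration.

```python
def paths(edges, initial, final):
    """Yield every path from initial to final as a list of edge labels.

    edges maps a node to a list of (label, next_node) pairs.
    The graph is assumed to be acyclic, so the recursion terminates.
    """
    if initial == final:
        yield []
        return
    for label, nxt in edges.get(initial, []):
        for rest in paths(edges, nxt, final):
            yield [label] + rest


# Hypothetical lattice with two competing segmentations of the same input.
LATTICE = {
    0: [("pomme de terre", 3), ("pomme", 1)],
    1: [("de", 2)],
    2: [("terre", 3)],
}
```

Calling list(paths(LATTICE, 0, 3)) returns both segmentations: the single multi-word unit and the three separate units.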
16
lemma
lemmatized form
class of inflected forms differing only by
inflectional morphology. A lemma is usually referred to by one of
these forms, arbitrarily chosen (e.g., infinitive for French verbs).
17
lexeme
lexical morpheme to be distinguished from a grammatical
morpheme by the fact that it belongs to an open list and that it
bears an autonomous signification.
18
lexicon
resource comprising a collection of inflected
forms or lemmas for a given language
19
morpheme
smallest linguistic unit bearing a signification in a discourse and
that cannot be divided into smaller meaningful units. A morpheme is
either grammatical (grammeme) or lexical (lexeme).
20
morphological feature
morpho-syntactic feature
category induced from the inflected form of a
word
NOTE: ISO 12620 provides a comprehensive list of values for European
languages. An example of a morphological feature is:
/grammatical gender/.
21
morphology of a word
morpho-syntax of a word
description comprising the lemmatized form or forms
of a word, plus additional information on its /part of
speech/ data categories, possibly its inflectional
paradigm or paradigms, and possibly its explicitly listed inflected
forms.
NOTE: The term morpho-syntax is often used in place of morphology as
it describes such features as number, gender, case etc. which are
essential for syntactic agreement.
22
multi-word expression
MWE
an expression composed of an ordered group of words that has
properties that are not predictable from the properties of the
individual words or of their normal mode of combination.
NOTE: The group of words making up an MWE can be continuous or
discontinuous.
EXAMPLE: "father in law" and "to be over the moon", whose meanings
differ from what their component words appear to mean.
23
natural language processing
NLP
the field of study covering knowledge and techniques which allow computerized
processing of linguistic data.
This field combines a variety of skills including linguistics,
mathematical logic, statistics, and algorithms.
24
open data category
data category whose content cannot be fully enumerated
due to the organic nature of language
EXAMPLE: Typical open data categories might include /term/,
/lemma/.
25
syntagmatic relation
relation by which linguistic units in a discourse are associated.
26
morpho-syntactic tag
list of features derived from associative relations. A feature
corresponds to each associative relation, and the related entities
share the same value for that feature. The morpho-syntactic tag
lists some of these features (part of speech, grammatical category,
etc.).
27
part of speech
category assigned to a word based on its grammatical and semantic
properties
See also grammatical category
NOTE: ISO 12620 provides a comprehensive list of values for European
languages. Examples of such values are: /noun/ and
/verb/.
28
token
non-empty contiguous discourse sequence identified as such by a
morpho-phonological analysis or an automatic processing of the
discourse.
This can involve the recognition of a regular or algebraic language
(matching of the separators), or a lexicological analysis
(recognition of roots, morphological derivation and inflection,
etc.).
29
tokenization
process of identifying tokens
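NOTE: The simplest case, recognition of a regular language by matching separators, can be sketched as follows. This Python example assumes that a token is either a run of word characters or a single punctuation mark; this rule and the name tokenize are illustrative choices, not requirements of this document.

```python
import re


def tokenize(text):
    """Split text into non-empty contiguous tokens.

    A token is a run of word characters or one punctuation character;
    whitespace acts as a separator and produces no token.
    """
    return re.findall(r"\w+|[^\w\s]", text)
```

A lexicological analysis would typically segment differently, for example by treating a clitic as a unit.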
30
word-form
morpho-syntactic unit
contiguous or non-contiguous entity from a speech or text sequence
identified as such in an associative relation.
This identification is the basis of morpho-syntactic tagging
(part-of-speech, grammatical category, agreement
feature, etc.). Morpho-syntactic units may have no acoustic or
graphic realization, or correspond to one or more
tokens.
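NOTE: The correspondence between word-forms and tokens need not be one to one. The following Python sketch records, for each word-form, the indices of the tokens that realize it; the class WordForm, its fields, the function realization and the sample analysis are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WordForm:
    lemma: str
    # Indices into the token list; may be empty (no graphic realization)
    # or non-contiguous.
    token_indices: tuple


def realization(word_form, tokens):
    """Return the tokens that realize the given word-form, in order."""
    return [tokens[i] for i in word_form.token_indices]


# Hypothetical analysis of "gave it up": the phrasal verb is one
# non-contiguous word-form spanning the first and third tokens.
tokens = ["gave", "it", "up"]
word_forms = [WordForm("give up", (0, 2)), WordForm("it", (1,))]
```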
31
romanization
transliteration from a non-Latin script
into a Latin script
32
script
set of graphic characters used for the written form of one or more
languages (ISO/IEC 10646-1, 4.14)
33
simple data category
data category that may be the possible content of a
closed data category, but that cannot itself be
further sub-divided
EXAMPLE: /masculine/, /feminine/, and
/neuter/ are possible simple data categories associated with
the conceptual domain of the closed
data category /grammatical gender/ as it is associated
with the German language.
34
transcription
form resulting from a coherent method of writing down speech sounds
35
transliteration
form resulting from the conversion of one writing system into
another
36
word
description, in the context of a given language, composed of at
least a part of speech and a lemmatized
form
NOTE: The description can include more morphological information
and/or syntactic and semantic information. A word is either a single
word or a multi-word expression.
37
word class
See also part of speech