3  Terms and definitions

For the purpose of ISO 24611, the following terms and definitions apply.

1
associative relation

relation by which a linguistic unit is associated with other units. It is a virtual association that does not require their effective presence; it differs from a paradigmatic relation in that the latter refers only to linguistic units associated by substitutability.



2
closed data category

data category whose content is constrained by a list of permissible values which comprise its conceptual domain


NOTE: A typical closed data category might be /grammatical number/, which can have as its content the values: /singular/, /plural/ or /dual/.
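
As an illustration only, a closed data category can be modelled as a name together with a fixed set of permissible values. The class name `ClosedDataCategory` and the method `validate` are hypothetical; neither ISO 24611 nor ISO 12620 defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedDataCategory:
    """Hypothetical model: a data category with an enumerated conceptual domain."""
    name: str
    conceptual_domain: frozenset

    def validate(self, value: str) -> bool:
        # A value is permissible only if it belongs to the conceptual domain.
        return value in self.conceptual_domain


# The values are the ones listed in the NOTE above.
grammatical_number = ClosedDataCategory(
    name="grammatical number",
    conceptual_domain=frozenset({"singular", "plural", "dual"}),
)
```

With this sketch, `grammatical_number.validate("dual")` is true, while a value outside the list is rejected.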



3
conceptual domain

finite list of simple data categories that may be the values of a complex data category



4
data category

result of the specification of a given data field or the content of a closed data field


NOTE: A data category is to be used as an elementary descriptor in a linguistic structure or an annotation scheme. Examples are: /term/, /definition/, /part of speech/ and /grammatical gender/. Data categories for the management of lexical resources and terminology are comparable to data element concepts in ISO/IEC 11179-3:2003.



5
directed acyclic graph
DAG

graph with directed edges and no cycle
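
The "no cycle" condition can be checked mechanically. The sketch below is one possible check, using Kahn's algorithm on a graph given as a list of (source, target) pairs; the function name `is_acyclic` and the edge representation are assumptions made for illustration, not part of MAF.

```python
from collections import deque


def is_acyclic(edges):
    """Return True if the directed graph given as (source, target) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edge.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Every node can be removed exactly when the graph contains no cycle.
    return removed == len(nodes)
```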



6
discourse




7
feature specification

the assignment of a value to a feature. In MAF, a feature shall denote a morpho-syntactic feature of a linguistic unit, such as the mood or tense of a verb.



8
feature structure

a set of feature specifications, used in MAF to express morpho-syntactic content.
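
A minimal sketch of these two notions, assuming a feature specification is represented as a (feature, value) pair and a feature structure as a mapping built from such pairs. The helper `make_feature_structure` and the sample values are hypothetical and do not reflect the MAF serialization.

```python
def make_feature_structure(*specifications):
    """Build a feature structure (a dict) from (feature, value) specifications."""
    structure = {}
    for feature, value in specifications:
        # In this sketch a feature may carry only one value.
        if feature in structure and structure[feature] != value:
            raise ValueError(f"conflicting values for feature {feature!r}")
        structure[feature] = value
    return structure


# Illustrative morpho-syntactic content for a verb form.
verb_fs = make_feature_structure(
    ("mood", "indicative"),
    ("tense", "present"),
)
```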



9
finite state automaton
FSA

finite set of transitions from state to state, with an initial state and a final one

See also DAG.
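
To make the definition concrete, the sketch below runs a deterministic automaton with one initial state and one final state over a sequence of symbols. The function `accepts`, the transition-table layout and the sample automaton are assumptions chosen for illustration.

```python
def accepts(transitions, initial, final, symbols):
    """Run a deterministic automaton over a sequence of symbols."""
    state = initial
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined for this symbol: reject
        state = transitions[key]
    return state == final


# Hypothetical automaton accepting one or more digits followed by "%".
transitions = {}
for digit in "0123456789":
    transitions[(0, digit)] = 1
    transitions[(1, digit)] = 1
transitions[(1, "%")] = 2
```

Here `accepts(transitions, 0, 2, "42%")` holds, whereas `"42"` and `"%"` are rejected.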



10
form

any sequence of letters, pictograms and numerals used to write or pronounce a word



11
grammatical category

See also part of speech



12
inflection

modification or marking of a word so that it reflects grammatical (i.e. relational) information, such as grammatical gender, tense, person, etc.



13
inflection paradigm

a table illustrating the forms of an inflected word



14
inflected form

form that a word can take when used in a sentence or a phrase


NOTE: An inflected form of a word is associated with a combination of morphological features, such as grammatical number or case.



15
lattice

term often used in the NLP community to denote a directed acyclic graph with an initial node and a final node (with some slight confusion with the notion of an algebraic lattice).

See also DAG
See also FSA
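
One common use of such a lattice is to hold competing analyses of the same input, each analysis being a path from the initial node to the final node. The sketch below enumerates those paths; the function `paths`, the adjacency representation and the French sample are hypothetical and serve only as an illustration.

```python
def paths(edges, initial, final):
    """Yield every label sequence leading from the initial node to the final node.

    `edges` maps a node to a list of (label, target) pairs.
    The graph is assumed to be acyclic.
    """
    if initial == final:
        yield []
        return
    for label, target in edges.get(initial, []):
        for rest in paths(edges, target, final):
            yield [label] + rest


# Hypothetical lattice: "pomme de terre" read as three units or as one.
lattice = {
    0: [("pomme", 1), ("pomme de terre", 3)],
    1: [("de", 2)],
    2: [("terre", 3)],
}
```

Calling `list(paths(lattice, 0, 3))` returns the two competing readings, the three-unit one first.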



16
lemma – lemmatized form

class of inflected forms differing only by inflectional morphology. A lemma is usually referred to by one of these forms, arbitrarily chosen (e.g., infinitive for French verbs).



17
lexeme

lexical morpheme, distinguished from a grammatical morpheme in that it belongs to an open list and bears an autonomous signification.





18
lexicon

resource comprising a collection of inflected forms or lemmas for a given language



19
morpheme

smallest linguistic unit that bears a signification in a discourse and cannot be divided into smaller meaningful units. A morpheme is either grammatical (grammeme) or lexical (lexeme).



20
morphological feature
morpho-syntactic feature

category induced from the inflected form of a word


NOTE: ISO 12620 provides a comprehensive list of values for European languages. An example of a morphological feature is: /grammatical gender/.



21
morphology of a word
morpho-syntax of a word

description comprising the lemmatized form or forms of a word, plus additional information on its /part of speech/ data categories, possibly its inflectional paradigm or paradigms, and possibly its explicitly listed inflected forms.


NOTE: The term morpho-syntax is often used in place of morphology as it describes such features as number, gender, case etc. which are essential for syntactic agreement.



22
multi-word expression
MWE

an expression composed of an ordered group of words that has properties that are not predictable from the properties of the individual words or of their normal mode of combination.


NOTE: The group of words making up an MWE can be continuous or discontinuous.


EXAMPLE: "father in law" and "to be over the moon", which mean something different from what their individual words suggest.



23
natural language processing
NLP

the field of study covering knowledge and techniques which allow computerized processing of linguistic data. This field combines a variety of skills including linguistics, mathematical logic, statistics, and algorithms.



24
open data category

data category whose content cannot be fully enumerated due to the organic nature of language


EXAMPLE: Typical open data categories might include /term/, /lemma/.



25
syntagmatic relation

relation by which linguistic units in a discourse are associated.



26
morpho-syntactic tag

a feature corresponds to each associative relation, and the related entities share the same value for that feature. The morpho-syntactic tag lists some of these features (part-of-speech, grammatical category, etc.).



27
part of speech

category assigned to a word based on its grammatical and semantic properties

See also grammatical category


NOTE: ISO 12620 provides a comprehensive list of values for European languages. Examples of such values are: /noun/ and /verb/.





28
token

non-empty contiguous discourse sequence identified as such by a morpho-phonological analysis or an automatic processing of the discourse.

This can involve the recognition of a regular or algebraic language (matching of the separators), or a lexicological analysis (recognition of roots, morphological derivation and inflection, etc.).





29
tokenization

the process of identifying tokens
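
A minimal sketch of separator-based tokenization with a regular expression. The pattern (runs of word characters, or single punctuation marks) is an assumption for illustration; real tokenizers are language-specific and may rely on lexical analysis as well.

```python
import re

# Hypothetical token pattern: runs of word characters, or single non-space symbols.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (start, end, token) triples, one per non-empty contiguous token."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN_PATTERN.finditer(text)]
```

Keeping the character offsets alongside each token lets later annotation layers refer back to the original discourse sequence.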



30
word-form
morpho-syntactic unit

contiguous or non-contiguous entity from a speech or text sequence identified as such in an associative relation. This identification is the basis of morpho-syntactic tagging (part-of-speech, grammatical category, agreement feature, etc.). Morpho-syntactic units may have no acoustic or graphic realization, or correspond to one or more tokens.
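
The relation between word-forms and tokens can be sketched as a record that points to zero, one or several token positions. The class `WordForm`, its fields and the helper `surface` are hypothetical names, not the MAF data model.

```python
from dataclasses import dataclass, field


@dataclass
class WordForm:
    """Hypothetical record linking a morpho-syntactic unit to token positions."""
    lemma: str
    tag: str
    token_indices: list = field(default_factory=list)  # empty: no realization


def surface(word_form, tokens):
    """Join the tokens covered by a word-form, in index order."""
    return " ".join(tokens[i] for i in sorted(word_form.token_indices))


tokens = ["my", "father", "in", "law", "arrived"]
# One word-form spanning three tokens (a multi-word expression).
mwe = WordForm(lemma="father in law", tag="noun", token_indices=[1, 2, 3])
# A word-form with no graphic realization covers no token at all.
unrealized = WordForm(lemma="(elided subject)", tag="pronoun")
```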



31
romanization

transliteration from a non-Latin script into a Latin script



32
script

set of graphic characters used for the written form of one or more languages (ISO/IEC 10646-1, 4.14)



33
simple data category

data category that may be the possible content of a closed data category, but that cannot itself be further sub-divided


EXAMPLE: /masculine/, /feminine/, and /neuter/ are possible simple data categories associated with the conceptual domain of the closed data category /grammatical gender/ as it is associated with the German language.



34
transcription

form resulting from a coherent method of writing down speech sounds



35
transliteration

form resulting from the conversion of one writing system into another



36
word

description, in the context of a given language, composed of at least a part of speech and a lemmatized form


NOTE: The description can include more morphological information and/or syntactic and semantic information. A word is either a single word or a multi-word expression.



37
word class

See also part of speech



