Update: 2015-12-09 11:35 PM -0500


Lexicology : Inflection


collection by U Kyaw Tun, M.S. (I.P.S.T., U.S.A.). Not for sale. No copyright. Free for everyone. Prepared for students and staff of TIL Research Station, Yangon, MYANMAR :  http://www.tuninst.net , www.romabama.blogspot.com

Lexical semantics
History of Lexical semantics
Lemma (head word)
Stem vs. lemma


UKT Notes
• lemmatisation • lexeme • lexicography • syntactic category 

Noteworthy passages in this file (always check them against the original section from which they are taken):
• A stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma is "produce", but its stem is "produc-". This is because there are related words such as "production".
• ... a lexeme in many languages will have many different forms.

UKT: Now that we are going to study Lexicology, we should at least know what a lexeme is, and also what a lemma is.

by UKT based on Wikipedia: http://en.wikipedia.org/wiki/Lexicology 090802

Lexicology (from lexiko-, in the Late Greek lexikon) is the part of linguistics that studies words, their nature and meaning, words' elements, relations between words (semantic relations), word groups and the whole lexicon.

The term first appeared in the 1820s, though there were lexicologists in essence before the term was coined. Computational lexicology, a related field (in the same way that computational linguistics is related to linguistics), deals with the computational study of dictionaries and their contents. An allied science to lexicology is lexicography, which also studies words, but in relation to dictionaries: it is concerned with the inclusion of words in dictionaries and, from that perspective, with the whole lexicon. Lexicography is therefore the theory and practice of compiling dictionaries. Lexicography is sometimes considered a part or branch of lexicology, but the two disciplines should not be confused: lexicographers, the people who write dictionaries, are at the same time lexicologists, but not all lexicologists are lexicographers. It is said that lexicography is practical lexicology: it is practically oriented, though it has its own theory, while pure lexicology is mainly theoretical.

UKT: Contd. in History of Lexical semantics below.

Lexical semantics

by UKT based on Wikipedia: http://en.wikipedia.org/wiki/Lexicology 090802


Semantic relations between words are manifested in homonymy, antonymy, paronymy, etc. The semantics usually involved in lexicological work is called lexical semantics. Lexical semantics differs somewhat from other linguistic types of semantics, such as phrase semantics, sentence semantics and text semantics, which take the notion of meaning in a much broader sense. There are also types of semantics outside linguistics (although sometimes related to it), such as cultural semantics and computational semantics; the latter is related not to computational lexicology but to mathematical logic. Among the semantics of language, lexical semantics is the most robust, and to some extent phrase semantics too, while other types of linguistic semantics are newer and not yet thoroughly examined.

UKT: Contd. in History of Lexical semantics below.

History of Lexical semantics

by UKT based on Wikipedia: http://en.wikipedia.org/wiki/Lexicology 090802

Lexical semantics cannot be fully understood without a brief exploration of its history.

Prestructuralist semantics

Semantics as a linguistic discipline began in the middle of the 19th century. Because linguistics at the time was predominantly diachronic, lexical semantics was diachronic too, and this approach dominated the scene between 1870 and 1930. [1] Diachronic lexical semantics was chiefly interested in change of meaning, with a predominantly semasiological approach, and took the notion of meaning in a psychological sense: lexical meanings were considered to be psychological entities (thoughts and ideas), and meaning changes were explained as resulting from psychological processes.

Structuralist and neostructuralist semantics

With the rise of new ideas after Saussure's groundbreaking work, prestructuralist diachronic semantics was considerably criticized for its atomistic study of words, its diachronic approach, and its mingling of non-linguistic spheres of investigation. The study became synchronic, concerned with semantic structures, and narrowly linguistic.

Semantic structural relations of lexical entities can be seen in three ways:

• semantic similarity
• lexical relations such as synonymy, antonymy, and hyponymy
• syntagmatic lexical relations

Although structuralist lexical semantics was revived by the neostructuralists, not much work was done by them, as its followers themselves admit.

A frequently cited example of this relational approach is WordNet, which, as Dirk Geeraerts [2] puts it, "is a type of an online electronic lexical database organized on relational principles, which now comprises nearly 100,000 concepts".
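The idea of a lexical database "organized on relational principles" can be sketched with a toy in-memory structure in which synonymy, hyponymy and antonymy are explicit links. This is not WordNet's file format and not the NLTK WordNet API; the names `Synset`, `synonyms`, `is_hyponym` and `are_antonyms` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare synsets by identity, not by field values
class Synset:
    """A set of synonymous words expressing one concept (toy model)."""
    name: str
    words: set
    hypernyms: list = field(default_factory=list)  # more general concepts
    antonyms: list = field(default_factory=list)   # opposite concepts


def synonyms(word, synsets):
    """Synonymy: other words that share a synset with `word`."""
    return {w for s in synsets if word in s.words for w in s.words} - {word}


def is_hyponym(specific, general):
    """Hyponymy: True if `specific` is (directly or indirectly) a kind of `general`."""
    return any(h is general or is_hyponym(h, general) for h in specific.hypernyms)


def are_antonyms(a, b):
    """Antonymy, treated as symmetric even if stored on only one synset."""
    return b in a.antonyms or a in b.antonyms


animal = Synset("animal", {"animal", "beast"})
dog = Synset("dog", {"dog", "domestic dog"}, hypernyms=[animal])
hot = Synset("hot", {"hot"})
cold = Synset("cold", {"cold"}, antonyms=[hot])
LEXICON = [animal, dog, hot, cold]
```

Here `synonyms("dog", LEXICON)` gives `{"domestic dog"}`, and `is_hyponym(dog, animal)` is true while the reverse is false: meaning is captured by the relations between entries rather than by isolated definitions.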

Chomskyan school

Followers of the Chomskyan generative approach to grammar soon investigated two different types of semantics, interpretative semantics and generative semantics, which unfortunately clashed in a heated debate [3].

Cognitive semantics

Cognitive lexical semantics is thought to be the most productive of the current approaches.


Another branch of lexicology, alongside lexicography, is phraseology. It studies the compound meanings of two or more words, as in "raining cats and dogs". Because the meaning of such a phrase as a whole differs greatly from the meanings of its individual words, phraseology examines how and why such meanings come into everyday use, and what laws may govern these word combinations. Phraseology also investigates idioms.


Since lexicology studies the meaning of words and their semantic relations, it often explores the origin and history of a word, i.e. its etymology. [UKT ¶]

Etymologists analyse related languages using a technique known as the comparative method. In this way, word roots have been found that can be traced all the way back to the origin of, for instance, the Indo-European language family. However, the comparative method is unhelpful in the case of "multiple causation" [4], when a word derives from several sources simultaneously as in phono-semantic matching. [5]

Etymology can be helpful in clarifying some questionable meanings, spellings, etc., and is also used in lexicography. For example, etymological dictionaries provide words with their historical origins, change and development.


A good example of lexicology at work, and one that everyone is familiar with, is the dictionary or thesaurus. Dictionaries are books or computer programs (or databases) that represent lexicographical work; they are intended for public use.

As there are many different types of dictionaries, there are many different types of lexicographers.

Lexicographers are concerned with questions such as how to define what simple words like 'the' mean, how to clearly explain compound or complex words and words with many meanings, and which words to include in a dictionary and which to leave out.

Noted lexicographers

Some noted lexicographers include:

• Dr. Samuel Johnson (September 18, 1709 – December 13, 1784)
• French lexicographer Pierre Larousse (October 23, 1817 – January 3, 1875)
• Noah Webster (October 16, 1758 – May 28, 1843)
• Russian lexicographer Vladimir Dal (November 10, 1801 – September 22, 1872)

UKT: More in Wikipedia article

Lemma (Linguistics)

From Wikipedia: http://en.wikipedia.org/wiki/Lemma 090725

In linguistics a lemma (plural lemmas or lemmata) has two distinct interpretations:

1. morphology / lexicography: the canonical form or citation form of a set of forms (headword);
e.g., in English, <run>, <runs>, <ran> and <running> are forms of the same lexeme, with RUN as the lemma.

2. psycholinguistics: the abstract conceptual form that has been mentally selected for utterance in the early stages of speech production, but before any sounds are attached to it.

UKT: (Waiting for comments from my peers - 090904) The notion of Lemma is nothing new in the East. It is comparable to Sanskrit Sphoṭa (literally "bursting, opening"). Look for the following passages in lang.htm :

Sphoṭa (literally "bursting, opening") is an important concept in Sanskrit philosophy of language, relating to the problem of speech production, how the mind orders linguistic units into coherent discourse.

In verse I.93, Bhartṛhari states that the 'sphota' is the universal or linguistic type (sentence-type or word-type), as opposed to their tokens (sounds) [2]. He makes a distinction between sphoṭa, which is whole and indivisible, and 'nāda' {na-da.}, the sound, which is sequenced and therefore divisible. The sphoṭa is the causal root, the intention, behind an utterance, in which sense it is similar to the notion of lemma in most psycholinguistic theories of speech production. However, sphoṭa also arises in the listener, which differs from the lemma position. Uttering the 'nāda' induces the same mental state or sphoṭa in the listener: it comes as a whole, in a flash of recognition or intuition (pratibhā, 'shining forth'). This is particularly true for vakya-sphoṭa or sentence-vibration, where the entire sentence is thought of (by the speaker) and grasped (by the listener) as a whole.

A lemma in morphology is the canonical form of a lexeme. Lexeme, in this context, refers to the set of all the forms that have the same meaning, and lemma refers to the particular form that is chosen by convention to represent the lexeme. [UKT ¶]

In lexicography, this unit is usually also the citation form or headword by which it is indexed. Lemmas have special significance in highly inflected languages such as Czech. The process of determining the lemma for a given word is called lemmatisation.

The psycholinguistic interpretation refers to one of the more widely accepted psycholinguistic models of speech production, referring to an early stage in the mental preparation for an utterance. Here, lemma is the abstract form of a word that arises after the word has been selected mentally, but before any information has been accessed about the sounds in it (and thus before the word can be pronounced). It therefore contains information concerning only meaning and the relation of this word to others in the sentence. This notion of lemma is similar to the Sanskrit sphota (6th c.), an invariant mental word, of which the sound is a feature.

Morphology / Lexicography

In a dictionary, the lemma GO represents the inflected forms <go>, <goes>, <going>, <went>, and <gone>. The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of such simplifications is, of course, the inability to look up a declined or conjugated form of the word, although some dictionaries, like Webster's, will list "went". Multilingual dictionaries vary in how they deal with this issue: the Langenscheidt dictionary of German does not list ging (< gehen); the Cassell does.

• GO - example of a lexeme
• <go>, <goes>, <going>, <went> and <gone> - forms of the same lexeme GO
• GO - headword or lemma
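The lookup problem described above can be sketched for a dictionary that is keyed only by lemma and may optionally carry cross-references for irregular forms (as Webster's does for "went"). The names `ENTRIES`, `CROSS_REFS` and `look_up` are hypothetical, and the entry texts are placeholders rather than real definitions.

```python
# Hypothetical mini-dictionary: entries are indexed only by headword (lemma).
ENTRIES = {
    "go": "(definition of GO)",
    "run": "(definition of RUN)",
}

# Optional cross-references from irregular forms to their lemma,
# i.e. the relation written "went" < "go".
CROSS_REFS = {"went": "go", "gone": "go", "ran": "run"}


def look_up(word, use_cross_refs=True):
    """Return (headword, entry text), or None if the form is not listed."""
    word = word.lower()
    if word in ENTRIES:
        return word, ENTRIES[word]
    if use_cross_refs and word in CROSS_REFS:
        lemma = CROSS_REFS[word]
        return lemma, ENTRIES[lemma]
    return None
```

With cross-references disabled, `look_up("went", use_cross_refs=False)` returns `None`, which is exactly the disadvantage of lemma-only indexing; a regular form such as "going" is not found either, because only irregular forms are cross-referenced in this sketch.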

The form that is chosen to be the lemma is usually the least marked form, though there are occasional exceptions; e.g., in Finnish, dictionaries list verbs not under the verb root but under the first infinitive, marked with -(t)a, -(t)ä.

Lemmas or word stems are used often in corpus linguistics for determining word frequency. In such usage the specific definition of "lemma" is flexible depending on the task it is being used for.
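Counting by lemma rather than by word form can be sketched with `collections.Counter`. The form-to-lemma table `LEMMA_OF` is a hypothetical stand-in for a real lemmatiser, and the sample sentence is invented for illustration.

```python
from collections import Counter

# Hypothetical form-to-lemma table; a real study would use a lemmatiser.
LEMMA_OF = {"runs": "run", "ran": "run", "running": "run",
            "goes": "go", "went": "go", "gone": "go"}


def form_frequencies(tokens):
    """Count each word form separately (case-folded)."""
    return Counter(t.lower() for t in tokens)


def lemma_frequencies(tokens):
    """Count tokens by lemma; forms missing from the table count as themselves."""
    return Counter(LEMMA_OF.get(t.lower(), t.lower()) for t in tokens)


tokens = "She runs daily and ran yesterday and is running now".split()
```

By form, "runs", "ran" and "running" are each counted once and "run" not at all; by lemma, RUN is counted three times. Which of the two counts is wanted depends on the task, as the text notes.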

Lemmas in different languages

In English, the citation form of a noun is the singular: e.g., <mouse> rather than <mice>. For multi-word lexemes which contain possessive adjectives or reflexive pronouns, the citation form uses a form of the indefinite pronoun <one> : e.g., <do one's best>, <perjure oneself>. [UKT ¶]

In languages with grammatical gender, the citation form of regular adjectives and nouns is usually the masculine singular. If the language additionally has cases, the citation form is often the masculine singular nominative.

In many languages, the citation form of a verb is the infinitive:

• French: aller

• German: gehen

• English: usually the full infinitive (to go), but the bare infinitive for some defective verbs (must)

• Latin, Ancient Greek, and Modern Greek (which has no infinitive): the first person singular present tense is normally used, though occasionally the infinitive may also be seen. (For contracted verbs in Greek, an uncontracted first person singular present tense is used to reveal the contract vowel, e.g. φιλέω philéō for φιλῶ philō "I love" [implying affection]; αγαπάω agapáō for αγαπῶ agapō "I love" [implying regard].)

• Japanese: the non-past (present and future) tense is used.

In Arabic, which has no infinitives, the third person singular masculine of the past tense is the least-marked form, and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used today, the triliteral root of the word, either a verb or a noun, is used. Hebrew often uses the 3rd person masculine singular qal perfect, e.g., ברא bara' create, כפר kaphar deny. For Korean, -da is attached to the stem.

Some phrases are cited in a sort of lemma form, e.g., Carthago delenda est (literally, "Carthage must be destroyed") is a common way of citing Cato, although what he said was closer to Ceterum censeo Carthaginem esse delendam ("As to the rest, I hold that Carthage must be destroyed").

UKT: Contd. below in Stem vs. Lemma.

Stem vs. Lemma

From Wikipedia: http://en.wikipedia.org/wiki/Lemma 090725

A stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma is "produce", but its stem is "produc-". This is because there are related words such as "production". [1]

Some lexemes have several stems but one lemma. For instance, "to go" (the lemma) has the stems "go" and "wen-". (The "-t" of "went" may be considered to derive from the past tense "-ed".)
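The reasoning behind "produc-" (a stem must be shared by all related forms, including "production") can be made concrete with a longest-common-prefix check using the standard library's `os.path.commonprefix`, which compares strings character by character. This is only a string heuristic assumed for illustration, not a morphological analysis.

```python
import os.path


def shared_stem(forms):
    """Longest prefix common to all forms: a crude, purely string-based stem."""
    return os.path.commonprefix(forms)


print(shared_stem(["produce", "produced", "produces", "production"]))  # produc
print(repr(shared_stem(["go", "goes", "going", "went", "gone"])))     # ''
```

For GO the shared prefix is empty because "went" begins with a different letter, which is why such a lexeme is described as having several stems ("go" and "wen-") but only one lemma.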


When we produce a word, we are essentially turning our thoughts into sounds (a process known as lexicalisation). In many psycholinguistic models this is considered to be at least a two-stage process. The lemma is thus intermediate between the semantic level (where meaning is specified) and the phonological level (where the sounds of the word are specified). It is an abstract form containing syntactic information (about how the word can be used in a sentence), but no information about the pronunciation of the word. In this context, the lexeme is the phonologically specified form that is selected after the lemma.

This two-stage model is the most widely supported theory of speech production in psycholinguistics [2], although it has recently been challenged. [3] For example, there is some evidence that the grammatical gender of a noun is retrieved from the word's phonological form (the lexeme) rather than from the lemma. [4] This is easily explained by Caramazza's Independent Network model, which does not assume a distinct level between the semantic and the phonological stages (so there is no lemma representation); in this model, syntactic information about the word is activated at the semantic or phonological level (so gender would be activated at the latter). [5]

UKT: End of Wikipedia article

UKT notes


based on Wikipedia: http://en.wikipedia.org/wiki/Lemmatisation 090731

In linguistics, lemmatisation is the process of grouping together the different inflected forms of a word so they can be analysed as a single item. [1]

In computational linguistics, lemmatisation is the algorithmic process of determining the lemma for a given word. Since the process may involve complex tasks such as understanding context and determining the part of speech of a word in a sentence (requiring, for example, knowledge of the grammar of a language) it can be a hard task to implement a lemmatiser for a new language.

In many languages, words appear in several inflected forms. For example, in English:

• The verb <to walk> may appear as <walk>, <walked>, <walks>, <walking>.
-- The base form, <walk>, which is the one you might look up in a dictionary, is called the lemma for the word. [UKT ¶]

The combination of the base form with the part of speech is often called the lexeme of the word.
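Why part-of-speech knowledge matters can be sketched with a lookup keyed by (form, part of speech): the same form "saw" has different lemmas as a verb and as a noun. The table `LEMMAS` and the function `lemmatise` are hypothetical; real lemmatisers use full lexicons and a tagger to supply the part of speech.

```python
# Hypothetical lookup table keyed by (word form, part of speech).
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("walked", "VERB"): "walk",
    ("walks", "VERB"): "walk",
    ("walks", "NOUN"): "walk",
}


def lemmatise(form, pos):
    """Return the lexeme as a (lemma, part of speech) pair.

    Forms missing from the table are treated as their own lemma.
    """
    return LEMMAS.get((form.lower(), pos), form.lower()), pos
```

`lemmatise("saw", "VERB")` gives `("see", "VERB")` while `lemmatise("saw", "NOUN")` gives `("saw", "NOUN")`; the returned pair corresponds to the "base form plus part of speech" described above.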

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

For instance:

1. The word "better" has "good" as its lemma, but this is missed by stemming.
2. The word "walk" is the base form of the word "walking", and hence this is matched in both stemming and lemmatisation.
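The two examples above can be reproduced with a deliberately naive suffix stripper and an exception table. This is not the Snowball algorithm, and `toy_lemma` is only an exception lookup with a stemming fallback; a real lemmatiser would consult a full lexicon and part-of-speech information. All names here are hypothetical.

```python
SUFFIXES = ("ing", "ed", "s")                 # toy suffix list, checked in this order
IRREGULAR = {"better": "good", "went": "go"}  # hypothetical exception table


def toy_stem(word):
    """Strip the first matching suffix; no knowledge of meaning or context."""
    for suffix in SUFFIXES:
        # keep at least three letters so short words like "is" are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Check the exception table first, then fall back to suffix stripping."""
    return IRREGULAR.get(word, toy_stem(word))
```

The stemmer leaves "better" unchanged, missing the lemma "good", while both functions reduce "walking" to "walk".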

Analysers like Lucene's Snowball analyser [2] store the stemmed base form of the word without any knowledge of its meaning, taking into account only the structure of word formation. The stemmed word itself might not be a valid word (see "lazy" below).

The following is an example of lemmatisation and stemming. Given the following sentence:

[The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs]

org.apache.lucene.analysis.snowball.SnowballAnalyzer gives the following stems:

 [quick] [brown] [fox] [jump] [over] [lazi] [dog]

the lemmas from the words in the sentence would be as follows:

 [the] [quick] [brown] [fox] [jump] [over] [the] [lazy] [dog]

UKT: End of Wikipedia article.

lex·eme n. 1. The fundamental unit of the lexicon of a language. [The words] <find>, <found>, and <finding> are members of the English lexeme <find>. [ lex(icon) -eme ] -- AHTD
(UKT: I've edited this entry from AHTD inserting the <...> to suit the TIL bracket convention.)

The following is from Wikipedia: http://en.wikipedia.org/wiki/Lexeme 090730

A lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, <run>, <runs>, <ran> and <running> are forms of the same lexeme, conventionally written as RUN. (Wiki-note-lexeme-01) A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.

Wiki-note-lexeme-01: RUN is here intended to display in small caps. Software limitations may result in its display either in full-sized capitals (RUN) or in full-sized capitals of a smaller font; either is anyway regarded as an acceptable substitute for genuine small caps.
UKT: Instead of displaying the lemma in small caps, it is simpler to display it in the default size, as RUN.

UKT: A headword, head word, lemma, or sometimes catchword is the word under which a set of related dictionary or encyclopaedia entries appears. [It is thus referred to as the base form of the word]. The headword is used to locate the entry, and dictates its alphabetical position. -- Wikipedia http://en.wikipedia.org/wiki/Headword 090801

• RUN - example of a lexeme
• <run>, <runs>, <ran> and <running> - forms of the same lexeme RUN
• RUN - headword or lemma

A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. [UKT ¶]

For example, the lexeme RUN has:

• a present third person singular form <runs>,
• a present non-third-person-singular form <run> (which also functions as the past participle and non-finite form),
• a past form <ran>, and
• a present participle <running>. (It does not include <runner>, <runners>, <runnable>, etc., which belong to other lexemes.)
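The paradigm of RUN listed above can be represented as a mapping from inflectional slots to forms. The slot labels and the names `Lexeme`, `forms` and `slots_of` are assumptions made for this sketch, not standard terminology from the source.

```python
from dataclasses import dataclass


@dataclass
class Lexeme:
    lemma: str      # citation form used to name the lexeme, e.g. "run" for RUN
    category: str   # syntactic category
    paradigm: dict  # inflectional slot -> word form


RUN = Lexeme("run", "verb", {
    "present 3sg": "runs",
    "present non-3sg": "run",
    "past": "ran",
    "past participle": "run",
    "present participle": "running",
})


def forms(lexeme):
    """The distinct word forms of a lexeme."""
    return set(lexeme.paradigm.values())


def slots_of(lexeme, form):
    """All paradigm slots that a given form fills (one form may fill several)."""
    return [slot for slot, f in lexeme.paradigm.items() if f == form]
```

`forms(RUN)` yields the four distinct forms, and `slots_of(RUN, "run")` shows the same form filling two slots; "runner" is absent because it belongs to a different lexeme.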

The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as RUN, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.

A lexicon consists of lexemes.

In many formal theories of language, lexemes have subcategorization frames to account for the number and types of complements they occur with in sentences and other syntactic structures.

The notion of a lexeme is central to morphology, and thus many other notions can be defined in terms of it. For example, the difference between inflection and derivation can be stated in terms of lexemes:

• Inflectional rules relate a lexeme to its forms.
• Derivational rules relate a lexeme to another lexeme.
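The two kinds of rule can be sketched as two kinds of function: an inflectional rule maps a lexeme to its forms, while a derivational rule maps a lexeme to a new lexeme. The names are hypothetical, and the regular "-s" plural and the "-er" agent suffix are simplifications of English morphology chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    category: str


def inflect_noun(noun):
    """Inflectional rule: lexeme -> its forms (regular -s plural only, a simplification)."""
    return {"singular": noun.lemma, "plural": noun.lemma + "s"}


def derive_agent_noun(verb):
    """Derivational rule: verb lexeme -> new noun lexeme with the suffix -er."""
    return Lexeme(verb.lemma + "er", "noun")


teach = Lexeme("teach", "verb")
teacher = derive_agent_noun(teach)
```

Deriving TEACHER from TEACH produces a different lexeme with its own category, and only then does inflection supply its forms <teacher> and <teachers>.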


Lexemes are often composed of smaller units with individual meaning, called morphemes, in the pattern root morpheme + derivational morphemes + desinence (not necessarily in this order), where:

• The root morpheme is the primary lexical unit of a word, which carries the most significant aspects of semantic content and cannot be reduced to smaller constituents. [2]

• The derivational morphemes carry only derivational information. [3]

• The desinence is composed of all inflectional morphemes, and carries only inflectional information. [4]

The compound root morpheme + derivational morphemes is often called the stem.[5] The decomposition stem + desinence can then be used to study inflection.
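The decomposition root + derivational morphemes = stem, and stem + desinence = word form, can be sketched as follows. The sketch assumes that every morpheme is a suffix appended in that order with no spelling changes; as the text notes, real words do not always follow this order. The class name `WordForm` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WordForm:
    root: str                                         # root morpheme
    derivational: list = field(default_factory=list)  # derivational morphemes
    desinence: list = field(default_factory=list)     # inflectional morphemes

    @property
    def stem(self):
        """Root morpheme + derivational morphemes."""
        return self.root + "".join(self.derivational)

    @property
    def surface(self):
        """Stem + desinence, joined as plain suffixes (no spelling changes)."""
        return self.stem + "".join(self.desinence)


teachers = WordForm("teach", ["er"], ["s"])
```

For <teachers>, the stem is "teacher" and the desinence is "-s", so studying the inflection of this word amounts to looking at what is added after the stem.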

UKT: End of Wikipedia article.

From Wikipedia : http://en.wikipedia.org/wiki/Lexicography 090818

The pursuit of lexicography is divided into two related disciplines:

• Practical lexicography is the art or craft of compiling, writing and editing dictionaries.

• Theoretical lexicography is the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situation, and how users may best access the data incorporated in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.

A person devoted to lexicography is called a lexicographer.

General lexicography focuses on the design, compilation, use and evaluation of general dictionaries, i.e. dictionaries that provide a description of the language in general use. Such a dictionary is usually called a general dictionary or LGP dictionary. Specialized lexicography focuses on the design, compilation, use and evaluation of specialized dictionaries, i.e. dictionaries that are devoted to a (relatively restricted) set of linguistic and factual elements of one or more specialist subject fields, e.g. legal lexicography. Such a dictionary is usually called a specialized dictionary or LSP dictionary.

There is some disagreement on the definition of lexicology, as distinct from lexicography. Some use "lexicology" as a synonym for theoretical lexicography; others use it to mean a branch of linguistics pertaining to the inventory of words in a particular language.

It is now widely accepted that lexicography is a scholarly discipline in its own right and not a sub-branch of applied linguistics, as the chief object of study in lexicography is the dictionary (see e.g. Bergenholtz/Nielsen/Tarp 2009).


Practical lexicographic work involves several activities, and the compilation of well-crafted dictionaries requires careful consideration of all or some of the following aspects:

• Profiling the intended users (i.e. linguistic and non-linguistic competences) and identifying their needs

• Defining the communicative and cognitive functions of the dictionary

• Selecting and organizing the components of the dictionary

• Choosing the appropriate structures for presenting the data in the dictionary (i.e. frame structure, distribution structure, macro-structure, micro-structure and cross-reference structure)

• Selecting words and affixes for systematization as entries

• Selecting collocations, phrases and examples

• Choosing lemma forms for each word or part of word to be lemmatized

• Defining words

• Organizing definitions

• Specifying pronunciations of words

• Labeling definitions and pronunciations for register and dialect, where appropriate

• Selecting equivalents in bi- and polylingual dictionaries

• Translating collocations, phrases and examples in bi- and polylingual dictionaries

• Designing the best way in which users can access the data in printed and electronic dictionaries
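Some of the structures named in the list (micro-structure, macro-structure and cross-reference structure) can be illustrated with a small data model. This is a hypothetical sketch of one possible design, not a standard dictionary format; the field names and helper functions are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Micro-structure: the fields inside one dictionary entry."""
    headword: str                                  # the chosen lemma form
    pronunciation: str = ""
    labels: list = field(default_factory=list)     # register or dialect labels
    senses: list = field(default_factory=list)     # ordered definitions
    see_also: list = field(default_factory=list)   # cross-reference structure


def macrostructure(entries):
    """Macro-structure: entries in alphabetical order of headword."""
    return sorted(entries, key=lambda e: e.headword.casefold())


def dangling_cross_refs(entries):
    """Cross-references that point to headwords missing from the dictionary."""
    headwords = {e.headword for e in entries}
    return [(e.headword, ref) for e in entries
            for ref in e.see_also if ref not in headwords]
```

A check such as `dangling_cross_refs` is the kind of task that "dictionary IT" can take over from the compiler: it flags references to entries that were never written.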

An overall focus point is to keep the lexicographic information costs incurred by dictionary users as low as possible. Nielsen (2008) suggests relevant aspects for lexicographers to consider when making dictionaries as they all affect the users' impression and actual use of specific dictionaries.

Theoretical lexicography (or metalexicography) concerns the same aspects, but is meant to lead to the development of principles that can improve the quality of future dictionaries, for instance in terms of access to data and lexicographic information costs. Several perspectives or branches of such academic dictionary research have been distinguished: 'dictionary criticism' (or evaluating the quality of one or more dictionaries, e.g. by means of reviews), 'dictionary history' (or tracing the traditions of a type of dictionary or of lexicography in a particular country or language), 'dictionary typology' (or classifying the various genres of reference works, such as dictionary versus encyclopedia, monolingual versus bilingual dictionary, general versus technical or pedagogical dictionary), 'dictionary structure' (or formatting the various ways in which the information is presented in a dictionary), 'dictionary use' (or observing the reference acts and skills of dictionary users), and 'dictionary IT' (or applying computer aids to the process of dictionary compilation).

One important consideration is the status of 'bilingual lexicography', or the compilation and use of the bilingual dictionary in all its aspects. In spite of a relatively long history of this dictionary type, it is often said to be less developed in a number of respects than its monolingual counterpart, especially in cases where one of the languages involved is not a major language. Not all genres of reference works are available in interlingual versions, e.g. LSP, learners' and encyclopedic types, although sometimes these challenges produce new subtypes, e.g. 'semi-bilingual' or 'bilingualised' dictionaries like Hornby's (Oxford) Advanced Learner's Dictionary English-Chinese, which have been developed by translating existing monolingual dictionaries (see Marello 1998).

UKT: End of Wikipedia article.

syntactic category

From Wikipedia: http://en.wikipedia.org/wiki/Syntactic_category 090817

A syntactic category is either:

• a phrasal category, such as noun phrase or verb phrase, which can be decomposed into smaller syntactic categories, or
• a lexical category, such as noun or verb, which cannot be further decomposed.

The three criteria used in defining syntactic categories are:

1. The type of meaning it expresses
2. The type of affixes it takes
3. The structure in which it occurs

In terms of phrase structure rules, phrasal categories can occur on the left side of the arrow, while lexical categories cannot.
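A toy grammar makes the distinction concrete. In this sketch, words are assigned to lexical categories in a separate lexicon, so only phrasal categories appear on the left of a rule; many grammars instead also write lexical rules such as N → dog, so this separation is a modeling assumption. The rules, lexicon and function names are hypothetical.

```python
PHRASAL = {"S", "NP", "VP"}
LEXICAL = {"Det", "N", "V"}

# Phrase structure rules: left-hand side -> right-hand side.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
]

# Words are assigned to lexical categories in a separate lexicon.
LEXICON = {"the": "Det", "fox": "N", "dog": "N", "saw": "V"}


def rules_are_well_formed(rules):
    """Every left-hand side must be phrasal; every symbol must be a known category."""
    known = PHRASAL | LEXICAL
    return all(lhs in PHRASAL and all(c in known for c in rhs) for lhs, rhs in rules)


def expand(category):
    """Rewrite a category top-down (first matching rule) until only lexical categories remain."""
    if category in LEXICAL:
        return [category]
    rhs = next(r for lhs, r in RULES if lhs == category)
    return [leaf for c in rhs for leaf in expand(c)]
```

Expanding S gives Det N V Det N, which matches the lexical categories of "the fox saw the dog"; a rule with a lexical category such as N on its left is rejected by `rules_are_well_formed`.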

The lexical categories are traditionally called the parts of speech. They include nouns, verbs, adjectives, and so on.

UKT: End of Wikipedia article.

