Word formation


A collection by U Kyaw Tun (UKT) (M.S., I.P.S.T., USA) and staff of Tun Institute of Learning (TIL)

Word formation
  Semantic classification
  Formal classification
  Examples from different languages
  Recent trends
Compounding by language


UKT notes
null morpheme
word stem


Word formation

From Wikipedia: http://en.wikipedia.org/wiki/Word_formation 090820

In linguistics, word formation is the creation of a new word. Word formation is sometimes contrasted with semantic change, which is a change in a single word's meaning. The line between word formation and semantic change is sometimes a bit blurry; what one person views as a new use of an old word, another person might view as a new word derived from an old one and identical to it in form; see Conversion (linguistics). Word formation can also be contrasted with the formation of idiomatic expressions, though sometimes words can form from multi-word phrases; see Compound (linguistics) and Incorporation (linguistics).

A similar concept is Derivation.

UKT: End of Wikipedia article.

Derivation (linguistics)

by UKT based on Wikipedia: http://en.wikipedia.org/wiki/Derivation 090825

In linguistics, derivation is the process used to form new words from existing ones:

• happy --> happi-ness ; un-happy
• determine --> determination

A contrast is intended with the process of inflection, which uses another kind of affix in order to form variants of the same word:

• determine --> determine-s ; determin-ing ; determin-ed  [1]

(UKT: The dictionary entry shows the inflected forms.)
¤ <DETERMINE> - <determine-s>; <determin-ing>; <determin-ed>. See below how AHTD enters the word, with <de.ter.mine> as the head word.

de·ter·mine v. de·ter·mined de·ter·min·ing de·ter·mines v. tr. 1. a. To decide or settle (a dispute, for example) conclusively and authoritatively. b. To end or decide, as by judicial action. 2. To establish or ascertain definitely, as after consideration, investigation, or calculation. See note at discover . 3. To cause (someone) to come to a conclusion or resolution. 4. To be the cause of; regulate: Demand determines production. 5. To give direction to: The management committee determines departmental policy. 6. To limit in scope or extent. 7. Mathematics To fix or define the position, form, or configuration of. 8. Logic To explain or limit by adding differences. 9. Law To put an end to; terminate. v. intr. 1. To reach a decision; resolve. See note at decide . 2. Law To come to an end. [Middle English determinen from Old French determiner from Latin dētermināre to limit - de- terminus boundary] -- AHTD

A derivational suffix usually applies to words of one syntactic category and changes them into words of another syntactic category. For example, the English derivational suffix -ly changes adjectives into adverbs (<slow> → <slowly>).

Some examples of English derivational suffixes:

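• adjective-to-noun: -ness (slow → slowness)
• adjective-to-verb: -ise (modern → modernise) in British English or -ize (archaic → archaicize) in American English and Oxford spelling
• noun-to-adjective: -al (recreation → recreational)
• noun-to-verb: -fy (glory → glorify)
• verb-to-adjective: -able (drink → drinkable)
• verb-to-noun (abstract): -ance (deliver → deliverance)
• verb-to-noun (concrete): -er (write → writer)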
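
As a rough illustration, the category-changing behaviour of derivational suffixes can be sketched in Python. The rule table RULES, the function derive and the two spelling adjustments are assumptions made for this sketch and are not part of the Wikipedia article; only suffixes mentioned in the text (-ly, -ness, -ise, -er) are listed, and real English spelling is far less regular.

```python
from typing import NamedTuple, Tuple


class SuffixRule(NamedTuple):
    suffix: str
    source_category: str   # category the suffix applies to
    target_category: str   # category of the derived word


# Hypothetical rule table, limited to suffixes named in the text.
RULES = {
    "ly": SuffixRule("ly", "adjective", "adverb"),
    "ness": SuffixRule("ness", "adjective", "noun"),
    "ise": SuffixRule("ise", "adjective", "verb"),
    "er": SuffixRule("er", "verb", "noun"),
}

VOWELS = "aeiou"


def derive(base: str, category: str, suffix: str) -> Tuple[str, str]:
    """Return (derived word, new category) for a base word and a suffix."""
    rule = RULES[suffix]
    if category != rule.source_category:
        raise ValueError(f"-{suffix} does not apply to a {category}")
    stem = base
    # Toy spelling adjustments (assumptions of this sketch):
    if stem.endswith("y") and len(stem) > 2 and stem[-2] not in VOWELS:
        stem = stem[:-1] + "i"      # happy -> happi-ness
    elif stem.endswith("e") and rule.suffix[0] in VOWELS:
        stem = stem[:-1]            # write -> writ-er
    return stem + rule.suffix, rule.target_category


print(derive("slow", "adjective", "ly"))
print(derive("happy", "adjective", "ness"))
print(derive("write", "verb", "er"))
```

The function refuses to attach a suffix to a base of the wrong category, which mirrors the statement that a derivational suffix usually applies to words of one syntactic category.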

Although derivational affixes [suffix] do not necessarily modify the syntactic category, they modify the meaning of the base. In many cases, derivational affixes change both the syntactic category and the meaning:

modern → modernize ("to make modern").

The modification of meaning is sometimes predictable:
Adjective + -ness → the state of being (Adjective);
¤ white → whiteness

A prefix (write → re-write; lord → over-lord) will rarely change syntactic category in English. [UKT ¶ ]

The derivational prefix un- :
• applies to adjectives (healthy → unhealthy);
• applies to some verbs (do → undo);
• but rarely applies to nouns.

A few exceptions are the prefixes en- and be-. En- (em- before labials) is usually used as a transitive marker on verbs, but can also be applied to adjectives and nouns to form transitive verbs: circle (verb) → encircle (verb); but rich (adj) → enrich (verb), large (adj) → enlarge (verb), rapture (noun) → enrapture (verb), slave (noun) → enslave (verb).

Note that derivational affixes are bound morphemes. In this respect, derivation differs from compounding, by which free morphemes are combined (lawsuit, Latin professor). It also differs from inflection in that inflection does not change a word's syntactic category and creates not new lexemes but new word forms (table → tables; open → opened).

Derivation may occur without any change of form, for example telephone (noun) and to telephone. This is known as conversion or zero derivation. Some linguists consider that when a word's syntactic category is changed without any change of form, a null morpheme is being affixed.

UKT: End of Wikipedia article http://en.wikipedia.org/wiki/Derivation 090825.

Compound (linguistics)

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

In linguistics, a compound is a lexeme (less precisely, a word) that consists of more than one stem. [UKT ¶ ]

Compounding or composition is the word-formation process that creates compound lexemes (the other word-formation process being derivation). Compounding or word-compounding refers to the faculty and device of language to form new words by combining or putting together old words. In other words, compounding occurs when a speaker joins two or more words to make one word. The meanings of the words interrelate in such a way that a new meaning emerges which can be very different from the meanings of the words in isolation.

Colloquial or everyday examples of compounds are <fireman> and <hardware>. Someone who believes that nothing he does has a good result might be called a <never-go-well> person. We combine the words <never>, <go> and <well> to form an adjectival compound. This process of birth and death of words is going on all the time.

Formation of compounds

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

Compound formation rules vary widely across language types.

In a more synthetic language [UKT: inflectional language], the relationship between the elements of a compound may be marked, for example with a case suffix or other linking morpheme. [UKT ¶ ]


¤ the German compound: Kapitänspatent consists of the lexemes Kapitän (sea captain) and Patent (license) joined by an -s- (originally a genitive case suffix); and similarly,

¤ the Latin lexeme: paterfamilias contains the (archaic) genitive form familias of the lexeme familia (family).

Conversely, in the Hebrew compound בֵּית סֵפֶר bet sefer (school), it is the head that is modified: the compound literally means "house-of book", with בַּיִת bayit (house) having entered the construct state to become בֵּית bet (house-of). This latter pattern is common throughout the Semitic languages, though in some it is combined with an explicit genitive case, such that both parts of the compound are marked.
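
The linking-element pattern seen in the German example can be sketched as a small Python function. The name join_compound and its parameters are hypothetical; the sketch only concatenates modifier, linking element and head, and makes no attempt to predict which linking element a given language requires.

```python
def join_compound(modifier: str, head: str, linker: str = "") -> str:
    """Join modifier + linking element + head into one orthographic word."""
    # Inside the compound the head loses its initial capital
    # (German nouns are capitalised only at the start of the word).
    return modifier + linker + head[:1].lower() + head[1:]


# German: Kapitän + -s- + Patent
print(join_compound("Kapitän", "Patent", linker="s"))
# Norwegian: tungvekt + -s- + løfter
print(join_compound("tungvekt", "løfter", linker="s"))
```

The linker defaults to the empty string because many compounds, such as kamikaze, join their parts with no marking at all.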

Agglutinative languages tend to create very long words with derivational morphemes. Compounds may or may not also require derivational morphemes: the well-known Japanese compound 神風 kamikaze consists only of the nouns kami 'god, spirit' and kaze 'wind'. Some of the longest compounds are found in Finnish and in Germanic languages such as German. German examples include Farbfernsehgerät (color television set), Kernbetriebenerfernlenkgeschosskreuzer (nuclear-powered guided-missile cruiser), and the jocular word Donaudampfschifffahrtsgesellschaftskapitänsmütze (Danube steamboat shipping company captain's hat). Extremely long compound words also occur in the naming of chemical compounds, where, in biochemistry and polymer chemistry, they can be practically unlimited in length.

An example in Finnish is lentokonesuihkuturbiinimoottoriapumekaanikkoaliupseerioppilas, supposedly the longest actually used word in Finnish. In theory, even longer compounds are possible, but they are usually not found in actual discourse in Finnish.

Compounds can be rather long when translating technical documents from English to some other language, for example, Swedish. "Motion estimation search range settings" can be directly translated to rörelseuppskattningssökintervallsinställningar; the length of such words is theoretically unlimited, especially in chemical terminology.

Semantic classification

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

A common semantic classification of compounds yields four types:

• endocentric
• exocentric (also bahuvrihi)
• copulative (also dvandva)
• appositional

An endocentric compound consists of a head, i.e. the categorical part that contains the basic meaning of the whole compound, and modifiers, which restrict this meaning. For example, the English compound <doghouse>, where <house> is the head and <dog> is the modifier, is understood as a house intended for a dog. Endocentric compounds tend to be of the same part of speech (word class) as their head, as in the case of <doghouse>. (Such compounds were called tatpuruṣa in the Sanskrit tradition.)

Exocentric compounds (called bahuvrihi compounds in the Sanskrit tradition) do not have a head, and their meaning often cannot be transparently guessed from their constituent parts. For example, the English compound <white-collar> is neither a kind of collar nor a white thing. In an exocentric compound, the word class is determined lexically, disregarding the class of the constituents. For example, a <must-have> is not a verb but a noun. The meaning of this type of compound can be glossed as "(one) whose B is A", where B is the second element of the compound and A the first. A bahuvrihi compound is one whose nature is expressed by neither of the words: thus a <white-collar> person is neither white nor a collar (the collar's colour is a metaphor for socioeconomic status). Other English examples include <barefoot> and <Blackbeard>.

Copulative compounds are compounds which have two semantic heads.

Appositional compounds refer to lexemes that have two (contrary) attributes which classify the compound.
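
The difference between endocentric and exocentric compounds, as far as word class is concerned, can be sketched as a small data structure. The class Compound and its fields are hypothetical names chosen for this illustration: an endocentric compound takes its word class from its head, while an exocentric compound must have its class listed lexically.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Compound:
    # Each part is a (form, word class) pair.
    parts: Tuple[Tuple[str, str], ...]
    head_index: Optional[int] = None      # None means the compound has no head
    lexical_class: Optional[str] = None   # needed only for exocentric compounds

    def is_endocentric(self) -> bool:
        return self.head_index is not None

    def word_class(self) -> str:
        if self.head_index is not None:
            # Endocentric: the compound inherits the class of its head.
            return self.parts[self.head_index][1]
        if self.lexical_class is None:
            raise ValueError("exocentric compound needs a lexically listed class")
        # Exocentric: the class is listed, whatever the constituents are.
        return self.lexical_class


doghouse = Compound((("dog", "noun"), ("house", "noun")), head_index=1)
must_have = Compound((("must", "verb"), ("have", "verb")), lexical_class="noun")

print(doghouse.word_class(), must_have.word_class())
```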


Formal classification

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

UKT: I have inserted examples from Burmese-Myanmar for comparison with English (which I know very well) and/or with languages I am familiar with, such as French and Hindi. A native Burmese-Myanmar speaker may argue that the compound formation is not necessary.

Noun-noun compounds

Most natural languages have compound nouns. The positioning of the words (i.e. the most common order of constituents in phrases where nouns are modified by adjectives, by possessors, by other nouns, etc.) varies according to the language. While Germanic languages, for example, are left-branching when it comes to noun phrases (the modifiers come before the head), the Romance languages are usually right-branching.

In French, compound nouns are often formed by left-hand heads with prepositional components inserted before the modifier, as in chemin-de-fer 'railway' lit. 'road of iron' and moulin à vent 'windmill', lit. 'mill (that works)-by-means-of wind'.

UKT: A Burmese-Myanmar example after the French:
  ¤ {lé-ra.hût} , lit. "air {ra.hût}" - compound formed because no white space inserted as given in MEDict-433
  ¤ {lé ra.hût} , lit. " air (white space) {ra.hût}" - no compound formation
Dictionary meaning of {ra.hût} :
  {ra.hût} - n. 1. something rotating on its axis 2. weaving swift rotating on a horizontal axis. cf. {hkya:} 3. rotor 4. ferris wheel (Hindi: {ra.hûT}) -- MEDict-390

In Turkish, one way of forming compound nouns is as follows: yeldeğirmeni ‘windmill’ (yel: wind, değirmen-i: mill-possessive); demiryolu 'railway' (demir: iron, yol-u: road-possessive).


Verb-noun compounds

A type of compound that is fairly common in the Indo-European languages is formed of a verb and its object, and in effect transforms a simple verbal clause into a noun.

In Spanish, for example, such compounds consist of a verb conjugated for third person singular, present tense, indicative mood followed by a noun (usually plural): e.g., rascacielos (modelled on "skyscraper", lit. 'scratches skies') and sacacorchos ('corkscrew', lit. 'removes corks'). These compounds are formally invariable in the plural (but in many cases they have been reanalyzed as plural forms, and a singular form has appeared). French and Italian have these same compounds with the noun in the singular form: Italian grattacielo 'skyscraper', French grille-pain 'toaster', lit. 'toasts bread', and torche-cul 'ass-wipe' (Rabelais: see his "propos torcheculatifs").

This construction exists in English, generally with the verb and noun both in uninflected form. Examples are <spoilsport>, <killjoy>, <breakfast>, <cutthroat>, <pickpocket>, <dreadnought>, and <know-nothing>.

Also common in English is another type of verb-noun (or noun-verb) compound, in which an argument of the verb is incorporated into the verb, which is then usually turned into a gerund, such as <breastfeeding>, <finger-pointing>, etc. The noun is often an instrumental complement. From these gerunds new verbs can be made: <(a mother) breastfeeds (a child)> and from them new compounds <mother-child breastfeeding>, etc.

UKT: A neat example after English:
¤ {lak-Ñho:hto:} , lit. "finger pointing" -- MEDict-441
Note: The correct spelling for 'finger' is with a {ha.hto:}, and {lak-Ño:hto:} is incorrect.

In the Australian Aboriginal language Jingulu (a Mirndi language, not a member of the Pama-Nyungan family), it is claimed that all verbs are V+N compounds, such as "do a sleep" or "run a dive", and that the language has only three basic verbs: do, make, and run.

A special kind of composition is incorporation, of which noun incorporation into a verbal root (as in English <backstabbing>, <breastfeed>, etc.) is most prevalent (see below).


Verb-verb compounds

Verb-verb compounds are sequences of more than one verb acting together to determine clause structure. There are two types:

• In a serial verb, two actions, often sequential, are expressed in a single clause. For example:

Ewe [E·we language] trɔ dzo, lit. "turn leave", means "turn and leave", and

Hindi :
  ¤ जाकर देखो jā-kar dekh-o, lit. "go-CONJUNCTIVE PARTICIPLE see-IMPERATIVE", means "go and see".

Burmese-Myanmar: (UKT: Bur-Myan example based on Hindi - waiting for comments from my peers.)
  ¤ {þwa:kræÑ.}, lit. "go see"
  ¤ {þwa:pri: kræÑ.}, lit. "go and see" (with white space) for emphasis
  - {þwa:pri:kræÑ.}, same as above, but without white space would be lacking in emphasis
Note: Reversing the order of the two verbs gives a different meaning: {kræÑ.þwa:} meaning "go with caution"

In each case, the two verbs together determine the semantics and argument structure.

Serial verb expressions in English may include <What did you go and do that for?> or <He just upped and left>. These are, however, not quite true compounds, since the verbs are connected by a conjunction and the missing arguments of the second verb may be taken as a case of ellipsis.

• In a compound verb (or complex predicate), one of the verbs is the primary [verb], and determines the primary semantics and also the argument structure. The secondary verb, often called a vector verb or explicator, provides fine distinctions, usually in temporality or aspect, and also carries the inflection (tense and/or agreement markers). The main verb usually appears in conjunctive participial (sometimes zero) form. [UKT ¶ ]


Hindi :
  ¤ निकल गया nikal gayā, lit. "exit went", means 'went out',
  ¤ निकल पड़ा nikal paRā, lit. "exit fell", means 'departed' or 'was blurted out'.
In these examples, the primary verb is निकल nikal, and the vector verbs are गया gayā and पड़ा paRā.

English :
  ¤ <start reading>
Burmese :
  ¤ {sa.hpût} , lit. "start read"
Japanese :
  ¤ 読み始める yomihajimeru, lit. "read-CONJUNCTIVE-start", means "start reading".
In the English and Japanese examples, the vector verbs <start> and 始める hajimeru "start" change according to tense, negation, and the like, while the main verbs <reading> and 読み yomi "reading" usually remain the same.

An exception to this is the passive voice, in which both English and Japanese modify the main verb, i.e.
¤ <start to be read> and
¤ 読まれ始める yomarehajimeru, lit. "read-PASSIVE-(CONJUNCTIVE)-start", means "start to be read". [UKT ¶ ]

With a few exceptions all compound verbs alternate with their simple counterparts. That is, removing the vector does not affect grammaticality at all nor the meaning very much: निकला nikalā '(He) went out.' In a few languages both components of the compound verb can be finite forms: Kurukh kecc-ar ker-ar lit. "died-3pl went-3pl" '(They) died.'

• Compound verbs are very common in some languages, such as the northern Indo-Aryan languages Hindi-Urdu and Panjabi, where as many as 20% of verb forms in running text are compound. They exist but are less common in Dravidian languages and in other Indo-Aryan languages like Marathi and Nepali, in Tibeto-Burman languages like Limbu and Newari, in languages sometimes grouped as macro-Altaic, like Turkish, Korean, Japanese, Kazakh, Uzbek, and Kyrgyz, and in northeast Caucasian languages like Tsez and Avar.

UKT: Where does Burmese stand? We should note that Burmese-Myanmar is the member of Tibeto-Burman with the most speakers. My guess (as of 090821) is that whether compounds are formed or not is not really very important in Burmese.

• Under the influence of a Quichua substrate, speakers living in the Ecuadorian altiplano have innovated compound verbs in Spanish:

De rabia puso rompiendo la olla, 'In anger (he/she) smashed the pot.' (Lit. from anger put breaking the pot)

Botaremos matándote 'We will kill you.' (Cf. Quichua huañuchi-shpa shitashun, lit. kill-CP throw.1plFut, तेरे को मार डालेंगे)

• Compound verb equivalents in English (examples from the internet):

What did you go and do that for?
If you are not giving away free information on your web site then a huge proportion of your business is just upping and leaving.
Big Pig, she took and built herself a house out of brush.

• Caution: In descriptions of Persian and other Iranian languages the term 'compound verb' refers to noun-plus-verb compounds, not to the verb-verb compounds discussed here.


Compound adpositions

Compound prepositions formed by prepositions and nouns are common in English and the Romance languages (consider English on top of, Spanish encima de, etc.). Japanese shows the same pattern, except the word order is the opposite (with postpositions): no naka (lit. "of inside", i.e. "on the inside of").

Examples from different languages

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820


Spanish :
Ciencia-ficción 'science fiction': ciencia, 'science', + ficción, 'fiction' (This word is a calque from the English expression science fiction. In English, the head of a compound word is the last morpheme: science fiction. Conversely, the Spanish head is located at the front, so ciencia ficción sounds like a kind of fictional science rather than scientific fiction.)
Ciempiés 'centipede': cien 'hundred', + pies 'feet'
Ferrocarril 'railway': ferro 'iron', + carril 'lane'


Italian :
Millepiedi 'centipede': mille 'thousand', + piedi 'feet'
Ferrovia 'railway': ferro 'iron', + via 'way'
Tergicristallo 'windscreen wiper': tergere 'to wipe', + cristallo 'crystal, (pane of) glass'


German :
Wolkenkratzer 'skyscraper': Wolken 'clouds', + Kratzer 'scraper'
Eisenbahn 'railway': Eisen 'iron', + Bahn 'track'
Kraftfahrzeug 'automobile': Kraft 'power', + fahren/fahr 'drive', + Zeug 'machinery'
Stacheldraht 'barbed wire': Stachel 'barb/barbed', + Draht 'wire'


Finnish :
sanakirja 'dictionary': sana 'word', + kirja 'book'
tietokone 'computer': tieto 'knowledge, data', + kone 'machine'
keskiviikko 'Wednesday': keski 'middle', + viikko 'week'
maailma 'world': maa 'land', + ilma 'air'


Icelandic :
járnbraut 'railway': járn 'iron', + braut 'path' or 'way'
farartæki 'vehicle': farar 'journey', + tæki 'apparatus'
alfræðiorðabók 'encyclopædia': al 'everything', + fræði 'study' or 'knowledge', + orða 'words', + bók 'book'
símtal 'telephone conversation': sím 'telephone', + tal 'dialogue'


Japanese :
• 目覚まし(時計) mezamashi(dokei) 'alarm clock': 目 me 'eye' + 覚まし samashi (-zamashi) 'awakening (someone)' (+ 時計 tokei (-dokei) 'clock')
• お好み焼き okonomiyaki: お好み okonomi 'preference' + 焼き yaki 'cooking'
• 日帰り higaeri 'day trip': 日 hi 'day' + 帰り kaeri (-gaeri) 'returning (home)'
• 国会議事堂 kokkaigijidō 'national diet building': 国会 kokkai 'national diet' + 議事 giji 'proceedings' + 堂 dō 'hall'

Russian language

In the Russian language, compounding is a common type of word formation, and several types of compounds exist, both in terms of the parts of speech compounded and of the way a compound is formed. [1]

Compound nouns may be agglutinative compounds, hyphenated compounds (stol-kniga 'folded table', lit. 'table-book', i.e., "book-like table"), or abbreviated compounds (portmanteaux: kolkhoz). Some compounds look like portmanteaux, while in fact they are agglutinations of the type stem + word: "Akademgorodok" (from "akademichesky gorodok" 'Academic Townlet', i.e., Academic Village). In agglutinative compound nouns, an agglutinating infix is typically used: parokhod 'steamship': par + o + khod. Compound nouns may be created as noun + noun, adjective + noun, noun + adjective (rare), noun + verb (or, rather, noun + verbal noun).

Compound adjectives may be formed either per se, e.g., "belo-rozovy" 'white-pink', or as a result of compounding during the derivation of an adjective from a multiword term: Каменноостровский проспект ([kəmʲɪnnʌʌˈstrovskʲɪj prʌˈspʲɛkt]) 'Stone Island Avenue', a street in St. Petersburg.

Reduplication in the Russian language is also a source of compounds.

Quite a few Russian words are borrowed from other languages in an already compounded form, including numerous "classical compounds": "avtomobil" (automobile).

Germanic languages

In Germanic languages, compound words are formed by placing a descriptive word in front of the main word. A good example is "football": it is a "ball" that has something to do with "foot". Each part may in turn be a compound word, so there is no problem making an arbitrarily long word. This contrasts with the Romance languages, where prepositions are more often used to specify such relationships instead of concatenating the words.

Among the Germanic languages, English is unusual in that compound words are usually written with their parts separated. Although English does not form compound nouns to the extent of Dutch or German, such constructions as "Girl Scout troop", "city council member", and "cellar door" are arguably compound nouns and used as such in speech. Writing them as separate words is merely an orthographic convention, possibly a result of influence from French.

A problem with writing compounds as separate words, as English does, is that the parts may also make sense as independent words, which creates ambiguity. One example is "heavy weight lifter", which means either a "heavy weightlifter" or a "heavyweight-lifter"; these two disambiguated spellings follow the practice of the other Germanic languages. In Norwegian they become "tung vektløfter" and "tungvektsløfter" respectively. Notice that the interfix "s" takes the place of the hyphen, making the distinction clearer in speech. Also, compounds are pronounced continuously as one word in at least German and the North Germanic languages, whereas English pronunciation may simply reflect the way the compound is written.
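
The ambiguity of a spaced spelling can be made concrete by listing every way the adjacent words of a phrase could be joined into compounds. The function name groupings is hypothetical, and the sketch is purely orthographic: it enumerates candidate spellings and does not decide which of them are real words.

```python
from itertools import product
from typing import List


def groupings(words: List[str]) -> List[str]:
    """List every way of joining adjacent words into solid compounds."""
    results = []
    # Each gap between two words is either kept as a space or closed up.
    for joins in product([False, True], repeat=len(words) - 1):
        phrase = [words[0]]
        for word, join in zip(words[1:], joins):
            if join:
                phrase[-1] += word      # close the gap: one compound
            else:
                phrase.append(word)     # keep the space: separate words
        results.append(" ".join(phrase))
    return results


for candidate in groupings(["heavy", "weight", "lifter"]):
    print(candidate)
```

For three words there are two gaps and therefore four candidate spellings; "heavy weightlifter" and "heavyweight lifter" are the two readings discussed in the text.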

Recent trends

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

Although there is no universally agreed-upon guideline regarding the use of compound words in the English language, in recent decades written English has displayed a noticeable trend towards increased use of compounds. Recently, many words have been made by taking syllables of words and compounding them, such as pixel (picture element) and bit (binary digit). This is called a syllabic abbreviation. Moreover, the English way of writing compound words is spreading to other languages: there is a trend in Scandinavian languages towards splitting compound words, known in Norwegian as "orddelingsfeil" (word split error). Because the Norwegian language relies heavily on the distinction between the compound word and the sequence of the separate words it consists of, this can have serious consequences. For example, "røykfritt" (smokefree, meaning no smoking) has been confused with "røyk fritt" (smoke freely).

Compounding by language

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Compound  090820

UKT: The following Wikipedia links are intended for future use.
Classical compounds
English compounds
Sanskrit compounds

End of Wikipedia article http://en.wikipedia.org/wiki/Compound  090820

UKT notes

null morpheme

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Null_morpheme 090911

In morpheme-based morphology, a null morpheme is a morpheme that is realized by a phonologically null affix (an empty string of phonological segments). In simpler terms, a null morpheme is an "invisible" affix. It's also called zero morpheme; the process of adding a null morpheme is called null affixation, null derivation or zero derivation. The concept was first used over two thousand years ago by Pāṇini in his Sanskrit grammar. Some linguists object to the notion of a null morpheme, arguing that it sets up an unverifiable distinction between a "null" or "zero" element, and nothing at all.
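
The idea of an affix realized by an empty string can be shown in a few lines of Python. The names Affix, NULL_VERBALISER and attach are hypothetical and serve only this illustration, which follows the analysis in which conversion (for example telephone, noun, to telephone, verb) is treated as the attachment of a null morpheme.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Affix:
    form: str            # phonological form; "" for a null morpheme
    source_class: str
    target_class: str


# A null morpheme: no phonological content, but it changes the word class.
NULL_VERBALISER = Affix(form="", source_class="noun", target_class="verb")
NESS = Affix(form="ness", source_class="adjective", target_class="noun")


def attach(word: str, word_class: str, affix: Affix) -> Tuple[str, str]:
    """Attach a suffix and return (new form, new word class)."""
    if word_class != affix.source_class:
        raise ValueError(f"affix does not apply to a {word_class}")
    return word + affix.form, affix.target_class


print(attach("telephone", "noun", NULL_VERBALISER))
print(attach("white", "adjective", NESS))
```

In this sketch the null affix behaves like any other affix, which is the point of the analysis; the objection noted in the text is that nothing in the output form distinguishes it from no affix at all.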

UKT: See more in notes in Morphology - morpho.htm

word stem

Excerpt from Wikipedia: http://en.wikipedia.org/wiki/Stem 090818

In linguistics, a stem (sometimes also theme) is a part of a word. The term is used with slightly different meanings.

[UKT: The word stem has two usages, as follows:]

In one usage, a stem is a form to which affixes can be attached. [1] Thus, in this usage, the English word <friendships> contains the stem <friend>, to which the derivational suffix <-ship> is attached to form a new stem <friendship>, to which the inflectional suffix <-s> is attached. In a variant of this usage, the root of the word (in the example, <friend>) is not counted as a stem.

In a slightly different usage, which is adopted in the remainder of this article, a word has a single stem, namely the part of the word that is common to all its inflected variants. [2] Thus, in this usage, all derivational affixes are part of the stem. For example, the stem of <friendships> is <friendship>, to which the inflectional suffix <-s> is attached.
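
The two usages can be compared on the <friendships> example with a deliberately naive Python sketch. The suffix lists and the functions strip_suffix and analyse are assumptions of this illustration; simple suffix stripping of this kind would misanalyse many real words (for example <bus>), so it is not a general stemmer.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical, minimal suffix inventories for the example in the text.
INFLECTIONAL = ("s",)
DERIVATIONAL = ("ship",)


def strip_suffix(word: str, suffixes: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Remove the first matching suffix; return (remainder, suffix or None)."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, None


def analyse(word: str) -> Dict[str, Optional[str]]:
    # The stem (second usage) is what remains once inflection is removed.
    stem, inflection = strip_suffix(word, INFLECTIONAL)
    # The root is what remains once derivational suffixes are removed too.
    root, derivation = strip_suffix(stem, DERIVATIONAL)
    return {
        "root": root,
        "stem": stem,
        "derivational": derivation,
        "inflectional": inflection,
    }


print(analyse("friendships"))
```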

Stems may be roots, e.g. <run>, or they may be morphologically complex, as in compound words (cf. the compound nouns <meat ball> or <bottle opener>) or words with derivational morphemes (cf. the derived verbs <black-en> or <standard-ize>). [UKT ¶ ]

UKT: See more in notes in Words and their meanings in human languages - word.htm


