Browsed by
Month: June 2019

Building a Lexicon

Currently in my constructed-language work for The Curse of Steel, I’m selecting word roots from my script-generated list of all the legal possibilities.

I’m not being particularly systematic here. I started with the roots for several names I had already settled on during early development, and with an earlier word-list that I built before I started getting my computer to help out with all this. (Along the way, I discovered that I had broken some of my own rules about legal word-root formation. Time to make minor tweaks to the word-lists!)

With that finished, I’ve been grabbing words from a variety of sources: color terms, the numerals from one to ten, and so on. I’ve even pulled down my copy of the Silmarillion and started paging through the appendices for ideas – that’s kind of a ready-made list of vocabulary prompts for any naming language! Not that I’m slavishly imitating any one source, but if my final lexicon ends up sounding vaguely Indo-European and vaguely like Sindarin, I suppose I can be accused of stealing from the best.

So far I’ve got about 80 word-roots. The list follows, taken straight from my growing spreadsheet. A couple of notes first.

You’ll notice the word roots incorporate some numerals and special characters. Each of those stands for a phoneme that would normally be written with more than one character. That way, when I pull the roots over to be processed by another Perl script, I won’t have to fuss too much with parsing those out. If you know anything about PIE phonology, you’ll probably recognize that I’m using a similar set of three “laryngeal” consonants, which will disappear from daughter languages but give rise to a variety of vowel colorations. Other special characters represent aspirated or labialized consonants (e.g., representing the differences among phonemes we might pronounce as g-, gh-, or gw-).
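As a small illustration of that encoding, here is a Python sketch (not the Perl the scripts are written in). It assumes that the digits 1, 2, and 3 stand for the laryngeals *h1, *h2, and *h3, which matches the particle written re2n in the list and *reh2n in the grammar notes. The post does not say which phonemes the other special characters stand for, so the sketch leaves them alone. The name `decode_root` is made up for the example.

```python
# Assumption: digits 1, 2, 3 encode the laryngeals *h1, *h2, *h3.
# The other special characters (@, $, %, #, !) are passed through
# unchanged because their phoneme values are not given in the post.
LARYNGEAL_CODES = {"1": "h1", "2": "h2", "3": "h3"}


def decode_root(encoded: str) -> str:
    """Expand single-character laryngeal codes into starred notation."""
    return "*" + "".join(LARYNGEAL_CODES.get(ch, ch) for ch in encoded)


print(decode_root("re2n"))  # *reh2n
print(decode_root("we2"))   # *weh2
```

Keeping one character per phoneme in the working list means a script can walk a root character by character, and only needs a lookup like this when it is time to display the result.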

Meanwhile, every word root has a “weight” attached. This is something I built into the script to generate the word roots, to enforce some assumptions about which phonemes are most common.

| Ur-Language Root | Weight | Part of Speech | Meaning | Notes |
|---|---|---|---|---|
| re2n | 567 | Adverb | Particle for future aspect of verbs | |
| we2 | 489 | Adverb | Particle to indicate negation of verbs | |
| 2sper | 352 | Adverb | “away” | |
| te2n | 440 | Conjunction | “and” | |
| rey | 540 | Noun | “chieftain, noble, king” | |
| d2en | 440 | Noun | “man,” also numeral “ten” | |
| we@ | 420 | Noun | “water” | |
| ke2m | 392 | Noun | “hand,” also numeral “five” | |
| kest | 392 | Noun | “head” | |
| @e2n | 378 | Noun | “tree” | |
| 2eng | 378 | Noun | “iron” | Probably borrowed from another language group |
| %en | 360 | Noun | “girl, woman” | |
| 1kwes | 313 | Noun | “lake, pond, pool” | |
| me2r@ | 302 | Noun | “fate, doom” | |
| $2er | 252 | Noun | “home, dwelling” | |
| ke3lm | 196 | Noun | “hill, knoll, rock” | |
| ye1 | 480 | Numeral | “one” | |
| kens1 | 403 | Numeral | “seven” | |
| tre1s | 403 | Numeral | “three” | |
| 2tes | 392 | Numeral | “two” | |
| semt1 | 358 | Numeral | “six” | |
| we2rs | 352 | Numeral | “four” | |
| let3 | 244 | Numeral | “eight” | |
| pen@3 | 189 | Numeral | “nine” | |
| weyt | N/A | Verb | “to know, to see (visions)” | Not a legal ur-language root, probably borrowed from another language group |
| 1es | 640 | Verb | “to be” (indicating a state of being) | |
| ken | 630 | Verb | “to think, to engage in spiritual activity” | |
| ret | 630 | Verb | “to guard, to protect” | |
| wer | 630 | Verb | “to die” | |
| ne2r | 567 | Verb | “to be glorious, to be brilliant” | |
| tren | 567 | Verb | “to be stiff, to be taut, to be mighty” | |
| mew | 560 | Verb | “to partition” | |
| re@ | 540 | Verb | “to hit, to strike” | |
| kres | 504 | Verb | “to mix up, to confuse” | |
| me2r | 504 | Verb | “to crowd, to form a crowd” | |
| kel | 489 | Verb | “to be cold, to be chilly” | |
| nek2 | 441 | Verb | “to strip away, to expose” | |
| pret | 441 | Verb | “to exchange” | |
| terk | 441 | Verb | “to break” | |
| t2er | 440 | Verb | “to crash, to smite” | |
| dren2 | 396 | Verb | “to lengthen, to be long” | |
| gre1n | 388 | Verb | “to sanctify, to make a treaty” | |
| 1@em | 384 | Verb | “to stand” | |
| $er | 360 | Verb | “to turn” | |
| me3r | 360 | Verb | “to be large, to be great” | |
| kre2s | 352 | Verb | “to be black” | |
| ke3 | 350 | Verb | “to bend” | |
| 2lew | 342 | Verb | “to flow (like water)” | |
| kelt | 342 | Verb | “to hammer, to work with metal” | |
| welk | 342 | Verb | “to tear” | |
| teym | 336 | Verb | “to encircle, to finish (a circle)” | |
| de3n | 315 | Verb | “to give, to receive a gift, to be guest-friends” | |
| dre3 | 315 | Verb | “to have sacred power” | |
| ke3r | 315 | Verb | “to run” | |
| kre2w | 308 | Verb | “to make a harsh sound, to croak” | |
| sen2@ | 302 | Verb | “to be old, to be ancient” | |
| 2el@ | 293 | Verb | “to be white” | |
| 2ewg | 293 | Verb | “to hear” | |
| te$ | 280 | Verb | “to be wild, to be free” | |
| ske2t | 274 | Verb | “to hate” | |
| te2lm | 274 | Verb | “to spread” | |
| #e2n | 252 | Verb | “to go, to walk” | |
| ke3rs | 252 | Verb | “to stand tall, to tower” | |
| wer# | 252 | Verb | “to threaten” | |
| de3w | 244 | Verb | “to be dark (in color)” | |
| k3el | 244 | Verb | “to be whole, to be unmarred” | |
| kwe3 | 244 | Verb | “to be loyal” | |
| le3k | 244 | Verb | “to burn, to set aflame” | |
| we3k | 244 | Verb | “to speak, to call” | |
| kle2w | 240 | Verb | “to cut, to slice” | |
| g2els | 235 | Verb | “to be green” | |
| ske2@ | 235 | Verb | “to darken” | |
| de1# | 224 | Verb | “to take” | |
| @er# | 216 | Verb | “to bite” | |
| 1rew# | 201 | Verb | “to be red” | |
| te2$ | 196 | Verb | “to hurt, to harm” | |
| 3re$ | 180 | Verb | “to straighten, to direct” | |
| $2ey | 168 | Verb | “to be blue” | |
| $eyt | 168 | Verb | “to be white” | |
| de! | 140 | Verb | “to divide” | |

I think I’ll probably generate a few dozen more roots, then copy them into a separate spreadsheet where I’ll build actual words. Most of the roots will make perfectly good words without modification, but I’ll also apply some of the word morphology rules I’ve worked out to derive more words. I imagine I’ll have as many as 200-250 words by the time I’m done, enough to form the basis for a decent naming language. Then it will be time to build Perl scripts to apply the sound-change rules.

Once that’s done – no doubt with a certain amount of tweaking to suit my aesthetic tastes – I’ll have a system by which I can quickly create and record new words as I write the story. In three different, but clearly related, languages!

Lots of work up front, to save a lot of work and frustration later. That’s what computers are for, right?

Rough Draft for an Ur-Language

Here are some of the basic notes I’ve put together for my constructed-language work for The Curse of Steel. The idea here is that this is an ur-language, very vaguely reminiscent of Proto-Indo-European, which can act as the mother-tongue for a set of derived languages. Since these aren’t planned to be anything but a set of naming languages, I haven’t worked out a lot of deep grammar or sentence structure – the emphasis here is on word morphology, the rules for the formation of nouns, verbs, adjectives, and so on.

This is all very rough draft, of course, and I’m deliberately not trying to be very adventurous – none of this is supposed to suggest a highly exotic sound or feel to English-speaking readers. Still, it should give you an idea of what goes into the construction of an artificial language for genre fiction. I may post some of my growing lexica shortly, to provide more examples.

Phonology

The ur-language has the following consonant set:

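| | Labial | Coronal | Dorsal | Laryngeal |
|---|---|---|---|---|
| Nasals | *m | *n | | |
| Stops | *p | *t | *k, *kw | |
| | *b | *d | *g, *gw | |
| | *bh | *dh | *gh, *gwh | |
| Fricatives | | *s | | *h1, *h2, *h3 |
| Liquids | | *r, *l | | |
| Semivowels | *w | | *y | |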

Word roots in the ur-language can be either nouns or verbs. Most adjectives or adverbs are constructed by inflection of an underlying stative verb (that is, a verb form which expresses a state of being). A word root in the ur-language almost invariably has the following phonotactic structure:

  • The root is always composed of at least one consonant in the onset, the vowel *e, and at least one consonant in the coda. No root may begin or end with the vowel.
  • In a consonant cluster, the consonants are always arranged in order of sonority. Consonants appear in three classes by sonority (lower to higher sonority):
    • Obstruents, which include:
      • Plosives (*p, *b, *bh, *t, *d, *dh, *k, *g, *gh, *kw, *gw, or *gwh)
      • Sibilants (*s)
      • Laryngeals (*h1, *h2, or *h3)
    • Labial sonorants (*m or *w)
    • Non-labial sonorants (*n, *r, *l, or *y)
  • A consonant cluster may consist of up to one non-labial sonorant, up to one labial sonorant, and up to one obstruent of each type (plosive, sibilant, and laryngeal).
  • In a cluster of obstruents, the sibilant *s may only appear before a plosive, never after. A laryngeal may appear before or after any other obstruent, but not another laryngeal.
  • In the onset (before the vowel), consonants must appear in increasing sonority, while in the coda (after the vowel) they must appear in decreasing sonority. The one exception is that in the coda, a laryngeal may always appear first.
  • Legal word roots normally follow certain phonotactic rules:
    • They may not contain more than one nasal consonant (*m or *n)
    • They may not contain more than one liquid (*l or *r)
    • They may not contain more than one semivowel (*w or *y)
    • They may not contain more than one plain voiced plosive (*b, *d, *g, or *gw)
    • They may not contain more than one laryngeal fricative (*h1, *h2, or *h3)
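
The rules above are mechanical enough to check in code. What follows is a minimal Python sketch of one possible reading of them, not the actual root-generating script. It takes the onset and coda as lists of phonemes, so it sidesteps the question of how to split a string like "kw". It also reads the *s rule as "if a cluster has both *s and a plosive, *s comes first". All function and variable names are invented for the example.

```python
from collections import Counter

CLASSES = {
    "plosive": {"p", "b", "bh", "t", "d", "dh", "k", "g", "gh", "kw", "gw", "gwh"},
    "sibilant": {"s"},
    "laryngeal": {"h1", "h2", "h3"},
    "labial": {"m", "w"},
    "nonlabial": {"n", "r", "l", "y"},
}
# All obstruents share the lowest sonority rank.
SONORITY = {"plosive": 0, "sibilant": 0, "laryngeal": 0, "labial": 1, "nonlabial": 2}
# Root-wide limits: at most one phoneme from each group.
ROOT_LIMITS = [
    {"m", "n"},             # nasals
    {"l", "r"},             # liquids
    {"w", "y"},             # semivowels
    {"b", "d", "g", "gw"},  # plain voiced plosives
    {"h1", "h2", "h3"},     # laryngeals
]


def phoneme_class(phoneme):
    for name, members in CLASSES.items():
        if phoneme in members:
            return name
    raise ValueError(f"unknown phoneme: {phoneme}")


def cluster_ok(cluster, is_coda):
    if not cluster:
        return False  # onset and coda each need at least one consonant
    classes = [phoneme_class(p) for p in cluster]
    if max(Counter(classes).values()) > 1:
        return False  # at most one consonant of each type per cluster
    if "sibilant" in classes and "plosive" in classes:
        if classes.index("sibilant") > classes.index("plosive"):
            return False  # *s may not follow a plosive
    ranks = [SONORITY[c] for c in classes]
    if is_coda:
        if classes[0] == "laryngeal":
            ranks = ranks[1:]  # a laryngeal may always open the coda
        ranks = ranks[::-1]    # falling sonority, checked in reverse
    return ranks == sorted(ranks)


def is_legal_root(onset, coda):
    """Check a root of the shape onset + *e + coda."""
    if not (cluster_ok(onset, False) and cluster_ok(coda, True)):
        return False
    phonemes = list(onset) + list(coda)
    return all(sum(p in group for p in phonemes) <= 1 for group in ROOT_LIMITS)


print(is_legal_root(["k"], ["s", "t"]))  # True  (*kest)
print(is_legal_root(["w"], ["y", "t"]))  # False (*weyt has two semivowels)
```

Under this reading, a root such as *keh3lm passes because the laryngeal opens the coda, while *weyt fails the one-semivowel limit.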

Word Formation Rules

Verbs

The primary categories for verbs include:

  • Person: 1st, 2nd, and 3rd.
  • Number: Singular, dual, and plural.
  • Aspect: Perfective, imperfective, and stative.
  • Mood: Indicative, subjunctive, imperative, and optative.
  • Tense: Present and past.

The primary conjugations are for person, number, and aspect. They tend to be very regular, applying inflectional endings to the verb root as follows.

Primary Conjugation

This conjugation is used for the present tense of the indicative mood of imperfective verbs, and for the subjunctive mood of all verbs.

  Singular Dual Plural
1st Person *-mi *-weh3s *-mos
2nd Person *-si *-tes *-te
3rd Person *-ti *-teh2s *-nti

The future tense is indicated with this conjugation plus the particle *reh2n, placed just before the verb.

Secondary Conjugation

This conjugation is used for the past tense of the indicative mood of imperfective verbs, for the indicative mood of perfective verbs, and for the optative mood of all verbs.

  Singular Dual Plural
1st Person *-m *-we *-me
2nd Person *-s *-te *-t
3rd Person *-t *-teh2 *-nt

Furthermore, imperfective verbs in the past tense exhibit ablaut, in which the primary vowel of the verb root shifts from *e to *o.
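
To make the two tables concrete, here is a short Python sketch that builds indicative forms of an imperfective verb from a root. The endings are copied from the primary and secondary tables above. The function names, the hyphen at the morpheme boundary, and the choice to handle only the indicative mood are assumptions made for the illustration.

```python
NUMBERS = ("singular", "dual", "plural")

# Endings by person, in singular / dual / plural order.
PRIMARY = {
    1: ("mi", "weh3s", "mos"),
    2: ("si", "tes", "te"),
    3: ("ti", "teh2s", "nti"),
}
SECONDARY = {
    1: ("m", "we", "me"),
    2: ("s", "te", "t"),
    3: ("t", "teh2", "nt"),
}


def ablaut(root):
    """Shift the root vowel from e to o (a root has exactly one e)."""
    return root.replace("e", "o", 1)


def conjugate(root, person, number, tense="present"):
    """Indicative form of an imperfective verb root."""
    column = NUMBERS.index(number)
    if tense == "present":
        return f"*{root}-{PRIMARY[person][column]}"
    if tense == "future":
        # Primary endings, with the particle *reh2n just before the verb.
        return f"*reh2n *{root}-{PRIMARY[person][column]}"
    if tense == "past":
        # Secondary endings, plus e -> o ablaut in the root.
        return f"*{ablaut(root)}-{SECONDARY[person][column]}"
    raise ValueError(f"unknown tense: {tense}")


print(conjugate("ken", 1, "singular"))        # *ken-mi
print(conjugate("ken", 3, "plural", "past"))  # *kon-nt
```

The hyphen is only there to show where the ending attaches; the notes above do not say whether any adjustment happens at that boundary.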

Stative Conjugation

This conjugation is used for stative verbs.

  Singular Dual Plural
1st Person *-h2e *-we *-meh2
2nd Person *-th2e *-h2ey *-eh2
3rd Person *-e *-h2ey *-eh1r

Imperative Mood

This conjugation (applicable only in the second or third person) is used for the imperative mood of all verbs.

  Singular Dual Plural
1st Person N/A N/A N/A
2nd Person *-Ø *-to *-te
3rd Person *-tu *-tew *-ntu

Other Verb Formation Notes

Negation is indicated with the particle *weh2, placed immediately after the main verb.

Nouns

The primary categories for nouns are:

  • Class: Animate and inanimate.
  • Number: Singular and plural. Although verbs can take the specific dual number, dual nouns are simply considered plural.
  • Case:
    • Absolutive case (the argument of an intransitive verb or the object of a transitive verb)
    • Ergative case (the subject or “agent” of a transitive verb)
    • Dative case (the indirect object of a verb, the recipient or beneficiary of an action)
    • Genitive case (the possessor, composition, or point of reference for another noun)
    • Locative case (expressing the location of another noun or a verb’s action)
    • Ablative case (expressing motion or action away from another noun)
    • Instrumental case (expressing the means of an action)
    • Vocative case (marking the noun being addressed)

Noun class is not marked on the noun, but all nouns are assigned to either the animate or inanimate classes. The assignment is usually intuitive, although there are some exceptions. Examples include non-living but moving objects which might be considered the habitation place of a spirit, or non-living objects which are nevertheless often addressed as if they possess the power of speech.

Case and number markings are as follows:

| | Singular Animate | Singular Inanimate | Plural Animate | Plural Inanimate |
|---|---|---|---|---|
| Absolutive | *-Ø | *-s | *-eh1 | *-eh1 |
| Ergative | *-m | *-m | *-meh1 | *-meh1 |
| Dative | *-meh2 | *-meh2 | *-mus | *-mus |
| Genitive | *-kh2e | *-kh3e | *-kh2ey | *-kh2ey |
| Locative | *-ey | *-ey | *-su | *-su |
| Ablative | *-os | *-os | *-yos | *-yos |
| Instrumental | *-an | *-an | *-eh2 | *-eh2 |
| Vocative | *-Ø | *-Ø | *-es | *-h2 |

Nouns in the ergative case also exhibit ablaut, in which the primary vowel of the nominal root shifts from *e to *o.
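
A declension can be generated the same way. The Python sketch below copies the endings from the table above and applies the ergative ablaut. The function name and the hyphenated output are invented for the example, and treating *kest (“head”) as inanimate is only an assumption for the demonstration.

```python
# Endings per case, in this column order:
# (singular animate, singular inanimate, plural animate, plural inanimate)
ENDINGS = {
    "absolutive":   ("", "s", "eh1", "eh1"),
    "ergative":     ("m", "m", "meh1", "meh1"),
    "dative":       ("meh2", "meh2", "mus", "mus"),
    "genitive":     ("kh2e", "kh3e", "kh2ey", "kh2ey"),
    "locative":     ("ey", "ey", "su", "su"),
    "ablative":     ("os", "os", "yos", "yos"),
    "instrumental": ("an", "an", "eh2", "eh2"),
    "vocative":     ("", "", "es", "h2"),
}


def decline(root, case, number="singular", animate=True):
    """Return one case form of a noun root in starred notation."""
    if number not in ("singular", "plural"):
        raise ValueError("nouns are only singular or plural")
    column = (0 if number == "singular" else 2) + (0 if animate else 1)
    ending = ENDINGS[case][column]
    # Ergative forms shift the root vowel from e to o.
    stem = root.replace("e", "o", 1) if case == "ergative" else root
    # A zero ending leaves the bare stem.
    return f"*{stem}-{ending}" if ending else f"*{stem}"


print(decline("rey", "absolutive"))                  # *rey
print(decline("rey", "ergative"))                    # *roy-m
print(decline("kest", "absolutive", animate=False))  # *kest-s
```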

Noun Formation from Verb Roots

Many nouns in the ur-language are formed from verb roots, usually by applying a specific suffix to the root. For example:

  • Animate creature or human that performs X: *X-as
  • Inanimate object or thing that performs X: *X-os
  • Gerund form (“X-ing”): *X-en
  • Infinitive form (“to X”): *X-on
  • The result of X: *X-am or *X-as

Common Prefixes

  • *an- “into”
  • *as- “out, out from”
  • *en- “on, upon”
  • *reh3- “good, noble”
  • *tar- “against”
  • *wer- “over”

A Bit of Conlanging

I’ve set aside plot-work for The Curse of Steel for the moment, so I can once and for all get the constructed-language work for that story knocked out. The idea is that my protagonist is going to encounter not only her own culture but several others as well, most of them somewhat related to her own in linguistic terms. Kind of like an Iron-Age Celt visiting Latin-speaking or Greek-speaking areas; the languages wouldn’t be intelligible to her, but names and some bits of vocabulary would sound hauntingly familiar. I’m also aiming for the reader to feel comfortable with the names they find in the story, which suggests not wandering too far from the Indo-European tree.

The procedure I’m following is to develop a partial constructed language that’s somewhat reminiscent of Proto-Indo-European (PIE), and then to apply a consistent set of sound-change laws and grammatical changes to generate words in two or three daughter languages.

Not at all difficult, especially once I’ve developed some computer tools to automate the process, but it is kind of detail-driven and time-consuming. The biggest potential pitfall is trying to imitate PIE too closely. One thing we do know about the reconstructed PIE language is that it made Classical Greek or Sanskrit look simple in comparison. My constructed languages for this project are going to be a lot less fiddly and complex. They’re just going to be naming languages, for the most part, so I don’t need a bunch of linguistic complication; I just need to be able to hint at it in a plausible manner.

So far I’ve developed a tool (a Perl script of about 120 lines) to generate all the “legal” word roots in the ur-language (about 150,000 of them, more than I’ll ever use). I’ve dumped all of those into an Excel spreadsheet which now serves as my master list for future lexicon-building.
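
To give a feel for what a generator like that does, here is a much-reduced sketch in Python (not the Perl script itself). It enumerates only the simplest roots, with a single consonant on each side of *e, and applies only the root-wide "no more than one" limits from the ur-language notes. The function names and the two-column CSV layout are invented for the example.

```python
import csv
import io
from itertools import product

CONSONANTS = [
    "p", "b", "bh", "t", "d", "dh", "k", "g", "gh", "kw", "gw", "gwh",
    "s", "h1", "h2", "h3", "m", "w", "n", "r", "l", "y",
]
# A root may hold at most one consonant from each of these groups.
EXCLUSIVE_GROUPS = [
    {"m", "n"},             # nasals
    {"l", "r"},             # liquids
    {"w", "y"},             # semivowels
    {"b", "d", "g", "gw"},  # plain voiced plosives
    {"h1", "h2", "h3"},     # laryngeals
]


def simple_roots():
    """Yield every one-consonant + e + one-consonant root within the limits."""
    for onset, coda in product(CONSONANTS, repeat=2):
        if any(onset in g and coda in g for g in EXCLUSIVE_GROUPS):
            continue
        yield f"{onset}e{coda}"


def roots_as_csv():
    """Render the roots as CSV text with an empty 'meaning' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["root", "meaning"])
    for root in simple_roots():
        writer.writerow([root, ""])
    return buffer.getvalue()


roots = list(simple_roots())
print(roots[:5])
```

Writing CSV rather than a spreadsheet file keeps the sketch dependency-free; Excel can open the result directly. Roots with longer onsets and codas would also need the sonority and cluster rules, which this sketch leaves out.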

I’m currently working on a partial description of the ur-language, with special attention to morphology: just how do you form verbs or nouns from word roots, how do the verbs conjugate, how do the nouns decline, and so on. As soon as that’s more or less finished, I’ll be building another Perl script (or maybe two) to automatically generate verb conjugations or noun declensions as needed.

The last step will involve developing two or three sets of sound-change laws, so I can take completed words in the ur-language and create daughter-language words from them. Another Perl script for that, I think.

Once all this work is done, my constructed-language workflow will get a lot simpler, and hopefully more consistent. Do I need a name or a bit of exotic vocabulary? Build a word in the ur-language, by selecting a legal root from the list and applying the defined morphology rules. Then run that through the sound-change script to generate final lexicon entries. Everything goes into a set of Excel spreadsheets, so I can sort and massage the results as needed. If I ever feel like tweaking the structure a bit, it becomes easy to modify the scripts and re-run everything.
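
The sound-change step is essentially an ordered list of find-and-replace rules. The Python sketch below shows that shape. The three rules in it are invented purely for illustration, loosely modeled on the PIE-style idea of laryngeals coloring a neighboring vowel and then disappearing; they are not the sound-change laws of any of the planned daughter languages.

```python
import re

# Hypothetical rules for an imaginary daughter language, applied in order.
# Each entry is (regular expression, replacement).
HYPOTHETICAL_RULES = [
    (r"eh2|h2e", "a"),  # *h2 colors an adjacent *e to a, then drops
    (r"eh3|h3e", "o"),  # *h3 colors an adjacent *e to o, then drops
    (r"h[123]", ""),    # any remaining laryngeal is simply lost
]


def apply_sound_changes(word, rules):
    """Run a word through each rule in sequence and return the result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


for ur_word in ("reh2n", "deh3n", "h1es"):
    print(ur_word, "->", apply_sound_changes(ur_word, HYPOTHETICAL_RULES))
# reh2n -> ran
# deh3n -> don
# h1es -> es
```

Order matters here: if the "lose all laryngeals" rule ran first, the coloring rules would never fire. Keeping each daughter language's rules in its own list also makes it easy to tweak one rule and re-run the whole lexicon.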

I find when working on world-building, having the technical knowledge necessary to produce plausible detail isn’t the most important thing. That part is relatively easy. The hard part is scoping the task so that you can plan it out from start to finish, and then having the discipline to finish the task and move on. World-building is a neat hobby. If your goal is to actually write stories, you can’t permit yourself to get buried in the world-building. As I’ve learned to my cost.

“Fermi’s Nightmare” Article Now Available on Sharrukin’s Worlds

One of the few blog entries I’ve ever written that I thought was worth preserving was titled “Fermi’s Nightmare.” This was a brief examination of a corollary to a well-known observation made by Enrico Fermi back in the 1950s. For the last few years, that’s been hosted over at the Sharrukin’s Archive site. As of today, I’ve moved it into a static page on this blog. It should be visible in the Pages sidebar on the right.

At this point, the only thing still sitting at Sharrukin’s Archive that isn’t available anywhere else is some draft material for the Human Destiny setting. Fairly soon, I may either move that content over here, or simply decide to take it offline until I do some redesign of the setting. To be honest, there are things about the current concept that have me seriously blocked – I’ve been struggling for a couple of years to produce more stories for it than the one I’ve published.

Either way, expect the Sharrukin’s Archive site to come down entirely as soon as I’ve figured out what to do with the remaining material.

Status Report (31 May 2019)

My main project at present continues to be the development of plot for the novel The Curse of Steel. That’s moving along at a reasonable pace. Off to the side of that task, though, I recently found myself struggling with a different obstacle.

The Curse of Steel is going to be a bit of pseudo-historical fantasy, set in an alternate Iron Age world rather like Tolkien’s Middle-earth or Robert E. Howard’s Hyborian Age. Part of the project has involved the development of a small set of partial “constructed languages,” mostly for the derivation of names and a few scraps of vocabulary to act as cultural markers.

The process I’ve been using has been to develop an ur-language that somewhat resembles a simplified version of Proto-Indo-European (PIE). I then apply sets of sound-change rules to develop words in my planned collection of daughter languages. The result should be consistent and pleasing to the ear, even if it doesn’t work as a complete conversational language. All of this is fairly routine.

The problem has been that I’ve been doing all of this by hand, and the project has gotten large enough that I can’t keep it all straight in my head. I can’t always remember which word-roots I’ve already used, and the documents I’m using to record them aren’t exactly user-friendly. Meanwhile, whenever I tweak the rules for word formation or sound-changes, I find I’m not applying the tweaks consistently. I’m getting snarled up.

Okay, I realized a while ago, a lot of this would really be better done by a computer. Computers are great at tedious tasks that involve applying procedures consistently across a lot of data. Couldn’t I find a tool that I could use to keep track of my word-roots, record my expanding vocabulary, apply inflectional rules and sound changes, all of that?

So I went looking for software tools that other people had used for language construction. Unfortunately, I didn’t find anything I thought would be useful . . . but earlier this week I realized I had all the tools I needed to build my own.

Years ago I was a professional coder. I did most of my work in the C language and a UNIX environment, but I also taught myself a language called Perl, which is ideal for processing text strings and applying well-defined procedures to them. Couldn’t I build Perl scripts to generate all the possible word roots, apply inflectional rules to them, and develop daughter-language vocabulary?

Okay, it’s probably been almost twenty years since I wrote a lot of Perl, but I still have my books, and coding is a little like riding a bicycle. You never entirely forget the skill once you have it. Meanwhile, there exists a nice free implementation of Perl for the Windows environment (Strawberry Perl). So over the last few days, I’ve been starting to build a Perl library that I can use to manage the language construction task – at least well enough to get past the immediate obstacle.

Early results are promising. I’ve just about got a script written and debugged, which will generate all the word roots in my ur-language, according to the PIE-like structure I’ve designed. Once that’s finished and I pull the output, I’ll dump that into an Excel spreadsheet where I can record the meanings I select for different roots. Then I should be able to put together another script that will apply the sound-change rules I’ve designed.

I may show off some of the results of this work over the next few days. Once this side project is done, the conlang process shouldn’t get in the way so badly. If I need a name or a piece of vocabulary, I will be able to generate it quickly, record it, and get right back to writing the story.

Not to mention, it’s kind of neat to be writing code again. It’s been a while.