This post will go over some name generation algorithms. As humans beings we can easily make up names - watch:
Bob, Orthur, Sonstig, Rachon
How can we make the computer do something similar?
1. A Really Big List
Open notepad, start typing names. At the very least we'll want two lists one for female names and one for male. (You may also be able to
grab lists off the web. Baby naming
sites seem to be abundant and full of oddness that would go well in most games York Hawkright)
When you create a NPC you check gender, choose the correct list then randomly choose an entry.
Advantages
- Names will always be sensible (no "wwprojh" or offensive words)
- No CPU cycles are spent on generation algorithms.
Disadvantages
- Takes a lot of time and effort.
- The more stratification* you want the more effort is required
- The list is finite - as some point there will be repitition
- Altering name distribution is tricky. (some names are slightly more common in region X for instance)
*
stratification: Perhaps you'd like names associated by birth place.
Indian people tend to have different names than
Chinese people. You might also want to swap names by social standing, beggars have different names than lords. Occupational names may also want to seperated -pirates and ninjas. Knights and monks. Bad guys and good guys. Demons and Angels.
The big list approach has enough drawbacks for us to want something better. Not to say big list is redundant. It made be used with any other name generation technique.
2. Markov Chains
By using a markov chaining algorithm we recieve a data structure that can produce many different names. A markov chain is a record of patterns in a sequence of information. Here our sequence of information will be letters but it could be any sequence - dungeon tiles, musical notes etc.
A markov algorithm has two important parameters, the first is the data sequence. The second is it's "pattern memory" or "step size". This is best explained with an example.
Example 1. Stepsize = 1
The data sequence will be the following three words.
bag
bat
ate
Each word we'll feed into the algorithm ... and this is what will happen:
First Rule: b -> a b|ag
Letter b's can be followed by letter a's. Is what this says. This rule is recorded.
Second Rule: a -> g ba|g
Letter a's can be followed by letter g's. The final rule that doesn't need recording is letter g's are followed by nothing. This is default behaviour for all letters without rules. That leaves us with two production rules.
b -> a
a -> g
From these rules we can make
new words. This is done by randomly choosing a starting rule. For instance we might choose rule two. We put down "a".
a
According to the production rule we know after a we can put a "g". This is our only choice so we put down g.
ag
G has no production rule so we should stop. This leaves us with the rather unsatisfactory word "ag".
What to do when we've
finished but the words to short? Well we can pick another rule at random - the warning here is this greatly increases your chances or getting some rubbish words. So we have ag, let's say we but down "a" or "b" then follow the production rule. We might get
agag or agbag
So even at this early stage we have things that look like words!
Now to feed in the last two words. This leaves us with the following rules.
b -> a
b -> a
a -> g
a-> t
a -> t
t -> e
So some of the rules are duplicate, that means their more likely to get picked. (Programming this you'd probably give each rule a pick count rather than have multiple duplicate rules in your array. This also allows you to toggle whether or not you want probablities to effect picking.)
Using these rules we can produce things like: batag, bate, bagbat ...
Example 3. Stepsize = 2
So what's step size 2? Well the production rules produced will now look like the following:
ba -> g_
ba -> t_
at -> e_.
As you can see setsize two really suits words greater than three letters. How about "rope", "road" and "peon". The following rules would be produced.
ro -> ad
ro -> pe
pe -> on
From this we can get: ropeon. The more words fed in the greater the possible outputs. Let's add "adventure".
ro -> ad
ro -> pe
pe -> on
ad -> ve
ve -> tu
tu -> re
In this fasion a large database of rules can be built. You are best feeding in name-words rather than gardening terms, for instance.
Advantages
- Easy to churn out new words from a name list.
- The intensive phase of building the data structure is done once and need only be done once.
Disadvantages
- It will tend to produce some obviously non-name words
- Especially if it's comes to a end (g-> nothing) and then you force it to pick a random production and carry on.
- It has to carefully set up by feeding in a "good" data set.
- It's not easy to demand words of a certain size
Markov chains are a step in the right direction. Names will often be good enough. But can we do better?
Extension to the Markov Chain Algorithm
We force words to meet certain moulds. This ensures we always have words of a set size and logical composition. Each word we enter into the database we save the order of it's constant vowels. e.g.
nathiel -> CVCCVVC
derkor -> CVCCVC
Then we lock all generated names into the above patterns. This reduces the amount of non-name words.
3. Handbuilt Grammars
A grammar is a lot like a collection of the production rules we generated above but this time we type them out by hand! Seems a bit like a step-backwards but we'll get much higher quality names this way. First here's the bat, bag, ate grammar version of a the markov chain.
[a] -> a[t] | a[g]
[b] -> b[a]
[g] -> g
[t] -> t
The square brackets mean that this an element that needs to be reduced. The | (pipe) means "or". Seems to be pretty similar but we no longer have the probabilty stuff (this of course can be programmed in). Well hand writing the grammar give us a much finer degree of control. Let's imagine a grammar for last names.
[name] -> [C][V][C][ending]
[C] -> b | c | d | f | ... // A selection of the constants
[V] -> a | e | i | o | u
[ending] -> ith | on | ton | field | man
So we reduce [name] and from this are given some more symbols to reduce. Once we reduce everything we're left with a name. If we're given two or more "or" options in a production we pick one a random. This name genator will produce names such as:
Darman, Homon, Turith ...
These aren't bad names! So we'd probably want to create at the least a male and female grammar. We'd get a standard somewhat related set of names.
The fun doesn't stop with names for players of course - we can name stars, swords, monsters, cities, boats, ports, districts, houses, steeds, gods ...
If we write a grammar interpretter that will take in any type of grammar - then we can generate the actually grammar content randomly! :o
Advantages
- Great degree of control and random generation combined
- Names sounds natural and relatedness between names can be encouraged
Disadvantages
- Handbuilt grammars are ... Yes! Created by hand. Therefore there's sometime investment
- You may end building quite a few if you want names for different places, and monsters names... blah blah blah
And here is some extremely cheap lua code
--[[
Cheap Grammar
A attempt to make a grammar with out the hard parsing programming.
--]]
require("lualib.txt");
math.randomseed(os.time());
function C()
return Pick({"b","c","d","f","g","h","j","k","l","m","n","p","r","s","t","v","w","x","y","z"})
end
function V()
return Pick({"a","i","e","o","u"});
end
function Ending()
return Pick({"ith", "ton", "on", "field", "man"})
end
math.randomseed(os.time());
print(C() .. V() .. C() .. Ending());
Rogfield, Ravon ... Geton, Xanith, Kolman.
(Why does my random pick function alway pick R first? Something to look into)
lualib.txt is my library of useful lua functions. One of these functions is Pick. I'm considering writing something on this library but for now
Pick is left as an excerise for the reader.
4. The End
There are three ways to add a variety of names to your game. You can chose to use one or use them all in concert. Have fun.
Some resources
I seem to have removed a load of my links - I'll come back to this and try and fill it up with a few useful bits