Algorithm to generate random names

I recently wanted to create an algorithm that could generate random fantasy names. The goal was to have the algorithm produce a new random name each time it was called.

Returning a random “normal” name is pretty easy (e.g. John, Robert, Stacy). You just take a big list of names (like the one from the US census) and draw one at random.
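A minimal sketch of that simple case, assuming a tiny hard-coded list standing in for a real census file. The method name random_normal_name and the list itself are made up for illustration:

```ruby
# Draw one name at random from a list.
# Array#sample does the work; the optional rng makes the draw repeatable.
def random_normal_name(names, rng = Random.new)
  names.sample(random: rng)
end

# A short stand-in list; a real one would be loaded from a file.
names = %w[John Robert Stacy Mary Linda James]
puts random_normal_name(names)
```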

If you would rather create a made-up name, like one you might find in a fantasy novel, video game or sci-fi movie, and you want it to be random, then you need an algorithm.

Here is a sample of random male fantasy names generated with the technique I will describe:

Ealst
Fara
Riond
Cuserion
Staran
Melenorno
Ennamit
Inele
Carau
Anbolod
Andockeff
Belagu
Fresp
Aronduste
Kemal
Fanda
Gaurio
Dersta

Overview

Stringing random letters together would give you something like: “dkwidfjwz”, which isn’t very convincing.

The proper technique consists of parsing a text file to create probability tables. These give the probability of one letter following another. For example, a table might say that the letter A has a 50% chance of being followed by E and a 50% chance of being followed by F.
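To make the idea concrete, here is a small sketch that counts which letters follow which in a tiny made-up sample and turns the counts into probabilities. The method name successor_probabilities and the sample text are assumptions for illustration only, and the approach is simplified compared to the full script in the next section:

```ruby
# For each letter, count how often each other letter follows it,
# then divide by the total to get a probability.
def successor_probabilities(text)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }

  text.downcase.each_char.each_cons(2) do |first, second|
    # only count pairs made of two letters
    next unless first =~ /[a-z]/ && second =~ /[a-z]/
    counts[first][second] += 1
  end

  counts.each_with_object({}) do |(first, followers), table|
    total = followers.values.sum.to_f
    table[first] = followers.transform_values { |count| count / total }
  end
end

# In this sample, "a" is followed by "e" half of the time
# and by "f" the other half.
table = successor_probabilities("ae af ae af")
p table["a"]
```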

While you could use existing letter-frequency tables for the English language, it is very important that you build your own probability tables from your own sample data. Otherwise you won’t get good names in your output, because standard tables are built from common words rather than proper names.

Building your probability tables

This simple script was thrown together quickly and won’t win me any awards for coding style, but it shows how to generate a probability table of two-letter pairs.

# simple script to generate probability tables from a file

# PARAMETERS
# this script takes two parameters, the first is the input file 
# and the second is the output file

require "yaml"

input_file = ARGV[0]
output_file = ARGV[1]

# treat all letters as well as spaces
chars = ('a'..'z').to_a.push(' ')

last_char_read = " "
frequencies = Hash.new(0.0)

# parse the file to read letter pair frequencies
File.open(input_file) do |file|
	while char = file.getc
		if ('a'..'z').to_a.include?(char.downcase)
			if chars.include?(last_char_read.downcase)
				frequencies[last_char_read.downcase + char.downcase] += 1
			end
		end
		
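		# treat any whitespace (newlines, tabs) as a space so that names
		# listed one per line still register as the start of a word
		last_char_read = (char =~ /\s/) ? " " : char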
	end
end

# get the total count of pairs starting with each letter
# (this is the divisor that makes each letter's probabilities sum to 1.0)
letter_total_count = Hash.new(0.0)
frequencies.each {|key, value| letter_total_count[key[0]] += value}

# the final hash will contain our, ahem, final result
final = Hash.new(0.0)
frequencies.sort.each {|key, value| final[key] = (value / letter_total_count[key[0]])  }  

# make a running total 
chars.each do |first_letter|
	running_total = 0.0

	('a'..'z').each do |second_letter| 
		if final.key? first_letter + second_letter
			original_value = final[first_letter + second_letter] 
			final[first_letter + second_letter] += running_total
			running_total += original_value
		end
	end
end 

# output to file for later use
File.open(output_file, "w") {|file| file.puts YAML::dump(final)}

Here is a partial sample of what is being generated in the output file:

ba: 0.21774193548387097
be: 0.5403225806451613
bi: 0.5887096774193548
bl: 0.6129032258064515
bo: 0.814516129032258
br: 0.9274193548387096
bu: 0.9919354838709677
by: 1.0

As we will see in the Improvements section, this basic script can be improved upon to get better results.

Generating the random name

To generate a random name from such a file, you generate a random number from 0.0 to 1.0 and take the first letter whose running total is greater than or equal to that number.
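As a small illustration of that lookup, the sketch below uses the running totals for the letter b, rounded from the sample output above. The helper name pick_letter is hypothetical; it simply returns the second letter of the first pair whose running total reaches the number rolled:

```ruby
# Return the second letter of the first pair whose cumulative
# probability is greater than or equal to the roll (nil if none is).
def pick_letter(cumulative_table, roll)
  pair, _total = cumulative_table.find { |_pair, total| total >= roll }
  pair && pair[1]
end

# Running totals for "b", rounded from the sample output; order matters.
b_table = {
  "ba" => 0.2177, "be" => 0.5403, "bi" => 0.5887, "bl" => 0.6129,
  "bo" => 0.8145, "br" => 0.9274, "bu" => 0.9919, "by" => 1.0
}

puts pick_letter(b_table, 0.10) # "a": 0.10 falls in the first slice
puts pick_letter(b_table, 0.57) # "i": between 0.5403 and 0.5887
puts pick_letter(b_table, 0.95) # "u": between 0.9274 and 0.9919
```

The width of each slice is the probability of that letter. Here "be" covers the range from 0.2177 to 0.5403, the widest slice, so e is the most likely letter to follow b.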

Here is how you could generate a random name:

require "yaml"

class RandomNameGenerator
	def initialize	
		@random_number_generator = Random.new 
		@probability_tables = YAML.load_file('your_data_file_name_here')
	end
	
	def generate
		name = get_next_letter(' ')
		
		@random_number_generator.rand(4...10).times do			
			next_letter = get_next_letter(name[name.length - 1])
			
			while next_letter == name[name.length - 1] && next_letter == name[name.length - 2]
				next_letter = get_next_letter(name[name.length - 1])
			end
			
			name += next_letter	
		end

		return name
	end

private
	def get_next_letter(current_letter)	
		random_number = @random_number_generator.rand(0.0..1.0)
		
		# first pair starting with current_letter whose running total reaches the roll
		@probability_tables.select {|k, v| k[0] == current_letter && v >= random_number}.first[0][1]
	end 
end

If you are using Ruby 1.9.2 or above, use the new Random class and reuse a single instantiated object rather than creating a new one for each “roll”. This way your results will be much more random. (Detailed explanation here)
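A quick sketch of that advice, using only Ruby's built-in Random class. The helper name rolls_from and the seed 42 are made up for illustration; the seed is there only so the run can be repeated:

```ruby
# Take several rolls from one generator instead of building
# a new generator for every roll.
def rolls_from(rng, count)
  Array.new(count) { rng.rand(0.0..1.0) }
end

# One generator, created once and reused for every roll.
rng = Random.new(42)
rolls = rolls_from(rng, 3)

# A second generator with the same seed replays the same sequence,
# which is handy when testing a name generator.
replay = rolls_from(Random.new(42), 3)
puts rolls == replay
```

Passing a seed is optional. Random.new without arguments picks one for you.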

Sample data

To get better results it is important to have good sample data. I can’t stress that enough. You need to give some thought to what you put in the file you will parse.

Consider the following points:

  • The source of your data: If you create your sample data using names from Lord of the Rings, expect to get names similar to those in Lord of the Rings. Your data should have a single or a very limited number of themes.
  • Sample size: A good sample size is important to get variability.
  • Sample representativeness: Your data sample should be representative. Your data should include rare letter combinations, but these should be infrequent. Think about how frequently you want certain letter combinations to come up and reflect this in your sample data.
  • Many samples: Be sure to use different samples for men, women, places or different themes.

Improvements

My current program is more complex than what I have shown you here, but I have a feeling this post is already far too long. I should really keep my posts shorter if I want people to read them. For brevity I didn’t go into all the gory details, but hopefully what you have here will get you started on the correct path.

Here’s what to do next:

  • Generate both two-letter pairs and three-letter triplets in your probability tables.
  • Favor using triplets in your random generator but have pairs as a fallback when needed.
  • Instead of using a uniform random value for name length, draw the length from a normal (Gaussian) or Cauchy distribution. You could also derive the average name length from your sample data.
  • Prevent long repetitions of a single letter in the output (e.g. Maeeeel). This is much less of a problem if you are using triplets.
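The first three suggestions could be sketched roughly as follows. This is not the more complex program mentioned above, only an illustration under a few assumptions: every method name (build_counts, weighted_pick, next_letter, gaussian_length) is hypothetical, the tables hold raw counts instead of the running totals used earlier (purely to keep the sketch short), the sample list is a handful of Lord of the Rings names, and the mean and spread of the name length are arbitrary values.

```ruby
# Count followers for contexts of a given length
# (1 = pairs, 2 = triplets) from a list of sample names.
def build_counts(names, context_length)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  names.each do |name|
    # leading spaces mark the start of a name
    padded = " " * context_length + name.downcase
    (context_length...padded.length).each do |i|
      context = padded[i - context_length, context_length]
      counts[context][padded[i]] += 1
    end
  end
  counts
end

# Pick a follower at random, weighted by how often it was seen.
def weighted_pick(followers, rng)
  roll = rng.rand * followers.values.sum
  followers.each do |letter, count|
    roll -= count
    return letter if roll < 0
  end
  followers.keys.last # safety net against rounding
end

# Prefer the triplet table; fall back to pairs when the last two
# letters were never seen together. Returns nil if neither table helps.
def next_letter(name, triplets, pairs, rng)
  padded = "  " + name
  followers = triplets.fetch(padded[-2, 2], nil) || pairs.fetch(padded[-1], nil)
  followers && weighted_pick(followers, rng)
end

# Name length drawn from a normal distribution (Box-Muller transform),
# never shorter than min.
def gaussian_length(rng, mean, std_dev, min = 3)
  u1 = 1.0 - rng.rand # in (0, 1], avoids log(0)
  u2 = rng.rand
  z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  [(mean + z * std_dev).round, min].max
end

samples = %w[aragorn arwen boromir faramir galadriel]
rng = Random.new
triplets = build_counts(samples, 2)
pairs = build_counts(samples, 1)

name = ""
gaussian_length(rng, 6, 1.5).times do
  letter = next_letter(name, triplets, pairs, rng)
  break if letter.nil? # no known follower: the name ends early
  name += letter
end
puts name.capitalize
```

With such a small sample the output will stay very close to the input names. A larger sample file gives the triplet table enough variety to produce new ones.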

Hope you have fun with all of this!

UPDATE: I have created a better version of this algorithm. You can find it here on GitHub, and I also wrote another post, “A better algorithm to generate random names”, that explains it. This post still contains useful complementary information and should be read first.
