Algorithm to generate random names in F#

I rewrote and improved the random name generator I wrote in Ruby several years ago, this time in F#.

It takes a sample file of names, which should be thematically similar, and uses it to build chains of probabilities. That is, when we find the letter A in the sample, which letters can follow this A, and how likely is each of them to come up?

This probability chain can have any length greater than one.

Here are the steps for this algorithm:

  1. Build a probability table from the input file.
  2. Generate name length info from the input file.
  3. Generate a name with the name length and probability table.

Build a probability table

Here’s what a probability table looks like:

{"probabilities":{" ":{"al":0.973913,"am":0.965217,"ar":0.93913,"at":0.930435,"au":0.921739,"ba":0.904348,"be":0.886957,"bi":0.878261,"bo":0.86087,"bu":0.852174,"ca":0.843478,"co":0.834783,"da":0.808696,"de":0.791304,"do":0.782609,"dr":0.773913,"el":0.756522,"eo":0.747826,"fa":0.73913,"ga":0.721739,"gh":0.713043,"gi":0.704348,"gr":0.695652,"gu":0.669565,"ha":0.643478,"ho":0.626087,"is":0.617391,"je":0.6,"ju":0.591304,"ka":0.582609,"ko":0.556522,"ku":0.547826,"la":0.53913,"li":0.521739,"lo":0.513043,"lu":0.495652,"ma":0.486957,"me":0.478261,"mh":0.46087,"mi":0.452174,"mo":0.426087,"no":0.4,"on":0.391304,"or":0.373913,"pa":0.356522,"ph":0.330435,"pu":0.321739,"qa":0.313043,"qu":0.304348,"ra":0.278261,"rh":0.269565,"ri":0.26087,"ro":0.234783,"ru":0.226087,"sa":0.191304,"se":0.182609,"sh":0.173913,"ta":0.147826,"th":0.13913,"to":0.130435,"tu":0.121739,"ul":0.113043,"va":0.086957,"vo":0.078261,"wa":0.069565,"wi":0.06087,"xa":0.052174,"xe":0.043478,"yu":0.026087,"ze":0.017391,"zi":0.008696,"zu":0.0},"a":{"ba":0.970874,"be":0.951456,"de":0.941748,"di":0.932039,"ev":0.92233,"go":0.893204,"gu":0.883495,"hd":0.873786,"hr":0.864078,"ie":0.854369,"ig":0.84466,"im":0.834951,

...

,"nameLengthInfo":{"mean":7.4273504273504276,"standardDeviation":1.7261995176214626}}

This table can be serialized to prevent recomputing it each time we call the algorithm.

The algorithm works with sub strings of size X. A small X gives more random results (less close to the sample), while a larger X gives results more closely aligned with the sample file.

Results that are closely aligned with the sample file reflect it better, but run a higher risk of ending up as a pastiche of two existing names or, in some cases, as one of the sample’s names unchanged.

Here’s how the sub strings work:

If we have the following name in our sample file:

Gimli

Using a sub string length of 2 would add all these sub strings to our probability table:

G i

i m

m l

l i

While using a sub string length of 3 would add these sub strings:

G im

i ml

m li

And so on as we increase the length of the sub strings.
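To make the splitting concrete, here is a minimal sketch in Rust (it is not the F# code from the repository). The function name `sub_strings` and its return type are made up for illustration: each pair holds a key character and the characters that follow it.

```rust
/// Splits a name into (key, followers) pairs for a sub string length `size`.
/// The key is a single character and the followers are the `size - 1`
/// characters that come right after it.
/// Hypothetical helper, written only to illustrate the text above.
fn sub_strings(name: &str, size: usize) -> Vec<(char, String)> {
    let chars: Vec<char> = name.chars().collect();
    let mut pairs = Vec::new();

    // A chain needs at least a key and one follower.
    if size < 2 || chars.len() < size {
        return pairs;
    }

    // Slide a window of `size` characters over the name.
    for window in chars.windows(size) {
        let followers: String = window[1..].iter().collect();
        pairs.push((window[0], followers));
    }

    pairs
}
```

Calling `sub_strings("Gimli", 3)` yields the three pairs listed above: G with "im", i with "ml" and m with "li".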

After we have counted all the possible occurrences of each sub string over the whole file we assign a probability to each one.

For example, suppose that over the whole file we found the following sub strings for G:

G im

G lo

G an

Each of these would be assigned a probability of 33.3%, assuming each occurs equally often in the sample.
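The counting step could look roughly like the Rust sketch below. It is only an illustration under my own assumptions: the name `follower_probabilities` is hypothetical, and it returns plain per-item probabilities rather than the exact layout of the serialized table shown earlier.

```rust
use std::collections::HashMap;

/// Counts how often each follower appears after `key` and turns the counts
/// into probabilities that sum to 1.0.
/// Hypothetical helper: `pairs` holds (key, followers) sub strings
/// collected from the whole sample file.
fn follower_probabilities(pairs: &[(char, &str)], key: char) -> HashMap<String, f64> {
    let mut counts: HashMap<String, u32> = HashMap::new();

    for (k, followers) in pairs {
        if *k == key {
            *counts.entry(followers.to_string()).or_insert(0) += 1;
        }
    }

    // Divide each count by the total number of occurrences for this key.
    let total: u32 = counts.values().sum();
    counts
        .into_iter()
        .map(|(followers, count)| (followers, f64::from(count) / f64::from(total)))
        .collect()
}
```

With the pairs ('G', "im"), ('G', "lo") and ('G', "an"), each follower of G ends up with a probability of one third.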

Generate name length info from the input file

To generate names of a length representative of our input sample, we simply count the length of each name and derive the mean and standard deviation. With these two values we can easily draw a length from a normal distribution of word lengths.

Additionally, it’s best to enforce a minimum word length. Even if our sample contains shorter names (2 or 3 letters long), in my experience the algorithm doesn’t produce convincing results at these lengths.

This is because the probability table doesn’t distinguish between sub strings that come from long names and those that come from short ones.
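As a rough sketch of this step in Rust (again, not the F# code from the repository): `name_length_info` and `pick_length` are hypothetical names, I assume the population standard deviation, and the standard normal draw `z` is supplied by the caller, for example from a Box-Muller transform.

```rust
/// Mean and population standard deviation of the name lengths.
/// Assumes `names` is not empty.
fn name_length_info(names: &[&str]) -> (f64, f64) {
    let lengths: Vec<f64> = names.iter().map(|n| n.chars().count() as f64).collect();
    let count = lengths.len() as f64;

    let mean = lengths.iter().sum::<f64>() / count;
    let variance = lengths.iter().map(|l| (l - mean).powi(2)).sum::<f64>() / count;

    (mean, variance.sqrt())
}

/// Turns a standard normal draw `z` into a name length,
/// never going below `min_length`.
fn pick_length(mean: f64, std_dev: f64, z: f64, min_length: usize) -> usize {
    let length = (mean + std_dev * z).round();

    if length < min_length as f64 {
        min_length
    } else {
        length as usize
    }
}
```

With a mean of 7.4 and a standard deviation of 1.7, a draw of `z = 0.0` gives a length of 7, while a very low draw such as `z = -3.0` is raised to the minimum length.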

Generate a name with the name length and probability table

To generate a name we start by finding our desired name length using our name length info. Then we select our first character, the white space character.

We then generate a number between 0.0 and 1.0 (or 0 and 100) and use it to look up the next item in a prebuilt dictionary containing the probability table. We repeat this step until the name reaches the desired length.
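Here is a minimal Rust sketch of that lookup, under a few assumptions of mine that the post does not confirm: the table stores cumulative thresholds sorted from high to low (as the sample table above suggests), the first entry at or below the random number wins, and the next key is the last character generated. The names `Table`, `next_item`, `generate_name` and `example_table` are made up, and the random numbers are passed in so the sketch needs no external crate.

```rust
use std::collections::HashMap;

/// For each key: the possible followers with a cumulative threshold,
/// sorted from the highest threshold down to 0.0 (assumed layout).
type Table = HashMap<char, Vec<(String, f64)>>;

/// Returns the first follower whose threshold is at or below `roll`
/// (with 0.0 <= roll < 1.0).
fn next_item(table: &Table, key: char, roll: f64) -> Option<&str> {
    table
        .get(&key)?
        .iter()
        .find(|(_, threshold)| *threshold <= roll)
        .map(|(followers, _)| followers.as_str())
}

/// Builds a name of at most `length` characters, using one roll per step.
fn generate_name(table: &Table, length: usize, mut rolls: impl Iterator<Item = f64>) -> String {
    let mut name = String::new();
    let mut key = ' '; // names start from the white space character

    while name.chars().count() < length {
        let roll = match rolls.next() {
            Some(r) => r,
            None => break,
        };
        let followers = match next_item(table, key, roll) {
            Some(f) => f,
            None => break, // no known follower for this key
        };
        name.push_str(followers);
        key = match followers.chars().last() {
            Some(c) => c,
            None => break,
        };
    }

    // The last step may overshoot, so cut the name to the desired length.
    name.chars().take(length).collect()
}

/// Tiny made-up table, only for trying the functions out.
fn example_table() -> Table {
    let mut table = Table::new();
    table.insert(' ', vec![("gi".to_string(), 0.5), ("lo".to_string(), 0.0)]);
    table.insert('i', vec![("ml".to_string(), 0.0)]);
    table.insert('l', vec![("i".to_string(), 0.0)]);
    table
}
```

With the example table, the rolls 0.7, 0.1 and 0.3 and a length of 5, the steps pick "gi", "ml" and "i", which gives "gimli".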

Code

The code is also available on GitHub in a more readable format.

Sample and Examples

The larger the sample, the better, and the more thematically aligned the sample, the better. By thematically aligned I mean that if you include, for example, the names of all masculine figures of Greek mythology, you will get results that resemble the names of Greek heroes and gods.

For example:

Thedes
Kratla
Pourseus

On the other hand, if you build your sample with names from The Lord of the Rings but include equal parts of Hobbit, Dwarf, Elf and Orc names, you will end up with a mishmash that does not make much sense.

Finally, here are some results of the algorithm using this sample file, which contains the names of some of the locations in the games Final Fantasy XI and Final Fantasy XIV:

Bastok SanDoria Windurst Jeuno Aragoneu Derfland Elshimo Fauregandi Gustaberg Kolshushu Kuzotz LiTelor Lumoria Movalpolos Norvallen Qufim Ronfaure Sarutabaruta Tavnazian TuLia Valdeaunia Vollbow Zulkheim Arrapago Halvung Oraguille Jeuno Rulude Selbina Mhaura Kazham Norg Rabao Attohwa Garlaige Meriphataud Sauromugue Beadeaux Rolanberry Pashhow Yuhtunga Beaucedine Ranguemont Dangruf Korroloka Gustaberg Palborough Waughroon Zeruhn Bibiki Purgonorgo Buburimu Onzozo Shakhrami Mhaura Tahrongi Altepa Boyahda RoMaeve ZiTah AlTaieu Movalpolos Batallia Davoi Eldieme Jugner Phanauet Delkfutt Bostaunieux Ghelsba Horlais Ranperre Yughott Balga Giddeus Horutoto Toraimarai Lufaise Misareaux Phomiuna Riverne Xarcabard Gusgen Valkurm Ordelle LaTheine Konschtat Arrapago Carteneau Thanalan Coerthas Noscea Matoya MorDhona Gridania Rhotano Uldah Limsa Lominsa Dravanian Ishgard Doma Sastasha Tamtara Halatali Haukke Qarn Aurum Amdapor Pharos Xelphatol Daniffen Aldenard Garlea Eorzea Vanadiel

Note that this sample is very small and not thematically consistent. Still, here are the results using a sub string length of 2:

Gazormon
Vasamaur
Ltemenolp
Zonaone
Ldausa
Zorvaie
Kugo
Limoruxeab
Raullaiat
Jelphorshi

A sub string length of 3:

Arzergid
Mhaure
Phowindugh
Rhonearlos
Tuto
Saltabaolo
Qangarle

And a sub string length of 5:

Arronfais
Sautotara
Tahimorut
Batahranperr
Movernguegan

I feel that the algorithm could still use some improvements, but the results are quite satisfactory considering the poor quality of the sample file used.

Height map generation in F# using midpoint displacement

Here is a simple program to generate height maps. The maps can be written to PNG files or to text files (as a serialized array).

Here’s the main program:

module TerrainGen

open System.Drawing

open HeightMap  
open MidpointDisplacement
open TestFramework
open Tests

let heightMapToTxt (heightMap:HeightMap) (filename:string) =
    let out = Array.init (heightMap.Size * heightMap.Size) (fun e -> heightMap.Map.[e].ToString())
    System.IO.File.WriteAllLines(filename, out)

let heightMapToPng (heightMap:HeightMap) (filename:string) =
    // "use" disposes of the bitmap once the file has been written
    use png = new Bitmap(heightMap.Size, heightMap.Size)
    for x in 0 .. heightMap.Size - 1 do
        for y in 0 .. heightMap.Size - 1 do
            let red, green, blue = convertFloatToRgb (heightMap.Get x y)
            png.SetPixel(x, y, Color.FromArgb(255, red, green, blue))

    png.Save(filename, Imaging.ImageFormat.Png)

[<EntryPoint>]
let main argv =
    consoleTestRunner testsToRun
    let map = newHeightMap 8
    generate map 0.3 0.5
    heightMapToPng map "out.png"
    heightMapToTxt map "out.txt"
    0

It uses two other modules: HeightMap, which contains the height map type and the functions to work with it, and MidpointDisplacement, which contains the algorithm proper.

module HeightMap

// contains the height map types and common functions that can be re-used for 
// different generation algorithms

type HeightMap = {Size:int; Map:float array} with     
    member this.Get x y =
        this.Map.[x * this.Size + y]      
        
    member this.Set x y value =
        this.Map.[x * this.Size + y] <- value

// returns a square matrix of size 2^n + 1
let newHeightMap n : HeightMap =
    let size = ( pown 2 n ) + 1
    {Size = size; Map = Array.zeroCreate (size * size)}  

// clamp a single value so that it stays between 0.0 and 1.0
let normalizeValue v =
    match v with
    | v when v < 0.0 -> 0.0
    | v when v > 1.0 -> 1.0
    | _ -> v

// converts a floating point value ranging from 0.0 to 1.0 to an RGB value
// 0.0 represents black and 1.0 white. The conversion is in greyscale
let convertFloatToRgb (pct:float) : int * int * int =
    let greyscale = int (255.0 * pct)
    (greyscale, greyscale, greyscale)
    
// returns the average between two values    
let inline avg (a:^n) (b:^n) : ^n =
    (a + b) / (LanguagePrimitives.GenericOne + LanguagePrimitives.GenericOne)
    
// returns a random float in the range [-bound, bound)
let randomize (rnd:System.Random) (bound:float) : float =
    (rnd.NextDouble() * 2.0 - 1.0) * bound

module MidpointDisplacement

open HeightMap

// set the four corners to random values, using the generator passed in
let initCorners (hm:HeightMap) (rnd:System.Random) =
    let size = hm.Size
    
    hm.Set 0 0 (rnd.NextDouble())
    hm.Set 0 (size - 1) (rnd.NextDouble())
    hm.Set (size - 1) 0 (rnd.NextDouble())
    hm.Set (size - 1) (size - 1) (rnd.NextDouble())
    
// set the middle values between each corner (c1 c2 c3 c4)
// variation is a function that is applied on each pixel to modify its value
// a point is only set while it still holds 0.0, the value of an untouched cell
let middle (hm:HeightMap) (x1, y1) (x2, y2) (x3, y3) (x4, y4) (variation) =   
    // set left middle
    if hm.Get x1 (avg y1 y3) = 0.0 then 
        hm.Set x1 (avg y1 y3) (avg (hm.Get x1 y1) (hm.Get x3 y3) |> variation)      
    
    // set upper middle
    if hm.Get (avg x1 x2) y1 = 0.0 then
        hm.Set (avg x1 x2) y1 (avg (hm.Get x1 y1) (hm.Get x2 y2) |> variation)
    
    // set right middle
    if hm.Get x2 (avg y2 y4) = 0.0 then 
        hm.Set x2 (avg y2 y4) (avg (hm.Get x2 y2) (hm.Get x4 y4) |> variation)
    
    // set lower middle
    if hm.Get (avg x3 x4) y3 = 0.0 then
        hm.Set (avg x3 x4) y3 (avg (hm.Get x3 y3) (hm.Get x4 y4) |> variation)           

// set the center value of the current matrix to the average of all middle values + variation function
let center (hm:HeightMap) (x1, y1) (x2, y2) (x3, y3) (x4, y4) (variation) =
    // average height of left and right middle points
    let avgHorizontal = avg (hm.Get x1 (avg y1 y3)) (hm.Get x2 (avg y2 y4))
    let avgVertical = avg (hm.Get (avg x1 x2) y1) (hm.Get (avg x3 x4) y3)
           
    // set center value
    hm.Set (avg x1 x4) (avg y1 y4) (avg avgHorizontal avgVertical |> variation) 

let rec displace (hm) (x1, y1) (x4, y4) (rnd) (spread) (spreadReduction) =
    let ulCorner = (x1, y1) 
    let urCorner = (x4, y1)
    let llCorner = (x1, y4)
    let lrCorner = (x4, y4)
    
    let variation = (fun x -> x + (randomize rnd spread)) >> normalizeValue
    let adjustedSpread = spread * spreadReduction
    
    // the lambda passed in as a parameter is temporary until I define a better function
    middle hm ulCorner urCorner llCorner lrCorner variation 
    center hm ulCorner urCorner llCorner lrCorner variation
    
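    // when x4 - x1 = 2 the 3x3 cell is now complete; recursing into its 2x2
    // quarters would make "center" overwrite their upper left corners
    if x4 - x1 > 2 then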
        let xAvg = avg x1 x4
        let yAvg = avg y1 y4
        displace hm (x1, y1) (xAvg, yAvg) rnd adjustedSpread spreadReduction
        displace hm (xAvg, y1) (x4, yAvg) rnd adjustedSpread spreadReduction
        displace hm (x1, yAvg) (xAvg, y4) rnd adjustedSpread spreadReduction
        displace hm (xAvg, yAvg) (x4, y4) rnd adjustedSpread spreadReduction
    
let generate (hm:HeightMap) startingSpread spreadReduction =
    let rnd = System.Random()
    let size = hm.Size - 1

    initCorners hm rnd
    displace hm (0, 0) (size, size) rnd startingSpread spreadReduction

The algorithm is very similar to diamond-square, and I have seen some people call it that, but it differs subtly from the canonical example in how the various sub-sections are divided, which is why I’m referring to it as midpoint displacement rather than diamond-square.
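The core step is easier to see in one dimension. Below is a rough sketch in Rust (it is not the F# code above): the midpoint of a segment gets the average of its two ends plus a random offset, and the spread shrinks at each level. The names `displace_1d` and `midpoint_line` and the `offset` callback are hypothetical; the random source is passed in so the sketch needs no external crate.

```rust
/// Fills the points between the two ends of `heights` (length 2^n + 1).
/// `offset(spread)` is expected to return a random value in [-spread, spread].
fn displace_1d(
    heights: &mut [f64],
    spread: f64,
    spread_reduction: f64,
    offset: &mut dyn FnMut(f64) -> f64,
) {
    // Fewer than three points means there is no midpoint left to set.
    if heights.len() < 3 {
        return;
    }

    let last = heights.len() - 1;
    let mid = last / 2;

    // Average of both ends plus a random offset, kept between 0.0 and 1.0.
    let value = (heights[0] + heights[last]) / 2.0 + offset(spread);
    heights[mid] = value.clamp(0.0, 1.0);

    // Recurse on both halves with a smaller spread.
    let next_spread = spread * spread_reduction;
    displace_1d(&mut heights[..=mid], next_spread, spread_reduction, offset);
    displace_1d(&mut heights[mid..], next_spread, spread_reduction, offset);
}

/// Builds a line of 2^n + 1 heights between `left` and `right`.
fn midpoint_line(
    left: f64,
    right: f64,
    n: u32,
    spread: f64,
    spread_reduction: f64,
    offset: &mut dyn FnMut(f64) -> f64,
) -> Vec<f64> {
    let mut heights = vec![0.0; 2usize.pow(n) + 1];
    let last = heights.len() - 1;
    heights[0] = left;
    heights[last] = right;
    displace_1d(&mut heights, spread, spread_reduction, offset);
    heights
}
```

With an offset callback that always returns 0.0, `midpoint_line(0.0, 1.0, 2, 0.3, 0.5, ...)` produces a straight ramp of five evenly spaced values, which makes it easy to check that only the random offsets add roughness.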

I’m pretty happy with the results. They’re better than any map I have generated before. Here is an example:

[Image: example of a generated height map]

The code needs some optimization, as it runs out of memory fairly quickly when generating larger maps.

You can find it on GitHub as part of a larger repo, which I have sadly abandoned.

Tree in Rust

I re-implemented my C tree program from my last post in Rust. Here is the GitHub link.

use std::collections::VecDeque;

struct TreeNode {
    value: i32,
    left: Option<Box<TreeNode>>,
    right: Option<Box<TreeNode>>,
}

fn main() {
    let root = build_tree();
    root.breadth_first();
}

fn build_tree() -> TreeNode {
    let root = TreeNode { value: 2,
        left: Some(Box::new(TreeNode { value: 7,
                            left: Some(Box::new(TreeNode { value: 2, left: None, right: None })),
                            right: Some(Box::new(TreeNode { value: 6,
                                                left: Some(Box::new(TreeNode { value: 5, left: None, right: None })),
                                                right: Some(Box::new(TreeNode { value: 11, left: None, right: None })) })) })),
        right: Some(Box::new(TreeNode { value: 5,
                            left: None,
                            right: Some(Box::new(TreeNode { value: 9,
                                                left: Some(Box::new(TreeNode { value: 4, left: None, right: None })),
                                                right: None })) }))};
    root
}

impl TreeNode {
    // visits the node first, then its children (consumes the tree)
    fn depth_first_pre(self) {
        print!("{}, ", self.value);

        if let Some(left) = self.left {
            left.depth_first_pre();
        }

        if let Some(right) = self.right {
            right.depth_first_pre();
        }
    }

    // visits the children first, then the node (consumes the tree)
    fn depth_first_post(self) {
        if let Some(left) = self.left {
            left.depth_first_post();
        }

        if let Some(right) = self.right {
            right.depth_first_post();
        }

        print!("{}, ", self.value);
    }

    // visits the nodes level by level using a queue (consumes the tree)
    fn breadth_first(self) {
        let mut queue = VecDeque::new();
        queue.push_back(self);

        while let Some(node) = queue.pop_front() {
            print!("{}, ", node.value);

            if let Some(left) = node.left {
                queue.push_back(*left);
            }

            if let Some(right) = node.right {
                queue.push_back(*right);
            }
        }
    }
}

It’s pretty simple stuff. The main problem is that this version consumes the tree, because every method takes self by value. I haven’t dealt with ownership and borrowing yet, two things I really need to grok in Rust.

Update
I have updated the GitHub repository with non-consuming versions of all three algorithms.
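For illustration, here is a sketch of what borrowing versions can look like. It is not the code from the repository: the method names `pre_order` and `post_order`, the helpers `node`, `leaf` and `sample_tree`, and the choice to return the values in a `Vec` instead of printing them are all mine.

```rust
use std::collections::VecDeque;

struct TreeNode {
    value: i32,
    left: Option<Box<TreeNode>>,
    right: Option<Box<TreeNode>>,
}

impl TreeNode {
    /// Pre-order: the node first, then its children. Takes `&self`,
    /// so the tree can still be used afterwards.
    fn pre_order(&self) -> Vec<i32> {
        let mut out = vec![self.value];
        if let Some(left) = self.left.as_deref() {
            out.extend(left.pre_order());
        }
        if let Some(right) = self.right.as_deref() {
            out.extend(right.pre_order());
        }
        out
    }

    /// Post-order: the children first, then the node.
    fn post_order(&self) -> Vec<i32> {
        let mut out = Vec::new();
        if let Some(left) = self.left.as_deref() {
            out.extend(left.post_order());
        }
        if let Some(right) = self.right.as_deref() {
            out.extend(right.post_order());
        }
        out.push(self.value);
        out
    }

    /// Breadth first: the queue holds references instead of owned nodes.
    fn breadth_first(&self) -> Vec<i32> {
        let mut out = Vec::new();
        let mut queue: VecDeque<&TreeNode> = VecDeque::new();
        queue.push_back(self);

        while let Some(node) = queue.pop_front() {
            out.push(node.value);
            if let Some(left) = node.left.as_deref() {
                queue.push_back(left);
            }
            if let Some(right) = node.right.as_deref() {
                queue.push_back(right);
            }
        }
        out
    }
}

/// Small helpers to build the same tree as in the post.
fn node(value: i32, left: Option<TreeNode>, right: Option<TreeNode>) -> TreeNode {
    TreeNode { value, left: left.map(Box::new), right: right.map(Box::new) }
}

fn leaf(value: i32) -> TreeNode {
    node(value, None, None)
}

fn sample_tree() -> TreeNode {
    node(
        2,
        Some(node(7, Some(leaf(2)), Some(node(6, Some(leaf(5)), Some(leaf(11)))))),
        Some(node(5, None, Some(node(9, Some(leaf(4)), None)))),
    )
}
```

Because every method only borrows the tree, all three traversals can be run one after the other on the same `sample_tree()` value.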

Tree in C

I recently came back to C after a long time (12+ years).

For one thing, pointers seem a lot less complicated than I remember. In fact, they’re pretty easy to understand. Besides this, I’ve found I like the simplicity of the language and the closeness to the machine.

I think what changed from my college days regarding pointers is a better understanding of the stack and exposure to languages like C# with reference and value types.

I’ve written a short program that creates a tree data structure and lets you traverse it in three different ways:

Depth first pre-order
Depth first post-order
Breadth first

Depth first just means starting from one node and going as deep as possible until you find a leaf node (a node with no children) and then backtracking. You can do this in an iterative way but for me this is so much simpler to understand using recursion.
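For comparison, here is a sketch of the iterative way, written in Rust rather than C. It replaces the recursion with an explicit stack. The names `pre_order_iterative` and `branch` are made up for this example.

```rust
struct TreeNode {
    value: i32,
    left: Option<Box<TreeNode>>,
    right: Option<Box<TreeNode>>,
}

/// Hypothetical helper to build a node with optional children.
fn branch(value: i32, left: Option<TreeNode>, right: Option<TreeNode>) -> TreeNode {
    TreeNode { value, left: left.map(Box::new), right: right.map(Box::new) }
}

/// Depth first pre-order traversal without recursion.
fn pre_order_iterative(root: &TreeNode) -> Vec<i32> {
    let mut visited = Vec::new();
    let mut stack = vec![root];

    while let Some(node) = stack.pop() {
        visited.push(node.value);

        // Push the right child first so that the left child is popped,
        // and therefore visited, first.
        if let Some(right) = node.right.as_deref() {
            stack.push(right);
        }
        if let Some(left) = node.left.as_deref() {
            stack.push(left);
        }
    }

    visited
}
```

The stack plays the role of the call stack in the recursive version: the node on top is always the deepest one that still needs visiting.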

The difference between pre-order and post-order is when you apply your operation on the tree node, before or after continuing traversal of the tree. You can look at the example below.

Breadth first, on the other hand, is easier to do iteratively. The key here is keeping a second data structure, a queue, to store the nodes. After you visit a node you enqueue its children and visit them in turn, printing their values and enqueuing their children in turn.

It’s all pretty simple stuff but I needed a refresher on this topic and also something simple to start writing C code again.

The code is on GitHub.

Here is the code of the main file, tree.c:

#include <stdio.h>
#include <stdlib.h>
#include "queue.c"

struct tree_node {
  int value;
  struct tree_node *parent;
  struct tree_node *left, *right;
};

typedef struct tree_node node;
typedef void (*node_func)(node *);

node * create_node(int, node *);
node * build_tree();
void breadth_first(node *);
void create_or_enqueue(q_node **, node *);
void depth_first_pre(node *, node_func);
void depth_first_post(node *, node_func);
void print_node(node *);
void free_node(node *);

int main() {
  node *root = build_tree();

  // depth_first_pre(root, print_node);
  breadth_first(root);
  depth_first_post(root, free_node);

  return 0;
}

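node * create_node(int value, node *parent) {
  node *new = malloc(sizeof(node));
  if (new == NULL) {
    /* out of memory: nothing sensible to do in this small program */
    exit(EXIT_FAILURE);
  }

  new->value = value;
  new->parent = parent;
  /* malloc does not zero memory and the traversals rely on NULL
     to detect a missing child, so set both children explicitly */
  new->left = NULL;
  new->right = NULL;

  return new;
}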

/* builds a tree with a preset of data */
node * build_tree() {
  node *root = create_node(2, NULL);
    root->left = create_node(7, root);
      root->left->left = create_node(2, root->left);
      root->left->right = create_node(6, root->left);
        root->left->right->left = create_node(5, root->left->right);
        root->left->right->right = create_node(11, root->left->right);
    root->right = create_node(5, root);
      root->right->right = create_node(9, root->right);
        root->right->right->left = create_node(4, root->right->right);

  return root;
}

/* Pre-order Depth-first traversal */
void depth_first_pre(node *current, node_func func) {
  if (current == NULL) {
    return;
  }

  func(current);

  depth_first_pre(current->left, func);
  depth_first_pre(current->right, func);
}

/* Post-order Depth-first traversal */
void depth_first_post(node *current, node_func func) {
  if (current == NULL) {
    return;
  }

  depth_first_post(current->left, func);
  depth_first_post(current->right, func);

  func(current);
}

/* breadth-first traversal */
void breadth_first(node *current) {
  if (current == NULL) {
    return;
  }

  q_node *head = create_qnode(current);

  while (head != NULL) {
    q_node *q_node_ptr = dequeue(&head);
    node *node_ptr = q_node_ptr->value;
    free(q_node_ptr); /* a dequeued queue node is no longer needed */
    printf("%d, ", node_ptr->value);
    q_node **head_ptr = &head;

    if (node_ptr->left != NULL) {
      create_or_enqueue(head_ptr, node_ptr->left);
    }

    if (node_ptr->right != NULL) {
      create_or_enqueue(head_ptr, node_ptr->right);
    }
  }
}

void create_or_enqueue(q_node **head, node *next_node) {
  if (*head == NULL) {
    *head = create_qnode(next_node);
  }
  else {
    enqueue(*head, create_qnode(next_node));
  }
}

void print_node(node *current) {
  printf("%d, ", current->value);
}

void free_node(node *current) {
  free(current);
}

Here is my simple implementation of a queue (keep in mind that I haven’t done any C programming for more than a decade) which I use for the breadth first algorithm:

#include <stdlib.h>

struct queue_node {
  void * value;
  struct queue_node *next;
};

typedef struct queue_node q_node;

q_node * create_qnode(void *);
void enqueue(q_node *, q_node *);
q_node * dequeue(q_node **);

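q_node * create_qnode(void * value) {
  q_node *new = malloc(sizeof(q_node));
  if (new == NULL) {
    exit(EXIT_FAILURE);
  }

  new->value = value;
  new->next = NULL; /* enqueue walks the list until it finds a NULL next */

  return new;
}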

void enqueue(q_node *head, q_node *new) {
  if (new == NULL || head == NULL) {
    return;
  }

  q_node *node_ptr = head;
  while (node_ptr->next != NULL) {
    node_ptr = node_ptr->next;
  }

  node_ptr->next = new;
}

q_node * dequeue(q_node **head) {
  if (*head == NULL) {
    return NULL;
  }

  q_node *old_head = *head;
  *head = (*head)->next;

  return old_head;
}

The function to create the data uses the tree from the Wikipedia article on Tree.

Public domain Tree diagram.

Here is the output from traversing the tree depth first pre-order:
2, 7, 2, 6, 5, 11, 5, 9, 4

And here is the output from traversing the tree breadth first:
2, 7, 5, 2, 6, 9, 5, 11, 4

If you have any suggestions to improve the code or see any mistakes don’t hesitate to point them out in the comments.

I will probably look into doing some basic compression in C next. As a further project I might convert these programs to Rust to compare the two languages, and hopefully gain some insight into Rust along the way.