Writing a simple Sudoku solver in F#

To continue learning about F# and to practice programming functionally, I have written a simple Sudoku solver. I use the word simple, because some simple Sudoku problems can be solved only by going over the board and removing entries without having to make “guesses” and backtracking.

This guessing and backtracking usually involves implementing a depth-first search when writing a Sudoku algorithm.

In the interest of time (there’s another project which I have been delaying for months and need to be starting) I have only implemented a solution to the easy type of Sudokus.

The first part is writing code that checks whether a board is correctly solved. This part can be done first and it’s better to do so since we will need it to check whether our Sudoku solver actually works.

The process is pretty easy, in Board.fs I implement the types I need to represent the board (Value) and it’s state (BoardSolved).

Checking the board implies collecting all it’s rows, columns and boxes (which are nine 3×3 regions on the board) and then checking them for duplicates.

module Board
type Value =
| Locked of int
| Possible of int List
type BoardSolved =
| Solved
| InvalidSolution
| BoardUnlocked
// Get a series of board elements defined by the function eFunc
let private getLinearElements eFunc start end' (board:Value[,]) : Value list =
[start..end'] |> List.fold (eFunc) []
// Get the Values at row rowNum
let row rowNum (board:Value[,]) : Value list =
getLinearElements (fun acc e> board.[rowNum, e] :: acc) 0 8 board
// Get the values at column colNum
let column colNum (board:Value[,]) : Value list =
getLinearElements (fun acc e> board.[e, colNum] :: acc) 0 8 board
// Get the values at box boxNum. A box is a 3×3 square of which there are 9 in the Sudoku board.
// The squares are at fixed position and are numbered sequentially from 0 to 8 with 0 being the top left corner
// and 8 and the bottom right corner.
let box boxNum (board:Value[,]) : Value list =
let startX, endX, startY, endY = match boxNum with
| 0 -> 0, 2, 0, 2
| 1 -> 3, 5, 0, 2
| 2 -> 6, 8, 0, 2
| 3 -> 0, 2, 3, 5
| 4 -> 3, 5, 3, 5
| 5 -> 6, 8, 3, 5
| 6 -> 0, 2, 6, 8
| 7 -> 3, 5, 6, 8
| 8 -> 6, 8, 6, 8
| _ -> failwith "Invalid box values. range: 0-8"
[startY..endY]
|> List.fold (fun acc x> (getLinearElements (fun acc e> board.[x, e] :: acc) startX endX board) @ acc) []
let containsDuplicates list : bool =
List.length list <> (list |> List.distinct |> List.length)
// Check if a particular board is solved by returning it's BoardSolved status.
let boardSolved (board:Value[,]) : BoardSolved =
let makeLists func =
List.init 1 (fun x> List.init 9 (fun y> func y board))
// check that all squares are locked
let allLocked = board |> Seq.cast<Value>
|> Seq.forall (fun e> match e with
| Locked _ -> true
| _ -> false)
match allLocked with
| false -> BoardUnlocked
// check for duplicates
| true -> match makeLists row |> List.exists (fun x> x |> List.exists (containsDuplicates)) with
| true -> InvalidSolution
| false -> match makeLists column |> List.exists (fun x> x |> List.exists (containsDuplicates)) with
| true -> InvalidSolution
| false -> match makeLists box |> List.exists (fun x> x |> List.exists (containsDuplicates)) with
| true -> InvalidSolution
| false -> Solved

view raw
Board.fs
hosted with ❤ by GitHub

To solve a board we go over all rows, columns and boxes and for each item in these collections we remove from the list of possible values those values which have already been selected in another item. I call this “locking” the values.

This operation is applied successively until the board is solved. The most complicated part came because I used a 2D array to represent the board but I needed to convert the rows, columns and boxes values to lists. Converting the lists generated from the boxes was the single most complicated and longest part of the whole program.

I guess it pays to just use lists for everything and skip on using arrays. The language and it’s libraries are pretty much made for working with lists and sequences so when you deviate from that you end up having to write additional code.

module Solver
open Board
// Takes a Value list in parameter and return a new list with all possible wich are duplicates of
// locked values removed
let removePossibleEntries (values:Value list) : Value list =
// printf "\nremovePossibleEntries entry %A" values
// get locked values
let locked = values |> List.filter (fun x> match x with
| Locked _ -> true
| Possible _ -> false)
|> List.map (fun x> match x with
| Locked y -> y
| Possible _ -> failwith "Should have been filtered out")
// remove locked values from all lists of possible values
values |> List.map
(fun x> match x with
| Locked y -> Locked y
| Possible items -> Possible (List.filter (fun z> not (List.contains z locked)) items))
// go over all values and switch List of one possible value to a Locked value
let lockSingleValues (values:Value list) =
// printf "\nlockSingleValues entry %A" values
values |> List.map (fun x> match x with
| Possible [a] -> Locked a
| Possible (head :: tail) -> Possible (head :: tail)
| Possible [] -> failwith "Error this Possible list is empty"
| Locked v -> Locked v)
// when given a list of of lists each created by the application of the row function, convert it back to a board
let newBoardFromRows (items:Value list list) =
Array2D.init 9 9 (fun i j> items.[i].[8j])
// when given a list of of lists each created by the application of the column function, convert it back to a board
let newBoardFromColumns (items:Value list list) =
Array2D.init 9 9 (fun i j> items.[j].[8i])
// when given a list of of lists each created by the application of the box function, convert it back to a board
let newBoardFromBox (items:Value list list) =
let unboxed = Array2D.create 9 9 (Locked 0)
let si = seq {
for h in [0; 3; 6] do yield! seq {
for i in 1..3 do yield! seq {
for j in 1..3 do yield 0 + h
for j in 1..3 do yield 1 + h
for j in 1..3 do yield 2 + h
}
}
}
let sj = seq {
for i in 1..3 do yield! seq {
for j in 1..3 do yield! [0..2];
for j in 1..3 do yield! [3..5];
for j in 1..3 do yield! [6..8];
}
}
let sx = seq {
for i in 0..8 do yield! seq {
for j in 0..8 do yield i, j
}
}
let coordinates = List.map2 (fun i j> i, j) (si |> Seq.toList) (sj |> Seq.toList)
List.iter2 (fun (x, y) (i, j) -> Array2D.set unboxed x y (items.[i].[8j])) (sx |> Seq.toList) coordinates
unboxed
// remove and lock values
// remove = remove elements in lists of possible values. We remove those values which already appear somewhere else in
// the unit
// lock = when there is only one possible value we change it from a list of 1 possible to a locked value
let removeAndLock (board:Value[,]) =
let perform func dest =
[0..8] |> List.map (fun i> func i dest |> removePossibleEntries |> lockSingleValues)
// perform row board |> perform column |> perform box
perform row board |> newBoardFromRows |> perform column |> newBoardFromColumns |> perform box |> newBoardFromBox
let solve (board:Value[,]) =
// call removeAndLock, if board has changed call it again in case we are able to remove more possible values
// after the changes in the first removeAndLock call
let rec exclusionStep previousStep =
let currentStep = removeAndLock previousStep
if currentStep = previousStep then currentStep else exclusionStep currentStep
let newBoard = exclusionStep board
// check if the board is solved otherwise try to solve the board by trying arbitrary values
match boardSolved newBoard with
// we can return this valid board
| Solved -> newBoard
// we need to continue not all squares are filled
| BoardUnlocked -> failwith "Board not yet solved"
// we have come to an invalid solution after the exclusionStep, this shouldn't be happening
| InvalidSolution -> failwith "InvalidSolution after performStep"

view raw
Solver.fs
hosted with ❤ by GitHub

Here is the GitHub repo with the whole thing.

To make this solver capable of solving any Sudoku, we would need to change the

| BoardUnlocked -> failwith “Board not yet solved”

line with calling a depth-first search which would select the first unlocked value, try and select the first possible value for this item and then continue recursively trying to solve the board until either it succeeds or it fails. When/if it fails backtrack and continue it’s search using another value.

Genesis Procedural Content Generator

One of the things I have been working on is Genesis, a procedural content generator written in F#. It generates random names and terrain maps.

The project has almost nothing to do with what I had envisioned when I started it.

Here are some sample maps :

 

 

Mountains are in grey, grasslands/plains in pale green, forests in dark green, desert/arid climate in brown and water in blue.

It’s still lacking rivers and lakes and the output isn’t satisfactory. Sadly I think I would need to start over to get realistic terrain generation, which was one of my goal. The main problem is that I used midpoint displacement and noise functions which while good enough for video games don’t generate photo-realistic results alone.

To try and correct that, I devised my own solution using a crude simulation of water erosion rather than look into and implement standard algorithms.

It was a learning experience and a lot of fun but I would have had quicker and better results by copying some existing algorithm.

I don’t know what to do with the project right now as there are other things I want to spend my time on, but I would still like to work on it some more in the future. Maybe take a whole different approach and generate fantasy maps with brushes and icons which are much less detailed.

How to create an F# project under Linux

There isn’t a ton of info about starting out a new F# project under Linux so I’ve decided to document how I do it.

Install Mono

Following the instructions on the official F# site, install Mono and F#:

sudo apt-get install mono-complete fsharp

You can test that F# is installed by typing

fsharpc

to bring up the F# compiler.

Install Visual Studio Code

You could use any editor, I chose Visual Studio Code because it offers superb F# integration when combined with the Ionide plugin.

After installing VS Code press Ctr-Shift-X to open the Extension window and search for “Ionide” and install the following extensions:

  • Ionide-fsharp
  • Ionide-Paket
  • Ionide-FAKE

Start a new project

Using Ctr-P bring up the command window and type:

>f#:new project

Follow the instruction and select a class library project.

This will create the base project scaffolding including some files and folders. Now might be a good time to do a git init.

Paket

To build your project you must use the build.sh script. If you try this now, the command will fail because it won’t be able to find the Paket bootstrapper.

Paket is the package manager that downloads, installs and manages dependencies,  much like NuGet, Cargo or RubyGem on other platforms.

So head on over and download the latest Paket bootstrapper specifically paket.bootstrapper.exe and drop it in the .paket folder that was created by Ionide.

You can now run

./build.sh

And Paket will create a paket.dependencies and paket.lock.

Adding some code to the project

There should be a folder with the project name you specified to Ionide . Let’s add a new file to this project.

You have two choices here:

  1. Add it manually
  2. Use Ionide to automatically add the file to the project

Add it manually

To add the file manually you must create a new file with a .fs extension using the file system or VS Code.

Next you must add your file to the project by editing the .fsproj file.

Then open your .fsproj file and locate the ItemGroup section which includes the Compile Include tags. Similar to this:
<CompileInclude=”genesis2.fs”/>
<NoneInclude=”Script.fsx”/>
Add a new entry:
<CompileInclude=”NewFile.fs”/>
<CompileInclude=”genesis2.fs”/>
<NoneInclude=”Script.fsx”/>
Be careful, the order of the elements is important. If a file A is a dependency for file B, it must come before file B in this listing. In this example NewFile.fs can be a dependency for genesis2.fs but the reverse can’t be true.
The manual technique, even though it is tedious, is useful, particularly to debug your fsproj.

Use Ionide

The alternative is to use Ionide a to add a new file for you.

Create the new .fs file like before, but this time use Ctr-P, type in

>f#

And select, Add current file to project. Voila!

Adding a new dependency to our existing project

Let’s add a new dependency to our project, MathNet.Numerics, a library that provides methods and algorithms for numerical computations.

Again let’s see how to add it both manually and using the Ionide plugin.

Before installing any packages make sure that the .paket/paket.exe file that was previously downloaded by the bootstrapper  is executable. To do so, change the permissions via the command line or the GUI.

Add it manually

First open the paket.dependencies file in your project root and add the following line:

nuget MathNet.Numerics
And then in your project folder (for each of the projects where you want to install the dependencies) locate the paket.references file and add this line:
MathNet.Numerics
Then run:
.paket/paket.exe install
To check that everything is working you can by adding the following in one of your source file:
open MathNet.Numerics.Distributions
And build your project again.

Use Ionide

Open the .fsproj file of the project you want to add the dependency to, type in Ctr-P and then:

> paket: Add NuGet to current project

Debugging the project

Twice I’ve had the project refusing to build because of the paket dependencies after a Mono upgrade. In this case your best bet is to delete the dependency info in the fsproj and add them again.

Algorithm to generate random names in F#

I remade and improved my random name generator algorithm I had done in Ruby several years ago, but this time in F#.

It works by taking a sample file which contains names, the names should be thematically similar, and uses it to create chains of probabilities. That is, when we find the letter A in the sample, what are the possible letters that can follow this A and what probability is there for each of these letter to come up.

This probability chain can have any length bigger than one.

Here are the steps for this algorithm:

  1. Build a probability table from the input file.
  2. Generate name length info from the input file.
  3. Generate a name with the name length and probability table.

Build a probability table

Here’s what a probability table looks like:

{“probabilities”:{” “:{“al”:0.973913,”am”:0.965217,”ar”:0.93913,”at”:0.930435,”au”:0.921739,”ba”:0.904348,”be”:0.886957,”bi”:0.878261,”bo”:0.86087,”bu”:0.852174,”ca”:0.843478,”co”:0.834783,”da”:0.808696,”de”:0.791304,”do”:0.782609,”dr”:0.773913,”el”:0.756522,”eo”:0.747826,”fa”:0.73913,”ga”:0.721739,”gh”:0.713043,”gi”:0.704348,”gr”:0.695652,”gu”:0.669565,”ha”:0.643478,”ho”:0.626087,”is”:0.617391,”je”:0.6,”ju”:0.591304,”ka”:0.582609,”ko”:0.556522,”ku”:0.547826,”la”:0.53913,”li”:0.521739,”lo”:0.513043,”lu”:0.495652,”ma”:0.486957,”me”:0.478261,”mh”:0.46087,”mi”:0.452174,”mo”:0.426087,”no”:0.4,”on”:0.391304,”or”:0.373913,”pa”:0.356522,”ph”:0.330435,”pu”:0.321739,”qa”:0.313043,”qu”:0.304348,”ra”:0.278261,”rh”:0.269565,”ri”:0.26087,”ro”:0.234783,”ru”:0.226087,”sa”:0.191304,”se”:0.182609,”sh”:0.173913,”ta”:0.147826,”th”:0.13913,”to”:0.130435,”tu”:0.121739,”ul”:0.113043,”va”:0.086957,”vo”:0.078261,”wa”:0.069565,”wi”:0.06087,”xa”:0.052174,”xe”:0.043478,”yu”:0.026087,”ze”:0.017391,”zi”:0.008696,”zu”:0.0},”a”:{“ba”:0.970874,”be”:0.951456,”de”:0.941748,”di”:0.932039,”ev”:0.92233,”go”:0.893204,”gu”:0.883495,”hd”:0.873786,”hr”:0.864078,”ie”:0.854369,”ig”:0.84466,”im”:0.834951,”

,”nameLengthInfo”:{“mean”:7.4273504273504276,”standardDeviation”:1.7261995176214626}}

This table can be serialized to prevent recomputing it each time we call the algorithm.

The algorithm works with sub strings of size X where a small X will provide more random results (less close to the original result) but a larger X will provide results more closely aligned with the sample file.

Results more closely aligned with the sample file better reflect the sample but face a higher risk of ending up as a pastiche of 2 existing names or in some cases being one of the sample’s name as is.

Here’s how the sub strings work:

If we have the following name in our sample file:

Gimli

Using a sub string length of 2 would add all these sub strings in our probability table:

G i

i m

m l

l i

While using a sub string length of 3 would add these sub strings:

G im

i ml

m li

And so on as we increase the length of the sub strings.

After we have counted all the possible occurrences of each sub string over the whole file we assign a probability to each one.

For example, if for our whole file we would have the following possible sub strings for G

G im

G lo

G an

Each of these would be assigned a probability of 33.3%.

Generate name length info from the input file

The generate names of a length representative of our input sample we simply count the  length of each name and derive a mean value and standard deviation. Using the mean and standard deviation will then easily allow us to draw a value from the normal distribution of word lengths.

Additionally it’s best to enforce a minimum word length. Even if our sample contains shorter names (2 or 3 letters long), from experience the algorithm doesn’t produce convincing results on these shorter lengths.

This is because it doesn’t differentiate sub strings for long and short names.

Generate a name with the name length and probability table

To generate a name we start by finding our desired name length using our name length info. Then we select our first character, the white space character.

We then generate a number between 0.0 and 1.0 (or 0 and 100) and using a prebuilt dictionary containing the probability table, find the next item.

Code

The code is also available on GitHub in a more readable format.

module NameLength
open System
open System.Configuration
open MathNet.Numerics.Distributions
// A record type that contains the mean and standard deviation of the names world length from an
// input file.
type NameLengthInfo = { mean:float; standardDeviation: float }
// Given an input string returns the NameLengthInfo record that can be later used to draw a random
// value from the normal distribution.
let internal getNameLengthInfo (input:string) : NameLengthInfo =
let names = input.Split [|' '|]
let numberOfNames = float names.Length
let namesLengths = names |> Array.map (fun name> float name.Length)
let mean = namesLengths |> Array.average
let standardDeviation = sqrt (Array.averageBy (fun x> (x mean)**2.0) namesLengths)
{ mean = mean; standardDeviation = standardDeviation }
// Given a NameLengthInfo returns a random value drawn from a normal (gaussian) distribution
let internal getNameLength (nameLengthInfo:NameLengthInfo) =
let mean = nameLengthInfo.mean
let standardDeviation = nameLengthInfo.standardDeviation
let normalDistribution = Normal(mean, standardDeviation)
let length = normalDistribution.Sample() |> Math.Round |> int
let minimumLength = ConfigurationManager.AppSettings.Item("minimumNameLength") |> int
if length >= (minimumLength) then length else minimumLength
module ProbabilityTable
open System
open System.IO
open Newtonsoft.Json
open MapConverter
open NameLength
type ProbabilityTable = { probabilities:Map<string, Map<string, float>>;
nameLengthInfo:NameLengthInfo }
// Parses a string and count the total number of occurrences of substrings of size length
let rec countOccurrences input (occurrenceTable:Map<string, float>) length =
let adjLen = length 1
match input |> Seq.toList with
| head :: tail when tail.Length >= adjLen ->
let other = Seq.take adjLen tail |> Seq.toList
let occurrence = head :: other |> Array.ofList |> String
// add current occurrence to the occurrence table
let updatedMap = match occurrenceTable.ContainsKey (occurrence) with
| true -> occurrenceTable.Add(occurrence, occurrenceTable.[occurrence] + 1.0)
| false -> occurrenceTable.Add(occurrence, 1.0)
// call the function recursively with the rest of the string
countOccurrences (tail |> Array.ofList |> String) updatedMap length
| _ -> occurrenceTable
// Return a new probability table with the key value pair added.
// Given letter X, a probability table gives a percentage for letter Y to appear following letter X.
let private addProbability (key:string) value (probabilityTable:Map<string, Map<string, float>>) length =
let mainKey = Char.ToString key.[0]
let subKey = key.[1..]
match Seq.forall Char.IsLower subKey with
| false -> probabilityTable // do not add a subkey containing a white space
| _ -> match probabilityTable.ContainsKey(mainKey) with
| true -> let subMap = Map.find mainKey probabilityTable
match subMap.ContainsKey(subKey) with
| true -> failwithf "subkey %s already added in probabilityTable" subKey
| false -> let newSubMap = subMap.Add(subKey, value)
probabilityTable.Add(mainKey, newSubMap)
| false -> let subMap = Map.empty.Add(subKey, value)
probabilityTable.Add(mainKey, subMap)
// Cumulate the submap to transform to probabilities of the form 0.75 0.25 0.0.
// Notice that the order is decreasing. Instead of using the more tradational increasing order
// of 0.25 0.75 1.0, we are presenting the values in decreasing order starting from 1 to make
// picking the right value easier later on. When we will pick the letters we will draw a random
// number and check if it is greater than the value.
let private cumulate map =
let total = Map.fold (fun acc key value> acc + value) 0.0 map
let _, cumulativeSubMap =
// map into probability
Map.map (fun key value> value / total) map
// fold into a cumulative probability result
|> Map.fold (fun (t, (m:Map<string, float>)) key value ->
(t value, m.Add(key, t value))
) (1.0, Map.empty)
Map.map (fun key (value:float) -> Math.Round(value, 6)) cumulativeSubMap
// Given an input string creates a probability table for the different letters in the string.
let buildProbabilityTable (input:string) length : ProbabilityTable =
let nameLengths = getNameLengthInfo input
let occurrencesTable = countOccurrences (input.ToLower()) Map.empty length
let adjLen = length 1
let table = Map.fold (fun acc key value> addProbability key value acc adjLen)
Map.empty occurrencesTable
|> Map.map (fun key value> cumulate value)
{ probabilities = table; nameLengthInfo = nameLengths }
// Given an input file path, creates a probability table calling buildProbabilityTable
let buildProbabilityTableFromMediaFile filePath length : ProbabilityTable =
let input = File.ReadAllText(filePath)
buildProbabilityTable input length
// Given an input file path for an already built serialized probabilityTable, return this table
let buildProbabilityTableFromSerializationFile filePath length : ProbabilityTable =
let json = File.ReadAllText(filePath)
JsonConvert.DeserializeObject<ProbabilityTable>(json, mapConverter)
// Serialize a ProbabilityTable to file
let serializeProbabilityTable filePath (table:ProbabilityTable) =
let json = JsonConvert.SerializeObject table
File.WriteAllText(filePath, json)
module NameGenerator
open System
open System.Collections.Generic
open NameLength
open ProbabilityTable
let rnd = System.Random()
// Randomly returns a string from values based on it's probability
let private pickString (values:Map<string, float>) =
let randomValue = rnd.NextDouble()
let pick = values
|> Map.tryPick (fun key value> if randomValue >= value then Some(key) else None)
match pick with
| Some v -> v
| None -> failwith "Can't pick letter"
// Recursively creates a new name.
let rec private buildName (nameSoFar:string) (charLeft:int) (probabilityTable:ProbabilityTable) =
let lastChar = Char.ToString nameSoFar.[nameSoFar.Length 1]
let addition = match Map.containsKey lastChar probabilityTable.probabilities with
// if our character exists pick one of it's subkeys
| true -> pickString probabilityTable.probabilities.[lastChar]
// otherwise start a new sequence of character with a name starting character
| false -> pickString probabilityTable.probabilities.[" "]
let newName = nameSoFar + addition
let newCharLeft = charLeft addition.Length
match newCharLeft with
| ln when ln > 0 -> buildName newName newCharLeft probabilityTable // we need more
| ln when ln < 0 -> newName.[0..newName.Length 1] // we went one char to long
| _ -> newName // we are exactly where we want to be
// Given a pre-built probability table generates a random name.
let generateRandomName (probabilityTable:ProbabilityTable) =
let nameLength = int (getNameLength probabilityTable.nameLengthInfo)
// We pass in the whitespace char to start the name as this will allow us to find letters after
// spaces in our probability table. These are the letters that start name.
// We must remember to take this whitespace into account in our nameLength and later when
// returning the name
let lowerCaseName = buildName " " nameLength probabilityTable
(Char.ToUpper lowerCaseName.[1] |> Char.ToString) + lowerCaseName.[2..]

view raw
namegen.fs
hosted with ❤ by GitHub

Sample and Examples

The larger the sample the better. Also the more thematically aligned the sample, the better. What I mean by thematically aligned is if you include the names of all Greek masculine mythological figures, you will get results that resemble the names of the Greek heroes and Gods.

For example:

Thedes
Kratla
Pourseus

On the other hand if you build your samples with names from the Lord of The Rings but include an equal part of Hobbits, Dwarf, Elven and Orcish names you will end up with a mishmash that does not make much sense.

Finally here are some results of the algorithm using the this sample file containing the names of some of the locations in the games Final Fantasy XI and Final Fantasy XIV:

Bastok SanDoria Windurst Jeuno Aragoneu Derfland Elshimo Fauregandi Gustaberg Kolshushu Kuzotz LiTelor Lumoria Movalpolos Norvallen Qufim Ronfaure Sarutabaruta Tavnazian TuLia Valdeaunia Vollbow Zulkheim Arrapago Halvung Oraguille Jeuno Rulude Selbina Mhaura Kazham Norg Rabao Attohwa Garlaige Meriphataud Sauromugue Beadeaux Rolanberry Pashhow Yuhtunga Beaucedine Ranguemont Dangruf Korroloka Gustaberg Palborough Waughroon Zeruhn Bibiki Purgonorgo Buburimu Onzozo Shakhrami Mhaura Tahrongi Altepa Boyahda RoMaeve ZiTah AlTaieu Movalpolos Batallia Davoi Eldieme Jugner Phanauet Delkfutt Bostaunieux Ghelsba Horlais Ranperre Yughott Balga Giddeus Horutoto Toraimarai Lufaise Misareaux Phomiuna Riverne Xarcabard Gusgen Valkurm Ordelle LaTheine Konschtat Arrapago Carteneau Thanalan Coerthas Noscea Matoya MorDhona Gridania Rhotano Uldah Limsa Lominsa Dravanian Ishgard Doma Sastasha Tamtara Halatali Haukke Qarn Aurum Amdapor Pharos Xelphatol Daniffen Aldenard Garlea Eorzea Vanadiel

Note that this sample is very small and not thematically consistent, still here are the results using a sub string length of 2:

Gazormon
Vasamaur
Ltemenolp
Zonaone
Ldausa
Zorvaie
Kugo
Limoruxeab
Raullaiat
Jelphorshi

A sub string length of 3:

Arzergid
Mhaure
Phowindugh
Rhonearlos
Tuto
Saltabaolo
Qangarle

And a sub string length of 5:

Arronfais
Sautotara
Tahimorut
Batahranperr
Movernguegan

I feel that the algorithm could still use some improvements but is still very satisfactory considering the bad quality of the sample file used.