Algorithm to generate random names in F#

I remade and improved my random name generator algorithm I had done in Ruby several years ago, but this time in F#.

It works by taking a sample file which contains names, the names should be thematically similar, and uses it to create chains of probabilities. That is, when we find the letter A in the sample, what are the possible letters that can follow this A and what probability is there for each of these letter to come up.

This probability chain can have any length bigger than one.

Here are the steps for this algorithm:

  1. Build a probability table from the input file.
  2. Generate name length info from the input file.
  3. Generate a name with the name length and probability table.

Build a probability table

Here’s what a probability table looks like:

{“probabilities”:{” “:{“al”:0.973913,”am”:0.965217,”ar”:0.93913,”at”:0.930435,”au”:0.921739,”ba”:0.904348,”be”:0.886957,”bi”:0.878261,”bo”:0.86087,”bu”:0.852174,”ca”:0.843478,”co”:0.834783,”da”:0.808696,”de”:0.791304,”do”:0.782609,”dr”:0.773913,”el”:0.756522,”eo”:0.747826,”fa”:0.73913,”ga”:0.721739,”gh”:0.713043,”gi”:0.704348,”gr”:0.695652,”gu”:0.669565,”ha”:0.643478,”ho”:0.626087,”is”:0.617391,”je”:0.6,”ju”:0.591304,”ka”:0.582609,”ko”:0.556522,”ku”:0.547826,”la”:0.53913,”li”:0.521739,”lo”:0.513043,”lu”:0.495652,”ma”:0.486957,”me”:0.478261,”mh”:0.46087,”mi”:0.452174,”mo”:0.426087,”no”:0.4,”on”:0.391304,”or”:0.373913,”pa”:0.356522,”ph”:0.330435,”pu”:0.321739,”qa”:0.313043,”qu”:0.304348,”ra”:0.278261,”rh”:0.269565,”ri”:0.26087,”ro”:0.234783,”ru”:0.226087,”sa”:0.191304,”se”:0.182609,”sh”:0.173913,”ta”:0.147826,”th”:0.13913,”to”:0.130435,”tu”:0.121739,”ul”:0.113043,”va”:0.086957,”vo”:0.078261,”wa”:0.069565,”wi”:0.06087,”xa”:0.052174,”xe”:0.043478,”yu”:0.026087,”ze”:0.017391,”zi”:0.008696,”zu”:0.0},”a”:{“ba”:0.970874,”be”:0.951456,”de”:0.941748,”di”:0.932039,”ev”:0.92233,”go”:0.893204,”gu”:0.883495,”hd”:0.873786,”hr”:0.864078,”ie”:0.854369,”ig”:0.84466,”im”:0.834951,”

,”nameLengthInfo”:{“mean”:7.4273504273504276,”standardDeviation”:1.7261995176214626}}

This table can be serialized to prevent recomputing it each time we call the algorithm.

The algorithm works with sub strings of size X where a small X will provide more random results (less close to the original result) but a larger X will provide results more closely aligned with the sample file.

Results more closely aligned with the sample file better reflect the sample but face a higher risk of ending up as a pastiche of 2 existing names or in some cases being one of the sample’s name as is.

Here’s how the sub strings work:

If we have the following name in our sample file:

Gimli

Using a sub string length of 2 would add all these sub strings in our probability table:

G i

i m

m l

l i

While using a sub string length of 3 would add these sub strings:

G im

i ml

m li

And so on as we increase the length of the sub strings.

After we have counted all the possible occurrences of each sub string over the whole file we assign a probability to each one.

For example, if for our whole file we would have the following possible sub strings for G

G im

G lo

G an

Each of these would be assigned a probability of 33.3%.

Generate name length info from the input file

The generate names of a length representative of our input sample we simply count the  length of each name and derive a mean value and standard deviation. Using the mean and standard deviation will then easily allow us to draw a value from the normal distribution of word lengths.

Additionally it’s best to enforce a minimum word length. Even if our sample contains shorter names (2 or 3 letters long), from experience the algorithm doesn’t produce convincing results on these shorter lengths.

This is because it doesn’t differentiate sub strings for long and short names.

Generate a name with the name length and probability table

To generate a name we start by finding our desired name length using our name length info. Then we select our first character, the white space character.

We then generate a number between 0.0 and 1.0 (or 0 and 100) and using a prebuilt dictionary containing the probability table, find the next item.

Code

The code is also available on GitHub in a more readable format.

Sample and Examples

The larger the sample the better. Also the more thematically aligned the sample, the better. What I mean by thematically aligned is if you include the names of all Greek masculine mythological figures, you will get results that resemble the names of the Greek heroes and Gods.

For example:

Thedes
Kratla
Pourseus

On the other hand if you build your samples with names from the Lord of The Rings but include an equal part of Hobbits, Dwarf, Elven and Orcish names you will end up with a mishmash that does not make much sense.

Finally here are some results of the algorithm using the this sample file containing the names of some of the locations in the games Final Fantasy XI and Final Fantasy XIV:

Bastok SanDoria Windurst Jeuno Aragoneu Derfland Elshimo Fauregandi Gustaberg Kolshushu Kuzotz LiTelor Lumoria Movalpolos Norvallen Qufim Ronfaure Sarutabaruta Tavnazian TuLia Valdeaunia Vollbow Zulkheim Arrapago Halvung Oraguille Jeuno Rulude Selbina Mhaura Kazham Norg Rabao Attohwa Garlaige Meriphataud Sauromugue Beadeaux Rolanberry Pashhow Yuhtunga Beaucedine Ranguemont Dangruf Korroloka Gustaberg Palborough Waughroon Zeruhn Bibiki Purgonorgo Buburimu Onzozo Shakhrami Mhaura Tahrongi Altepa Boyahda RoMaeve ZiTah AlTaieu Movalpolos Batallia Davoi Eldieme Jugner Phanauet Delkfutt Bostaunieux Ghelsba Horlais Ranperre Yughott Balga Giddeus Horutoto Toraimarai Lufaise Misareaux Phomiuna Riverne Xarcabard Gusgen Valkurm Ordelle LaTheine Konschtat Arrapago Carteneau Thanalan Coerthas Noscea Matoya MorDhona Gridania Rhotano Uldah Limsa Lominsa Dravanian Ishgard Doma Sastasha Tamtara Halatali Haukke Qarn Aurum Amdapor Pharos Xelphatol Daniffen Aldenard Garlea Eorzea Vanadiel

Note that this sample is very small and not thematically consistent, still here are the results using a sub string length of 2:

Gazormon
Vasamaur
Ltemenolp
Zonaone
Ldausa
Zorvaie
Kugo
Limoruxeab
Raullaiat
Jelphorshi

A sub string length of 3:

Arzergid
Mhaure
Phowindugh
Rhonearlos
Tuto
Saltabaolo
Qangarle

And a sub string length of 5:

Arronfais
Sautotara
Tahimorut
Batahranperr
Movernguegan

I feel that the algorithm could still use some improvements but is still very satisfactory considering the bad quality of the sample file used.

Integrating DropzoneJS into an ASP.NET MVC site

Dropzone allows you to easily handle file upload via drag and drop.

Here is a simple tutorial on how to integrate DropzoneJS into an existing ASP.NET MVC application. The instructions on the dropzone page are easy to follow but I include here a version tailored for ASP.NET MVC.

First off, there is a NuGet package but I prefer not to use it so that I can include only the necessary files and choose where those files are located.

The first step is to obtain the JavaScript file and include it into your /Scripts folder.

dropzone1

Optionally you can also download and use the CSS file. Put this one in your /Content folder.

dropzone2

Once these files are in place, it’s time to create some bundles. In App_Start/BundleConfig.cs add the following bundles:

bundles.Add(new ScriptBundle("~/bundles/dropzone")
            .Include("~/Scripts/dropzone.js"));

bundles.Add(new StyleBundle("~/Content/dropzone-css")
            .Include("~/Content/dropzone.css"));

On the page you want to use dropzone on, add calls to the Styles and Scripts render methods and also a form element with the dropzone class. The class name is how dropzone locates the control so it can make the appropriate modifications.

Remember that you can’t nest form elements so if your page already contains another form your will need to make sure they aren’t nested.

Here is how our page.cshtml:

@Styles.Render("~/Content/dropzone-css")

<div class="jumbotron">
    <h1>dropzone test</h1>
</div>

<div class="row">
    @using (Html.BeginForm("FileUpload", "Home", 
                           FormMethod.Post,
                           new
                           {
                               @class = "dropzone",
                               id = "dropzone-form",
                           }))
    {
        <div class="fallback">
            <input name="file" type="file" multiple />
        </div>
    }
</div>

@section scripts {
    @Scripts.Render("~/bundles/dropzone")

    <script type="text/javascript">
        Dropzone.options.dropzoneForm = {
            paramName: "file",
            maxFilesize: 20,
            maxFiles: 4,
            acceptedFiles: "image/*",
            dictMaxFilesExceeded: "Custom max files msg",
        };
    </script>
}

The form contains an optional fallback element for clients who don’t support JavaScript. Also note in the scripts section, custom configuration options are declared.

You may notice I have set a maxFilesize. This value is in MB. It is necessary to change the maxRequestLength attribute in your web.config file to a value at least as big as maxFilesize, otherwise IIS will reject files under the maxFilesize limit.

Here is how to change your maxRequestLength:

<system.web>
    <compilation debug="true" targetFramework="4.5.2" />
    <!--maxRequestLength increased to 20 MB-->
    <httpRuntime targetFramework="4.5.2" 
                 maxRequestLength="20480" />
</system.web>

After we have updated our web.config we can now move on to the controller.

[HttpPost]
public ActionResult FileUpload(HttpPostedFileBase file)
{
    try
    {
        var memStream = new MemoryStream();
        file.InputStream.CopyTo(memStream);

        byte[] fileData = memStream.ToArray();

        //save file to database using fictitious repository
        var repo = new FictitiousRepository();
        repo.SaveFile(file.FileName, fileData);
    }
    catch (Exception exception)
    {
        return Json(new { success = false, 
                          response = exception.Message });
    }

    return Json(new { success = true, 
                      response = "File uploaded." });
}

When uploading multiple files at the same time, this method will get called as many times as there are files.

If you want to handle the returned JSON, you need to handle events from dropzone, in this case the complete event.

Here is how the page looks after having uploaded two pictures:

dropzone3

The competent programmer is fully aware…

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague. In the case of a well-known conversational programming language I have been told from various sides that as soon as a programming community is equipped with a terminal for it, a specific phenomenon occurs that even has a well-established name: it is called “the one-liners”. It takes one of two different forms: one programmer places a one-line program on the desk of another and either he proudly tells what it does and adds the question “Can you code this in less symbols?” —as if this were of any conceptual relevance!— or he just asks “Guess what it does!”. From this observation we must conclude that this language as a tool is an open invitation for clever tricks; and while exactly this may be the explanation for some of its appeal, viz. to those who like to show how clever they are, I am sorry, but I must regard this as one of the most damning things that can be said about a programming language. Another lesson we should have learned from the recent past is that the development of “richer” or “more powerful” programming languages was a mistake in the sense that these baroque monstrosities, these conglomerations of idiosyncrasies, are really unmanageable, both mechanically and mentally.

– Edsger Dijkstra, The Humble Programmer