fuglede.dk blog (English version)
http://fuglede.dk/en/blog/
Most recent blog entriesBias in the .NET random number generatorhttp://fuglede.dk/en/blog/bias-in-net-rng/http://fuglede.dk/en/blog/bias-in-net-rng/In this post we will see that when prompted to provide random positive integers, the default .NET Framework random number generator has a strong bias in its least significant bit.rnrnBackgroundrnAt my work, a good amount of time is spent doing simulation modeling: Analyses of computational algorithms whose underlying variables are governed by probability distributions, and which may be studied by running the systems with concrete samples when actual analysis in infeasible. In fact, even when analytical studies of such systems are possible, simulation tends to be faster to implement. Moreover, working in a Microsoft-oriented company, much time is spent together with the .NET Framework whose class library includes a pseudo-random number generator, which we will refer to as System.Random.rnrnFor such a generator to be useful in the context of simulation, it must have certain properties. In particular we require that it has no apparent bias. For example, if a generator can produce integers, one way to use it for simulating the flipping of a fair coin would be to take such an integer, and associate it to the events of heads/tails if the integer is odd/even respectively. In this case, if the generator is biased towards even integers, it is clearly inadequate for the task at hand.rnrnNow, one does not have to search long on StackOverflow to find claims that System.Random is not useful in the context of simulation, but what is harder to find are good reasons for why this would be the case; in particular, the Microsoft documentation of the generator makes no such disclaimers.rnrnHere, we will show that System.Random.Next(0, int.MaxValue) and System.Random.Next(int.MaxValue), which both give a "random" integer between \(0\) and \(2^{31}-2\) (both inclusive), have much greater probabilities of returning odd numbers than even numbers. This is the case for version 4.7 and earlier versions of the .NET Framework.rnrnThe algorithmrnThe source code for System.Random is available through Microsoft's Reference Source, allowing us to directly review the algorithm for potential issues. The algorithm for Random.Next(b), which gives an integer between \(0\) and \(b-1\) amounts to the following:rnrn1. Take a random integer between \(0\) and \(2^{31}-2\).rn2. Divide the integer by \(2^{31}-1\) to get a double-precision floating-point between \(0\) and \(1\).rn3. Multiply this floating-point with \(b\) to get a number between \(0\) and \(b\).rn4. Take the integer part of this new number to get an integer between \(0\) and \(b-1\).rnrnArguably the first point above is the most interesting one, but for the purposes of this post, we will simply assume that the integer random number generation is perfect, and consider only the implications of the floating-point conversions that follow. As it turns out, the integer generation itself has a number of issues, which I might cover in a follow-up post; for now, let me refer to this draft in which the algorithm, and some of its issues, are discussed in detail.rnrnFloating-points and rounding errorsrnLet us now focus on the case of \(b = 2^{31}-1\), and let \(m\) be the output of the first step above, that is, \(m \in \{0, \dots, 2^{31}-2\}\). Then, the number output by Random.Next(b) is x = (int)((m * (1.0/int.MaxValue)) * int.MaxValue), the result of multiplying \(m\) by the double-precision floating-point \(1/(2^{31}-1)\) (whose byte representation is 0x3E00000000200000), and afterwards by the integer \(2^{31}-1\) (represented as 0x7FFFFFFF) – both operations as floating-point operations – and taking the integer part of the result.rnrnThis would be entirely unproblematic if not for the fact that we risk incurring rounding errors in the process. Indeed, if \(f : \{0, \dots, 2^{31}-2\} \to \{0, \dots, 2^{31}-2\}\) denotes the composition of the multiplications, we find thatrn\[f(m) = \begin{cases} m-1, & m = 2^{n+22} + 2^{n+2}k + 2^n,\ n \in \{0, \dots, 8\},\ k \in \{0, \dots, 2^{21}-2^{20}-1\}, \\ m, & \text{otherwise.}\end{cases}\] I wish I had a nice analytical proof of this relation, but it came about only through experimentation and searching for patterns. To see that it holds, we simply check all values of \(m\); the following piece of C handles that in seconds:rn#include <stdio.h>rnrnint main()rn{rn int maxInt = 2147483647;rn double macIntRec = 1.0 / maxInt;rn int n = 1; // Index of most significant bitrn int modulus;rn int period;rn for (int m = 1; m < maxInt; m++)rn {rn // When m == 2^n, update criterion for f(m) == mrn if (m == 1 << n) {rn printf("m = 2^%d\n", n);rn modulus = 1 << n - 22;rn period = (1 << n - 20) - 1;rn n++;rn }rn int x = (int)(m * macIntRec * maxInt);rn // Test m % 2^(n-20) == 2^(n-22)rn if ((m & period) == modulus) x++;rn if (x != m) printf("Mistake at m: %d\n", m); // This never happensrn }rn return 0;rn}rnrnWith this in place, we see our source of bias: From the case \(n = 0\), there are \(2^{21}-2^{20}-1\) odd numbers that are mapped to even numbers after rounding, but from the cases \(n = 1, \dots, 8\), we get \(8 \cdot (2^{21}-2^{20}-1)\) even numbers that are mapped to odd numbers.rnrnIn other words, if \(m\) was taken uniformly at random, the probability that \(f(m)\) is odd is \(0.5034\), which, for the purposes of simulation, is a no-go.rnrnFor the meticulous, let us quickly note that \(\{0, \dots, 2^{31}-2\}\) contains more even numbers that odd numbers; this clearly makes no difference for the argument (and in fact only makes the situation worse).rnrnAn experimentrnThe above tells us that Random.Next(int.MaxValue) should output many more odd than even numbers. The argument relied on the integer random number generation actually working. Since this algorithm itself comes with bias, we should include in our discussion an actual .NET example, so let us come back to where we started and simulate coin flipping with C#:rnpublic static void Main()rn{rn var rng = new Random(0);rn var sampleSize = 1000000;rn var numbers = Enumerable.Repeat(0, sampleSize).Select(_ => rng.Next(0, int.MaxValue));rn var heads = numbers.Count(x => x % 2 == 1);rn var tails = sampleSize - heads;rn Console.WriteLine($"Heads: {heads}, tails: {tails}");rn // Output: Heads: 503291, tails: 496709rn}rnrnAs expected, odd numbers are overrepresented in the output, and the question becomes: How overrepresented? A result of 500000-500000 would probably look even more dubious, and a bit of care needs to be taken in interpreting ratios such as the above.rnrnA frequentist statistician will tell you that in order to properly rule out an outcome such as the above, one needs to consider how likely that outcome would be had we used a truly random generator. As such, we need to calculate the probability of obtaining \(496709\) tails from a million flips of a fair coin, or an even more extreme outcome. This probability is given by the cumulative distribution function of the binomial distribution; we calculate this using the Python library SciPy.rnIn [1]: import scipy.stats as strnrnIn [2]: 2*st.binom(1000000, 0.5).cdf(496709)rnOut[2]: 4.6722165389139607e-11rnrnThat is, were we to flip a million coins, the probability of seeing our observed result, or something more extreme, is \(0.0000000046\%\). Not super likely.rnrnimport numpy as nprnimport scipy.stats as strnimport matplotlib.pyplot as pltrnimport seaborn as snsrnsns.set()rnrnx = np.arange(4.95e5, 5.05e5)rnc = st.binom(1e6, 0.5).pmf(x)rnplt.ylabel('Probability density')rnplt.xlabel('Number of heads')rnplt.title('Outcomes of a million coin flips')rnrnax = plt.axes()rnbbox_props = dict(boxstyle="rarrow", fc=(0.8,0.9,0.9), ec=(0.5,0.5,0.9), lw=0.5)rnt = ax.text(503291, 0.00025, "System.Random", ha="center", va="center", rotation=-90,rn size=15, bbox=bbox_props)rnplt.gcf().subplots_adjust(left=0.15)rnplt.plot(x, c)rnrnrnFinal remarksrnThe main takeaway from this, besides a warning against using System.Random for anything that requires random numbers, should be that even if we had taken as our starting point something truly random, the fact that the innocent-looking type conversions wrecked chaos indicates how every part of random number generation needs to be based on theory. Citing Knuth, "random numbers should not be generated with a method chosen at random. Some theory should be used".rnrnIn the special case of System.Random we should note that one can avoid the floating-point conversions by invoking Random.Next() directly. As mentioned elsewhere, this method has other issues meaning that it can not exactly be taken to be based on theory.rnrnMicrosoft have been made aware of the various issues surrounding System.Random. My hope is that .NET Core will evolve to support proper random number generation by default.Sun, 27 Aug 2017 16:50:51 +0200Fun with deepdreamhttp://fuglede.dk/en/blog/fun-with-deepdream/http://fuglede.dk/en/blog/fun-with-deepdream/Towards the end of the month of June, Google released the Python source code for deepdream: A piece of software which allows for the visualisation of the potential of a collection of artificial neural networks set up for image recognition. I can't exactly claim that I dug into the technical details, but Google produced a blog post which explains in rough terms the capabilities of the software, and what is going on. Let me not attempt to reproduce that as I would not be able to do a better job than Google anyway.Thu, 30 Jul 2015 16:16:11 +0200Basic topology: The noteshttp://fuglede.dk/en/blog/basic-topology-the-notes/http://fuglede.dk/en/blog/basic-topology-the-notes/I recently completed a 40 hour course on "Basic topology". The course covered the fundamentals on topological and metric spaces, including properties such as connectedness, compactness, and separation. At the end, we were able to talk in detail about elementary algebraic topology and explicitly find the fundamental groups of some simple topological spaces.Tue, 23 Jun 2015 12:53:14 +0200Agar with friendshttp://fuglede.dk/en/blog/agar-with-friends/http://fuglede.dk/en/blog/agar-with-friends/This post describes how to set up a game with your friends in the online game agar.io. Note that this post is very likely to get outdated soon after the time of writing as the game is still under development.Sun, 21 Jun 2015 18:58:13 +0200Func KB-460 and Linuxhttp://fuglede.dk/en/blog/kb-460-and-linux/http://fuglede.dk/en/blog/kb-460-and-linux/This will be one of those posts that I wish I would have been able to find myself when I encountered the problems I did.Sun, 29 Mar 2015 10:55:25 +0200A new day, a new bloghttp://fuglede.dk/en/blog/a-new-day-a-new-blog/http://fuglede.dk/en/blog/a-new-day-a-new-blog/Like a slightly slow version of a leap year or the Winter olympics, as it does rougly every five years it has happened once again: I've decided to try blogging once more.Wed, 01 Oct 2014 23:49:10 +0200