Tag Archives: programming

Faster RANLUX Pseudo-Random Number Generators

RANLUX denotes a class of high-quality pseudo-random number generators (PRNGs) with long periods, solid theoretical foundations, and uses in computational physics. These generators are built on top of add-with-carry and subtract-with-borrow PRNGs which allow customization through three user-provided parameters yet there are only two variants in use today working internally with 24-bit and 48-bit arithmetic; this comes at a cost because modern computers do not possess instructions for 24-bit or 48-bit arithmetic. In this blog post I will present the theory behind RANLUX PRNGs, a list of possible parameters for 16-, 32-, and 64-bit RANLUX generators, criteria for selecting generators as well as benchmarks comparing the most competitive RANLUX variants with the RANLUX generators in the C++11 standard library, the Mersenne Twister, and xoshiro128+.

Some of the new RANLUX flavors are easily implemented in C++11 (sometimes in only two lines of code), they are  faster than the existing RANLUX flavors, they retain all desirable properties of a RANLUX generator, and they pass all DieHarder and TestU01 BigCrush PRNG tests.

The programs used to compute the RANLUX parameters are called ranlux-tools. They are free software under the terms of the Mozilla Public License 2.0 and can be found online:


Continue reading Faster RANLUX Pseudo-Random Number Generators

Rademacher Floating Point Library Released

The Rademacher Floating Point Library provides a C++11 class computing random uniformly distributed floating-point values in a given half-open interval [a,b). In comparison to every other existing software, this library

  • computes values that are actually uniformly distributed,
  • draws randomly from all floating-point values in range,
  • avoids round-off errors causing out-of-range values,
  • uses only integer arithmetic, and
  • works with built-in C++ float types as well with custom floating-point types.

The Rademacher FPL is licensed under the Mozilla Public License 2.0 and therefore free software ("free" as in "free beer" as well as "free" as in "freedom"). The source code can be found online:


Another Advantage of Garbage Collection

Most literature about garbage collection contains a list of advantages of garbage collection. The lists of advantages known to me omit one advantage: certain concurrent algorithms require garbage collection.

I will demonstrate this point with an example in the programming language C, specifically the revision C11. Consider a singly-linked list where the pointer to the head of the list is concurrently read and written by multiple threads:

struct node
    size_t value;
    struct node* p_next;
typedef struct node node;

_Atomic(node*) p_head = ATOMIC_VAR_INIT(NULL);

The singly-linked list is considered empty if p_head is a null pointer. The thread T1 is only reading the list:

void* T1(void* args)
    node* p = atomic_load(&p_head);

    // more computations
    // stop referencing the head of the list

    return NULL;

The thread T2 removes the first element of the singly-linked list by trying to overwrite the pointer stored in p_head:

void* T2(void* args)
    node* p = NULL;
    node* p_next = NULL;
    node* p_expected = NULL;

        p = atomic_load(&p_head);

        if( !p )

        p_next = p->p_next;
        p_expected = p;
    } while(!atomic_compare_exchange_strong(&p_head, &p_expected, p_next));
    // ensure other threads stopped referencing p

    return NULL;

T2 relies on compare-and-swap in line 16 to detect interference of other threads.

After successfully updating p_head, the memory referenced by p needs to be freed after all threads stopped referencing this memory and in general, this requires a garbage collector. Waiting does not help because the threads holding references might have been stopped by the operating system. Scanning the stack, the heap, and the other threads' CPU registers is not possible in many languages or not covered by the programming language standard and besides, such a scan is an integral part of any tracing garbage collector.

In the introduction I wrote that certain concurrent algorithms require garbage collection and more accurately, it should say: In the absence of special guarantees, certain concurrent algorithms require garbage collection. For example, if we can guarantee that threads hold their references to the singly-linked list only for a certain amount of time, then there is no need for garbage collection and this fact is used in the Linux kernel when using the read-copy-update mechanisms.

Master's Thesis: Projection Methods for Generalized Eigenvalue Problems

My master's thesis deals with dense and sparse solvers for generalized eigenvalue problems (GEPs) with Hermitian positive semidefinite matrices. Key results are

  • structure-preserving backward error bounds computable in linear time,
  • the runtime of GSVD-based dense GEP solvers is within factor 5 of the fastest GEP solver with Netlib LAPACK in my tests,
  • computing the GSVD directly is up to 20 times slower than the computation by means of QR factorizations and the CS decomposition with Netlib LAPACK in my tests,
  • given a pair of matrices with 2x2 block structure, I show how to minimize eigenvalue perturbation by off-diagonal blocks with the aid of graph algorithms, and
  • I propose a new multilevel eigensolver for sparse GEPs that is able to compute up to 1000 eigenpairs on a cluster node with two dual-core CPUs and 16 GB virtual memory limit for problems with up to 150,000 degrees of freedom in less than eleven hours.

The revised edition of the thesis with fixed typos is here (PDF), the source code is available here, and the abstract is below. In February, I already gave a talk on the preliminary thesis results; more details can be found in the corresponding blog post.


This thesis treats the numerical solution of generalized eigenvalue problems (GEPs) Kx = \lambda Mx, where K, M are Hermitian positive semidefinite (HPSD). We discuss problem and solution properties, accuracy assessment of solutions, aspect of computations in finite precision, the connection to the finite element method (FEM), dense solvers, and projection methods for these GEPs. All results are directly applicable to real-world problems.

We present properties and origins of GEPs with HPSD matrices and briefly mention the FEM as a source of such problems.

With respect to accuracy assessment of solutions, we address quickly computable and structure-preserving backward error bounds and their corresponding condition numbers for GEPs with HPSD matrices. There is an abundance of literature on backward error measures possessing one of these features; the backward error in this thesis provides both.

In Chapter 3, we elaborate on dense solvers for GEPs with HPSD matrices. The standard solver reduces the GEP to a standard eigenvalue problem; it is fast but requires positive definite mass matrices and is only conditionally backward stable. The QZ algorithm for general GEPs is backward stable but it is also much slower and does not preserve any problem properties. We present two new backward stable and structure preserving solvers, one using deflation of infinite eigenvalues, the other one using the generalized singular value decomposition (GSVD). We analyze backward stability and computational complexity. In comparison to the QZ algorithm, both solvers are competitive with the standard solver in our tests. Finally, we propose a new solver combining the speed of deflation with the ability of GSVD-based solvers to handle singular matrix pencils.

Finally, we consider black-box solvers based on projection methods to compute the eigenpairs with the smallest eigenvalues of large, sparse GEPs with Hermitian positive definite matrices (HPD). After reviewing common methods for spectral approximation, we briefly mention ways to improve numerical stability. We discuss the automated multilevel substructuring method (AMLS) before analyzing the impact of off-diagonal blocks in block matrices on eigenvalues. We use the results of this thesis and insights in recent papers to propose a new divide-and-conquer eigensolver and to suggest a change that makes AMLS more robust. We test the divide-and-conquer eigensolver on sparse structural engineering matrices with 10,000 to 150,000 degrees of freedom.

2010 Mathematics Subject Classification. 65F15, 65F50, 65Y04, 65Y20.

Edit: Revised master's thesis from April 2016 (PDF)

Advice: Implementing a Solver using LAPACK

For my master's thesis I implemented multiple solvers for structured generalized eigenvalue problems using LAPACK. In this post, I will briefly discuss a method to simplify the memory management and ways to catch programming errors as early as possible when implementing a solver for linear algebra problems that uses LAPACK. The advice in this post only supplements good programming practices like using version control systems and automated tests.

Continue reading Advice: Implementing a Solver using LAPACK

Type Coercions and Floating-Point Types

Consider a language where we only allow automatic, implicit type conversions (coercion) among numeric types if every value of the source type can be represented as a value of the target type, i.e., there is no truncation and no round-off. Let us call this kind of coercion value-preserving coercion. With such a strict coercion rule, an unsigned integer with four bytes can be coerced into a signed integer with eight bytes but the compiler (interpreter) will not coerce signed integers to unsigned integers. In this post, we highlight a reason why coercions from single-precision to double-precision floating-point types may be undesirable although they are value preserving.

Continue reading Type Coercions and Floating-Point Types

Matlab: Existing Files and mkdir

In Matlab, a directory called foo can be created by calling mkdir('foo'). If mkdir is successful, then the function returns logical 1; otherwise it returns logical 0. Now consider the case where Matlab is run on Linux and there is already a file called foo in the current directory. Linux does not permit the existence of a file and a directory of the same name in the same folder so the call to mkdir should fail. Instead, mkdir will leave the file intact, return logical 1, and print the misleading warning "Directory already exists". That is, there is no directory called foo after this supposedly successful function call.

The peculiarity of this behavior has been reported to Mathworks in September 2015. The example above was tested with Matlab R2015b and R2014a on Linux.

Implicit Type Conversions Gone Wrong, C++ Edition

Consider the following setup:

#include <cstddef>
using namespace std;
// ...
size_t n = 100;
size_t* array = new size_t[n];

The goal is to set every element of the array to the fixed value x.

In C++, an idiomatic way to solve this problem are the functions std::fill (link to documentation) and std::fill_n (link to documentation) from the STL header algorithm. In addition to the fixed value x, std::fill requires a pair of output iterators specifying a half-open range as its arguments whereas the additional argument to std::fill_n is an output iterator pointing to the first element and the length of the container. That is, we could assign to the elements of the array above as follows:

#include <cstddef>
#include <algorithm>
using namespace std;
// ...
size_t n = 100;
size_t* array = new size_t[n];

fill(array, array+n, x);
fill_n(array, n, x);

What made me write this blog post is the following line of code:

fill_n(array, array+n, x);

Here, I accidentally used std::fill_n in place of std::fill but the code still compiles because of the automatic, implicit type conversion (type coercion) from std::size_t* to std::size_t. With warnings enabled (-Wextra -Wall -std=c++11 -pedantic) neither g++ 4.8.5 nor clang++ 3.5 warn about this line and yet this piece of code causes a segmentation fault on my computer whenever it is executed.