Why I Chose the Mozilla Public License 2.0

Selecting a software license requires a conscious effort because of the various rights and obligations that come with it. For me, the most important aspects of a license were

  • using an existing open source license,
  • commercial use clauses,
  • patent protection,
  • the obligation to disclose changes,
  • the ability to revoke the license as the sole copyright holder, and
  • compatibility with BSD-licensed code.

In this blog post I elaborate on why I chose the Mozilla Public License 2.0 for my eigensolver DCGeig based on these features.

Continue reading Why I Chose the Mozilla Public License 2.0

The Discretized Laplace Operator on Hyperrectangles with Zero Dirichlet Boundary Conditions

In this blog post, I present the stiffness and mass matrices as well as the eigenvalues and eigenvectors of the Laplace operator (Laplacian) on the domains (0, \ell), (0, \ell_1) \times (0, \ell_2), and so on (hyperrectangles) with zero Dirichlet boundary conditions, discretized with the finite difference method (FDM) and the finite element method (FEM) on equidistant grids. For the FDM discretization, we use the central differences scheme with the standard five-point stencil in 2D. For the FEM, the ansatz functions are the hat functions. The matrices, the standard eigenvalue problems A v = \sigma v, and the generalized eigenvalue problems K w = \tau M w arising from the discretization lend themselves to use as test problems in numerical linear algebra because they are well-conditioned, they are not diagonal, and the matrix dimension can be increased arbitrarily.

Python code generating the matrices and their eigenpairs can be found in my git repository discrete-laplacian.
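
For illustration, here is a minimal NumPy/SciPy sketch, not taken from the repository, that assembles the 1D FDM matrix on (0, \ell) and checks it against the well-known analytic eigenvalues; the variable names and parameter values are my own choices.

import numpy as np
import scipy.sparse

# number of interior grid points, domain length, and mesh width
n = 32
ell = 1.0
h = ell / (n + 1)

# 1D FDM Laplacian with zero Dirichlet boundary conditions
# (central differences on an equidistant grid)
A = scipy.sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2

# analytic eigenvalues: 4/h^2 * sin^2(k pi / (2 (n+1))), k = 1, ..., n
k = np.arange(1, n + 1)
analytic = 4.0 / h**2 * np.sin(k * np.pi / (2 * (n + 1)))**2

# the computed eigenvalues agree with the analytic ones
computed = np.linalg.eigvalsh(A.toarray())
assert np.allclose(computed, analytic)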

Continue reading The Discretized Laplace Operator on Hyperrectangles with Zero Dirichlet Boundary Conditions

CMake: "Errors occurred during the last pass"

You execute

ccmake -G ninja /path/to/source-code

on the command line and after pressing

[c] to configure

ccmake presents you with an empty window showing the message

Errors occurred during the last pass

in the status bar. The cause of this problem is a misspelled generator name on the ccmake command line (argument -G): the generator is called Ninja, with a capital N as the first letter.
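
The invocation with the correctly spelled generator name is:

ccmake -G Ninja /path/to/source-code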

SuperLU vs Direct Substructuring

The eigenproblem solver in my master's thesis used SuperLU, a direct solver for the solution of systems of linear equations (SLEs) Ax = b. For the largest test problems, the eigensolver ran out of memory when decomposing the matrix A, which is why I replaced SuperLU with direct substructuring in an attempt to reduce memory consumption. For this blog post, I measured the set-up time, the solve time, and the memory consumption of SuperLU and direct substructuring on real symmetric positive definite real-world matrices for SLEs with a varying number of right-hand sides. I will also highlight that SuperLU was deployed with a suboptimal parameter choice and explain why the memory consumption of the decomposition of A is the wrong objective function when you want to avoid running out of memory.
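
As a rough illustration of the SuperLU side of the comparison, here is a minimal sketch using SciPy, whose sparse LU factorization is backed by SuperLU; the matrix is a simple stand-in and not one of the real-world test matrices from the post.

import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# stand-in for a real symmetric positive definite test matrix:
# the 1D FDM Laplacian with zero Dirichlet boundary conditions
n = 10000
A = scipy.sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# set-up phase: sparse LU decomposition of A (SuperLU)
lu = scipy.sparse.linalg.splu(A)

# solve phase: solve A X = B for a block of right-hand sides
num_rhs = 16
B = np.random.default_rng(0).standard_normal((n, num_rhs))
X = lu.solve(B)

# residual check
print(np.max(np.abs(A @ X - B)))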

Continue reading SuperLU vs Direct Substructuring

Another Advantage of Garbage Collection

Most literature about garbage collection contains a list of its advantages. The lists known to me omit one advantage: certain concurrent algorithms require garbage collection.

I will demonstrate this point with an example in the programming language C, specifically the revision C11. Consider a singly-linked list where the pointer to the head of the list is concurrently read and written by multiple threads:

#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct node
{
    size_t value;
    struct node* p_next;
};
typedef struct node node;

_Atomic(node*) p_head = ATOMIC_VAR_INIT(NULL);

The singly-linked list is considered empty if p_head is a null pointer. The thread T1 is only reading the list:

void* T1(void* args)
{
    node* p = atomic_load(&p_head);

    // more computations
    // stop referencing the head of the list

    return NULL;
}

The thread T2 removes the first element of the singly-linked list by trying to overwrite the pointer stored in p_head:

void* T2(void* args)
{
    node* p = NULL;
    node* p_next = NULL;
    node* p_expected = NULL;

    do
    {
        p = atomic_load(&p_head);

        if( !p )
            break;

        p_next = p->p_next;
        p_expected = p;
    } while(!atomic_compare_exchange_strong(&p_head, &p_expected, p_next));
    // ensure other threads stopped referencing p
    free(p);

    return NULL;
}

T2 relies on the compare-and-swap in the loop condition to detect interference by other threads.

After p_head has been updated successfully, the memory referenced by p must be freed, but only after all threads have stopped referencing this memory, and in general this requires a garbage collector. Waiting does not help because the threads holding references might have been suspended by the operating system. Scanning the stack, the heap, and the other threads' CPU registers is not possible in many languages or is not covered by the programming language standard; besides, such a scan is an integral part of any tracing garbage collector.

In the introduction I wrote that certain concurrent algorithms require garbage collection; more accurately, it should say: in the absence of special guarantees, certain concurrent algorithms require garbage collection. For example, if we can guarantee that threads hold their references to the singly-linked list only for a bounded amount of time, then there is no need for garbage collection; this fact is exploited by the read-copy-update (RCU) mechanism in the Linux kernel.

Master's Thesis: Projection Methods for Generalized Eigenvalue Problems

My master's thesis deals with dense and sparse solvers for generalized eigenvalue problems (GEPs) with Hermitian positive semidefinite matrices. Key results are

  • structure-preserving backward error bounds computable in linear time,
  • the runtime of GSVD-based dense GEP solvers is within a factor of five of the fastest GEP solver with Netlib LAPACK in my tests,
  • computing the GSVD directly is up to 20 times slower than computing it by means of QR factorizations and the CS decomposition with Netlib LAPACK in my tests,
  • given a pair of matrices with 2x2 block structure, I show how to minimize the eigenvalue perturbation caused by the off-diagonal blocks with the aid of graph algorithms, and
  • I propose a new multilevel eigensolver for sparse GEPs that is able to compute up to 1000 eigenpairs on a cluster node with two dual-core CPUs and a 16 GB virtual memory limit for problems with up to 150,000 degrees of freedom in less than eleven hours.

The revised edition of the thesis with fixed typos is here (PDF), the source code is available here, and the abstract is below. In February, I already gave a talk on the preliminary thesis results; more details can be found in the corresponding blog post.

Abstract

This thesis treats the numerical solution of generalized eigenvalue problems (GEPs) Kx = \lambda Mx, where K, M are Hermitian positive semidefinite (HPSD). We discuss problem and solution properties, accuracy assessment of solutions, aspects of computations in finite precision, the connection to the finite element method (FEM), dense solvers, and projection methods for these GEPs. All results are directly applicable to real-world problems.

We present properties and origins of GEPs with HPSD matrices and briefly mention the FEM as a source of such problems.

With respect to accuracy assessment of solutions, we address quickly computable and structure-preserving backward error bounds and their corresponding condition numbers for GEPs with HPSD matrices. There is an abundance of literature on backward error measures possessing one of these features; the backward error in this thesis provides both.

In Chapter 3, we elaborate on dense solvers for GEPs with HPSD matrices. The standard solver reduces the GEP to a standard eigenvalue problem; it is fast but requires positive definite mass matrices and is only conditionally backward stable. The QZ algorithm for general GEPs is backward stable, but it is also much slower and does not preserve any problem properties. We present two new backward stable and structure-preserving solvers, one using deflation of infinite eigenvalues, the other one using the generalized singular value decomposition (GSVD). We analyze backward stability and computational complexity. In comparison to the QZ algorithm, both solvers are competitive with the standard solver in our tests. Finally, we propose a new solver combining the speed of deflation with the ability of GSVD-based solvers to handle singular matrix pencils.
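
To make the reduction step of the standard solver concrete, here is a minimal sketch of the textbook reduction of Kx = \lambda Mx with a positive definite mass matrix M to a standard eigenvalue problem via a Cholesky factorization; the matrices are random stand-ins and the code is not taken from the thesis.

import numpy as np
import scipy.linalg

# random symmetric positive definite stand-ins for K and M
rng = np.random.default_rng(0)
n = 100
F = rng.standard_normal((n, n))
G = rng.standard_normal((n, n))
K = F @ F.T + n * np.eye(n)
M = G @ G.T + n * np.eye(n)

# reduce K x = lambda M x to a standard eigenvalue problem:
# with M = L L^T, solve (L^{-1} K L^{-T}) y = lambda y, where x = L^{-T} y
L = np.linalg.cholesky(M)
Y = scipy.linalg.solve_triangular(L, K, lower=True)    # Y = L^{-1} K
C = scipy.linalg.solve_triangular(L, Y.T, lower=True)  # C = L^{-1} K L^{-T}
eigenvalues = np.linalg.eigvalsh(C)

# the eigenvalues agree with SciPy's generalized Hermitian eigensolver
assert np.allclose(eigenvalues, scipy.linalg.eigh(K, M, eigvals_only=True))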

Finally, we consider black-box solvers based on projection methods to compute the eigenpairs with the smallest eigenvalues of large, sparse GEPs with Hermitian positive definite (HPD) matrices. After reviewing common methods for spectral approximation, we briefly mention ways to improve numerical stability. We discuss the automated multilevel substructuring method (AMLS) before analyzing the impact of off-diagonal blocks in block matrices on eigenvalues. We use the results of this thesis and insights from recent papers to propose a new divide-and-conquer eigensolver and to suggest a change that makes AMLS more robust. We test the divide-and-conquer eigensolver on sparse structural engineering matrices with 10,000 to 150,000 degrees of freedom.

2010 Mathematics Subject Classification. 65F15, 65F50, 65Y04, 65Y20.

Edit: Revised master's thesis from April 2016 (PDF)

A Fast, Static WordPress Blog

Until recently, my blog felt slow although it contained only static content. In this post, I describe how I decreased loading times and the size of this website by

  • using WordPress as a static website generator,
  • not loading unused scripts and fonts, and
  • employing compression and client-side caches.

According to WebPagetest, a Firefox client in Frankfurt with a DSL connection and an empty cache needed to download 666 kB (28 requests) and had to wait approximately 7.7 seconds before being able to view my front page from April 4. With the static website, the same client has to wait about 3.1 seconds and transfer 165 kB (18 requests). As a side effect, the website now offers considerably less attack surface and user privacy has improved.

Continue reading A Fast, Static WordPress Blog

Code at gitlab.com

Finally, I managed to upload code to gitlab.com:

https://gitlab.com/christoph-conrads/christoph-conrads.name

The repository contains the files of this website, including my WordPress theme.

Many commits were modified on April 12; I retrofitted all files of my WordPress theme with license headers in order to avoid licensing issues.

Advice: Implementing a Solver using LAPACK

For my master's thesis I implemented multiple solvers for structured generalized eigenvalue problems using LAPACK. In this post, I will briefly discuss a method to simplify memory management and ways to catch programming errors as early as possible when implementing a solver for linear algebra problems that uses LAPACK. The advice in this post only supplements good programming practices like using version control systems and automated tests.
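
As one example of catching errors early (the post may give different or additional advice): every LAPACK routine reports failures through its info parameter, and checking it after each call costs next to nothing. Below is a minimal sketch using the LAPACK bindings shipped with SciPy; the advice itself is not tied to these bindings.

import numpy as np
from scipy.linalg import lapack

# symmetric positive definite test matrix
n = 50
F = np.random.default_rng(0).standard_normal((n, n))
A = F @ F.T + n * np.eye(n)

# dpotrf computes a Cholesky factorization; a negative info value signals
# an invalid argument, a positive value a non-positive definite leading minor
c, info = lapack.dpotrf(A, lower=1)
if info != 0:
    raise RuntimeError("dpotrf failed with info = %d" % info)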

Continue reading Advice: Implementing a Solver using LAPACK