A cover tree is a tree data structure used for partitioning metric spaces to speed up operations like nearest neighbor, $k$-nearest neighbor, or range searches. In this blog post, I introduce cover trees, their uses, and their properties, and I measure the effect of the dimension of the metric space on the run time in an experiment with synthetic data.
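For orientation, here is how Beygelzimer, Kakade, and Langford (2006) define the cover tree. It stores the points in levels indexed by integers $i$ that decrease toward the leaves. Let $C_i$ denote the set of points at level $i$ and $d$ the metric. The tree then maintains three invariants. This summary follows that paper and may differ in details, such as the base $2$ of the radii, from the variant used in the post:

$$
\begin{aligned}
&\text{Nesting:} && C_i \subseteq C_{i-1}, \\
&\text{Covering tree:} && \forall p \in C_{i-1}\ \exists q \in C_i:\ d(p, q) < 2^i, \text{ and exactly one such } q \text{ is the parent of } p, \\
&\text{Separation:} && \forall p, q \in C_i,\ p \neq q:\ d(p, q) > 2^i.
\end{aligned}
$$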

# SuperLU vs Direct Substructuring

The eigenproblem solver in my master's thesis used SuperLU, a direct solver for systems of linear equations (SLEs). For the largest test problems, the eigensolver ran out of memory when decomposing the matrix. For this reason, I replaced SuperLU with direct substructuring in an attempt to reduce memory consumption. For this blog post, I measured set-up time, solve time, and memory consumption of SuperLU and direct substructuring. The measurements use real symmetric positive definite real-world matrices and SLEs with a variable number of right-hand sides. I will also highlight that SuperLU was deployed with a suboptimal parameter choice. Finally, I explain why the memory consumption of the matrix decomposition is the wrong objective function when you want to avoid running out of memory.
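To recall what direct substructuring computes, here is a single level written as block Gaussian elimination of the SLE $Ax = b$. The splitting of the unknowns into a block $A_{11}$ (e.g., interior unknowns of the substructures) and a block $A_{22}$ (e.g., interface unknowns) is an illustrative assumption. In practice, the step is applied recursively:

$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad
S = A_{22} - A_{21} A_{11}^{-1} A_{12},
$$

$$
x_2 = S^{-1} \left( b_2 - A_{21} A_{11}^{-1} b_1 \right), \quad
x_1 = A_{11}^{-1} \left( b_1 - A_{12} x_2 \right).
$$

For symmetric positive definite $A$, we have $A_{21} = A_{12}^T$, and the Schur complement $S$ is again symmetric positive definite. $S$ is typically much denser than $A_{22}$.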

# Another Advantage of Garbage Collection

Most literature about garbage collection contains a list of advantages of garbage collection. The lists of advantages known to me omit one advantage: certain concurrent algorithms require garbage collection.

I will demonstrate this point with an example in the programming language C, specifically its C11 revision. Consider a singly-linked list where the pointer to the head of the list is concurrently read and written by multiple threads:

```c
#include <stdatomic.h>
#include <stddef.h>

struct node
{
    size_t value;
    struct node* p_next;
};
typedef struct node node;

_Atomic(node*) p_head = ATOMIC_VAR_INIT(NULL);
```

The singly-linked list is considered empty if `p_head` is a null pointer. The thread T1 is only reading the list:

```c
void* T1(void* args)
{
    node* p = atomic_load(&p_head);
    // more computations
    // stop referencing the head of the list
    return NULL;
}
```

The thread T2 removes the first element of the singly-linked list by trying to overwrite the pointer stored in `p_head`:

```c
#include <stdlib.h>

void* T2(void* args)
{
    node* p = NULL;
    node* p_next = NULL;
    node* p_expected = NULL;

    do {
        p = atomic_load(&p_head);
        if( !p )
            break;
        p_next = p->p_next;
        p_expected = p;
    } while(!atomic_compare_exchange_strong(&p_head, &p_expected, p_next));

    // ensure other threads stopped referencing p
    free(p);

    return NULL;
}
```

T2 relies on compare-and-swap (`atomic_compare_exchange_strong` in the condition of the `do`-`while` loop) to detect interference of other threads.

After T2 successfully updates `p_head`, the memory referenced by `p` may be freed only after all threads have stopped referencing it. In general, this requires a garbage collector. Waiting does not help because the threads holding references might have been suspended by the operating system. Scanning the stack, the heap, and the other threads' CPU registers is impossible in many languages or not covered by the programming language standard. Besides, such a scan is an integral part of any tracing garbage collector.

In the introduction, I wrote that certain concurrent algorithms **require** garbage collection. More accurately: in the absence of special guarantees, certain concurrent algorithms require garbage collection. For example, suppose we can guarantee that threads hold their references to the singly-linked list only for a bounded amount of time. Then there is no need for garbage collection, and the Linux kernel exploits this fact in its read-copy-update (RCU) mechanism.
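The role of such a guarantee can be illustrated with a minimal sketch of deferred reclamation. Removed nodes are not freed immediately. Instead, the popping thread prepends them to a retire list it owns. That list is freed only once no thread can hold a reference anymore, e.g., after all threads reading the list were joined. This is not the kernel's RCU implementation. The names `rnode`, `push`, `pop_retire`, and `reclaim_retired` are hypothetical, and thread creation is left out. A separate link `p_retired` is used because readers may still follow `p_next` of a removed node. Because no node is freed while threads run, the sketch also cannot suffer from the ABA problem caused by memory reuse.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch. p_retired links nodes that were
 * removed from the list but may still be read by other threads; p_next is
 * left untouched because readers may still follow it. */
typedef struct rnode
{
    size_t value;
    struct rnode* p_next;
    struct rnode* p_retired;
} rnode;

static _Atomic(rnode*) p_rhead; /* zero-initialized: empty list */

/* Lock-free push; it never dereferences nodes owned by other threads. */
void push(size_t value)
{
    rnode* p_new = malloc(sizeof(*p_new));
    if(!p_new)
        abort();
    p_new->value = value;
    p_new->p_retired = NULL;

    rnode* p_expected = atomic_load(&p_rhead);
    do {
        p_new->p_next = p_expected;
    } while(!atomic_compare_exchange_weak(&p_rhead, &p_expected, p_new));
}

/* Removes the head and stores its value in *p_value; returns 0 if the list
 * is empty. The removed node is prepended to the caller-owned retire list
 * *pp_retired instead of being freed. Reading p->p_next in the loop
 * condition is safe because retired nodes are not freed yet. */
int pop_retire(size_t* p_value, rnode** pp_retired)
{
    rnode* p = atomic_load(&p_rhead);
    do {
        if(!p)
            return 0;
    } while(!atomic_compare_exchange_weak(&p_rhead, &p, p->p_next));

    *p_value = p->value;
    p->p_retired = *pp_retired;
    *pp_retired = p;
    return 1;
}

/* Frees all nodes on the retire list and returns their number. Call this
 * only once no thread can hold a reference, e.g., after all threads that
 * read the list were joined. */
size_t reclaim_retired(rnode** pp_retired)
{
    size_t count = 0;
    rnode* p = *pp_retired;
    while(p) {
        rnode* p_next = p->p_retired;
        free(p);
        p = p_next;
        ++count;
    }
    *pp_retired = NULL;
    return count;
}
```

In a threaded program, `reclaim_retired` would be called, for instance, after `pthread_join` has returned for every thread that might still read the list.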

# Master's Thesis: Projection Methods for Generalized Eigenvalue Problems

My master's thesis deals with dense and sparse solvers for generalized eigenvalue problems (GEPs) with Hermitian positive semidefinite matrices. Key results are

- structure-preserving backward error bounds computable in linear time,
- the runtime of GSVD-based dense GEP solvers is within a factor of 5 of the fastest GEP solver with Netlib LAPACK in my tests,
- computing the GSVD directly is up to 20 times slower than the computation by means of QR factorizations and the CS decomposition with Netlib LAPACK in my tests,
- given a pair of matrices with 2x2 block structure, I show how to minimize eigenvalue perturbation by off-diagonal blocks with the aid of graph algorithms, and
- I propose a new multilevel eigensolver for sparse GEPs that is able to compute up to 1000 eigenpairs on a cluster node with two dual-core CPUs and 16 GB virtual memory limit for problems with up to 150,000 degrees of freedom in less than eleven hours.

The revised edition of the thesis with fixed typos is here (PDF), the source code is available here, and the abstract is below. In February, I already gave a talk on the preliminary thesis results; more details can be found in the corresponding blog post.

## Abstract

This thesis treats the numerical solution of generalized eigenvalue problems (GEPs) $Ax = \lambda Bx$, where $A$, $B$ are Hermitian positive semidefinite (HPSD). We discuss problem and solution properties, accuracy assessment of solutions, aspects of computations in finite precision, the connection to the finite element method (FEM), dense solvers, and projection methods for these GEPs. All results are directly applicable to real-world problems.

We present properties and origins of GEPs with HPSD matrices and briefly mention the FEM as a source of such problems.

With respect to accuracy assessment of solutions, we address quickly computable and structure-preserving backward error bounds and their corresponding condition numbers for GEPs with HPSD matrices. There is an abundance of literature on backward error measures possessing one of these features; the backward error in this thesis provides both.

In Chapter 3, we elaborate on dense solvers for GEPs with HPSD matrices. The standard solver reduces the GEP to a standard eigenvalue problem; it is fast but requires positive definite mass matrices and is only conditionally backward stable. The QZ algorithm for general GEPs is backward stable, but it is also much slower and does not preserve any problem properties. We present two new backward stable and structure-preserving solvers, one using deflation of infinite eigenvalues, the other one using the generalized singular value decomposition (GSVD). We analyze backward stability and computational complexity. Unlike the QZ algorithm, both solvers are competitive with the standard solver in our tests. Moreover, we propose a new solver combining the speed of deflation with the ability of GSVD-based solvers to handle singular matrix pencils.

Finally, we consider black-box solvers based on projection methods to compute the eigenpairs with the smallest eigenvalues of large, sparse GEPs with Hermitian positive definite (HPD) matrices. After reviewing common methods for spectral approximation, we briefly mention ways to improve numerical stability. We discuss the automated multilevel substructuring method (AMLS) before analyzing the impact of off-diagonal blocks in block matrices on eigenvalues. We use the results of this thesis and insights from recent papers to propose a new divide-and-conquer eigensolver and to suggest a change that makes AMLS more robust. We test the divide-and-conquer eigensolver on sparse structural engineering matrices with 10,000 to 150,000 degrees of freedom.

2010 Mathematics Subject Classification. 65F15, 65F50, 65Y04, 65Y20.

**Edit**: Revised master's thesis from April 2016 (PDF)

# A Fast, Static WordPress Blog

Until recently, my blog felt slow although it contained only static content. In this post, I describe how I decreased loading times and the size of this website by

- using WordPress as a static website generator,
- not loading unused scripts and fonts, and
- employing compression and client-side caches.

According to WebPagetest, a Firefox client in Frankfurt with a DSL connection and an empty cache had to download 666 kB (28 requests) and wait approximately 7.7 seconds before it could display my front page as of April 4. With the static website, the same client waits about 3.1 seconds and transfers 165 kB (18 requests). As a side effect, the website now offers considerably less attack surface, and user privacy has improved.

# Code at gitlab.com

Finally, I managed to upload code to gitlab.com:

https://gitlab.com/christoph-conrads/christoph-conrads.name

The repository contains:

- The WordPress theme of this website, a child theme of "Twenty Fourteen",
- measurement data and scripts of the article Performance and Accuracy of xPTEQR, and
- the Matlab code for the article Spectral Norm Bounds for Hermitian Matrices.

Many commits were modified on April 12; I retrofitted all files of my WordPress theme with license headers in order to avoid licensing issues.

# Advice: Implementing a Solver using LAPACK

For my master's thesis, I implemented multiple solvers for structured generalized eigenvalue problems using LAPACK. In this post, I will briefly discuss a method to simplify memory management and ways to catch programming errors as early as possible when implementing a LAPACK-based solver for linear algebra problems. The advice in this post only supplements good programming practices like using version control systems and automated tests.
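The post's concrete method is not reproduced here. As a hedged illustration of catching errors early, the sketch below shows two generic techniques. The first fills freshly allocated workspace with NaNs, so that reads of uninitialized entries show up as NaNs in the results. The second checks LAPACK's usual requirement `LDA >= max(1, M)` for an `M`-by-`N` column-major matrix before a call. All function names are hypothetical, and no LAPACK routine is called.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n doubles and fills them with quiet NaNs so
 * that reading workspace entries before writing them poisons the result. */
double* alloc_poisoned(size_t n)
{
    double* p = malloc((n > 0 ? n : 1) * sizeof(double));
    if(!p)
        return NULL;
    for(size_t i = 0; i < n; ++i)
        p[i] = NAN;
    return p;
}

/* Returns 1 if all n entries are finite, i.e., no NaN leaked into x. */
int all_finite(const double* x, size_t n)
{
    for(size_t i = 0; i < n; ++i)
        if(!isfinite(x[i]))
            return 0;
    return 1;
}

/* Hypothetical check of the leading-dimension rule lda >= max(1, m). */
int lda_is_valid(int m, int lda)
{
    return lda >= (m > 1 ? m : 1);
}
```

A solver could check `all_finite` on its outputs in debug builds and assert `lda_is_valid` before every LAPACK call.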

# Type Coercions and Floating-Point Types

Consider a language where we only allow automatic, implicit type conversions (coercions) among numeric types if every value of the source type can be represented as a value of the target type, i.e., there is no truncation and no round-off. Let us call this kind of coercion *value-preserving coercion*. With such a strict coercion rule, an unsigned integer with four bytes can be coerced into a signed integer with eight bytes, but the compiler (interpreter) will not coerce signed integers to unsigned integers. In this post, we highlight a reason why coercions from single-precision to double-precision floating-point types may be undesirable although they are value-preserving.
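Here is one well-known surprise caused by a value-preserving coercion in C. In a mixed comparison, a `float` operand is implicitly converted to `double`. This is an illustrative example, not necessarily the reason discussed in the post. The function names are hypothetical.

```c
/* Comparing a float with a double converts the float to double. The
 * conversion is exact (value-preserving), yet 0.1f and 0.1 are different
 * binary approximations of 1/10, so the comparison is false. */
int float_equals_double_tenth(void)
{
    float f = 0.1f;
    double d = 0.1;
    return f == d; /* f is converted to double before comparing */
}

/* Converting in the other direction rounds d to float first; afterwards,
 * both operands are the float nearest to 1/10 and compare equal. */
int float_equals_rounded_tenth(void)
{
    float f = 0.1f;
    double d = 0.1;
    return f == (float)d;
}
```

Here, the first function returns 0 and the second returns 1.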

# Building NumPy and SciPy with Intel Compilers and Intel MKL

NumPy and SciPy rely on BLAS and LAPACK for basic linear algebra functionality like matrix-vector multiplication, linear system solves, or routines for eigenvalue computation. The Intel Math Kernel Library (MKL) is a mathematics library providing, amongst other things, fast and multithreaded implementations of BLAS and LAPACK. In this blog post, I describe how to compile NumPy and SciPy with the Intel compilers using Intel MKL on Linux. Note that NumPy and SciPy can be linked to MKL without the Intel compilers by providing the proper linker options to, e.g., GCC, and I will briefly explain this as well.


# Talk: Master's Thesis

Yesterday I gave a talk on the preliminary results of my master's thesis on *Projection Methods for Generalized Eigenvalue Problems*. Below is the announcement for the talk; it mentions only complex (Hermitian) matrices, but all results are immediately applicable to real matrices. The slides are here (PDF).

## Projection Methods for Generalized Eigenvalue Problems

In this talk, we will discuss backward errors, dense solvers, and a new projection method for generalized eigenvalue problems (GEPs) with Hermitian positive semidefinite (HPSD) matrices.

We will present properties and origins of such GEPs. We will also address quickly computable and structure-preserving backward error bounds for these kinds of GEPs. There is an abundance of literature on backward error measures possessing one of these features, but only recently did the author come across a backward error providing both.

We will elaborate on dense solvers for GEPs with HPSD matrices. The standard solver for GEPs with Hermitian matrices is fast but requires positive definite mass matrices and is only conditionally backward stable. The QZ algorithm for general GEPs is backward stable, but it is also orders of magnitude slower and does not preserve any problem properties. In the talk, we will present two new backward stable and structure-preserving solvers, one using deflation, the other one using the generalized singular value decomposition (GSVD). Unlike the QZ algorithm, both solvers are competitive with the standard solver in our tests.

Finally, we will touch on a new solver for large, sparse GEPs.