Sparse Matrix Ordering Example

Graph theory was identified as a powerful tool for sparse matrix computation when Seymour Parter used undirected graphs to model symmetric Gaussian elimination more than 30 years ago [28]. Graphs can be used to model symmetric matrices, factorizations and algorithms on non-symmetric matrices, such as fill paths in Gaussian elimination, strongly connected components in matrix irreducibility, bipartite matching, and alternating paths in linear dependence and structural singularity. Not only do graphs make it easier to understand and analyze sparse matrix algorithms, but they broaden the area of manipulating sparse matrices using existing graph algorithms and techniques [13]. In this section, we are going to illustrate how to use BGL in sparse matrix computation such as ordering algorithms. Before we go further into the sparse matrix algorithms, let us take a step back and review a few things.

Graphs and Sparse Matrices

A graph is fundamentally a way to represent a binary relation between objects. The nonzero pattern of a sparse matrix also describes a binary relation between unknowns. Therefore the nonzero pattern of a sparse matrix of a linear system can be modeled with a graph G(V,E), whose n vertices in V represent the n unknowns, and where there is an edge from vertex i to vertex j when A_ij is nonzero. Thus, when a matrix has a symmetric nonzero pattern, the corresponding graph is undirected.

Sparse Matrix Ordering Algorithms

The process for solving a sparse symmetric positive definite linear system, Ax=b, can be divided into four stages as follows:

A widely used but rather simple ordering algorithm is a variant of the Cuthill-McKee orderings, the reverse Cuthill-McKee ordering algorithm. This algorithm can be used as a preordering method to improve ordering in more sophisticated methods such as minimum degree algorithms [21].

Reverse Cuthill-McKee Ordering Algorithm

namespace boost {
  template <class Graph, class Vertex, class Color, class Degree>
  Vertex 
  pseudo_peripheral_pair(Graph& G, const Vertex& u, int& ecc,
                         Color color, Degree degree)
  {
    typename property_traits<Color>::value_type c = get(color, u);

    rcm_queue<Vertex, Degree> Q(degree);
    
    typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end;
    for (tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, u, Q, bfs_visitor<>(), color);

    ecc = Q.eccentricity(); 
    return Q.spouse();
  }
  
  template <class Graph, class Vertex, class Color, class Degree> 
  Vertex find_starting_node(Graph& G, Vertex r, Color c, Degree d) {
    int eccen_r, eccen_x;
    Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d);
    Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d);

    while (eccen_x > eccen_r) {
      r = x;
      eccen_r = eccen_x;
      x = y;
      y = pseudo_peripheral_pair(G, x, eccen_x, c, d);
    }
    return x;
  }
} // namespace boost

Figure 1: The BGL implementation of find_starting_node. The key part pseudo_peripheral_pair is BFS with a custom queue type virtually.

  template < class Graph, class Vertex, class OutputIterator, 
             class Color, class Degree >
  inline void 
  cuthill_mckee_ordering(Graph& G, 
                         Vertex s,
                         OutputIterator inverse_permutation, 
                         Color color, Degree degree)
  {
    typedef typename property_traits<Degree>::value_type DS;
    typename property_traits<Color>::value_type c = get(color, s);
    typedef indirect_cmp<Degree, std::greater<DS> > Compare;
    Compare comp(degree);
    fenced_priority_queue<Vertex, Compare > Q(comp);

    typedef cuthill_mckee_visitor<OutputIterator>  CMVisitor;
    CMVisitor cm_visitor(inverse_permutation);
    
    typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end;
    for (tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui)
      put(color, *ui, white(c));
    breadth_first_search(G, s, Q, cm_visitor, color);
  }

Figure 2: The BGL implementation of Cuthill-McKee algoithm.

Minimum Degree Ordering Algorithm

The pattern of another category of high-quality ordering algorithms in wide use is based on a greedy approach such that the ordering is chosen to minimize some quantity at each step of a simulated -step symmetric Gaussian elimination process. The algorithms using such an approach are typically distinguished by their greedy minimization criteria [34].

In graph terms, the basic ordering process used by most greedy algorithms is as follows:

One of the most important examples of such an algorithm is the Minimum Degree algorithm. At each step the minimum degree algorithm chooses the vertex with minimum degree in the corresponding graph as v^k. A number of enhancements to the basic minimum degree algorithm have been developed, such as the use of a quotient graph representation, mass elimination, incomplete degree update, multiple elimination, and external degree. See [35] for a historical survey of the minimum degree algorithm.

The BGL implementation of the Minimum Degree algorithm closely follows the algorithmic descriptions of the one in [21]. The implementation presently includes the enhancements for mass elimination, incomplete degree update, multiple elimination, and external degree.

In particular, we create a graph representation to improve the performance of the algorithm. It is based on a templated ``vector of vectors.'' The vector container used is an adaptor class built on top the STL std::vector class. Particular characteristics of this adaptor class include the following:

Note that this representation is similar to that used in Liu's implementation, with some important differences due to dynamic memory allocation. With the dynamic memory allocation we do not need to over-write portions of the graph that have been eliminated, allowing for a more efficient graph traversal. More importantly, information about the elimination graph is preserved allowing for trivial symbolic factorization. Since symbolic factorization can be an expensive part of the entire solution process, improving its performance can result in significant computational savings.

The overhead of dynamic memory allocation could conceivably compromise performance in some cases. However, in practice, memory allocation overhead does not contribute significantly to run-time for our implementation as shown in [] because it is not done very often and the cost gets amortized.

Proper Self-Avoiding Walk

The finite element method (FEM) is a flexible and attractive numerical approach for solving partial differential equations since it is straightforward to handle geometrically complicated domains. However, unstructured meshes generated by FEM does not provide an obvious labeling (numbering) of the unknowns while it is vital to have it for matrix-vector notation of the underlying algebraic equations. Special numbering techniques have been developed to optimize memory usage and locality of such algorithms. One novel technique is the self-avoiding walk [].

If we think a mesh is a graph, a self-avoiding walk(SAW) over an arbitrary unstructured two-dimensional mesh is an enumeration of all the triangles of the mesh such that two successive triangles shares an edge or a vertex. A proper SAW is a SAW where jumping twice over the same vertex is forbidden for three consecutive triangles in the walk. it can be used to improve parallel efficiency of several irregular algorithms, in particular issues related to reducing runtime memory access (improving locality) and interprocess communications. The reference [] has proved the existence of A PSAW for any arbitrary triangular mesh by extending an existing PSAW over a small piece of mesh to a larger part. The proof effectively provides a set of rules to construct a PSAW.

The algorithm family of constructing a PSAW on a mesh is to start from any random triangle of the mesh, choose new triangles sharing an edge with the current sub-mesh and extend the existing partial PSAW over the new triangles.

Let us define a dual graph of a mesh. Let a triangle in the mesh be a vertex and two triangles sharing an edge in the mesh means there is an edge between two vertices in the dual graph. By using a dual graph of a mesh, one way to implement the algorithm family of constructing a PSAW is to reuse BGL algorithms such as BFS and DFS with a customized visitor to provide operations during traversal. The customized visitor has the function tree_edge() which is to extend partial PSAW in case by case and function start_vertex() which is to set up the PSAW for the starting vertex.