Special Topics in Linear Algebra
Special Topics in Linear Algebra was a course in both fundamental linear algebraic theory and the application of linear algebra to a wide range of problems.
Select course topics can be found below:
Theory:
- Field/Vector Space Characterization
- Rank-Nullity Theorem
- Isomorphism
- Change of Basis Matrices
- Cauchy-Schwarz / Triangle Inequality
- Spectral Theorem
- Power Vectors / Jordan Canonical Form
- Matrix Exponentials
- Kronecker Products
Applications:
- Gram-Schmidt Orthogonalization
- Polynomial Regression
- Spectral Graph Theory for Clustering
- Courant-Fischer Theorem
- Graph Matching (Sylvester Equation)
- Rayleigh Quotient (estimating eigenvalues)
- Gershgorin Circle Theorem
- Matrix Compression
- Cauchy Interlacing Theorem
Another requirement of the course was to write a technical paper on a topic related to the course. I chose to write mine on random projections. The paper covered the theory behind random projections as well as their applications, including the use of random projections with kernelized k-means to offset the increased time complexity that kernelization introduces. The same idea can be applied to any kernel method, yet it seems to have surprisingly few published papers dedicated to it and, to my knowledge, sees little real-world use.
The paper was code-intensive and saw me write over 4,500 lines of code. The code is very flexible and, among other things, allows for:
- Sampling points from any number of N-spheres in a space of any dimension, with each N-sphere centered at any specified point and having any radius (a minimal sketch of this kind of sampling appears after this list).
- Similar sampling with multivariate Gaussians.
- Automatic processing of k-means output into useful partitions, with the option of storing accuracy plots.
- Storing metadata during a grid search so that runtime and accuracy can easily be compared across grid parameters (e.g. type of random projection, whether the data was kernelized, kernel variance, projection dimension, etc.).
- File/data handling that automatically generates PNG, FIG, and MP4 (in the case of 3D data) files, sorted and named according to the data set used, the random projection method used, and various method-specific parameters (e.g. N-sphere dimension, kernel variance, kernel type, etc.).
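The paper's code itself is not reproduced on this page. As a minimal Python sketch of the first capability above (my own function and parameter names, not the paper's), points can be sampled uniformly from a sphere of arbitrary dimension, center, and radius by normalizing Gaussian vectors:

import numpy as np

def sample_sphere(n_points, dim, center=None, radius=1.0, rng=None):
    """Sample points uniformly from the surface of a sphere in R^dim
    with the given center and radius."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.zeros(dim) if center is None else np.asarray(center, dtype=float)
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    g = rng.normal(size=(n_points, dim))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return center + radius * g

# Example: 500 points on a 2-sphere of radius 3 centered at (1, 1, 1) in R^3.
points = sample_sphere(500, dim=3, center=[1, 1, 1], radius=3.0)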
More information on each section of the paper can be found below, as well as a PDF of the paper at the bottom of the page for in-window viewing.
Theory
This section gives an overview of the theoretical foundations of random projections, presents several constructive realizations, and includes examples and experimental verification.
After introductory remarks, my final paper presents the mathematical theory behind random projections. In particular, I introduce the Johnson-Lindenstrauss (JL) Lemma, which provides an upper bound on the dimension needed to project data down while preserving pairwise distances to within some margin of error. I go over the formal statement (seen below), an interpretation of the statement, and an exploration of what makes this result so important.
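For reference, a standard statement of the lemma (notation mine; the constant in the bound varies across formulations and may differ from the version used in the paper):

Lemma (Johnson-Lindenstrauss). Let $0 < \varepsilon < 1$ and let $X \subset \mathbb{R}^{N}$ be a set of $m$ points. If $k \ge 4\left(\varepsilon^{2}/2 - \varepsilon^{3}/3\right)^{-1} \ln m$, then there exists a linear map $f : \mathbb{R}^{N} \to \mathbb{R}^{k}$ such that
$$(1-\varepsilon)\,\lVert u - v \rVert^{2} \;\le\; \lVert f(u) - f(v) \rVert^{2} \;\le\; (1+\varepsilon)\,\lVert u - v \rVert^{2} \quad \text{for all } u, v \in X.$$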

Next, noting that the JL Lemma is not constructive, I introduce the general process of dimensionality reduction with random projections and present three methods of random projection.
I go on to explore these projection methods and how their accuracy depends on the degree to which the dimensionality of the data is reduced and on the allowed error.
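One of the methods discussed is Gaussian random projection; a minimal Python sketch of it (mine, not the paper's code) scales the random matrix so that squared distances are preserved in expectation:

import numpy as np

def gaussian_random_projection(X, target_dim, rng=None):
    """Project the rows of X (m x N) into R^target_dim with a dense
    Gaussian random matrix scaled by 1/sqrt(target_dim)."""
    rng = np.random.default_rng() if rng is None else rng
    n_features = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(n_features, target_dim))
    return X @ R

# Example: reduce 1,000 points in R^500 down to R^25.
X = np.random.default_rng(0).normal(size=(1000, 500))
X_low = gaussian_random_projection(X, target_dim=25)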
To conclude this part of my paper, I evaluate random projections as a general dimensionality reduction technique and compare it to another popular method, Principal Component Analysis (PCA). I include an overview of PCA as well as a review of the basic math required to understand the method (the spectral theorem and PSD covariance matrices).
In particular, I note that Gaussian random projection is robust even for non-convex data.
Figures: a PCA basis for a 1D subspace of R3; the spectrum of the covariance matrix of the data, demonstrating the ineffectiveness of PCA on non-convex data; and an accuracy heatmap of Gaussian random projection, demonstrating its effectiveness on non-convex data.
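For reference, a minimal sketch of the PCA computation reviewed in the paper (sample covariance matrix plus the spectral theorem), written in Python for illustration rather than taken from the paper's code:

import numpy as np

def pca_basis(X, n_components):
    """Return the top principal directions and the full spectrum of the
    sample covariance matrix of X (m x N)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = (Xc.T @ Xc) / (len(Xc) - 1)        # PSD sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # spectral theorem: real eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvecs[:, order[:n_components]], eigvals[order]

# Example: the best 1D subspace (first principal direction) of data in R^3.
X = np.random.default_rng(0).normal(size=(200, 3))
basis, spectrum = pca_basis(X, n_components=1)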
Applications
In this section I examine potential applications for random projections, in particular for expediting k-means clustering.
The next part of my paper deals with potential applications for random projections. I first provide a recap of the K-Means unsupervised clustering method. This method seeks to partition the data by determining optimal locations for "centroids", which define the centers of closed balls in the space. All points that fall within a given closed ball receive the same label. The centroid locations are optimal in the sense that they minimize the sum of squared errors over all points, where the error of a point is usually defined as the Euclidean distance between the point and its associated centroid.
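In symbols (my notation, consistent with the description above), with centroids $\mu_1, \dots, \mu_k$ and each cluster $C_j$ consisting of the points nearest to $\mu_j$, k-means seeks
$$\min_{\mu_1, \dots, \mu_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2}, \qquad C_j = \{\, x : \lVert x - \mu_j \rVert \le \lVert x - \mu_\ell \rVert \text{ for all } \ell \,\}.$$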
Example data can be seen here, along with the partition that K-Means would find.
Figures: unlabeled data (raw data sampled from two underlying convex distributions) and clustered data (the same data clustered using ordinary k-means).
I go on to highlight how ordinary K-Means fails on non-convex data, as seen here:
Figures: unlabeled data (raw data sampled from two underlying non-convex distributions) and clustered data (clustering via ordinary k-means, demonstrating its inability to capture the underlying structure of the data).
And how kernelization can resolve this failure by projecting the data into a higher dimension in which they are linearly separable, shown here (a sketch of one such lift follows the figures):
Figures: the same non-convex data projected into R3, along with convex boundaries (2-spheres) and a linear separator (gray); and the same data clustered using ordinary k-means after kernelization.
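As an illustration of the kind of lift pictured above (the paper's actual kernel may differ; this is only one standard choice), appending the squared norm to 2D points sends concentric rings to different heights in R3, where a plane separates them:

import numpy as np

def lift_to_r3(X):
    """Map 2D points (x1, x2) to (x1, x2, x1^2 + x2^2)."""
    return np.column_stack([X, np.sum(X**2, axis=1)])

# Example: two concentric circles of radius 1 and 3.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=400)
radii = np.repeat([1.0, 3.0], 200)
X2d = np.column_stack([radii * np.cos(theta), radii * np.sin(theta)])
X3d = lift_to_r3(X2d)   # the plane z = 5 now separates the two rings (z = 1 vs. z = 9)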
Finally, I show how random projections can be used to offset the runtime cost that stems from kernelization, providing experimental evidence (a conceptual sketch of the pipeline appears at the end of this page). Because of the different constructive realizations of random projections (including sparse ones), the fact that the JL Lemma provides only an upper bound on the projection dimension, and the fact that random projection works on any data set (not just those sampled from manifolds), big-O analysis is not included.
Figures: runtimes of standard kernel k-means (dashed) vs. projected kernel k-means (solid), demonstrating how random projections can expedite kernelized k-means; and results of kernel k-means on three 2-spheres, showing that clustering with Gaussian random projections for kernelized k-means is very effective on non-convex, partially intersecting data.
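The figures above summarize the actual experiments. Purely as a conceptual sketch (not the paper's implementation, and using hypothetical dimensions and names of my own), the overall pipeline of lifting the data, randomly projecting it down, and then running ordinary k-means might look like this:

import numpy as np
from sklearn.cluster import KMeans

def project_then_cluster(X_lifted, n_clusters, target_dim, rng=None):
    """Reduce the dimension of explicitly lifted (kernelized) data with a
    Gaussian random projection, then cluster with ordinary k-means so the
    clustering cost depends on target_dim rather than the lifted dimension."""
    rng = np.random.default_rng() if rng is None else rng
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim),
                   size=(X_lifted.shape[1], target_dim))
    X_low = X_lifted @ R
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_low)

# Example (hypothetical sizes): lifted features in R^200, projected to R^20, three clusters.
X_lifted = np.random.default_rng(1).normal(size=(1500, 200))
labels = project_then_cluster(X_lifted, n_clusters=3, target_dim=20)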