Special Topics in Linear Algebra
Special Topics in Linear Algebra was a course in both fundamental linear algebraic theory and the application of linear algebra to a wide range of problems.
Select course topics can be found below:
Theory:
- Field/Vector Space Characterization
- Rank-Nullity Theorem
- Isomorphism
- Change of Basis Matrices
- Cauchy-Schwarz / Triangle Inequality
- Spectral Theorem
- Power Vectors / Jordan Canonical Form
- Matrix Exponentials
- Kronecker Products
Applications:
- Gram-Schmidt Orthogonalization
- Polynomial Regression
- Spectral Graph Theory for Clustering
- Courant-Fischer Theorem
- Graph Matching (Sylvester Equation)
- Rayleigh Quotient (estimating eigenvalues)
- Gershgorin Circle Theorem
- Matrix Compression
- Cauchy Interlacing Theorem
Another requirement of the course was to write a technical paper on a topic related to the course. I chose to write mine on random projections. The paper covered the theory behind random projections as well as their applications, including the use of random projections with kernelized k-means to offset the increased time complexity that kernelization introduces. The same idea can be applied to any kernel method, yet it seems to have surprisingly few published papers dedicated to it and, to my knowledge, sees little real-world use.
The paper was code-intensive and saw me write over 4,500 lines of code. The code is very flexible and, among other things, allows for:
- Sampling points from any number of N-spheres in a space of any dimension, with each N-sphere centered at any specified point and having any radius (a minimal sketch of this kind of sampling appears after this list).
- Similar sampling with multivariate Gaussians.
- Automatic processing of k-means output into useful partitions, with the option of storing accuracy plots.
- Storing metadata during a grid search so that runtime and accuracy can easily be compared across grid parameters (e.g. type of random projection, whether the data was kernelized, kernel variance, projection dimension, etc.).
- File/data handling that automatically generates PNG, FIG, and MP4 (in the case of 3D data) files, sorted and named according to the data set used, the random projection method used, and various method-specific parameters (e.g. N-sphere dimension, kernel variance, kernel type, etc.).
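The paper's code itself is not reproduced on this page. As a minimal Python sketch of the first capability above (my own function and parameter names, not the paper's), points can be sampled uniformly from a sphere of arbitrary dimension, center, and radius by normalizing Gaussian vectors:

import numpy as np

def sample_sphere(n_points, dim, center=None, radius=1.0, rng=None):
    """Sample points uniformly from the surface of a sphere in R^dim
    with the given center and radius."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.zeros(dim) if center is None else np.asarray(center, dtype=float)
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    g = rng.normal(size=(n_points, dim))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return center + radius * g

# Example: 500 points on a 2-sphere of radius 3 centered at (1, 1, 1) in R^3.
points = sample_sphere(500, dim=3, center=[1, 1, 1], radius=3.0)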
More information on each section of the paper can be found below, as well as a PDF of the paper at the bottom of the page for in-window viewing.
Theory
This section gives an overview of the theoretical foundations of random projections, presents several constructive realizations, and includes examples and experimental verification.
After introductory remarks, my final paper presents the mathematical theory behind random projections. In particular, I introduce the Johnson-Lindenstrauss (JL) Lemma, which provides an upper bound on the dimension needed to project data down while preserving pairwise distances to within some margin of error. I go over the formal statement (seen below), an interpretation of the statement, and an exploration of what makes this result so important.
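For reference, a standard statement of the lemma (notation mine; the constant in the bound varies across formulations and may differ from the version used in the paper):

Lemma (Johnson-Lindenstrauss). Let $0 < \varepsilon < 1$ and let $X \subset \mathbb{R}^{N}$ be a set of $m$ points. If $k \ge 4\left(\varepsilon^{2}/2 - \varepsilon^{3}/3\right)^{-1} \ln m$, then there exists a linear map $f : \mathbb{R}^{N} \to \mathbb{R}^{k}$ such that
$$(1-\varepsilon)\,\lVert u - v \rVert^{2} \;\le\; \lVert f(u) - f(v) \rVert^{2} \;\le\; (1+\varepsilon)\,\lVert u - v \rVert^{2} \quad \text{for all } u, v \in X.$$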

Next, noting that the JL Lemma is not constructive, I introduce the general process of dimensionality reduction with random projections and present three methods of random projection.
I go on to explore these projection methods and how their accuracy depends on the degree to which the dimensionality of the data is reduced and on the allowed error.
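One of the methods discussed is Gaussian random projection; a minimal Python sketch of it (mine, not the paper's code) scales the random matrix so that squared distances are preserved in expectation:

import numpy as np

def gaussian_random_projection(X, target_dim, rng=None):
    """Project the rows of X (m x N) into R^target_dim with a dense
    Gaussian random matrix scaled by 1/sqrt(target_dim)."""
    rng = np.random.default_rng() if rng is None else rng
    n_features = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(n_features, target_dim))
    return X @ R

# Example: reduce 1,000 points in R^500 down to R^25.
X = np.random.default_rng(0).normal(size=(1000, 500))
X_low = gaussian_random_projection(X, target_dim=25)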
To conclude this part of my paper, I evaluate random projections as a general dimensionality reduction technique and compare it to another popular method, Principal Component Analysis (PCA). I include an overview of PCA as well as a review of the basic math required to understand the method (the spectral theorem and PSD covariance matrices).
In particular, I note that Gaussian random projection is robust even for non-convex data.
Figures: a PCA basis for a 1D subspace of R3; the spectrum of the covariance matrix of the data, demonstrating the ineffectiveness of PCA on non-convex data; and an accuracy heatmap of Gaussian random projection, demonstrating its effectiveness on non-convex data.
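For reference, a minimal sketch of the PCA computation reviewed in the paper (sample covariance matrix plus the spectral theorem), written in Python for illustration rather than taken from the paper's code:

import numpy as np

def pca_basis(X, n_components):
    """Return the top principal directions and the full spectrum of the
    sample covariance matrix of X (m x N)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = (Xc.T @ Xc) / (len(Xc) - 1)        # PSD sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # spectral theorem: real eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvecs[:, order[:n_components]], eigvals[order]

# Example: the best 1D subspace (first principal direction) of data in R^3.
X = np.random.default_rng(0).normal(size=(200, 3))
basis, spectrum = pca_basis(X, n_components=1)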
Applications
In this section I examine potential applications for random projections, in particular for expediting k-means clustering.
The next part of my paper deals with potential applications for random projections. I first provide a recap of the K-Means unsupervised clustering method. This method seeks to partition the data by determining optimal locations for "centroids", which define the centers of closed balls in the space. All points that fall within a given closed ball receive the same label. The centroid locations are optimal in the sense that they minimize the sum of squared errors over all points, where the error of a point is usually defined as the Euclidean distance between the point and its associated centroid.
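In symbols (my notation, consistent with the description above), with centroids $\mu_1, \dots, \mu_k$ and each cluster $C_j$ consisting of the points nearest to $\mu_j$, k-means seeks
$$\min_{\mu_1, \dots, \mu_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2}, \qquad C_j = \{\, x : \lVert x - \mu_j \rVert \le \lVert x - \mu_\ell \rVert \text{ for all } \ell \,\}.$$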
Example data can be seen here, along with the partition that K-Means would find.
Figures: unlabeled data (raw data sampled from two underlying convex distributions) and clustered data (the same data clustered using ordinary k-means).
I go on to highlight how ordinary K-Means fails on non-convex data, as seen here:
Figures: unlabeled data (raw data sampled from two underlying non-convex distributions) and clustered data (clustering via ordinary k-means, demonstrating its inability to capture the underlying structure of the data).
And how kernelization can resolve this failure by projecting the data into a higher dimension in which they are linearly separable, shown here (a sketch of one such lift follows the figures):
Figures: the same non-convex data projected into R3, along with convex boundaries (2-spheres) and a linear separator (gray); and the same data clustered using ordinary k-means after kernelization.
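As an illustration of the kind of lift pictured above (the paper's actual kernel may differ; this is only one standard choice), appending the squared norm to 2D points sends concentric rings to different heights in R3, where a plane separates them:

import numpy as np

def lift_to_r3(X):
    """Map 2D points (x1, x2) to (x1, x2, x1^2 + x2^2)."""
    return np.column_stack([X, np.sum(X**2, axis=1)])

# Example: two concentric circles of radius 1 and 3.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=400)
radii = np.repeat([1.0, 3.0], 200)
X2d = np.column_stack([radii * np.cos(theta), radii * np.sin(theta)])
X3d = lift_to_r3(X2d)   # the plane z = 5 now separates the two rings (z = 1 vs. z = 9)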
Finally, I show how random projections can be used to offset the runtime cost that stems from kernelization, providing experimental evidence (a conceptual sketch of the pipeline appears at the end of this page). Because of the different constructive realizations of random projections (including sparse ones), the fact that the JL Lemma provides only an upper bound on the projection dimension, and the fact that random projection works on any data set (not just those sampled from manifolds), big-O analysis is not included.
Figures: runtimes of standard kernel k-means (dashed) vs. projected kernel k-means (solid), demonstrating how random projections can expedite kernelized k-means; and results of kernel k-means on three 2-spheres, showing that clustering with Gaussian random projections for kernelized k-means is very effective on non-convex, partially intersecting data.
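The figures above summarize the actual experiments. Purely as a conceptual sketch (not the paper's implementation, and using hypothetical dimensions and names of my own), the overall pipeline of lifting the data, randomly projecting it down, and then running ordinary k-means might look like this:

import numpy as np
from sklearn.cluster import KMeans

def project_then_cluster(X_lifted, n_clusters, target_dim, rng=None):
    """Reduce the dimension of explicitly lifted (kernelized) data with a
    Gaussian random projection, then cluster with ordinary k-means so the
    clustering cost depends on target_dim rather than the lifted dimension."""
    rng = np.random.default_rng() if rng is None else rng
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim),
                   size=(X_lifted.shape[1], target_dim))
    X_low = X_lifted @ R
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_low)

# Example (hypothetical sizes): lifted features in R^200, projected to R^20, three clusters.
X_lifted = np.random.default_rng(1).normal(size=(1500, 200))
labels = project_then_cluster(X_lifted, n_clusters=3, target_dim=20)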