The aim of t-SNE is to effectively reduce the complexity of high-dimensional data into lower dimensions — often just one or two — while maximizing the retention of important information. This makes it particularly adept at revealing patterns in data that traditional methods might overlook.

## Step 1: Determine the “similarity” of all the points in the higher dimension

We repeat the following process for every point. Say there are `n` data points, and consider two of them, `x_i` and `x_j`, where `i` is not equal to `j`.

- Calculate the Euclidean distance between the two points.
- Imagine `x_i` as the center of a normal curve, and place `x_j` at its measured distance, for example, 0.58 units away.
- The height of the curve at `x_j` (its probability density) is the “unscaled” similarity of the two points.
- We scale this similarity by dividing it by the sum of all such similarity scores calculated from `x_i` to the other points. This normalization ensures that the similarity scores from each point sum to one, establishing a probabilistic framework for our data’s relationships.

By repeating this process for all points, we build a comprehensive map of similarities, crucial for the effective reduction of dimensionality in t-SNE, enhancing our ability to detect patterns that other methods might miss.
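The steps above can be sketched in NumPy. This is a minimal illustration, not a full implementation: `sigma` is fixed here for simplicity, whereas real t-SNE tunes the width per point (see the notes below).

```python
import numpy as np

def conditional_similarities(X, sigma=1.0):
    """Similarity of every point x_j as seen from every point x_i.

    A fixed sigma is used for simplicity; real t-SNE chooses a
    per-point sigma based on the local density of the data.
    """
    # Squared Euclidean distances between all pairs of points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Unscaled similarity: height of a Gaussian centred on x_i at x_j.
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)           # a point is not its own neighbour
    P /= P.sum(axis=1, keepdims=True)  # each row now sums to one
    return P
```

Each row of the returned matrix sums to one, and nearby points get larger entries than distant ones.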

Note: The variance of the normal curve, reflecting the density of points surrounding `x_i`, can result in a different similarity measure from `x_j` to `x_i` than from `x_i` to `x_j`. To address this asymmetry, we average the similarity scores between each pair of points. This step ensures that the final similarity measurement is balanced and accurately reflects the mutual relationship between the points, enhancing the robustness of t-SNE’s dimensional reduction.
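The averaging described in the note can be sketched as follows, assuming `P_cond` holds the scaled, directed similarity scores with one row per point:

```python
import numpy as np

def symmetrize(P_cond):
    """Average the two directed similarity scores for every pair so the
    final measure is symmetric.

    P_cond[i, j] is the scaled similarity from x_i to x_j (each row sums
    to one). Note: the original t-SNE paper additionally divides by the
    number of points so the whole matrix sums to one.
    """
    return (P_cond + P_cond.T) / 2.0
```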

Note: The necessity of normalizing similarity scores to sum to one arises from the need to account for the density of data points around `x_i`. In t-SNE, the variance of the normal curve (essentially the width of the distribution used for calculating similarity) is influenced by how densely data points are clustered around the target point. By ensuring that the similarity scores sum to one, t-SNE effectively integrates the local density of points into the algorithm. This adjustment allows the model to maintain a consistent interpretative scale across different regions of the data.
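In practice, that density-aware width is found per point by binary search: pick the `sigma` whose similarity row has a chosen *perplexity* (roughly, an effective number of neighbours). A sketch of the idea follows; `target_perplexity` is a user-chosen knob, and library implementations typically search over the precision `1/(2*sigma**2)` instead.

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target_perplexity=30.0, tol=1e-5):
    """Binary-search the Gaussian width for one point so that the
    perplexity (2 ** entropy) of its similarity row hits a target.

    sq_dists_i: squared distances from point i to every other point.
    """
    lo, hi = 1e-10, 1e10
    for _ in range(100):
        sigma = (lo + hi) / 2
        p = np.exp(-sq_dists_i / (2 * sigma ** 2))
        p /= p.sum()
        perplexity = 2 ** -np.sum(p * np.log2(p + 1e-12))
        if abs(perplexity - target_perplexity) < tol:
            break
        if perplexity > target_perplexity:
            hi = sigma  # row too flat: shrink the curve
        else:
            lo = sigma  # row too peaked: widen the curve
    return sigma
```

A wider curve spreads similarity over more neighbours (higher perplexity); a narrower curve concentrates it on the closest few.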

## Step 2: Calculate the Similarity of Randomly Scattered Data in the Lower Dimension

Initially, points are dispersed randomly within a desired lower-dimensional space, such as along a number line. Subsequently, pairwise similarities are computed similarly to the previous method — first determining the unscaled similarity and then scaling it. However, in this phase, a **t-distribution** is employed instead of a normal distribution.
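The low-dimensional similarities follow the same recipe with the heavier-tailed kernel. A sketch using a Student t-distribution with one degree of freedom, as in standard t-SNE:

```python
import numpy as np

def low_dim_similarities(Y):
    """Pairwise similarities q_ij for the low-dimensional points Y,
    using a Student t-distribution with one degree of freedom."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    inv = 1.0 / (1.0 + sq_dists)  # heavy-tailed kernel
    np.fill_diagonal(inv, 0.0)    # a point is not its own neighbour
    return inv / inv.sum()        # normalise over all pairs at once
```

Note one difference from Step 1: here the normalization runs over all pairs rather than row by row, so the whole matrix sums to one.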

Due to the initial random scattering, data points that are similar may not be positioned closely together, resulting in dissimilarities that diverge from those calculated in the original high-dimensional dataset. This discrepancy is crucial for the iterative optimization process that t-SNE undertakes to accurately reposition the points.

The objective of the second step in t-SNE is to iteratively adjust the positions of the points, either by drawing them closer or pushing them further apart. This adjustment aims to refine the similarity scores in the lower-dimensional space so that they closely mirror those in the original high-dimensional dataset. This careful alignment ensures the dimensional reduction retains the intrinsic relationships within the data.
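That iterative adjustment is gradient descent on the mismatch (the KL divergence) between the two sets of similarity scores. A sketch of the gradient, assuming `P` is the symmetric high-dimensional similarity matrix:

```python
import numpy as np

def tsne_gradient(P, Y):
    """Gradient of the KL divergence between P (high-dim similarities)
    and Q (low-dim similarities) with respect to the embedding Y.
    Moving against it pulls similar points together and pushes
    dissimilar ones apart."""
    diffs = Y[:, None, :] - Y[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    inv = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()
    # 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)
    return 4.0 * np.einsum('ij,ijk->ik', (P - Q) * inv, diffs)
```

Each iteration would then update the points with `Y -= learning_rate * tsne_gradient(P, Y)` (real implementations add momentum and early exaggeration on top of this).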

In summary, t-SNE is a non-linear dimensionality reduction technique employed in machine learning. The process begins by calculating the similarity of all data points in the high-dimensional space to form a similarity score matrix. A corresponding matrix is then generated for data points that are initially randomly positioned in a lower-dimensional space. Through iterative adjustments, involving slight movements of these points, t-SNE aims to align the lower-dimensional similarity matrix with that of the high-dimensional space. This methodical adjustment ensures that the two matrices closely resemble each other, effectively preserving the intrinsic relationships of the data through the reduction process.
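In practice you rarely code this loop by hand. A quick end-to-end run with scikit-learn’s `TSNE` (assuming scikit-learn is installed), embedding a small slice of the digits dataset from 64 dimensions down to 2:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:200]  # subsample to keep the demo fast
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(embedding.shape)  # (200, 2)
```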

Thank you for reading my post, and stay tuned for future posts!