The aim of t-SNE is to reduce high-dimensional data to a much lower-dimensional representation, often just one or two dimensions, while preserving as much of the data’s local structure as possible. This makes it particularly adept at revealing patterns, such as clusters, that traditional linear methods might overlook.
Step 1: Determine the “similarity” of all the points in the higher dimension
We iterate the following process for all the points. Let’s say there are n data points. For the sake of explanation, consider two points x_i and x_j, where i is not equal to j.
- Calculate the Euclidean distance between the two points.
- Imagine x_i as the center of a normal curve, and place x_j at its measured distance, for example, 0.58 units away.
- The height of the curve at that distance (the probability density) is the “unscaled” similarity of the two points.
- Scale this similarity by dividing it by the sum of all such similarity scores calculated from x_i to the other points (a code sketch follows this list). This normalization ensures that the similarity scores from each point sum to one, establishing a probabilistic framework for our data’s relationships.
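To make this concrete, here is a minimal Python sketch of the idea. It assumes, for simplicity, a single fixed variance for every point; real t-SNE tunes a per-point variance through the perplexity hyperparameter, as discussed in the notes below.

```python
import numpy as np

def high_dim_similarities(X, sigma=1.0):
    """Row-normalized Gaussian similarities between all pairs of rows of X.

    Simplification: one fixed sigma for every point, whereas real t-SNE
    finds a separate sigma per point from the perplexity setting.
    """
    # Squared Euclidean distances between all pairs of points
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Height of a normal curve centered at x_i, evaluated at x_j's distance
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)            # a point is not compared with itself
    P /= P.sum(axis=1, keepdims=True)   # scale each row to sum to one
    return P
```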
By repeating this process for all points, we build a comprehensive matrix of pairwise similarities. This matrix is the reference that the lower-dimensional map will later be matched against.
Note: The variance of the normal curve, which reflects the density of points surrounding x_i, can result in a different similarity measure from x_j to x_i than from x_i to x_j. To address this asymmetry, we average the two similarity scores for each pair of points. This ensures that the final similarity measurement is balanced and accurately reflects the mutual relationship between the points.
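Continuing the sketch above, the averaging takes one line; dividing by 2n, as in the original t-SNE paper, keeps the symmetrized scores summing to one over all pairs:

```python
X = np.random.rand(100, 5)            # toy data: 100 points in 5 dimensions
P = high_dim_similarities(X)          # each row sums to one
P_sym = (P + P.T) / (2 * P.shape[0])  # average both directions; matrix sums to one
```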
Another note: The necessity of normalizing similarity scores to sum to one arises from the need to account for the density of data points around x_i. In t-SNE, the variance of the normal curve (essentially the width of the distribution used for calculating similarity) is influenced by how densely data points are clustered around the target point. By ensuring that the similarity scores sum to one, t-SNE effectively integrates the local density of points into the algorithm. This adjustment allows the model to maintain a consistent interpretive scale across different regions of the data.
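In practice, each point’s variance is found by binary search so that the entropy of its similarity distribution matches a user-chosen perplexity. Here is a condensed sketch of that search; it takes the squared distances from one point to every other point (self excluded):

```python
def sigma_for_perplexity(sq_dists_i, perplexity=30.0, n_iter=50):
    """Binary-search a sigma whose similarity distribution has
    perplexity (2 ** entropy) close to the requested value."""
    lo, hi = 1e-10, 1e4
    for _ in range(n_iter):
        sigma = (lo + hi) / 2.0
        p = np.exp(-sq_dists_i / (2 * sigma ** 2))
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if 2.0 ** entropy > perplexity:
            hi = sigma  # distribution too spread out: narrow the curve
        else:
            lo = sigma  # distribution too concentrated: widen the curve
    return sigma
```

A dense cluster needs a small sigma to hit the same perplexity that a sparse region reaches with a large one, which is exactly the density adjustment described above.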
Step 2: Calculate the Similarity of Randomly Scattered Data in the Lower Dimension
Initially, points are dispersed randomly within the desired lower-dimensional space, such as along a number line. Pairwise similarities are then computed much as before: first the unscaled similarity, then the scaling. In this phase, however, a t-distribution is used instead of a normal distribution; its heavier tails give moderately distant points more room in the low-dimensional map, which helps keep clusters from crowding together.
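A matching sketch for the low-dimensional side; note that, following the original formulation, these scores are normalized over all pairs at once rather than per point:

```python
def low_dim_similarities(Y):
    """Pairwise similarities of the low-dimensional points Y, using a
    Student t-distribution with one degree of freedom."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + sq_dists)   # heavy-tailed t kernel instead of a Gaussian
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()           # scale so the whole matrix sums to one
```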
Due to the initial random scattering, data points that are similar may not be positioned closely together, so the low-dimensional similarity scores diverge from those calculated in the original high-dimensional dataset. This discrepancy is exactly what the iterative optimization in t-SNE works to eliminate as it repositions the points.
The objective of the second step in t-SNE is to iteratively adjust the positions of the points, either by drawing them closer or pushing them further apart. This adjustment aims to refine the similarity scores in the lower-dimensional space so that they closely mirror those in the original high-dimensional dataset. This careful alignment ensures the dimensional reduction retains the intrinsic relationships within the data.
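Putting the sketches together, one deliberately simplified update step could look like the following; it nudges each low-dimensional point so that the matrix Q drifts toward the high-dimensional targets P by gradient descent on their KL divergence (real t-SNE adds momentum and early exaggeration, omitted here):

```python
def tsne_step(Y, P, learning_rate=100.0):
    """One plain gradient-descent update of the low-dimensional points."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)
    Q = inv.copy()
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()
    # Gradient of KL(P || Q): attract pairs where P > Q, repel where P < Q
    grad = 4.0 * np.sum(
        ((P - Q) * inv)[:, :, None] * (Y[:, None, :] - Y[None, :, :]),
        axis=1,
    )
    return Y - learning_rate * grad

rng = np.random.default_rng(0)
Y = rng.normal(scale=1e-2, size=(P_sym.shape[0], 2))  # random initial scatter
for _ in range(500):
    Y = tsne_step(Y, P_sym)
```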
In summary, t-SNE is a non-linear dimensionality reduction technique employed in machine learning. The process begins by calculating the similarity of all data points in the high-dimensional space to form a similarity score matrix. A corresponding matrix is then generated for data points that are initially randomly positioned in a lower-dimensional space. Through iterative adjustments, involving slight movements of these points, t-SNE aims to align the lower-dimensional similarity matrix with that of the high-dimensional space. This methodical adjustment ensures that the two matrices closely resemble each other, effectively preserving the intrinsic relationships of the data through the reduction process.
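For practical use you would reach for an existing implementation rather than the toy sketches above, for example scikit-learn’s:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
# perplexity roughly sets how many neighbors each point "sees"
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(embedding.shape)  # (1797, 2)
```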
Thank you for reading my post, and stay tuned for future posts!