Data Science Basics: (7) Correlation Analysis | by Mamdouh Refaat | May, 2024

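The formula of the Spearman coefficient is analogous to that of Pearson’s coefficient, but it uses the ranks of the values in each variable instead of the values themselves. It is usually denoted by the Greek letter ρ (rho). I will use the letter s to write it in Latin characters. With U and V denoting the ranks of x and y, and with bars denoting their means, the Spearman correlation coefficient s is given by:

$$ s = \frac{\sum_{i=1}^{n} (U_i - \bar{U})(V_i - \bar{V})}{\sqrt{\sum_{i=1}^{n} (U_i - \bar{U})^2 \, \sum_{i=1}^{n} (V_i - \bar{V})^2}} $$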

When we compare this formula with Pearson’s correlation coefficient r, we see that it only replaces the values of x and y with their ranks U and V. One could say that the Spearman coefficient is Pearson’s coefficient computed on the ranks! That’s why it is called Spearman’s rank correlation coefficient. Because it is computed from the ranks and not the values, it is also classified as a nonparametric measure.
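To make the "Pearson on ranks" idea concrete, here is a minimal sketch in plain Python. The helper names (`average_ranks`, `pearson`, `spearman`) and the sample data are my own illustrations, not part of the original article. Tied values are assumed to share the average of their positions, which is a common convention.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's coefficient s: Pearson's r applied to the ranks U and V."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Pearson r :", pearson(x, y))
print("Spearman s:", spearman(x, y))
```

Because y always increases with x, the ranks of the two variables are identical and s comes out as 1, while r stays below 1 because the relationship is not a straight line.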

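As with Pearson’s coefficient, the p-value is calculated from the t-distribution with n − 2 degrees of freedom, where n is the number of observations. The t-value is given by the following formula:

$$ t = s \sqrt{\frac{n - 2}{1 - s^2}} $$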
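The t-value for the Spearman coefficient takes the usual form t = s·sqrt((n − 2)/(1 − s²)) with n − 2 degrees of freedom. The sketch below computes it and then approximates the two-sided p-value by numerically integrating the t-distribution density with Simpson's rule. The function names and the example inputs (s = 0.6, n = 12) are hypothetical, and the numerical integration is only a stand-in I chose to keep the example dependency-free. In practice a statistics library routine (for example `scipy.stats.t.sf`) would be the more natural choice.

```python
import math

def spearman_t_value(s, n):
    """t = s * sqrt((n - 2) / (1 - s^2)), to be used with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(s) >= 1.0:
        # Perfect rank correlation: the formula diverges.
        return math.copysign(math.inf, s)
    return s * math.sqrt((n - 2) / (1.0 - s * s))

def t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))

def two_sided_p_value(t, dof, steps=2000):
    """Approximate P(|T| > |t|) as 1 minus the Simpson integral over [-|t|, |t|]."""
    a = abs(t)
    if math.isinf(a):
        return 0.0
    if a == 0.0:
        return 1.0
    h = 2 * a / steps  # steps must be even for Simpson's rule
    total = t_pdf(-a, dof) + t_pdf(a, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(-a + k * h, dof)
    return max(0.0, 1.0 - total * h / 3)

# Hypothetical example: s = 0.6 measured on n = 12 observations.
s, n = 0.6, 12
t = spearman_t_value(s, n)
p = two_sided_p_value(t, n - 2)
print(f"t = {t:.4f}, two-sided p = {p:.4f}")
```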

Table 3 shows the ranks U and V of the variables x and y in Table 2.

**Table 3:** the ranks of the variables x and y

In this case, the Spearman coefficient is exactly 1, indicating a perfect correlation between the ranks of the variables x and y.

Now comes the question: when do we use ranks (Spearman), and when do we use the values (Pearson)? We can summarize the answer in the following two situations:

(1) We use Pearson’s coefficient when we expect the values of both variables to be free of outliers and significant errors.

(2) We use the Spearman coefficient when we don’t care about the values themselves and only need to know the direction of the relationship between the two variables, or when there is a high likelihood of outliers and errors.

Pearson’s coefficient is usually a good choice for measurements originating from physical systems and for variables where the values matter. On the other hand, data from social studies, such as questionnaires in which respondents give ranked answers, are good candidates for Spearman’s coefficient.
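The effect of a single outlier on the two coefficients can be illustrated with a small sketch. The data below are invented for this purpose: a clean linear series in which one value is then replaced by a hypothetical recording error. The helper names are my own, and the simple `ranks` function assumes there are no tied values.

```python
import math

def ranks(values):
    """1-based ranks of the values; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def pearson(x, y):
    """Pearson's correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y_clean = [2 * v for v in x]   # perfectly linear: y = 2x
y_noisy = list(y_clean)
y_noisy[4] = 100               # hypothetical recording error: 10 entered as 100

pearson_clean, spearman_clean = pearson(x, y_clean), spearman(x, y_clean)
pearson_noisy, spearman_noisy = pearson(x, y_noisy), spearman(x, y_noisy)

print(f"clean data: Pearson = {pearson_clean:.3f}, Spearman = {spearman_clean:.3f}")
print(f"one outlier: Pearson = {pearson_noisy:.3f}, Spearman = {spearman_noisy:.3f}")
```

On this toy data, the single corrupted value pulls Pearson's coefficient down much further than Spearman's, because the outlier can shift a rank by only a few positions no matter how extreme its value is.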
