The differences between Pearson and Spearman correlations

Ishan | Virginia Tech & IIT Delhi

2 min read · Oct 3, 2023

Let’s quickly review our understanding of the Pearson and Spearman correlation coefficients.

The primary differences between Pearson and Spearman correlations are:

Linearity vs. Monotonicity: Pearson correlation assesses linear relationships between variables, making it suitable for detecting straight-line trends. In contrast, Spearman correlation assesses monotonic relationships, where one variable consistently rises (or consistently falls) as the other increases, so it captures both linear and non-linear trends as long as they are monotonic.
Sensitivity to Outliers: Pearson correlation is sensitive to outliers, which can distort the correlation coefficient. Spearman correlation is less affected by outliers, making it more robust in such cases.
Data Type: Pearson correlation is best suited to continuous data that are roughly normally distributed, while Spearman correlation is suitable for ordinal, ranked, or non-normally distributed data.
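To make the first point concrete, here is a small sketch on made-up data. The exponential curve and the variable names are my own illustration, not part of the study example used later. The relationship always increases, but it is far from a straight line.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x (monotonic, but not linear)
x = np.arange(1, 11)
y = np.exp(x)

pearson_curve, _ = pearsonr(x, y)
spearman_curve, _ = spearmanr(x, y)

print(f"Pearson:  {pearson_curve:.2f}")   # noticeably below 1
print(f"Spearman: {spearman_curve:.2f}")  # 1.00, because the ranks agree perfectly
```

Spearman reaches 1 because every increase in x comes with an increase in y, while Pearson is pulled down because the points do not lie on a line.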

Understanding Pearson Correlation (ishanjain-ai.medium.com): Pearson correlation measures the strength of the linear relationship between two variables.

Understanding Spearman Correlation (ishanjain-ai.medium.com): Spearman Correlation, also known as Spearman’s rank correlation coefficient, is a statistical measure used to assess…

Let’s consider a simple, realistic example to illustrate the differences between Pearson and Spearman correlations.

Scenario: Suppose you are conducting a study to analyze the relationship between the amount of time spent studying for an exam (in hours) and the exam scores (out of 100) for a group of 10 students.

Here’s a dataset for the example:

import numpy as np

# Time spent studying (hours)
studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Exam scores (out of 100)
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

Now, let’s calculate both the Pearson and Spearman correlations for this dataset and observe the differences.

from scipy.stats import pearsonr, spearmanr

# Calculate Pearson correlation
pearson_corr, _ = pearsonr(studying_hours, exam_scores)

# Calculate Spearman correlation
spearman_corr, _ = spearmanr(studying_hours, exam_scores)

print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")
print(f"Spearman Correlation Coefficient: {spearman_corr:.2f}")

In this example:

Pearson Correlation:

The Pearson correlation coefficient measures the strength and direction of the linear relationship between studying hours and exam scores.
It assumes a linear relationship, which may not necessarily be the case in reality.
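If you want to see what pearsonr is doing, the coefficient can be rebuilt by hand from the deviations around each mean. This is only a sketch for intuition; the names dx, dy and r_manual are my own, and in practice the library call is the one to use.

```python
import numpy as np
from scipy.stats import pearsonr

studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

# Deviation of each value from its own mean
dx = studying_hours - studying_hours.mean()
dy = exam_scores - exam_scores.mean()

# Pearson's r: how much the deviations move together,
# scaled by how much each variable varies on its own
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_scipy, _ = pearsonr(studying_hours, exam_scores)
print(f"manual: {r_manual:.4f}, scipy: {r_scipy:.4f}")
```

The two numbers should agree up to floating-point rounding.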

Spearman Correlation:

The Spearman correlation coefficient assesses the strength and direction of the monotonic relationship, whether or not that relationship is linear.
It relies only on the ranks of the data, making it more robust to potential outliers or non-linearity in the relationship.
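The "ranks only" idea can be checked directly: Spearman is simply Pearson applied to the ranks. The sketch below uses scipy.stats.rankdata for the ranking and also tries the classic shortcut formula, which is valid here only because the dataset has no tied values. The variable names are my own.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

# Replace each value by its rank (1 = smallest)
hour_ranks = rankdata(studying_hours)
score_ranks = rankdata(exam_scores)

# Spearman is Pearson computed on the ranks
rho_from_ranks, _ = pearsonr(hour_ranks, score_ranks)

# Shortcut formula (no ties): 1 - 6 * sum(d^2) / (n * (n^2 - 1))
n = len(studying_hours)
d_squared = ((hour_ranks - score_ranks) ** 2).sum()
rho_shortcut = 1 - 6 * d_squared / (n * (n ** 2 - 1))

rho_scipy, _ = spearmanr(studying_hours, exam_scores)
print(f"ranks: {rho_from_ranks:.4f}, shortcut: {rho_shortcut:.4f}, scipy: {rho_scipy:.4f}")
```

All three routes should print the same value. Notice that the actual scores never enter the calculation, only their order.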

Now, let’s interpret the results:

Pearson Correlation Coefficient: Approximately 0.33

Interpretation: There is a weak positive linear relationship between the time spent studying and exam scores. This suggests that students who study more tend to score slightly higher on the exams, but the relationship is not very strong.

Spearman Correlation Coefficient: Approximately 0.41

Interpretation: There is a weak positive monotonic relationship between the time spent studying and exam scores. This suggests that as students spend more time studying, their exam scores tend to increase monotonically, but again, the relationship is not very strong.

In this example, both coefficients suggest a positive relationship, but the Spearman coefficient is slightly higher. It is less influenced by outliers (here, the student who studied for 10 hours yet scored only 50) and by non-linear patterns.
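To see how much that one student matters, here is a quick check that recomputes both coefficients without the last data point. Dropping it is purely a diagnostic step of my own for illustration, not a recommendation to delete data in a real study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

# Drop the last student (10 hours, score of 50) to see their influence
hours_trimmed = studying_hours[:-1]
scores_trimmed = exam_scores[:-1]

pearson_trimmed, _ = pearsonr(hours_trimmed, scores_trimmed)
spearman_trimmed, _ = spearmanr(hours_trimmed, scores_trimmed)

print(f"Pearson without the last student:  {pearson_trimmed:.2f}")   # 0.93
print(f"Spearman without the last student: {spearman_trimmed:.2f}")  # 0.93
```

Both coefficients jump to about 0.93, so a single extreme point was responsible for most of the weakness in the relationship. It also shows that Spearman is more robust but not immune: this student moves from the top rank in hours to the bottom rank in scores, and that hurts a rank-based measure too.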