The differences between Pearson and Spearman correlations


Let’s quickly review the Pearson and Spearman correlation coefficients.

The primary differences between Pearson and Spearman correlations are:

  1. Linearity vs. Monotonicity: Pearson correlation measures how well a straight line describes the relationship between two variables. Spearman correlation measures how consistently one variable increases (or decreases) as the other increases, so it captures any monotonic trend, whether linear or not.
  2. Sensitivity to Outliers: Pearson correlation works on the raw values, so a single extreme point can distort the coefficient. Spearman correlation works on ranks, which limits how much any one point can contribute and makes it more robust in such cases.
  3. Data Type: Pearson correlation is suited to continuous data, while Spearman correlation also works for ordinal or ranked data and for continuous data that is heavily skewed or clearly non-normal.
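
To make the first point concrete, here is a minimal sketch. The cubic data is made up purely for illustration and is not taken from this article's dataset. It shows a relationship that is perfectly monotonic but not linear, so Spearman reaches 1 while Pearson falls short of it.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative data (an assumption, not from the article's example):
# y rises with every step in x, but along a curve rather than a line.
x = np.arange(1, 11)
y = x ** 3

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson:  {pearson_r:.2f}")     # below 1, because the trend is curved
print(f"Spearman: {spearman_rho:.2f}")  # 1.00, because the rank order matches exactly
```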

Let’s consider a real-world example to illustrate the differences between Pearson and Spearman correlations.

Scenario: Suppose you are conducting a study to analyze the relationship between the amount of time spent studying for an exam (in hours) and the exam scores (out of 100) for a group of 10 students.

Here’s a dataset for the example:

import numpy as np

# Time spent studying (hours)
studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Exam scores (out of 100)
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

Now, let’s calculate both the Pearson and Spearman correlations for this dataset and observe the differences.

from scipy.stats import pearsonr, spearmanr

# Calculate Pearson correlation
pearson_corr, _ = pearsonr(studying_hours, exam_scores)

# Calculate Spearman correlation
spearman_corr, _ = spearmanr(studying_hours, exam_scores)

print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")
print(f"Spearman Correlation Coefficient: {spearman_corr:.2f}")

In this example:

Pearson Correlation:

  • The Pearson correlation coefficient measures the strength and direction of the linear relationship between studying hours and exam scores.
  • It assumes a linear relationship, which may not necessarily be the case in reality.

Spearman Correlation:

  • The Spearman correlation coefficient assesses the strength and direction of the monotonic relationship, which may or may not be linear.
  • It relies only on the ranks of the data, making it more robust to outliers and to non-linearity in the relationship.
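
The statement that Spearman relies only on ranks can be checked directly. The sketch below converts both variables to ranks with `scipy.stats.rankdata` and runs Pearson on the ranks; the result should match what `spearmanr` returns for the raw values. The variable names `hour_ranks` and `score_ranks` are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

# Replace each value with its rank (1 = smallest value).
hour_ranks = rankdata(studying_hours)
score_ranks = rankdata(exam_scores)

# Pearson computed on the ranks ...
rho_from_ranks, _ = pearsonr(hour_ranks, score_ranks)
# ... agrees with Spearman computed on the raw values.
rho_direct, _ = spearmanr(studying_hours, exam_scores)

print(np.isclose(rho_from_ranks, rho_direct))  # True
```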

Now, let’s interpret the results:

Pearson Correlation Coefficient: Approximately 0.33

  • Interpretation: There is a weak positive linear relationship between the time spent studying and exam scores. Students who study more tend to score somewhat higher, but a straight line describes the data poorly, largely because the student who studied 10 hours scored only 50.

Spearman Correlation Coefficient: Approximately 0.41

  • Interpretation: There is a weak positive monotonic relationship between the time spent studying and exam scores. This suggests that as students spend more time studying, their exam scores tend to increase monotonically, but again, the relationship is not very strong.

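In this example, both coefficients suggest a positive relationship, and the Spearman coefficient is slightly higher than the Pearson coefficient. The likely reason is the student who studied 10 hours but scored 50: Spearman registers only that this score has the lowest rank, while Pearson is also pulled down by how far the score sits below the trend of the other nine students.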
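
As a rough sensitivity check, and not a recommendation to discard data, the coefficients can be recomputed without the student who studied 10 hours and scored 50. The names `hours_trimmed` and `scores_trimmed` are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

studying_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
exam_scores = np.array([60, 65, 70, 75, 90, 85, 80, 95, 100, 50])

# Drop the last student (10 hours, score 50) and recompute both coefficients.
hours_trimmed = studying_hours[:-1]
scores_trimmed = exam_scores[:-1]

pearson_trimmed, _ = pearsonr(hours_trimmed, scores_trimmed)
spearman_trimmed, _ = spearmanr(hours_trimmed, scores_trimmed)

print(f"Pearson without the last student:  {pearson_trimmed:.2f}")
print(f"Spearman without the last student: {spearman_trimmed:.2f}")
```

With the remaining nine students, both coefficients come out at about 0.93. In a sample this small, a single extreme point moves both measures considerably, so it is worth inspecting the data before relying on either coefficient.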
