Understanding Kendall’s Tau Rank Correlation
Kendall’s Tau Rank Correlation, often referred to simply as Kendall correlation or Kendall’s Tau, is a non-parametric statistical method used to measure the strength and direction of association between two variables.
Unlike Pearson correlation, Kendall correlation does not assume that the data follows a specific distribution, making it suitable for both continuous and ranked (ordinal) data.
It assesses the similarity in the ranking of data points between two variables.
- Assumption: Non-parametric and makes no assumptions about the data distribution.
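- Use Case: Suitable for both continuous and ordinal (ranked) data. Useful when the relationship between the variables may be monotonic but not linear.
- Strength: Robust to outliers, and unaffected by any strictly increasing transformation of either variable, because it depends only on the ranks.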
- Formula: Based on concordant and discordant pairs of data points.
- Interpretation: Measures the similarity in ranking order between two variables, where 1 indicates perfect agreement, -1 indicates perfect disagreement, and 0 suggests no association.
Concordant Pairs: a pair of observations (i, j) is concordant if the observation that ranks higher in one variable also ranks higher in the other variable, that is, if (A_i − A_j)(B_i − B_j) > 0.
Suppose you have two variables, A and B, measured on five observations:
Variable A ranks: [2, 4, 1, 3, 5]
Variable B ranks: [4, 3, 2, 1, 5]
Each position is one observation, so observation 1 has ranks (A = 2, B = 4), observation 2 has (A = 4, B = 3), and so on. With five observations there are 5 × 4 / 2 = 10 pairs to compare. Seven of them are concordant:
- Observations 1 and 3: A goes 2 → 1 and B goes 4 → 2 (both decrease).
- Observations 1 and 5: A goes 2 → 5 and B goes 4 → 5 (both increase).
- Observations 2 and 3: A goes 4 → 1 and B goes 3 → 2 (both decrease).
- Observations 2 and 4: A goes 4 → 3 and B goes 3 → 1 (both decrease).
- Observations 2 and 5: A goes 4 → 5 and B goes 3 → 5 (both increase).
- Observations 3 and 5: A goes 1 → 5 and B goes 2 → 5 (both increase).
- Observations 4 and 5: A goes 3 → 5 and B goes 1 → 5 (both increase).
Discordant Pairs: a pair is discordant if the observation that ranks higher in one variable ranks lower in the other, that is, if (A_i − A_j)(B_i − B_j) < 0.
Example of Discordant Pairs: Using the same variables A and B as in the previous example:
- Observations 1 and 2: A goes 2 → 4 (increases) but B goes 4 → 3 (decreases).
- Observations 1 and 4: A goes 2 → 3 (increases) but B goes 4 → 1 (decreases).
- Observations 3 and 4: A goes 1 → 3 (increases) but B goes 2 → 1 (decreases).
In this example, these three pairs are discordant because the ranks change in opposite directions between the two variables. That gives C = 7 concordant pairs and D = 3 discordant pairs. A pair tied on either variable (the product above equals zero) is neither concordant nor discordant. This example has no such pairs, because every rank appears only once in each variable.
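The pair classification for A and B can be checked mechanically. The following is a minimal sketch, not part of the original article. It compares every pair of positions using `itertools.combinations` and classifies each pair by the sign of (A_i − A_j)(B_i − B_j). The list names `concordant`, `discordant` and `tied` are illustrative.

```python
from itertools import combinations

A = [2, 4, 1, 3, 5]
B = [4, 3, 2, 1, 5]

concordant, discordant, tied = [], [], []
# Compare every pair of observations (i, j) with i < j
for i, j in combinations(range(len(A)), 2):
    product = (A[i] - A[j]) * (B[i] - B[j])
    pair = (i + 1, j + 1)  # report 1-based observation numbers
    if product > 0:
        concordant.append(pair)   # both variables move in the same direction
    elif product < 0:
        discordant.append(pair)   # the variables move in opposite directions
    else:
        tied.append(pair)         # tied on at least one variable

print("Concordant:", concordant)  # [(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]
print("Discordant:", discordant)  # [(1, 2), (1, 4), (3, 4)]
print("Tied:", tied)              # []
```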
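Formula for Kendall’s Tau:
$$\tau = \frac{C - D}{\binom{n}{2}} = \frac{2(C - D)}{n(n-1)}$$
Here C is the number of concordant pairs, D is the number of discordant pairs, and n is the number of observations, so the denominator counts every possible pair. This is the tau-a variant. When ties are present, the tau-b variant adjusts the denominator:
$$\tau_b = \frac{C - D}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, \qquad n_0 = \frac{n(n-1)}{2}$$
Here n_1 is the number of pairs tied on the first variable and n_2 is the number tied on the second. Without ties, n_1 = n_2 = 0 and the two variants coincide. SciPy’s kendalltau computes tau-b by default.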
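The tau-b definition is (C − D) / sqrt((n0 − n1)(n0 − n2)). It reduces to (C − D)/n0 when there are no ties. Below is a direct, naive O(n²) sketch of that definition, checked against `scipy.stats.kendalltau`. The helper name `tau_b` and the small tied dataset are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

def tau_b(x, y):
    """Naive O(n^2) Kendall tau-b, straight from the pair-counting definition."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    concordant = discordant = 0
    tied_x = tied_y = 0  # n_1: pairs tied on x, n_2: pairs tied on y
    for i in range(n):
        for j in range(i + 1, n):
            dx = np.sign(x[i] - x[j])
            dy = np.sign(y[i] - y[j])
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / np.sqrt((n0 - tied_x) * (n0 - tied_y))

# No ties: tau-b equals tau-a = (7 - 3) / 10
print(tau_b([2, 4, 1, 3, 5], [4, 3, 2, 1, 5]))  # 0.4

# Hypothetical data with ties in both variables
x_ties = [1, 2, 2, 3, 4, 4, 5]
y_ties = [1, 3, 2, 2, 5, 4, 4]
print(tau_b(x_ties, y_ties), kendalltau(x_ties, y_ties)[0])
```

The double loop mirrors the formula but is slow for large inputs. It is meant for checking understanding, not for production use.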
Interpretation of Kendall’s Tau:
Kendall’s Tau always lies between −1 and 1, and its value is interpreted as follows:
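- If τ = 1, the rankings agree perfectly: every pair is concordant, so the relationship is perfectly monotonic increasing.
- If τ = −1, the rankings disagree perfectly: every pair is discordant, so the relationship is perfectly monotonic decreasing.
- If τ = 0, concordant and discordant pairs balance out, suggesting no monotonic association. This is consistent with independence but does not prove it, since a U-shaped relationship can also give τ = 0.
In general, a positive τ value implies that as one variable increases, the other tends to increase as well, while a negative τ value implies that the other tends to decrease.
The magnitude of τ indicates the strength of the association, with larger absolute values indicating stronger associations. For tau-a there is also a direct probabilistic reading: τ is the probability that a randomly chosen pair is concordant minus the probability that it is discordant.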
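Two edge cases illustrate the interpretation. First, a perfectly monotonic but curved relationship reaches τ = ±1 even though Pearson’s r stays below 1. Second, a relationship that falls and then rises can give τ = 0 even when y is completely determined by x. The sketch below assumes SciPy’s `kendalltau` and `pearsonr`; the cubic and quadratic data are made up for illustration.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

x = np.arange(1, 6)

# Perfectly monotonic but curved: every pair is concordant (or discordant),
# so tau reaches +1 (or -1), while Pearson's r stays below 1
y_up = x ** 3
y_down = -(x ** 3)
tau_up, _ = kendalltau(x, y_up)
tau_down, _ = kendalltau(x, y_down)
r_up, _ = pearsonr(x, y_up)
print(f"y = x**3:  tau = {tau_up:.2f}, Pearson r = {r_up:.2f}")
print(f"y = -x**3: tau = {tau_down:.2f}")

# y is fully determined by x, but the relationship falls and then rises,
# so concordant and discordant pairs cancel out and tau is 0
x_sym = np.array([-2, -1, 0, 1, 2])
y_sq = x_sym ** 2
tau_sq, _ = kendalltau(x_sym, y_sq)
print(f"y = x**2 on symmetric x: tau = {tau_sq:.2f}")
```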
Advantages of Kendall’s Tau:
- Robustness: Kendall’s Tau is robust to outliers and works well with skewed or non-normally distributed data.
- Suitable for ordinal data: It can be applied to data with ranked or ordinal categories.
- Distribution-free: It does not assume a specific data distribution, making it more versatile.
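The robustness claim can be illustrated with a single corrupted value. In the sketch below, the first point of an otherwise perfectly increasing series is replaced by a hypothetical data-entry error (1000). That point becomes discordant with the other 9 points, so τ falls to (36 − 9)/45 = 0.6. Pearson’s r, which depends on the raw values, turns negative. The data and the `results` dictionary are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

x = np.arange(1, 11, dtype=float)
y_clean = x.copy()          # perfectly increasing relationship
y_outlier = x.copy()
y_outlier[0] = 1000.0       # hypothetical data-entry error on the first point

results = {}
for label, y in [("clean", y_clean), ("one outlier", y_outlier)]:
    tau, _ = kendalltau(x, y)
    r, _ = pearsonr(x, y)
    results[label] = (tau, r)
    print(f"{label:>12}: Kendall tau = {tau:.2f}, Pearson r = {r:.2f}")
```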
Disadvantages of Kendall’s Tau:
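- Computational complexity: A naive computation compares all n(n − 1)/2 pairs, which takes O(n²) time and becomes slow for large datasets. Merge-sort-based methods such as Knight’s algorithm reduce this to O(n log n).
- Less efficient for linear relationships: When the relationship really is linear and the data are roughly normally distributed, Pearson correlation measures it with somewhat more statistical power.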
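The O(n log n) idea works in three steps:

- Sort the observations by the first variable.
- Count how many pairs are out of order in the second variable. These inversions are exactly the discordant pairs, and a merge sort can count them while it sorts.
- Compute τ from C and D.

The sketch below assumes there are no ties in either variable. Handling ties, as Knight’s full algorithm does, needs extra bookkeeping that is omitted here. The helper names `count_inversions` and `tau_fast` are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

def count_inversions(seq):
    """Merge sort that also counts pairs (i < j) with seq[i] > seq[j]."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of `left`
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

def tau_fast(x, y):
    """O(n log n) Kendall tau, assuming no ties in x or y."""
    order = np.argsort(x)
    y_by_x = np.asarray(y)[order].tolist()   # y listed in increasing order of x
    n = len(y_by_x)
    _, discordant = count_inversions(y_by_x)
    n0 = n * (n - 1) // 2
    concordant = n0 - discordant             # valid only without ties
    return (concordant - discordant) / n0

rng = np.random.default_rng(0)
x = rng.permutation(2000)   # distinct values, so no ties
y = rng.permutation(2000)
print(tau_fast(x, y), kendalltau(x, y)[0])
```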
Kendall’s Tau Rank Correlation VS Pearson Correlation
Let’s explore an example where the Pearson and Kendall correlation coefficients take noticeably different values on the same data:
```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

# Sample data: both variables are rankings of the same five items
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

# Calculate Kendall's Tau correlation (the second return value is the p-value)
kendall_corr, _ = kendalltau(x, y)

# Calculate Pearson correlation (the second return value is the p-value)
pearson_corr, _ = pearsonr(x, y)

print(f"Kendall's Tau Correlation Coefficient: {kendall_corr:.2f}")
print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")
# Kendall's Tau Correlation Coefficient: 0.40
# Pearson Correlation Coefficient: 0.50
```
In this code:
- We import NumPy for numerical operations, and use the `kendalltau` and `pearsonr` functions from SciPy to calculate the two coefficients.
- We define sample data `x` and `y`, which represent the rankings of two variables.
- We calculate both Kendall’s Tau and Pearson correlation coefficients, discarding the p-values that both functions also return.
- Finally, we print the coefficients.
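In this case, Kendall’s Tau correlation coefficient is 0.40, indicating a moderate positive monotonic relationship between `x` and `y`. Of the 10 pairs, 7 are concordant and 3 are discordant, so τ = (7 − 3)/10. Pearson correlation, on the other hand, is 0.50, suggesting a moderate positive linear relationship between the two variables.
Here the gap is mostly a matter of scale. Because `x` and `y` are already ranks, Pearson’s r on this data equals Spearman’s rank correlation. Kendall’s τ instead counts how many pairs agree in order, and for the same data its magnitude is typically smaller than Spearman’s. More generally, Pearson correlation measures linear association, while Kendall’s Tau captures monotonic relationships even when they are not strictly linear. The two diverge most when the relationship is monotonic but strongly curved, or when outliers are present.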
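One way to check that the 0.50 vs 0.40 gap does not come from non-linearity is to compute Spearman’s rank correlation on the same data. The sketch below assumes SciPy’s `spearmanr`, which was not used in the original example. Because `x` and `y` are already the ranks 1 to 5, ranking them again changes nothing, so Spearman’s ρ should match Pearson’s r exactly.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)   # Pearson correlation computed on the ranks
kendall_corr, _ = kendalltau(x, y)   # (concordant - discordant) / number of pairs

print(f"Pearson:  {pearson_corr:.2f}")   # 0.50
print(f"Spearman: {spearman_corr:.2f}")  # 0.50
print(f"Kendall:  {kendall_corr:.2f}")   # 0.40
```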