Understanding Pearson Correlation

Pearson correlation measures the linear dependence between two variables, for example between a feature and the dependent (target) variable.


Pearson correlation, also known as Pearson’s correlation coefficient, measures the linear relationship between two continuous variables.

It quantifies the degree to which two variables change together, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
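
To make the two endpoints of that range concrete, here is a minimal sketch using NumPy's np.corrcoef. The data are made up for illustration: one series is an exact increasing linear function of x, the other an exact decreasing one.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Exact increasing line: every point lies on y = 2x + 1
y_up = 2 * x + 1
# Exact decreasing line: every point lies on y = -3x
y_down = -3 * x

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is r between the two inputs
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(f"increasing line: r = {r_up:.2f}")
print(f"decreasing line: r = {r_down:.2f}")
```

The slope and intercept do not affect the result; only the direction of the line and how tightly the points follow it matter.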

Formula:
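
For reference, the standard sample definition of the coefficient can be written as below. The notation is chosen here for illustration: x_i and y_i are the paired observations, \bar{x} and \bar{y} are their means, and n is the number of pairs.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator and the two square-root terms correspond to the variables numerator, denominator_x, and denominator_y in the Python code in this section.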

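To compute the Pearson correlation coefficient in Python, you can use libraries like NumPy or SciPy.

The formula divides the sum of the products of the paired deviations from the means by the product of the square roots of the summed squared deviations. Here it is, implemented step by step in Python code: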

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

# Calculate Pearson correlation using the formula
mean_x = np.mean(x)
mean_y = np.mean(y)

numerator = np.sum((x - mean_x) * (y - mean_y))
denominator_x = np.sqrt(np.sum((x - mean_x) ** 2))
denominator_y = np.sqrt(np.sum((y - mean_y) ** 2))

pearson_corr = numerator / (denominator_x * denominator_y)

# Print the Pearson correlation coefficient
print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")

In this code:

  • We start with sample data in the arrays x and y.
  • We calculate the means of x and y using np.mean.
  • We compute the numerator, which is the sum of the products of the deviations of x and y from their means.
  • We calculate the denominators, which are the square roots of the sums of the squared deviations of x and y from their means.
  • Finally, we calculate the Pearson correlation coefficient using the formula and print the result.

You can replace the x and y arrays with your own dataset to compute the Pearson correlation coefficient for your specific data.
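
As a sanity check, the manual calculation can be compared with NumPy's built-in np.corrcoef. The sketch below wraps the steps above in a helper function; the name pearson_manual is hypothetical and used only for this illustration.

```python
import numpy as np

def pearson_manual(x, y):
    """Pearson r computed from deviations, following the steps above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

# Same sample data as above
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

r_manual = pearson_manual(x, y)
# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is r between x and y
r_numpy = np.corrcoef(x, y)[0, 1]

print(f"manual: {r_manual:.2f}, np.corrcoef: {r_numpy:.2f}")
# manual: 0.50, np.corrcoef: 0.50
```

Both approaches give 0.50 for this sample, so in practice the library call is usually the more convenient choice.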

Interpretation:

Interpreting Pearson correlation involves understanding the strength and direction of the linear relationship between two variables.

Here’s a Python code example that calculates the Pearson correlation coefficient and provides an interpretation:

import numpy as np
from scipy.stats import pearsonr

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

# Calculate Pearson correlation
pearson_corr, _ = pearsonr(x, y)

# Interpretation
if pearson_corr > 0:
    interpretation = "There is a positive linear relationship between x and y."
elif pearson_corr < 0:
    interpretation = "There is a negative linear relationship between x and y."
else:
    interpretation = "There is no linear relationship between x and y."

# Print the correlation coefficient and interpretation
print(f"Pearson Correlation Coefficient: {pearson_corr:.2f}")
print(f"Interpretation: {interpretation}")

In this code:

We calculate the Pearson correlation coefficient between the sample data x and y.

We then interpret the correlation result based on its sign:

  • If pearson_corr is positive, it indicates a positive linear relationship.
  • If pearson_corr is negative, it indicates a negative linear relationship.
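  • If pearson_corr is exactly zero, it indicates no linear relationship. With real data the value is rarely exactly zero, so values close to zero are usually read the same way.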

Finally, we print the correlation coefficient and interpretation.

You can replace the x and y arrays with your own dataset to interpret the Pearson correlation coefficient for your specific data.
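
The sign of the coefficient gives the direction only, while the strength comes from its absolute value. Below is a minimal sketch that reports both. The helper describe_correlation is hypothetical, and its cutoffs of 0.3 and 0.7 are a common rule of thumb assumed here for illustration, not a standard; conventions vary by field.

```python
import numpy as np

def describe_correlation(r, weak=0.3, strong=0.7):
    """Label r by direction and rough strength (cutoffs are illustrative assumptions)."""
    if np.isclose(r, 0.0):
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size < weak:
        strength = "weak"
    elif size < strong:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction} linear relationship"

# Same sample data as above
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 3, 5])

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}: {describe_correlation(r)}")
```

For the sample data, r is about 0.5, which falls in the "moderate positive" band under these assumed cutoffs.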

Advantages of Pearson Correlation:

  1. Easily interpretable: The Pearson correlation coefficient is straightforward to interpret, as it measures the strength and direction of a linear relationship between variables. It is the most commonly used correlation coefficient and is well-understood in statistics.
  2. Sensitive to linear relationships: It is sensitive to both the magnitude and direction of linear relationships between variables.

Disadvantages of Pearson Correlation:

  1. Limited to linear relationships: Pearson correlation assumes a linear relationship, so it may not capture nonlinear associations between variables effectively.
  2. Sensitive to outliers: Outliers can have a significant impact on Pearson correlation, potentially leading to misleading results.
  3. Requires continuous data: It is suitable for continuous variables and may not work well with categorical or ordinal data.
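
The first two limitations are easy to see on small examples. The sketch below uses made-up data: a quadratic relationship where y is fully determined by x, and a perfectly linear sample in which one point is then replaced by an extreme value.

```python
import numpy as np

# Nonlinear case: y = x^2 on values symmetric around zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2
r_nonlinear = np.corrcoef(x, y)[0, 1]

# Outlier case: start from a perfectly linear sample (b = 2a) ...
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = 2 * a
r_clean = np.corrcoef(a, b)[0, 1]

# ... then replace the last point with an extreme value
b_outlier = b.copy()
b_outlier[-1] = -20.0
r_outlier = np.corrcoef(a, b_outlier)[0, 1]

print(f"quadratic relationship: r is zero up to rounding ({abs(r_nonlinear):.2f})")
print(f"linear, no outlier:     r = {r_clean:.2f}")
print(f"linear, one outlier:    r = {r_outlier:.2f}")
```

In the quadratic case the coefficient is essentially zero even though y depends entirely on x, and in the outlier case a single point is enough to turn a perfect positive correlation into a negative one.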
