pearson相关分析r值与p值

佚名 2024-10-07 13:42:20 举报

篇首语：本文由小编为大家整理，主要介绍了pearson相关分析r值与p值相关的知识，希望对你有一定的参考价值。

我做相关分析的检验，
质量感知与满意度的相关系数r的 =0.345 且旁边没有*号（别的变量有*号）
相关显著性系数P值来看，p=0.072
请问他们俩的关系是什么？

参考技术A P值大于0.05，无统计学意义，两者之间无相关性

你如何计算 Python 中 Pearson"s r 的置信区间？

【中文标题】你如何计算 Python 中 Pearson"s r 的置信区间？【英文标题】：How do you compute the confidence interval for Pearson"s r in Python?你如何计算 Python 中 Pearson"s r 的置信区间？ 【发布时间】：2016-01-15 12:50:58 【问题描述】：

在 Python 中，我知道如何使用 scipy.stats.pearsonr 计算 r 和相关的 p 值，但我无法找到计算 r 置信区间的方法。这是怎么做到的？感谢您的帮助:)

【问题讨论】：

Equivalent of R"s of cor.test in Python的可能重复 【参考方案1】：

使用 rpy2 和心理测量库（您需要安装 R 并首先在 R 中运行 install.packages("psychometric")）

from rpy2.robjects.packages import importrpsychometric=importr("psychometric")psychometric.CIr(r=.9, n = 100, level = .95)

其中 0.9 是您的相关性，n 是样本量，0.95 是置信水平

【讨论】：

【参考方案2】：

根据[1]，直接用Pearson r计算置信区间是复杂的，因为它不是正态分布的。需要以下步骤：

将 r 转换为 z"，计算 z" 置信区间。 z" 的抽样分布近似正态分布，标准误为 1/sqrt(n-3)。将置信区间转换回 r。

以下是一些示例代码：

def r_to_z(r):    return math.log((1 + r) / (1 - r)) / 2.0def z_to_r(z):    e = math.exp(2 * z)    return((e - 1) / (e + 1))def r_confidence_interval(r, alpha, n):    z = r_to_z(r)    se = 1.0 / math.sqrt(n - 3)    z_crit = stats.norm.ppf(1 - alpha/2)  # 2-tailed z critical value    lo = z - z_crit * se    hi = z + z_crit * se    # Return a sequence    return (z_to_r(lo), z_to_r(hi))

参考：

http://onlinestatbook.com/2/estimation/correlation_ci.html

【讨论】：

【参考方案3】：

bennylp 给出的答案大部分是正确的，但是在第三个函数中计算临界值时有一个小错误。

应该是：

def r_confidence_interval(r, alpha, n):    z = r_to_z(r)    se = 1.0 / math.sqrt(n - 3)    z_crit = stats.norm.ppf((1 + alpha)/2)  # 2-tailed z critical value    lo = z - z_crit * se    hi = z + z_crit * se    # Return a sequence    return (z_to_r(lo), z_to_r(hi))

这里还有一个帖子供参考：Scipy - two tail ppf function for a z value?

【讨论】：

@bennylp 的答案是正确的，它只是假设您将 alpha 值传递给 alpha，而您的假设您正在传递 1 - alpha。即对于 95% 的置信区间，他的函数 alpha=0.05 和你的函数 alpha=0.95 给出相同的答案。【参考方案4】：

这是一个使用自举来计算置信区间的解决方案，而不是 Fisher 变换（假设二元正态性等），借用自 this answer：

import numpy as npdef pearsonr_ci(x, y, ci=95, n_boots=10000):    x = np.asarray(x)    y = np.asarray(y)       # (n_boots, n_observations) paired arrays    rand_ixs = np.random.randint(0, x.shape[0], size=(n_boots, x.shape[0]))    x_boots = x[rand_ixs]    y_boots = y[rand_ixs]        # differences from mean    x_mdiffs = x_boots - x_boots.mean(axis=1)[:, None]    y_mdiffs = y_boots - y_boots.mean(axis=1)[:, None]        # sums of squares    x_ss = np.einsum("ij, ij -> i", x_mdiffs, x_mdiffs)    y_ss = np.einsum("ij, ij -> i", y_mdiffs, y_mdiffs)        # pearson correlations    r_boots = np.einsum("ij, ij -> i", x_mdiffs, y_mdiffs) / np.sqrt(x_ss * y_ss)        # upper and lower bounds for confidence interval    ci_low = np.percentile(r_boots, (100 - ci) / 2)    ci_high = np.percentile(r_boots, (ci + 100) / 2)    return ci_low, ci_high

【讨论】：

【参考方案5】：

我知道上面已经建议了引导，下面提出了它的另一种变体，这可能更适合其他一些设置。

#1对您的数据进行采样（配对 X 和 Y，也可以添加其他重量），在其上拟合原始模型，记录 r2，附加它。然后从记录的所有 R2 分布中提取置信区间。

#2另外可以拟合采样数据并使用采样数据模型预测非采样 X（还可以提供连续范围来扩展您的预测，而不是使用原始 X）时间>获得 Y 帽子的置信区间。

所以在示例代码中：

import numpy as npfrom scipy.optimize import curve_fitimport pandas as pdfrom sklearn.metrics import r2_scorex = np.array([your numbers here])y = np.array([your numbers here])### define list for R2 valuesr2s = []### define dataframe to append your bootstrapped fits for Y hat rangesci_df = pd.DataFrame("x": x)### define how many samples you wanthow_many_straps = 5000### define your fit function/sdef func_exponential(x,a,b):    return np.exp(b) * np.exp(a * x)### fit original, using log because fitting exponentialpolyfit_original = np.polyfit(x                              ,np.log(y)                              ,1                              ,# w= could supply weight for observations here)                              )for i in range(how_many_straps+1):    ### zip into tuples attaching X to Y, can combine more variables as well    zipped_for_boot = pd.Series(tuple(zip(x,y)))    ### sample zipped X & Y pairs above with replacement    zipped_resampled = zipped_for_boot.sample(frac=1,                                               replace=True)    ### creater your sampled X & Y     boot_x = []    boot_y = []        for sample in zipped_resampled:        boot_x.append(sample[0])        boot_y.append(sample[1])         ### predict sampled using original fit    y_hat_boot_via_original_fit = func_exponential(np.asarray(boot_x),                                                   polyfit_original[0],                                                    polyfit_original[1])               ### calculate r2 and append    r2s.append(r2_score(boot_y,  y_hat_boot_via_original_fit))            ### fit sampled    polyfit_boot = np.polyfit(boot_x                              ,np.log(boot_y)                              ,1                              ,# w= could supply weight for observations here)                              )            ### predict original via sampled fit or on a range of min(x) to Z    y_hat_original_via_sampled_fit = func_exponential(x,                                                      polyfit_boot[0],                                                       polyfit_boot[1])             ### insert y hat into dataframe for calculating y hat confidence intervals    ci_df["trial_" + str(i)] = y_hat_original_via_sampled_fit  ### R2 conf intervallow = round(pd.Series(r2s).quantile([0.025, 0.975]).tolist()[0],3)up = round(pd.Series(r2s).quantile([0.025, 0.975]).tolist()[1],3)F"r2 confidence interval = low - up"

【讨论】：

以上是关于pearson相关分析r值与p值的主要内容，如果未能解决你的问题，请参考以下文章