                            OLS Regression Results
==============================================================================
Dep. Variable:                      x   R-squared:                       0.004
Model:                            OLS   Adj. R-squared:                 -0.006
Method:                 Least Squares   F-statistic:                    0.3809
Date:                Fri, 04 Jun 2021   Prob (F-statistic):              0.539
Time:                        10:15:52   Log-Likelihood:                -493.47
No. Observations:                 104   AIC:                             990.9
Df Residuals:                     102   BIC:                             996.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     46.9506      3.145     14.930      0.000      40.713      53.188
y             -0.3424      0.555     -0.617      0.539      -1.443       0.758
==============================================================================
Omnibus:                       28.683   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               52.685
Skew:                           1.138   Prob(JB):                     3.63e-12
Kurtosis:                       5.642   Cond. No.                         6.52
==============================================================================
Code
import csv
import re

import numpy
import openpyxl
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from statsmodels.formula.api import ols

# read_excel already returns a DataFrame
df = pd.read_excel(r'C:\Users\chenj\Desktop\COPD血清因子浓度.xlsx')

# Mode 'w' creates the CSV for writing and overwrites an existing file of the same name.
# newline='' prevents blank lines between rows.
df1 = open(r'C:\Users\chenj\Desktop\R语言和python分析COPD血清因子和吸烟、生物燃料分段线性\BS normal人群香烟烟雾暴露的影响.csv', 'w', newline='')
writer = csv.writer(df1)
# Title row, written first. writerow expects a list; a bare string would be split into one character per cell.
writer.writerow(['BSE normal人群生物燃料暴露线性关系'])
writer.writerow(['因子', 'b', 'p'])  # column names: factor, b, p

# exposed_factors = ['吸烟年包', '生物年时']
exposed_factor = '吸烟年包'  # smoking exposure in pack-years
tested_factors = ['IL1b', 'IL4', 'IL5', 'IL6', 'IL13', 'IL17', 'IFNr', 'EOTAXIN', 'IP10', 'MIP1a', 'MIP1b', 'PDGFbb', 'VEGF']

df2 = df.loc[df['分组'] == 7]  # exclude the unexposed, i.e. keep only the exposed group (分组 == 7)
df2 = df2.dropna(subset=[exposed_factor])  # drop rows with a missing exposure value
print(df2)
df2 = df2.dropna(subset=['IL1b'])

# .values gives a NumPy array; reshape(-1, 1) turns it into a single column
x = df2[exposed_factor].values.reshape(-1, 1)
y = df2['IL1b'].values.reshape(-1, 1)
# print(x)

# The formula picks up the arrays x and y from the enclosing scope.
# Note: 'x~y' regresses x on y, as in the output above; y = a + bx would be written 'y~x'.
lm_s = ols('x~y', data=df2).fit()
print(lm_s.summary())
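Running the code gives the output shown above. I now want to extract the 95% confidence interval of the coefficient b, that is, the [0.025, 0.975] values in the y row of the table: -1.443 and 0.758.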
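One way to pull these numbers out of the fitted model, rather than reading them off the printed summary, is the results object's `conf_int()` method. The sketch below uses synthetic stand-in data (the `demo` DataFrame and its values are hypothetical, not the COPD data), so the printed numbers will differ from the table above. With the real script, the same calls on `lm_s` should apply, assuming the coefficient row is labelled `y` as in the summary.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical stand-in data; in the real script the model is fitted on df2.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"y": rng.normal(size=104)})
demo["x"] = 47.0 - 0.3 * demo["y"] + rng.normal(scale=25.0, size=104)

lm_s = ols("x ~ y", data=demo).fit()

# conf_int() returns a DataFrame indexed by term name;
# column 0 is the lower bound and column 1 the upper bound.
ci = lm_s.conf_int(alpha=0.05)  # alpha=0.05 gives the 95% interval
lower, upper = ci.loc["y"]

b = lm_s.params["y"]   # slope estimate
p = lm_s.pvalues["y"]  # its p-value
print(f"b = {b:.4f}, p = {p:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The interval is the coefficient plus or minus the t critical value times the standard error. For the table above, with 102 residual degrees of freedom the critical value is about 1.98, so -0.3424 ± 1.98 × 0.555 gives roughly [-1.443, 0.758]. In the original script, the extracted values could then be written out with something like `writer.writerow(['IL1b', b, p, lower, upper])`, provided the header row is extended to match.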