An interpretable machine learning model for predicting mortality risk in patients with hypertension and heart failure with preserved ejection fraction

    Abstract:
    Objective To develop an interpretable machine learning model to predict all-cause mortality risk in patients with hypertension and heart failure with preserved ejection fraction (HFpEF).
    Methods A total of 847 patients with hypertension and HFpEF were enrolled from 3 grade A tertiary hospitals in Shanxi Province between April 2014 and March 2019 and followed prospectively until April 1, 2022. The outcome event was all-cause mortality. Patients were randomly divided into training and testing sets at a 7:3 ratio; the training set was used to construct the prediction models, and the testing set was used to evaluate their performance. Predictors were selected using least absolute shrinkage and selection operator (LASSO)-Cox regression. Six machine learning models, namely extreme gradient boosting (XGBoost), logistic regression, random forest (RF), decision tree (DT), support vector machine (SVM), and multilayer perceptron (MLP), were developed to predict 3-year all-cause mortality risk. Model performance was evaluated using receiver operating characteristic (ROC) curves, calibration curves, and clinical decision curves. The Shapley additive explanations (SHAP) framework was applied to the simplified optimal model for interpretability analysis, and restricted cubic splines were used to explore nonlinear relationships between key predictors and all-cause mortality.
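The split and ROC evaluation steps described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: the function names `random_split` and `roc_auc`, the seed, and the toy scores are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) definition.

```python
import random

def random_split(n, train_frac=0.7, seed=42):
    """Randomly split indices 0..n-1 into training and testing index lists."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(train_frac * n)        # size of the training set
    return sorted(idx[:cut]), sorted(idx[cut:])

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A 7:3 split of a cohort of 847 patients (indices only, no real data).
train_idx, test_idx = random_split(847)

# Toy predicted risks for four hypothetical test patients (1 = died within 3 years).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Rounding 0.7 × 847 places 593 patients in the training set and 254 in the testing set under this sketch; the exact set sizes used in the study are not reported in the abstract.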
    Results Over a median follow-up of 4.25 years (P25, P75: 2.86, 6.17), 224 patients (26.4%) died from any cause. LASSO-Cox regression identified 17 predictors from patients' clinical characteristics at admission, including vital signs, laboratory tests, and imaging results. For 3-year all-cause mortality, the RF model outperformed the other 5 models, with an area under the ROC curve (AUC) of 0.823 (95% CI: 0.693–0.950), accuracy of 84.0%, sensitivity of 82.3%, specificity of 83.0%, and an F1 score of 0.810. Calibration and clinical decision curves indicated that the RF model had good calibration and clinical applicability. The SHAP feature importance ranking showed that age, estimated glomerular filtration rate (eGFR), systolic blood pressure, and body mass index (BMI) were the top 4 factors influencing all-cause mortality in patients with hypertension and HFpEF. Restricted cubic spline analysis further indicated that age >72 years, eGFR <72.9 mL/(min·1.73 m²), systolic blood pressure >136 mmHg, and BMI >26.6 kg/m² were associated with increased all-cause mortality risk. To make the risk warning thresholds easier to apply in practice, clinically convenient cut-offs were then tested in Cox regression analysis: systolic blood pressure >135 mmHg (HR=1.362, 95% CI: 1.020–1.819) and eGFR <70 mL/(min·1.73 m²) (HR=1.519, 95% CI: 1.135–2.034) were both significantly associated with an increased risk of all-cause mortality.
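The reported hazard ratios can be read more closely with a short worked calculation. The sketch below assumes the intervals are Wald-type, that is, symmetric around log(HR), which is the usual form of Cox regression output but is not stated in the abstract; the helper name `wald_from_hr` is hypothetical.

```python
import math

def wald_from_hr(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic, and two-sided p-value
    from a hazard ratio and its 95% CI, assuming a Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return se, z, p

# Values taken from the Results paragraph.
sbp = wald_from_hr(1.362, 1.020, 1.819)    # systolic blood pressure >135 mmHg
egfr = wald_from_hr(1.519, 1.135, 2.034)   # eGFR <70 mL/(min·1.73 m²)

# Under the Wald assumption the HR is the geometric mean of the CI bounds.
sbp_midpoint = math.sqrt(1.020 * 1.819)
egfr_midpoint = math.sqrt(1.135 * 2.034)
```

Under this assumption both p-values fall below 0.05, consistent with intervals that exclude 1, and the geometric mean of each pair of bounds reproduces the reported HR to within rounding.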
    Conclusions The RF-based prediction model can effectively estimate the 3-year all-cause mortality risk in patients with hypertension and HFpEF after discharge. SHAP-based interpretability analysis can provide clear insights for clinical decision-making.
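To illustrate what a SHAP attribution means for a single patient, the sketch below computes exact Shapley values by brute force over feature subsets. Everything in it is hypothetical: `toy_risk` is an invented score with made-up coefficients, not the study's RF model; the patient and baseline values are illustrative; and replacing "absent" features with baseline values is one simple convention, not necessarily the one the SHAP library applies to tree models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.
    Features outside a coalition are set to their baseline value."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi

def toy_risk(p):
    """Invented risk score with one age-by-eGFR interaction (illustration only)."""
    score = 0.03 * (p["age"] - 70) - 0.02 * (p["egfr"] - 70) + 0.01 * (p["sbp"] - 135)
    if p["age"] > 72 and p["egfr"] < 70:
        score += 0.5
    return score

patient = {"age": 80, "egfr": 50, "sbp": 150}    # hypothetical patient
baseline = {"age": 65, "egfr": 85, "sbp": 125}   # hypothetical reference values
phi = shapley_values(toy_risk, patient, baseline)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets SHAP decompose one prediction into per-feature contributions. The brute-force loop grows exponentially with the number of features, so it is only practical for a handful of predictors.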

     
