Prediction of sarcopenia at different time intervals: an interpretable machine learning analysis of modifiable factors

Chen, Xiaodong; Li, Liping

doi:10.1186/s12877-025-05792-1

Research
Open access
Published: 27 February 2025

BMC Geriatrics volume 25, Article number: 133 (2025)

Abstract

Objectives

This study aims to develop sarcopenia risk prediction models for Chinese older adults at different time intervals and to identify and compare modifiable factors contributing to sarcopenia development.

Methods

This study used data from 3,549 participants aged 60 and older in the China Health and Retirement Longitudinal Study (CHARLS). Sarcopenia status was evaluated by the AWGS2019 algorithm. Full models for 2- and 4-year sarcopenia risk, considering multifactorial baseline variables, were compared with modifiable models. Eight machine learning (ML) algorithms were used to build these models, with performance evaluated by the area under the receiver operating characteristic curve (AUC-ROC). SHapley Additive exPlanations (SHAP) was applied for model explanation.

Results

The average age of participants was 67.0 years (SD = 6.1), with 47.8% being female (1,696 participants). The ML models achieved moderate performance, and eXtreme Gradient Boosting (XGBoost) emerged as the best model for both the full and modifiable models in the 2-year prediction, with AUCs of 0.804 and 0.795, respectively (DeLong test, P = 0.665). In contrast, in the 4-year prediction, the Light Gradient Boosting Machine (LightGBM) performed best with AUCs of 0.795 and 0.769, respectively (P = 0.053). The SHAP analysis highlighted gender and estimated glomerular filtration rate (eGFR) as the most important predictors in both the full and modifiable models.

Conclusions

Prediction models based on modifiable factors at different time intervals can help identify older Chinese adults at high risk of sarcopenia. These findings highlight the importance of prioritizing functional capacity and psychosocial determinants in sarcopenia prevention strategies.

Introduction

Sarcopenia is a progressive skeletal muscle disorder characterized by the loss of muscle mass and strength [1]. It is associated with an increased risk of adverse outcomes such as malnutrition, physical disability, osteoporosis, falls, fractures, and even death [2, 3]. Although diagnostic criteria for sarcopenia have been established, effective risk prediction models remain scarce. Evidence suggests that muscle mass, strength and physical performance can substantially improve through targeted interventions like physical exercise, lifestyle adjustments, and dietary changes [4, 5]. This highlights the importance of constructing risk prediction models using representative ageing datasets to identify important risk factors, which can provide critical insights for intervention strategies.

Among the risk factors for sarcopenia, modifiable factors (variables that can be altered through behavioural or lifestyle changes) play a crucial role in slowing or reversing disease progression [6]. In contrast, non-modifiable factors, such as age, sex, genetic predispositions, and anthropometric measurements, serve as intrinsic determinants that establish baseline risk but cannot be altered [7]. Identifying modifiable factors such as physical activity levels, dietary intake, and other lifestyle behaviours early offers a significant opportunity to intervene and mitigate the impact of sarcopenia, as these factors allow prevention strategies to be tailored to high-risk populations [8].

In risk prediction, machine learning (ML) models can process high-dimensional datasets, identify critical variables, and uncover complex relationships between input variables, which has made them powerful tools for predicting health outcomes [9]. Clinical prediction models that use ML integrate multiple factors to predict individual outcomes, offering deeper insights into disease risk determinants and improving the precision of prognosis [10]. However, many ML models are perceived as “black boxes,” making them difficult for clinical decision-makers to interpret. Techniques such as SHapley Additive exPlanations (SHAP), which evaluates and ranks important predictors, have helped enhance the transparency of these models, allowing for more interpretable results [11]. Interpretable ML approaches focusing on modifiable factors could play a key role in guiding community-based sarcopenia prevention strategies.

Current sarcopenia risk prediction models often focus on a single time interval [12, 13], which limits understanding of how risk factors may change over time. Research shows that the performance of prediction models is closely related to the length of the prediction window [14]. Since sarcopenia is a dynamic, progressive condition that requires time to develop, important factors may differ across time frames. Developing prediction models tailored to different time intervals can improve risk assessment accuracy and help identify modifiable factors, enabling more precise interventions for effective sarcopenia prevention.

The current study aims to (1) use ML methods to develop prediction models for sarcopenia risk at different time intervals among community-dwelling older adults in China; and (2) identify and compare modifiable factors that play important roles in the development of sarcopenia so that preventive strategies can be developed accordingly.

Methods

Study design and participants

The data were obtained from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative longitudinal study of people aged 45 years or above. Using a stratified multistage probability-proportional-to-size random-cluster sampling method, 17,707 participants across 28 provinces were recruited at baseline in 2011. CHARLS was approved by the Ethics Review Committee of Peking University, and written informed consent was obtained from all participants. The detailed design, sampling methods and data collection have been reported previously [15].

Due to the time limitations of blood sample collection in the CHARLS, which only covered the years 2011 and 2015, and the collection of physical examination indicators (e.g., height, weight) being limited to 2011, 2013, and 2015, this study selected the 2011, 2013, and 2015 waves of CHARLS data for analysis. The 2011 wave was used as the baseline, and the 2013 and 2015 waves were considered follow-up endpoints, respectively. Participants were excluded if any of the following criteria were met: (1) sarcopenia status at baseline; (2) age below 60 years at baseline (Year 2011); (3) missing outcome data (sarcopenia diagnosis information) at baseline or in either of the two follow-up years; (4) loss to follow-up during the study period; (5) more than 20% of variable information missing. Finally, 3,549 participants were included in both the 2011–2013 (2-year) and 2011–2015 (4-year) prediction studies. Details of the sample selection process are shown in Fig. 1.

Outcome variables and input variables

The outcome of the study was the occurrence of sarcopenia in 2013 and 2015. Based on the algorithm recommended by the AWGS2019 [16], individuals with weak muscle strength and low muscle mass/physical performance were regarded as being sarcopenic. Handgrip strength was measured by a trained examiner using a Yuejian™ WL-1000 dynamometer (Nantong Yuejian Physical Measurement Instrument Co., Ltd., Nantong, China) in kilograms [15]. Participants were tested in a standing position with the shoulder flexing at 90° and the arm straight out. Every participant was measured twice on both hands and verbal encouragement was given during the test. Handgrip strength was recorded as the maximum value of these four measurements. If a participant was unable to perform grip strength measurement on one hand due to health reasons (swelling, inflammation, severe pain, or injury), values measured on the other hand would be used. A cut-off value of 18 kg in women and 28 kg in men was used to define weak muscle strength [16].
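The handgrip rule described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names, the use of `None` for a reading that could not be taken, and the strict "below the cut-off" comparison are assumptions made here for the example.

```python
from typing import Optional, Sequence

# Cut-offs quoted in the text (kg). Treating "weak" as strictly below the
# cut-off is an assumption of this sketch.
GRIP_CUTOFF_KG = {"female": 18.0, "male": 28.0}


def max_grip(left: Sequence[Optional[float]],
             right: Sequence[Optional[float]]) -> Optional[float]:
    """Return the largest valid reading across both hands.

    A hand that could not be tested is represented by None readings, so the
    result falls back to the other hand. Returns None if nothing was measured.
    """
    readings = [v for v in list(left) + list(right) if v is not None]
    return max(readings) if readings else None


def weak_grip(grip_kg: float, sex: str) -> bool:
    """Flag weak muscle strength using the sex-specific cut-off."""
    return grip_kg < GRIP_CUTOFF_KG[sex]
```

For example, `max_grip([17.5, 19.0], [None, None])` uses only the hand that could be tested, and the result is then passed to `weak_grip` together with the participant's sex.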

Appendicular skeletal muscle mass (ASM) was calculated using an anthropometric equation previously validated in a Chinese population [17]:

$$\text{ASM} = 0.193 \times \text{weight (kg)} + 0.107 \times \text{height (cm)} - 4.157 \times \text{gender} - 0.037 \times \text{age (years)} - 2.631$$

where gender was set to 1 for males and to 2 otherwise. This equation has a high coefficient of determination (R² = 0.90) and low bias (SEE = 1.63 kg), and is in good agreement with dual-energy X-ray absorptiometry (DXA) [17]. The height-adjusted appendicular skeletal muscle mass ($\text{ASM}/\text{Ht}^{2}$) was calculated as the ASM divided by the square of height in meters. Low muscle mass was defined as < 5.4 $\text{kg}/\text{m}^{2}$ in women and < 7 $\text{kg}/\text{m}^{2}$ in men [16]. Because the CHARLS walking distance is 2.5 m, which does not match the recommended 6-meter walking distance, we followed other literature and assessed physical performance using the 5-time chair stand test, with a time of ≥ 12 s indicating low physical performance [16, 18, 19].
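As a worked illustration of the equation and cut-offs above, the following sketch computes ASM, ASM/Ht² and the two component flags. The function and constant names are hypothetical. The rule that combines the components into a sarcopenia diagnosis is deliberately left out, because the text only summarizes it.

```python
# Cut-offs for height-adjusted ASM quoted in the text (kg/m^2).
LOW_MASS_CUTOFF = {"female": 5.4, "male": 7.0}


def asm_kg(weight_kg: float, height_cm: float, sex: str, age_years: float) -> float:
    """Anthropometric ASM estimate; gender is coded 1 for male, 2 otherwise."""
    gender = 1 if sex == "male" else 2
    return (0.193 * weight_kg + 0.107 * height_cm
            - 4.157 * gender - 0.037 * age_years - 2.631)


def asm_ht2(weight_kg: float, height_cm: float, sex: str, age_years: float) -> float:
    """ASM divided by the square of height in meters."""
    height_m = height_cm / 100.0
    return asm_kg(weight_kg, height_cm, sex, age_years) / height_m ** 2


def low_muscle_mass(weight_kg: float, height_cm: float, sex: str, age_years: float) -> bool:
    """True when ASM/Ht^2 is below the sex-specific cut-off."""
    return asm_ht2(weight_kg, height_cm, sex, age_years) < LOW_MASS_CUTOFF[sex]


def low_physical_performance(chair_stand_seconds: float) -> bool:
    """5-time chair stand test: 12 s or longer indicates low performance."""
    return chair_stand_seconds >= 12.0
```

For a 70-year-old woman weighing 60 kg at 160 cm, the equation gives an ASM of about 15.17 kg and an ASM/Ht² of about 5.92 kg/m², which is above the 5.4 kg/m² cut-off.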

Input variables were first identified from previously reported sarcopenia-related variables and data availability in CHARLS [15]; following other literature, input variables with more than 20% missing information were then excluded [20]. Finally, 92 input variables were chosen as candidate features at baseline (Year 2011). These variables were collected through questionnaires or measurements. Variables collected by questionnaire covered the following aspects: (1) Demographic factors such as age and sex. (2) Health status factors: hypertension, diabetes, stroke, arthritis, etc. (3) Lifestyle factors: smoking, alcohol consumption, sleep duration and leisure activities. (4) Medication factors: antihypertensive medications, diabetes medications, stroke medications, psychotropic medications, etc. (5) Psychological factors: depressive symptoms and cognitive function. (6) Socioeconomic factors: life satisfaction, self-reported health status, health satisfaction, hospitalization history, etc. Variables collected by measurement mainly included: (1) Physical function: balance ability. (2) Anthropometric indicators: Body Mass Index (BMI), arm length, knee height. (3) Vital signs: systolic and diastolic blood pressure, lung function. (4) Blood factors: White Blood Cell (WBC), Hemoglobin (HGB), Hematocrit (HCT), Mean Corpuscular Volume (MCV), etc. Assignments of variables are presented in Supplementary Table S1.

Development and evaluation of prediction model

This study constructed prediction models following the TRIPOD process [21]. In the full models, all 92 baseline variables were included as input variables. To evaluate the performance of models that include only modifiable indicators, the modifiable models were built from modifiable variables alone, excluding non-modifiable factors such as age, sex, knee height, and arm length, and their performance was compared with that of the full models at each time interval.

Eight commonly used ML classification models were selected to build risk prediction models for sarcopenia (Fig. 1): Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM) and Artificial Neural Network (ANN). Among them, LR is a classical statistical method that uses the logit function to capture the relationship between variables, offering simplicity, fast training, and transparency [22]. DT, a tree-based algorithm, makes decisions by recursively splitting data into subsets based on feature values [23]. SVM separates classes by finding the hyperplane that maximizes the margin between them, providing stability [24]. RF, AdaBoost, XGBoost, and LightGBM are ensemble techniques built on decision trees, using bagging or boosting to reduce the risks of underfitting and overfitting [25]. ANN, a basic neural network model, excels at parallel processing of complex nonlinear relationships [26].

Data preprocessing was performed as follows. The data were first divided into training and test sets in a 7:3 ratio. Missing data were imputed using the MissForest algorithm, a commonly used and robust method based on iterative random forests that can handle both continuous and categorical variables [27]. To avoid data leakage and bias, data imputation, class balancing (using the Synthetic Minority Over-sampling Technique [SMOTE] [28]), and feature selection were applied exclusively to the training set. The least absolute shrinkage and selection operator (LASSO) was used for feature selection; it identifies important predictors by shrinking regression coefficients through L1 regularization [29]. Features with non-zero coefficients, with the penalty term (λ) optimized by 5-fold cross-validation, were selected [30]. Hyperparameters of each ML model were tuned using 5-fold cross-validation and Bayesian optimization. Model evaluation was performed on the test set, and the model with the highest AUC-ROC on the test set was selected as the optimal model [31]. DeLong’s test was used to assess differences in AUC. Calibration of the prediction models was assessed with the Brier score, with a smaller score indicating a better fit. SHapley Additive exPlanations (SHAP) values were used to evaluate the contribution of each predictor in the prediction models [32]. Partial Dependence Plots (PDPs) were also employed to visualize the relationship between individual predictors and the predicted outcome, illustrating how changes in specific predictors affect the model’s predictions. Additionally, local SHAP was applied to explain individual predictions by quantifying the impact of each predictor on the model’s output for specific instances.
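The ordering that matters for leakage (split first, then balance only the training portion) can be illustrated with a small standard-library sketch. It is not the study's pipeline. The stratified split, the fixed seed, and the simplified SMOTE-style interpolation below are assumptions made for the example, and `stratified_split` and `smote_like` are hypothetical helper names rather than functions from any library.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def stratified_split(y: Sequence[int], test_frac: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), splitting each class in the same ratio."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def smote_like(minority: Sequence[Row], n_new: int, k: int = 3,
               seed: int = 0) -> List[Row]:
    """Create n_new synthetic rows by interpolating between a random minority
    row and one of its k nearest minority neighbours (squared Euclidean)."""
    rng = random.Random(seed)
    synthetic: List[Row] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Toy data: 100 rows with 10% positives, loosely mimicking class imbalance.
X = [(float(i), float(i % 7)) for i in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

train_idx, test_idx = stratified_split(y)
train_pos = [X[i] for i in train_idx if y[i] == 1]
train_neg = [X[i] for i in train_idx if y[i] == 0]

# Oversample the minority class in the training set only; the test rows
# are never used to generate synthetic samples.
new_pos = smote_like(train_pos, n_new=len(train_neg) - len(train_pos))
```

The same principle applies to imputation and feature selection: whatever is learned from data is learned from the training rows, and the test set is kept aside for the final evaluation.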

Statistical analysis

Descriptive statistics were used to summarize the baseline characteristics of the enrolled participants. Data were presented as median (interquartile range) or frequency (percentage), as appropriate. Group comparisons were performed using the Wilcoxon test or the chi-square test.

Descriptive analysis and variance analysis were performed using R version 4.3. Data preprocessing, feature selection, and ML model building and evaluation were done using Python 3.7. A P value < 0.05 was considered statistically significant.

Results

Descriptive analysis results

According to the inclusion and exclusion criteria, 3,549 participants were included in the 2-year (2011–2013) and 4-year (2011–2015) sarcopenia risk prediction cohorts. The sample selection process is illustrated in Fig. 1. The average age of participants was 67.0 years (SD = 6.1), with 47.8% being female (1,696 participants). After 2 years of follow-up, 461 individuals (12.99%) developed sarcopenia, while after 4 years, 470 individuals (13.24%) had progressed to sarcopenia. For the 2-year prediction, the LASSO regression algorithm selected 24 variables for the full model and 22 for the modifiable model, out of 92 baseline candidate features (Supplementary Table S2, S4, and Figure S1). Similarly, for the 4-year prediction, 22 variables were selected for the full model and 24 for the modifiable model (Supplementary Table S3 and S5).

Performance evaluation of prediction models

Eight ML models were developed to predict 2-year and 4-year sarcopenia risk based on variables determined by feature selection. The performances of eight ML models on the test set are shown in Table 1. In the 2-year prediction, XGBoost emerged as the best model for both full and modifiable models, with AUCs of 0.804 and 0.795, and Brier scores of 0.100 and 0.106, respectively. In the 4-year prediction, LightGBM was the optimal model for both full and modifiable models, with AUCs of 0.795 and 0.769, and Brier scores of 0.121 and 0.139, respectively. Notably, for predictions at different time points, the models using only modifiable factors performed comparably to the full models (DeLong test, P = 0.665 and 0.053, respectively) (Supplementary Table S6 and S7).
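For readers less familiar with the two metrics reported here, the following sketch shows how they are defined. It is a plain-Python illustration on toy numbers, not the evaluation code used in the study, and the function names are chosen for this example.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher than
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy example with four participants (not study data).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, probs))      # 0.75
print(brier_score(labels, probs))  # approximately 0.158
```

AUC measures discrimination (ranking of cases above non-cases), while the Brier score also reflects how close the predicted probabilities are to the observed outcomes, which is why the two are reported side by side.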

Table 1 Performance of the eight ML models for predicting sarcopenia on the test set

Predictors importance and variables interpretation

The SHAP values for the optimal models showed that gender was the most important predictor in the full models at both time intervals, while estimated glomerular filtration rate (eGFR) was the most important predictor in the modifiable models (Fig. 2). The common important predictors in the full models at 2 and 4 years were gender, age, education, lung function, depressive symptoms, BMI, platelets, glucose, and eGFR. In both the 2-year and 4-year modifiable models, the common important predictors were eGFR, education, lung function, balance, depressive symptoms, BMI, C-Reactive Protein (CRP), and platelets.

We further analyzed the impact of the top three predictors of each of the four models on the prediction of sarcopenia using PDPs (Fig. 3). Specifically, female gender, older age, higher eGFR, and elevated depressive symptom scores were associated with higher predicted probabilities of sarcopenia. In contrast, better lung function and higher educational attainment were linked to a reduced likelihood of sarcopenia. To better understand individual predictions, we applied SHAP’s local explanation method. Supplementary Figure S2 illustrates the risk predictions for individual participants using the XGBoost and LightGBM models. Figures S2A and S2C show true positive cases with high-risk predictions, while Figures S2B and S2D show true negative cases with correct low-risk predictions. For example, Figure S2A shows a true positive case in which predictors such as gender (female), age (older), eGFR (lower), and lung function (worse) substantially increased the predicted risk of sarcopenia, while depressive symptom scores (lower) had a slight protective effect. Despite this, the cumulative effect of the risk-increasing factors dominated, resulting in a high-risk prediction that was consistent with the actual diagnosis.
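The idea behind a local explanation, where each predictor receives a signed contribution that adds up to the difference between the individual's prediction and a reference prediction, can be shown with exact Shapley values on a toy model. This sketch is not the SHAP library and not the study's model: the two-feature `toy_risk` function, the single reference point, and the brute-force enumeration of feature orderings are assumptions chosen to keep the example small.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features are switched from their baseline value to their actual value
    one at a time, and each feature's marginal change is averaged over all
    possible orderings. Only feasible for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [value / factorial(n) for value in phi]


def toy_risk(v: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction between two predictors."""
    return 0.1 + 0.2 * v[0] + 0.05 * v[1] + 0.1 * v[0] * v[1]


contributions = shapley_values(toy_risk, x=[1.0, 1.0], baseline=[0.0, 0.0])
# contributions is approximately [0.25, 0.10]; the two values sum to
# toy_risk([1, 1]) - toy_risk([0, 0]) = 0.35.
```

The contributions always sum to the gap between the individual prediction and the reference prediction, which is what allows risk-increasing and protective predictors to be read off for a single case.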

Discussion

Based on follow-up data from CHARLS, eight ML models were developed and validated to assess the 2-year and 4-year risk of sarcopenia in community-dwelling individuals aged 60 years or older using the full set of variables at baseline versus modifiable variables, respectively. Among them, XGBoost exhibited the best performance in both 2-year risk models, while LightGBM performed optimally in both 4-year risk models. The SHAP values for the optimal models showed that gender was the most important predictor in the full models at different time intervals, while eGFR was the most important predictor in the modifiable models.

The performance of all prediction models was moderate, with the 2-year risk models performing better than the 4-year risk models and the full 2-year model performing best. Our findings are consistent with previous studies that have shown a decline in model performance with extended prediction periods [33]. The 2-year models may be more effective at capturing changes in sarcopenia because health changes are typically more pronounced and easier to detect in the short term. In contrast, long-term prediction models are often influenced by more complex health changes and external factors, which can reduce prediction performance. In terms of machine learning methods, although the AUC-ROC values of the XGBoost and LightGBM models were higher than those of traditional LR, the differences were not statistically significant. However, these models performed better overall in terms of sensitivity, accuracy and Brier score, suggesting that they may offer advantages in practical applications. Future research should explore the scalability and robustness of these methods in larger and more diverse cohorts to fully realize their potential.

In terms of full versus modifiable model performance, we found that the modifiable models performed slightly worse, but the difference was not statistically significant. This is encouraging: it suggests that groups at high risk of sarcopenia can be identified early and that interventions can be targeted at these important modifiable factors. The common important predictors in the full models at 2 and 4 years were gender, age, education, lung function, depressive symptoms, BMI, platelets, glucose, and eGFR; that is, all of them were key determinants of sarcopenia risk at both time intervals. Among them, the associations of education, lung function, and platelets with sarcopenia have been less studied. Education is closely related to lifestyle and health conditions, and existing studies have shown that higher levels of education are associated with a lower risk of sarcopenia in Western countries [34]. Lung function reflects overall health status and directly affects the body’s motor function [35]. Patients with COPD often have limited physical activity and malnutrition, leading to a net loss of muscle protein and an increased risk of sarcopenia [36, 37]. As platelets are a common indicator of inflammation and oxidative stress [38], their identification as a predictor provides supportive evidence for the role of inflammation in sarcopenia. Emerging evidence has suggested a link between chronic low-grade inflammation and loss of muscle mass [39]. Additionally, other sarcopenia-related factors, such as obesity and high body fat mass, have also exhibited positive correlations with platelets [40]. This association is consistent with the finding of a previous cross-sectional study, which showed a positive association between platelets and sarcopenia among Asian women aged 65 years or above [40].

In the 2- and 4-year modifiable models, the common important predictors were eGFR, education, lung function, balance, depressive symptoms, BMI, CRP, and platelets. Among these, eGFR emerged as the most important factor. The eGFR reflects renal function and the metabolic capacity of muscles within the body [41]. As sarcopenia is accompanied by increased protein catabolism, the burden on renal function is also considerable. Current studies have reported that skeletal muscle [42], lean mass [41], and body fat and its distribution [43] are correlated with eGFR. Yet studies on the direct association between eGFR and sarcopenia are scarce, and our results provide supportive evidence for such an association.

In the risk prediction models for sarcopenia over the next 2 years (whether full or modifiable), MMSE, alcohol consumption, and sleep duration at noon emerged as unique important predictors. MMSE reflects changes in cognitive ability, and its decline is associated with reduced activity and loss of muscle mass. A meta-analysis that included 15 cross-sectional studies found an association between sarcopenia and mild cognitive impairment and dementia [44]. Other studies have reported significantly lower MMSE scores after 1 year of follow-up in older adults with sarcopenia than in those without [44]. The relationship between alcohol consumption, a common lifestyle factor, and sarcopenia is controversial. For example, the results of a meta-analysis showed that alcohol consumption was not a risk factor for the development of sarcopenia [45]. One possible explanation is that differences in drinking patterns and dose-response relationships may play a role. Similarly, sleep duration at noon, another lifestyle factor, may influence sarcopenia risk indirectly by affecting daily vitality and behavioural patterns [46]. In contrast, predictors specific to the risk prediction models for the next 4 years were Glycated Hemoglobin (HbA1c), Mean Corpuscular Volume (MCV), and HGB. HbA1c reflects long-term glycemic control, and higher levels may be associated with metabolic disorders and chronic diseases, long-term factors that may accelerate the progressive decline in muscle function. A longitudinal study that followed 482 older adults with diabetes for 3 years found that elevated HbA1c levels were associated with an increased risk of sarcopenia [47]. The results of the Baltimore Longitudinal Study of Aging showed that elevated HbA1c levels were associated with a decline in muscle strength after 7.5 years [48]. MCV may indicate malnutrition or underlying chronic disease, which slowly contributes to the decline in muscle mass. Furthermore, lower HGB may lead to skeletal muscle hypoxia, which in turn affects muscle metabolism and regenerative capacity [49]. In brief, predictors specific to the 2-year models tended to emphasize the importance of cognitive and lifestyle factors in predicting short-term sarcopenia, whereas predictors specific to the 4-year models were more likely to be associated with chronic disease and changes in long-term health status.

The strengths of this study are the large sample size from a nationwide survey and the inclusion of full and modifiable models at different time intervals. However, this study has some limitations. First, since the diagnosis of sarcopenia in this study was based on the criteria proposed by the AWGS 2019, caution is needed when generalizing our findings to Western populations, where different diagnostic criteria are used. Second, this study was limited to identifying associations between sarcopenia and potential factors, and causal inferences cannot be drawn from the results alone. Third, the proportion of participants aged ≥ 80 years was small, and a large share of participants had lower literacy levels. Finally, this study was limited to internal validation, without external validation to assess the generalizability of the models. To address this, future research will aim to incorporate samples from diverse regions for external validation, especially adults of more advanced age and with higher literacy levels, to enhance the robustness of the findings.

Conclusion

In conclusion, this study developed prediction models based on modifiable factors at different time intervals to identify older adults at high risk of sarcopenia. These models can provide valuable insights for early detection and targeted prevention efforts. From a clinical intervention perspective, sarcopenia prevention strategies should specifically focus on functional capacity and psychosocial determinants to effectively mitigate sarcopenia risk.

Data availability

Data available at http://charls.pku.edu.cn.

References

Cruz-Jentoft AJ, Baeyens JP, Bauer JM, Boirie Y, Cederholm T, Landi F, Martin FC, Michel JP, Rolland Y, Schneider SM, et al. Sarcopenia: European consensus on definition and diagnosis: report of the European working group on sarcopenia in older people. AGE AGEING. 2010;39(4):412–23.
Article PubMed PubMed Central Google Scholar
Bachettini NP, Bielemann RM, Barbosa-Silva TG, Menezes A, Tomasi E, Gonzalez MC. Sarcopenia as a mortality predictor in community-dwelling older adults: a comparison of the diagnostic criteria of the European working group on sarcopenia in older people. EUR J CLIN NUTR. 2020;74(4):573–80.
Article CAS PubMed Google Scholar
Xia L, Zhao R, Wan Q, Wu Y, Zhou Y, Wang Y, Cui Y, Shen X, Wu X. Sarcopenia and adverse health-related outcomes: an umbrella review of meta-analyses of observational studies. CANCER MED-US. 2020;9(21):7964–78.
Article Google Scholar
Kim JW, Kim R, Choi H, Lee SJ, Bae GU. Understanding of sarcopenia: from definition to therapeutic strategies. ARCH PHARM RES. 2021;44(9–10):876–89.
Article CAS PubMed Google Scholar
Dent E, Woo J, Scott D, Hoogendijk EO. Toward the recognition and management of sarcopenia in routine clinical care. Nat Aging. 2021;1(11):982–90.
Article PubMed Google Scholar
Granic A, Sayer AA, Robinson SM. Dietary Patterns, Skeletal Muscle Health, and Sarcopenia in Older Adults. NUTRIENTS. 2019;11(4).
Viecelli C, Ewald CY. The non-modifiable factors age, gender, and genetics influence resistance exercise. Front Aging. 2022;3:1005848.
Article PubMed PubMed Central Google Scholar
Tzeng PL, Lin CY, Lai TF, Huang WC, Pien E, Hsueh MC, Lin KP, Park JH, Liao Y. Daily lifestyle behaviors and risks of sarcopenia among older adults. ARCH PUBLIC HEALTH. 2020;78(1):113.
Article PubMed PubMed Central Google Scholar
Nevin L. Advancing the beneficial use of machine learning in health care and medicine: toward a community Understanding. PLOS MED. 2018;15(11):e1002708.
Article PubMed PubMed Central Google Scholar
Watson DS, Krutzinna J, Bruce IN, Griffiths CE, McInnes IB, Barnes MR, Floridi L. Clinical applications of machine learning algorithms: beyond the black box. BMJ-BRIT MED J. 2019;364:l886.
Article Google Scholar
Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From local explanations to global Understanding with explainable AI for trees. NAT MACH INTELL. 2020;2(1):56–67.
Article PubMed PubMed Central Google Scholar
Yu S, Chen L, Zhang Y, Wu P, Wu C, Lang J, Liu Y, Yuan J, Jin K, Chen L. A combined diagnostic approach based on serum biomarkers for sarcopenia in older patients with hip fracture. AUSTRALAS J AGEING. 2022;41(4):e339–47.
Article PubMed Google Scholar
Kang YJ, Yoo JI, Ha YC. Sarcopenia feature selection and risk prediction using machine learning: A cross-sectional study. Medicine. 2019;98(43):e17699.
Article CAS PubMed PubMed Central Google Scholar
Van Houdt G, Mosquera C, Napoles G. A review on the long short-term memory model. Artif Intell Review: Int Sci Eng J. 2020;53(8):5929–55.
Article Google Scholar
Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China health and retirement longitudinal study (CHARLS). INT J EPIDEMIOL. 2014;43(1):61–8.
Article PubMed Google Scholar
Chen LK, Woo J, Assantachai P, Auyeung TW, Chou MY, Iijima K, Jang HC, Kang L, Kim M, Kim S, et al. Asian working group for sarcopenia: 2019 consensus update on sarcopenia diagnosis and treatment. J AM MED DIR ASSOC. 2020;21(3):300–7.
Article PubMed Google Scholar
Wen X, Wang M, Jiang CM, Zhang YM. Anthropometric equation for Estimation of appendicular skeletal muscle mass in Chinese adults. ASIA PAC J CLIN NUTR. 2011;20(4):551–6.
PubMed Google Scholar
Zhu Y, Yin H, Zhong X, Zhang Q, Wang L, Lu R, Jia P. Exploring the mediating roles of depression and cognitive function in the association between sarcopenia and frailty: A Cox survival analysis approach. J ADV RES 2024.
Sri-On J, Fusakul Y, Kredarunsooksree T, Paksopis T, Ruangsiri R. The prevalence and risk factors of sarcopenia among Thai community-dwelling older adults as defined by the Asian working group for sarcopenia (AWGS-2019) criteria: a cross-sectional study. BMC GERIATR. 2022;22(1):786.
Article PubMed PubMed Central Google Scholar
Wu Y, Wang X, Gu C, Zhu J, Fang Y. Investigating predictors of progression from mild cognitive impairment to Alzheimer’s disease based on different time intervals. AGE AGEING 2023, 52(9).
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ-BRIT MED J. 2015;350:g7594.
Article Google Scholar
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J CLIN EPIDEMIOL. 2019;110:12–22.
Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybernetics 1991.
Brereton RG, Lloyd GR. Support vector machines for classification and regression. ANALYST. 2010;135(2):230–67.
Breiman L. Bagging predictors. MACH LEARN 1996.
Drew PJ, Monson JR. Artificial neural networks. SURGERY. 2000;127(1):3–11.
Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority Over-sampling technique. J ARTIF INTELL RES. 2002;16(1):321–57.
Tibshirani R. Regression shrinkage and selection via the lasso. 1996.
Zheng X, Wang F, Zhang J, Cui X, Jiang F, Chen N, Zhou J, Chen J, Lin S, Zou J. Using machine learning to predict atrial fibrillation diagnosed after ischemic stroke. INT J CARDIOL. 2022;347:21–7.
Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. PATTERN RECOGN. 1997;30(7):1145–59.
Lundberg S, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4766–75.
Xiang C, Wu Y, Jia M, Fang Y. Machine learning-based prediction of disability risk in geriatric patients with hypertension for different time intervals. ARCH GERONTOL GERIAT. 2023;105:104835.
Adebusoye LA, Ogunbode AM, Olowookere OO, Ajayi SA, Ladipo MM. Factors associated with sarcopenia among older patients attending a geriatric clinic in Nigeria. NIGER J CLIN PRACT. 2018;21(4):443–50.
Jaitovich A, Barreiro E. Skeletal muscle dysfunction in chronic obstructive pulmonary disease: what we know and can do for our patients (vol 198, pg 175, 2018). AM J RESP CRIT CARE. 2018;198(6).
Shrikrishna D, Patel M, Tanner RJ, Seymour JM, Connolly BA, Puthucheary ZA, Walsh SL, Bloch SA, Sidhu PS, Hart N, et al. Quadriceps wasting and physical inactivity in patients with COPD. EUR RESPIR J. 2012;40(5):1115–22.
Gosselink R, Troosters T, Decramer M. Distribution of muscle weakness in patients with stable chronic obstructive pulmonary disease. J CARDIOPULM REHABIL. 2000;20(6):353–60.
Franco AT, Corken A, Ware J. Platelets at the interface of thrombosis, inflammation, and cancer. Blood. 2015;126(5):582–8.
Franceschi C, Garagnani P, Vitale G, Capri M, Salvioli S. Inflammaging and ‘Garb-aging’. TRENDS ENDOCRIN MET. 2017;28(3):199–212.
Lee HS, Koh IH, Kim HS, Kwon YJ. Platelet and white blood cell count are independently associated with sarcopenia: A nationwide population-based study. THROMB RES. 2019;183:36–44.
Taylor TP, Wang W, Shrayyef MZ, Cheek D, Hutchison FN, Gadegbeku CA. Glomerular filtration rate can be accurately predicted using lean mass measured by dual-energy X-ray absorptiometry. NEPHROL DIAL TRANSPL. 2006;21(1):84–7.
Chew-Harris JS, Florkowski CM, George PM, Elmslie JL, Endre ZH. The relative effects of fat versus muscle mass on Cystatin C and estimates of renal function in healthy young men. ANN CLIN BIOCHEM. 2013;50(Pt 1):39–46.
Gunnarsson SI, Palsson R, Sigurdsson G, Indridason OS. Relationship between body composition and glomerular filtration rate estimates in the general population. NEPHRON. 2013;123(1–2):22–7.
Peng TC, Chen WL, Wu LW, Chang YW, Kao TW. Sarcopenia and cognitive impairment: A systematic review and meta-analysis. CLIN NUTR. 2020;39(9):2695–701.
Steffl M, Bohannon RW, Petr M, Kohlikova E, Holmerova I. Alcohol consumption as a risk factor for sarcopenia: a meta-analysis. BMC GERIATR. 2016;16:99.
Li X, He J, Sun Q. Sleep duration and sarcopenia: an updated systematic review and meta-analysis. J AM MED DIR ASSOC. 2023;24(8):1193–206.
Lin Y, Zhang Y, Shen X, Huang L, Yan S. Influence of glucose, insulin fluctuation, and glycosylated hemoglobin on the outcome of sarcopenia in patients with type 2 diabetes mellitus. J DIABETES COMPLICAT. 2021;35(6):107926.
Kalyani RR, Metter EJ, Egan J, Golden SH, Ferrucci L. Hyperglycemia predicts persistently lower muscle strength with aging. Diabetes Care. 2015;38(1):82–90.
Wang H, Lin P. Association between sarcopenia and hemoglobin level: a systematic review and meta-analysis. FRONT MED-LAUSANNE. 2024;11:1424227.

Acknowledgements

We express our gratitude to the CHARLS research team, the field team, and every respondent.

Funding

This work was supported by the Science and Technology Project of Shantou Medical and Health Care Category (Grant No: STYL2022015).

Author information

Authors and Affiliations

School of Public Health, Shantou University, No. 243 Daxue Road, Shantou, 515063, Guangdong, China
Xiaodong Chen & Liping Li
Injury Prevention Research Center, Shantou University Medical College, Shantou, 515041, China
Xiaodong Chen & Liping Li

Authors

Xiaodong Chen
Liping Li

Contributions

XD.C and LP.L conceived and designed the research method and helped to draft the manuscript. XD.C collected the data and performed the statistical analysis. XD.C and LP.L revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Liping Li.

Ethics declarations

Conflict of interest

None.

Financial disclosure

No financial disclosures were reported by the authors of this paper.

Ethics approval and consent to participate

The original CHARLS was approved by the Ethical Review Committee of Peking University (IRB00001052-11015), and all participants signed informed consent forms at the time of participation.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, X., Li, L. Prediction of sarcopenia at different time intervals: an interpretable machine learning analysis of modifiable factors. BMC Geriatr 25, 133 (2025). https://doi.org/10.1186/s12877-025-05792-1

Received: 23 September 2024
Accepted: 17 February 2025
Published: 27 February 2025
DOI: https://doi.org/10.1186/s12877-025-05792-1
