A prediction model for COVID-19 liver dysfunction in patients with normal hepatic biochemical parameters

The composite model of sex and the novel protein marker ANGL3 exhibits great potential in predicting the risk of developing LD in COVID-19 patients with normal hepatic biochemical parameters on admission.


Introduction
Coronavirus disease 2019 , which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a pandemic. Different levels of liver disease have been reported in more than half of the infected population (Cai et al, 2020;Jothimani et al, 2020) and are recognized as an essential component of COVID-19 (Marjot et al, 2021).
In the early stage of COVID-19, some patients with normal hepatic biochemical parameters (alanine aminotransferase, aspartate aminotransferase, γ-glutamyl transferase, and total bilirubin) experience progression to liver dysfunction (LD) after the initiation of hepatotoxic-antiviral drugs, with even dramatic progression to liver failure without predictable circumstances (Carothers et al, 2020;Weber et al, 2020). With the reduction in the rate of COVID-19related severity and death, a means of decreasing the incidence of non-SARS-CoV-2-related complications has become increasingly important for patients with mild disease.
Given the important role of the liver in drug metabolism, coagulation, albumin synthesis, and the generation of acute-phase reactants, abnormal liver function can influence the pathologic course of the systemic disease COVID-19. Accumulating evidence suggests that elevated expression of serum hepatic biochemical parameters is related to adverse events, including mechanical ventilation, shock, and increased risk of death (Cai et al, 2020;Ding et al, 2021;Mao et al, 2020;Weber et al, 2021;Yadav et al, 2021;Yip et al, 2021). However, hepatic abnormalities involve various manifestations, including liver congestion, inflammatory response, drug-induced liver damage, and hepatocyte infection (Marjot et al, 2021), with associated difficulty in identifying susceptible patients at an early stage. Considering the limitations of common hepatic biochemical parameters in representing the actual liver function status of COVID-19 patients, early identification of subgroups with a high risk of COVID-19-based LD through new biomarkers is needed for medical intervention. However, there is no reliable strategy thus far for effective prediction of subsequent LD among COVID-19 patients at admission.
Proteomics profiling has the ability to shed light on molecular changes reflected in sera from COVID-19 patients (Nie et al, 2021;Shen et al, 2020). In this study, we developed and validated an exceptionally parsimonious model through proteomic analysis to predict subsequent risk of liver damage among COVID-19 patients with normal hepatic biochemical parameters on admission, in accordance with Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) standards.
Currently, many clinical predictive models might not have wide adaptability in clinical practice, often because of modeling of complex and multiple variables. In addition, most such models have been classified as having a high risk of bias and might not be generalizable, which is often caused by nonadherence to reporting standards and best practice methods during model development (Wolff et al, 2019;Wynants et al, 2020). The model reported here includes only two variables, namely, sex and angiopoietin-like 3 (ANGL3), which is a liver-specific secreted protein (Gaudet et al, 2017) screened by random forest (RF) and CoxBoost algorithm analyses from 1,517 proteins of proteomic data. The generalizability concerning discrimination, calibration, and clinical utility of this prediction model was comprehensively evaluated according to TRIPOD guidelines. We call this the protein-based AS LD model. The prediction tool and model parameters are freely available online to facilitate risk stratification for therapeutic interventions.

Results
Baseline clinical characteristics of the discovery cohort As shown in Fig 1, 109 participants were recruited between January 2020 and April 2020. During this period, 660 samples were collected from all of these patients. Ninety-eight (89.9%) individuals with available liver outcomes were included in the discovery cohort. A total of 397 serum samples from these individuals were designated as a source of material for identifying biomarkers. The baseline characteristics of the discovery cohort are shown as stratification for LD (Table S1); 52 (53.1%) of the patients experienced LD, with a median follow-up time of 19 (median value and interquartile range [IQR]: 10.2-31.0) days. Significant differences were observed in sex and body mass index (BMI) (P < 0.05).

Protein selection
Proteomic analysis identified 1,517 proteins from the serum samples. 20 proteins were preliminarily screened by a RF algorithm with a mean decrease in accuracy (MDA) of higher than 3 ( Fig S1). The expression patterns of these 20 proteins among different grades of LD samples were compared using principal coordinates analysis (PCoA) mapping ( Fig S2). The first and second principal components (PCoA1 accounted for 43.16% of the variance and PCoA2 for 11.01%) were plotted using the proteomic data, revealing a certain degree of distinction among the four groups.
A heat map was plotted to illustrate the expression patterns of the above 20 proteins in different grades of LD samples (Fig 2A). AL1A1 was first removed owing to a negligible difference between the LD groups. Thirteen proteins were retained according to the following rules: AUC > 0.70 for any grade of LD ( Fig 2B) and significant multivariable associations with LD (P < 0.05) ( Fig 2C). To minimize potential collinearity and overfitting for variables in the final model, we removed six proteins (FAAA, PSA7, PSB1, PSA5, PSA3, and PSA1) that correlated strongly with ALDOB and PSA4 (Spearman correlation coefficient > 0.6) ( Fig 2D).
Levels of the retained proteins (ANGL3, ALDOB, ADH4, ACY1, ADH1B, ARLY, and PSA4) were significantly different between the normal and LD groups ( Fig 3A). Patients in the LD group exhibited higher levels of ANGL3 (

Baseline characteristics of the development cohort
In the development cohort, 30 patients (40.5%) experienced LD within a median follow-up of 22.0 (11.0-32.0) days (Table 1). No significant differences were observed between the patients with normal liver function and those who experienced LD during the follow-up period.

Predictor selection
Baseline measurements included 31 clinical variables at admission and peak values of seven prioritized proteins within 7 d of hospitalization, which were simultaneously introduced into the Cox-Boost model for predictor selection. After CoxBoost processing, 10 variables remained as significant predictors of LD, including ADH4, hypertension, sex, fever, ANGL3, cough, feebleness, smoking, Hepatis B virus (HBV), and chest computed tomography (CT) (Fig S3). The step number and the penalty of the CoxBoost classifier were 101 and 198, respectively.

Model selection
To maximize the clinical convenience and applicability of our prediction model, we limited the predictors to two in the final model and constructed 19 models as follows: ANGL3.Sex ( To assess the model performance over the entire period, we calculated the time-dependent areas under the curves (AUCs) and presented them by graphical visualization (Figs 3B and S4). The ANGL3-based and ADH4-based models yielded AUC values from 0.48-0.83 to 0.53-0.81 over time, respectively. The calibration curves of ANGL3-based models showed better performance than those of ADH4-based models (Figs 3B and S5). The discrimination and calibration of the model ADH4 could not be calculated.
Decision curve analyses of the 19 models at a point estimate of 28 d are shown in Fig 3C. Threshold probabilities for the net benefit associated with the application of the ANGL3-based and ADH4based models in detecting LD ranged from 0.00-0.78 and 0.00-0.46, respectively. Notably, the ANGL3.SE (AS) model showed a higher net benefit than any other ANGL3-based model and the treat all or treat none strategies.
After performing an omni-ensemble analysis with discrimination, calibration, and clinical utility assessment, we prioritized the best-of-state AS model as the final LD prediction model for COVID-19 patients. As shown in Fig 3B, the time-dependent AUCs were 0.60-0.80 for predicting LD, indicating favorable discrimination by the AS model. The calibration curves of the AS model showed adequate agreement between the predicted and observed probability in the development cohort. Decision curves revealed that the AS model achieved more net benefits than others for a broad range of threshold probabilities in the development cohort ( Fig 3C).

Internal validation
We validated the discrimination of the AS model internally using bootstrapping of 1,000 resamples and the multiple fractional polynomial regression model ( Fig S6). The time-dependent AUCs showed results similar to those of the primary analyses.

External validation
The external validation-independent cohort included 13 patients with a median follow-up time of 22.0 (15.2-32.8) days and a median number of 2.0 (1.0-3.0) measurements; 6 (46.2%) were male (Table  S2). The outcome of LD eventually occurred in 7 (53.8%) of these patients, and significant differences in sex were observed. Applying the AS model to the data for 42 serum samples led to the correct assignment 29 (69.0%), 29 (69.0%), and 31 (73.8%) times when the cutoff was defined as 0.3, 0.5, and 0.7, respectively ( Fig 4A). The accuracies of the AS model in the external cohort were approximately 70%, similar to the AUCs for the development cohort (Fig 3B).

Construction of the nomogram and web-based calculator
Based on the final AS model, a nomogram at 14, 21, 28, 35, and 42 d after hospitalization was constructed using the Cox model ( Fig 4B). Fig S7 describes the use of the nomogram to determine the total score and risk of LD in a patient from the development cohort. A free calculator was established online to enable clinicians to automatically acquire a survival plot and survival probabilities (with 95% CI) of normal liver function in COVID-19 patients by entering values for ANGL3 and sex (https://xixihospital-liufang.shinyapps.io/ DynNomapp/) (Fig 4C).

Discussion
In this study, we developed and validated an AS model and built a free online calculator for the prediction of LD in COVID-19 patients with normal hepatic biochemical parameters on admission. Notably, the AS model only integrates two predictors, namely, a novel biomarker ANGL3 and sex, with maximized generalization and simplicity. The prediction model can be freely accessed at an online web server (https://xixihospital-liufang.shinyapps.io/DynNomapp/).
The predictor ANGL3 obtained among 1,517 proteins is a specific secreted protein originally expressed in the liver, enhancing expression of plasma cholesterol (Gaudet et al, 2017). Interestingly, it has been speculated that cholesterol facilitates SARS-CoV-2 infection by promoting viral spike protein-mediated entry via ACE2 and furin processing (Shoemark et al, 2021). Previous reports have also highlighted the pivotal role of host cholesterol metabolism in the viral life cycle during HCV infection (Foka et al, 2014). Taken together, ANGL3 might indirectly participate in SARS-CoV-2 viral replication through cholesterol metabolism, leading to aggravated LD. The positive association of ANGL3 with LD incidence observed in our study confirms this hypothesis.
The protein ADH4, one of the markers screened by the CoxBoost algorithm, is an enzyme involved in reducing retinaldehyde, which also regulates cholesterol metabolism (Pares et al, 2008). In this study, ADH4-based model calibration did not perform well, potentially because ADH4 is not a liver-specific enzyme and is expressed in many epithelial tissues of the body, such as the blood vessels, skin, cornea, and gastrointestinal mucosa (Pares et al, 2008).
Another predictor of the AS model was sex. Previous studies have reported that males are more prone to LD (Hundt et al, 2020;Wang et al, 2020), consistent with our results (Fig S3), probably resulting from smoking and alcohol use. Moreover, the net benefit comparison between the AS and the ANGL3.SM model showed comparable performance above a threshold of 0.35, suggesting that smoking might be a critical factor in LD.
In the present study, the AS model development followed the principle of at least 10 events per variable (Mallett et al, 2010;Peduzzi et al, 1996) to reduce the risk of overfitting caused by the small sample size of our cohort. With few exceptions, most published COVID-19 prediction models lack comprehensive evaluation of calibration, clinical utility, and external applicability (Wynants et al, 2020). Our study followed the TRIPOD guidelines for model development and evaluated all three aspects mentioned above. We also retained ANGL3 as a continuous variable without arbitrary categorization to avoid loss of information . Furthermore, time-dependent discrimination of the AS model provides a dynamic reference for clinicians to predict the appropriate incidence of LD for clinical decision-making.
Our study has several limitations. First, as a single-center and small sample size study, more extensive multicenter clinical research works are required to further validate the potential of the AS model for predicting COVID-19-related LD. Second, although the AS model-based nomogram and online calculator can identify individuals at high risk of LD among COVID-19 patients with normal hepatic biochemical parameters, our study cannot ascertain which treatment strategy might improve the outcomes of these patients. Third, to conclude, the simple AS model presented in this study can predict the incidence of LD earlier than the routine biochemical parameters alanine aminotransferase, aspartate aminotransferase, γ-glutamyl transferase, and total bilirubin. The nomogram and free web-based calculator make the prediction model easily accessible. COVID-19 patients with normal hepatic biochemistry might benefit from early prediction of the probability of subsequent LD to inform clinical decision-making.

Participants and outcome definitions
The retrospective study cohort was recruited from the Affiliated Hangzhou Xixi Hospital, Zhejiang University School of Medicine (Zhejiang, Southeast China; simplified as Xixi Hospital) from January 2020 to April 2020. Proteomic experiments were conducted among 660 serum samples from 109 COVID-19 patients at the Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University (Zhejiang, Southeast China). All serum samples were collected concurrently with clinical blood examination by nurses and subsequently stored at −80°C by two laboratory technicians. Demographic, clinical, and outcome data were collected from the Data Center Laboratory of Xixi Hospital.
Patients with PCR-confirmed positive SARS-CoV-2 RNA (according to the test methods described previously) (Liu et al, 2020) were screened for eligibility. This study is reported following TRIPOD guidance (Collins et al, 2015); it was approved by the Institutional Review Board of Xixi Hospital. Informed consent was waived because of the retrospective nature of this study.
The primary outcome was the subsequent LD after 7 d of admission. Definitions and diagnostic criteria of normal liver function, LD, and its grade (mild, moderate, severe) are summarized in Table  S3

Proteome analysis
Proteome experiment Serum samples were inactivated and sterilized at 56°C for 30 min and processed as in the previous study with some modifications . Briefly, 4 μl serum from each sample was depleted using High Select Top-14 Abundant Protein Depletion MiniSpin Columns (Thermo Fisher Scientific). Then elutes were denatured, reduced, and alkylated to derive protein lysates. The solutions were diluted with 200 μl 100 mM triethylammonium bicarbonate followed by a double-step trypsinization (enzyme-tosubstrate ratio kept at 1:40 in each step). 10% TFA was added to each sample to quench the enzymatic reaction. Digested peptides were cleaned up with SOLAμ (Thermo Fisher Scientific) according to the manufacturer's instructions and labeled by TMTpro 16plex reagents (Thermo Fisher Scientific). The reagents were all MS grade. 660 samples in total (633 serum samples for experiment and 27 randomly selected serum samples for quality control) were divided into 44 batches for the labeling experiment, each batch containing 15 samples and one pool. The labeled samples in each batch were combined and fractionated. Each sample was separated into 30 fractions and then consolidated into 10 fractions; dried and redissolved fractions with 2% acetonitrile (ACN)/0.1% formic acid of MS grade. The re-dissolved peptides were analyzed with a U3000 HPLC system coupled to an Orbitrap Exploris 480 (Thermo Fisher Scientific) in data-dependent acquisition mode combined to a front-end high-field asymmetric waveform ion mobility spectrometry (FAIMS). Peptides of each fraction were loaded onto a pre-column (3 μm, 100Å, 20 mm × 75 mm i.d.) and separated with a 75 min LC gradient at a flow rate of 300 nl/min (analytical column: . Dynamic exclusion was on, and mass tolerance was set to ±10 ppm. Intensity threshold was set at 20,000 for MS1, and HCD collision energy was fixed to 38%. MS/MS data were recorded in centroid mode. Isolation window was set to 0.7 m/z.

Proteomic data analysis
MS data for proteomics were processed with Proteome Discoverer (version 2.4.0.305; Thermo Fisher Scientific) against a manually annotated and reviewed Homo sapiens protein FASTA database (Swiss-Prot, 27 April 2020). For parameter settings, enzyme digestion was set to full-specific trypsin with two missed cleavages. Static modifications were TMTpro of lysine residues and N-terminus peptides and carbamidomethylation of cysteine. Dynamic modifications were set to oxidation of methionine and acetylation of N-terminus peptides. Mass tolerance for precursors and product ions was fixed at 10 ppm and 0.02 Da, respectively. The peptide-spectrum match allowed 1% target false discovery rate (strict) and 5% target false discovery rate (relaxed). 1,517 proteins in total were quantified. The protein abundance ratio of samples to the pooled sample within each batch was considered as the relative protein abundance ratio for the subsequent analysis.
Quality control of proteome data Multiple quality control of proteome data was performed as previously stated . Briefly, mouse liver digest was used as a standard sample for instrument performance evaluation. To avoid carry-over, a blank sample (buffer A) was run between every four sample injections. A pooled peptide sample labeled by TMT pro-126 was contained in each batch for aligning data from different batches and evaluation of quantitative accuracy. We first evaluated the quantitative ratio distribution of 633 samples. The outliers (above upper quartile) were imputed as two times the interquartile. Then we checked the correlation of 27 technical replicates by Pearson correlation coefficient (Fig S8A). R package limma was used to remove batch effects from experiments and instruments. There were no significant differences between 44 batches using principal component analysis (Fig S8B).

Candidate protein selection
Ninety-eight of the 109 patients with LD outcomes were enrolled as the discovery cohort for identifying biomarkers. We estimated the MDA for each of the 1,517 quantified proteins by the RF algorithm in the discovery cohort and prioritized 20 proteins with an MDA larger than 3 for subsequent analysis (Fig S1). In RF analysis, 500 trees were built with 10-fold cross-validation, and this was repeated 100 times. We performed univariable and multivariate Cox models, including age, sex, and BMI, and AUCs (equal to the concordance   Patient predictive values are located on the axis of each variable; a line is then drawn upwards at a 90°angle to determine the number of points for that particular variable. The sum of these numbers is located on the total score axis, and a line is drawn at a 90°angle downward to statistics) to estimate the ability of each protein to discriminate LD versus normal (Fig 2B and C). A protein-protein correlation plot was constructed for the 20 proteins ( Fig 2D). Some strong-correlation proteins were eliminated from candidate variates to minimize potential collinearity and overfitting bias. ANGL3, ALDOB, ADH4, ACY1, ADH1B, ARLY, and PSA4 were eventually retained (Fig 3A).

Selection of potential clinical predictors and model development
We included multiple clinical characteristics in the model development, including demographic variables, medical history, symptoms, and chest CT results on hospital admission. The demographic variables examined included age, sex, BMI, smoking status, highest temperature, systolic blood pressure, and diastolic blood pressure. Medical history included medications, HBV, hypertension, cardiovascular disease, tumor, HIV, HCV, chronic obstructive pulmonary disease, and diabetes. We also considered symptoms, such as cough, rhinorrhea, fever, pidiarrhea, rigor, nausea, dyspnea, muscular soreness, feebleness, headache, chest congestion, sore throat, and sputum. Chest CT results included normal, single pneumonia, and double pneumonia. Of 98 individuals in the discovery cohort, 74 patients hospitalized with normal liver function for 7 d were selected as the development cohort. Peak values of ANGL3, ALDOB, ADH4, ACY1, ADH1B, ARLY, and PSA4 within 7 d of admission to the hospital were subject to next-stage selection together with the other 31 clinical variables, as described above.
Given that time may affect the discrimination of the prediction model, the CoxBoost algorithm (an XGBoost-based survival estimator model) (Unterhuber et al, 2021) was applied to select the most potent predictors that could be retained in the final model. To maximize the generalizability and simplicity of the prediction model, we restricted the number of variables to 2, and the combination was set to contain at least one protein marker. Each combination was introduced into a Cox regression model in view of the long-term horizon for predictions (during inpatient and noninpatient periods) to quantify AUC values for given times.

Model selection
During selection, the discrimination (ability to distinguish individuals who will develop LD from those who will not, as quantified as the AUC), calibration (consistency between the predicted and observed probability, as evaluated by a calibration plot), and clinical utility (as quantified by the net benefit of a decision curve) were assessed among the above-combined prediction models (Steyerberg & Vergouwe 2014). An ideal calibration slope is 1, which suggests that the risk of observed outcomes matches the risk predicted. Decision curve analysis permits evaluation of clinical utility by quantifying the trade-off between correct discrimination of true positives and incorrect discrimination of false positives, as weighted based on the threshold probability (Vickers et al, 2019). The threshold probability represents the benefit ratio for the intervention. In this article, decision curve analysis was conducted at 28 d after hospital admission to quantify the net benefit of implementing the model in clinical practice. For all of the decision curves, data were fitted with a piecewise linear function among all replicates with the following algorithm: double yi 1 , yi 2 ; yi 1 = a 1 +k 1 xi 1 ; yi 2 = yi 1 +k 2 ðxi 2 − xi 1 Þ; if ðx < xi 1 Þ y = a 1 +k 1 x else if ðx < xi 2 Þ y = yi 1 +k 2 ðx − xi 1 Þ; else y = yi 2 +k 3 ðx − xi 2 Þ; Finally, the model that performed comprehensively best across discrimination, calibration, and clinical utility was used to construct a dynamic nomogram plot, which was then used to build a free online calculator.

Model validation
We utilized bootstrapping of 1,000 resamples (with replacement) and multiple fractional polynomial regression modeling to internally validate the stability of the final model using complete development case data.
To further validate the generalizability of the AS model, we collected 42 serum samples from 13 COVID-19 patients with normal hepatic biochemical parameters at Taizhou Hospital of Zhejiang province, affiliated to Wenzhou Medical University between January 2020 and February 2020 at the time of hospitalization for the external validation cohort. The sex information and the maximum values of ANGL3 within 7 d of hospital admission were selected for validation.
A result was considered statistically significant when the twotailed P-value was below 0.05. R software versions 3.6.3 and 4.0.5 (www.r-project.org) were used for the statistical analyses.

Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.