Interpreting Infrared Thermography with Deep Learning to Assess the Mortality Risk of Critically Ill Patients at Risk of Hypoperfusion

Background: Hypoperfusion, a common manifestation of many critical illnesses, can lead to abnormalities in the thermal distribution of the body surface. However, thermal images are difficult to interpret. Our aim was to assess the mortality risk of critically ill patients at risk of hypoperfusion in a prospective cohort by combining infrared thermography with deep learning methods. Methods: This post-hoc study was based on a cohort at high risk of hypoperfusion. Patients' legs were selected as the region of interest. Thermal images and conventional hypoperfusion parameters were collected. Six deep learning models were trained to derive the risk of mortality (range: 0 to 100%) for each patient. The area under the receiver operating characteristic curve (AUROC) was used to evaluate predictive accuracy. Results: Fifty-five hospital deaths occurred in a cohort of 373 patients. The conventional hypoperfusion parameters (capillary refill time and diastolic blood pressure) and the thermal parameters (low temperature area rate and standard deviation) demonstrated similar predictive accuracies for hospital mortality (AUROC 0.73 and 0.77, respectively). The deep learning methods, especially ResNet (18), further improved accuracy. The AUROC of ResNet (18) was 0.94, with a sensitivity of 84% and a specificity of 91% at a cutoff of 36%. The risk of mortality given by ResNet (18) increased significantly from patients with normotension (13% [7 to 26%]) to those with hypotension (18% [8 to 32%]) and shock (28% [14 to 62%]). Conclusions: Interpreting infrared thermography with deep learning enables accurate and non-invasive assessment of the severity of illness in patients at risk of hypoperfusion.


Introduction
Tissue hypoperfusion is a common manifestation of many critical illnesses and one of the major contributors to in-hospital death [1][2][3]. For patients at risk of hypoperfusion, clinicians should recognize the relevant risk factors, assess the current severity, take the necessary interventions, and monitor the resulting changes. In recent years, investigators have explored methods that are easy to use and do not depend on physician judgment to assess the severity of hypoperfusion. However, tools such as lactate, skin mottling and capillary refill time (CRT) [4] struggle to combine simplicity with accuracy.
Physiologically, when tissue perfusion deteriorates, the continuity and quantity of skin blood flow decrease, which in turn results in an uneven thermal distribution on the body surface [5]. Several studies have found that surface temperature differences and their trajectories correlate with the prognosis of patients with sepsis [6,7]. Our group has established a prospective cohort that gathered infrared images of the legs of critically ill patients at high risk of hypoperfusion, collected routine hypoperfusion parameters and followed up their prognosis. Based on these data, we used conventional mathematical methods to define parameters reflecting the thermal inhomogeneity of the body surface (e.g., low temperature area rate [LTAR] and standard deviation [SD]) and found that these parameters varied among patients with different severities of hypoperfusion and could be used to predict the risk of mortality [8].
However, the accuracy with which conventional algorithms interpret body surface infrared images to predict mortality risk is not yet satisfactory. Visually, the body surface thermal distribution is a two-dimensional grey-scale image. In principle, deep learning algorithms (especially convolutional neural networks), which excel at supervised image recognition tasks, can identify and interpret the information contained in these thermal images [9] and thus enable a more accurate prediction of severity in patients at high risk of hypoperfusion.
We performed this post-hoc analysis of the cohort dataset with the aim of developing deep learning models that interpret infrared thermography to assess the mortality risk of patients at risk of hypoperfusion.

Patients
This study was a post-hoc analysis of a 373-patient cohort at risk of hypoperfusion from a cardiac surgical intensive care unit (ICU), enrolled over a one-year period (June 2020 to May 2021) [8]. The cohort was established with the approval of the Ethics Committee of Zhongshan Hospital, Fudan University (Number B2020-057). Patients with any high-risk factor for hypoperfusion were enrolled, including hypotension, cardiac dysfunction, tachycardia, hyperlactatemia, oliguria, skin mottling or prolonged CRT [8]. Patients were excluded if they were <18 years old or pregnant, had severe arterial or cutaneous abnormalities or other conditions impeding acquisition of a complete image, or were expected to be transferred out of the ICU within 24 hours [8].

Data Collection
Data for this study were obtained from the body surface infrared images of the original cohort [8]. Thermal images of the patients' legs (below the perineum and above the ankle) were acquired with an infrared camera (A615, 640 × 480 pixels, ±0.05 °C, Teledyne FLIR LLC, CA, USA) with patients in the supine position, and then converted to temperature matrices using the official FLIR Tools software. Background values outside the lower-limb area were removed. From the thermal data, several parameters of thermal inhomogeneity (SD, kurtosis [10], skewness [10], entropy [11] and LTAR) were calculated. The LTAR was defined as the proportion of the leg area with a temperature lower than 10% of the maximum temperature [8]. Demographics, routine laboratory examinations, and conventional circulatory and hypoperfusion parameters were collected. Doses of vasopressors and inotropes were converted to the vasoactive-inotropic score (VIS) [12].
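To make these definitions concrete, the sketch below computes the thermal inhomogeneity parameters from a masked temperature matrix and converts vasoactive doses to a VIS. It is an illustration under stated assumptions, not the authors' code. The function names are hypothetical; background pixels are represented as `None`; skewness and (excess) kurtosis use population moments; entropy is assumed to be the Shannon entropy of a temperature histogram with 0.1 °C bins; and LTAR is read as the share of leg pixels colder than 90% of the maximum leg temperature (if [8] defines the threshold differently, only `threshold_fraction` needs to change). The VIS weights assume [12] is the widely used Gaies formulation.

```python
import math
import statistics


def thermal_inhomogeneity(temps, threshold_fraction=0.9, bin_width=0.1):
    """Summarise a leg temperature matrix (deg C); background pixels are None."""
    # Keep only pixels inside the lower-limb mask.
    t = [v for row in temps for v in row if v is not None]
    n = len(t)
    mean = statistics.fmean(t)
    # Central moments in population form, so SD = sqrt(m2).
    m2 = sum((v - mean) ** 2 for v in t) / n
    m3 = sum((v - mean) ** 3 for v in t) / n
    m4 = sum((v - mean) ** 4 for v in t) / n
    # Shannon entropy in bits over fixed-width temperature bins (assumed definition).
    lo = min(t)
    counts = {}
    for v in t:
        k = int((v - lo) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    # LTAR under the assumed reading: share of leg pixels colder than
    # threshold_fraction * maximum leg temperature.
    cutoff = threshold_fraction * max(t)
    ltar = sum(v < cutoff for v in t) / n
    return {
        "SD": math.sqrt(m2),
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
        "Entropy": entropy,
        "LTAR": ltar,
    }


def vasoactive_inotropic_score(dopamine=0.0, dobutamine=0.0, epinephrine=0.0,
                               norepinephrine=0.0, milrinone=0.0, vasopressin=0.0):
    """VIS with doses in ug/kg/min, except vasopressin in U/kg/min (assumed Gaies form)."""
    return (dopamine + dobutamine
            + 100 * epinephrine + 100 * norepinephrine
            + 10 * milrinone + 10000 * vasopressin)


# Toy 2 x 3 "leg" with two background pixels removed.
leg = [[None, 29.0, 31.0],
       [32.0, 33.0, None]]
print(thermal_inhomogeneity(leg))
print(vasoactive_inotropic_score(norepinephrine=0.1, dobutamine=5.0))
```

A real A615 frame has 640 × 480 pixels, so in practice a vectorised array library would replace the pure-Python loops; the arithmetic is the same.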

Outcome Definitions
For comparisons among subgroups, circulatory status was divided into three categories: normotension, hypotension (systolic blood pressure [SBP] <90 mmHg or vasopressor use) and shock (hypotension with hyperlactatemia [lactate ≥4 mmol/L]) [13]. All patients were followed up until death or hospital discharge, and hospital mortality was the primary outcome.
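The three circulatory categories are nested (shock is a subset of hypotension), so the order of the checks matters. A minimal sketch of the classification rule as stated above; the function name and argument names are hypothetical.

```python
def circulatory_status(sbp_mmhg, on_vasopressor, lactate_mmol_l):
    """Classify a patient as 'normotension', 'hypotension' or 'shock'."""
    # Hypotension: SBP < 90 mmHg or any vasopressor use.
    hypotension = sbp_mmhg < 90 or on_vasopressor
    # Shock: hypotension plus hyperlactatemia (lactate >= 4 mmol/L).
    if hypotension and lactate_mmol_l >= 4.0:
        return "shock"
    if hypotension:
        return "hypotension"
    # Hyperlactatemia without hypotension still counts as normotension here.
    return "normotension"


for args in [(110, False, 1.5), (85, False, 2.0), (100, True, 5.2), (120, False, 6.0)]:
    print(args, circulatory_status(*args))
```

Note the last case: under this definition, a normotensive patient with a high lactate is not classified as being in shock.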

Statistical Analysis
Data were presented as mean ± SD (if normally distributed), median with interquartile range (IQR) (if non-normally distributed) or total numbers with percentages, and were compared with Student's t-test, the Wilcoxon (or Friedman) rank-sum test or Fisher's exact test, as appropriate. The temperature matrices of the lower limbs were used as input. Deep learning models were constructed using convolutional neural network frameworks. Depending on the backbone and the number of layers, there were six models: AlexNet [14], MobileNet v3 [15], ShuffleNet v2 (1.0 or 1.5) [16] and ResNet (18 or 34) [17] (Fig. 1 & Supplementary Fig. 1). The final output of each model was the risk of mortality, ranging from 0 to 100%.
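The text does not state how the temperature matrices were normalised or how a network output was mapped to a 0 to 100% risk. One common arrangement, sketched here purely as an assumption, is to min-max scale the leg pixels, fill the removed background with a constant, and pass a single output logit through a sigmoid. Both helper names are hypothetical.

```python
import math


def to_model_input(temps, background=0.0):
    """Min-max scale leg pixels to [0, 1]; fill removed background with a constant."""
    leg = [v for row in temps for v in row if v is not None]
    lo, hi = min(leg), max(leg)
    span = (hi - lo) or 1.0  # guard against a perfectly uniform image
    return [[background if v is None else (v - lo) / span for v in row]
            for row in temps]


def logit_to_risk_percent(logit):
    """Map a single network logit to a mortality risk in percent (stable sigmoid)."""
    if logit >= 0:
        return 100.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # avoids overflow for large negative logits
    return 100.0 * z / (1.0 + z)


print(to_model_input([[None, 30.0, 32.0], [34.0, 31.0, None]]))
print(logit_to_risk_percent(0.0))
```

With this choice the coldest leg pixel and the background both map to 0; a separate mask channel is one way to keep them distinguishable if that matters for the model.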
Receiver operating characteristic (ROC) curves were generated, and the areas under the ROC curves (AUROC) were calculated to evaluate the predictive accuracy for mortality risk. Sensitivity, specificity, positive and negative predictive values (PPV and NPV) and the associated 95% confidence intervals (CI) were calculated at the cutoff value determined by the Youden index. The gray zone around the best cutoff and the number of patients within it were also calculated [18]. Calibration plots and the Brier score were used to assess the agreement between predictions and observations. In addition, 5-fold cross-validation was used to assess internal validity. The relationships between model outputs and conventional variables were explored with two-dimensional histograms and Loess regression curves. Statistical analyses were performed using Python (version 3.9, Python Software Foundation, Delaware, USA) and R (version 4.1.1, R Foundation for Statistical Computing, Vienna, Austria); p < 0.05 was considered statistically significant.
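The core accuracy metrics can be written out directly. The sketch below computes the AUROC as the Mann-Whitney probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, picks the cutoff that maximises the Youden index (sensitivity + specificity - 1) together with the derived PPV and NPV, and computes the Brier score. The function names and toy data are hypothetical; predictions are assumed to be probabilities in [0, 1] (divide percentage outputs by 100 first), and confidence intervals and the gray zone are omitted.

```python
def auroc(y, p):
    """P(random death scored above random survivor); ties count 1/2."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(y, p):
    """Cutoff maximising sensitivity + specificity - 1 ('positive' means p >= cutoff)."""
    best = None
    for c in sorted(set(p)):
        tp = sum(1 for s, t in zip(p, y) if s >= c and t == 1)
        fp = sum(1 for s, t in zip(p, y) if s >= c and t == 0)
        fn = sum(1 for s, t in zip(p, y) if s < c and t == 1)
        tn = sum(1 for s, t in zip(p, y) if s < c and t == 0)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best["J"]:  # first cutoff wins on ties
            best = {"cutoff": c, "J": j, "sensitivity": sens, "specificity": spec,
                    "PPV": tp / (tp + fp) if tp + fp else float("nan"),
                    "NPV": tn / (tn + fn) if tn + fn else float("nan")}
    return best


def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


deaths = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(auroc(deaths, risk), youden_cutoff(deaths, risk), brier_score(deaths, risk))
```

The pairwise AUROC is O(n²), which is acceptable for a cohort of a few hundred patients; library implementations sort the scores instead.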

Models Validation
Supplementary Fig. 2 shows the calibration curves for each deep learning model. ResNet (18) had the best calibration and the lowest Brier score (4.8), followed by ResNet (34), whereas several other models were considerably less well calibrated. In cross-validation (Supplementary Fig. 3), the AUROCs of all deep learning models fluctuated across folds, but their averages remained broadly consistent with the values in Table 1. Of these, ResNet (18) performed most consistently across folds.
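Some fold-to-fold fluctuation is expected with this event rate: 55 deaths split five ways leaves only about 11 deaths per validation fold, so each fold's AUROC rests on few positive cases. Whether the folds were stratified is not stated; the sketch below assumes stratification by outcome, which keeps the death count equal across folds. The function name and seed are hypothetical, and the labels are synthetic, matching only the cohort's counts.

```python
import random


def stratified_folds(y, k=5, seed=0):
    """Assign sample indices to k folds, dealing each outcome class round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Synthetic labels with the cohort's counts: 55 deaths among 373 patients.
y = [1] * 55 + [0] * 318
for f in stratified_folds(y):
    print(len(f), sum(y[i] for i in f))
```

Each fold is used once for validation while the model is trained on the other four; the mean and spread of the five AUROCs summarise internal validity.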

Mortality Risk Derived from Deep Learning Model in Different Perfusion Status
Across the normotension, hypotension and shock subgroups, mean arterial pressure (78 to 70 to 65 mmHg) and urine output (1.3 to 1.2 to 0.8 mL/kg/h, p < 0.001) decreased, whereas lactate (1.8 to 1.9 to 7.0 mmol/L), ΔPCO2 (6.9 to 7.8 to 8.9 mmHg), CRT (1.1 to 1.3 to 1.6 s) and the occurrence of skin mottling (2 to 2 to 16%) increased; there was no significant difference in ScvO2 (71 vs. 69 vs. 69%, p = 0.225). Among the thermal inhomogeneity parameters, LTAR increased from 1 to 3 to 7% and SD from 0.81 to 0.88 to 0.94 °C. In addition, the risk of mortality derived from the deep learning models also increased from normotension to hypotension to shock (Table 2). For example, the mortality probability given by ResNet (18) rose steadily from 13 to 18 to 28% (Table 2).

Mortality Risk Derived from Deep Learning Model Correlated with Conventional Parameters
The output of the deep learning model, i.e., the risk of mortality, correlated broadly with conventional circulatory, hypoperfusion and thermal inhomogeneity parameters. As the risk of mortality increased, perfusion pressure (MAP) gradually decreased, whereas parameters reflecting the severity of hypoperfusion (CRT, VIS, lactate) and thermal inhomogeneity (LTAR and SD) gradually increased (Fig. 3).

Discussion
This study used a cohort dataset of critically ill patients at risk of hypoperfusion. Deep learning algorithms were developed to interpret the information contained in infrared thermographic images of the patients' legs. Among them, the model based on the residual network had superior accuracy in predicting mortality risk, and its output correlated broadly with conventional perfusion parameters and the severity of hypoperfusion.
Medical scientists have long noted that local changes in blood flow or metabolism can lead to thermal abnormalities, which can then be used to diagnose diseases such as breast cancer [19,20] and arterial stenosis [21]. This approach, which focuses on changes in thermal parameters, has recently been extended to intensive care units. The peripheral-to-central temperature gradient was found to correlate with perfusion pressure and cardiac output [5]. In addition, the toe-to-room temperature gradient could reflect the severity of sepsis [6]. Nagori et al. [22] also used deep learning to interpret whole-body infrared images to predict the probability of shock in pediatric patients.
Combining infrared thermography with deep learning algorithms to study hypoperfusion has great potential for more accurate prediction of a patient's mortality risk. Traditionally, CRT has been a single, non-invasive, easily accessible parameter of hypoperfusion with the strongest prognostic relevance. Our previous work showed that infrared thermography-based parameters of inhomogeneity in body surface thermal distribution, such as LTAR and SD, had accuracies similar to that of CRT and achieved higher predictive precision when used in combination (AUROC: 0.865) [8].
Despite their good interpretability, algorithms built on conventional methods appear to have reached an accuracy ceiling, with little room for further improvement. Because the body surface thermal distribution is a two-dimensional grey-scale image, deep learning algorithms, particularly convolutional neural networks [9,23,24], can be applied to exploit the information in these images and thus predict patients' risk of mortality more accurately.
In this study, six models were constructed using deep learning frameworks based on convolutional neural networks. These models differed in accuracy, complexity and the amount of computation required to process each sample. AlexNet uses the rectified linear unit (ReLU) activation, has a simpler architecture and trains faster; it achieved an accuracy (AUROC: 0.79) similar to those of CRT and LTAR in this study population. MobileNet, whose architecture was derived by hardware-aware network architecture search, achieved higher accuracy (AUROC: 0.82) in a relatively short time. ShuffleNet has a much lighter architecture. We tried its second version, currently the most widely used, but the accuracy improvement was not significant (AUROC: 0.68 and 0.79). ResNet adopts shortcut connections within every stage, so that the stacked layers learn residual information. ResNet achieved high accuracy (AUROC: 0.89 and 0.94), and the ResNet (18) model is well balanced, with moderate model complexity and computing-power requirements.
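The idea of "learning the residual" can be shown with a toy block. In a ResNet basic block the output is ReLU(F(x) + x), where F is a small stack of weight layers; if the weights of F are driven to zero, the block reduces to the identity for non-negative inputs, so extra depth does not have to degrade the signal. The sketch below is a simplification for illustration only: it uses plain fully-connected layers on a vector in place of the 3 × 3 convolutions and batch normalisation used in ResNet (18), and all names are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(w, v):
    """Matrix-vector product with w given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    # F(x): two weight layers with a ReLU in between (batch norm omitted).
    f = linear(w2, relu(linear(w1, x)))
    # Shortcut connection: add the input back, then apply the final ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))  # F(x) = 0, so the block passes x through
eye = [[1.0, 0.0], [0.0, 1.0]]
print(residual_block([1.0, 2.0], eye, eye))      # F(x) = x, so the output is 2x
```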
In the near future, we plan to create an online tool to help other healthcare providers use our models.
Our study has several limitations. Firstly, this post-hoc study was based on data from a single center; validation using external data is planned. Secondly, temperature distribution was measured only once per patient, whereas dynamic monitoring would be more helpful for clinical management. Thirdly, some patients were excluded because of lower-limb vascular disease or pregnancy, which limits the generalizability of the findings to some extent. Fourthly, the infrared camera was not calibrated against a blackbody, so its sensitivity exceeded its absolute accuracy. Finally, the present study was based primarily on a cardiac critical illness population. The accuracy of the prediction model in other critically ill populations, particularly sepsis, needs to be validated.

Conclusions
The interpretation of infrared thermography images using deep learning algorithms enables non-invasive and more accurate assessment of the risk of mortality in critically ill patients at risk of hypoperfusion.

Fig. 1. Construction of a predictive model based on convolutional neural networks.

Fig. 2. Receiver operating characteristic curves for deep learning models and conventional hypoperfusion or thermal parameters.