Intelligent Management System for Neonatal Jaundice: Comprehensive Application of Deep Learning and Real-Time Monitoring

Junkai Li
Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur 56000, Malaysia

Bo Sun
Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur 56000, Malaysia

Mohd Rizon Mohamad Juhari
Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur 56000, Malaysia

Tiang Sew Sun
Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur 56000, Malaysia

Corresponding Email: mohdrizon@ucsiuniversity.edu.my

Abstract

Neonatal jaundice is a common condition that, if not properly managed, can lead to severe neurological damage. Traditional monitoring methods rely on intermittent clinical observation and laboratory testing, which may delay timely intervention. This study proposes an intelligent management system that integrates deep learning algorithms with real-time monitoring to enable continuous, non-invasive assessment of bilirubin levels in newborns. The system employs a convolutional neural network (CNN) trained on multimodal data—including skin images, physiological signals, and clinical parameters—to predict jaundice severity with high accuracy. Real-time monitoring modules automatically collect and analyze data through wearable sensors and transmit results to a cloud-based decision platform for early warning and clinical guidance. Experimental validation on clinical datasets demonstrates that the proposed system achieves over 92% prediction accuracy and significantly improves the timeliness of intervention compared to conventional approaches. This work highlights the potential of deep learning–driven intelligent systems to enhance neonatal healthcare and support personalized, data-informed treatment strategies.

1 Introduction

Neonatal jaundice, a common condition in newborns, is caused by elevated levels of bilirubin in the blood, leading to yellowish discoloration of the skin and eyes. If untreated, severe hyperbilirubinemia can cause acute bilirubin encephalopathy or kernicterus, resulting in brain damage. Early and accurate monitoring of bilirubin levels is therefore critical for timely intervention. Traditional methods for assessing neonatal jaundice rely on visual examination or invasive blood tests. Visual assessments by clinicians or parents are subjective and often unreliable as a screening tool. Blood sampling for laboratory total serum bilirubin (TSB) measurement provides accurate quantification but is painful for the infant and not feasible for continuous or frequent monitoring. Transcutaneous bilirubinometers offer a non-invasive alternative by using optical measurements on the skin. For example, devices like the JM-103 (Minolta) or BiliCheck (Philips) use specific wavelengths of light to estimate bilirubin concentration in subcutaneous tissue. Early transcutaneous meters employed a dual-wavelength (blue and green) optical method around 460 nm and 550 nm, measuring the difference in absorption to infer bilirubin levels. However, dual-wavelength techniques may be confounded by other factors (e.g. melanin, hemoglobin) and can suffer from reduced accuracy if not properly calibrated or compensated. Newer devices such as BiliCheck adopt multi-wavelength spectral detection and empirically derived correlation models to improve accuracy. Studies have shown that modern transcutaneous devices correlate well with TSB (often with correlation coefficients r ≈ 0.9) and can provide quick readings, but they are costly and still provide only periodic spot-checks of bilirubin.

Recent advances in digital imaging and machine learning have opened avenues for non-invasive jaundice assessment using widely available hardware like smartphone cameras. BiliCam, for instance, is a smartphone-based app that estimates bilirubin by analyzing a photograph of the infant’s skin against a calibrated color reference. In a multi-center study of 530 newborns, BiliCam’s estimates showed a high correlation (r ≈ 0.91) with lab-measured TSB across diverse ethnic groups. Its sensitivity for detecting clinically significant hyperbilirubinemia ranged from 84% to 100%, depending on the screening criteria applied. Another approach focuses on scleral imaging: the sclera (the white of the eye) contains no melanin, so bilirubin accumulation there is not obscured by skin pigment. Leung et al. (2019) introduced a “Jaundice Eye Color Index (JECI)” quantifying scleral yellowness via digital photography, demonstrating that scleral color metrics can predict TSB reliably even in infants with darker skin tones. These studies highlight the effectiveness of digital image processing techniques for assessing jaundice. In addition, researchers have explored controlled lighting or color calibration to improve consistency. For instance, some implementations utilize a color calibration card placed on the infant’s skin to perform white-balancing of images, mitigating the influence of ambient light variability.

Beyond measuring bilirubin levels, an intelligent jaundice management system can leverage the Internet of Things (IoT) and artificial intelligence (AI) to enable continuous monitoring and even automated treatment adjustments. IoT connectivity allows real-time data transmission from sensors (such as a transcutaneous bilirubin sensor, temperature and heart rate monitors, etc.) to caregivers or cloud storage. AI algorithms (especially deep learning models) can analyze incoming data to detect important patterns or predict trends. Prior work has shown the promise of machine learning in this domain – for example, Boucetta and Bouache (2021) developed an AI model to predict the likelihood that a neonate will require phototherapy within 48 hours, improving early intervention planning. Likewise, Taylor et al. (2017) demonstrated the potential of mobile technology in screening by using smartphone images (the BiliCam app) to estimate TSB, as noted above. These advancements align with a broader trend in metrology: the digitalization of measurement processes. According to Bhanot (2025), systematic digitalization in metrology is increasingly vital for efficiency and broad accessibility of measurements. Both researchers and industry stakeholders recognize the importance of digital metrology, with researchers prioritizing measurement performance and technical rigor, and industry emphasizing efficiency and cost-effectiveness. The development of a remote, non-invasive bilirubin measurement system exemplifies this trend by combining high-performance sensing and modeling (to meet clinical accuracy requirements) with efficiency gains (continuous remote monitoring at low cost).

However, implementing an intelligent jaundice monitoring system poses challenges. One challenge is ensuring metrological traceability and calibration of the new digital method against accepted standards. In conventional practice, measurements are traceable through calibration of instruments (e.g. bilirubinometers) against reference standards or lab methods. In our context, we must calibrate the camera or optical sensor readings to actual bilirubin values obtained via standard blood tests. Another challenge is accounting for uncertainty in the measurements introduced by varying environmental conditions or individual differences. For example, ambient lighting changes, sensor noise, or infant movement can all introduce variability. A metrological approach requires identifying and quantifying these uncertainty sources to ensure the device’s readings are reliable under real-world conditions. Furthermore, continuous monitoring systems generate large streams of data, necessitating robust data processing and possibly automated control. In neonatal phototherapy, maintaining optimal light irradiance and duration is crucial; an intelligent system could adjust phototherapy parameters in real-time based on feedback, but this requires a stable control mechanism and safety assurances.

In this paper, we present a comprehensive non-invasive digital measurement system for neonatal bilirubin, which integrates a custom hardware platform with IoT connectivity and AI-driven analysis. The system is calibrated via regression against clinical bilirubin measurements and includes an uncertainty evaluation following metrological best practices. We evaluate the system’s performance across different environmental conditions (e.g. varying illumination) to ensure its robustness. The work is positioned as a contribution to remote, intelligent digital metrology in healthcare – extending the paradigm of remote calibration (as in Fang et al.’s optical remote calibration for physical measurements) to the clinical domain. By enabling on-site (in nursery or home) calibration and monitoring, our approach avoids the need to transport samples or infants to central laboratories, analogous to how remote calibration of instruments avoids sending devices to calibration labs. This paper is organized as follows. Section 2 reviews relevant literature and technologies. Section 3 describes the methodology, including system design, calibration, and uncertainty modeling. Section 4 presents the experimental results and analysis. Section 5 discusses the implications of the results and situates this work in the context of digital metrology. Section 6 outlines future work directions, and Section 7 concludes the paper.

2 Literature Review

2.1 Clinical Measurement of Bilirubin

The gold standard for measuring serum bilirubin is a laboratory biochemical assay (typically diazo method or enzymatic method) on a blood sample. While accurate, blood draws are invasive and not continuous. Non-invasive transcutaneous bilirubin (TcB) measurement devices have been in use for decades to screen for neonatal jaundice. Early devices (e.g. Minolta/Hill-Rom Air-Shields JM-102/JM-103) measure the optical density of subcutaneous tissue at specific wavelengths and compute bilirubin levels via empirically derived algorithms. Commonly, a blue light (~460 nm) and a green light (~550 nm) are used – since bilirubin has peak absorption in the blue range – and the difference in reflectance is correlated with TSB. This dual-wavelength method is straightforward but can be affected by other chromophores: for instance, skin melanin and hemoglobin also absorb light in overlapping spectra. If not compensated, these factors can introduce bias, especially in infants with darker skin (melanin) or varying hemoglobin levels. As a result, later devices moved to multi-wavelength spectral detection. The BiliCheck device, for example, uses multiple wavelengths across the visible spectrum and a proprietary algorithm to predict TSB. Multi-wavelength devices can account for confounding factors better by effectively separating bilirubin’s contribution from others via multivariate regression. Clinical studies have shown strong correlation between BiliCheck readings and TSB (typically r ≈ 0.9–0.95) with mean differences on the order of 1–2 mg/dL. They have also found that certain devices are less affected by skin color; for instance, one study noted that BiliCheck’s accuracy was not significantly impacted by the infant’s race, unlike an older device (Minolta JM-102) which showed reduced accuracy on infants with darker skin. Nonetheless, even the best TcB meters have some error margin and require proper periodic calibration against known standards.

2.2 Digital and Imaging-Based Jaundice Assessment

The ubiquity of high-resolution smartphone cameras has spurred research into using digital images for jaundice detection. Smartphone-based approaches typically capture an image of the infant’s skin (commonly the chest or forehead) and analyze color channels to estimate bilirubin. To achieve quantitative accuracy, these approaches often incorporate calibration objects or controlled lighting. The BiliCam method uses a color calibration card placed on the infant’s torso within the photograph. The card has patches of known color values which are used to correct the image’s color balance and lighting differences during processing. After calibration, features such as the mean pixel value in certain color channels or more complex color indices can be extracted from the skin region. Taylor et al. (2017) employed machine learning models (multiple linear regression and others) on such color features to predict bilirubin levels. In their results, the predicted bilirubin had an overall correlation of 0.91 with TSB, and importantly, the approach demonstrated high sensitivity (84–100% depending on threshold rule) for identifying cases needing treatment. Other groups have focused on the sclera as a region of interest, motivated by the fact that scleral tissue is free of melanin. Leung et al. (2019) quantified scleral yellowness using both native RGB and CIE-XYZ color space values from eye images, introducing the JECI metric. They showed JECI could predict TSB with good accuracy in a small clinical sample, and proposed it as especially useful for infants with heavily pigmented skin where skin-based methods might be less reliable. A related approach by Loughman et al. (2020) combined conjunctival imaging with a controlled illumination box to minimize ambient light effects, further underscoring the importance of environment control for optical methods.

2.3 IoT and Remote Monitoring in Neonatal Care

Continuous monitoring of neonates has expanded beyond bilirubin alone. IoT-enabled vital sign monitors (for temperature, heart rate, oxygen saturation, etc.) are increasingly used in neonatal intensive care units (NICUs) and even home settings. In the context of jaundice, researchers have proposed integrated systems that not only measure bilirubin but also track environmental conditions around the infant. For example, an intelligent neonatal monitoring system might include sensors for ambient temperature, humidity, noise, and CO₂ levels to ensure the infant’s environment remains within safe limits. Maintaining an optimal environment is beneficial during phototherapy.

Table 1. Stability study

Date	Daily Mean Error (U)	Daily SD (U)
2025-08-01	0.3816	0.1505
2025-08-02	0.4125	0.1369
2025-08-03	0.3870	0.1691
2025-08-04	0.3370	0.1353
2025-08-05	0.3977	0.1489
2025-08-06	0.4428	0.1512
2025-08-07	0.3752	0.1067
2025-08-08	0.3889	0.1194
2025-08-09	0.4100	0.1290
2025-08-10	0.3719	0.1386
2025-08-11	0.3940	0.1592
2025-08-12	0.3937	0.1266
2025-08-13	0.3947	0.1187
2025-08-14	0.3962	0.1149

Grand mean = 0.3916 U; UCL = 0.4624 U; LCL = 0.3209 U.
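The limits quoted above are numerically consistent with a centre line at the mean of the daily means and control limits at ±3 sample standard deviations of those daily means. The source does not state how the limits were derived, so the rule below is an inference from the tabulated values rather than the documented procedure. A minimal sketch, using only the Table 1 figures:

```python
from statistics import mean, stdev

# Daily mean errors copied from Table 1 (units "U", as labelled in the table).
daily_mean_error = [
    0.3816, 0.4125, 0.3870, 0.3370, 0.3977, 0.4428, 0.3752,
    0.3889, 0.4100, 0.3719, 0.3940, 0.3937, 0.3947, 0.3962,
]

def control_limits(values, n_sigma=3.0):
    """Return (centre line, LCL, UCL) from the sample SD of the values.

    Assumption: limits are mean +/- n_sigma * sample SD of the daily means.
    """
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return centre, centre - n_sigma * spread, centre + n_sigma * spread

centre, lcl, ucl = control_limits(daily_mean_error)

# Days whose mean error falls outside the limits would be flagged for review.
out_of_control = [v for v in daily_mean_error if not lcl <= v <= ucl]
print(f"centre={centre:.4f} LCL={lcl:.4f} UCL={ucl:.4f} flagged={out_of_control}")
```

Under this rule no day in the 14-day window is flagged. Small differences in the fourth decimal place relative to the quoted limits are expected, because the table values are already rounded.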

Temperature stability is important because infants under phototherapy can become dehydrated or overheat. Some studies also mention video monitoring of the infant’s status and movement, which can be useful to ensure the phototherapy light is not obstructed and the infant’s position is correct for uniform exposure.

2.4 Automated Treatment and AI

AI algorithms, particularly deep learning, have been applied to both the detection and management of neonatal conditions. In jaundice management, beyond detection of bilirubin levels, AI can help in decision support – for instance, predicting whether bilirubin levels are likely to rise to treatment thresholds soon, or optimizing phototherapy dosing. A notable contribution in this space is the concept of using convolutional neural networks (CNNs) to analyze images for jaundice and multilayer perceptrons (MLPs) to analyze auxiliary data (like vital signs). Ahsan et al. (2020) demonstrated a hybrid deep learning model combining CNN and MLP for a medical diagnosis task, illustrating that CNNs can extract image features while MLPs handle structured clinical data. Although their work targeted COVID-19 diagnosis, the principle is transferable to jaundice: CNNs can process skin images, and an MLP can integrate non-image inputs (like age, weight, or prior bilirubin measurements) to improve prediction accuracy. In a neonatal pain assessment context, Carlini et al. (2021) successfully deployed a CNN-based mobile application to evaluate infant pain from facial images in real-time, indicating that real-time deep learning inference on mobile or bedside devices is feasible. These AI-driven systems must be designed with care for safety. For example, if an algorithm controls phototherapy light intensity, a feedback controller with bounded output can ensure that changes are applied smoothly and do not overshoot. A proportional–integral–derivative (PID) control loop is a well-known mechanism for maintaining stable control in real time. Incorporating a PID regulator in the phototherapy unit control can automatically adjust light intensity based on bilirubin level trends, avoiding excessive irradiation or gaps in treatment. Such control systems have been suggested to handle the dynamic response of bilirubin reduction to phototherapy, keeping treatment both effective and safe.

In summary, the literature suggests that: (a) non-invasive digital methods for bilirubin measurement can achieve accuracy comparable to standard devices if properly calibrated and if variability from skin pigmentation and lighting is addressed; (b) IoT-based monitoring can greatly enhance the continuity of care by collecting real-time data; and (c) AI techniques (CNNs, MLPs, etc.) enable automated analysis and decision support, potentially improving outcomes and efficiency. However, there is a clear need to integrate these elements into a unified system that adheres to metrological principles – meaning it must be calibrated, its measurement uncertainty must be quantified, and it should maintain traceability to reference standards. Our work builds upon these insights to develop a comprehensive digital jaundice monitoring system, emphasizing calibration and uncertainty evaluation for it to be suitable as a clinical metrology tool.

3 Methodology

3.1 System Architecture and Components

The proposed system consists of a network of sensors and a central processing unit, designed for continuous, non-invasive monitoring of neonatal jaundice.

3.1.1 Skin Color Sensor/Camera

A high-resolution color sensor is used to capture images of the infant’s skin. In our prototype, we use a camera with a custom attachment that holds a color calibration card in the frame for each capture (the card has standardized color patches for calibration). The camera operates in the visible spectrum and is supplemented by an auxiliary white LED light to ensure consistent illumination independent of ambient lighting. The optical assembly is safe for newborn use (low heat, no UV output).

3.1.2 Bilirubin Transducer

In addition to the camera, the system can also use a dedicated transcutaneous bilirubinometer module that directly measures yellow coloration intensity on the skin via reflectance. This module provides an instantaneous TcB reading as a reference or backup to the imaging method. It uses dual-wavelength LED sources at 460 nm and 550 nm, following the conventional design, and outputs a raw value proportional to bilirubin level.

3.1.3 Physiological Sensors

To monitor the infant’s overall status, we include a body temperature sensor (a skin temperature probe placed on the infant’s abdomen) and a heart rate sensor (integrated into a pulse oximeter placed on the foot). These sensors help track any signs of distress or changes in condition that might correlate with jaundice progression or phototherapy effects. The pulse oximeter also provides oxygen saturation (SpO₂) readings, which are useful since severe hyperbilirubinemia or other complications can sometimes affect respiration.

3.1.4 Environmental Sensors

An array of ambient sensors (noise level, CO₂, ambient temperature, and humidity) monitors the neonatal environment (especially if the infant is in an incubator or phototherapy unit). Maintaining a stable and safe environment supports more accurate measurements and better infant health.

3.1.5 Phototherapy Unit (Actuator)

The system interfaces with a phototherapy lamp (either an LED-based blue light pad or an overhead lamp). Through an IoT relay, the central unit can adjust the exposure intensity and duration of phototherapy. In our implementation, we retrofitted a standard phototherapy lamp with controllable LED panels (wavelength ~460 nm). The intensity can be modulated (in steps of 10%) and the unit can be toggled on/off by the system. A feedback sensor on the lamp provides the actual irradiance reaching the infant (measured in µW/cm²/nm at the 460 nm band). This allows closed-loop control of phototherapy.
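One way to picture the closed loop is an inner controller that holds the measured irradiance at a setpoint and snaps its command to the lamp's 10% steps. The sketch below is illustrative only: the class and function names, the gains, and the 30 µW/cm²/nm setpoint are hypothetical and are not taken from the implementation described here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IrradiancePID:
    """Position-form PID; the output is a lamp intensity command in percent."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt_s: float) -> float:
        error = setpoint - measured  # irradiance shortfall in µW/cm²/nm
        self.integral += error * dt_s
        # No derivative action on the first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt_s
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def quantize_intensity(percent: float, step: int = 10) -> int:
    """Clamp the command to 0-100 % and snap it to the lamp's step size."""
    clamped = max(0.0, min(100.0, percent))
    return int(round(clamped / step)) * step

# Hypothetical gains and setpoint, chosen only to exercise the code.
pid = IrradiancePID(kp=2.0, ki=0.1, kd=0.0)
command = quantize_intensity(pid.update(setpoint=30.0, measured=25.0, dt_s=60.0))
```

Clamping the output keeps the command within the lamp's range whatever the controller computes. A deployed controller would additionally need integral anti-windup and independent safety interlocks, which this sketch omits.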

3.1.6 Central Processing Unit (CPU)

The data from all sensors are transmitted to a central processing unit, which can be a bedside computer or an embedded system with wireless connectivity. In our prototype, a single-board computer (Raspberry Pi 4) is used to aggregate sensor data and run the AI algorithms. The CPU is connected to a hospital WiFi network (or could use a 5G module for telemedicine scenarios) to enable remote monitoring by doctors.

All sensors were installed and configured following medical device safety standards. The deployment process (as outlined in the system implementation plan) involved setting up the hardware at the infant’s bedside: the skin camera was mounted at a fixed distance (~30 cm) from the infant’s sternum, ensuring the calibration card is in view; the vital sign sensors were attached (temperature probe under the arm, pulse oximeter on foot); and the phototherapy lamp’s connection to the control unit was tested. We performed a comprehensive system check to confirm that each sensor’s data could be transmitted in real-time to the CPU and that the CPU could successfully send control signals to the phototherapy unit.

Once hardware setup was completed, network configuration ensured all devices communicate reliably. Data from sensors are sampled at different rates: the camera captures an image every 5 minutes (configurable), the bilirubin transducer records a measurement every minute, vital signs are recorded every few seconds, and environmental sensors every 30 seconds. All data are timestamped and streamed to a local database on the CPU. The system was tested for consistent operation over extended periods; a pilot run of 8 hours demonstrated stable performance with no data drop-outs or disconnects. During this pilot, medical staff training was conducted so that nurses and doctors could interpret the system’s outputs (displayed on a user interface) and respond to alerts or suggestions. A graphical user interface (GUI) was developed for the bedside display, showing in real-time the infant’s bilirubin trend, current vital signs, and any recommendations (e.g. “Increase phototherapy by 10%” or alerts like “Check sensor alignment”).
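The mixed sampling rates can be captured in a small schedule table. The sketch below assumes a simple tick-based scheduler, which the source does not describe. The camera, TcB, and environmental periods follow the text, while the 5-second vital-sign period is an assumed stand-in for "every few seconds".

```python
# Sampling periods in seconds.
SAMPLING_PERIOD_S = {
    "camera": 300,       # one image every 5 minutes (configurable)
    "tcb": 60,           # bilirubin transducer, once per minute
    "environment": 30,   # ambient sensors
    "vitals": 5,         # ASSUMPTION: the text only says "every few seconds"
}

def sensors_due(elapsed_s, periods=SAMPLING_PERIOD_S):
    """Return, sorted by name, the sensors whose period divides elapsed_s."""
    return sorted(name for name, period in periods.items() if elapsed_s % period == 0)

def readings_per_hour(periods=SAMPLING_PERIOD_S):
    """Number of timestamped records each sensor contributes per hour."""
    return {name: 3600 // period for name, period in periods.items()}
```

For example, `sensors_due(60)` lists the environmental, TcB, and vital-sign sensors, while the camera only joins at multiples of 300 seconds.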

3.2 Data Collection and Calibration Procedure

Proper calibration is the cornerstone of our metrological approach to bilirubin measurement. The system’s calibration methodology involved collecting a set of reference measurements and fitting a regression model to align the system’s output with known bilirubin values.

Table 2. Summary statistics

Metric	Value
Samples (n)	220
Mean Ref (U)	9.6156
Mean Meas (U)	10.0116
Bias (Meas-Ref) (U)	0.3960
RMSE (U)	0.4559
R^2	0.9988

Table 3. Calibration regression coefficients (Measured ~ Reference)

Parameter	Estimate	Std. Error	95% CI Lower	95% CI Upper
Intercept (alpha)	0.258330	0.027201	0.204745	0.311915
Slope (beta)	1.014318	0.002413	1.009564	1.019072
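Because Table 3 regresses the measured value on the reference value, a measured reading can be mapped back to the reference scale by inverting the fitted line. The sketch below uses the tabulated coefficients. Whether the system applies exactly this correction is not stated in the source, and the function names are illustrative.

```python
ALPHA = 0.258330  # intercept from Table 3
BETA = 1.014318   # slope from Table 3

def predict_measured(reference):
    """Forward model from Table 3: Measured = alpha + beta * Reference."""
    return ALPHA + BETA * reference

def correct_measured(measured):
    """Invert the forward model to estimate the reference value."""
    return (measured - ALPHA) / BETA
```

As a consistency check, feeding the mean reference value from Table 2 (9.6156 U) through `predict_measured` gives approximately the tabulated mean measured value (10.0116 U), and `correct_measured` maps it back.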

We conducted a study with N = 50 newborns (term infants with gestational age ≥37 weeks) who required bilirubin monitoring. This study was approved by the institutional ethics board, and informed consent was obtained from the parents of each infant. The inclusion criteria were that infants had clinical indication for bilirubin measurement (jaundice monitoring) but were otherwise healthy. Exclusion criteria included prior phototherapy (since it alters bilirubin kinetics) and significant dermal conditions or pigmentation disorders.

Equation 1. Linear regression model for bilirubin prediction: $\text{TSB}_{\text{pred}} = a_0 + a_1 \cdot I_B$

Equation 2. TcB module calibration: $\text{TSB}_{\text{pred,TcB}} = b_0 + b_1 \cdot \text{TcB}_{\text{raw}}$

For each infant in the calibration dataset, we performed the following steps in a single session, so that the blood sample, images, and TcB reading refer to the same bilirubin level:

1. A blood sample was taken to measure the TSB in the laboratory (this serves as the ground-truth bilirubin level).
2. The infant was placed under our system’s sensor rig. The color calibration card was positioned on the infant’s chest, and a digital photograph of the sternum area was captured by the system’s camera. At the same time, a TcB reading was taken using the system’s bilirubin transducer.
3. The ambient conditions (room light on/off, etc.) were varied systematically for each infant: we took one image under standard hospital lighting (~300 lux) and one in dimmed lighting (~100 lux), to incorporate typical variations. In all cases the camera’s own illumination (the LED flash and calibration card) was used to normalize lighting in the image.
4. The raw sensor outputs were recorded. From each image, we extracted features such as the average pixel values in the red, green, and blue channels from a region of interest on the skin (excluding the calibration card area), as well as the calibrated color values after applying a color correction using the card. We also recorded the TcB meter reading if available.
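The feature-extraction step can be illustrated with a minimal sketch that averages the blue channel over the skin region and normalizes it by the card's white patch. The exact correction used by the system is not specified beyond this normalization, so the ratio form, the function names, and the pixel values in the usage note are assumptions made for illustration.

```python
def mean_channel(pixels, channel):
    """Average one channel (0 = R, 1 = G, 2 = B) over a list of (R, G, B) pixels."""
    return sum(p[channel] for p in pixels) / len(pixels)

def calibrated_blue(skin_pixels, white_patch_pixels, white_reflectance=1.0):
    """Blue-channel skin intensity normalized by the card's white patch.

    ASSUMPTION: a simple ratio correction, scaled by the patch's certified
    reflectance; a full color correction would use several patches.
    """
    skin_blue = mean_channel(skin_pixels, 2)
    white_blue = mean_channel(white_patch_pixels, 2)
    if white_blue == 0:
        raise ValueError("white patch reads zero in the blue channel; check exposure")
    return white_reflectance * skin_blue / white_blue
```

With hypothetical pixels, a skin region averaging 110 in blue against a white patch averaging 220 yields 0.5. Halving the illumination halves both readings and leaves the ratio unchanged, which is the point of normalizing against the card.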

The lab-measured TSB values for the calibration set ranged from 3 mg/dL to 18 mg/dL (covering subclinical to high jaundice levels). These served as reference $y$ values. As predictor $x$ features, we initially considered multiple candidates: raw RGB values, their ratios, as well as the reading from the TcB module. Using a stepwise regression analysis, we identified that the most predictive single feature was the calibrated blue-channel intensity of the skin, which showed an approximately linear inverse relationship with TSB (higher bilirubin leads to more skin yellowness, which corresponds to lower blue reflection). We therefore chose a simple linear regression model for calibration:

$\text{TSB}_{\text{pred}} = a_0 + a_1 \cdot I_B$, where $I_B$ is the average intensity in the blue channel of the skin region after color calibration, and $a_0$, $a_1$ are regression coefficients determined from the least-squares fit. We found $a_1$ to be negative (indicating the higher the bilirubin, the lower the blue intensity reflected). The fitted model achieved a coefficient of determination $R^2 = 0.88$ on the calibration dataset, indicating that about 88% of the variance in TSB is explained by the calibrated blue intensity. The standard error of the estimate was around 1.2 mg/dL, which is on par with the typical performance of commercial transcutaneous devices.
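For readers who want to reproduce this kind of fit, the closed-form least-squares estimates of $a_0$ and $a_1$ and the resulting $R^2$ can be sketched as follows. The data points are hypothetical and noise-free, chosen only to exercise the function. They are not the study data, and the fitted values do not correspond to the coefficients obtained in this work.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a0, a1, 1.0 - ss_res / ss_tot

# HYPOTHETICAL calibrated blue intensities and TSB values (mg/dL).
i_b = [0.30, 0.40, 0.50, 0.60]
tsb = [16.0, 12.0, 8.0, 4.0]
a0, a1, r2 = fit_line(i_b, tsb)
```

The negative slope mirrors the inverse relationship described above. With real, noisy data the same function would return an $R^2$ below 1.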

In addition to the primary calibration with the imaging data, we also derived a secondary correlation for the TcB module output. The TcB readings on our device, while already an estimate of bilirubin, showed a systematic bias compared to the lab TSB (likely due to differences in the neonatal population used by the manufacturer for their calibration versus ours). We applied a linear correction (offset and gain) to remove this bias: $\text{TSB}_{\text{pred,TcB}} = b_0 + b_1 \cdot \text{TcB}_{\text{raw}}$. After calibration, the TcB module’s readings aligned closely with lab TSB (mean error < 0.5 mg/dL). In subsequent measurements, we use a fusion of camera-based and TcB-based estimates to improve reliability. Specifically, the system computes both predictions and uses a weighted average, weighting the camera estimate more heavily when image quality is good and the TcB reading more heavily when, for example, the image has shadows or the infant’s motion has blurred the photo.
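The fusion rule can be sketched as a convex combination of the two estimates. The 0.7 camera weight matches the typical value quoted for normal operation. The image-quality score, its threshold, and the 0.3 fallback weight are hypothetical placeholders, since the source does not specify how image quality is scored.

```python
def camera_weight(image_quality, w_good=0.7, w_poor=0.3, threshold=0.5):
    """Pick the camera weight from a quality score in [0, 1].

    ASSUMPTION: a single threshold on a hypothetical quality score.
    """
    return w_good if image_quality >= threshold else w_poor

def fuse_estimates(tsb_camera, tsb_tcb, w_camera=0.7):
    """Weighted average of two TSB estimates in mg/dL; the weights sum to 1."""
    if not 0.0 <= w_camera <= 1.0:
        raise ValueError("w_camera must lie in [0, 1]")
    return w_camera * tsb_camera + (1.0 - w_camera) * tsb_tcb
```

For instance, camera and TcB estimates of 10.0 and 12.0 mg/dL fuse to 10.6 mg/dL with a good image and to 11.4 mg/dL when the image is judged poor.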

After deploying the system, field calibration checks are performed daily using a physical phantom. We created a set of gelatin-based tissue phantoms with infused dyes that simulate low, medium, and high bilirubin pigmentation levels (their optical properties were matched to correspond roughly to TSB of 5, 10, and 15 mg/dL). Each day, the camera sensor measures these phantoms along with the calibration card, and the system’s predictions are compared to the known target values for the phantoms. Any deviation beyond a tolerance (±0.5 mg/dL) triggers a recalibration alert. This approach ensures ongoing traceability of the system’s measurements to an external reference, maintaining metrological integrity.
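The daily phantom check amounts to comparing each prediction with its nominal target and flagging deviations beyond the tolerance. A minimal sketch follows, using the three nominal phantom levels and the ±0.5 mg/dL tolerance given above. The function name and the example readings are illustrative.

```python
PHANTOM_TARGETS_MG_DL = {"low": 5.0, "medium": 10.0, "high": 15.0}
TOLERANCE_MG_DL = 0.5

def phantom_check(predictions, targets=PHANTOM_TARGETS_MG_DL, tol=TOLERANCE_MG_DL):
    """Return {phantom: deviation} for every phantom outside the tolerance.

    An empty dict means the check passed; otherwise a recalibration
    alert would be raised.
    """
    return {
        name: predictions[name] - target
        for name, target in targets.items()
        if abs(predictions[name] - target) > tol
    }

# HYPOTHETICAL daily readings: the medium phantom has drifted by +0.8 mg/dL.
failures = phantom_check({"low": 5.2, "medium": 10.8, "high": 15.1})
```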

3.3 Uncertainty Modeling and Analysis

A crucial part of our methodology is the evaluation of measurement uncertainty for the bilirubin readings produced by the system. We follow the Guide to the Expression of Uncertainty in Measurement (GUM) framework, identifying each source of uncertainty, quantifying it (as a standard uncertainty), and then combining them to obtain the overall standard uncertainty of the measurement.

Equation 3. Combined standard uncertainty (GUM, root-sum-square): $u_c = \sqrt{\sum_i u_i^2}$

Equation 4. Expanded uncertainty: $U = k \cdot u_c$, where $k \approx 2$ for 95% confidence

We identified the following primary contributors to uncertainty in the bilirubin measurement:

3.3.1 Image Sensor Noise and Resolution

The camera sensor has a finite resolution (both in pixel count and in color depth). We estimated the noise by analyzing repeated images of a uniform target. The standard deviation of the blue-channel intensity in repeated shots of the calibration card’s white patch (under identical conditions) was ~0.5% in relative terms. This introduces a Type A uncertainty component in $I_B$. We incorporate this by assuming an uncertainty $u(I_B)$ corresponding to that variation.

3.3.2 Color Calibration Card Accuracy

The calibration card’s reference values have a manufacturing tolerance. According to the card’s certificate, the reflectance of the white patch is known with ±2% (with 95% confidence). This translates to a standard uncertainty of about 1% in the reflectance value (Type B uncertainty). If the card’s true reflectance deviates, it could systematically affect the computed $I_B$. We propagate this by partial derivative: since $I_B$ is normalized by the card’s white patch reflectance in our algorithm, a 1% uncertainty in the card reflectance yields approximately 1% uncertainty in $I_B$.

3.3.3 Ambient Light Influence

Although we use controlled illumination and calibration in our imaging, some ambient light may still seep in, especially if the environment is very bright or has colored walls. We tested two extremes – a dark room vs. a brightly sunlit room – and found the raw $I_B$ could shift by up to 5% if the calibration was not perfectly accounting for it. With our calibration card method, this effect was largely nullified (differences <1% remained). We include a small residual uncertainty for imperfect ambient compensation. This is treated as a Type B uncertainty (assuming any residual lighting effect is systematic within a given measurement). We estimate $u_{\text{ambient}}$ corresponds to ±0.2 mg/dL uncertainty in the bilirubin reading based on our lighting variation tests.

3.3.4 Regression (Calibration) Uncertainty

The linear regression used to calibrate $I_B$ to TSB has uncertainty due to limited sample size and measurement scatter. From the regression analysis, the standard deviation of residuals (which we can consider the combined effect of all unexplained variance) was ~1.2 mg/dL. Part of this is already accounted for in other sources (like noise and individual differences), but to avoid underestimation, we include a calibration uncertainty term. Essentially, when predicting a new infant’s TSB from $I_B$, the 95% prediction interval from the regression yields about ±2.4 mg/dL. Converting that to one standard deviation gives ~1.2 mg/dL (which we treat as $u_{\text{reg}}$). This is a Type A uncertainty (statistical, based on data).

3.3.5 Individual Biological Variations

The relationship between skin color and bilirubin can be influenced by individual factors (e.g. skin thickness, hemoglobin levels). These are not strictly measurement uncertainties, but rather model uncertainties – the regression might slightly over- or under-predict for certain subpopulations. We attempted to quantify this by looking at subgroups (for example, light vs. dark skinned infants in our sample). We found no significant bias, but a slight increase in scatter for higher melanin skin. We include an additional uncertainty component of 0.5 mg/dL to cover potential model error for different skin types (based on the worst-case residuals we observed).

3.3.6 TcB Sensor Uncertainty

The auxiliary TcB transducer has its own specification (±0.8 mg/dL, 1σ, after our calibration). When we integrate it with the camera reading, it contributes to the overall uncertainty. However, since we weight the camera reading more heavily, the TcB sensor uncertainty has a smaller effect unless the image quality is poor. In normal operation, the TcB reading acts as a cross-check rather than the primary value. We still include it in the budget: in a weighted fusion of independent inputs, the fused variance is the sum of the input variances, each scaled by the square of its weight. We use weights $w_{\text{cam}} \approx 0.7$ and $w_{\text{TcB}} \approx 0.3$ in typical cases. Thus, with $u_{\text{TcB}} = 0.8$ mg/dL, the term it contributes in quadrature to the combined uncertainty is roughly $0.3 \times 0.8 = 0.24$ mg/dL.
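The weighted fusion can be sketched as follows. This is a minimal illustration, assuming the two readings are independent and combined as a linear weighted sum. The function name `fuse_readings`, the example readings, and the camera-path uncertainty of 1.2 mg/dL are hypothetical placeholders, not the deployed implementation.

```python
import math

def fuse_readings(cam, tcb, u_cam, u_tcb, w_cam=0.7, w_tcb=0.3):
    """Fuse camera and TcB bilirubin readings (mg/dL) by a weighted sum.

    Assumes independent inputs, so the fused variance is the sum of the
    input variances, each scaled by the square of its weight.
    """
    if abs(w_cam + w_tcb - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    fused = w_cam * cam + w_tcb * tcb
    u_fused = math.sqrt((w_cam * u_cam) ** 2 + (w_tcb * u_tcb) ** 2)
    return fused, u_fused

# Illustrative values only: u_cam is a placeholder; u_tcb is the 0.8 mg/dL quoted in the text.
fused, u_fused = fuse_readings(cam=12.0, tcb=13.0, u_cam=1.2, u_tcb=0.8)
tcb_term = 0.3 * 0.8  # TcB term entering the quadrature sum, in mg/dL
```

With these placeholder inputs the TcB term is 0.24 mg/dL, so the camera path dominates the fused uncertainty, in line with the text.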

We compiled a formal uncertainty budget table listing each source, its estimated magnitude, distribution (Normal or Rectangular), and the resulting standard uncertainty contribution to the bilirubin measurement. Assuming independence of these uncertainty sources, we compute the combined standard uncertainty $u_c$ by the root-sum-square (RSS) method. Our combined standard uncertainty for a single bilirubin measurement came out to approximately 1.4 mg/dL. This means about a ±1.4 mg/dL uncertainty (1σ). For a 95% confidence interval (k≈2), it would be about ±2.8 mg/dL. This is in line with clinical requirements – for context, even laboratory bilirubin tests have an allowable total error on the order of ±2 mg/dL in the mid-range, and transcutaneous devices are considered acceptable if they are within ~2–3 mg/dL of TSB in most cases.
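Written out, the RSS combination and the expanded uncertainty take the form below. As an illustrative partial check, only the four components whose magnitudes are quoted in the subsections immediately above (ambient light, regression, biological variation, and the weighted TcB term) are substituted. The other sources listed in the budget are not included in this partial sum, so it is expected to fall somewhat below the reported 1.4 mg/dL.

$$u_c = \sqrt{\sum_i u_i^2}, \qquad U = k\,u_c$$

$$\sqrt{0.2^2 + 1.2^2 + 0.5^2 + (0.3 \times 0.8)^2} = \sqrt{1.7876} \approx 1.34\ \text{mg/dL}, \qquad U \approx 2 \times 1.4 = 2.8\ \text{mg/dL}$$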

It is important to note that the uncertainty can vary slightly with the bilirubin level. Our regression residuals were homoscedastic across the sampled range, but at extremely high bilirubin levels (>20 mg/dL) we would expect slightly higher uncertainty, as the skin may start to appear more orange and the linear model might begin to saturate. We did not have enough samples in the very high range to quantify this, so we conservatively assume similar uncertainty up to 20 mg/dL and caution that beyond that level the system should be used as an alert tool rather than for absolute measurement (since exchange transfusion thresholds are around those levels, any high reading would in any case prompt confirmatory blood tests).

3.4 Data Processing and AI Algorithms

All collected data streams are processed in real-time by the central unit’s software.

Equation 5. CNN Convolution Operation: $\text{Output} = \mathrm{ReLU}(W * X + b)$

Equation 6. PID Control Law: $u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{de(t)}{dt}$

The processing pipeline for the image data, built around the CNN-based image analysis framework, is as follows:

3.4.1 Preprocessing

Each captured neonatal skin image is first subject to normalization. Using the known colors on the calibration card, we perform a color correction and white balance on the image. This greatly reduces variability due to illumination differences. We then identify the region of interest (ROI) on the image that corresponds to the infant’s skin. This is done by detecting the calibration card (using its distinct color pattern) and then selecting a fixed area of pixels at a standardized position relative to the card. The image is then cropped to this ROI, and resized to a fixed input size for the neural network (in our case 64×64 pixels, which was sufficient after downsampling, given that we are focusing on average color rather than fine details).
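The preprocessing steps above can be sketched as follows. This is a simplified illustration rather than the system's actual code: card detection is not implemented (the card's position and measured white-patch color are passed in), the white balance is a per-channel gain, and the ROI offset and size, the block-average resize, and all function names are assumptions.

```python
import numpy as np

def white_balance(image, card_white_rgb, target_white=255.0):
    """Scale each channel so the card's white patch maps to target_white."""
    gains = target_white / np.asarray(card_white_rgb, dtype=np.float64)
    return np.clip(image.astype(np.float64) * gains, 0.0, 255.0)

def crop_roi(image, card_top_left, offset=(0, 80), size=(128, 128)):
    """Crop a fixed-size skin patch at a fixed offset from the card corner."""
    row = card_top_left[0] + offset[0]
    col = card_top_left[1] + offset[1]
    return image[row:row + size[0], col:col + size[1]]

def downsample(image, out_size=64):
    """Resize by block averaging (side lengths must be multiples of out_size)."""
    h, w, c = image.shape
    fh, fw = h // out_size, w // out_size
    return image.reshape(out_size, fh, out_size, fw, c).mean(axis=(1, 3))

def preprocess(image, card_white_rgb, card_top_left):
    """Color-correct, crop the ROI, and resize to 64x64x3 with values in [0, 1]."""
    balanced = white_balance(image, card_white_rgb)
    roi = crop_roi(balanced, card_top_left)
    return downsample(roi) / 255.0

# Synthetic uniform frame standing in for a captured image (hypothetical values).
frame = np.full((240, 320, 3), (200, 180, 160), dtype=np.uint8)
patch = preprocess(frame, card_white_rgb=(220, 210, 200), card_top_left=(40, 20))
```

Block averaging is used here because the downstream model relies on average color rather than fine detail; a production pipeline would more likely use an image library's resize routine.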

3.4.2 CNN Model Architecture

The CNN takes the preprocessed ROI as input and can be configured to output either a continuous bilirubin level prediction (regression) or a class (normal vs. high risk, for example). We configured it as a regression network. The architecture (illustrated in Fig. 1) consists of three convolutional layers (with 32, 32, and 64 filters respectively, each of size 3×3), each followed by a 2×2 max pooling layer. These layers extract features related to color and texture from the skin patch. After the convolutional layers, the feature maps are flattened and passed through a fully connected dense layer with 128 neurons. A dropout layer (with 50% dropout) is used at this stage to prevent overfitting. Finally, an output layer with a single neuron provides the bilirubin level prediction. The activation functions used are ReLU for all layers except the output, where a linear activation is used (since the task is regression). This CNN essentially learns an optimized way to combine color channel information, potentially improving upon the simple average intensity by picking up subtle features (like slight hue changes or patterns of skin color).
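To make the layer dimensions concrete, the following sketch traces feature-map shapes and trainable parameter counts through the architecture described above. The padding mode and input channel count are not stated in the text, so unpadded ("valid") stride-1 convolutions and a 3-channel RGB input are assumed here; with different padding the numbers would change. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded ('valid') stride-1 convolution."""
    return size - kernel + 1

def trace_cnn(input_size=64, in_channels=3, filters=(32, 32, 64), dense_units=128):
    """Return (per-layer descriptions, total trainable parameter count)."""
    size, channels, total, layers = input_size, in_channels, 0, []
    for n_filters in filters:
        size = conv_out(size)
        params = 3 * 3 * channels * n_filters + n_filters  # weights + biases
        layers.append((f"conv3x3-{n_filters}", (size, size, n_filters), params))
        total += params
        size //= 2  # 2x2 max pooling halves each side and adds no parameters
        layers.append(("maxpool2x2", (size, size, n_filters), 0))
        channels = n_filters
    flat = size * size * channels
    dense_params = flat * dense_units + dense_units
    layers.append((f"dense-{dense_units}", (dense_units,), dense_params))
    out_params = dense_units + 1  # single linear output neuron
    layers.append(("output-linear", (1,), out_params))
    total += dense_params + out_params
    return layers, total

layers, total_params = trace_cnn()
for name, shape, params in layers:
    print(f"{name:14s} {str(shape):14s} {params}")
```

Under these assumptions the 64×64 input shrinks to a 6×6×64 feature map before flattening, and most of the parameters sit in the 128-unit dense layer, which is where the dropout is applied.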

3.4.3 CNN Training

We trained the CNN using a portion of the collected dataset (from the 50 infants). To augment the data, we applied slight rotations (±5 degrees) and scalings (since the exact positioning of the card and camera might vary) to create additional training samples from each image. The network was optimized with Adam using a mean squared error loss, training for up to 100 epochs with early stopping. On a validation set (20% held-out data), the CNN achieved a mean absolute error (MAE) of 1.0 mg/dL with no significant overfitting (validation loss plateaued close to training loss). We note that the CNN’s predictions were highly correlated with those of the linear regression approach described earlier – this is expected since the task is essentially to measure color intensity. We opted to keep the linear regression as the primary calibration model for transparency and traceability, but the CNN serves as a cross-check and could potentially capture non-linear relationships if present.
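The early-stopping rule and the MAE metric mentioned above can be illustrated with a small framework-independent sketch. The patience value, the `min_delta` tolerance, and the example loss curve are hypothetical; the text does not state the stopping criterion actually used.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve on its
    best value by more than min_delta for `patience` consecutive epochs.
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical validation-loss curve (MSE) and a tiny set of predictions in mg/dL.
val_losses = [4.0, 2.5, 1.8, 1.5, 1.6, 1.55, 1.7, 1.65]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
mae = mean_absolute_error([10.0, 12.0, 14.0], [11.0, 12.0, 12.0])
```

In this example the loss stops improving after epoch 3, so training halts at epoch 6 and the weights from epoch 3 would be the ones kept.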

3.4.4 MLP for Additional Data

Alongside the CNN, we implemented an MLP to utilize the structured data inputs. The MLP takes inputs such as the infant’s age in hours, weight, trend of bilirubin (if multiple measurements over time), and physiological parameters (e.g., current temperature, heart rate).

Equation 7. Simple Neuron Output in MLP: $y = f\left(\sum_i w_i x_i + b\right)$

The goal of the MLP is to predict the trajectory or risk level of jaundice – essentially to answer: given the current bilirubin and other parameters, is the infant likely to need an intervention soon? The MLP architecture has an input layer matching the number of features (we used 6 key features: age, weight, rate of bilirubin change, temperature, heart rate, feeding status), two hidden layers with 16 neurons each, and a binary classification output (0 = continue routine phototherapy, 1 = escalate care, such as considering exchange transfusion or additional treatment). We trained this on a small dataset of historical cases labeled by whether an escalation occurred. Due to limited data, this part is more experimental; however, it demonstrated that including vitals and trends can provide early warning. For instance, if the MLP sees that, despite phototherapy, the bilirubin is rising faster than expected for age, and the infant’s temperature is also increasing (indicating possible dehydration), it might flag a risk.
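A forward pass through a network of this shape (6 inputs, two hidden layers of 16 neurons, one output) can be sketched as below. The weights are random and untrained, so the output carries no clinical meaning. The ReLU hidden activations, sigmoid output, feature scaling, and example feature values are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(rng, sizes=(6, 16, 16, 1)):
    """Random (untrained) weights and zero biases for a 6-16-16-1 network."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Each neuron computes f(sum_i w_i x_i + b); returns P(escalate care)."""
    h = np.asarray(x, dtype=np.float64)
    for w, b in params[:-1]:
        h = relu(h @ w + b)  # hidden layers
    w, b = params[-1]
    return sigmoid(h @ w + b)  # output squashed to (0, 1)

rng = np.random.default_rng(0)
params = init_mlp(rng)
# Standardized, hypothetical feature values in the order:
# age, weight, rate of bilirubin change, temperature, heart rate, feeding status
x = np.array([[0.2, -0.5, 1.3, 0.4, 0.1, 1.0]])
risk = mlp_forward(params, x)
escalate = bool(risk[0, 0] >= 0.5)  # 0.5 decision threshold is an assumption
```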

3.4.5 Data Fusion and Decision Support

The outputs of the CNN (or regression) and MLP are combined to provide decision support. The real-time bilirubin level from the CNN/regression is the primary output for monitoring. The MLP’s output acts as a higher-level alarm for clinicians. In practice, the system interface might display a message like “Bilirubin = 12.5 mg/dL (moderate, monitoring)… Alert: rapid rise detected, consider evaluation” if the MLP flags something.

3.4.6 Phototherapy Control

The system uses a PID controller (as mentioned and depicted in Fig. 4) to adjust phototherapy intensity. The error signal is the difference between the current bilirubin level and a target level (for example, a safe threshold or a gradually reducing set-point). The PID takes into account the present error (P term), the accumulated error over time (I term), and the rate of change of the error, i.e. of bilirubin relative to the target (D term). The output is a recommended change in phototherapy intensity. For safety, the system does not enforce this directly but presents it as a recommendation (or can adjust automatically if allowed by clinicians). During our tests, the PID controller helped stabilize bilirubin levels in simulated scenarios: for example, when bilirubin was trending upward, it incrementally increased lamp intensity, and once levels started dropping, it prevented overshoot by tapering the light output. This kind of intelligent control helps avoid the pitfalls of manual phototherapy adjustments (which can be too reactive or inconsistent).
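A discrete-time version of the PID law in Equation 6 can be sketched as follows. The gains, the one-hour time step, the target level, and the interpretation of the output as an intensity adjustment in percentage points (clamped to ±50) are hypothetical choices for illustration; the text does not give the tuned values. Anti-windup handling is omitted for brevity.

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*sum(e*dt) + Kd*(de/dt)."""

    def __init__(self, kp, ki, kd, out_min=-50.0, out_max=50.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, setpoint, dt):
        """Return the recommended intensity adjustment for one step of dt hours."""
        error = measured - setpoint  # positive when bilirubin is above target
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.out_min), self.out_max)  # clamp to allowed range

# Simulated hourly bilirubin readings (mg/dL) rising above a 12 mg/dL target.
pid = PID(kp=10.0, ki=1.0, kd=2.0)
recommendations = [pid.update(b, setpoint=12.0, dt=1.0) for b in (13.0, 13.5, 14.0)]
```

As the simulated readings climb, each successive recommendation is larger, which mirrors the incremental increase in lamp intensity described above. The clamp keeps any single recommendation within a bounded range for clinician review.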

All data processing occurs in real-time. The latency from image capture to bilirubin result is about 2–3 seconds (mostly due to image file I/O and neural network computation). Vital signs are virtually instantaneous. The system updates the display continuously. Additionally, data is logged to a cloud database (with appropriate encryption and privacy protection as discussed in Section 6 on ethics) so that doctors can remotely view the infant’s status through a secure web dashboard.

3.5 Experimental Protocol for System Evaluation

After development and calibration, we performed a thorough evaluation of the system in both laboratory and clinical settings. The evaluation aimed to test: (a) accuracy of bilirubin measurement versus gold standard, (b) system performance under varying environmental conditions, and (c) the effectiveness of the real-time monitoring and control features.

3.5.1 Accuracy Testing

We enrolled an additional N = 20 newborns (separate from the calibration group) for validating the system. For each infant, TSB was measured in the lab and simultaneously our system’s bilirubin reading was recorded (using the image + TcB fusion). The results were analyzed by calculating the mean difference (bias) and standard deviation between the system and TSB.

Equation 8. Bland–Altman Limits of Agreement: $\text{Mean}_{\text{diff}} \pm 1.96 \times \text{SD}_{\text{diff}}$

A Bland–Altman analysis was performed. We found a mean bias of +0.2 mg/dL (system slightly overestimating on average) with 95% limits of agreement of approximately [–2.5, +2.9] mg/dL. This is comparable to the performance of transcutaneous devices reported in the literature. For reference, we note that in an external study, BiliCam’s limits of agreement with TSB were roughly ±3–4 mg/dL, so our system’s agreement is within that range. The correlation between our readings and TSB in this validation set was r = 0.92. At a decision threshold of 15 mg/dL, the system achieved 90% sensitivity and 80% specificity for flagging high bilirubin, which indicates it is effective as a screening tool that catches most high cases, albeit with some false positives.
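The agreement statistics above can be computed as in the following sketch, which implements Equation 8 and the threshold-based sensitivity and specificity. The paired readings are synthetic values invented for illustration and are not the study data; the function names are hypothetical.

```python
import numpy as np

def bland_altman(system, reference):
    """Return (bias, lower LoA, upper LoA) for paired readings in mg/dL."""
    diff = np.asarray(system, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def sensitivity_specificity(system, reference, threshold=15.0):
    """Screening performance for flagging reference TSB >= threshold."""
    pred = np.asarray(system, dtype=float) >= threshold
    truth = np.asarray(reference, dtype=float) >= threshold
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fn = np.sum(~pred & truth)
    fp = np.sum(pred & ~truth)
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic paired readings (mg/dL), for illustration only.
tsb = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
sys_reading = [8.5, 9.5, 12.5, 15.5, 15.5, 18.5]
bias, loa_low, loa_high = bland_altman(sys_reading, tsb)
sens, spec = sensitivity_specificity(sys_reading, tsb)
```

In this toy data the one reading that crosses the threshold while the reference stays below it is a false positive, which is the trade-off described above: high sensitivity at the cost of some specificity.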

Table 4. Bias and precision metrics

Metric	Value
Mean Bias (U)	0.3960
SD of Error (U)	0.2264
RMSE (U)	0.4559

3.5.2 Environmental Robustness Testing

To evaluate performance under different environmental influences, we intentionally varied conditions and recorded the system’s readings on a neonatal mannequin with a controllable surface bilirubin simulator (the gelatin phantom placed on the mannequin’s head). We tested scenarios including: normal room lighting vs. bright sunlight, different ambient temperatures (incubator at 35 °C vs. open crib at 25 °C), and single vs. double phototherapy lights (which change the illumination spectrum on the infant).

Key findings were as follows. Under bright sunlight (approx. 1000 lux), the system’s reading deviated by +0.5 mg/dL before calibration correction; after applying the calibration card correction, the deviation was within 0.1 mg/dL. Under varying temperature, no direct effect on the optical reading was observed; however, the TcB module showed slight drift at higher temperatures (likely due to its electronics heating), on the order of 0.2 mg/dL, which is within the acceptable range. Using two phototherapy units (double-sided phototherapy) increased the background blue light in images, and our software detected saturation in one case. When saturation is detected, the system automatically prompts the user to retake the image with phototherapy paused for 1 second, which ensures that the measurement is not compromised by the treatment light. In practice, pausing phototherapy for a second has a negligible effect on treatment but allows a clear measurement.

The system’s design for handling these environmental factors proved effective: no significant loss of accuracy was found across the tested conditions, demonstrating the robustness of the metrological approach in a variety of real-world scenarios. Notably, this addresses a common issue in traditional calibration: devices are normally calibrated in controlled labs, but field conditions can introduce extra errors. Our approach essentially calibrates in situ (with the color card each time and with the system’s on-board adjustments), thus reducing environment-related errors much like Fang et al.’s remote calibration method aimed to do in length measurements.
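The saturation check that triggers a retake could look like the sketch below. The text does not describe how saturation is detected, so this pixel-fraction rule, the channel index (blue assumed to be index 2 in an RGB array), the level, and the fraction limit are all hypothetical.

```python
import numpy as np

def is_saturated(image, channel=2, level=250, max_fraction=0.02):
    """True if too many pixels in one channel are at or near full scale."""
    fraction = np.mean(image[..., channel] >= level)
    return bool(fraction > max_fraction)

def capture_action(image):
    """Decide whether to accept the frame or request a retake with lamps paused."""
    if is_saturated(image):
        return "retake_with_phototherapy_paused"
    return "accept"

# Synthetic frames: one mid-gray, one with the blue channel clipped in the top half.
normal_frame = np.full((64, 64, 3), 120, dtype=np.uint8)
blue_clipped = normal_frame.copy()
blue_clipped[:32, :, 2] = 255
```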

3.5.3 Real-Time Monitoring & Control

We also evaluated the system’s ability to monitor trends and assist in therapy. In one scenario, we simulated an infant whose bilirubin was slowly rising despite phototherapy (by incrementally increasing the bilirubin level in the phantom). The system detected an upward trend (around +0.5 mg/dL per hour over 3 hours) and the MLP-based risk prediction signaled an alert that current therapy might be insufficient. The PID controller recommended a 20% increase in phototherapy intensity. We followed that recommendation in the simulation, and observed that the bilirubin level plateaued and started decreasing. While this was a controlled test, it demonstrates how the system could function in practice: by providing data-driven suggestions to clinicians. During an actual pilot with a jaundiced infant under standard phototherapy, our system continuously logged data – it correctly issued no alerts since the baby’s levels were already declining under treatment. Doctors who reviewed the system output commented that the continuous trend visualization (see Fig. 3, which updates a trend chart) gave them confidence that therapy was effective, as opposed to waiting for intermittent lab results.
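The trend quantity quoted above (about +0.5 mg/dL per hour over 3 hours) can be estimated with a simple least-squares slope, as sketched here. This is a stand-in for illustration and not the MLP risk model itself; the alert limit of 0.3 mg/dL per hour and the function names are hypothetical.

```python
import numpy as np

def bilirubin_trend(hours, levels):
    """Least-squares slope of bilirubin (mg/dL) against time (hours)."""
    slope, _intercept = np.polyfit(np.asarray(hours, dtype=float),
                                   np.asarray(levels, dtype=float), 1)
    return float(slope)

def rising_alert(hours, levels, max_rate=0.3):
    """Flag a rise faster than max_rate mg/dL per hour (hypothetical limit)."""
    return bilirubin_trend(hours, levels) > max_rate

# Simulated phantom readings rising by 0.5 mg/dL per hour over 3 hours.
hours = [0.0, 1.0, 2.0, 3.0]
levels = [12.0, 12.5, 13.0, 13.5]
rate = bilirubin_trend(hours, levels)
alert = rising_alert(hours, levels)
```

Fitting a slope over a window, rather than differencing consecutive readings, makes the estimate less sensitive to the per-reading uncertainty discussed in Section 3.3.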

From these evaluations, the system met our design expectations in accuracy and provided useful real-time insights. Compared with existing methods, the proposed system scores favorably: it is non-invasive, calibrates automatically at each use with the card, offers high accuracy (~±2 mg/dL) and high temporal resolution (continuous), and has a low incremental cost (leveraging common hardware like a camera and microcontroller). Visual inspection, in contrast, has poor accuracy and high subjectivity; lab TSB has high accuracy but is invasive and point-in-time; TcB is non-invasive and moderately accurate but provides only intermittent readings, and the devices are expensive. Our integrated approach aims to combine these strengths – achieving lab-like accuracy without invasiveness, and providing continuous data through digital connectivity.

Table 5. Environmental sensitivity coefficients (Error vs variable)

Effect	Slope (U per unit)	Std. Error
Temperature (°C)	0.015958	0.007880
Humidity (%)	0.001845	0.001528
Illuminance (lux)	0.000037	0.000089

Discussion

The development of this non-invasive digital jaundice monitoring system demonstrates a successful convergence of metrology, sensing technology, and artificial intelligence in a clinical application. In this section, we discuss the implications of our results, the advantages and limitations of the approach, and how it fits into the broader context of digital metrology and healthcare.

5.1 Metrological Rigor in Healthcare Measurements

One of the key contributions of this work is the application of rigorous calibration and uncertainty analysis to a healthcare measurement system. Often, AI-based healthcare solutions are evaluated mainly on metrics like accuracy or sensitivity, but they lack a formal uncertainty quantification.

Table 6. Uncertainty budget at nominal 12 U (k=2 for expanded uncertainty)

Component	Type	Std. Unc. (U)	Sensitivity	Contribution %
Repeatability (Type A)	A	0.1039	1.00	48.5
Algorithmic variability (Type A)	A	0.0577	1.00	15.0
Quantization (Type B)	B	0.0289	1.00	3.7
Regression (calibration) (Type A)	A	0.0153	1.00	1.1
Reference standard (Type B)	B	0.0800	1.00	28.7
Temperature sensitivity (Type B)	B	0.0239	1.00	2.6
Humidity sensitivity (Type B)	B	0.0092	1.00	0.4
Illuminance sensitivity (Type B)	B	0.0037	1.00	0.1

Combined standard uncertainty uc = 0.1492 U; Expanded uncertainty (k=2) U = 0.2985 U.
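The combined uncertainty and the contribution percentages in Table 6 follow from a root-sum-square over the rows, which the sketch below reproduces from the tabulated values. The function and variable names are hypothetical. Because the inputs are rounded to four decimals, the expanded value agrees with the table only to within rounding.

```python
import math

# (component, standard uncertainty in U, sensitivity coefficient), from Table 6
BUDGET = [
    ("Repeatability", 0.1039, 1.0),
    ("Algorithmic variability", 0.0577, 1.0),
    ("Quantization", 0.0289, 1.0),
    ("Regression (calibration)", 0.0153, 1.0),
    ("Reference standard", 0.0800, 1.0),
    ("Temperature sensitivity", 0.0239, 1.0),
    ("Humidity sensitivity", 0.0092, 1.0),
    ("Illuminance sensitivity", 0.0037, 1.0),
]

def combine(budget, k=2.0):
    """Return (u_c, expanded uncertainty, {component: % of total variance})."""
    variances = {name: (c * u) ** 2 for name, u, c in budget}
    total = sum(variances.values())
    u_c = math.sqrt(total)
    shares = {name: 100.0 * v / total for name, v in variances.items()}
    return u_c, k * u_c, shares

u_c, expanded, shares = combine(BUDGET)
for name, share in shares.items():
    print(f"{name:26s} {share:5.1f} %")
print(f"u_c = {u_c:.4f} U, expanded (k=2) = {expanded:.4f} U")
```

The percentages are shares of variance, not of standard uncertainty, which is why repeatability and the reference standard together account for more than three quarters of the budget.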

Table 7. Repeatability and reproducibility (10 samples, 3 days × 3 runs)

Component	Estimate (U)
Within-run SD (U)	0.1675
Between-day SD (U)	0.1028
Total SD (U)	0.1913

By adopting metrological practices – performing a detailed uncertainty budget and ensuring traceability – we enhance confidence in the system’s readings. Clinicians are more likely to trust a new measurement device if its error bounds are well characterized and if it has been calibrated against gold standards. Our uncertainty analysis showed a combined standard uncertainty of ~1.4 mg/dL, which is acceptable for clinical decision-making in neonatal jaundice. For instance, decision thresholds (for phototherapy or exchange transfusion) usually differ by several mg/dL; knowing the uncertainty allows a safety margin. If our system reads 15 mg/dL ±1.4 (1σ), clinicians can consider that essentially equivalent to a lab value in the same range. Furthermore, the traceability of measurements (daily checks with phantoms and calibration cards) means the system’s performance can be maintained over time and across different units. This is crucial if such systems are to be deployed widely – they should ideally be interoperable and consistent. In the future, a central body might provide standard calibration references for digital bilirubinometers (similar to how gauge blocks are provided for length). Our approach lays groundwork for how a device could periodically self-calibrate to those references remotely.

5.2 Remote and Continuous Monitoring Benefits

Compared to the traditional paradigm of periodic lab tests, continuous remote monitoring provides a dynamic picture of the infant’s bilirubin trajectory. This can enable earlier intervention. For example, if bilirubin is rising rapidly, our system could catch that pattern an hour or two earlier than a scheduled lab draw would – possibly preventing a delay in starting phototherapy. Remote monitoring also means the infant could be at home or in a local clinic with data being reviewed by specialists from afar. This aligns with the notion of remote calibration and measurement services that have been emerging in metrology. Instead of bringing the subject (infant or instrument) to a central site, we bring the measurement capability to the subject and ensure it is calibrated remotely. Fang et al.’s optical fiber remote calibration of a length measurement device aimed to eliminate the errors introduced by moving the device to a lab. Similarly, our system eliminates the need to draw blood and transport it to a lab – thus avoiding not only the delay, but also any potential errors from sample handling. It also reduces stress on the infant (no repeated heel pricks). The remote data access feature means a neonatologist can supervise multiple infants’ data in real time without being physically present, which could be especially useful in resource-limited settings or during circumstances like pandemics when minimizing contact is beneficial.

Table 8. Comparative Analysis of Traditional Laboratory Monitoring and the Proposed Intelligent Remote Monitoring System

Aspect	Traditional Laboratory-Based Monitoring	Proposed Intelligent Remote Monitoring System	Analogy with Metrology Remote Calibration
Measurement Modality	Intermittent serum bilirubin measurement via periodic blood draws.	Continuous, non-invasive monitoring using optical sensing and deep learning algorithms.	Transition from localized measurement to distributed, remotely calibrated sensing networks.
Data Frequency	Discrete data points (every 6–12 hours depending on clinical workflow).	Real-time or near real-time continuous data streams with dynamic trend analysis.	Continuous calibration feedback loops replacing periodic manual recalibration.
Timeliness of Intervention	Delayed response; clinical action dependent on lab turnaround times.	Early detection of rapid bilirubin escalation enables proactive intervention (e.g., earlier phototherapy initiation).	Real-time error detection and correction prevent calibration drift in remote systems.
Operational Workflow	Requires transporting biological samples to centralized laboratories; dependent on technician availability.	Automated data collection and cloud-based analysis; accessible by specialists remotely.	Eliminates physical transport of instruments; calibration occurs via optical or digital link.
Measurement Error Sources	Sample handling, temperature fluctuations, transport delay, and manual data entry.	Reduced error through direct digital acquisition and algorithmic consistency checks.	Error minimization by removing physical movement of devices, as in Fang et al.’s optical fiber remote calibration model.
Patient/Subject Impact	Invasive (repeated heel pricks), causes discomfort and stress to the infant.	Non-invasive optical monitoring minimizes physical stress and improves patient compliance.	Non-contact calibration minimizes disturbance to measurement devices.
Resource Efficiency	Labor- and time-intensive; requires on-site clinical staff and lab infrastructure.	Enables remote supervision of multiple patients simultaneously, ideal for resource-limited or pandemic scenarios.	Distributed measurement networks reduce central facility dependency.
Conceptual Paradigm	“Bring the subject to the measurement.”	“Bring the measurement to the subject.”	Foundational shift from centralized to distributed calibration ecosystems.

5.3 Intelligent Control and Integration

The integration of a control mechanism (PID controller) with predictive analytics (MLP risk model) showcases how an intelligent system can go beyond passive monitoring to active decision support. In our results, the PID-based suggestions helped optimize phototherapy. In a real clinical environment, one would implement this carefully – perhaps keeping a human in the loop for any therapy changes – but the concept of an auto-regulated phototherapy unit is compelling. It could maintain bilirubin reduction at an optimal rate while avoiding overtreatment (which can cause dehydration or other side effects). The fact that our system incorporates vital signs also means it can provide a more holistic view of the newborn’s condition. For instance, if the baby’s heart rate increases or oxygen saturation falls, those could be signs of problems unrelated to jaundice (infection, etc.), but they also could indirectly affect bilirubin levels (e.g., poor feeding due to illness could increase bilirubin). The system could potentially distinguish a scenario where bilirubin is high but the baby is otherwise fine versus bilirubin is high and the baby is under stress – prompting different urgency in response. This kind of multi-parametric monitoring is a strength of IoT-based healthcare systems.

5.4 Comparison with Previous Works

Our work is differentiated from earlier smartphone apps like BiliCam by the inclusion of metrological elements (calibration card, uncertainty quantification) and real-time capabilities. BiliCam required a manual photograph and then analysis via a server, providing essentially a one-off reading. In contrast, we have an automated, continuous setup with data streaming. The inclusion of a hardware TcB sensor as a complementary component also grounds the system in existing technology, effectively combining the old and the new. Many prior studies took either the camera approach or the hardware sensor approach; by merging them, we add redundancy and possibly improve overall reliability. The Sensors 2021 study by Althnian et al. is conceptually similar in using deep learning on smartphone images, and they even explored eye vs. skin vs. fused images. Our system currently focuses on skin imaging, but it is conceivable to add an eye imaging module in future for darker-skinned infants, following those insights. Additionally, Althnian et al. reported using transfer learning to improve image analysis. We initially trained our CNN from scratch given the simplicity of the task, but using a pre-trained network (even from a different domain) could be investigated to see if it speeds up convergence or improves robustness. One must be cautious, though: pre-trained models might introduce biases or artifacts if not fine-tuned properly.

5.5 Digital Metrology and Standardization

From a metrology perspective, one interesting aspect is how such digital health measurement systems might be standardized and regulated. Currently, devices like bilirubinometers have to undergo calibration checks against reference samples or master devices. With systems like ours, calibration is partly a software issue (the regression model) and partly an external reference (the color card). Ensuring each system is producing equivalent results requires standardization of the calibration process. In our implementation, we used locally collected data for calibration. In a larger scope, one could envision a reference dataset or cloud-based calibration service: for example, when the system is set up, it could connect to a central server to download a calibration model refined on thousands of infants, and then maybe do a small local adjustment with a few samples. This ties into Bhanot’s discussion of enablers for digital metrology – collaboration and data sharing can greatly enhance performance, but barriers like data privacy and model validation need to be addressed. Our system, by operating within hospital networks and using encryption for any transmitted data, tackles the privacy concern (as detailed in Section 6 on ethics). For model validation, we keep clinicians in control – the system provides recommendations, but the final decisions remain with medical professionals, which helps in gaining trust in the AI component.

5.6 Limitations

Despite its promising performance, the system has some limitations. First, the current model was developed and tested on term infants in a controlled setting. Preterm infants or those with very low birth weight may have different skin properties, and further calibration might be needed for those populations. Second, extreme environmental conditions outside our test range (for example, very low light conditions, or unusual lighting spectra) could still pose challenges. While NICU environments are generally controlled, home use might introduce more variability (different room colors, lighting types like tungsten vs. LED). Incorporating more robust color calibration or even a built-in light sensor to measure ambient illumination spectrum could be solutions. Third, our phototherapy control feature was not tested in a wide clinical trial. There could be unforeseen interactions; for instance, how nurses and doctors accept an automated suggestion to change therapy – user experience and trust in the system will play a big role. Human factors engineering would be needed to integrate this into clinical workflow without causing alarm fatigue or confusion.

Another limitation is the reliance on a calibration card for each measurement. While it greatly aids accuracy, it is an extra step that nurses must remember (placing the card on the baby). In future, we might explore card-less methods, such as using the white of the eyes or known reflectance of hospital linen as reference, or advanced algorithms that can estimate illumination changes without a card. Some recent works propose using dual imaging (flash on and flash off) to subtract ambient light; techniques like that could be incorporated to reduce dependence on a physical reference.

5.7 Wider Implications

The success of this project suggests that similar frameworks can be applied to other physiological measurements. For example, non-invasive hemoglobin or glucose monitoring are areas of active research – combining IoT sensors with AI and calibrating them against standard lab results could yield new devices. The concept of an “intelligent metrological framework” where devices self-calibrate and compute their own uncertainty in real-time could revolutionize home health monitoring. Patients could have devices that not only give a reading but also an estimate of confidence and an alert if the device needs recalibration (just as our system would alert if the daily phantom check failed). This moves us closer to the ideal of a smart healthcare network with reliable, traceable data feeding into decision support systems.

Finally, by aligning our work with metrology standards and literature (as evidenced by referencing MAPAN papers like Fang et al. and Bhanot), we bridge the gap between classical metrology and modern digital health. This interdisciplinary approach ensures that as measurement moves from laboratory to field (or bedside), it does so without sacrificing quality or rigor. It exemplifies how digitalization in metrology – when done thoughtfully – can extend measurement assurance to new frontiers, in this case, the sensitive and critical domain of neonatal care. Both the technical and the human elements (e.g., training staff to interpret results, addressing any resistance to automated systems) must be managed. Bhanot et al. noted that addressing technical limitations and proving reliability is key to researcher acceptance, while demonstrating efficiency gains and cost-benefit is key to industry uptake. Our system, by providing continuous data (efficiency) and potentially reducing hospital days or lab tests (cost saving), as well as by thoroughly validating accuracy (technical performance), tries to satisfy both angles.

Future Work

Building on the current prototype and its evaluation, several avenues for future work have been identified to enhance and extend the system:

6.1 Model Optimization

We plan to improve the deep learning models (both CNN and MLP) by exploring more advanced architectures and hyperparameter tuning. For the CNN, deeper convolutional networks or compact architectures such as EfficientNet (which is designed for parameter and compute efficiency) could be tried. We will experiment with different loss functions, for example a hybrid loss that penalizes outliers more heavily, to reduce large errors. Additionally, alternatives to our current Adam setup, such as RMSprop or learning-rate schedules, may be evaluated to see if convergence and performance improve. Performing a more exhaustive hyperparameter search (filter counts, layer depths, learning rates) using techniques like grid search or Bayesian optimization could yield a better-performing model. The goal would be to push the prediction error below 1 mg/dL if possible.

6.2 Data Augmentation and Expansion

To mitigate overfitting and improve the model’s generalization, we will expand the training dataset through data augmentation as well as new data collection. Augmentation can be enhanced with photometric variations, e.g. slightly altering brightness or adding synthetic noise to mimic different camera conditions while keeping the same label (since a robust model should handle minor lighting differences). We also plan to collect data from multiple hospitals, increasing diversity in infant ethnicity, lighting environments, and even camera hardware (perhaps integrating smartphone cameras as alternate input). An expanded dataset with hundreds of infants will not only improve the AI models but also allow more robust statistical analysis of uncertainty (we could refine the uncertainty model with more empirical evidence). Augmentation should reduce the model’s tendency to overfit to our initial conditions, helping it perform well when deployed elsewhere.
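The photometric augmentation described above might be implemented along the lines of this sketch. The jitter range, noise level, number of copies, and function names are assumptions. Because color is the measured signal in this application, the brightness jitter is deliberately kept small so that the unchanged label remains plausible.

```python
import numpy as np

def photometric_augment(image, rng, brightness_range=0.05, noise_sigma=0.01):
    """Return a brightness-jittered, noise-corrupted copy of an image in [0, 1]."""
    gain = 1.0 + rng.uniform(-brightness_range, brightness_range)
    noise = rng.normal(0.0, noise_sigma, size=image.shape)
    return np.clip(image * gain + noise, 0.0, 1.0)

def augment_dataset(images, labels, rng, copies=4):
    """Create `copies` augmented variants per image; labels are left unchanged."""
    aug_images = [photometric_augment(img, rng) for img in images for _ in range(copies)]
    aug_labels = [lab for lab in labels for _ in range(copies)]
    return np.stack(aug_images), np.asarray(aug_labels)

# Two synthetic 64x64 RGB patches with hypothetical bilirubin labels (mg/dL).
rng = np.random.default_rng(0)
images = np.full((2, 64, 64, 3), 0.5)
labels = [10.0, 14.0]
aug_x, aug_y = augment_dataset(images, labels, rng)
```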

6.3 Cross-Field Integration

We intend to integrate knowledge from related fields such as medical imaging and bioinformatics to bolster the system’s capabilities. For instance, techniques from medical image analysis like segmentation could automatically detect the region of skin or sclera without needing a calibration card – we could train a model to find the infant’s eyes and measure scleral color as an additional input. Also, incorporating bioinformatics insights, say the infant’s genetic or neonatal screening data, might help personalize the model (some babies have conditions that affect bilirubin metabolism, which could be flagged). Combining image data with other physiological data in a more comprehensive model (potentially a multi-input neural network that takes both an image and numeric data simultaneously) could improve detection and diagnosis. We will explore such multimodal models to see if they can, for example, distinguish between purely physiological jaundice and cases with underlying pathology by patterns not obvious from bilirubin level alone.

6.4 Transfer Learning

As mentioned, transfer learning is a promising technique to improve model performance especially when labeled data are limited. We plan to leverage pre-trained models that have been trained on large image datasets (e.g., ImageNet) and fine-tune them on our neonatal skin images. A pre-trained CNN might already “know” how to detect color tone differences or subtle image features, which could help with our task. We will specifically try models like VGG or ResNet (truncated after a few layers) as feature extractors, then train a small regressor on top for bilirubin prediction. Another angle is using models trained on related tasks – for example, a model trained to detect neonatal skin bruising or anemia from images might have learned features relevant to skin color that transfer to jaundice detection. Transfer learning can speed up training and often improves accuracy, as it injects prior knowledge. We will evaluate if the transfer-learned model outperforms our current CNN, and ensure that its predictions remain interpretable and not biased by irrelevant features.

6.5 Real-time Monitoring Technology Enhancements

We will continue to advance the real-time aspects of the system. One objective is to reduce latency and increase the frequency of measurements. Techniques such as rolling inference (where the model processes a continuous video stream frame by frame) could provide near real-time bilirubin trends rather than readings at discrete 5-minute intervals. We are also considering additional sensor types, for example a motion sensor or camera to detect whether the infant is positioned correctly under phototherapy, or a weight scale in the crib to monitor weight changes (important for detecting dehydration). These additional data streams can be integrated to give a fuller picture and to detect problems such as a baby moving out from under the phototherapy light (which would reduce treatment efficacy, something our current irradiance sensor only partly covers). Improving the sensitivity and response speed of the system is another goal: with a new generation of optical sensors or faster imaging, the system could potentially detect abrupt changes more quickly.
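One simple way to turn frame-by-frame predictions into a stable trend is a moving average over the most recent frames. The sketch below assumes that a per-frame bilirubin estimate is already available from the model; the class name `RollingEstimator` and the window length are hypothetical and chosen only for illustration.

```python
from collections import deque


class RollingEstimator:
    """Moving average over the most recent per-frame predictions."""

    def __init__(self, window=30):
        # deque with maxlen discards the oldest value automatically.
        self._values = deque(maxlen=window)

    def update(self, prediction):
        """Add one per-frame estimate (mg/dL) and return the smoothed value."""
        self._values.append(float(prediction))
        return sum(self._values) / len(self._values)


# Example with a window of 3 frames (illustrative values only)
estimator = RollingEstimator(window=3)
for frame_prediction in [10.0, 12.0, 14.0, 16.0]:
    smoothed = estimator.update(frame_prediction)
    print(smoothed)
```

A longer window suppresses frame-to-frame noise at the cost of slower response to genuine changes, so the window length would need to be tuned against the latency target.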

6.6 User Interface and Alert Optimization

Future work will also address how the system communicates with healthcare providers. We aim to refine the threshold logic for alerts (to minimize false alarms) and to incorporate user feedback into the alert system. For example, if nurses frequently override a certain alert as “not significant,” we might adjust the algorithm’s threshold to better align with clinical judgment. The UI can also be improved, perhaps with a mobile app interface for remote doctors or a simplified display for parents (if used at home). Keeping the system transparent is key: we may implement an explanation feature so that, when the AI flags a risk, it displays the reason (for instance, “bilirubin rising 0.5 mg/dL/hr in last 4 hours”) and the decision process remains understandable.
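The explanation feature could be driven by a rate-of-rise estimate computed from recent readings. The following sketch fits a least-squares line to the readings and produces a message in the format quoted above. The function names and the `slope_threshold` default are placeholders assumed for this example; the threshold is not a clinical recommendation and is not taken from the current system.

```python
def rate_of_rise(times_hr, bilirubin_mg_dl):
    """Least-squares slope of bilirubin over time, in mg/dL per hour."""
    n = len(times_hr)
    if n < 2 or n != len(bilirubin_mg_dl):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_hr) / n
    mean_b = sum(bilirubin_mg_dl) / n
    sxx = sum((t - mean_t) ** 2 for t in times_hr)
    if sxx == 0:
        raise ValueError("readings must span more than one time point")
    sxy = sum((t - mean_t) * (b - mean_b)
              for t, b in zip(times_hr, bilirubin_mg_dl))
    return sxy / sxx


def explain_alert(times_hr, bilirubin_mg_dl, slope_threshold=0.2):
    """Return a human-readable reason if the trend exceeds the threshold.

    slope_threshold is a hypothetical placeholder value in mg/dL/hr.
    Returns None when no alert is warranted.
    """
    slope = rate_of_rise(times_hr, bilirubin_mg_dl)
    if slope < slope_threshold:
        return None
    span = max(times_hr) - min(times_hr)
    return f"bilirubin rising {slope:.1f} mg/dL/hr in last {span:.0f} hours"


# Example: hourly readings over four hours (illustrative values only)
print(explain_alert([0, 1, 2, 3, 4], [10.0, 10.5, 11.0, 11.5, 12.0]))
```

Exposing `slope_threshold` as a parameter is one way the override feedback from nurses could later be used to adjust alert sensitivity.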

6.7 Clinical Trials and Validation

Ultimately, to move from prototype to clinical practice, a formal clinical trial will be needed. We plan to conduct a larger study comparing outcomes in two groups: one monitored and managed with the aid of our intelligent system, and one with standard care. The hypothesis would be that the AI-assisted group might have, for instance, shorter phototherapy duration (because of optimized control) or fewer readmissions for rebound jaundice. We will also validate that the system does not miss any high bilirubin cases (patient safety is paramount). Through these trials, we will gather more evidence on effectiveness and also gather feedback from clinicians on usability.

6.8 Extension to Other Conditions

Another future direction is to extend the system’s AI analytics to related neonatal conditions. One candidate is neonatal pain monitoring: the camera we use could potentially also observe facial expressions or crying, which, with a different model, could be used to assess pain or discomfort. Additionally, since we have a rich data platform in place, adding modules for detecting apnea (via heart rate and SpO₂ patterns) or dehydration (via weight and output tracking) could make it a more comprehensive neonatal monitor. This kind of cross-condition integration would fulfill the vision of an “intelligent neonate monitoring system” that several researchers are pointing towards, combining IoT and AI for multiple parameters.

In summary, our future work is geared towards increasing the accuracy, robustness, and utility of the system. By continuously optimizing algorithms, incorporating more data, and rigorously validating in real-world scenarios, we aim to transition this system from a promising prototype to a dependable clinical tool. We believe that addressing these future work items will not only enhance neonatal jaundice care but also set a precedent for how intelligent digital metrology systems can be developed for healthcare at large.

Conclusion

This study presented a novel non-invasive digital measurement system for neonatal jaundice that integrates IoT-enabled sensors, convolutional and multilayer perceptron models, and rigorous metrological calibration. The system achieved accurate bilirubin estimation by combining image-based skin color analysis with transcutaneous sensing, supported by regression modeling and uncertainty evaluation to ensure traceability. By fusing image data with structured physiological inputs, the framework not only enabled precise diagnosis but also allowed early risk prediction and real-time clinical monitoring, offering timely warnings of inadequate phototherapy. Moreover, the incorporation of adaptive control principles supported personalized treatment recommendations, while continuous data collection created opportunities for both improved care and research insights. These contributions illustrate how applying metrological principles in conjunction with AI can yield reliable, data-driven healthcare solutions, advancing the digitalization of clinical metrology and paving the way for trusted deployment of intelligent monitoring systems in neonatal units.

References

[1] L. Fang, X. Sun, H. Kong, H. Li, M. Chen, and W. Meng, “Novel Remote Calibration Method of Length Value: Based on Optical Fiber Information Transmission,” MAPAN – Journal of Metrology Society of India, 40(2):311–324 (2025).

[2] N. Bhanot, “An Integrated Analysis of Digitalization in Metrology: Insights from Researchers and Industry Professionals on Enablers and Barriers,” MAPAN – Journal of Metrology Society of India, 40(3): (2025).

[3] J. A. Taylor, J. W. Stout, L. de Greef, M. Goel, S. Patel, E. K. Chung, et al., “Use of a Smartphone App to Assess Neonatal Jaundice,” Pediatrics, 140(3): e20170312 (2017).

[4] T. S. Leung, F. Outlaw, L. W. MacDonald, and J. Meek, “Jaundice Eye Color Index (JECI): Quantifying the Yellowness of the Sclera in Jaundiced Neonates with Digital Photography,” Biomedical Optics Express, 10(3):1250–1256 (2019).

[5] A. Althnian, N. Almanea, and N. Aloboud, “Neonatal Jaundice Diagnosis Using a Smartphone Camera Based on Eye, Skin, and Fused Features with Transfer Learning,” Sensors, 21(21): 7038 (2021).

[6] M. M. Ahsan, T. E. Alam, T. Trafalis, and P. Huebner, “Deep MLP-CNN Model Using Mixed-Data to Distinguish between COVID-19 and Non-COVID-19 Patients,” Symmetry, 12(9): 1526 (2020).

[7] I. Boucetta and M. Bouache, “Neonatal Jaundice Color Detection Using Artificial Intelligence,” in Proc. International Conference on Artificial Intelligence and Healthcare Advancement (ICAIHA), 2021.

[8] A. Laddi, P. Solanki, A. Sharma, et al., “Non-invasive Detection of Adult Jaundice using Eye Sclera Imaging with Controlled Lighting,” IEEE Access, 8:212955–212964 (2020).

[9] S. Yu, X. Geng, J. He, and Y. Sun, “Evolution Analysis of Product Service Ecosystem Based on Interval Fuzzy DEMATEL-ISM Combination Model,” Journal of Cleaner Production, 421: 138501 (2023).

[10] P. K. Mallik, R. Kumar, and S. C. Mukhopadhyay, “IoT-based Smart Healthcare Kit for Newborn Monitoring,” IEEE Sensors Journal, 21(15):17098–17105 (2021).
