ESPE2018 Free Communications GH & IGFs (6 abstracts)
aBioinformatics Knowledge Unit, The Lokey Interdisciplinary Center for Life Sciences, Technion-Israel Institute of Technology, Haifa, Israel; bPediatric Department, Bnei Zion Medical Center, Haifa, Israel; cPediatric Endocrinology, Clalit Health Service, Haifa, Israel; dFaculty of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel; ePediatric Endocrinology Department, UCL Great Ormond Street Institute of Child Health, London, UK; fDepartment of Pediatrics, Institute of Clinical Sciences, Göteborg Pediatric Growth Research Center (GP-GRC), Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; gDepartment of Physiology/Endocrinology, Institute of Neuroscience and Physiology, Sahlgrenska Academy at Gothenburg University, Gothenburg, Sweden; hFaculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
Context: Growth analyses have traditionally been done either with non-structural descriptive statistics or by fitting models. Although height and weight are usually described separately, we assume that they influence each other reciprocally. We use machine learning (ML) to predict height at ages 7-18 y from height and weight data up to age 6 y.
Methods: After pre-processing the height and weight data, primary and secondary features (height SDS, BMI, growth velocity) of 1596 subjects (798 boys) aged 0-19 y from the longitudinal GrowUp 1974 Gothenburg cohort, with emphasis on infancy (0-12 mo; ~20 measurements), were used to train multiple regressors: Linear, Multilayer Perceptron (MLP), Decision Tree and Random Forest. To evaluate the accuracy of each learning algorithm and choose the best regressor, we used 5-fold cross-validation, and out-of-sample performance was tested on 5×100 other subjects and on 600 additional subjects from the same study. We then validated the system on the Edinburgh Longitudinal Growth Study cohort of 180 subjects, who were measured at 3- or 6-month intervals from age 0 to 20 y.
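The feature construction and subject-level 5-fold split described in the Methods can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Measurement` record, the fixed grid of ages, linear interpolation between visits, and the helper names `height_at`, `early_features` and `subject_folds` are all hypothetical. Height SDS is omitted because it needs a growth reference table that the abstract does not specify.

```python
import random
from dataclasses import dataclass


# Hypothetical record for one clinic visit (not the GrowUp 1974 data format).
@dataclass
class Measurement:
    age_y: float      # age at visit, years
    height_cm: float  # height, cm
    weight_kg: float  # weight, kg


def bmi(m: Measurement) -> float:
    """Body-mass index in kg/m^2."""
    h_m = m.height_cm / 100.0
    return m.weight_kg / (h_m * h_m)


def height_at(ms: list[Measurement], age_y: float) -> float:
    """Height at age_y by linear interpolation between visits (an assumption).
    ms must be sorted by age and age_y must lie within the measured range."""
    for m in ms:
        if m.age_y == age_y:
            return m.height_cm
    for a, b in zip(ms, ms[1:]):
        if a.age_y < age_y < b.age_y:
            t = (age_y - a.age_y) / (b.age_y - a.age_y)
            return a.height_cm + t * (b.height_cm - a.height_cm)
    raise ValueError(f"age {age_y} y is outside the measured range")


def early_features(sex: int, ms: list[Measurement],
                   grid_y: tuple[float, ...] = (1.0, 2.0, 3.3, 4.0, 5.0, 6.0),
                   cutoff_y: float = 6.0) -> list[float]:
    """Hypothetical feature vector built only from visits at or before cutoff_y:
    sex, heights on a fixed age grid, growth velocity between grid ages (cm/y),
    and BMI at the last early visit."""
    early = sorted((m for m in ms if m.age_y <= cutoff_y), key=lambda m: m.age_y)
    heights = [height_at(early, a) for a in grid_y]
    velocities = [(h2 - h1) / (a2 - a1)
                  for a1, a2, h1, h2 in zip(grid_y, grid_y[1:], heights, heights[1:])]
    return [float(sex), *heights, *velocities, bmi(early[-1])]


def subject_folds(subject_ids, k: int = 5, seed: int = 0) -> list[list]:
    """Split subjects (not individual visits) into k disjoint folds, so that one
    child's measurements never appear in both training and validation data."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


# Toy child with invented measurements, for illustration only.
child = [Measurement(0.0, 50.0, 3.5), Measurement(1.0, 75.0, 10.0),
         Measurement(2.0, 87.0, 12.5), Measurement(4.0, 103.0, 16.5),
         Measurement(6.0, 116.0, 21.0), Measurement(8.0, 128.0, 26.0)]
features = early_features(sex=1, ms=child)  # the age-8 visit is ignored
folds = subject_folds(range(1596), k=5)
```

Splitting by subject rather than by visit is the design choice that matters here: with longitudinal data, a visit-level split would let a child's own early heights leak into the validation set and inflate accuracy.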
Results: A Random Forest regressor with 51 trees produced the most accurate predictions. The best predicting features (top of the tree) are sex and heights at ages 3.3-6.0 y. The accuracy (R²) of predicted versus actual height increases from 0.580 at age 7 and 0.592 at age 13 to 0.837 at age 18, when actual heights are 173.8±9.2 (SD) cm and predicted heights 173.3±8.0 cm, with a prediction error of −0.4±4.0 cm. Verification on 600 additional GrowUp children gives a prediction/actual R² of 0.76; prediction accuracy correlates negatively with height at age 18 (P=1.8e-15). The final height of thin 6-year-olds (1st BMI quartile) was underestimated compared with that of the 4th quartile (P=5.2e-4). When the algorithm is tested on the Edinburgh cohort, accuracy (R²) is 0.38 and prediction errors are 2.0±7.2 cm.
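The accuracy figures in the Results (R² and prediction error as mean±SD) can be computed as sketched below. The abstract does not state whether "R-square" means the coefficient of determination or the squared Pearson correlation; this sketch assumes the former. It also assumes prediction error is predicted minus actual height, which is consistent with predicted 173.3 cm versus actual 173.8 cm giving a negative mean error. The function names are hypothetical.

```python
from statistics import mean, stdev


def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed meaning of R²)."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def error_summary(actual: list[float], predicted: list[float]) -> tuple[float, float]:
    """Mean and sample SD of prediction error, defined here as predicted - actual (cm)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return mean(errors), stdev(errors)
```

Under this convention a negative mean error indicates underestimation on average, and the SD of the error is the per-child spread around that bias.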
Conclusions: 1. ML and AI are used here successfully, for the first time, to predict adult height from early (≤6 y) height and weight. 2. Prediction of height at age 18 y from data at ages 0-6 y is more accurate than the bone-age-based TW3 method. 3. The best features for prediction are sex and heights at ages 3.3-6.0 y, when childhood growth velocity has stabilized. 4. Prediction errors are greater for tall subjects (overestimates) and thin subjects. 5. The success of ML strongly depends on how the data are sampled, and models do not transfer easily between dissimilar cohorts.