Context: Growth analyses have traditionally relied on either non-structural descriptive statistics or model fitting. Although height and weight are usually described separately, we assume that they influence each other. We use machine learning (ML) to predict height at ages 7–18 y from height and weight data up to age 6 y.
Methods: After pre-processing the height and weight data, primary and secondary features (height SDS, BMI, growth velocity) of 1596 subjects (798 boys) aged 0–19 y from the longitudinal GrowUp 1974 Gothenburg cohort, with emphasis (~20 measurements) on infancy (0–12 mo), were used to train multiple regressors: Linear, Multi-Layer Perceptron (MLP), Decision Tree, and Random Forest. To evaluate the accuracy of each learning algorithm and choose the best regressor, we used 5-fold cross-validation, and out-of-sample performance was tested on 5×100 other subjects and 600 additional subjects from the same study. We then validated the system on the Edinburgh Longitudinal Growth Study cohort of 180 subjects, who were measured at 3, 6 mo intervals from age 0–20 y.
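The secondary features named above (height SDS, BMI, growth velocity) can be illustrated with a minimal sketch. The function names, the example visits, and the reference means and SDs are all hypothetical placeholders chosen for illustration; the abstract does not specify the growth reference or the pre-processing actually used.

```python
# Illustrative sketch only: derived growth features from raw visits.
# All numbers below are made-up placeholders, not GrowUp data or a real growth reference.

def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def height_sds(height_cm, ref_mean_cm, ref_sd_cm):
    """Height standard deviation score relative to an age- and sex-matched reference."""
    return (height_cm - ref_mean_cm) / ref_sd_cm


def growth_velocity(age1_y, height1_cm, age2_y, height2_cm):
    """Height velocity in cm/year between two visits."""
    return (height2_cm - height1_cm) / (age2_y - age1_y)


# Hypothetical visits for one child: (age in years, height in cm, weight in kg).
visits = [(4.0, 102.0, 16.5), (5.0, 109.0, 18.6), (6.0, 115.5, 20.9)]

# Hypothetical reference table: age -> (mean height, SD) for the child's sex.
reference = {4.0: (103.0, 4.2), 5.0: (110.0, 4.6), 6.0: (116.5, 5.0)}

features = []
for i, (age, height, weight) in enumerate(visits):
    ref_mean, ref_sd = reference[age]
    row = {
        "age_y": age,
        "height_sds": height_sds(height, ref_mean, ref_sd),
        "bmi": bmi(weight, height),
        # Velocity needs a previous visit, so the first row has none.
        "velocity_cm_per_y": None,
    }
    if i > 0:
        prev_age, prev_height, _ = visits[i - 1]
        row["velocity_cm_per_y"] = growth_velocity(prev_age, prev_height, age, height)
    features.append(row)

for row in features:
    print(row)
```

Each row of such a table, together with sex, is the kind of input a regressor could be trained on. How the study aligned measurements to common ages is not described here.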
Results: The Random Forest Regressor with 51 trees produced the most accurate predictions. The best predicting features (top of the tree) were sex and heights at age 3.3–6.0 y. Prediction accuracy against actual final height (R²) increased from 0.580 at age 7 and 0.592 at age 13 to 0.837 at age 18, when actual heights were 173.8±9.2 cm (S.D.) and predicted heights 173.3±8.0 cm, with a prediction error of −0.4±4.0 cm. Verification of the predictions for 600 additional GrowUp children showed a prediction/actual R² of 0.76; prediction accuracy correlated negatively with age 18 height (P=1.8e-15). The final height of thin 6-year-old children (1st BMI quartile) was underestimated compared with the 4th quartile (P=5.2e-4). When the algorithm was tested on the Edinburgh cohort, accuracy was 0.38 and prediction errors were 2.0±7.2 cm.
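As a hedged illustration of how figures of this kind can be computed, the sketch below derives an R² and a mean ± S.D. prediction error from paired actual and predicted heights. It assumes that R² is the coefficient of determination and that the error is predicted minus actual; the abstract states neither definition explicitly. The heights are invented toy values, not study data.

```python
from statistics import mean, stdev


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def error_summary(actual, predicted):
    """Mean and sample S.D. of the signed error (predicted minus actual, assumed sign)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return mean(errors), stdev(errors)


# Toy heights in cm, invented for illustration only.
actual = [160.0, 170.0, 180.0, 190.0]
predicted = [162.0, 169.0, 179.0, 186.0]

r2 = r_squared(actual, predicted)
err_mean, err_sd = error_summary(actual, predicted)
print(f"R^2 = {r2:.3f}, error = {err_mean:.1f} ± {err_sd:.1f} cm")
```

With this sign convention, a negative mean error indicates that heights are underestimated on average.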
Conclusions: 1. ML and AI are used here successfully, for the first time, to predict adult height from early (≤6 y) height and weight. 2. Predictions of age 18 y height from data at age 0–6 y are more accurate than those of the bone-age-based TW3 method. 3. The best features for prediction are sex and heights at age 3.3–6.0 y, when childhood growth velocity has stabilized. 4. Prediction errors are greater for tall (overestimates) and thin subjects. 5. The success of ML depends strongly on the structure of data sampling, and results cannot easily be generalized between dissimilar cohorts.
27 - 29 Sep 2018
European Society for Paediatric Endocrinology