Development and evaluation of a computer vision algorithm for quantification of children's microactivities.
Abstract
BACKGROUND: Estimates of microactivity frequencies (e.g., hand- and object-to-mouth contacts) are essential for modeling children's environmental exposures but are challenging to obtain because of the time and human costs of manually labeling behaviors from pre-recorded videos.
OBJECTIVES: We aimed to develop and evaluate a computer vision model to quantify microactivities for young children.
METHODS: The vision model was trained and validated using video footage (collected via four concurrent GoPro cameras) of 25 children aged 6-18 months playing in their homes in Baltimore, MD. We leveraged computer vision techniques to develop an algorithm that assesses children's pose by identifying and tracking 3D key points (e.g., locations of children's eyes, hands, wrists, elbows, etc.). The algorithm automatically measured the distance between the child's hands and mouth in every video frame; when the distance fell below a minimum threshold, the model logged a "contact event." We compared the timing and number of events for three microactivities (left- and right-hand-to-mouth, and object-to-mouth) yielded by the vision model to the outputs of comparable human behavioral coding.
RESULTS: Our method accurately recognized children's microactivities: the timing and number of detected contact events were 96-99% accurate on a second-by-second basis, with minimal counting errors (<0.04-2.18 per video). We observed higher rates of object-to-mouth contacts (mean = 27 contacts/h) than hand-to-mouth contacts (mean = 3 contacts/h).
IMPACT: This study developed and evaluated a computer vision method for accurately identifying and quantifying young children's hand-to-mouth and object-to-mouth contacts from collected video, greatly reducing the costs and burden of generating microactivity data needed for soil and dust exposure modeling.
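The contact-detection step described in METHODS reduces to a per-frame distance threshold with merging of consecutive below-threshold frames into single events. The sketch below illustrates that logic only; the threshold value, frame rate, keypoint names, and dictionary layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of threshold-based hand-to-mouth contact detection, assuming
# 3D pose keypoints are already available for every video frame.
# CONTACT_THRESHOLD_CM, FPS, and the keypoint names are hypothetical.
import math
from typing import Dict, List, Tuple

CONTACT_THRESHOLD_CM = 5.0   # assumed minimum hand-mouth distance defining a "contact"
FPS = 30                     # assumed camera frame rate

Point = Tuple[float, float, float]

def euclidean(p: Point, q: Point) -> float:
    """3D Euclidean distance between two keypoints (same units as the keypoints)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def detect_contacts(frames: List[Dict[str, Point]],
                    hand_key: str = "right_wrist",
                    mouth_key: str = "mouth") -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals where the hand-mouth distance stays
    below the threshold; consecutive below-threshold frames are merged into
    a single contact event so one sustained touch is not counted repeatedly."""
    events: List[Tuple[float, float]] = []
    start = None
    for i, keypoints in enumerate(frames):
        close = euclidean(keypoints[hand_key], keypoints[mouth_key]) < CONTACT_THRESHOLD_CM
        if close and start is None:
            start = i                                 # contact event begins
        elif not close and start is not None:
            events.append((start / FPS, i / FPS))     # contact event ends
            start = None
    if start is not None:                             # contact continues to the last frame
        events.append((start / FPS, len(frames) / FPS))
    return events

# Example with two synthetic frames: one in contact, one far apart.
frames = [
    {"right_wrist": (0.0, 0.0, 0.0), "mouth": (2.0, 1.0, 0.0)},    # ~2.2 cm -> contact
    {"right_wrist": (0.0, 0.0, 0.0), "mouth": (20.0, 5.0, 0.0)},   # far apart -> no contact
]
print(detect_contacts(frames))
```

Event counts per hand (and per object keypoint) could then be divided by the video duration to obtain contacts per hour, the unit reported in RESULTS.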