Multimodal machine learning for video-based single-question mental health assessment
Authors: B Grimm, P Yilmam, B Talbot, L Larsen
Published: 2025
Publication: npj Digital Medicine, 2025 (nature.com)
A multimodal machine learning model combining text (MPNet) and voice (HuBERT) analysis predicts depression, anxiety, and trauma from a single video-based question, achieving strong performance with demographic consistency while substantially reducing assessment time.
Methods: Multimodal analysis combining MPNet embeddings of transcribed text with HuBERT prosodic voice features, trained on video-based responses.
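The text-plus-voice fusion described in the methods can be sketched as a late-fusion pipeline. The code below is a minimal illustration only: the 768-dimensional embeddings, mean pooling, and linear regression head are assumptions for demonstration, not the authors' actual architecture.

```python
import numpy as np

# Illustrative late fusion: concatenate a text embedding (e.g. from MPNet)
# with a pooled voice embedding (e.g. from HuBERT), then regress the fused
# vector onto a symptom score such as PHQ-9 (range 0-27).
# All dimensions and the linear head are hypothetical.

rng = np.random.default_rng(0)

TEXT_DIM = 768   # MPNet base models emit 768-dim sentence embeddings
VOICE_DIM = 768  # HuBERT base models emit 768-dim frame features

def fuse(text_emb: np.ndarray, voice_emb: np.ndarray) -> np.ndarray:
    """Late fusion by simple concatenation of modality embeddings."""
    return np.concatenate([text_emb, voice_emb])

# Toy stand-ins for real encoder outputs.
text_emb = rng.standard_normal(TEXT_DIM)
voice_emb = rng.standard_normal(VOICE_DIM)
fused = fuse(text_emb, voice_emb)

# Hypothetical linear head mapping the fused vector to a PHQ-9 score,
# clipped to the valid 0-27 range of the instrument.
weights = rng.standard_normal(fused.shape[0]) * 0.01
bias = 13.5
phq9_pred = float(np.clip(fused @ weights + bias, 0.0, 27.0))

print(fused.shape)  # (1536,)
```

In practice the encoders would be pretrained MPNet and HuBERT models with fine-tuned task heads; concatenation is only one of several possible fusion strategies.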
Key Findings: Efficient prediction of self-reported scores for depression (PHQ-9), anxiety (GAD-7), and trauma (PCL-5) from brief video responses.
Limitations: Potential biases in self-reported data and reliance on participants' willingness to complete video-based assessments.
Institution: Videra Health
Research Area: Computational Mental Health Assessment, Multimodal Machine Learning
Discipline: Computational Health, Digital Medicine
Sample Size: 2420 participants