Abstract
Face-to-face interaction is one of the most natural forms of human communication, so it is unsurprising that Video Conferencing (VC) applications have experienced a significant rise in demand over the past decade. With the widespread availability of cellular devices equipped with high-resolution cameras, Instant Messaging Video Call Applications (IMVCAs) now account for a substantial portion of VC communications. Given the multitude of IMVCA options, maintaining a high Quality of Experience (QoE) is critical. While content providers can measure QoE directly through end-to-end connections, Internet Service Providers (ISPs) must infer QoE indirectly from network traffic, which is a non-trivial task, especially when most traffic is encrypted. In this paper, we analyze a large dataset collected from the WhatsApp IMVCA, comprising over 25,000 seconds of VC sessions. We apply four Machine Learning (ML) algorithms and a Large Multimodal Model (LMM)-based agent, achieving mean errors of 4.61%, 5.36%, and 13.24% for three popular QoE metrics: BRISQUE, PIQE, and FPS, respectively.
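The abstract does not state which traffic features the models use or how "mean error" is defined. As an illustration only, the sketch below assumes two common choices: per-window packet statistics (packet count, total bytes, mean packet size), which remain observable to an ISP even when payloads are encrypted, and mean absolute percentage error (MAPE) as the error measure. The function names `window_features` and `mean_abs_pct_error` are hypothetical and not taken from the paper; the paper's actual feature set and metric may differ.

```python
from collections import defaultdict
from statistics import mean


def window_features(packets, window_s=1.0):
    """Group (timestamp_s, size_bytes) packets into fixed-length windows.

    Hypothetical feature set: only packet timing and sizes are used, since
    these remain visible to an ISP even when the payload is encrypted.
    """
    buckets = defaultdict(list)
    for ts, size in packets:
        # Window index = which window_s-long interval the packet falls into.
        buckets[int(ts // window_s)].append(size)

    features = {}
    for w, sizes in sorted(buckets.items()):
        features[w] = {
            "pkt_count": len(sizes),
            "bytes": sum(sizes),
            "mean_size": mean(sizes),
        }
    return features


def mean_abs_pct_error(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumed metric).

    Pairs whose ground-truth value is zero are skipped to avoid division by zero.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("no non-zero ground-truth values")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)
```

For example, `window_features([(0.01, 1200), (0.40, 900), (1.05, 1100), (1.70, 300)])` yields two one-second windows with 2,100 and 1,400 bytes respectively, and `mean_abs_pct_error([30, 25], [27, 25])` gives 5.0 (a 10% error on the first FPS sample and 0% on the second). In a real pipeline, the per-window features would be fed to a trained regressor whose predictions are compared against ground-truth BRISQUE, PIQE, or FPS values.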
| Original language | English |
|---|---|
| Article number | 4450 |
| Journal | Sensors |
| Volume | 25 |
| Issue number | 14 |
| DOIs | |
| State | Published - 1 Jul 2025 |
Keywords
- Large Multimodal Models
- encrypted traffic
- machine learning
- quality of experience
- video conferencing
ASJC Scopus subject areas
- Analytical Chemistry
- Information Systems
- Atomic and Molecular Physics, and Optics
- Biochemistry
- Instrumentation
- Electrical and Electronic Engineering