Evaluating LLMs on XR Devices: The AIvaluateXR Framework

The integration of large language models (LLMs) with extended reality (XR) devices could reshape human-computer interaction. The practical challenge is identifying which model-device combination performs best. Enter AIvaluateXR, a new framework for benchmarking LLMs on XR platforms.

The Framework

AIvaluateXR is more than another evaluation tool. It deploys 17 LLMs across four XR devices: Magic Leap 2, Meta Quest 3, Vivo X100s Pro, and Apple Vision Pro. Each model-device combination is tested for performance consistency, processing speed, memory usage, and battery consumption. That adds up to 68 model-device pairs (17 models × 4 devices), each analyzed for its efficiency in real-time XR scenarios. But why does this matter?
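To make the 68-pair count concrete, here is a minimal Python sketch of how such an evaluation grid might be laid out. The model names are hypothetical placeholders (the article does not list the 17 models), and the record layout and metric field names are assumptions for illustration, not AIvaluateXR's actual data format.

```python
from itertools import product

# Hypothetical placeholder names: the article does not list the 17 models.
MODELS = [f"model-{i:02d}" for i in range(1, 18)]
# The four devices named in the article.
DEVICES = ["Magic Leap 2", "Meta Quest 3", "Vivo X100s Pro", "Apple Vision Pro"]
# The four measured aspects named in the article (field names are assumptions).
METRICS = ("consistency", "processing_speed", "memory_usage", "battery_consumption")

def build_grid(models, devices):
    """Create one empty result record per (model, device) pair.

    The None values stand in for measurements that a benchmark run would fill in.
    """
    return {
        (model, device): {metric: None for metric in METRICS}
        for model, device in product(models, devices)
    }

grid = build_grid(MODELS, DEVICES)
print(len(grid))  # 68, i.e. 17 models x 4 devices
```

Keying the grid by (model, device) tuples keeps every combination addressable on its own, which is what a per-pair comparison needs.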

In XR, real-time performance is critical. Any lag or inefficiency can break the immersive experience, which makes the choice of model-device pairing decisive. AIvaluateXR addresses this with a unified evaluation method based on the 3D Pareto Optimality theory, which selects the best pairs from both quality and speed objectives. Rather than ranking pairs on a single score, a Pareto-based selection keeps every pair that no other pair beats on all objectives at once. The question is, can this framework set a new standard for future XR applications?
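A pair is Pareto-optimal when no other pair is at least as good on every objective and strictly better on at least one. The minimal Python sketch below filters a list of model-device pairs down to that non-dominated set over three objectives. The objective names (a quality score, decode speed in tokens per second, and response latency) and all numbers are made-up assumptions for illustration; they are not the framework's exact axes or any measured results.

```python
from typing import List, NamedTuple

class Pair(NamedTuple):
    """One model-device combination with three illustrative objectives."""
    name: str
    quality: float       # hypothetical quality score, higher is better
    tokens_per_s: float  # hypothetical decode speed, higher is better
    latency_s: float     # hypothetical response latency, lower is better

def dominates(a: Pair, b: Pair) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = (a.quality >= b.quality
                and a.tokens_per_s >= b.tokens_per_s
                and a.latency_s <= b.latency_s)
    strictly_better = (a.quality > b.quality
                       or a.tokens_per_s > b.tokens_per_s
                       or a.latency_s < b.latency_s)
    return no_worse and strictly_better

def pareto_front(pairs: List[Pair]) -> List[Pair]:
    """Keep every pair that no other pair dominates (a pair never dominates itself)."""
    return [p for p in pairs if not any(dominates(q, p) for q in pairs)]

# Made-up numbers for illustration only.
pairs = [
    Pair("model-a @ device-1", 0.80, 12.0, 1.5),
    Pair("model-a @ device-2", 0.80, 9.0, 2.0),   # dominated by model-a @ device-1
    Pair("model-b @ device-1", 0.65, 20.0, 0.9),
    Pair("model-c @ device-2", 0.90, 5.0, 3.0),
]

for p in pareto_front(pairs):
    print(p.name)
# model-a @ device-1
# model-b @ device-1
# model-c @ device-2
```

The pairwise check is quadratic in the number of pairs, which is negligible for 68 combinations. Note that the result is a set of trade-offs (fastest, highest quality, balanced) rather than a single winner, which is the point of a Pareto-based selection.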

Beyond the Device

While AIvaluateXR predominantly focuses on on-device LLMs, it also compares these with client-server and cloud setups. The findings indicate that on-device LLMs can be surprisingly efficient, challenging the traditional reliance on cloud-based solutions. But is it time to ditch the cloud for XR?

Two interactive tasks further test the accuracy of these model-device pairs. The results offer insights into optimizing LLM deployment on XR and suggest that on-device processing could be more than just a viable alternative. It's a call to action for researchers and developers to rethink what XR devices can handle locally.

What's Next?

This study's implications extend beyond mere evaluation. It proposes a unified method that could become a standard for future research and development in XR. The framework is more than a static tool; it is a foundation for continuous innovation in the field.

To the skeptics: wouldn't you want a reliable benchmark that lays the groundwork for future technological advances? AIvaluateXR aims to offer precisely that, and its code and data are available at www.nanovis.org/AIvaluateXR.html for reproducibility and further exploration.
