Google’s New LiteRT Accelerator Redefines On-Device AI Speed for Snapdragon-Powered Phones
AI on your phone just got a serious upgrade — and it’s not just a small step forward. Google’s latest move with the new LiteRT accelerator, built on Qualcomm’s cutting-edge AI Engine Direct (QNN), is turning Snapdragon devices into real-time AI powerhouses. But here’s where it gets interesting: while many people assume that GPUs are the answer for mobile AI, Google’s engineers reveal why that may no longer be the case.
Breaking Past the GPU Bottleneck
Modern smartphones are powerful, but when it comes to running AI models, especially large ones, even top-tier GPUs can hit their limits. Google software engineers Lu Wang, Weiyi Wang, and Andrew Wang point out that scenarios like running a text-to-image generator alongside a live ML-powered camera segmentation model can overload the GPU. The result? Lagging visuals, jittery apps, and dropped frames: a frustrating experience for anyone expecting real-time responsiveness.
And this is the part most people miss: many Android devices today feature dedicated Neural Processing Units (NPUs). These specialized AI chips can beat GPUs on both speed and power efficiency, making them ideal for continuous, heavy AI workloads.
Inside Google and Qualcomm’s Joint Breakthrough
To fully harness that potential, Google teamed up closely with Qualcomm to build a new high-performance NPU accelerator on top of Qualcomm's AI Engine Direct (QNN), replacing the older TFLite QNN delegate. The new accelerator integrates multiple SoC compilers and runtimes under one clean, consistent API, giving developers a streamlined way to optimize their apps for NPU execution.
The accelerator supports over 90 LiteRT operations and is specifically designed to enable full model delegation, a crucial step for squeezing every drop of performance from the hardware: any unsupported operation forces part of the graph back onto the CPU. It also ships with tailored kernels and deep optimizations that significantly boost the performance of large language and vision models like Gemma and FastVLM.
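To see why full delegation matters so much, consider what happens when it fails: a single unsupported operation splits the graph into partitions, and every switch between NPU and CPU partitions adds a costly tensor hand-off. The sketch below illustrates that partitioning logic in plain Python; the op names and the "supported" set are hypothetical placeholders, not LiteRT's actual coverage list.

```python
# Hypothetical subset of NPU-supported ops, for illustration only.
NPU_SUPPORTED_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED",
                     "ADD", "MUL", "SOFTMAX", "RESHAPE"}

def delegation_plan(model_ops):
    """Return ('full', 1) if every op can run on the NPU; otherwise
    ('partial', n), where each switch between supported and unsupported
    ops in the graph order starts a new partition."""
    if all(op in NPU_SUPPORTED_OPS for op in model_ops):
        return ("full", 1)
    partitions = 1
    for prev, cur in zip(model_ops, model_ops[1:]):
        # A boundary between a supported and an unsupported op means a
        # hand-off between NPU and CPU, i.e. a new partition.
        if (prev in NPU_SUPPORTED_OPS) != (cur in NPU_SUPPORTED_OPS):
            partitions += 1
    return ("partial", partitions)

print(delegation_plan(["CONV_2D", "ADD", "SOFTMAX"]))       # fully on NPU
print(delegation_plan(["CONV_2D", "CUSTOM_OP", "SOFTMAX"]))  # 3 partitions
```

One stray `CUSTOM_OP` in the middle of an otherwise supported graph produces three partitions and two NPU↔CPU round trips, which is exactly the overhead full delegation avoids.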
Real-World Performance That Turns Heads
When Google tested the new accelerator across 72 machine learning models, the results spoke loud and clear: 64 of those models ran entirely on the NPU, with speedups of up to 100x over CPU execution and 10x over GPU. Those aren't incremental improvements; they're transformative leaps.
On Qualcomm’s latest Snapdragon 8 Elite Gen 5 chip, the difference becomes even more striking. Over 56 models complete execution in under 5 milliseconds on the NPU, compared with just 13 on the CPU. This kind of responsiveness opens the door to seamless real-time AI features, from live video enhancement to instant image generation and beyond.
A Glimpse Into the Next Generation of AI Apps
To showcase the NPU’s muscle, Google engineers built a concept application using a customized version of Apple’s FastVLM-0.5B vision encoder. This app can interpret a live camera scene in the blink of an eye. Running on the Snapdragon 8 Elite Gen 5 NPU, it delivers a remarkable time-to-first-token (TTFT) of only 0.12 seconds on high-resolution 1024×1024 images, processing over 11,000 tokens per second during prefill and 100+ tokens per second during decoding.
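As a quick back-of-envelope check, those figures hang together: if time-to-first-token is dominated by prefill, then 0.12 seconds at roughly 11,000 tokens per second implies a prefill of on the order of 1,300 tokens. The exact token count a 1024×1024 image produces depends on FastVLM's patching, which the post doesn't spell out, so treat the estimate below as arithmetic, not a reported figure.

```python
# Back-of-envelope sanity check of the reported NPU numbers.
prefill_tokens_per_s = 11_000   # reported prefill throughput
ttft_s = 0.12                   # reported time-to-first-token

# Assuming TTFT is dominated by prefill, the prompt handled roughly:
approx_prefill_tokens = prefill_tokens_per_s * ttft_s
print(round(approx_prefill_tokens))  # → 1320
```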
This feat comes from optimizing Apple’s model with int8 weight quantization and int16 activation quantization, a technique that, according to Google’s team, unlocks the NPU’s ultra-fast int16 kernels. It’s a fascinating example of how cross-platform collaboration and hybrid engineering are making mobile AI more efficient than ever.
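The idea behind that scheme can be sketched in a few lines. This is a generic symmetric-quantization illustration, not LiteRT's actual quantizer: weights are mapped to int8 and activations to the much wider int16 range, each with a single per-tensor scale, so the NPU can run pure integer kernels and convert back to floats only at the boundaries.

```python
def quantize(values, bits):
    """Symmetric per-tensor quantization: map floats to signed integers
    in [-(2**(bits-1)-1), 2**(bits-1)-1] using one scale factor.
    Generic illustration, not LiteRT's actual implementation."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.5, -1.0, 0.25, 0.75]
q8, s8 = quantize(weights, bits=8)          # int8 weights, as in the post
activations = [0.1, -0.2, 0.05]
q16, s16 = quantize(activations, bits=16)   # int16 activations

# int16's wider range keeps the round-trip error far below int8's:
err8 = max(abs(w - d) for w, d in zip(weights, dequantize(q8, s8)))
err16 = max(abs(a - d) for a, d in zip(activations, dequantize(q16, s16)))
```

The asymmetry is the point: weights are static and tolerate int8's coarser grid, while activations vary at runtime and benefit from int16's extra precision, which is what lets the NPU's int16 kernels run fast without wrecking model quality.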
Limited Availability, Unlimited Potential
Currently, QNN-based acceleration is supported on a select group of Snapdragon 8 and 8+ devices. Developers interested in experimenting can explore the NPU acceleration guide and grab LiteRT from Google’s official GitHub repository. The hardware support list is narrow for now, but it is likely to grow as more flagship phones adopt NPUs as central components.
So, could this mark the beginning of a new era where NPUs replace GPUs for mobile AI entirely? That’s bound to divide opinions among developers and hardware enthusiasts alike. Some will argue GPUs still have their edge in versatility, while others will see NPUs as the faster, cleaner path forward.
What do you think — are NPUs the future of mobile AI dominance, or will GPUs continue to hold their ground? Share your thoughts — this debate is just getting started.