On-Device LLM for Edge Computing Challenge
Deploy small language models on smartphones and low-end devices for fully offline AI assistants
Build Statement
Develop a production-ready system that deploys small language models (Phi-3, Gemma, TinyLlama, or similar sub-3B-parameter models) on resource-constrained devices with less than 4 GB of RAM, achieving a minimum inference speed of 5 tokens/second while retaining more than 80% of the original model's quality (for example, as measured by perplexity or downstream-task accuracy). Implement advanced optimization techniques including INT8/INT4 quantization, structured and unstructured pruning at 50%+ sparsity, knowledge distillation from larger models, and dynamic memory management.
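As a reference point, the sketch below shows two of these techniques in isolation: symmetric per-tensor INT8 weight quantization and 50% unstructured magnitude pruning, both in plain PyTorch. Production pipelines (llama.cpp, bitsandbytes, ExecuTorch) quantize per-channel or per-group and typically fine-tune after pruning; this only illustrates the core idea.

```python
import torch
import torch.nn.utils.prune as prune

def quantize_int8_symmetric(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = w.abs().max() / 127.0                       # map max |w| to 127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

layer = torch.nn.Linear(4096, 4096)

# 50% unstructured magnitude pruning: zero the smallest-magnitude half.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")                           # bake mask into weight

q, scale = quantize_int8_symmetric(layer.weight.data)
recon = q.float() * scale                               # dequantize to measure error
err = (recon - layer.weight.data).abs().mean().item()
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}, mean abs quantization error: {err:.2e}")
```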
The solution must include a complete mobile application demonstrating practical offline use cases (translation, education, healthcare, or business assistance), comprehensive benchmarking across diverse hardware (Snapdragon 400-series and MediaTek Helio SoCs, Mali-G52-class GPUs), and a reusable optimization pipeline that can be applied to future models.
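For the benchmarking component, decode throughput can be measured with a runtime-agnostic harness such as the sketch below; decode_step is a hypothetical callback wrapping a single token of generation in whatever runtime you deploy (llama.cpp, MLC-LLM, ONNX Runtime Mobile, and so on).

```python
import time
from typing import Callable

def tokens_per_second(decode_step: Callable[[], None],
                      n_tokens: int = 128, warmup: int = 8) -> float:
    """Time n_tokens sequential decode steps; report decode throughput."""
    for _ in range(warmup):          # warm caches and any JIT before timing
        decode_step()
    start = time.perf_counter()
    for _ in range(n_tokens):
        decode_step()
    return n_tokens / (time.perf_counter() - start)

# Example with a stand-in decode step that takes ~50 ms per token:
print(f"{tokens_per_second(lambda: time.sleep(0.05)):.1f} tok/s")
```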
Address key challenges including model loading in memory-constrained environments, efficient KV-cache management, battery consumption optimization, and maintaining conversational context across sessions.
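One common answer to bounded KV-cache growth is a sliding window: keep a fixed token budget per attention layer and evict the oldest entries, as in the minimal sketch below. This is an illustrative simplification; production runtimes also use paged or quantized caches, and evicted tokens do lose their context.

```python
import torch

class SlidingWindowKVCache:
    """Per-layer KV cache with a fixed token budget: once full, the oldest
    entries are dropped so memory stays bounded for long conversations."""

    def __init__(self, max_tokens: int, n_heads: int, head_dim: int):
        self.max_tokens = max_tokens
        self.k = torch.empty(0, n_heads, head_dim)
        self.v = torch.empty(0, n_heads, head_dim)

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """k_new, v_new: (t, n_heads, head_dim) for t newly decoded tokens."""
        self.k = torch.cat([self.k, k_new])[-self.max_tokens:]
        self.v = torch.cat([self.v, v_new])[-self.max_tokens:]

cache = SlidingWindowKVCache(max_tokens=1024, n_heads=32, head_dim=64)
cache.append(torch.randn(1, 32, 64), torch.randn(1, 32, 64))
print(cache.k.shape)  # torch.Size([1, 32, 64]); never exceeds 1024 tokens
```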
Full Description
The On-Device LLM for Edge Computing Challenge seeks groundbreaking solutions that democratize access to AI by bringing language models directly to resource-constrained devices. This challenge addresses the critical need for AI capabilities in areas with limited or no internet connectivity, particularly relevant for emerging markets and remote regions.
Participants will work with cutting-edge small language models including Phi-3, Gemma, TinyLlama, and other sub-3B parameter models to create practical, fully offline AI assistants. The challenge emphasizes advanced optimization techniques such as quantization (INT8/INT4), structured and unstructured pruning, knowledge distillation, and novel compression methods.
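For the distillation component, the classic soft-target loss (Hinton et al.) trains the small student to match the teacher's temperature-softened next-token distribution. A minimal PyTorch sketch follows; in practice this term is mixed with the ordinary cross-entropy loss on ground-truth tokens.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    next-token distributions; the T^2 factor keeps gradient scale stable."""
    t = temperature
    log_student = F.log_softmax(student_logits / t, dim=-1)
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Shapes: (batch * sequence, vocab) logits from teacher and student.
loss = distillation_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```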
Successful solutions must achieve practical inference speeds (at least 5 tokens/second) on devices with less than 4 GB of RAM, including entry-level smartphones, feature phones with basic processors, and low-end laptops. Applications should demonstrate real-world utility, such as offline translation, educational tutoring, healthcare guidance, agricultural advice, or business assistance.
We encourage innovative approaches to model optimization, efficient memory management, and creative solutions for model switching and caching. Special consideration will be given to solutions that implement novel techniques for maintaining model quality while achieving extreme compression ratios.
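For model switching under a tight RAM budget, one straightforward design is a small LRU cache that keeps at most one or two models resident and evicts the least recently used; in the sketch below, load_fn is a hypothetical loader mapping a model name to a runtime handle (mmap-backed loading keeps eviction and reload cheap).

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelLRUCache:
    """Keep at most `capacity` models resident; evict least recently used."""

    def __init__(self, load_fn: Callable[[str], Any], capacity: int = 1):
        self.load_fn = load_fn            # hypothetical model loader
        self.capacity = capacity
        self._cache: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._cache:
            self._cache.move_to_end(name)            # mark as recently used
        else:
            if len(self._cache) >= self.capacity:
                _, model = self._cache.popitem(last=False)
                del model  # drop our reference so it can be garbage-collected
            self._cache[name] = self.load_fn(name)
        return self._cache[name]

cache = ModelLRUCache(load_fn=lambda name: f"<model {name}>", capacity=1)
translator = cache.get("translate-q4.gguf")  # loads the translation model
tutor = cache.get("tutor-q4.gguf")           # evicts it, loads the tutor
```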
Submission Requirements
• Submit up to 10 supporting links (documents, demos, repositories)
• Additional text content and explanations may be included
• Ensure all materials are accessible and properly formatted
• Review all materials before finalizing your submission