On-Device LLM for Edge Computing Challenge
Deploy small language models on smartphones and low-end devices for fully offline AI assistants
Build Statement
Develop a production-ready system that deploys small language models (Phi-3, Gemma, TinyLlama, or similar sub-3B-parameter models) on resource-constrained devices with less than 4 GB of RAM, achieving a minimum inference speed of 5 tokens/second while retaining more than 80% of the original model's quality (for example, as measured by perplexity or downstream-task accuracy). Implement advanced optimization techniques including INT8/INT4 quantization, structured and unstructured pruning at 50%+ sparsity, knowledge distillation from larger models, and dynamic memory management.
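As a reference point, the sketch below shows two of these techniques in isolation: symmetric per-tensor INT8 weight quantization and 50% unstructured magnitude pruning, both in plain PyTorch. Production pipelines (llama.cpp, bitsandbytes, ExecuTorch) quantize per-channel or per-group and typically fine-tune after pruning; this only illustrates the core idea.

```python
import torch
import torch.nn.utils.prune as prune

def quantize_int8_symmetric(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = w.abs().max() / 127.0                       # map max |w| to 127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

layer = torch.nn.Linear(4096, 4096)

# 50% unstructured magnitude pruning: zero the smallest-magnitude half.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")                           # bake mask into weight

q, scale = quantize_int8_symmetric(layer.weight.data)
recon = q.float() * scale                               # dequantize to measure error
err = (recon - layer.weight.data).abs().mean().item()
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}, mean abs quantization error: {err:.2e}")
```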
The solution must include a complete mobile application demonstrating practical offline use cases (translation, education, healthcare, or business assistance), comprehensive benchmarking across diverse hardware (Snapdragon 400-series and MediaTek Helio SoCs, Mali-G52-class GPUs), and a reusable optimization pipeline that can be applied to future models.
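For the benchmarking component, decode throughput can be measured with a runtime-agnostic harness such as the sketch below; decode_step is a hypothetical callback wrapping a single token of generation in whatever runtime you deploy (llama.cpp, MLC-LLM, ONNX Runtime Mobile, and so on).

```python
import time
from typing import Callable

def tokens_per_second(decode_step: Callable[[], None],
                      n_tokens: int = 128, warmup: int = 8) -> float:
    """Time n_tokens sequential decode steps; report decode throughput."""
    for _ in range(warmup):          # warm caches and any JIT before timing
        decode_step()
    start = time.perf_counter()
    for _ in range(n_tokens):
        decode_step()
    return n_tokens / (time.perf_counter() - start)

# Example with a stand-in decode step that takes ~50 ms per token:
print(f"{tokens_per_second(lambda: time.sleep(0.05)):.1f} tok/s")
```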
Address key challenges including model loading in memory-constrained environments, efficient KV-cache management, battery consumption optimization, and maintaining conversational context across sessions.
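One common answer to bounded KV-cache growth is a sliding window: keep a fixed token budget per attention layer and evict the oldest entries, as in the minimal sketch below. This is an illustrative simplification; production runtimes also use paged or quantized caches, and evicted tokens do lose their context.

```python
import torch

class SlidingWindowKVCache:
    """Per-layer KV cache with a fixed token budget: once full, the oldest
    entries are dropped so memory stays bounded for long conversations."""

    def __init__(self, max_tokens: int, n_heads: int, head_dim: int):
        self.max_tokens = max_tokens
        self.k = torch.empty(0, n_heads, head_dim)
        self.v = torch.empty(0, n_heads, head_dim)

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """k_new, v_new: (t, n_heads, head_dim) for t newly decoded tokens."""
        self.k = torch.cat([self.k, k_new])[-self.max_tokens:]
        self.v = torch.cat([self.v, v_new])[-self.max_tokens:]

cache = SlidingWindowKVCache(max_tokens=1024, n_heads=32, head_dim=64)
cache.append(torch.randn(1, 32, 64), torch.randn(1, 32, 64))
print(cache.k.shape)  # torch.Size([1, 32, 64]); never exceeds 1024 tokens
```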
Full Description
The On-Device LLM for Edge Computing Challenge seeks groundbreaking solutions that democratize access to AI by bringing language models directly to resource-constrained devices. This challenge addresses the critical need for AI capabilities in areas with limited or no internet connectivity, particularly relevant for emerging markets and remote regions.
Participants will work with cutting-edge small language models including Phi-3, Gemma, TinyLlama, and other sub-3B parameter models to create practical, fully offline AI assistants. The challenge emphasizes advanced optimization techniques such as quantization (INT8/INT4), structured and unstructured pruning, knowledge distillation, and novel compression methods.
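For the distillation component, the classic soft-target loss (Hinton et al.) trains the small student to match the teacher's temperature-softened next-token distribution. A minimal PyTorch sketch follows; in practice this term is mixed with the ordinary cross-entropy loss on ground-truth tokens.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    next-token distributions; the T^2 factor keeps gradient scale stable."""
    t = temperature
    log_student = F.log_softmax(student_logits / t, dim=-1)
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t

# Shapes: (batch * sequence, vocab) logits from teacher and student.
loss = distillation_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```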
Successful solutions must achieve practical inference speeds (at least 5 tokens/second) on devices with less than 4 GB of RAM, including entry-level smartphones, feature phones with basic processors, and low-end laptops. Applications should demonstrate real-world utility, such as offline translation, educational tutoring, healthcare guidance, agricultural advice, or business assistance.
We encourage innovative approaches to model optimization, efficient memory management, and creative solutions for model switching and caching. Special consideration will be given to solutions that implement novel techniques for maintaining model quality while achieving extreme compression ratios.
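For model switching under a tight RAM budget, one straightforward design is a small LRU cache that keeps at most one or two models resident and evicts the least recently used; in the sketch below, load_fn is a hypothetical loader mapping a model name to a runtime handle (mmap-backed loading keeps eviction and reload cheap).

```python
from collections import OrderedDict
from typing import Any, Callable

class ModelLRUCache:
    """Keep at most `capacity` models resident; evict least recently used."""

    def __init__(self, load_fn: Callable[[str], Any], capacity: int = 1):
        self.load_fn = load_fn            # hypothetical model loader
        self.capacity = capacity
        self._cache: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._cache:
            self._cache.move_to_end(name)            # mark as recently used
        else:
            if len(self._cache) >= self.capacity:
                _, model = self._cache.popitem(last=False)
                del model  # drop our reference so it can be garbage-collected
            self._cache[name] = self.load_fn(name)
        return self._cache[name]

cache = ModelLRUCache(load_fn=lambda name: f"<model {name}>", capacity=1)
translator = cache.get("translate-q4.gguf")  # loads the translation model
tutor = cache.get("tutor-q4.gguf")           # evicts it, loads the tutor
```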
Submission Requirements
• Submit up to 10 supporting links (documents, demos, repositories)
• Additional text content and explanations may be included
• Ensure all materials are accessible and properly formatted
• Review all materials before finalizing your submission