โ† Back to all papers Machine Learning

A Survey of Transformer Compression Techniques for Edge Devices

Project: TinyAttention

Members / Authors: Dr. Kavita Rao, Sameer Joshi
Author type: Professor
College / Organisation: Indian Institute of Technology, Bombay
Keywords: transformers, quantization, pruning, distillation, edge AI
Published: 29 Jun 2026

Abstract

Deploying large language models on edge devices requires aggressive compression. We survey four families of techniques (quantization, pruning, knowledge distillation, and low-rank factorization) and compare them on a common benchmark run on mobile-class hardware. We also propose a decision chart that maps a deployment's latency and memory budgets to recommended techniques.
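The abstract does not say which quantization schemes the survey benchmarks. As a minimal illustration of why quantization relaxes a memory budget, the sketch below assumes symmetric per-tensor int8 quantization, one common variant among several. The function names are hypothetical, and this is not the authors' benchmark code.

```python
"""Minimal sketch: symmetric per-tensor int8 quantization (illustrative only).

Assumption: one shared scale per tensor; function names are hypothetical.
"""
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would otherwise divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code, then clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the int8 codes."""
    return [v * scale for v in q]


def weight_bytes(num_params: int, bits: int) -> int:
    """Storage for the weights alone (ignores scales, activations, KV cache)."""
    return num_params * bits // 8


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 1.0]
    q, s = quantize_int8(w)
    approx = dequantize_int8(q, s)
    print("codes:", q, "scale:", s)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, approx)))
    # Hypothetical 1B-parameter model: weight memory at different bit widths.
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_bytes(1_000_000_000, bits):,} bytes")
```

Weight storage scales linearly with bit width, so moving from 32-bit floats to int8 cuts weight memory by 4x, while the rounding error per weight stays within half a quantization step. Per-channel scales or lower bit widths trade extra bookkeeping for tighter error or a smaller footprint.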

No file attached to this sample paper.