A Survey of Transformer Compression Techniques for Edge Devices

Project: TinyAttention

Members / Authors: Dr. Kavita Rao, Sameer Joshi

Author type: Professor

College / Organisation: Indian Institute of Technology, Bombay

Keywords: transformers, quantization, pruning, distillation, edge AI

Published: 29 Jun 2026

Abstract

Deploying large language models on edge devices requires aggressive compression. We survey four families of techniques (quantization, pruning, knowledge distillation, and low-rank factorization) and compare them using a common benchmark on mobile-class hardware. We also propose a decision chart that maps latency and memory budgets to recommended techniques.


Permalink: /paper/a-survey-of-transformer-compression-techniques-for-edge-devices-0a8e8c