Quick Navigation

Topics

Quantum Optimization

Block removal for large language models through constrained binary optimization

arXiv
Authors: David Jansen, Roman Rausch, David Montero, Roman Orus

Year

2026

Paper ID

3104

Status

Preprint

Abstract Read

~2 min

Abstract Words

163

Citations

N/A

Abstract

Compressing resource-intensive large language models by removing whole transformer blocks is a seemingly simple idea, but identifying which blocks to remove constitutes an exponentially difficult combinatorial problem. In this paper, we formulate block removal as a constrained binary optimization problem that can be mapped to a physical system (Ising model), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations and yields many high-quality, non-trivial solutions beyond consecutive regions. We demonstrate that our approach outperforms state-of-the-art block-removal methods across several benchmarks, with performance gains persisting after short retraining, and reaching improvements of up to 6 points on the MMLU benchmark. Our method requires only forward and backward passes for a few active parameters, together with an (at least approximate) Ising solver, and can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure.

Why This Paper Matters

  • This paper contributes to the Quantum Optimization research area in the Quantum Articles archive.
  • It adds a 2026 reference point for readers tracking recent quantum research.
  • Compressing resource-intensive large language models by removing whole transformer blocks is a seemingly simple idea, but identifying which blocks to remove constitutes an...

Paper Tools

Become a member to use research tools

Sign in to open papers, visit source links, share, cite, compare, copy DOI links, request category corrections, and build your reading list.

Show Paper arXiv Publisher Share Cite This Paper Copy URL Compare Copy DOI Add to Reading List Category Correction Request

References & Citation Signals

Local Citation Graph (Related-Paper Links)

Current Paper #3104 #69549 REGRID-QAOA: A Resource-Efficie... #69528 QALM: Escaping Local Minima via...

External citation index: OpenAlex citation signal

Community Reactions

Quick sentiment from readers on this paper.

Score: 0
Likes: 0 Dislikes: 0

Sign in to react to this paper.

Discussion & Reviews (Moderated)

Average Rating: 0.0 / 5 (0 ratings)

No written reviews yet.