Topics

Quantum Machine Learning

Higher-Order Token Interactions via Quantum Attention

arXiv
Authors: Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

Year

2026

Paper ID

68835

Status

Preprint

Citations

0

Abstract

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-k interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-k token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension m, H heads and p-bit precision satisfying mHp = o(N / log log N) cannot represent the order-k correlation family that one QHA head represents with circuit depth O(log k) (O(k) two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and O(log n) depth the gradient variance is Ω(1/poly(n)) (no barren plateau), which we confirm empirically - while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a 6.5× smaller parameter budget, QHA generalizes hidden-subset parity of every order k ≤ 6 from disjoint inputs, whereas the larger classical attention head collapses past order 2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
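The abstract's reference to hidden-subset parity and Fourier degree can be illustrated with a small classical sketch. This is not the paper's code or benchmark: the token count `n`, the subset `hidden`, and the helper names are hypothetical, chosen only to show that an order-k parity over ±1 inputs has all of its Fourier mass at exactly order k and none at lower orders.

```python
import itertools

def parity(x, subset):
    """Product of the +/-1 entries of x indexed by subset (an order-|subset| parity)."""
    out = 1
    for i in subset:
        out *= x[i]
    return out

def fourier_coefficient(f, n, subset):
    """Exact Fourier coefficient of f on the parity monomial for `subset`,
    computed by enumerating all 2^n inputs in {-1, +1}^n."""
    total = 0
    for x in itertools.product((-1, 1), repeat=n):
        total += f(x) * parity(x, subset)
    return total / 2 ** n

n = 6                # number of input tokens (illustrative assumption)
hidden = (0, 2, 5)   # hypothetical hidden subset, so the target has order k = 3

def target(x):
    return parity(x, hidden)

# Group the squared Fourier coefficients of the target by interaction order.
mass_by_order = {}
for k in range(n + 1):
    mass_by_order[k] = sum(
        fourier_coefficient(target, n, s) ** 2
        for s in itertools.combinations(range(n), k)
    )

print(mass_by_order)
```

In this toy setting the mass at orders 0, 1 and 2 is zero, so no combination of constant, single-token or pairwise features correlates with the target. That is the general property of parity behind the abstract's remark that the advantage is largest for parity and shrinks when low-order structure is present.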

Why This Paper Matters

  • This paper contributes to the Quantum Machine Learning research area in the Quantum Articles archive.
  • It adds a 2026 reference point for readers tracking recent quantum research.
