Topics

Quantum Machine Learning

Higher-Order Token Interactions via Quantum Attention

arXiv

Authors: Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

Year

2026

Paper ID

68835

Status

Preprint

Abstract

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce Quantum Higher-Order Attention (QHA), a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp = o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically - while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k \le 6$ from disjoint inputs, whereas the larger classical attention head collapses past order 2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
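
The abstract ties the size of the advantage to the target's Fourier degree, with hidden-subset parity as the extreme case. The sketch below is an illustration of that standard fact only, not the authors' code or benchmark setup: it enumerates all inputs in {-1,+1}^n and computes the exact Fourier spectrum of a parity on a hidden subset. The values of `n` and `hidden`, and the function names, are assumptions chosen for the example.

```python
from itertools import combinations, product
from math import prod


def parity(x, subset):
    """Parity in the +/-1 encoding: product of the selected coordinates."""
    return prod(x[i] for i in subset)


def fourier_coefficient(f, n, subset):
    """Exact Fourier coefficient of f on `subset`, averaged over {-1,+1}^n."""
    total = sum(f(x) * parity(x, subset) for x in product((-1, 1), repeat=n))
    return total / 2 ** n


n = 6               # number of input bits (illustrative choice)
hidden = (0, 2, 5)  # hypothetical hidden subset, so the target has order k = 3


def target(x):
    return parity(x, hidden)


# Full spectrum: one coefficient per subset T of {0, ..., n-1}.
spectrum = {
    T: fourier_coefficient(target, n, T)
    for size in range(n + 1)
    for T in combinations(range(n), size)
}

nonzero = {T: c for T, c in spectrum.items() if c != 0}
# Squared Fourier weight on all subsets of order strictly below k.
low_order_mass = sum(c * c for T, c in spectrum.items() if len(T) < len(hidden))

print(nonzero)         # {(0, 2, 5): 1.0}
print(low_order_mass)  # 0.0
```

All of the Fourier weight sits on the single order-3 term, and none sits on any lower-order subset. This is why a model restricted to pairwise statistics gets no signal from a pure parity target, and why adding low-order structure to the target would narrow the gap the abstract describes.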

References & Citation Signals

[1] DOI: https://doi.org/arXiv:2606.11673
[2] arXiv: https://arxiv.org/abs/2606.11673
[3] Publisher: https://arxiv.org/abs/2606.11673
