Gatekeepers and Hallucinations: A Layered Evaluation Framework for LLM-Driven Quantum Circuit Generation
arXiv
Authors: Christopher Coleman, Sharon Marfatia
Year
2026
Paper ID
69422
Status
Preprint
Abstract
As large language models (LLMs) become embedded in quantum simulation workflows (IDE copilots, notebook assistants, agentic pipelines), evaluation must move beyond functional correctness to anticipate and catch structured failures before they propagate through expensive pipelines. We present a layered evaluation framework for materials-informed Variational Quantum Eigensolver (VQE) circuit generation: (i) a gatekeeper screening rubric across seven physical and framework criteria; (ii) a circuit fidelity analysis comparing model outputs against analytical and reference-implementation values for H2/STO-3G/Jordan-Wigner/UCCSD, with ansatz classification and gate-composition breakdown; and (iii) design entropy, a run-to-run behavioral consistency metric. We surface a taxonomy of five distinct LLM failure modes (geometry hallucination, nonexistent API usage, runtime integration failures, constraint violations, and plausible-but-unverifiable output), each with a distinct detectability profile and each structural to the task rather than specific to any one model. A forensic audit of the evaluation platform's own source code further establishes that two apparent model failures originated in the harness through silent fallback-template substitution, demonstrating that evaluation infrastructure belongs inside the same trust boundary as the models it tests. Applied across multiple foundation models on a Materials Project-integrated pipeline, the framework shows that gatekeeper-style validation is necessary, not optional, for reliable deployment.
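The abstract names design entropy as a run-to-run consistency metric but does not define it. One plausible reading, offered here only as an illustrative sketch and not as the authors' definition, is the Shannon entropy of the ansatz classes a model produces across repeated runs of the same prompt. The function name `design_entropy` and the class labels below are hypothetical.

```python
import math
from collections import Counter


def design_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    ASSUMPTION: each label is the ansatz class assigned to one generation
    run of the same prompt. The paper's actual definition may differ.
    """
    if not labels:
        raise ValueError("need at least one run")
    n = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over the observed classes.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical labels for four runs of the same prompt.
runs_consistent = ["UCCSD", "UCCSD", "UCCSD", "UCCSD"]
runs_mixed = ["UCCSD", "UCCSD", "hardware_efficient", "other"]

print(design_entropy(runs_consistent))  # 0.0: every run chose the same design
print(design_entropy(runs_mixed))       # 1.5: designs vary from run to run
```

Under this reading, a value of zero means the model is behaviorally consistent on the prompt, and larger values mean the circuit design changes between runs even though the request did not.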
Why This Paper Matters
- This paper contributes to the Quantum Machine Learning research area in the Quantum Articles archive.
- It adds a 2026 reference point for readers tracking recent quantum research.