Quick Navigation
Topics
Quantum Optimization
Quantum Machine Learning
Quantum Simulation
Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
arXiv
Authors: S. K. Rithvik
Year
2025
Paper ID
17418
Status
Preprint
Abstract Read
~2 min
Abstract Words
174
Citations
N/A
Abstract
We present a systematic evaluation of large language models on quantum mechanics problem-solving. Our study evaluates 15 models from five providers (OpenAI, Anthropic, Google, Alibaba, DeepSeek) spanning three capability tiers on 20 tasks covering derivations, creative problems, non-standard concepts, and numerical computation, comprising 900 baseline and 75 tool-augmented assessments. Results reveal clear tier stratification: flagship models achieve 81% average accuracy, outperforming mid-tier (77%) and fast models (67%) by 4pp and 14pp respectively. Task difficulty patterns emerge distinctly: derivations show highest performance (92% average, 100% for flagship models), while numerical computation remains most challenging (42%). Tool augmentation on numerical tasks yields task-dependent effects: modest overall improvement (+4.4pp) at 3x token cost masks dramatic heterogeneity ranging from +29pp gains to -16pp degradation. Reproducibility analysis across three runs quantifies 6.3pp average variance, with flagship models demonstrating exceptional stability (GPT-5 achieves zero variance) while specialized models require multi-run evaluation. This work contributes: (i) a benchmark for quantum mechanics with automatic verification, (ii) systematic evaluation quantifying tier-based performance hierarchies, (iii) empirical analysis of tool augmentation trade-offs, and (iv) reproducibility characterization. All tasks, verifiers, and results are publicly released.
Why This Paper Matters
- This paper contributes to the Quantum Machine Learning research area in the Quantum Articles archive.
- It adds a 2025 reference point for readers tracking recent quantum research.
- We present a systematic evaluation of large language models on quantum mechanics problem-solving.
Paper Tools
Become a member to use research tools
Sign in to open papers, visit source links, share, cite, compare, copy DOI links, request category corrections, and build your reading list.
Show Paper arXiv Publisher Share
Cite This Paper
Copy URL
Compare
Copy DOI Add to Reading List
Category Correction Request
Category Correction Request
Help us improve classification quality by proposing a better category. Every request is reviewed by an admin.
Sign in to submit a category correction request for this paper.
Log In to SubmitReferences & Citation Signals
Community Reactions
Quick sentiment from readers on this paper.
Score:
0
Likes: 0
Dislikes: 0
Sign in to react to this paper.
Discussion & Reviews (Moderated)
Average Rating: 0.0 / 5 (0 ratings)
No written reviews yet.