
Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks

arXiv
Authors: S. K. Rithvik

Year: 2025
Paper ID: 17418
Status: Preprint


Abstract

We present a systematic evaluation of large language models on quantum mechanics problem-solving. Our study evaluates 15 models from five providers (OpenAI, Anthropic, Google, Alibaba, DeepSeek) spanning three capability tiers on 20 tasks covering derivations, creative problems, non-standard concepts, and numerical computation, comprising 900 baseline and 75 tool-augmented assessments. Results reveal clear tier stratification: flagship models achieve 81% average accuracy, outperforming mid-tier (77%) and fast models (67%) by 4pp and 14pp respectively. Task difficulty patterns emerge distinctly: derivations show highest performance (92% average, 100% for flagship models), while numerical computation remains most challenging (42%). Tool augmentation on numerical tasks yields task-dependent effects: modest overall improvement (+4.4pp) at 3x token cost masks dramatic heterogeneity ranging from +29pp gains to -16pp degradation. Reproducibility analysis across three runs quantifies 6.3pp average variance, with flagship models demonstrating exceptional stability (GPT-5 achieves zero variance) while specialized models require multi-run evaluation. This work contributes: (i) a benchmark for quantum mechanics with automatic verification, (ii) systematic evaluation quantifying tier-based performance hierarchies, (iii) empirical analysis of tool augmentation trade-offs, and (iv) reproducibility characterization. All tasks, verifiers, and results are publicly released.
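The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the tier accuracies are copied from the abstract, while the helper `gap_pp` and the breakdown of the 900 baseline assessments as models × tasks × runs are assumptions made here, not something the abstract states.

```python
# Average accuracy per tier, in percent, as quoted in the abstract.
tier_accuracy = {"flagship": 81, "mid-tier": 77, "fast": 67}


def gap_pp(reference: str, other: str) -> int:
    """Percentage-point gap between two tiers' average accuracy (hypothetical helper)."""
    return tier_accuracy[reference] - tier_accuracy[other]


# ASSUMPTION: the 900 baseline assessments form a full grid of
# 15 models x 20 tasks x 3 runs. The abstract gives each number
# separately but does not spell out this product.
N_MODELS, N_TASKS, N_RUNS = 15, 20, 3
baseline_grid = N_MODELS * N_TASKS * N_RUNS

print(gap_pp("flagship", "mid-tier"))  # flagship vs. mid-tier gap in pp
print(gap_pp("flagship", "fast"))      # flagship vs. fast gap in pp
print(baseline_grid)                   # size of the assumed evaluation grid
```

Under that assumption the grid size matches the reported 900 baseline assessments, and the two gaps match the 4pp and 14pp quoted in the abstract. The 75 tool-augmented assessments are not broken down in the abstract, so they are left out of the sketch.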

Why This Paper Matters

  • This paper contributes to the Quantum Machine Learning research area in the Quantum Articles archive.
  • It adds a 2025 reference point for readers tracking recent quantum research.


