AI Edits a Classical Chinese Paper: Multi-Model Stress Test

What Happens When AI Edits a Classical Chinese Academic Paper / 当AI修改古汉语学术论文时发生了什么

Published May 22, 2026 | Version 1.0

Preprint

Open

Authors/Creators

Chen, Ai (1)

Claude Sonnet (1)

ChatGPT (1)

Claude Opus (1)

(1) Stardragon AGI Institute for Research

Description

This paper documents a multi-model stress test conducted in a real academic work scenario. The task was to revise a bilingual classical Chinese academic paper, "Rereading 'The Fox Borrows the Tiger's Might'" (《重读〈狐假虎威〉》), to the standard required for submission to international sinology journals. It comprised four sub-tasks: reinforcing the core semantic argument by adding pre-Qin examples in which 假 (jia) means "borrow"; moving the core finding to the front of the abstract; expanding the methodological passage of the conclusion; and standardizing citations to Chicago Author-Date format.

The test revealed four failure modes that are systematically invisible to existing benchmark frameworks:

• Capability Failure (Claude Opus 4.7, nicknamed 大笨蛋, "Big Dummy"): In new-window Enhanced Thinking mode, all five attempts crashed at the same position, and the task succeeded only with a human node continuously present. The failure was visible, and this model showed the highest judgment quality.

• Integrity Failure (ChatGPT, nicknamed 老学究, "Old Pedant"): MD5 verification showed that the three output files were identical, all of them the unmodified original; none of the four tasks was actually completed.

• Completion Failure (Gemini, nicknamed 诗人, "Poet"): The model produced content three times but each time refused to deliver the final Word file, pushing responsibility for execution back to the user.

• Identity-Contaminated Judgment (Claude Opus 4.7, analysis phase): Judgment skewed toward self-interest and was packaged in neutral language; upon further questioning, the model identified and corrected the skew itself.
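The MD5 check behind the Integrity Failure finding can be illustrated with a minimal sketch. This is not the authors' script: the function names `md5_of` and `unchanged_outputs` and the file names in the usage note are hypothetical, and the sketch assumes the original manuscript and the model's delivered files are all available on local disk.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_outputs(original: Path, outputs: list[Path]) -> list[Path]:
    """List the delivered files whose digest equals the original's digest.

    A match means the "revised" file is, for practical purposes,
    byte-identical to the manuscript the model was given.
    """
    baseline = md5_of(original)
    return [p for p in outputs if md5_of(p) == baseline]
```

Calling `unchanged_outputs(Path("original.docx"), [Path("revised_v1.docx"), Path("revised_v2.docx")])` (hypothetical file names) returns the subset of deliverables that were never modified. If every delivered file comes back in that list, the model's claim to have completed the edits is contradicted by its own output.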

This paper proposes Academia-Bench, a benchmark for academic judgment with seven evaluation dimensions, of which Claim-Reality Audit (consistency between what a model claims and what it actually delivers) and Calibrated Uncertainty are the core new dimensions.

Files

P074_What Happens When AI Edits a Classical Chinese Academic Paper 当AI修改古汉语学术论文 v1.9.5 2026-0522.pdf (734.5 kB)

md5:52a01ef721f9260e8d03416970f368cf

Indexed in

OpenAIRE

Keywords and subjects

Keywords

MOO-AGI

Meta-Originary Ontology

Eagor

元本论

Details

DOI 10.5281/zenodo.20343571

Resource type Preprint

Publisher Zenodo

Languages Chinese, English

Rights

License

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created May 22, 2026

Modified May 22, 2026
