English Edition Status
The Chinese edition is the canonical 2026 Springer mainline for this book. Its frozen publication scope is 14 parts, 48 chapters, 15 project case studies, and 8 appendices (A-H), with front matter and an afterword in the site edition.
The English edition tracks that structure under a quality-first policy. Reader-facing navigation follows the Chinese mainline. Existing complete English chapters are kept when their quality and structure are usable. Stale or missing pages are replaced with edited English chapters rather than raw machine translation.
Current Policy
- Chinese: latest complete 2026 Springer mainline.
- English: structure synchronized; content translated and under final editorial review.
- Japanese: separate incremental edition and external communication view.
Canonical Chinese Scope
| Part | Chinese mainline scope | English status |
|---|---|---|
| Part 1 | Overview and Infrastructure, Ch01-Ch03 | Translated; release-audited |
| Part 2 | Text Pre-training Data Engineering, Ch04-Ch07 | Translated; release-audited |
| Part 3 | Multimodal Data Engineering, Ch08-Ch11 | Translated; release-audited |
| Part 4 | Instruction Fine-tuning and Preference Data, Ch12-Ch14 | Translated; release-audited |
| Part 5 | Synthetic Data Engineering, Ch15-Ch17 | Translated; release-audited |
| Part 6 | Reasoning and Agent Data Engineering, Ch18-Ch20 | Translated; release-audited |
| Part 7 | Application-Level Data Engineering, Ch21-Ch23 | Translated; release-audited |
| Part 8 | DataOps and Platform Engineering, Ch24-Ch26 | Translated; release-audited |
| Part 9 | Data Assets, Data Products, and Data Contracts, Ch27-Ch30 | Translated; release-audited |
| Part 10 | Agentic Data Engineering, Ch31-Ch35 | Translated; release-audited |
| Part 11 | Privacy, Compliance, and Data Security, Ch36-Ch37 | Translated; release-audited |
| Part 12 | Specialized Dataset Case Studies, Ch38-Ch43 | Translated; release-audited |
| Part 13 | Open-source Model Data Recipes, Ch44-Ch48 | Translated; release-audited |
| Part 14 | Practical Projects, P01-P15 | Translated; release-audited |
Quality Gates
Each English translation batch is checked for:
- missing English counterparts
- translation placeholders
- residual Chinese text outside code and link targets
- broken Markdown links
- MkDocs build failures
- representative browser rendering issues
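As an illustration only, the text-level checks above (missing counterparts, placeholders, residual Chinese text) could be sketched as follows. This is not the project's actual tooling: the placeholder markers, the function names, and the choice to approximate "Chinese text" with the CJK Unified Ideographs block are all assumptions made for this example.

```python
import re

# Hypothetical placeholder markers; the real batch tooling may use others.
PLACEHOLDERS = ("TODO: translate", "[UNTRANSLATED]")

FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
LINK_TARGET = re.compile(r"\]\([^)\n]*\)")
# Rough approximation: CJK Unified Ideographs (also matches Japanese kanji).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def strip_exempt_spans(markdown: str) -> str:
    """Drop fenced code, inline code, and link targets before scanning."""
    text = FENCED_CODE.sub("", markdown)
    text = INLINE_CODE.sub("", text)
    return LINK_TARGET.sub("]", text)


def check_page(markdown: str) -> list[str]:
    """Return a list of issues found on one English page."""
    issues = [f"placeholder: {m}" for m in PLACEHOLDERS if m in markdown]
    residual = CJK_RUN.findall(strip_exempt_spans(markdown))
    if residual:
        issues.append("residual Chinese text: " + ", ".join(residual))
    return issues


def missing_counterparts(zh_pages: set[str], en_pages: set[str]) -> list[str]:
    """Return Chinese page paths that have no English page at the same path."""
    return sorted(zh_pages - en_pages)
```

Link text is deliberately left in scope, since it is reader-facing; only the link target is exempt. Broken links, build failures, and rendering issues need the site build itself and are outside this sketch.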
Reading Guidance
Use the English edition as the current translated web edition. The Chinese edition remains the canonical 2026 source text, and future English changes should continue to be reviewed against it for terminology, figure integrity, and release quality.