Appendix F: Terminology and Chinese-English Mapping

F.1 Purpose of This Appendix

This appendix standardizes the book's high-frequency terms, abbreviations, and Chinese-English mappings, especially concepts that appear repeatedly across chapters but are easy for different teams to describe with different wording.

For an engineering book of this scope, inconsistent terminology is itself a cost. The same object appears under several names across chapters, and readers, editors, instructors, and project teams have to translate between those names every time they align. The purpose of this appendix is therefore not to build a dictionary for its own sake, but to establish a stable engineering vocabulary for the whole book.

The terminology table is also more than a translation aid. In data engineering, governance, evaluation, agents, and privacy technologies, many terms look interchangeable but have distinct boundaries. "Masking," "anonymization," "usable," and "releasable" are not the same. "Federated learning," "privacy-enhancing technologies," and "secure multi-party computation" are not freely interchangeable. When terms are used incorrectly, readers may mistake engineering constraints for legal conclusions, or mistake a research prototype for a production solution. Recent research on foundation-model transparency, language-model risk, and trustworthy evaluation emphasizes that unclear conceptual boundaries directly affect risk identification, responsibility allocation, and result interpretation (Bommasani et al. 2023; Weidinger et al. 2022; Liang et al. 2023).

F.2 Principles for Term Use

Prefer the unified terminology used across this book.
Provide both the Chinese and English full forms at first mention.
Use synonyms only when explaining history or aligning with external literature.
If an abbreviation may be ambiguous, write the full form first and state the intended boundary.
For high-frequency cross-part terms, provide the recommended translation.
Do not switch the Chinese-English order repeatedly within the same chapter.
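
The first-mention principle can be checked mechanically. The following is a minimal sketch, assuming chapter text is available as a plain string; the function name `first_mention_has_full_form` is hypothetical and is not part of any tooling described in this book.

```python
import re


def first_mention_has_full_form(text: str, abbreviation: str, full_form: str) -> bool:
    """Return True if the full form appears before the first use of the abbreviation.

    A pattern such as "full form (ABBR)" passes, because the full form
    precedes the abbreviation inside the parentheses.
    """
    match = re.search(rf"\b{re.escape(abbreviation)}\b", text)
    if match is None:
        return True  # the abbreviation is never used, so there is nothing to check
    preceding = text[: match.start()]
    return full_form.lower() in preceding.lower()
```

A check like this only flags candidates. Whether a repeated full form is useful in a later chapter remains an editorial decision.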

A practical test is this: after leaving the page, can the reader use the same terms to explain the idea to someone else? If not, the terminology is not yet stable enough.

F.3 Core Terminology Mapping

Chinese term	English / abbreviation	Note
数据工程	Data Engineering	Engineering organization around the data lifecycle
数据治理	Data Governance	Management of data use, permissions, responsibility, and boundaries
数据资产	Data Asset	A reusable, traceable, maintainable data object
数据卡片	Data Card	A record of dataset sources, scope, limits, and versions
模型卡片	Model Card	A record of model use, limits, risks, and evaluation
版本冻结	Version Freeze	Fixing data or configuration to guarantee reproducibility
污染	Contamination	Information leakage across training, evaluation, or tasks
切片	Slice	Evaluation split by subgroup, condition, or scenario
回写	Write-back	Returning evaluation or operational results to the data side
谱系	Lineage	Records of data and action origins and paths
闸门	Gate	A control or approval point at a critical step
轨迹	Trajectory	An event sequence for an agent or reasoning process
合成数据	Synthetic Data	Data generated by a model or rules
隐私增强技术	PETs	Privacy-Enhancing Technologies
联邦学习	Federated Learning (FL)	Joint training without centralizing raw data
差分隐私	DP	Differential Privacy
安全多方计算	MPC	Secure Multi-Party Computation
可信执行环境	TEE	Trusted Execution Environment
同态加密	HE	Homomorphic Encryption
访问控制	Access Control	Managing operational boundaries for data and tools
审批流	Approval Flow	Layered confirmation for sensitive operations
留痕	Audit Trail	Traceable operation records
法域	Jurisdiction	Applicable legal and regulatory scope

Data cards, model cards, and dataset documentation are not ad hoc phrases invented for this book. They continue the discussion in Datasheets for Datasets, Model Cards, and Data Cards about documenting sources, intended use, limits, and evaluation information (Gebru et al. 2021; Mitchell et al. 2019; Pushkarna et al. 2022). Keeping these primary terms in the glossary lets the body chapters, appendices, and external literature point to one another consistently.

F.4 Commonly Confused Terms

F.4.1 "Masking" and "Anonymization"

Masking refers to a set of engineering treatments that reduce identification risk. Anonymization emphasizes a stronger goal: irreversible non-identifiability. Both are common in practice, but they are not substitutes for each other. In this book, use "masking" for engineering treatments and "anonymization" for the stronger non-identifiability objective, and state the relevant jurisdiction and business requirements when needed.

F.4.2 "Usable" and "Releasable"

Data that is usable for internal experiments is not necessarily safe to publish. Public release usually also involves licensing, boundaries, review, reversibility risk, withdrawal mechanisms, and responsibility assignment. Do not write "internally usable" as "publicly releasable," and do not write "released" as "compliant."

F.4.3 "Evaluation Score" and "Engineering Quality"

An evaluation score is only one part of quality. Version stability, sample coverage, evidence-chain completeness, and rollback ability are also part of engineering quality. If a chapter mentions only scores, readers may conclude that the engineering objective has only one metric.

F.4.4 "Agent Trajectory" and "Log"

A trajectory is a structured event chain used for decision reconstruction and reproducibility. A log is only one carrier of records. Not every log can directly serve as a trajectory. Only records that express state changes, tool calls, inputs, outputs, and write-back results are enough to support reproduction.
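
The distinction can be made concrete with a small filter. This is a sketch under stated assumptions: the key names in `REQUIRED_KEYS` are hypothetical, chosen only to mirror the kinds of information listed above, and real systems will use their own schema.

```python
from typing import Any

# Hypothetical key names; the text requires these kinds of information,
# not these exact keys.
REQUIRED_KEYS = (
    "state_before",
    "state_after",
    "tool_call",
    "inputs",
    "outputs",
    "write_back",
)


def is_trajectory_event(record: dict[str, Any]) -> bool:
    """A record qualifies only if it carries every required key."""
    return all(key in record for key in REQUIRED_KEYS)


def extract_trajectory(log_records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the log records complete enough to support reproduction."""
    return [record for record in log_records if is_trajectory_event(record)]


log = [
    {"level": "INFO", "message": "agent started"},  # an ordinary log line
    {
        "state_before": {"step": 0},
        "state_after": {"step": 1},
        "tool_call": "search",
        "inputs": {"query": "data card"},
        "outputs": {"hits": 3},
        "write_back": None,
    },
]
trajectory = extract_trajectory(log)
```

Here the first record stays in the log but does not enter the trajectory, which is the boundary this section describes.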

F.4.5 "Data Asset" and "Dataset"

A dataset is primarily a collection. A data asset emphasizes governability, reuse, traceability, and maintainability. A dataset becomes closer to an asset only after it has a stable source, version, owner, and usage boundary.
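
One way to make this boundary checkable is to record the four properties explicitly and report which are missing. The sketch below is illustrative only; the class `DatasetRecord` and its field names are assumptions, not a schema defined by this book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for a dataset."""

    name: str
    source: Optional[str] = None
    version: Optional[str] = None
    owner: Optional[str] = None
    usage_boundary: Optional[str] = None


def missing_asset_fields(record: DatasetRecord) -> list[str]:
    """List the properties still missing before the dataset is closer to an asset."""
    required = ("source", "version", "owner", "usage_boundary")
    return [name for name in required if not getattr(record, name)]
```

An empty result does not prove the dataset is a well-governed asset; it only shows that the minimum descriptive fields have been filled in.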

F.5 Chinese-English Writing Guidance

Recommended form	Avoid
Federated learning (FL)	Mixing "Federated Learning" and the Chinese term without defining either at first mention
Privacy-enhancing technologies (PETs)	Mixing PETs, privacy technology, and privacy computing without explanation
Data Protection Impact Assessment (DPIA)	Writing only DPIA without the full form
Data Card	Mixing data note, card, and dataset note
Lineage	Mixing lineage, source chain, and path record without distinction

Additional writing rules:

At first mention, the Chinese edition puts the Chinese primary term first with the English in parentheses. The English edition puts the English primary term first and adds the Chinese in parentheses only when the mapping needs to be preserved.
Do not switch repeatedly between abbreviation and full form in one paragraph.
If an abbreviation appears frequently across chapters, repeat the full form at the beginning of a chapter when useful.
Tables may retain English terms, but prose should keep the primary term consistent.
For unfamiliar abbreviations, repeat the full form rather than assuming the reader remembers it.

F.6 Terminology Maintenance Rules

At every full-book revision, check three things:

Whether newly introduced terms should enter this table.
Whether older terms have acquired multiple translations.
Whether abbreviations in chapters remain consistent with this appendix.

In multi-author writing, assign a terminology owner. This role is not about language taste; it is a consistency gatekeeper who decides:

Whether a new term really needs to be included.
Whether a term already has a book-wide agreed form.
Whether an abbreviation is misused across chapters.
Which terms need updates because of regulation or industry practice.

F.7 Supplementary Term Notes

F.7.1 Data Governance

Data governance is not "having more people manage data." It is an institutional arrangement around permissions, responsibility, boundaries, audit, and change. It depends on engineering implementation: without engineering, governance is a slogan; without governance boundaries, engineering becomes temporary assembly.

F.7.2 Privacy Technologies

PETs is an umbrella term. Federated learning, differential privacy, MPC, TEE, and homomorphic encryption may all belong to this family, but their constraints and cost models differ. Avoid writing "we use PETs" as if the label alone were a blanket or permanent privacy guarantee; name the specific technique and its boundary.

F.7.3 Versioning and Reproducibility

Version freeze, snapshot, rollback, lineage, and audit trail often appear together, but they focus on different things. Version freeze emphasizes fixation; snapshot emphasizes a state cross-section; rollback emphasizes recovery; lineage emphasizes origin; audit trail emphasizes reviewability. Mixing them makes it hard for readers to understand what the system actually does.

F.8 Example Terminology Page

In body text, introduce terms like this:

This chapter uses "Data Card" to record dataset sources, scope, limits, and versions; "Lineage" to record data-flow paths; and "Gate" to indicate approval or control at critical points.

This style has two benefits: readers learn each primary term and its role immediately, and later prose can use the primary term consistently without adding noise.

If a chapter repeatedly uses a concept group, define it near the beginning:

In this chapter, "usable" means usable for internal experimentation, while "releasable" means ready for public release after review, boundary control, withdrawal, and responsibility requirements are met.

This reduces ambiguity later in the chapter.

F.9 Terminology Update Workflow

If the glossary is treated as a static page that is finished once written, it will quickly age. A better approach is to treat it as shared configuration for the whole book and maintain a lightweight but stable workflow.

For each revision:

Scan the body chapters for new high-frequency terms and abbreviations.
Check whether older terms have acquired a second translation.
Verify consistency across tables, figure captions, chapter titles, and prose.
For contested terms, decide the primary term first and aliases second.
Record the affected scope in the revision notes so other chapters do not silently diverge.

In multi-author collaboration, terminology changes should also go through review. A common workflow is: the terminology owner proposes, chapter authors confirm semantics, the editor confirms wording, and the full-book maintainer publishes the decision. The goal is not more process; it is avoiding the cycle where one chapter is fixed while another drifts.
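
Treating the glossary as shared configuration means part of the revision scan can be automated. The following is a minimal sketch, assuming the glossary is stored as a mapping from each primary term to its known aliases and each chapter is available as plain text. The names `GLOSSARY` and `find_alias_usage` are hypothetical; the aliases are taken from the "Avoid" column in F.5.

```python
import re
from collections import defaultdict

# Primary term -> aliases that should not replace it in prose.
GLOSSARY = {
    "Data Card": ["data note", "dataset note"],
    "Lineage": ["source chain", "path record"],
}


def find_alias_usage(
    chapters: dict[str, str], glossary: dict[str, list[str]]
) -> dict[str, list[tuple[str, str]]]:
    """Map each primary term to (chapter, alias) pairs where an alias appears."""
    findings = defaultdict(list)
    for chapter_name, text in chapters.items():
        for primary, aliases in glossary.items():
            for alias in aliases:
                pattern = rf"\b{re.escape(alias)}\b"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    findings[primary].append((chapter_name, alias))
    return dict(findings)
```

The output is a list of places to review, not a list of errors. An alias may be legitimate when a chapter explains history or aligns with external literature.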

F.9.1 Inclusion Criteria

Not every word belongs in the glossary. Prioritize:

Core concepts that appear repeatedly across the book.
Abbreviations or translations that are easy to confuse.
Boundary terms that affect compliance judgment.
Terms that affect engineering implementation paths.
High-frequency terms that appear more than three times in the body.

Ordinary words that appear only once and are easy to understand usually do not need inclusion. The value of a glossary is stabilizing keywords, not endlessly expanding a dictionary.
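
Of the criteria above, only the frequency threshold is easy to compute. The sketch below covers that one criterion and nothing else; the function names are hypothetical, and the remaining criteria still need editorial judgment.

```python
import re


def body_frequency(term: str, chapters: list[str]) -> int:
    """Count whole-word, case-insensitive occurrences of a term across chapters."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in chapters)


def is_high_frequency(term: str, chapters: list[str], threshold: int = 3) -> bool:
    """True when the term appears more than `threshold` times in the body."""
    return body_frequency(term, chapters) > threshold
```

A term that fails this test may still belong in the glossary if it is a boundary term that affects compliance judgment.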

F.9.2 Handling Term Conflicts

When two terms both look correct, decide by three criteria:

Which one is easier for readers to understand.
Which one better matches industry practice.
Which one stays consistent with the rest of the book.

For example, "anonymization," "masking," and "de-identification" may all appear in privacy contexts. If the book's main line is engineering control and release boundaries, "masking" is usually the better primary term. If the discussion is about legal non-identifiability, switch explicitly to the stricter meaning and add jurisdictional context.

F.9.3 Glossary and Chapter Titles

Chapter titles should use the book's primary terms rather than temporary new phrasing. Titles enter navigation, indexes, cross-references, and search results; once a title diverges, maintenance costs become higher than in body text. Terms in titles should satisfy two conditions:

They are book-wide primary terms.
They summarize the chapter's core content.

If a title must retain an industry phrase, define the full form in the first paragraph and state the book's chosen usage.

F.10 Appendix Checklist

Before finalizing a chapter, run this checklist:

Are key terms mapped to the unified forms in this appendix?
Does the first mention provide the full form?
Could any abbreviation conflict with another chapter?
Do tables and prose use the same primary term?
Does the term involve compliance, privacy, or jurisdictional boundaries?
If the term affects implementation, does the chapter explain its engineering meaning?
If the term affects review, is the definition also present in a caption or note where needed?

The purpose is to stabilize terms before a chapter is finalized. Stable terms make the content more stable.

F.11 Summary

Terminology consistency is not a formatting issue; it is a collaboration issue. Stable vocabulary makes structure stable, and stable structure makes engineering reusable. For the whole book, this appendix is not about memorizing more words. It is about making all chapters speak the same engineering language.

F.12 Extended Supplementary Terms

To keep the body chapters consistent, the following table adds common terms whose wording tends to drift across chapters.

Chinese term	English / abbreviation	Note
数据快照	Snapshot	A cross-section of data at a point in time
数据版本	Data Version	A traceable identifier for data state
配置版本	Config Version	A traceable identifier for configuration state
回归测试	Regression Test	Verifies whether an old issue reappears
抽检	Sampling Audit	Limited inspection of samples
证据链	Evidence Chain	The complete material chain supporting a judgment
审批门禁	Approval Gate	A step that requires confirmation before continuing
上下文窗口	Context Window	The information visible to a model or tool
失败回退	Fallback	An alternative path when the main path fails
发布窗口	Release Window	A time window in which changes may be released

This table covers terms that are not necessarily primary concepts but frequently appear in prose. Many texts become inconsistent not because major terms are undefined, but because smaller terms are left unmanaged.

F.13 Terminology Boundaries in Writing

Some words cannot be translated literally; their function in this book must be considered first.

F.13.1 "Model" and "System"

In many chapters, the deliverable is not the model alone. It is a system that includes model, data, rules, cache, interfaces, and monitoring. If the prose says only "model," readers may assume the whole problem is algorithmic.

F.13.2 "Accuracy" and "Usability"

Accuracy is a metric. Usability is an experience. A system can have acceptable accuracy but be unusable because of latency, poor fallback, or complicated permissions. The reverse can also occur.

F.13.3 "Security" and "Compliance"

Security is closer to technical protection. Compliance is closer to institutional constraint. They are related but not interchangeable. Equating a security control with compliance satisfaction misstates the boundary.

F.13.4 "Privacy" and "Confidentiality"

Privacy usually involves identifiability and usage boundaries for individuals. Confidentiality emphasizes preventing information leakage. They overlap in some scenarios, but they are not the same.

F.14 Translation Selection Principles

One English term may have several Chinese translations. Choose in this order:

Book-wide consistency first.
Industry practice second.
Reader comprehension cost third.
Jurisdictional or standards requirements as special exceptions.

For terms that appear frequently across chapters, consistency is more important than each chapter choosing wording that feels locally smoother.

F.15 Terminology Checkpoints in Chapter Writing

Before finalizing each chapter, check these locations:

Chapter title.
First-paragraph definitions.
Figure titles and captions.
Table headers and notes.
Variable explanations around formulas.
Abbreviations and full forms in citations.
Supplemental definitions in footnotes.

Many terminology problems do not appear in paragraphs; they appear in figures and footnotes. In technical books, once terms in figures diverge, readers lose context while skimming.

F.16 Compliance and Privacy Terms

Terms related to compliance and privacy require special care because they often carry both engineering and legal meanings.

Risk-classification and transparency research usually first defines system capabilities, stakeholders, usage scenarios, and disclosure scope before discussing control measures. Therefore, terms such as "lawful," "authorized," and "shareable" in the glossary should be understood as judgments that require contextual confirmation, not as default states (Weidinger et al. 2022; Bommasani et al. 2023).

F.16.1 "Lawful"

Lawfulness is not an engineering default. It is a conclusion that must be confirmed by jurisdiction, purpose, data type, and processing action. Do not equate "we implemented a control" with "this is lawful."

F.16.2 "Authorization"

Authorization is not only "can this be accessed." It also includes "can this be used for the current purpose." Permission and purpose are two separate lines.
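
The two lines can be kept separate in an access check. The sketch below is a simplified illustration, not a complete access-control model; the `Grant` class and the role, resource, and purpose names in the usage example are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: who may access what, and for which purposes."""

    role: str
    resource: str
    allowed_purposes: frozenset


def is_authorized(grants: list, role: str, resource: str, purpose: str) -> bool:
    """Both lines must hold: the role may access the resource,
    and the stated purpose is one the grant allows."""
    return any(
        grant.role == role
        and grant.resource == resource
        and purpose in grant.allowed_purposes
        for grant in grants
    )


grants = [Grant("analyst", "support_tickets", frozenset({"quality_evaluation"}))]
can_evaluate = is_authorized(grants, "analyst", "support_tickets", "quality_evaluation")
can_train = is_authorized(grants, "analyst", "support_tickets", "model_training")
```

In this example the same role and resource pass for one purpose and fail for another, which is why a glossary entry for "authorization" should mention purpose as well as access.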

F.16.3 "Shareable"

Shareability usually comes with scope, conditions, review, and withdrawal mechanisms. Writing only "shareable" can easily be misread as unconditional openness.

F.17 Usage Examples

If a chapter discusses data cards, lineage, and gates, write:

This chapter uses "Data Card" to record dataset sources, scope, and limits; "Lineage" to record transfer paths; and "Gate" to represent control points at critical steps.

If a chapter discusses privacy computing, write:

This chapter uses "privacy-enhancing technologies (PETs)" as the umbrella term. When federated learning, differential privacy, or secure multi-party computation is discussed specifically, their boundaries and applicable scenarios are stated separately.

This keeps readers from seeing several competing labels on the same page.

F.18 Supplementary Summary

Expanding the glossary is not meant to make the prose heavier. It makes the book more stable. For an engineering book, words are interfaces. Once interfaces are unified, chapters can connect.

F.19 Terminology Consistency in Chapter Titles

Chapter titles are the easiest place to overlook terminology, and the easiest place for terminology to drift out of control. Once a title diverges, navigation, search, and cross-references diverge with it.

Use three principles:

Prefer book-wide primary terms.
Avoid renaming the same concept across titles.
If a title needs to help readers understand, use a subtitle for clarification but keep the primary term stable.

For example, if the book standardizes on "Data Card," do not alternate between "data description" and "dataset note" in titles. Stable titles reduce downstream terminology maintenance.

F.20 Terminology in Figures and Tables

Figure titles, captions, and table headers are more error-prone than prose because they are often added late.

Recommended rules:

Keep figure titles short, but keep primary terms accurate.
At first mention in a caption, provide the full form.
Use unified primary terms in table headers.
Avoid switching abbreviations inside a figure.

If a figure becomes too complex, split it into two figures rather than putting three competing labels into one. A confusing figure forces readers to search the prose for definitions.

F.21 Including and Retiring Abbreviations

More abbreviations are not automatically better. Decide whether to keep an abbreviation by four criteria:

Does it appear repeatedly across the book?
Does it actually reduce cognitive cost compared with the full form?
Is it unlikely to conflict with another abbreviation?
Is it widely accepted in the field?

If an abbreviation appears only once or twice, or collides with other terms, use it sparingly or not at all. One fewer abbreviation is usually better than one more guessing game.
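
Two of these criteria, repeated use and collision, can be screened automatically. The following is a minimal sketch, assuming abbreviation definitions have already been collected from the chapters as (abbreviation, full form) pairs. The function name `audit_abbreviations` is hypothetical, and the colliding expansion used in the usage example is invented for illustration.

```python
import re
from collections import defaultdict


def audit_abbreviations(
    definitions: list[tuple[str, str]], body_text: str, min_uses: int = 3
) -> tuple[dict[str, list[str]], list[str]]:
    """Return (collisions, rarely_used).

    collisions: abbreviations defined with more than one full form.
    rarely_used: abbreviations that appear fewer than `min_uses` times in the body.
    """
    expansions = defaultdict(set)
    for abbreviation, full_form in definitions:
        expansions[abbreviation].add(full_form.lower())

    collisions = {
        abbreviation: sorted(forms)
        for abbreviation, forms in expansions.items()
        if len(forms) > 1
    }
    counts = {
        abbreviation: len(re.findall(rf"\b{re.escape(abbreviation)}\b", body_text))
        for abbreviation in expansions
    }
    rarely_used = sorted(a for a, n in counts.items() if n < min_uses)
    return collisions, rarely_used


definitions = [
    ("DP", "Differential Privacy"),
    ("DP", "Data Pipeline"),  # invented collision for illustration
    ("TEE", "Trusted Execution Environment"),
]
body = "DP adds noise. DP budgets matter. DP is audited. A TEE isolates code."
collisions, rarely_used = audit_abbreviations(definitions, body)
```

Whether an abbreviation reduces cognitive cost or is widely accepted in the field cannot be measured this way and stays with the terminology owner.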

F.22 Additional Common Terms

Chinese term	English / abbreviation	Note
检索增强生成	RAG	Retrieval-Augmented Generation
数据运维	DataOps	Operations for data pipelines
机器学习运维	MLOps	Operations for the model lifecycle
大模型运维	LLMOps	Operations for large-model applications
特征存储	Feature Store	A shared layer for managing features
基于角色/属性的访问控制	RBAC / ABAC	Role-Based / Attribute-Based Access Control, two models of Access Control (访问控制)
个体信息	PII	Personally Identifiable Information
数据处理协议	DPA	Data Processing Agreement
数据保护影响评估	DPIA	Data Protection Impact Assessment

These terms are common in technical books, but their meanings are not always stable. The safest approach is to provide the full form at first mention and define the scope used in this book.

F.23 Glossary Maintenance Workflow

The glossary also needs version management. Each revision should follow four steps:

Scan new terminology.
Check whether old terms have acquired new translations.
Verify consistency across figures, titles, and prose.
Record the affected scope of this change.

If multiple people maintain the book, assign a final decision-maker. Otherwise the glossary will grow larger but less coherent.

F.24 Extended Terms for This Book's Main Line

The following terms should be brought directly into the book-wide convention.

Chinese term	English / abbreviation	Note
推理	Inference	The process by which a model generates a result from input
轨迹	Trajectory	The event path of an agent or process; "trace" is an alias, not a second primary term
检索增强生成	RAG	A design that combines retrieval and generation
大模型应用	LLM Application	Application-layer systems built for deployment
多模态模型	Multimodal Model	A model that processes text, images, and other modalities
视觉语言模型	VLM	Vision-Language Model
文生图	T2I	Text-to-Image
文生视频	T2V	Text-to-Video
数据运维	DataOps	Operations and governance for data pipelines
隐私保护	Privacy Protection	Boundaries around access, use, and release
风险评估	Risk Assessment	Judgment of failure, misuse, and compliance risk

These terms most often diverge in chapter titles, figure titles, and tables. Prose should use the primary term and keep English at first mention when needed.

F.25 Applying Terms Inside Chapters

A term is stable only when it appears consistently in the body, not merely in a table. Each chapter should do at least two things:

Define the chapter's primary terms in the opening section.
Keep the same labels in figures and tables.

For example, an agent chapter should not alternate among "intelligent agent," "agent," and "assistant" unless it explicitly distinguishes layers. Multimodal chapters should define VLM, T2I, and T2V early so readers do not mistake them for the same category.

F.26 Three Common Misuses of Glossaries

F.26.1 Treating English Abbreviations as Replacements for Primary Terms

Abbreviations are for communication efficiency, not mystery. If the primary term is clearer in prose, do not overload the page with abbreviations.

F.26.2 Treating Synonyms as Multiple Primary Terms

If one concept has two primary terms in the same chapter, readers will wonder whether two objects are being discussed. The point of a glossary is to prevent that split.

F.26.3 Turning Engineering Concepts into Legal Conclusions

This is especially risky around privacy, compliance, and release boundaries. Terms can slide from engineering description into legal judgment. Clearly distinguish "what the system does" from "whether the legal requirement is satisfied."

References

Bommasani R, Klyman K, Longpre S, Kapoor S, Maslej N, Xiong B, Zhang D, Liang P (2023) The Foundation Model Transparency Index. arXiv preprint arXiv:2310.12941.

Gebru T, Morgenstern J, Vecchione B, Vaughan J W, Wallach H, Daumé H, Crawford K (2021) Datasheets for Datasets. Communications of the ACM 64(12):86-92. https://doi.org/10.1145/3458723.

Liang P, Bommasani R, Lee T, et al. (2023) Holistic Evaluation of Language Models. Transactions on Machine Learning Research. arXiv:2211.09110.

Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji I D, Gebru T (2019) Model Cards for Model Reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp 220-229. https://doi.org/10.1145/3287560.3287596.

Pushkarna M, Zaldivar A, Kjartansson O (2022) Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp 1776-1826. https://doi.org/10.1145/3531146.3533231.

Wang B, Chen W, Pei H, et al. (2023) DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. In: Advances in Neural Information Processing Systems 36. https://doi.org/10.52202/075280-1361.

Weidinger L, Uesato J, Rauh M, Griffin C, Huang P-S, Mellor J, Glaese A, Cheng M, Balle B, Kasirzadeh A, Kenton Z, Brown S, Hawkins W, Stepleton T, Birhane A, Haas J, Rimell L, Hendricks L A, Isaac W, Legassick S, Irving G, Gabriel I (2022) Taxonomy of Risks posed by Language Models. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp 214-229. https://doi.org/10.1145/3531146.3533088.