
Appendix F: Terminology and Chinese-English Mapping

F.1 Purpose of This Appendix

This appendix standardizes the book's high-frequency terms, abbreviations, and Chinese-English mappings, especially concepts that appear repeatedly across chapters but are easy for different teams to describe with different wording.

For an engineering book of this scope, inconsistent terminology is itself a cost. The same object ends up with several names across chapters, and readers, editors, instructors, and project teams must translate between those names whenever they need to align. The purpose of this appendix is therefore not to build a dictionary for its own sake, but to establish a stable engineering vocabulary for the whole book.

The terminology table is also more than a translation aid. In data engineering, governance, evaluation, agents, and privacy technologies, many terms look interchangeable but have distinct boundaries. "Masking" is not "anonymization," and "usable" is not "releasable." "Federated learning," "privacy-enhancing technologies," and "secure multi-party computation" are not interchangeable either: the second is an umbrella term, while the first and third are specific techniques within it. When terms are misused, readers may mistake engineering constraints for legal conclusions, or a research prototype for a production solution. Recent research on foundation-model transparency, language-model risk, and trustworthy evaluation repeatedly stresses that unclear conceptual boundaries affect risk identification, responsibility allocation, and result interpretation (Bommasani et al. 2023; Weidinger et al. 2022; Liang et al. 2023).

F.2 Principles for Term Use

  1. Prefer the unified terminology used across this book.
  2. Provide both the Chinese and English full form at first mention.
  3. Use synonyms only when explaining history or aligning with external literature.
  4. For ambiguous abbreviations, state the intended boundary.
  5. For high-frequency cross-part terms, provide the recommended translation.
  6. Do not switch the Chinese-English order repeatedly within the same chapter.
  7. If an abbreviation may be ambiguous, write the full form first.

A practical test is this: after leaving the page, can the reader use the same terms to explain the idea to someone else? If not, the terminology is not yet stable enough.

F.3 Core Terminology Mapping

| Chinese term | English / abbreviation | Note |
|---|---|---|
| 数据工程 | Data Engineering | Engineering organization around the data lifecycle |
| 数据治理 | Data Governance | Management of data use, permissions, responsibility, and boundaries |
| 数据资产 | Data Asset | A reusable, traceable, maintainable data object |
| 数据卡片 | Data Card | A record of dataset sources, scope, limits, and versions |
| 模型卡片 | Model Card | A record of model use, limits, risks, and evaluation |
| 版本冻结 | Version Freeze | Fixing data or configuration to guarantee reproducibility |
| 污染 | Contamination | Information leakage across training, evaluation, or tasks |
| 切片 | Slice | Evaluation split by subgroup, condition, or scenario |
| 回写 | Write-back | Returning evaluation or operational results to the data side |
| 谱系 | Lineage | Records of data and action origins and paths |
| 闸门 | Gate | A control or approval point at a critical step |
| 轨迹 | Trajectory | An event sequence for an agent or reasoning process |
| 合成数据 | Synthetic Data | Data generated by a model or rules |
| 隐私增强技术 | PETs | Privacy-Enhancing Technologies |
| 联邦学习 | Federated Learning | Joint training without centralizing raw data |
| 差分隐私 | DP | Differential Privacy |
| 安全多方计算 | MPC | Secure Multi-Party Computation |
| 可信执行环境 | TEE | Trusted Execution Environment |
| 同态加密 | HE | Homomorphic Encryption |
| 访问控制 | Access Control | Managing operational boundaries for data and tools |
| 审批流 | Approval Flow | Layered confirmation for sensitive operations |
| 留痕 | Audit Trail | Traceable operation records |
| 法域 | Jurisdiction | Applicable legal and regulatory scope |

Data cards, model cards, and dataset documentation are not ad hoc phrases invented for this book. They continue the discussion in Datasheets for Datasets, Model Cards, and Data Cards about documenting sources, intended use, limits, and evaluation information (Gebru et al. 2021; Mitchell et al. 2019; Pushkarna et al. 2022). Keeping these primary terms in the glossary lets the body chapters, appendices, and external literature point to one another consistently.

F.4 Commonly Confused Terms

F.4.1 "Masking" and "Anonymization"

Masking refers to a set of engineering treatments that reduce identification risk. Anonymization names a stronger goal: irreversible non-identifiability. Both are common in practice, but they are not substitutes for each other. In this book, use "masking" for engineering treatments and "anonymization" for that stronger objective, and state the relevant jurisdiction and business requirements when needed.

F.4.2 "Usable" and "Releasable"

Data that is usable for internal experiments is not necessarily safe to publish. Public release usually also involves licensing, usage boundaries, review, the risk that masking can be reversed, withdrawal mechanisms, and responsibility assignment. Do not write "internally usable" as "publicly releasable," and do not write "released" as "compliant."

F.4.3 "Evaluation Score" and "Engineering Quality"

An evaluation score is only one part of quality. Version stability, sample coverage, evidence-chain completeness, and rollback capability are also part of engineering quality. If a chapter mentions only scores, readers may conclude that engineering quality reduces to a single metric.

F.4.4 "Agent Trajectory" and "Log"

A trajectory is a structured event chain used to reconstruct decisions and reproduce runs. A log is only one medium for records, and not every log can serve directly as a trajectory. Only records that capture state changes, tool calls, inputs, outputs, and write-back results are sufficient to support reproduction.
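One way to make the trajectory/log distinction concrete is to treat a trajectory step as a typed record and promote a log line only when it carries every required field. The sketch below is a minimal illustration, assuming hypothetical field names (`state_before`, `state_after`, `tool_call`, `inputs`, `outputs`, `write_back`) chosen to mirror the list above; a real agent stack will have its own schema.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical field names mirroring F.4.4: state change (before/after),
# tool call, inputs, outputs, and write-back result.
REQUIRED_FIELDS = ("state_before", "state_after", "tool_call", "inputs", "outputs", "write_back")


@dataclass
class TrajectoryEvent:
    step: int
    state_before: str
    state_after: str
    tool_call: Optional[str]   # None when the step made no tool call
    inputs: dict[str, Any]
    outputs: dict[str, Any]
    write_back: Optional[str]  # where results were written back, if anywhere


def build_trajectory(records: list[dict[str, Any]]) -> tuple[list[TrajectoryEvent], list[int]]:
    """Promote log records to trajectory events; return the events and the indices of records that fall short."""
    events: list[TrajectoryEvent] = []
    rejected: list[int] = []
    for index, record in enumerate(records):
        if all(name in record for name in REQUIRED_FIELDS):
            fields = {name: record[name] for name in REQUIRED_FIELDS}
            events.append(TrajectoryEvent(step=len(events), **fields))
        else:
            rejected.append(index)  # an ordinary log line: kept as a log, not as a trajectory step
    return events, rejected
```

Rejected records are not discarded; they remain ordinary logs, which is exactly the boundary the text draws.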

F.4.5 "Data Asset" and "Dataset"

A dataset is primarily a collection. A data asset emphasizes governability, reuse, traceability, and maintainability. A dataset becomes closer to an asset only after it has a stable source, version, owner, and usage boundary.
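The four conditions above can be read as a completeness check. The following sketch assumes a hypothetical `DatasetRecord` type; it only tests whether each attribute is filled in, not whether the source is actually stable or the owner actually accountable, which still needs human review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    name: str
    source: Optional[str] = None          # stable, documented origin
    version: Optional[str] = None         # frozen version identifier
    owner: Optional[str] = None           # accountable person or team
    usage_boundary: Optional[str] = None  # what the data may be used for


# The four attributes named in F.4.5.
ASSET_ATTRIBUTES = ("source", "version", "owner", "usage_boundary")


def missing_asset_attributes(dataset: DatasetRecord) -> list[str]:
    """List the attributes a dataset still lacks before it can be managed as a data asset."""
    return [attr for attr in ASSET_ATTRIBUTES if not getattr(dataset, attr)]


def is_data_asset(dataset: DatasetRecord) -> bool:
    return not missing_asset_attributes(dataset)
```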

F.5 Chinese-English Writing Guidance

| Recommended form | Avoid |
|---|---|
| Federated learning (FL) | Mixing "Federated Learning" and the Chinese term without first definition |
| Privacy-enhancing technologies (PETs) | Mixing PETs, privacy technology, and privacy computing without explanation |
| Data Protection Impact Assessment (DPIA) | Writing only DPIA without the full form |
| Data Card | Mixing data note, card, and dataset note |
| Lineage | Mixing lineage, source chain, and path record without distinction |

Additional writing rules:

  1. At first mention, prefer "Chinese primary term first, English in parentheses" in the Chinese edition, and "English primary term first, Chinese in parentheses" only when the English edition needs to preserve a Chinese mapping.
  2. Do not switch repeatedly between abbreviation and full form in one paragraph.
  3. If an abbreviation appears frequently across chapters, repeat the full form at the beginning of a chapter when useful.
  4. Tables may retain English terms, but prose should keep the primary term consistent.
  5. For unfamiliar abbreviations, repeat the full form rather than assuming the reader remembers it.
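Parts of these rules can be checked mechanically. The sketch below assumes the book keeps an abbreviation-to-full-form map (here a plain, hypothetical `glossary` dict) and flags any abbreviation whose first occurrence is not written in the "Full Form (ABBR)" pattern. It is a rough lint, not a parser: it only checks the English-first pattern, not the Chinese-first form recommended for the Chinese edition.

```python
import re


def first_mention_problems(text: str, glossary: dict[str, str]) -> list[str]:
    """Check that each abbreviation's first occurrence is written as 'Full Form (ABBR)'."""
    problems = []
    for abbr, full_form in glossary.items():
        first = re.search(rf"\b{re.escape(abbr)}\b", text)
        if first is None:
            continue  # the abbreviation is not used in this text
        preceding = text[: first.start()]
        # The full form must sit immediately before the opening parenthesis; case is ignored
        # so that sentence-initial capitalization does not cause false alarms.
        if not re.search(rf"{re.escape(full_form)}\s*\($", preceding, flags=re.IGNORECASE):
            problems.append(f"{abbr}: first mention lacks the full form '{full_form}'")
    return problems
```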

F.6 Terminology Maintenance Rules

At every full-book revision, check three things:

  1. Whether newly introduced terms should enter this table.
  2. Whether older terms have acquired multiple translations.
  3. Whether abbreviations in chapters remain consistent with this appendix.

In multi-author writing, assign a terminology owner. This role is not about language taste; it is a consistency gatekeeper responsible for:

  • Whether a new term really needs to be included.
  • Whether a term already has a book-wide agreed form.
  • Whether an abbreviation is misused across chapters.
  • Which terms need updates because of regulation or industry practice.

F.7 Supplementary Term Notes

F.7.1 Data Governance

Data governance is not "having more people manage data." It is an institutional arrangement around permissions, responsibility, boundaries, audit, and change. It depends on engineering implementation: without engineering, governance is a slogan; without governance boundaries, engineering becomes temporary assembly.

F.7.2 Privacy Technologies

PETs is an umbrella term. Federated learning, differential privacy, MPC, TEE, and homomorphic encryption may all belong to this family, but their constraints and cost models differ. Avoid writing "we use PETs" as if it were an unconditional privacy guarantee.

F.7.3 Versioning and Reproducibility

Version freeze, snapshot, rollback, lineage, and audit trail often appear together, but they focus on different things. Version freeze emphasizes fixation; snapshot emphasizes a state cross-section; rollback emphasizes recovery; lineage emphasizes origin; audit trail emphasizes reviewability. Mixing them makes it hard for readers to understand what the system actually does.

F.8 Example Terminology Page

In body text, introduce terms like this:

This chapter uses "Data Card" to record dataset sources, scope, limits, and versions; "Lineage" to record data-flow paths; and "Gate" to indicate approval or control at critical points.

This style has two benefits: readers learn immediately what each term refers to, and later prose can use the primary terms consistently without extra noise.

If a chapter repeatedly uses a concept group, define it near the beginning:

In this chapter, "usable" means usable for internal experimentation, while "releasable" means ready for public release after review, boundary control, rollback, and responsibility requirements are met.

This reduces ambiguity later in the chapter.

F.9 Terminology Update Workflow

If the glossary is treated as a static page that is finished once written, it will quickly age. A better approach is to treat it as shared configuration for the whole book and maintain a lightweight but stable workflow.

For each revision:

  1. Scan the body chapters for new high-frequency terms and abbreviations.
  2. Check whether older terms have acquired a second translation.
  3. Verify consistency across tables, figure captions, chapter titles, and prose.
  4. For contested terms, decide the primary term first and aliases second.
  5. Record the affected scope in the revision notes so other chapters do not silently diverge.

In multi-author collaboration, terminology changes should also go through review. A common workflow is: the terminology owner proposes, chapter authors confirm semantics, the editor confirms wording, and the full-book maintainer publishes the decision. The goal is not more process; it is avoiding the cycle where one chapter is fixed while another drifts.
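If the glossary is treated as shared configuration, steps 2 and 3 above can be partly automated. The sketch below assumes a hypothetical `GLOSSARY` dict mapping each primary term to aliases that should not appear; the aliases echo the "Avoid" column in F.5. It reports the line, the alias found, and the primary term to use, leaving the decision to the terminology owner.

```python
import re

# Hypothetical shared glossary config: primary term -> aliases to flag.
GLOSSARY = {
    "Data Card": ["data note", "dataset note"],
    "Lineage": ["source chain", "path record"],
}


def find_alias_drift(chapter_text: str, glossary: dict[str, list[str]]) -> list[tuple[int, str, str]]:
    """Return (line number, alias as written, primary term) for every alias occurrence."""
    findings = []
    for primary, aliases in glossary.items():
        for alias in aliases:
            pattern = rf"\b{re.escape(alias)}\b"
            for match in re.finditer(pattern, chapter_text, flags=re.IGNORECASE):
                line_no = chapter_text.count("\n", 0, match.start()) + 1
                findings.append((line_no, match.group(0), primary))
    return sorted(findings)  # report in reading order
```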

F.9.1 Inclusion Criteria

Not every word belongs in the glossary. Prioritize:

  • Core concepts that appear repeatedly across the book.
  • Abbreviations or translations that are easy to confuse.
  • Boundary terms that affect compliance judgment.
  • Terms that affect engineering implementation paths.
  • High-frequency terms that appear more than three times in the body.

Ordinary words that appear only once and are easy to understand usually do not need inclusion. The value of a glossary is stabilizing keywords, not endlessly expanding a dictionary.
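Of these criteria, only the frequency threshold is easy to compute; the others (confusability, compliance relevance, effect on implementation) require editorial judgment. A minimal sketch of the frequency check, assuming chapters are available as plain text keyed by a hypothetical chapter ID:

```python
import re
from collections import Counter


def inclusion_candidates(chapters: dict[str, str], terms: list[str], threshold: int = 3) -> dict[str, int]:
    """Return terms whose book-wide count exceeds `threshold` ("more than three times")."""
    totals = Counter()
    for text in chapters.values():
        for term in terms:
            # Whole-word, case-insensitive match; inflected forms are not counted.
            pattern = rf"\b{re.escape(term)}\b"
            totals[term] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return {term: count for term, count in totals.items() if count > threshold}
```

The result is a shortlist for the terminology owner, not an automatic inclusion decision.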

F.9.2 Handling Term Conflicts

When two terms both look correct, decide by three criteria:

  • Which one is easier for readers to understand.
  • Which one better matches industry practice.
  • Which one stays consistent with the rest of the book.

For example, "anonymization," "masking," and "de-identification" may all appear in privacy contexts. If the book's main line is engineering control and release boundaries, "masking" is usually the better primary term. If the discussion is about legal non-identifiability, switch explicitly to the stricter meaning and add jurisdictional context.

F.9.3 Glossary and Chapter Titles

Chapter titles should use the book's primary terms rather than temporary new phrasing. Titles enter navigation, indexes, cross-references, and search results; once a title diverges, maintenance costs become higher than in body text. Terms in titles should satisfy two conditions:

  1. They are book-wide primary terms.
  2. They summarize the chapter's core content.

If a title must retain an industry phrase, define the full form in the first paragraph and state the book's chosen usage.

F.10 Appendix Checklist

Before finalizing a chapter, run this checklist:

  1. Are key terms mapped to the unified forms in this appendix?
  2. Does the first mention provide the full form?
  3. Could any abbreviation conflict with another chapter?
  4. Do tables and prose use the same primary term?
  5. Does the term involve compliance, privacy, or jurisdictional boundaries?
  6. If the term affects implementation, does the chapter explain its engineering meaning?
  7. If the term affects review, is the definition also present in a caption or note where needed?

The purpose is to settle terminology before a chapter is finalized; stable terms make the content more stable.

F.11 Summary

Terminology consistency is not a formatting issue; it is a collaboration issue. Stable vocabulary makes structure stable, and stable structure makes engineering reusable. For the whole book, this appendix is not about memorizing more words. It is about making all chapters speak the same engineering language.

F.12 Extended Supplementary Terms

To keep the body chapters consistent, the following table adds common terms that are easy to scatter.

| Chinese term | English / abbreviation | Note |
|---|---|---|
| 数据快照 | Snapshot | A cross-section of data at a point in time |
| 数据版本 | Data Version | A traceable identifier for data state |
| 配置版本 | Config Version | A traceable identifier for configuration state |
| 回归测试 | Regression Test | Verifies whether an old issue reappears |
| 抽检 | Sampling Audit | Limited inspection of samples |
| 证据链 | Evidence Chain | The complete material chain supporting a judgment |
| 审批门禁 | Approval Gate | A step that requires confirmation before continuing |
| 上下文窗口 | Context Window | The information visible to a model or tool |
| 失败回退 | Fallback | An alternative path when the main path fails |
| 发布窗口 | Release Window | A time window in which changes may be released |

This table covers terms that are not necessarily primary concepts but frequently appear in prose. Many texts become inconsistent not because major terms are undefined, but because smaller terms are left unmanaged.

F.13 Terminology Boundaries in Writing

Some words cannot be translated literally; their function in this book must be considered first.

F.13.1 "Model" and "System"

In many chapters, the deliverable is not the model alone. It is a system that includes model, data, rules, cache, interfaces, and monitoring. If the prose says only "model," readers may assume the whole problem is algorithmic.

F.13.2 "Accuracy" and "Usability"

Accuracy is a metric. Usability is an experience. A system can have acceptable accuracy but be unusable because of latency, poor fallback, or complicated permissions. The reverse can also occur.

F.13.3 "Security" and "Compliance"

Security is closer to technical protection. Compliance is closer to institutional constraint. They are related but not interchangeable. Equating a security control with compliance satisfaction misstates the boundary.

F.13.4 "Privacy" and "Confidentiality"

Privacy usually involves identifiability and usage boundaries for individuals. Confidentiality emphasizes preventing information leakage. They overlap in some scenarios, but they are not the same.

F.14 Translation Selection Principles

One English term may have several Chinese translations. Choose in this order:

  1. Book-wide consistency first.
  2. Industry practice second.
  3. Reader comprehension cost third.
  4. Jurisdictional or standards requirements as special exceptions.

For terms that appear frequently across chapters, consistency is more important than each chapter choosing wording that feels locally smoother.
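The ordering above is a lexicographic priority, which maps naturally onto tuple comparison. The sketch below is one possible reading of it: the `Candidate` fields and the numeric `reader_cost` rating are hypothetical, and treating a standards requirement as an outright override is an interpretation of "special exceptions."

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    translation: str
    consistent_with_book: bool          # 1. already the book-wide form elsewhere
    matches_industry_practice: bool     # 2. the form practitioners already use
    reader_cost: int                    # 3. hypothetical editorial rating; lower is easier
    required_by_standard: bool = False  # 4. mandated by a jurisdiction or standard


def choose_translation(candidates: list[Candidate]) -> Candidate:
    """Pick a primary translation using the priority order above."""
    required = [c for c in candidates if c.required_by_standard]
    if required:
        return required[0]  # special exception: a legal or standards requirement wins outright
    # Tuples compare element by element, so this key encodes the priority order directly;
    # False sorts before True, hence the `not` on the two preference flags.
    return min(
        candidates,
        key=lambda c: (not c.consistent_with_book, not c.matches_industry_practice, c.reader_cost),
    )
```

Note that a locally smoother candidate (lower `reader_cost`) still loses to one that is already consistent across the book, which is the point of the paragraph above.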

F.15 Terminology Checkpoints in Chapter Writing

Before finalizing each chapter, check these locations:

  • Chapter title.
  • First-paragraph definitions.
  • Figure titles and captions.
  • Table headers and notes.
  • Variable explanations around formulas.
  • Abbreviations and full forms in citations.
  • Supplemental definitions in footnotes.

Many terminology problems do not appear in paragraphs; they appear in figures and footnotes. In technical books, once terms in figures diverge, readers lose context while skimming.

F.16 Terms with Compliance and Privacy Boundaries

Terms related to compliance and privacy require special care because they often carry both engineering and legal meanings.

Risk-classification and transparency research usually first defines system capabilities, stakeholders, usage scenarios, and disclosure scope before discussing control measures. Therefore, terms such as "lawful," "authorized," and "shareable" in this glossary should be understood as judgments that require contextual confirmation, not as default states (Weidinger et al. 2022; Bommasani et al. 2023).

F.16.1 "Lawful"

Lawfulness is not an engineering default. It is a conclusion that must be confirmed by jurisdiction, purpose, data type, and processing action. Do not equate "we implemented a control" with "this is lawful."

F.16.2 "Authorization"

Authorization is not only "can this be accessed." It also includes "can this be used for the current purpose." Permission and purpose are two separate lines.
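Keeping permission and purpose as separate checks also makes denials easier to explain. The sketch below assumes a hypothetical `Grant` record that binds a principal and a dataset to a set of allowed purposes; a real system would draw these from its access-control and approval-flow records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    dataset: str
    allowed_purposes: frozenset[str]


def authorize(grants: list[Grant], principal: str, dataset: str, purpose: str) -> tuple[bool, str]:
    """Return (allowed, reason), checking access permission and purpose as two separate lines."""
    matching = [g for g in grants if g.principal == principal and g.dataset == dataset]
    if not matching:
        return False, "no access permission"
    if not any(purpose in g.allowed_purposes for g in matching):
        return False, f"access permitted, but not for purpose '{purpose}'"
    return True, "access and purpose both permitted"
```

Passing both checks still says nothing about lawfulness, which F.16.1 treats as a separate conclusion.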

F.16.3 "Shareable"

Shareability usually comes with scope, conditions, review, and withdrawal mechanisms. Writing only "shareable" can easily be misread as unconditional openness.

F.17 Usage Examples

If a chapter discusses data cards, lineage, and gates, write:

This chapter uses "Data Card" to record dataset sources, scope, and limits; "Lineage" to record transfer paths; and "Gate" to represent control points at critical steps.

If a chapter discusses privacy computing, write:

This chapter uses "privacy-enhancing technologies (PETs)" as the umbrella term. When federated learning, differential privacy, or secure multi-party computation is discussed specifically, their boundaries and applicable scenarios are stated separately.

This keeps readers from seeing several competing labels on the same page.

F.18 Supplementary Summary

Expanding the glossary is not meant to make the prose heavier. It makes the book more stable. For an engineering book, words are interfaces. Once interfaces are unified, chapters can connect.

F.19 Terminology Consistency in Chapter Titles

Terminology in chapter titles is the easiest to overlook and the most likely to drift. Once a title diverges, navigation, search, and cross-references diverge with it.

Use three principles:

  1. Prefer book-wide primary terms.
  2. Avoid renaming the same concept across titles.
  3. If a title needs to help readers understand, use a subtitle for clarification but keep the primary term stable.

For example, if the book standardizes on "Data Card," do not alternate between "data description" and "dataset note" in titles. Stable titles reduce downstream terminology maintenance.

F.20 Terminology in Figures and Tables

Figure titles, captions, and table headers are more error-prone than prose because they are often added late.

Recommended rules:

  • Keep figure titles short, but keep primary terms accurate.
  • At first mention in a caption, provide the full form.
  • Use unified primary terms in table headers.
  • Avoid switching abbreviations inside a figure.

If a figure becomes too complex, split it into two figures rather than putting three competing labels into one. A confusing figure forces readers to search the prose for definitions.

F.21 Including and Retiring Abbreviations

More abbreviations are not automatically better. Decide whether to keep an abbreviation by four criteria:

  1. Does it appear repeatedly across the book?
  2. Does it actually reduce cognitive cost compared with the full form?
  3. Is it unlikely to conflict with another abbreviation?
  4. Is it widely accepted in the field?

If an abbreviation appears only once or twice, or collides with other terms, use it sparingly or not at all. One fewer abbreviation is usually better than one more guessing game.
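The four criteria can be combined into a simple review rule. In the sketch below, the `AbbreviationStats` fields are hypothetical, two of the criteria (cognitive cost and field acceptance) are entered as editorial judgments rather than computed, and a collision is detected when the same abbreviation has been used for more than one full form somewhere in the book.

```python
from dataclasses import dataclass


@dataclass
class AbbreviationStats:
    abbr: str
    occurrences: int       # criterion 1: how often it appears across the book
    saves_effort: bool     # criterion 2: editorial judgment versus the full form
    widely_accepted: bool  # criterion 4: accepted usage in the field


def review_abbreviation(stats: AbbreviationStats, usages: dict[str, set[str]]) -> str:
    """Classify an abbreviation; `usages` maps each abbreviation to the full forms it has been used for."""
    # Criterion 3: a collision means one abbreviation stands for several full forms.
    collides = len(usages.get(stats.abbr, set())) > 1
    if stats.occurrences <= 2 or collides:
        return "avoid or use sparingly"
    if stats.saves_effort and stats.widely_accepted:
        return "keep"
    return "prefer the full form"
```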

F.22 Additional Common Terms

| Chinese term | English / abbreviation | Note |
|---|---|---|
| 检索增强生成 | RAG | Retrieval-Augmented Generation |
| 数据运维 | DataOps | Operations for data pipelines |
| 机器学习运维 | MLOps | Operations for the model lifecycle |
| 大模型运维 | LLMOps | Operations for large-model applications |
| 特征存储 | Feature Store | A shared layer for managing features |
| 基于角色/属性的访问控制 | RBAC / ABAC | Role-based or attribute-based access control |
| 个人可识别信息 | PII | Personally Identifiable Information |
| 数据处理协议 | DPA | Data Processing Agreement |
| 数据保护影响评估 | DPIA | Data Protection Impact Assessment |

These terms are common in technical books, but their meanings are not always stable. The safest approach is to provide the full form at first mention and define the scope used in this book.

F.23 Glossary Maintenance Workflow

The glossary also needs version management. Each revision should follow four steps:

  1. Scan new terminology.
  2. Check whether old terms have acquired new translations.
  3. Verify consistency across figures, titles, and prose.
  4. Record the affected scope of this change.

If multiple people maintain the book, assign a final decision-maker. Otherwise the glossary will grow larger but less coherent.

F.24 Extended Terms for This Book's Main Line

The following terms should be brought directly into the book-wide convention.

| Chinese term | English / abbreviation | Note |
|---|---|---|
| 推理 | Inference | The process by which a model generates a result from input |
| 轨迹 | Trace / Trajectory | The event path of an agent or process |
| 检索增强生成 | RAG | A design that combines retrieval and generation |
| 大模型应用 | LLM Application | A deployed application-layer system built on a large language model |
| 多模态模型 | Multimodal Model | A model that processes text, images, and other modalities |
| 视觉语言模型 | VLM | Vision-Language Model |
| 文生图 | T2I | Text-to-Image |
| 文生视频 | T2V | Text-to-Video |
| 数据运营 | DataOps | Operations and governance for data pipelines |
| 隐私保护 | Privacy Protection | Boundaries around access, use, and release |
| 风险评估 | Risk Assessment | Judgment of failure, misuse, and compliance risk |

These terms most often diverge in chapter titles, figure titles, and tables. Prose should use the primary term and keep English at first mention when needed.

F.25 Applying Terms Inside Chapters

A term is stable only when it appears consistently in the body, not merely in a table. Each chapter should do at least two things:

  1. Define the chapter's primary terms in the opening section.
  2. Keep the same labels in figures and tables.

For example, an agent chapter should not alternate among "intelligent agent," "agent," and "assistant" unless it explicitly distinguishes layers. Multimodal chapters should define VLM, T2I, and T2V early so readers do not mistake them for the same category.

F.26 Three Common Misuses of Glossaries

F.26.1 Treating English Abbreviations as Replacements for Primary Terms

Abbreviations are for communication efficiency, not mystery. If the primary term is clearer in prose, do not overload the page with abbreviations.

F.26.2 Treating Synonyms as Multiple Primary Terms

If one concept has two primary terms in the same chapter, readers will wonder whether two objects are being discussed. The point of a glossary is to prevent that split.

F.26.3 Treating Engineering Descriptions as Legal Judgments

Terminology drift is especially risky around privacy, compliance, and release boundaries, where terms can slide from engineering description into legal judgment. Clearly distinguish "what the system does" from "whether the legal requirement is satisfied."

References

Bommasani R, Klyman K, Longpre S, et al. (2023) The Foundation Model Transparency Index. arXiv preprint arXiv:2310.12941.

Gebru T, Morgenstern J, Vecchione B, Vaughan J W, Wallach H, Daumé H, Crawford K (2021) Datasheets for Datasets. Communications of the ACM 64(12):86-92. https://doi.org/10.1145/3458723.

Liang P, Bommasani R, Lee T, et al. (2023) Holistic Evaluation of Language Models. Transactions on Machine Learning Research. arXiv:2211.09110.

Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji I D, Gebru T (2019) Model Cards for Model Reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp 220-229. https://doi.org/10.1145/3287560.3287596.

Pushkarna M, Zaldivar A, Kjartansson O (2022) Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp 1776-1826. https://doi.org/10.1145/3531146.3533231.

Wang B, Chen W, Pei H, et al. (2023) DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. In: Advances in Neural Information Processing Systems 36. https://doi.org/10.52202/075280-1361.

Weidinger L, Uesato J, Rauh M, Griffin C, Huang P-S, Mellor J, Glaese A, Cheng M, Balle B, Kasirzadeh A, Kenton Z, Brown S, Hawkins W, Stepleton T, Birhane A, Haas J, Rimell L, Hendricks L A, Isaac W, Legassick S, Irving G, Gabriel I (2022) Taxonomy of Risks posed by Language Models. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp 214-229. https://doi.org/10.1145/3531146.3533088.