Abbreviations

This page lists the technical abbreviations that appear most often in the book, for quick lookup from any part or chapter.

General Abbreviations

Table FM-1: General Abbreviations.

Abbreviation	Full Name	Description	Main Locations
A100	NVIDIA A100 GPU	NVIDIA A100 accelerator	Part 1, Part 10, Part 11
AI	Artificial Intelligence	Artificial intelligence	Whole book
AGI	Artificial General Intelligence	Artificial general intelligence	Part 1, Part 11
API	Application Programming Interface	Application programming interface	Part 1, Part 4, Part 10, Part 11
ANN	Approximate Nearest Neighbor	Approximate nearest-neighbor retrieval	Part 1, Part 7
ASR	Automatic Speech Recognition	Automatic speech recognition	Part 3
BM25	Best Matching 25	Classic sparse retrieval ranking method	Part 1, Part 7
CI/CD	Continuous Integration / Continuous Deployment	Continuous integration and continuous deployment	Part 1, Part 2, Part 8
CPU	Central Processing Unit	Central processing unit	Part 1, Part 2
CSV	Comma-Separated Values	Comma-separated text format	Part 1
ETL	Extract, Transform, Load	Extract, transform, and load	Part 1, Part 8
GPU	Graphics Processing Unit	Graphics processing unit	Part 1, Part 3, Part 10, Part 11
GUID	Globally Unique Identifier	Globally unique identifier	Part 2
HDFS	Hadoop Distributed File System	Hadoop distributed file system	Part 1, Part 10
H100	NVIDIA H100 GPU	NVIDIA H100 accelerator	Part 1
JSON	JavaScript Object Notation	Structured data interchange format	Part 1, Part 3, Part 4, Part 10, Part 11
JSONL	JSON Lines	Line-delimited JSON text format	Part 1, Part 3, Part 4, Part 10
KPI	Key Performance Indicator	Key performance indicator	Part 4, Part 8
LLM	Large Language Model	Large language model	Whole book
MLOps	Machine Learning Operations	Practices and systems for engineering and operating machine-learning workloads	Part 1, Part 8
PDF	Portable Document Format	Portable document format	Part 1, Part 3, Part 7, Part 10, Part 11
PII	Personally Identifiable Information	Personally identifiable information	Part 2, Part 9, Part 10
ROI	Return on Investment	Return on investment	Part 1, Part 8
SLA	Service Level Agreement	Service-level agreement	Part 1, Part 4, Part 8
SOPs	Standard Operating Procedures	Standard operating procedures	Part 4, Part 8
SQL	Structured Query Language	Structured query language	Part 1, Part 4, Part 11
TPU	Tensor Processing Unit	Tensor processing unit	Part 1
UTF-8	Unicode Transformation Format, 8-bit	Variable-width Unicode character encoding	Part 2

Data Engineering and Platforms

Table FM-2: Data Engineering and Platforms.

Abbreviation	Full Name	Description	Main Locations
DataOps	Data Operations	Practices and systems for operating data-engineering workflows	Part 2, Part 8, Part 10
DOM	Document Object Model	Document object model	Part 3, Part 11
DVC	Data Version Control	Data version control tool or method	Part 1, Part 2, Part 8
FAISS	Facebook AI Similarity Search	Vector similarity search library	Part 7
FastText	FastText	Lightweight text representation and classification tool	Part 2
LakeFS	LakeFS	Version management system for data lakes	Part 1, Part 8
MATTR	Moving-Average Type-Token Ratio	Moving-average type-token ratio	Part 2
MFU	Model FLOPs Utilization	Model FLOPs utilization	Part 2
MinHash	Min-wise Independent Permutations Hashing	Approximate deduplication method based on min-wise hashing	Part 1, Part 2, Part 11
OOM	Out Of Memory	Out-of-memory error	Part 1, Part 2
PPL	Perplexity	Perplexity metric	Part 1, Part 2, Part 4
RDMA	Remote Direct Memory Access	Remote direct memory access	Part 1
ReDoS	Regular Expression Denial of Service	Regular-expression denial-of-service risk	Part 2
TTR	Type-Token Ratio	Type-token ratio, a diversity metric	Part 2
WARC	Web ARChive	Web archive format	Part 2
WebDataset	WebDataset	Data packaging format and tool for large-scale training	Part 2, Part 3
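
As a quick illustration of the two diversity metrics in the table, TTR and MATTR, here is a minimal Python sketch. It assumes whitespace-tokenized input and an illustrative window size; the function names `ttr` and `mattr` and the default `window=50` are hypothetical choices made for this example, not settings taken from the book.

```python
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every sliding window of `window` tokens.

    The window size is an assumption for illustration; when the text is
    no longer than one window, this falls back to plain TTR.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = "the cat sat on the mat".split()
    print(ttr(sample))              # 5 distinct tokens out of 6
    print(mattr(sample, window=5))  # average over two 5-token windows
```

Plain TTR tends to fall as a text gets longer, because common words repeat; averaging over fixed-size windows makes scores comparable across documents of different lengths.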

Training, Alignment, and Reasoning

Table FM-3: Training, Alignment, and Reasoning.

Abbreviation	Full Name	Description	Main Locations
CoT	Chain-of-Thought	Chain-of-thought reasoning	Part 6, Part 10, Part 11
DPO	Direct Preference Optimization	Direct preference optimization	Part 4, Part 11
LoRA	Low-Rank Adaptation	Low-rank adaptation fine-tuning method	Part 11
PPO	Proximal Policy Optimization	Proximal policy optimization	Part 4, Part 11
PRM	Process Reward Model	Process reward model	Part 4, Part 6, Part 10, Part 11
QA	Quality Assurance	Quality assurance	Part 4, Part 10
RAG	Retrieval-Augmented Generation	Retrieval-augmented generation	Part 7, Part 10
RL	Reinforcement Learning	Reinforcement learning	Part 4, Part 11
RLAIF	Reinforcement Learning from AI Feedback	Reinforcement learning from AI feedback	Part 4
RLHF	Reinforcement Learning from Human Feedback	Reinforcement learning from human feedback	Part 4, Part 11
RM	Reward Model	Reward model	Part 4
ROUGE-L	Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence	Text-similarity metric based on the longest common subsequence	Part 2, Part 4
SFT	Supervised Fine-Tuning	Supervised fine-tuning	Part 4, Part 10, Part 11

Multimodality and Vision

Table FM-4: Multimodality and Vision.

Abbreviation	Full Name	Description	Main Locations
BBox	Bounding Box	Bounding box	Part 3, Part 10, Part 11
ChartQA	Chart Question Answering	Chart question-answering task or dataset	Part 3, Part 11
CLIP	Contrastive Language-Image Pre-training	Contrastive image-text pre-training model	Part 3, Part 11
CLIP-Score	CLIP Score	Image-text relevance score based on CLIP	Part 11
COCO	Common Objects in Context	General object-detection and image-captioning dataset	Part 3, Part 10
DINO	DEtection TRansformer with Improved deNoising anchOr boxes	Detection model family, often used in Grounding DINO contexts	Part 3, Part 11
DocVQA	Document Visual Question Answering	Document visual question-answering task or dataset	Part 11
Grounding	Visual Grounding	Visual grounding or alignment task	Part 3, Part 10, Part 11
IoU	Intersection over Union	Object-detection overlap metric	Part 3
LLaVA	Large Language and Vision Assistant	Multimodal large model; the name also refers to its data format	Part 10, Part 11
OCR	Optical Character Recognition	Optical character recognition	Part 3, Part 7, Part 10, Part 11
OCR-Rich	OCR-Rich Data	Image or document data rich in OCR information	Part 11
SSIM	Structural Similarity Index Measure	Structural similarity metric	Part 11
ViT	Vision Transformer	Vision Transformer encoder	Part 11
VLM	Vision-Language Model	Vision-language model	Part 3, Part 11
VQA	Visual Question Answering	Visual question answering	Part 3, Part 11
XML	eXtensible Markup Language	Extensible markup language	Part 1, Part 3
YOLO	You Only Look Once	Object-detection model family	Part 3
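
To make the BBox and IoU entries concrete, the following minimal Python sketch computes intersection over union for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` corner convention and the names `Box` and `iou` are assumptions for this example; datasets and tools may use other layouts, such as `(x, y, width, height)`.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as no overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

A value of 1.0 means the boxes coincide, and 0.0 means they do not overlap at all.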

Evaluation, Compliance, and Governance

Table FM-5: Evaluation, Compliance, and Governance.

Abbreviation	Full Name	Description	Main Locations
AGI-Eval	AGI Evaluation	Evaluation benchmark for general-intelligence capabilities	Part 11
DPIA	Data Protection Impact Assessment	Data protection impact assessment	Part 9
GSM8K	Grade School Math 8K	Grade-school math reasoning benchmark	Part 1, Part 11
MCTS	Monte Carlo Tree Search	Monte Carlo tree search	Part 11
MMLU	Massive Multitask Language Understanding	Massive multitask language-understanding benchmark	Part 1, Part 11
MMMU	Massive Multi-discipline Multimodal Understanding and Reasoning	Multidiscipline multimodal understanding and reasoning benchmark	Part 11
NSFW	Not Safe For Work	Content unsuitable for public or workplace contexts	Part 2
P99	99th Percentile	99th percentile metric	Part 2
P99.9	99.9th Percentile	99.9th percentile metric	Part 2
RoPA	Record of Processing Activities	Record of processing activities	Part 9