Abbreviations
This page lists the technical abbreviations that recur most often in the book, together with the parts where each one mainly appears, for quick lookup.
General Abbreviations
Table FM-1: General Abbreviations.
| Abbreviation | Full Name | Description | Main Locations |
|---|---|---|---|
| A100 | NVIDIA A100 GPU | NVIDIA A100 accelerator | Part 1, Part 10, Part 11 |
| AGI | Artificial General Intelligence | Artificial general intelligence | Part 1, Part 11 |
| AI | Artificial Intelligence | Artificial intelligence | Whole book |
| ANN | Approximate Nearest Neighbor | Approximate nearest-neighbor retrieval | Part 1, Part 7 |
| API | Application Programming Interface | Application programming interface | Part 1, Part 4, Part 10, Part 11 |
| ASR | Automatic Speech Recognition | Automatic speech recognition | Part 3 |
| BM25 | Best Matching 25 | Classic sparse retrieval ranking method | Part 1, Part 7 |
| CI/CD | Continuous Integration / Continuous Deployment | Continuous integration and continuous deployment | Part 1, Part 2, Part 8 |
| CPU | Central Processing Unit | Central processing unit | Part 1, Part 2 |
| CSV | Comma-Separated Values | Comma-separated text format | Part 1 |
| ETL | Extract, Transform, Load | Extract, transform, and load | Part 1, Part 8 |
| GPU | Graphics Processing Unit | Graphics processing unit | Part 1, Part 3, Part 10, Part 11 |
| GUID | Globally Unique Identifier | Globally unique identifier | Part 2 |
| H100 | NVIDIA H100 GPU | NVIDIA H100 accelerator | Part 1 |
| HDFS | Hadoop Distributed File System | Hadoop distributed file system | Part 1, Part 10 |
| JSON | JavaScript Object Notation | Structured data interchange format | Part 1, Part 3, Part 4, Part 10, Part 11 |
| JSONL | JSON Lines | Line-delimited JSON text format | Part 1, Part 3, Part 4, Part 10 |
| KPI | Key Performance Indicator | Key performance indicator | Part 4, Part 8 |
| LLM | Large Language Model | Large language model | Whole book |
| MLOps | Machine Learning Operations | Machine-learning engineering and operations system | Part 1, Part 8 |
| PDF | Portable Document Format | Portable document format | Part 1, Part 3, Part 7, Part 10, Part 11 |
| PII | Personally Identifiable Information | Personally identifiable information | Part 2, Part 9, Part 10 |
| ROI | Return on Investment | Return on investment | Part 1, Part 8 |
| SLA | Service Level Agreement | Service-level agreement | Part 1, Part 4, Part 8 |
| SOPs | Standard Operating Procedures | Standard operating procedures | Part 4, Part 8 |
| SQL | Structured Query Language | Structured query language | Part 1, Part 4, Part 11 |
| TPU | Tensor Processing Unit | Tensor processing unit | Part 1 |
| UTF-8 | Unicode Transformation Format, 8-bit | Variable-width Unicode character encoding built on 8-bit code units | Part 2 |
Data Engineering and Platforms
Table FM-2: Data Engineering and Platforms.
| Abbreviation | Full Name | Description | Main Locations |
|---|---|---|---|
| DataOps | Data Operations | Data operations and data-engineering operations system | Part 2, Part 8, Part 10 |
| DOM | Document Object Model | Document object model | Part 3, Part 11 |
| DVC | Data Version Control | Data version control tool or method | Part 1, Part 2, Part 8 |
| FAISS | Facebook AI Similarity Search | Vector similarity search library | Part 7 |
| FastText | FastText | Lightweight text representation and classification tool | Part 2 |
| LakeFS | LakeFS | Version management system for data lakes | Part 1, Part 8 |
| MATTR | Moving-Average Type-Token Ratio | Moving-average type-token ratio | Part 2 |
| MFU | Model FLOPs Utilization | Model FLOPs utilization | Part 2 |
| MinHash | Min-wise Independent Permutations Hashing | Approximate deduplication method based on min-wise hashing | Part 1, Part 2, Part 11 |
| OOM | Out of Memory | Out-of-memory error | Part 1, Part 2 |
| PPL | Perplexity | Perplexity metric | Part 1, Part 2, Part 4 |
| RDMA | Remote Direct Memory Access | Remote direct memory access | Part 1 |
| ReDoS | Regular Expression Denial of Service | Regular-expression denial-of-service risk | Part 2 |
| TTR | Type-Token Ratio | Type-token ratio, a diversity metric | Part 2 |
| WARC | Web ARChive | Web archive format | Part 2 |
| WebDataset | WebDataset | Data packaging format and tool for large-scale training | Part 2, Part 3 |
Training, Alignment, and Reasoning
Table FM-3: Training, Alignment, and Reasoning.
| Abbreviation | Full Name | Description | Main Locations |
|---|---|---|---|
| CoT | Chain-of-Thought | Chain-of-thought reasoning | Part 6, Part 10, Part 11 |
| DPO | Direct Preference Optimization | Direct preference optimization | Part 4, Part 11 |
| LoRA | Low-Rank Adaptation | Low-rank adaptation fine-tuning method | Part 11 |
| PPO | Proximal Policy Optimization | Proximal policy optimization | Part 4, Part 11 |
| PRM | Process Reward Model | Process reward model | Part 4, Part 6, Part 10, Part 11 |
| QA | Quality Assurance | Quality assurance | Part 4, Part 10 |
| RAG | Retrieval-Augmented Generation | Retrieval-augmented generation | Part 7, Part 10 |
| RL | Reinforcement Learning | Reinforcement learning | Part 4, Part 11 |
| RLAIF | Reinforcement Learning from AI Feedback | Reinforcement learning from AI feedback | Part 4 |
| RLHF | Reinforcement Learning from Human Feedback | Reinforcement learning from human feedback | Part 4, Part 11 |
| RM | Reward Model | Reward model | Part 4 |
| ROUGE-L | Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence | Text-similarity metric based on the longest common subsequence | Part 2, Part 4 |
| SFT | Supervised Fine-Tuning | Supervised fine-tuning | Part 4, Part 10, Part 11 |
Multimodality and Vision
Table FM-4: Multimodality and Vision.
| Abbreviation | Full Name | Description | Main Locations |
|---|---|---|---|
| BBox | Bounding Box | Bounding box | Part 3, Part 10, Part 11 |
| ChartQA | Chart Question Answering | Chart question-answering task or dataset | Part 3, Part 11 |
| CLIP | Contrastive Language-Image Pre-training | Contrastive image-text pre-training model | Part 3, Part 11 |
| CLIP-Score | CLIP Score | Image-text relevance score based on CLIP | Part 11 |
| COCO | Common Objects in Context | General object-detection and image-captioning dataset | Part 3, Part 10 |
| DINO | DEtection TRansformer with Improved deNoising anchOr boxes | Detection model family, often used in Grounding DINO contexts | Part 3, Part 11 |
| DocVQA | Document Visual Question Answering | Document visual question-answering task or dataset | Part 11 |
| Grounding | Visual Grounding | Visual grounding or alignment task | Part 3, Part 10, Part 11 |
| IoU | Intersection over Union | Object-detection overlap metric | Part 3 |
| LLaVA | Large Language and Vision Assistant | Multimodal large model and data format name | Part 10, Part 11 |
| OCR | Optical Character Recognition | Optical character recognition | Part 3, Part 7, Part 10, Part 11 |
| OCR-Rich | OCR-Rich Data | Image or document data rich in OCR information | Part 11 |
| SSIM | Structural Similarity Index Measure | Structural similarity metric | Part 11 |
| ViT | Vision Transformer | Vision Transformer encoder | Part 11 |
| VLM | Vision-Language Model | Vision-language model | Part 3, Part 11 |
| VQA | Visual Question Answering | Visual question answering | Part 3, Part 11 |
| XML | eXtensible Markup Language | Extensible markup language | Part 1, Part 3 |
| YOLO | You Only Look Once | Object-detection model family | Part 3 |
Evaluation, Compliance, and Governance
Table FM-5: Evaluation, Compliance, and Governance.
| Abbreviation | Full Name | Description | Main Locations |
|---|---|---|---|
| AGI-Eval | AGI Evaluation | Evaluation benchmark for general-intelligence capabilities | Part 11 |
| DPIA | Data Protection Impact Assessment | Data protection impact assessment | Part 9 |
| GSM8K | Grade School Math 8K | Grade-school math reasoning benchmark | Part 1, Part 11 |
| MCTS | Monte Carlo Tree Search | Monte Carlo tree search | Part 11 |
| MMLU | Massive Multitask Language Understanding | Massive multitask language-understanding benchmark | Part 1, Part 11 |
| MMMU | Massive Multi-discipline Multimodal Understanding and Reasoning | Multidiscipline multimodal understanding and reasoning benchmark | Part 11 |
| NSFW | Not Safe for Work | Content unsuitable for public or workplace contexts | Part 2 |
| P99 | 99th Percentile | 99th percentile metric | Part 2 |
| P99.9 | 99.9th Percentile | 99.9th percentile metric | Part 2 |
| RoPA | Record of Processing Activities | Record of processing activities | Part 9 |