Appendix B: Compliance and Release Checklist

B.1 Purpose of This Appendix

This appendix focuses on the checkpoints that determine whether a dataset can continue moving downstream. It is not a collection of abstract compliance slogans. It asks concrete engineering questions: can a batch of data enter annotation, enter training, be released externally, support a course experiment, or connect to an online system?

In large-model data engineering, the most dangerous situation is often not that nobody has heard of a relevant regulation. It is that the team gradually treats compliance as an approval form to be completed at the end of the project. In reality, compliance and release checks must run through the data lifecycle. Unclear sources, ambiguous authorization, unstable anonymization, evaluation contamination, missing resource statements, and overexposed teaching environments are all much more expensive to fix late than early.

This appendix does not provide legal, medical, financial, or investment advice, and it does not constitute regulatory approval, ethics review, or release permission. It is a checklist framework designed for engineering-team execution and traceability. Its goal is to let technical leads, project managers, course owners, and compliance contacts use the same vocabulary and reduce cross-role communication cost.

In scenarios involving law, medicine, finance, minors, cross-border data, sensitive personal information, or industry regulation, readers should rely on their institution's formal policies, the current laws of the relevant jurisdiction, data-provider contracts, ethics-review requirements, and professional compliance opinions. In the mainland China context, cybersecurity, data security, and personal-information protection should be understood in relation to the Cybersecurity Law of the People's Republic of China, the Data Security Law of the People's Republic of China, and the Personal Information Protection Law of the People's Republic of China (National People's Congress of the People's Republic of China 2016, 2021a, 2021b). The checklists in this appendix can only help teams identify issues that need escalated review in advance; they cannot replace the professional judgment of lawyers, physicians, financial compliance personnel, security leads, or ethics committees.

B.2 Why Compliance Checks Must Shift Left

If compliance is checked only before release, teams usually encounter three expensive forms of rework. First, source rework: the data has already been collected and cleaned before the team discovers that the original authorization does not allow model training or redistribution. Second, annotation rework: annotation is complete before the team realizes that sensitive fields were not properly anonymized. Third, release rework: a benchmark is ready to publish before the team discovers unstable train/test boundaries or conflicts between external licenses and leaderboard rules.

A more stable approach is to split compliance into the four gates listed below. This split also aligns with established risk-management frameworks: the NIST AI RMF organizes AI risk work into four functions (Govern, Map, Measure, and Manage), while the EU Artificial Intelligence Act assigns obligations and boundaries by risk level (National Institute of Standards and Technology 2023; European Parliament and Council of the European Union 2024).

Source and authorization checks before data ingestion.
Sensitivity and delegation-boundary checks before annotation and processing.
Data-use boundary checks before training and evaluation.
Public-exposure checks before external release or system launch.

Once these gates exist, many problems that would otherwise explode at the end of a project can be contained earlier.
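The gate ordering above can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the names `Gate` and `furthest_gate_cleared` are hypothetical, and the rule that a later gate does not count while an earlier one is still open is an assumption drawn from the lifecycle order described in this section.

```python
from enum import IntEnum
from typing import Optional


class Gate(IntEnum):
    """The four gates, in lifecycle order (names are illustrative)."""
    INGESTION = 1      # source and authorization
    ANNOTATION = 2     # sensitivity and delegation boundary
    TRAINING_EVAL = 3  # data-use boundary for training and evaluation
    RELEASE = 4        # public exposure


def furthest_gate_cleared(passed: set) -> Optional[Gate]:
    """Return the last gate cleared without skipping an earlier one."""
    cleared = None
    for gate in Gate:  # Enum members iterate in definition order
        if gate not in passed:
            break      # an earlier gate is open, so later passes do not count
        cleared = gate
    return cleared


# A batch that passed ingestion and training checks but skipped the
# annotation gate is still only cleared up to ingestion.
status = furthest_gate_cleared({Gate.INGESTION, Gate.TRAINING_EVAL})
```

Treating the gates as ordered makes a skipped check visible instead of letting a later approval mask it.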

B.3 Pre-Ingestion Checklist

B.3.1 Source and Authorization Are the First Gate

The first question before data ingestion is not whether the data is worth collecting, but whether it can be collected legally and contractually. At minimum, check the fields in Table B-1.

Check Item	Question to Answer	Common Risk	Recommended Action
Source owner	Who provides the data?	Broken source chain or unclear resale path	Keep source notes and contact information
Usage license	Does it allow training, evaluation, and redistribution?	Allowed for papers but restricted for commercial or public use	Maintain license allowlists and graylists
robots / ToS	Does crawling conflict with site rules?	Rule violation and takedown risk	Preserve rule snapshots and timestamps
Jurisdiction boundary	Does the data involve cross-border or regulated industry data?	Unclear transfer boundary	Confirm with legal and security contacts
Update cycle	Does the source change over time?	The same version cannot be reproduced later	Freeze crawl windows and snapshots

Source checks matter because every downstream engineering action amplifies source problems. The closer a project gets to model training and public release, the more expensive it becomes to change source strategy.
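One way to make the source fields machine-checkable is to store them as a record and run a small decision function at ingestion time. The sketch below is illustrative only: the field names, the contents of `LICENSE_ALLOWLIST` and `LICENSE_GRAYLIST`, and the three outcomes are all assumptions, and a real project would take its lists from its own legal review.

```python
from dataclasses import dataclass

# Hypothetical policy lists; a real project maintains its own.
LICENSE_ALLOWLIST = {"CC0-1.0", "CC-BY-4.0", "MIT"}
LICENSE_GRAYLIST = {"CC-BY-NC-4.0", "custom-research-only"}


@dataclass(frozen=True)
class SourceRecord:
    source_owner: str       # who provides the data
    license_id: str         # license identifier recorded at collection time
    tos_snapshot_time: str  # timestamp of the saved robots/ToS snapshot
    cross_border: bool      # involves cross-border or regulated data
    crawl_snapshot_id: str  # frozen crawl window or snapshot reference


def ingestion_decision(record: SourceRecord) -> str:
    """Return 'allow', 'escalate', or 'block' for one source."""
    if not record.source_owner or not record.crawl_snapshot_id:
        return "block"     # broken source chain or nothing to reproduce
    if not record.tos_snapshot_time:
        return "escalate"  # no rule snapshot was preserved
    if record.license_id in LICENSE_ALLOWLIST and not record.cross_border:
        return "allow"
    if record.license_id in LICENSE_ALLOWLIST | LICENSE_GRAYLIST:
        return "escalate"  # needs legal or security confirmation
    return "block"         # unknown license: do not ingest by default
```

The function only routes a source to a decision path. An "allow" result here is an engineering default, not a legal conclusion.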

B.3.2 Classify Personal and Sensitive Information Before Processing

If samples may contain names, contact information, ID numbers, medical records, location traces, information about minors, internal enterprise documents, or trade secrets, the team should not immediately debate which model to use for anonymization. It should first classify the information.

Level	Typical Content	Default Strategy
L0	Public, low-risk, explicitly authorized data	Enter the standard governance flow
L1	Ordinary personal information or business fields	Minimize collection and anonymize before transfer
L2	Sensitive personal information, medical, financial, educational, and similar data	Require special approval and isolated processing
L3	Highly sensitive, classified, or strongly contract-restricted data	Generally exclude from general training pipelines

The common mistake is treating anonymization as a universal pass. Often the issue is not whether a field has been masked, but whether the task itself depends on sensitive attributes. If it does, the team should first redesign the task boundary rather than assume one regex replacement can solve the problem.
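The four levels and their default strategies can be encoded directly, so that a sample's level is decided before any anonymization tool is chosen. In this sketch the tag names and the tag-to-level mapping in `TAG_LEVEL` are hypothetical, and treating unknown tags as L2 is a conservative assumption rather than a rule from this appendix.

```python
from enum import IntEnum


class Level(IntEnum):
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3


DEFAULT_STRATEGY = {
    Level.L0: "enter the standard governance flow",
    Level.L1: "minimize collection and anonymize before transfer",
    Level.L2: "require special approval and isolated processing",
    Level.L3: "generally exclude from general training pipelines",
}

# Hypothetical tag-to-level mapping; each institution defines its own.
TAG_LEVEL = {
    "name": Level.L1,
    "contact": Level.L1,
    "id_number": Level.L2,
    "medical_record": Level.L2,
    "location_trace": Level.L2,
    "minor": Level.L2,
    "trade_secret": Level.L3,
}


def classify(tags):
    """Return the highest level among the tags; unknown tags count as L2."""
    return max((TAG_LEVEL.get(tag, Level.L2) for tag in tags), default=Level.L0)


level = classify(["name", "medical_record"])
strategy = DEFAULT_STRATEGY[level]
```

Taking the maximum means one sensitive field is enough to move the whole sample into the stricter flow.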

B.4 Annotation, Delegation, and Third-Party Collaboration

B.4.1 Annotation Platforms Are Not Automatic Compliance Boundaries

Many teams upload samples to an external annotation platform and assume that the platform's permission system is sufficient. That is risky. At minimum, the annotation stage must confirm:

Whether the third party is explicitly authorized to access this category of data.
Whether annotation instructions disclose internal information beyond the task need.
Whether annotation results may flow back into other projects.
Whether logs, screenshots, previews, or caches create new exposure surfaces.

Check Item	Question to Answer	Recommended Action
Delegation boundary	Who can see raw samples?	Use role layering and least privilege
Data anonymization	Has required processing been completed before annotation?	Isolate anonymized and raw versions
Annotation instructions	Do the guidelines disclose internal information?	Review instructions before release
Logs and caches	Does the platform retain previews or downloads?	Define cleanup rules and retention periods
Disputed samples	Can samples flow into public discussion spaces?	Isolate them and do not send raw text externally

B.4.2 University Collaboration Requires Clear Output Boundaries

In university collaboration, a common risk is an unclear boundary between project work and publishable research. Samples allowed for internal project use may not be allowed in paper appendices. Screenshots allowed in class demonstrations may not be packaged as an external dataset.

Define the output boundary at the start:

Which results are for internal project use only.
Which statistics can appear in papers while raw samples cannot be distributed.
Which data can enter course images but cannot be downloaded.
Which content can become a benchmark and which must remain an internal test set.

Without these boundaries, teams often reach a painful state where the engineering work is complete but the contract boundary does not permit release.

B.5 Pre-Training and Pre-Evaluation Checklist

B.5.1 A Legal Training Set Does Not Make the Evaluation Set Safe

Training and evaluation are often treated as one governance problem, but their risks differ. Training-set checks focus on source, authorization, sensitivity, and task fit. Evaluation-set checks must additionally address contamination, isolation, and comparison fairness.

Check Item	Before Training	Before Evaluation
Source and license	Is training allowed?	Is public testing or leaderboard use allowed?
Data minimization	Are unnecessary fields retained?	Are standard answers or hints leaked?
Version freeze	Is the training version locked?	Are the test version and scripts locked?
Contamination check	Has it touched public test sets?	Has the test set been polluted by training corpora?
Resource statement	Can training resources be recorded?	Are submission resource conditions comparable?

For open benchmarks, a high-quality evaluation set is not enough on its own. External teams must also be able to trust that comparison conditions are stable and fair.
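The contamination check in the table can be approximated with a simple n-gram overlap heuristic. This is one common technique, not a method prescribed by this appendix: the whitespace tokenization, the window size `n=8`, and the `threshold` value are assumptions that need tuning per corpus, and the approach will miss paraphrased or translated leakage.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated_items(train_docs, test_items, n=8, threshold=0.5):
    """Return indices of test items whose n-grams largely appear in training data."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)

    flagged = []
    for idx, item in enumerate(test_items):
        grams = ngrams(item, n)
        if not grams:
            continue  # item is shorter than n tokens; this heuristic cannot judge it
        overlap = len(grams & train_grams) / len(grams)
        if overlap >= threshold:
            flagged.append(idx)
    return flagged


train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
test = [
    "the quick brown fox jumps over the lazy dog near the river",
    "a completely different question about matrix ranks and eigenvalues of symmetric operators",
]
flagged = contaminated_items(train, test)  # only the first item overlaps
```

Flagged items are candidates for human review rather than automatic removal, because boilerplate phrases can also produce overlap.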

B.5.2 LLM Judges and Automatic Anonymization Must Be Audited

As more workflows use large models for judging, summarization, classification, and anonymization, the auxiliary model itself must enter the compliance view. Check whether:

The auxiliary model sends input samples to an external service.
The service terms allow retention, training use, or human review of inputs.
Judge conclusions determine official labels or launch decisions.
Failed anonymization samples enter a human review channel.

From a governance perspective, calling an external API with sample content is a new data exposure event, even if the technical team thinks of it as an internal convenience.

B.6 External Release and Public Benchmark Checklist

B.6.1 Prepare Four Documents Before Release

A mature dataset or benchmark should have at least four document types before external release:

A data or benchmark card describing the task, sample structure, splits, and limits.
A license and usage note explaining what is allowed and prohibited.
A baseline bundle with a minimally reproducible baseline.
An update and dispute-handling mechanism describing versions, feedback, and withdrawal paths.

Publishing only samples and a paper link makes safe reuse difficult and dispute handling slow.

B.6.2 Leaderboard Governance Matters More Than "Whether It Is Public"

Many benchmarks fail not because they are insufficiently public, but because they lack governance after publication.

Check Item	Question to Answer
Submission method	What can be uploaded and what is prohibited?
Resource statement	Must model size, inference budget, and retrieval resources be reported?
Human review	What triggers manual review?
Removal rules	What happens after contamination, cheating, or unauthorized access is found?
Teaching isolation	Is the course version separate from the public leaderboard version?

Clear rules prevent a public leaderboard from becoming a confusing score wall.

B.6.3 Minimal Release Sign-Off Form

Every formal public release should create a minimal sign-off form.

Field	Required Content
asset_name	Dataset, benchmark, or course image name
release_version	External release version
owner	Data owner and evaluation owner
source_check	Whether source and license review is complete
privacy_check	Whether anonymization and sensitive-information checks are complete
contamination_check	Whether train/test contamination checks are complete
baseline_ready	Whether the baseline bundle is complete
rollback_path	Withdrawal path after a problem is found
public_notice	Location of the release announcement or documentation

The value of this form is that it records who approved the release and what evidence the decision relied on.
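The sign-off form maps naturally onto a small record type. The sketch below reuses the field names from the table; the types chosen for each field (strings for names and paths, booleans for completed checks) and the helper `blocking_fields` are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseSignOff:
    asset_name: str
    release_version: str
    owner: str
    source_check: bool
    privacy_check: bool
    contamination_check: bool
    baseline_ready: bool
    rollback_path: str
    public_notice: str


def blocking_fields(form: ReleaseSignOff) -> list:
    """Return the names of fields that are empty or not yet confirmed."""
    return [f.name for f in fields(form) if not getattr(form, f.name)]


form = ReleaseSignOff(
    asset_name="example-benchmark",  # hypothetical asset
    release_version="1.0.0",
    owner="data owner A; evaluation owner B",
    source_check=True,
    privacy_check=True,
    contamination_check=False,       # check still pending
    baseline_ready=True,
    rollback_path="",                # withdrawal path not yet written down
    public_notice="docs/release-notes",
)
missing = blocking_fields(form)
```

A release script could refuse to proceed while `missing` is non-empty, which keeps the form from becoming a formality filled in after the fact.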

B.7 Pre-Launch Checklist for Systems

B.7.1 "Runs" and "Can Launch" Are Separated by at Least Five Control Layers

A model or data workflow running in an experiment environment is not enough for production. At least five control layers are needed.

Control Layer	Key Question	Typical Checkpoints
Permissions	Who can access samples and results?	RBAC, least privilege, audit logs
Isolation	Are test and production environments separated?	Environment partitions, key management
Content	Can outputs violate rules or echo sensitive content?	Red-line samples, refusal policies
Rollback	Can the system be disabled quickly after an incident?	Version locking, canary switches
Records	Can the team review what happened?	Request logs, evaluation snapshots, incident ledger

In large-model applications, many incidents occur not because the model suddenly becomes worse, but because data, prompts, retrieval sources, or evaluation definitions change without being caught by launch controls.

B.7.2 Teaching Launch and Product Launch Have Different Thresholds

Course experiments can usually tolerate more explanatory output, more logging, and slower responses. Product environments usually cannot. Do not treat a course image as a production system, and do not expose a production audit environment directly to course practice.

Maintain separate versions for:

Teaching images.
Research reproduction experiments.
Online service releases.

They may reuse data assets, but they should not share the same exposure surface or permission boundary.

B.7.3 Special Checklist for Teaching Scenarios

University collaboration and course experiments are often misclassified as low risk because they involve fewer users and shorter cycles. In practice, teaching has its own risks: account sharing, raw samples pasted into reports, classroom recordings spreading, and mid-semester hot updates causing answer drift.

Check Item	Question to Answer	Recommended Action
Version freeze	Will the version stay stable during the semester?	Lock the image before the course starts
Permission scope	Can students download raw data?	Disable raw-layer downloads by default
Sample display	Can screenshots leak sensitive content?	Use teaching-safe anonymized samples
Assignment submission	Can submissions contain raw data?	Add submission rules and automatic scanning
TA operations	Who handles questions and incidents?	Establish duty and escalation paths

This checklist does not replace a product-launch checklist; it answers a different question: whether the system can be taught safely.
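The automatic scanning recommended for assignment submissions can start as a handful of pattern checks. The patterns below are illustrative assumptions: an email pattern, an 11-digit mainland China mobile-number shape, and an 18-character resident-ID shape. They will produce false positives and miss many other kinds of sensitive content, so hits should route a submission to review rather than reject it outright.

```python
import re

# Illustrative patterns only; a real deployment needs a reviewed pattern set.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "cn_id_like": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def scan_submission(text: str) -> dict:
    """Count pattern hits in a submission; return only patterns that matched."""
    hits = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    return {name: count for name, count in hits.items() if count}


report = scan_submission("Sample row: alice@example.com, phone 13812345678")
needs_review = bool(report)
```

The lookarounds keep the mobile-number pattern from matching inside longer digit runs such as an ID-like string.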

B.7.4 Additional Checks for Cross-Border Flows, External APIs, and Third-Party Models

As workflows rely more on external foundation models, cloud OCR, cloud storage, and SaaS annotation platforms, teams must identify whether samples leave the original control domain.

Check whether:

The call chain sends raw text, images, or audio to external services.
Those services retain logs, caches, or training rights.
Cross-border transfer, third-party subprocessors, or multilevel subcontracting are involved.
The team has fallback plans if an external service is interrupted or its terms change.

Many post-launch issues are not about model quality. They come from hidden data exposure points in the dependency chain.
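The checks above become easier to audit when every external dependency is listed in an inventory with the same few attributes. The record below is a sketch: the field names, the example service names, and the review rules in `review_reasons` are assumptions that restate the four questions in this subsection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalCall:
    service: str                  # hypothetical service label
    sends_raw_content: bool       # raw text, images, or audio leave the control domain
    retention_policy_known: bool  # log, cache, and training-rights terms are documented
    cross_border: bool            # cross-border transfer or subprocessors involved
    fallback_defined: bool        # plan exists for outages or changed terms


def review_reasons(call: ExternalCall) -> list:
    """Return the reasons this call chain needs escalated review, if any."""
    reasons = []
    if call.sends_raw_content and not call.retention_policy_known:
        reasons.append("raw content sent but retention policy unknown")
    if call.cross_border:
        reasons.append("cross-border transfer or subprocessors involved")
    if not call.fallback_defined:
        reasons.append("no fallback plan")
    return reasons


inventory = [
    ExternalCall("cloud-ocr", True, False, False, True),
    ExternalCall("internal-judge", False, True, False, True),
]
to_review = {c.service: review_reasons(c) for c in inventory if review_reasons(c)}
```

An empty reason list does not mean the service is approved. It only means none of these four questions raised a flag.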

B.8 Incident Response and Withdrawal Mechanisms

B.8.1 Public Release Without Withdrawal Is High-Risk Release

Once a dataset, leaderboard, or course resource is public, assume that one day:

Someone reports infringement or sensitive-information leakage.
Train/test contamination is discovered.
Baseline code contains a serious error.
The leaderboard receives abnormal or suspicious submissions.
A teaching image exposes an unauthorized access path.

Public assets need a withdrawal and repair process:

Intake: who receives reports.
Triage: whether immediate takedown is required.
Temporary action: hide downloads, freeze the leaderboard, or disable the image.
Investigation and review: confirm scope, cause, and remedies.
Public notice: publish revision notes and version changes.
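The five steps above can be modeled as a small state machine so that a report cannot jump to a public notice without passing through triage and investigation. The stage names follow the list; the transition table is an assumption, including the choice to let triage skip the temporary action when immediate takedown is not required.

```python
from enum import Enum


class Stage(Enum):
    INTAKE = "intake"
    TRIAGE = "triage"
    TEMPORARY_ACTION = "temporary_action"
    INVESTIGATION = "investigation"
    PUBLIC_NOTICE = "public_notice"


# Assumed transitions: triage may go straight to investigation
# when no immediate takedown is needed.
ALLOWED = {
    Stage.INTAKE: {Stage.TRIAGE},
    Stage.TRIAGE: {Stage.TEMPORARY_ACTION, Stage.INVESTIGATION},
    Stage.TEMPORARY_ACTION: {Stage.INVESTIGATION},
    Stage.INVESTIGATION: {Stage.PUBLIC_NOTICE},
    Stage.PUBLIC_NOTICE: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Move to the next stage, rejecting transitions that skip required steps."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {nxt.value}")
    return nxt


stage = Stage.INTAKE
for step in (Stage.TRIAGE, Stage.TEMPORARY_ACTION, Stage.INVESTIGATION, Stage.PUBLIC_NOTICE):
    stage = advance(stage, step)
```

Recording each accepted transition with a timestamp would also give the team the evidence trail it needs for the later review.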

B.8.2 Minimal Incident Ledger

Field	Description
incident_id	Incident identifier
reported_time	First report time
affected_asset	Dataset, leaderboard, or system involved
risk_level	Risk level
temporary_action	Temporary handling action
root_cause	Root cause
public_notice	Public notice link or note
preventive_action	Follow-up prevention measure

Institutionalizing this ledger prevents the same type of problem from recurring across semesters, projects, and owners.
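A ledger only prevents recurrence if someone looks for repeated causes in it. The sketch below keeps the field names from the table and adds a hypothetical helper, `recurring_root_causes`, that counts root causes across entries. The sample entries are invented for illustration, and the exact-string matching assumes that root causes are recorded with a controlled vocabulary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    reported_time: str
    affected_asset: str
    risk_level: str
    temporary_action: str
    root_cause: str
    public_notice: str
    preventive_action: str


def recurring_root_causes(ledger, min_count=2):
    """Return root causes that appear at least min_count times in the ledger."""
    counts = Counter(entry.root_cause for entry in ledger if entry.root_cause)
    return {cause: n for cause, n in counts.items() if n >= min_count}


# Invented example entries.
ledger = [
    Incident("INC-001", "2024-03-01", "course-image", "medium",
             "disabled image", "shared account", "", "per-student accounts"),
    Incident("INC-002", "2024-09-12", "leaderboard", "high",
             "froze leaderboard", "test set leaked into training corpus", "", "add overlap check"),
    Incident("INC-003", "2025-03-05", "course-image", "medium",
             "disabled image", "shared account", "", "enforce account policy"),
]
repeated = recurring_root_causes(ledger)
```

A cause that shows up twice suggests that the earlier preventive action was either not implemented or not sufficient.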

B.8.3 High-Risk Red Flags

Use these red flags on a project board:

Data sources can be traced only to personal cloud drives, chat files, or secondary repost links.
All annotators can see all raw samples by default.
Teaching experiments reuse production environments or production keys.
Train/test splits are changed across versions without notice.
A public leaderboard accepts uploads without resource statements or removal rules.
External APIs see sample text, but nobody can explain their log-retention policy.
After the project lead leaves, nobody can identify the withdrawal or repair entry point.

If two or more of these appear, the project should move into a special review instead of continuing as routine work.

B.9 Role Division

Compliance and launch checks fail easily when "everyone knows they matter" but nobody owns them. At minimum, define four roles.

Role	Main Responsibilities
Data owner	Source, authorization, versioning, takedown decisions
Evaluation owner	Test isolation, leaderboard definitions, baseline review
Security/compliance contact	Approval boundaries, sensitive-information handling, external report coordination
Teaching owner	Course images, semester versions, classroom-use boundaries

These do not have to be four separate people, but none of the responsibilities can be left without an owner.

B.9.1 Questions for a Quarterly Compliance Review

For long-running projects, run a lightweight compliance review each quarter. It should answer:

Did any new data source this quarter change its license boundary?
Did the project add any external service call chain, and is it recorded?
Were there any sample withdrawals, anonymization patches, or leaderboard disputes?
Are teaching, research, and public versions still clearly isolated?
Which layer should be improved next quarter: source governance, permission governance, or withdrawal flow?

This turns compliance from a last-minute blocker into part of stable project evolution.

B.9.2 Withdrawal Drills Should Happen Before Real Incidents

Many teams write withdrawal mechanisms but never test whether they can locate, freeze, announce, and rebuild affected versions within half a day. This is like writing a disaster-recovery plan and never running a drill.

Once per semester or quarter, run a small withdrawal drill:

Confirm that a sample can be traced from the release version back to its original source and every intermediate version.
Confirm that the affected scope can be identified across training sets, evaluation sets, teaching images, and public downloads.
Confirm whether leaderboards, baselines, and course experiments need freezing or notices.
Confirm that external report handlers and internal owners can communicate without gaps.
Confirm that revised versions can be regenerated quickly while preserving historical notes.

The point is not only emergency response. It reveals whether a dataset truly has a governable asset structure.
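The first two drill steps amount to walking a version lineage in both directions. The sketch below assumes that lineage is stored as a mapping from each version to its parent versions; the version names and both helper functions are hypothetical.

```python
from collections import deque


def trace_to_sources(parents: dict, version: str) -> list:
    """Walk parent links from a version back to versions that have no parent."""
    seen, queue, sources = {version}, deque([version]), []
    while queue:
        node = queue.popleft()
        ups = parents.get(node, [])
        if not ups:
            sources.append(node)  # nothing upstream: treat as an original source
        for up in ups:
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return sorted(sources)


def affected_downstream(parents: dict, bad_version: str) -> set:
    """Return every version derived, directly or indirectly, from bad_version."""
    children = {}
    for child, ups in parents.items():
        for up in ups:
            children.setdefault(up, []).append(child)
    affected, queue = set(), deque([bad_version])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Hypothetical lineage: child version -> parent versions.
lineage = {
    "release-1.0": ["clean-v3"],
    "course-image-s1": ["clean-v3"],
    "clean-v3": ["crawl-2024a", "vendor-batch-7"],
}
sources = trace_to_sources(lineage, "release-1.0")
scope = affected_downstream(lineage, "vendor-batch-7")
```

If either walk cannot be completed from the records the team actually keeps, the drill has already found its most important gap.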

B.10 Summary

This appendix translates compliance and release concerns into a checklist that engineering teams can execute.

First, compliance is not a final add-on. It is a continuous set of gates in the data lifecycle.

Second, training, evaluation, public release, course reproduction, and product launch have different boundaries and must be checked separately.

Third, long-term risk is reduced not by one approval form but by source records, version freezes, incident withdrawal, and clear role ownership.

References

National People's Congress of the People's Republic of China (2016) Cybersecurity Law of the People's Republic of China. https://www.gov.cn/xinwen/2016-11/07/content_5129723.htm.

National People's Congress of the People's Republic of China (2021a) Data Security Law of the People's Republic of China. https://www.gov.cn/xinwen/2021-06/11/content_5616919.htm.

National People's Congress of the People's Republic of China (2021b) Personal Information Protection Law of the People's Republic of China. https://www.gov.cn/xinwen/2021-08/20/content_5632486.htm.

National Institute of Standards and Technology (2023) AI Risk Management Framework (AI RMF 1.0). https://www.nist.gov/itl/ai-risk-management-framework.

European Parliament and Council of the European Union (2024) Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). https://eur-lex.europa.eu/eli/reg/2024/1689/oj.

Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji I D, Gebru T (2019) Model Cards for Model Reporting. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp 220-229. https://doi.org/10.1145/3287560.3287596.