If you earned your ISTQB Certified Tester AI Testing (CT-AI) certificate under the v1.0 syllabus, or if you are now preparing for the exam for the first time, you need to understand exactly how v2.0 changes the picture. This is not a minor revision. ISTQB formally released v2.0 on April 17, 2026, and the changes are substantial enough that existing v1.0 study material is no longer safe to rely on.
This post covers every significant change in the new CT-AI 2.0 syllabus, explains what was added, what was removed, and what was restructured, and tells you what it means for your exam preparation and professional practice.
Who Is the ISTQB CT-AI Certification For?
Before diving into the changes, a quick orientation. The CT-AI certification sits in ISTQB’s Specialist stream. It is aimed at people involved in testing AI-based systems, including testers, test analysts, test engineers, data analysts, and test managers. It is also relevant for software developers, project managers, and business analysts who want a structured understanding of how to test systems built on machine learning and generative AI.
The main entry requirement is holding the ISTQB Certified Tester Foundation Level (CTFL) certificate. ISTQB strongly recommends at least six months of practical experience in software testing, data science, or development before attempting the CT-AI exam.
The Headline Changes at a Glance
The v2.0 release notes summarize the key structural shifts as follows:
- Introduction to AI has been shortened
- Generative AI and the testing of Generative AI are now included
- AI quality characteristics have been consolidated
- Focus on ML performance metrics has been reduced
- Test levels have been refined around two ML-specific levels: input data testing and ML model testing
- Testing with AI (using AI as a test tool) has been fully removed from this syllabus
- Test environments for AI-based systems are no longer covered
- Minimum required training time drops from four days to three days (19.5 hours)
That last point is significant for training providers and candidates alike. The syllabus is leaner, but what it covers, it covers in more depth and with more practical relevance to the state of the field in 2026.
Chapter by Chapter: What Actually Changed
Chapter 1 — Introduction to Artificial Intelligence (120 minutes)
The scope here has been tightened. V1.0 spent considerable time on foundational AI theory. V2.0 still introduces the distinction between AI-based and conventional systems, and still distinguishes between narrow AI, general AI, and super AI, but the treatment is more efficient.
What is new in v2.0:
Frontier AI is now explicitly defined as a subset of narrow AI representing the most advanced current systems, including large-scale GenAI models with highly autonomous decision-making. This was absent from v1.0.
Agentic AI is introduced as AI that uses autonomous agents to plan, reason, and act independently in dynamic environments. This reflects the growing deployment of agentic frameworks in real systems and is directly relevant to testing challenges.
Generative AI (section 1.1.4) now has its own dedicated section covering the main technologies (GANs, diffusion models, transformers), societal concerns (deepfakes, employment impact, sustainability), and the regulatory landscape including the EU AI Act. This was not covered in v1.0.
The hardware section (1.1.5) now explicitly covers quantization (low-precision arithmetic) as a hardware characteristic relevant to ML inference, and neuromorphic processors are introduced as an emerging AI-specific hardware architecture.
The regulations section (1.1.8) has been updated to reference the EU AI Act directly, as well as ISO/IEC TR 29119-11 and the new ISO/IEC 42119 series for AI testing. Candidates are now expected to understand how risk-based regulatory frameworks such as the EU AI Act classify AI systems and what high-risk classification means for testing obligations.
What the learning objectives cover (all K2):
All eight learning objectives in Chapter 1 are at the K2 (Understand) level. You need to be able to differentiate, explain, compare, and summarize. You will not be asked to calculate or design, but you will be expected to recognize the practical implications of each concept.
Chapter 2 — Quality Characteristics for AI-Based Systems (45 minutes)
This chapter remains relatively stable in structure but has been sharpened.
The coverage of ISO/IEC 25059 quality characteristics is now clearly framed around seven AI-specific attributes: AI functional correctness, functional adaptability, user controllability, transparency, AI robustness, intervenability, and societal and ethical risk mitigation.
What is important for exam candidates:
Each quality characteristic is linked to a parent category in the ISO/IEC 25010 quality model. For example, AI robustness is a sub-characteristic of reliability, and intervenability is a sub-characteristic of security. The exam tests your ability to classify a described system behavior against the correct quality characteristic, so you need to know both the definitions and the parent categories.
The acceptance criteria table (section 2.2.1) is expanded in v2.0 with more concrete, measurable examples for each characteristic. These examples are directly examinable and worth studying carefully. For instance, the acceptance criterion for intervenability specifies that a production line must be capable of shutting down within 0.5 seconds after a shutdown is initiated. This is the type of threshold-based, non-binary criterion that distinguishes AI testing from conventional functional testing.
The safety section (2.1.2) enumerates five specific additional challenges when AI is used in safety-related systems: vague specifications, non-determinism, self-learning, limited explainability, and evolving regulations. Each challenge is explained with practical examples. Candidates should be able to explain each one and recognize why it complicates safety demonstration.
Chapter 3 — Machine Learning (375 minutes)
This is the largest chapter and the most technically demanding. V2.0 adds significant new content here.
Fine-tuning and Retrieval-Augmented Generation (section 3.1.4)
This section is entirely new. V1.0 touched on pretrained models but did not cover fine-tuning or RAG in any detail. V2.0 explains both approaches clearly:
Fine-tuning adapts a pretrained neural network for a new task by performing additional training, either on the full network or on specific layers. Success depends on the similarity between the original and target tasks. Importantly, biases and vulnerabilities from the pretrained model carry forward to the fine-tuned model, which directly creates a testing obligation.
RAG supplements a large language model with relevant external documents at inference time without modifying the model itself. It improves response precision by enriching the prompt with retrieved information. The exam expects you to understand the difference between fine-tuning (changes the model) and RAG (does not change the model) and to recognize the testing implications of each.
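The point that RAG changes the prompt rather than the model can be made concrete with a toy sketch. Everything here is an illustrative assumption: the `DOCUMENTS` list, the word-overlap retriever, and the prompt layout are hypothetical stand-ins, and no real LLM or vector store is involved.

```python
# Hypothetical document store; the contents are illustrative only.
DOCUMENTS = [
    "Canary testing releases a model to a small share of production traffic.",
    "Rollback testing verifies reverting to a previous stable state.",
    "Shadow testing runs a new model in parallel with the production model.",
]

def retrieve(query, documents, top_k=1):
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(query, documents, top_k=1):
    """Enrich the prompt with retrieved text; the model itself is never touched."""
    context = "\n".join(retrieve(query, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What does shadow testing do?", DOCUMENTS)
print(prompt)
```

Because the model stays fixed, the retriever and the document store become test objects in their own right: a wrong or stale retrieved document can degrade the answer even though no weights changed.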
Training, Validation, and Test Datasets (section 3.2.3)
V2.0 adds a substantive treatment of k-fold cross-validation as a strategy for limited data scenarios. The mechanics, purpose, and tradeoffs of k-fold are now examinable at K2. You should understand why a three-way data split can be problematic with limited data, what k-fold achieves, how stratified sampling is used within k-fold, and what the final holdout test set is for.
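A minimal sketch of stratified k-fold splitting may help fix the idea. It assumes the final holdout test set has already been set aside, uses a made-up imbalanced label list, and deals indices round-robin per class without shuffling; the function name `stratified_k_fold` is hypothetical, not a library API.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Return k folds of row indices, each with roughly the same class mix."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin so every fold gets its share.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

labels = ["spam"] * 3 + ["ham"] * 9   # illustrative, imbalanced dataset
folds = stratified_k_fold(labels, k=3)

# Each fold serves once as the validation set; the rest is training data.
for held_out, validation in enumerate(folds):
    training = [i for n, fold in enumerate(folds) if n != held_out for i in fold]
    spam_share = sum(labels[i] == "spam" for i in validation) / len(validation)
    print(f"fold {held_out}: train={len(training)} validate={len(validation)} "
          f"spam share={spam_share:.2f}")
```

Every row is used for validation exactly once, which is what makes k-fold attractive when there is too little data for a fixed three-way split.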
ML Functional Performance Metrics (section 3.3)
The four core metrics (accuracy, precision, recall, F1-score) and their derivation from the confusion matrix remain central, but the K-level is notable: AI-3.3.1 is K3 (Apply). Candidates must be able to calculate these metrics from given confusion matrix data. This is one of the few calculation-based objectives in the syllabus.
The formulas are:
- Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
- Precision = TP / (TP + FP) × 100%
- Recall = TP / (TP + FN) × 100%
- F1-score = 2 × (Precision × Recall) / (Precision + Recall)
Work through these until they are automatic. Past exam questions in similar ISTQB specializations have presented a confusion matrix and asked for one or more of these values.
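To check your manual calculations, the four formulas above can be sketched in a few lines. The confusion matrix counts are invented for illustration, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1-score as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts: 40 true positives, 45 true negatives,
# 10 false positives, 5 false negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(f"accuracy={accuracy:.2f}% precision={precision:.2f}% "
      f"recall={recall:.2f}% F1={f1:.2f}%")
# accuracy=85.00% precision=80.00% recall=88.89% F1=84.21%
```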
Neural Network Coverage (section 3.4.3)
Three structural coverage measures for neural networks are introduced: neuron coverage, k-multisection neuron coverage (kMNC), and neuron boundary coverage (NBC). These are distinct from code coverage in conventional testing. Candidates should know what each measure captures and understand the fundamental limitation: structural coverage alone cannot guarantee that a neural network will generalize well or handle real-world variation.
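Of the three measures, basic neuron coverage is the simplest to illustrate. The sketch below assumes the commonly used definition (the share of neurons whose activation exceeds a threshold for at least one test input); the activation values and the 0.5 threshold are made up, and kMNC and NBC are not shown.

```python
def neuron_coverage(activations_per_input, threshold):
    """Share of neurons activated above the threshold by at least one test input."""
    neuron_count = len(activations_per_input[0])
    activated = set()
    for activations in activations_per_input:
        for neuron, value in enumerate(activations):
            if value > threshold:
                activated.add(neuron)
    return len(activated) / neuron_count

# Illustrative activations: one row per test input, one column per neuron.
activations = [
    [0.9, 0.0, 0.2, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.0],
]
coverage = neuron_coverage(activations, threshold=0.5)
print(f"neuron coverage: {coverage:.0%}")  # neurons 0 and 2 fire: 50%
```

Note that reaching 100% here would say nothing about whether the outputs were correct, which is the limitation the syllabus stresses.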
Chapter 4 — Testing AI-Based Systems (195 minutes)
This chapter has the most visible new content in v2.0.
Locked vs Adaptive AI Systems (section 4.1.1)
This distinction is now more nuanced. V2.0 explicitly places most GenAI systems (such as LLM-based chatbots) between fully locked and fully adaptive: they are deployed as locked models at runtime but updated periodically. This has direct implications for test strategy and is worth understanding clearly.
Statistical Approach to Testing (section 4.1.2)
V2.0 dedicates explicit content to explaining why a statistical approach is necessary for AI testing. Four reasons are enumerated: non-determinism, distributional performance evaluation, handling uncertainty and bias, and regulatory/safety context. Candidates at K2 need to be able to explain each reason, not just list it.
Testing Generative AI and LLMs (section 4.2)
This is an entirely new section with no equivalent in v1.0.
Section 4.2.1 covers the black-box approach to testing GenAI, the input explosion problem (the vast combinatorial space of prompts, parameters, system prompts, and context windows), qualitative evaluation criteria, use of a second GenAI system as a test oracle, non-functional testing of resource utilization, and standardized benchmark suites.
Section 4.2.2 covers Red Teaming (RT) in detail. RT is described as a systematic, often black-box form of fault attack that probes an AI-based system to identify harmful capabilities. The learning objective AI-4.2.2 is K3 (Apply), meaning candidates must be able to implement a red teaming approach, not just describe it.
Key points on red teaming to internalize:
- RT is most effective before deployment, after initial internal quality assessments are complete
- The core activity is interactive prompting, typically multi-turn dialogues of 15-20 turns
- The five steps: assemble a diverse team, provide access in a safe environment, prompt to find vulnerabilities, analyze failures, create datasets from threats
- RT complements Blue Teaming (ongoing defensive monitoring of a deployed system)
- RT is increasingly a regulatory expectation under frameworks like the EU AI Act
Note that v2.0 explicitly references the CT-GenAI syllabus for deeper coverage of using GenAI as a testing tool. That content has been split out entirely from CT-AI. If you want certified coverage of using AI for test generation and test execution, you need the separate CT-GenAI certification.
Risk-Based Testing of ML Systems (section 4.3.2)
The three risk categories (development, input data, model) are now explicitly structured around the ML workflow. This gives candidates a practical framework for organizing a test strategy discussion in exam scenarios.
Chapter 5 — Input Data Testing for Machine Learning Systems (180 minutes)
This chapter is largely intact from v1.0 but with refinements.
Testing for Bias (section 5.1.2)
The bias testing section now includes a formal four-step procedure for disparate impact analysis: identify sensitive attributes, generate counterfactuals, generate model output on counterfactuals, analyze results statistically to determine if changing a sensitive attribute changes outcomes. The cautionary note about unrealistic counterfactuals is worth remembering for the exam.
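The four steps can be sketched as follows. The deliberately biased `toy_model`, the applicant records, and the flip-rate summary are all hypothetical; a real analysis would replace the simple rate with a proper statistical test and would screen out unrealistic counterfactuals first.

```python
def counterfactual_flip_rate(model, records, attribute, values):
    """Share of records whose outcome changes when only the sensitive attribute changes."""
    changed = 0
    for record in records:
        original = model(record)
        for value in values:
            if value == record[attribute]:
                continue
            counterfactual = {**record, attribute: value}  # step 2: generate
            if model(counterfactual) != original:          # step 3: run the model
                changed += 1
                break
    return changed / len(records)                          # step 4: summarize

def toy_model(applicant):
    """Stand-in for the model under test; intentionally biased for the demo."""
    threshold = 50 if applicant["gender"] == "m" else 60
    return applicant["score"] >= threshold

applicants = [
    {"score": 55, "gender": "m"},
    {"score": 70, "gender": "f"},
    {"score": 52, "gender": "f"},
    {"score": 40, "gender": "m"},
]
# Step 1: the sensitive attribute is identified as "gender".
rate = counterfactual_flip_rate(toy_model, applicants, "gender", ["m", "f"])
print(f"outcome changed for {rate:.0%} of applicants")
```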
Data Pipeline Testing (section 5.1.3)
The layered approach to data pipeline testing is now more clearly structured around conventional test levels (component, component integration, system, system integration) applied to the pipeline specifically. The distinction between training pipeline testing (focuses on data integrity) and operational pipeline testing (prioritizes reliability, performance, and maintainability) is explicitly drawn.
Dataset Constraint Testing (section 5.1.5)
This section is at K3 in v2.0, meaning candidates must be able to apply dataset constraint testing, not merely describe it. The taxonomy of constraints is explicit: single-value constraints (missing, range, type), multi-value constraints (sum, count, duplicate, useful, outlier), and comparison constraints (greater than, correlate). Know the difference between single-value and multi-value constraints and be able to identify which type of constraint applies to a given scenario.
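A small sketch shows how one constraint from each family might be checked. The column names, the age range, and the reading of "multi-value" as a constraint across rows are assumptions made for illustration, not definitions taken from the syllabus.

```python
def check_constraints(rows):
    """Return (row number, violated constraint) pairs for a list of dict rows."""
    violations = []
    seen_ids = set()
    for number, row in enumerate(rows):
        # Single-value constraints on one field of one row: missing, type, range.
        if row["age"] is None:
            violations.append((number, "missing: age"))
        elif not isinstance(row["age"], int):
            violations.append((number, "type: age"))
        elif not 0 <= row["age"] <= 120:
            violations.append((number, "range: age"))
        # Comparison constraint between two fields: end must be greater than start.
        if row["end_day"] <= row["start_day"]:
            violations.append((number, "greater than: end_day"))
        # Multi-value constraint across rows: ids must not repeat.
        if row["id"] in seen_ids:
            violations.append((number, "duplicate: id"))
        seen_ids.add(row["id"])
    return violations

rows = [  # illustrative dataset
    {"id": 1, "age": 34, "start_day": 1, "end_day": 5},
    {"id": 2, "age": None, "start_day": 3, "end_day": 9},
    {"id": 2, "age": 150, "start_day": 4, "end_day": 2},
    {"id": 4, "age": "41", "start_day": 2, "end_day": 6},
]
violations = check_constraints(rows)
for number, problem in violations:
    print(f"row {number}: {problem}")
```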
Label Correctness Testing (section 5.1.6)
Seven approaches are covered: expert review, multiple annotation, risk-based prioritization, data distribution analysis, automated rule-based tests, model loss analysis, and model confidence score analysis. Inter-annotator agreement (IAA) measured by Cohen’s Kappa is referenced as the quantitative measure for multiple annotation quality.
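For intuition, Cohen's Kappa for two annotators can be sketched with the standard formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement). The label lists are invented, and the sketch assumes chance agreement is below 1.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Illustrative annotations of the same ten images by two annotators.
annotator_a = ["cat"] * 5 + ["dog"] * 5
annotator_b = ["cat"] * 4 + ["dog"] * 5 + ["cat"]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")  # 80% raw agreement, 50% expected by chance: 0.60
```

Raw agreement of 80% looks high, but half of it would be expected by chance with balanced labels, which is why Kappa is preferred as the reported measure.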
Chapter 6 — Model Testing for Machine Learning Systems (225 minutes)
This is the second-largest chapter, with several important v2.0 additions.
ML Model Documentation and Review (section 6.1.2)
V2.0 includes a detailed checklist of what comprehensive ML model documentation should contain: general identifiers, design decisions, usage intent (including bias, safety, transparency, data drift, concept drift), dataset details, testing results, functional and non-functional performance data, and operational/deployment information. Documentation review against this checklist is a core test activity in v2.0 and a direct response to the EU AI Act’s transparency obligations.
ML Functional Performance Testing of Probabilistic Systems (section 6.1.3)
This section has been significantly expanded and is more quantitatively rigorous than v1.0. The statistical framing is clear: instead of a pass/fail threshold, a requirement specifies a performance metric, a margin of error (MoE), and a confidence level (CL).
The worked example in the syllabus is instructive: to confirm 98% accuracy with a MoE of ±4% at 95% CL, you need at least 601 test cases, of which at least 589 must pass. Sequential testing allows early stopping if the observed accuracy stays consistently high.
Important: the syllabus explicitly notes that candidates will not be required to derive or compute sample sizes, MoEs, or confidence intervals using statistical formulas in the exam. You need to understand the concepts and interpret results, not perform the calculations.
The final test report format is also specified: “The model achieved an accuracy of X% ±Y% at Z% CL.” Recognizing this format and understanding what it communicates to stakeholders is examinable.
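Although the calculation itself is not examinable, it can help to see where numbers like 601 and 589 might come from. The sketch below uses the conservative (worst-case, p = 0.5) sample size formula for a proportion; that the syllabus derives its figures this way is an assumption, though the formula does reproduce both values.

```python
import math
from statistics import NormalDist

def worst_case_sample_size(margin_of_error, confidence_level):
    """Conservative sample size for estimating a proportion (assumes p = 0.5)."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # about 1.96 for 95% CL
    return math.ceil(z**2 * 0.25 / margin_of_error**2)

n = worst_case_sample_size(margin_of_error=0.04, confidence_level=0.95)
min_passes = math.ceil(0.98 * n)  # passes needed to observe at least 98% accuracy
print(f"test cases: {n}, minimum passes: {min_passes}")
# test cases: 601, minimum passes: 589
```

Tightening the margin of error is expensive: because it appears squared in the denominator, halving the MoE roughly quadruples the number of test cases.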
Metamorphic Testing (section 6.1.5)
At K3 (Apply), this is one of the most practically demanding topics in the syllabus. Metamorphic testing (MT) derives follow-up test cases from a source test case by applying a metamorphic relation (MR). The MR captures a known property of the function, such as monotonicity (increasing cigarettes smoked should decrease predicted age at death) or invariance (translating an image should not change its classification).
MT is particularly valuable for AI testing because it addresses the test oracle problem: when you cannot determine the absolute correct output, you can still verify that outputs maintain correct relationships under systematic input changes. Applications mentioned include image recognition, search engines, route optimization, and voice recognition.
For the exam, practice identifying appropriate MRs for given scenarios and generating follow-up test cases from source test cases. Know when MT is preferred: when no reliable expected outputs exist, when the system is a black box, or when relational properties are sufficient for confidence.
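The monotonicity example above can be turned into a small metamorphic test. Both models here are hypothetical stand-ins with invented formulas; the point is only that follow-up cases are derived from a source case and checked against the relation, with no expected output needed.

```python
def check_monotonic_decrease(model, source_input, increments):
    """MR: smoking more cigarettes per day must lower the predicted age at death.

    Returns the follow-up inputs that violate the relation.
    """
    source_output = model(source_input)
    follow_ups = [source_input + increment for increment in increments]
    return [x for x in follow_ups if model(x) >= source_output]

def sound_model(cigarettes_per_day):
    """Toy model that respects the relation (hypothetical formula)."""
    return 85 - 0.4 * cigarettes_per_day

def faulty_model(cigarettes_per_day):
    """Toy model with a seeded defect for heavy smokers."""
    return 85 - 0.4 * cigarettes_per_day if cigarettes_per_day < 20 else 90

# Source test case: 10 cigarettes a day; follow-ups add 5, 10 and 20.
print(check_monotonic_decrease(sound_model, 10, [5, 10, 20]))   # []
print(check_monotonic_decrease(faulty_model, 10, [5, 10, 20]))  # [20, 30]
```

The defect is found without anyone knowing the "right" age at death for any input, which is exactly how MT sidesteps the test oracle problem.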
Drift Testing (section 6.1.7)
The distinction between data drift and concept drift is now more precisely drawn. Data drift is a change in the statistical distribution of input data over time. Concept drift is a change in the relationship between input data and the correct output, meaning the model’s learned decision boundaries become outdated even if the input distribution stays similar.
Dynamic drift testing requires feedback (ground truth) from users. Static drift testing uses statistical tests such as Kolmogorov-Smirnov to compare distributions without requiring ground truth.
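The statistic behind the two-sample Kolmogorov-Smirnov test is the largest gap between the two empirical cumulative distribution functions, which a few lines of standard-library Python can show. The sample values are invented, and a real drift test would go on to convert the statistic into a p-value (for example with `scipy.stats.ks_2samp`) rather than read it directly.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    reference, current = sorted(reference), sorted(current)

    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return bisect_right(sample, x) / len(sample)

    points = reference + current
    return max(abs(ecdf(reference, x) - ecdf(current, x)) for x in points)

# Illustrative feature values: training-time data versus recent production data.
training_values = [1, 2, 3, 4, 5]
production_values = [4, 5, 6, 7, 8]
statistic = ks_statistic(training_values, production_values)
print(f"KS statistic: {statistic:.2f}")  # 0.60, a large shift in the inputs
```

No labels are involved at any point, which is what makes this a static check: it can flag data drift, but it cannot by itself reveal concept drift.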
A/B Testing vs Back-to-Back Testing (sections 6.1.9, 6.1.10)
V2.0 clarifies the distinction between these two approaches more explicitly than v1.0. A/B testing compares two variants of the same system using ML functional performance metrics and statistical techniques to determine which is better. Back-to-back testing uses a pseudo-oracle (a separate system or reference implementation) to detect defects by comparing outputs for the same inputs. A/B testing is about improvement measurement; back-to-back testing is about defect detection.
Chapter 7 — Machine Learning Development Testing (30 minutes)
This short chapter has been reorganized and focuses specifically on risks from ML development tools, configuration choices, and deployment mechanisms.
ML System Deployment Testing (section 7.1.2)
V2.0 introduces several deployment-specific test types that were not named in v1.0:
Canary testing releases an updated model to a small subset of production traffic (such as 5% of users) and monitors real-time metrics before full rollout.
Shadow testing runs a new model in parallel with the current production model, routing the same requests to both without affecting live responses. This allows defects to be detected using live data before committing to deployment.
Model conversion testing verifies that a model retains acceptable accuracy and efficiency after being converted from its training format to a deployment format for a target environment.
Rollback testing verifies the ability to revert to a previous stable state after a failed deployment. Critically, rollback testing must be performed before deployment to confirm readiness.
Cross-device testing verifies correct behavior across the full range of deployment targets, from mobile and edge devices to cloud servers.
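Of these test types, shadow testing is perhaps the easiest to picture in code. In this sketch both models are hypothetical threshold functions and the requests are plain numbers; the essential property is that only the production response is ever returned, while disagreements are logged for later analysis.

```python
def shadow_test(requests, production_model, candidate_model):
    """Serve every request from production while silently comparing the candidate."""
    live_responses, disagreements = [], []
    for request in requests:
        live = production_model(request)
        shadow = candidate_model(request)  # computed but never shown to the user
        live_responses.append(live)
        if shadow != live:
            disagreements.append((request, live, shadow))
    return live_responses, disagreements

def production_model(value):
    """Stand-in for the currently deployed model."""
    return value >= 10

def candidate_model(value):
    """Stand-in for the updated model under evaluation."""
    return value >= 12

live_responses, disagreements = shadow_test([5, 10, 11, 15],
                                            production_model, candidate_model)
print(live_responses)   # [False, True, True, True]
print(disagreements)    # [(10, True, False), (11, True, False)]
```

A canary test would differ in one key respect: a small share of users would actually receive the candidate's responses, so its risk profile is higher.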
What Was Completely Removed from v2.0
The most significant removal is the chapter on Testing with AI. In v1.0, CT-AI covered how AI tools can be used to support test activities such as test case generation, test data generation, and test oracle automation. That content now belongs entirely to the separate ISTQB CT-GenAI (Testing with Generative AI) certification. If you want credit for this knowledge area, you need to pursue CT-GenAI separately.
The topic of test environments for AI-based systems is also no longer covered. V1.0 included content on what test environments for ML systems should look like. This has been removed from scope.
The Structural Impact: From 4 Days to 3 Days
V1.0 required a minimum of four days (approximately 26 hours) of instruction. V2.0 reduces this to 19.5 hours, distributed across seven chapters. The time allocation per chapter is:
| Chapter | Topic | Time |
|---|---|---|
| 1 | Introduction to AI | 120 min |
| 2 | Quality Characteristics | 45 min |
| 3 | Machine Learning | 375 min |
| 4 | Testing AI-Based Systems | 195 min |
| 5 | Input Data Testing | 180 min |
| 6 | Model Testing | 225 min |
| 7 | ML Development Testing | 30 min |
| Total | | 1,170 min (19.5 hours) |
Chapter 3 (Machine Learning, 375 minutes) and Chapter 6 (Model Testing, 225 minutes) together account for more than half the total instruction time. This is a reliable signal of exam weight.
Exam Structure and Preparation Strategy
The CT-AI v2.0 exam is multiple choice, based on the learning objectives listed at the beginning of each chapter. Hands-on objectives (the practical lab exercises) are not directly examinable in the written exam, but the concepts they exercise are.
The cognitive level distribution matters for preparation:
K1 (Remember) — No K1 objectives in this syllabus. However, all keywords listed at the beginning of each chapter must be recalled at K1 implicitly.
K2 (Understand) — The majority of learning objectives are at this level. You must be able to classify, compare, distinguish, explain, and give examples. Understanding why, not just what.
K3 (Apply) — Four objectives require application: AI-3.3.1 (calculate ML performance metrics), AI-4.2.2 (implement red teaming), AI-5.1.5 (apply dataset constraint testing), and AI-6.1.5 (use metamorphic testing to derive test cases). These are the most likely sources of scenario-based questions that present a situation and ask you to select the correct action.
Preparation priorities based on exam weight and K-level:
- Master the confusion matrix and all four derived metrics (K3, will be calculated)
- Know the red teaming process end to end (K3, scenario-based)
- Be able to generate metamorphic relations and follow-up test cases for described scenarios (K3)
- Apply dataset constraint testing to a given dataset description (K3)
- Distinguish all ISO/IEC 25059 quality characteristics and classify behavior examples (K2)
- Explain statistical ML functional performance testing including MoE, CL, and sequential testing concepts (K2)
- Distinguish data drift from concept drift and explain static vs dynamic drift testing (K2)
- Explain fine-tuning vs RAG and their respective testing implications (K2)
- Distinguish A/B testing from back-to-back testing by purpose (K2)
- Know the deployment testing types in section 7.1.2 (canary, shadow, model conversion, rollback, and cross-device testing) by name, definition, and purpose (K2)
Frequently Asked Questions about CT-AI v2.0
Can I still use v1.0 study materials?
Not safely. V1.0 materials still teach the removed Testing with AI chapter and say nothing about GenAI testing, RAG, red teaming, the new deployment testing types, or the restructured statistical testing approach, so they will actively mislead you on a meaningful portion of the exam content. Use v2.0-aligned resources only.
Is CT-GenAI a prerequisite for CT-AI?
No. They are independent specialist certifications. CT-AI focuses on testing of AI-based systems. CT-GenAI focuses on using GenAI as a tool within testing activities. Many professionals will eventually want both, but there is no ordering requirement beyond holding the CTFL certificate.
Do I need to retake the exam if I already hold CT-AI v1.0?
ISTQB credentials do not automatically expire. If you already hold the v1.0 certificate, it remains valid. However, if your role requires current knowledge of the v2.0 content areas (particularly GenAI testing, red teaming, statistical performance testing, and new deployment practices), the v2.0 exam is worth considering for professional development.
How many questions are on the CT-AI exam and what is the passing score?
The exam structure details are published in the separate CT-AI v2.0 Exam Structures and Rules document. As of this writing, ISTQB typically sets specialist exams at 40 questions with a 65% passing threshold, but confirm these details against the official document before your exam.
Which standards do I need to know?
ISO/IEC 25059 (quality model for AI systems) and ISO/IEC 25010 (product quality model) are the primary quality standards. ISO/IEC TR 29119-11 provides guidance on testing AI-based systems. The EU AI Act is referenced in the context of regulatory requirements for high-risk systems and red teaming. Standards documents themselves are not examinable beyond what is summarized in the syllabus.
Summary: The v2.0 CT-AI in One Paragraph
The CT-AI v2.0 syllabus is a genuine modernization of the certification. It drops the now-separate Testing with AI content, adds thorough coverage of generative AI testing including red teaming and LLM-specific challenges, strengthens the statistical foundations of ML model testing, refines the two ML-specific test levels (input data testing and model testing), and introduces deployment testing practices relevant to production ML systems. It is three days of instruction rather than four, but the material is denser and more practice-oriented. For anyone testing or managing the quality of AI-based systems in 2026, the v2.0 syllabus represents the current state of professional knowledge in this field.
This post was written based on the ISTQB CT-AI Syllabus v2.0 GA (released April 17, 2026). All syllabus summaries are paraphrased from the official document. For examination purposes, always refer to the official ISTQB syllabus and exam structures documents at istqb.org.