Purpose
This library extends the workshop pack with deeper guided exercises for junior, intermediate, senior, and Staff practitioners.
The point is not to train people to get an answer from AI as quickly as possible.
The point is to train:
- mode selection
- question quality
- verification habits
- judgment under uncertainty
- reflection on where AI helps and where it creates risk
Design rules
- Prefer language-agnostic tasks that can be run in most software-delivery environments.
- Make the work feel real enough to be interesting, but bounded enough to be teachable.
- Focus on paired-engineering habits rather than syntax trivia or tool fandom.
- Keep the exercise objective centered on reasoning, verification, and workflow quality.
- Use the same exercise structure so facilitators can swap domains without rewriting the teaching method.
Rating model
Each exercise carries two ratings: size and complexity.
These ratings are not pretend precision.
They are a shared scoping heuristic for facilitators and learners.
The purpose is to make the expected shape of the work visible before the exercise begins.
Boundary rule
- the minimum rating is 1
- the maximum rating is 5
- there is no valid 0
- there is no valid 6
If a task feels smaller than 1, combine it with another small task until it forms a meaningful exercise.
If a task feels larger than 5, break it down into smaller exercises before using it for practice.
The point is to keep training work bounded, teachable, and reviewable.
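The boundary rule above can be sketched as a small validation step. This is an illustrative sketch only: the function names and the dict shape are assumptions, not part of any library described here.

```python
# Hypothetical sketch of the boundary rule: ratings outside the 1-5 band are
# rejected with the guidance from the rule, rather than silently clamped.

RATING_MIN, RATING_MAX = 1, 5

def check_rating(value: int, name: str) -> int:
    """Enforce the 1-5 boundary rule for a single rating."""
    if value < RATING_MIN:
        raise ValueError(
            f"{name}={value}: combine this task with another small task "
            "until it forms a meaningful exercise"
        )
    if value > RATING_MAX:
        raise ValueError(
            f"{name}={value}: break this task into smaller exercises "
            "before using it for practice"
        )
    return value

def rate_exercise(size: int, complexity: int) -> dict:
    """Bundle the two ratings after both pass the boundary rule."""
    return {
        "size": check_rating(size, "size"),
        "complexity": check_rating(complexity, "complexity"),
    }
```

Raising instead of clamping mirrors the rule's intent: a task outside the band should be reshaped by the facilitator, not quietly forced into range.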
Size
Size answers:
How much work surface is involved?
This is mostly about scope, number of moving parts, and amount of work to inspect.
1
- one small artifact or rule set
- narrow task surface
- short completion window
2
- two artifacts or one artifact plus one signal
- slightly larger comparison space
3
- several artifacts with one meaningful dependency
- more than one plausible path
4
- multiple artifacts with cross-checking required
- wider impact or more steps to verify
5
- cross-artifact reasoning, tradeoffs, or partial investigation
- enough surface area that sequencing matters
Examples:
- size 1: fix or extend one small test case, clarify one narrow rule, inspect one bounded bug
- size 3: compare a few options, update several related artifacts, or verify a multi-step workflow
- size 5: partial incident reconstruction, broad refactor planning, or cross-team rollout reasoning
Complexity
Complexity answers:
How deep, ambiguous, or hard to verify is the work?
This is mostly about reasoning difficulty, uncertainty, and risk of false confidence.
1
- low ambiguity
- clear verification path
- low risk of false confidence
2
- one or two traps
- moderate ambiguity
- still fairly observable
3
- multiple plausible interpretations
- verification requires care
- risk of shallow confidence is real
4
- hidden dependencies or low-observability reasoning
- stronger tradeoffs and escalation judgment needed
5
- incomplete information
- cross-boundary reasoning
- high risk of overreliance if the facilitator does not slow people down
Examples:
- complexity 1: straightforward boundary check or local test correction with clear expected behavior
- complexity 3: multiple plausible explanations, tradeoffs, or verification traps
- complexity 5: long-standing low-observability defect, ambiguous incident, or system behavior where elegant answers are easy to generate and hard to trust
How to read the ratings
The two ratings work together.
- high size and low complexity can still be heavy because there is a lot of work surface
- low size and high complexity can still be dangerous because the reasoning is hard
- high size and high complexity should be used carefully and often need stronger facilitation
The ratings are there to set expectations, not to pretend every exercise is objectively measurable.
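The reading guidance above can be expressed as a small expectation-setting heuristic. This is a sketch under assumptions: the thresholds and the wording of the notes are illustrative choices, not a rubric defined by the library.

```python
# Illustrative heuristic: turn the two ratings into an expectation-setting
# note for facilitators, following the "how to read the ratings" guidance.
# Thresholds (>= 4 counts as "high") are an assumption for this sketch.

def facilitation_note(size: int, complexity: int) -> str:
    """Map a (size, complexity) pair to a planning note."""
    if size >= 4 and complexity >= 4:
        return "use carefully: plan for stronger facilitation"
    if complexity >= 4:
        return "small surface, hard reasoning: watch for false confidence"
    if size >= 4:
        return "large surface, clearer reasoning: budget time to inspect the work"
    return "standard facilitation"
```

For example, a size 2 / complexity 4 exercise maps to the false-confidence warning, matching the point that low size with high complexity can still be dangerous.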
How to use the library
- Start with the exercise level that matches the learner's current oversight readiness, not just their title.
- Use a short worked example before independent exercise time when the pattern is new.
- Ask learners to choose learning mode or delivery mode first.
- Require a verification plan before accepting any AI-assisted answer.
- End every exercise with reflection on what the model helped with, what it made easier to miss, and what still required human judgment.
Track guidance
The full library spans size 1-5 and complexity 1-5 overall.
The junior track intentionally stays lower-to-mid on complexity while still increasing in scope.
The intermediate track pushes harder on ambiguity, verification difficulty, and system effects.
Senior and Staff practitioners are not the library's main exercise audience, but the track is still useful for:
- low-observability reasoning
- architecture and standards critique
- workflow and tooling decisions
- coaching and review discipline
- cross-team tradeoff judgment
Junior track
- Junior Exercise 1 - Guardrail Test Matrix
  size 1, complexity 1
  focus: edge-case thinking and simple verification
- size 2, complexity 2
  focus: hypothesis generation and minimal reproduction
- size 3, complexity 2
  focus: clarification, question design, and assumption control
- size 4, complexity 3
  focus: behavioral preservation and verification planning
- size 5, complexity 3
  focus: early incident reasoning without overclaiming certainty
Intermediate track
- Intermediate Exercise 1 - Flaky Test or Race Condition
  size 2, complexity 3
  focus: rejecting shallow fixes and investigating observability gaps
- size 3, complexity 3
  focus: performance advice, tradeoffs, and safe skepticism
- Intermediate Exercise 3 - Policy Change With Hidden Edge Cases
  size 3, complexity 4
  focus: domain logic, policy interpretation, and verification traps
- size 4, complexity 4
  focus: rollout safety across service boundaries
- size 5, complexity 5
  focus: incomplete evidence, escalation, and disciplined uncertainty
Senior and Staff track
- Senior Staff Exercise 1 - Architecture Option Critique Under Missing Evidence
  size 2, complexity 4
  focus: option critique, uncertainty, and unsupported assumptions
- size 3, complexity 4
  focus: detecting hidden system cost behind apparently successful adoption
- size 3, complexity 4
  focus: turning vague review expectations into explicit standards
- Senior Staff Exercise 4 - Tool Selection and Governance Tradeoff
  size 4, complexity 5
  focus: workflow fit, edit surface, governance, and long-term operational tradeoffs
- size 5, complexity 5
  focus: interpreting weak rollout signals and deciding how to intervene
Facilitator pattern
For most sessions, use this sequence:
- Briefly frame the scenario and desired learning pattern.
- Ask learners to name the mode first.
- Let them work with AI in a bounded way.
- Require a verification path and a confidence statement.
- Debrief what AI improved, what it obscured, and what should change next time.
Relationship to the workshop pack
Use Workshop Pack - Paired Engineering with AI for the base workshop flow.
Use this library when you want deeper practice, better progression, or more targeted sessions for junior, intermediate, senior, and Staff practitioners.
Use Exercise Worksheet Pack - Paired Engineering with AI when a facilitator wants ready-to-run delivery guidance rather than scenario notes alone.