Purpose
This draft is a one-page operational sheet for tracking the pilot evidence model.
It works best as a shared spreadsheet or table with a few tabs rather than one long flat page.
Ground rules
- Use existing systems where possible.
- Keep reporting aggregate by default.
- Track only metrics that will influence refine, pause, or scale decisions.
- Do not add fields that no one will actually maintain.
Recommended tabs
Pilot overviewWorkflow KPI trackerSampled case logSupport and learning logWeekly action log
Pilot metadata
- pilot name:
- date range:
- sponsor:
- technical enablement lead:
- participating teams:
- target roles:
- baseline window:
- weekly review day:
- survey tool:
- evidence-sheet owner:
Target workflows
| Role | Workflow | System of record | Owner | Baseline ready? | Workflow tag | Notes |
|---|---|---|---|---|---|---|
| Developer | ||||||
| QA/SDET | ||||||
| Architect | ||||||
| Product owner |
Baseline-ready status
yes: comparable baseline is readypartial: partial baseline exists, but interpretation will be cautiousno: no usable baseline, use early pilot weeks as directional baseline only
Workflow KPI tracker
| KPI | Workflow | Data source | Cadence | Owner | Baseline | Current | Trend | Decision note |
|---|---|---|---|---|---|---|---|---|
| Target workflow median cycle time | issue tracker | weekly | ||||||
| Downstream rework rate | issue tracker / defects | weekly | ||||||
| Review burden | code review system | weekly | ||||||
| Guardrail exception count | exception log | weekly | ||||||
| Adoption quality score | sampled case review | weekly | ||||||
| Retrieval follow-up score | form or pulse | after learning events | ||||||
| Confidence calibration check | form plus sampled review | biweekly |
Trend values
updownflatmixedunknown
Sampled case log
| Week | Cases reviewed | Workflow mix | Good examples | Weak examples | Main pattern observed | Follow-up owner | Action needed |
|---|---|---|---|---|---|---|---|
Support and learning log
| Signal | Source | Cadence | Notes |
|---|---|---|---|
| Workshop attendance | attendance log | per session | |
| Office-hours participation | enablement log | weekly | |
| Manager coaching uptake | short manager pulse | biweekly | |
| Peer review support uptake | checklist or sample review | biweekly |
Weekly action log
| Week | What improved | What got heavier | What needs support | What needs inspection | Refine / pause / scale note |
|---|---|---|---|---|---|
Decision prompts
- Are target workflows improving without obvious hidden rework?
- Is verification quality holding where acceleration is increasing?
- Are lower-oversight-readiness participants learning better habits or just moving faster?
- Are managers and peers reinforcing the right behaviors?
- What should be refined, paused, or scaled next?