TDD Rubric

Scoring criteria for evaluating test quality during code review.

Scoring (1-5 per criterion)

1. Five Questions Coverage (weight: 3x)

5: Every test clearly answers all five questions through the structured assert format
4: Most tests answer all five questions; minor gaps in given/should clarity
3: Tests have assertions, but given/should descriptions are vague or restate literal values rather than describing behavior
2: Tests exist but don't use a structured assert; intent is hard to understand
1: Tests are present but don't answer basic questions about behavior
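
As an illustration of what level 5 might look like, here is a minimal Python sketch of a structured assert. The rubric does not spell out the assert format or list the five questions, so this assumes a given/should/actual/expected shape in which a failure message states the scenario, the expected behavior, and both values. `structured_assert` and `add` are hypothetical names, not part of any particular library.

```python
def structured_assert(*, given: str, should: str, actual, expected) -> None:
    """Hypothetical helper: on failure, report the scenario, the expected
    behavior, and both the expected and actual values."""
    if actual != expected:
        raise AssertionError(
            f"Given {given}: should {should}. "
            f"Expected {expected!r}, got {actual!r}."
        )


def add(a: int, b: int) -> int:
    """Hypothetical unit under test."""
    return a + b


def test_add_two_positive_integers() -> None:
    # given/should describe behavior, not the literal inputs and outputs.
    structured_assert(
        given="two positive integers",
        should="return their sum",
        actual=add(2, 3),
        expected=5,
    )
```

Under the same assumption, a level-3 version of this test might read `given="2 and 3"`, `should="return 5"`, which restates the literals without saying what behavior is being checked.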

2. Test Isolation (weight: 2x)

5: Zero shared mutable state; each test is completely independent
4: Mostly isolated; factory functions used; minor shared setup
3: Some shared fixtures but no inter-test dependencies
2: Shared mutable state exists between tests
1: Tests depend on execution order or external state
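
The difference between the low and high ends of this scale can be sketched as follows. This is an illustrative example using plain Python test functions; `make_cart` and `add_item` are hypothetical names chosen for the sketch.

```python
from typing import Optional


def make_cart(items: Optional[list] = None) -> dict:
    """Hypothetical factory: every call returns a fresh, unshared cart."""
    return {"items": list(items or [])}


def add_item(cart: dict, item: str) -> dict:
    """Hypothetical unit under test: returns a new cart, leaving the input untouched."""
    return {"items": [*cart["items"], item]}


# Isolated: each test builds its own state, so execution order does not matter.
def test_add_item_to_empty_cart() -> None:
    cart = make_cart()
    assert add_item(cart, "apple")["items"] == ["apple"]


def test_add_item_to_existing_cart() -> None:
    cart = make_cart(["apple"])
    assert add_item(cart, "pear")["items"] == ["apple", "pear"]


# Not isolated (shown for contrast only): a module-level list mutated by one
# test and read by another, so the second passes only if the first ran before it.
# SHARED_ITEMS = []
# def test_first():  SHARED_ITEMS.append("apple"); assert SHARED_ITEMS == ["apple"]
# def test_second(): assert SHARED_ITEMS == ["apple"]
```

Because the factory copies its input and `add_item` returns a new dictionary, the two live tests can run in either order, or alone, with the same result.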

3. Requirement Coverage (weight: 3x)

5: Every functional requirement has corresponding test(s); edge cases covered
4: All core requirements tested; most edge cases covered
3: Core happy-path requirements tested; some edge cases missing
2: Partial coverage; significant requirements untested
1: Minimal or no meaningful coverage of requirements

4. Test Readability (weight: 2x)

5: Tests read as documentation; a new developer understands behavior from tests alone
4: Tests are clear; minor improvements possible in naming/structure
3: Tests are understandable but require some code reading to grasp intent
2: Tests are hard to follow; poor naming or structure
1: Tests are opaque; no clear connection between test name and behavior

5. No Redundancy (weight: 1x)

5: No type-shape tests, no framework-behavior tests, no duplicate tests
4: Minor redundancy that doesn't obscure the suite
3: Some unnecessary tests that add noise
2: Significant redundancy obscuring the meaningful tests
1: More noise than signal in the test suite
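
For illustration, here is a minimal sketch of the kinds of tests level 5 excludes, set beside one that earns its place. The rubric does not define "type-shape" or "framework-behavior" tests; the reading assumed here is tests that only check a return type, and tests that exercise the language or a library instead of project code. `slugify` is a hypothetical unit under test.

```python
def slugify(title: str) -> str:
    """Hypothetical unit under test."""
    return "-".join(title.lower().split())


# Type-shape test: passes for almost any implementation and says nothing
# about behavior.
def test_slugify_returns_str() -> None:
    assert isinstance(slugify("Hello World"), str)


# Framework-behavior test: exercises str.lower, not project code.
def test_lower_works() -> None:
    assert "ABC".lower() == "abc"


# Behavior test: the one worth keeping. It fails if the result is not a
# string, so the type-shape test above adds nothing.
def test_slugify_joins_lowercased_words_with_hyphens() -> None:
    assert slugify("Hello  World") == "hello-world"
```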

Score Calculation

Weighted total (sum of score × weight across the five criteria) / max possible (5 × 11 = 55) = percentage

90-100%: Exemplary — publish as reference
75-89%: Strong — minor improvements only
60-74%: Adequate — address specific gaps
40-59%: Needs Work — significant revision required
Below 40%: Insufficient — rewrite tests
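
The calculation above can be sketched in Python as follows. The weights and band thresholds come from the rubric; the dictionary keys and function names are hypothetical, and rounding to a whole percent before banding is an assumption, since the rubric does not say how a value such as 89.5% should be treated.

```python
# Weights as listed in the rubric; the keys are hypothetical identifiers.
WEIGHTS = {
    "five_questions": 3,
    "isolation": 2,
    "requirement_coverage": 3,
    "readability": 2,
    "no_redundancy": 1,
}
MAX_SCORE = 5

# Lower bound of each band, highest first.
BANDS = [
    (90, "Exemplary"),
    (75, "Strong"),
    (60, "Adequate"),
    (40, "Needs Work"),
    (0, "Insufficient"),
]


def rubric_percentage(scores: dict) -> float:
    """Weighted total divided by the maximum possible, as a percentage."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five criteria")
    for name, score in scores.items():
        if not 1 <= score <= MAX_SCORE:
            raise ValueError(f"{name}: score must be between 1 and {MAX_SCORE}")
    weighted_total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    max_possible = MAX_SCORE * sum(WEIGHTS.values())  # 5 * 11 = 55
    return 100 * weighted_total / max_possible


def rubric_band(percentage: float) -> str:
    """Map a percentage to its band (assumption: round to a whole percent first)."""
    rounded = round(percentage)
    for lower_bound, label in BANDS:
        if rounded >= lower_bound:
            return label
    return BANDS[-1][1]
```

With scores of 4, 5, 4, 4, and 3 in rubric order, the weighted total is 12 + 10 + 12 + 8 + 3 = 45, which gives 45 / 55 ≈ 81.8% and falls in the Strong band. Because the lowest score per criterion is 1, the percentage cannot drop below 11 / 55 = 20%.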