Docs/TDD Rubric

TDD Rubric

Scoring criteria for evaluating test quality during code review.

Scoring (1-5 per criterion)

1. Five Questions Coverage (weight: 3x)

  • 5: Every test clearly answers all 5 questions via assert format
  • 4: Most tests answer all 5; minor gaps in given/should clarity
  • 3: Tests have assertions but given/should are vague or describe literals
  • 2: Tests exist but don't use structured assert; hard to understand intent
  • 1: Tests are present but don't answer basic questions about behavior

2. Test Isolation (weight: 2x)

  • 5: Zero shared mutable state; each test is completely independent
  • 4: Mostly isolated; factory functions used; minor shared setup
  • 3: Some shared fixtures but no inter-test dependencies
  • 2: Shared mutable state exists between tests
  • 1: Tests depend on execution order or external state

3. Requirement Coverage (weight: 3x)

  • 5: Every functional requirement has corresponding test(s); edge cases covered
  • 4: All core requirements tested; most edge cases covered
  • 3: Core happy-path requirements tested; some edge cases missing
  • 2: Partial coverage; significant requirements untested
  • 1: Minimal or no meaningful coverage of requirements

4. Test Readability (weight: 2x)

  • 5: Tests read as documentation; a new developer understands behavior from tests alone
  • 4: Tests are clear; minor improvements possible in naming/structure
  • 3: Tests are understandable but require some code reading to grasp intent
  • 2: Tests are hard to follow; poor naming or structure
  • 1: Tests are opaque; no clear connection between test name and behavior

5. No Redundancy (weight: 1x)

  • 5: No type-shape tests, no framework-behavior tests, no duplicate tests
  • 4: Minor redundancy that doesn't obscure the suite
  • 3: Some unnecessary tests that add noise
  • 2: Significant redundancy obscuring the meaningful tests
  • 1: More noise than signal in the test suite

Score Calculation

Weighted total / max possible = percentage

  • 90-100%: Exemplary — publish as reference
  • 75-89%: Strong — minor improvements only
  • 60-74%: Adequate — address specific gaps
  • 40-59%: Needs Work — significant revision required
  • Below 40%: Insufficient — rewrite tests