# Critic Checklist Reference

> Loaded by Step 5 (Critic Evaluation). 5 dimensions, 20 items. Each scored PASS/FAIL. Threshold: 18/20 to pass. Maximum 3 revision rounds.

## Table of Contents

1. [Scoring Rules](#scoring-rules)
2. [Dimension 1: Clarity (5 items)](#dimension-1-clarity-5-items)
3. [Dimension 2: Accuracy (4 items)](#dimension-2-accuracy-4-items)
4. [Dimension 3: Style (5 items)](#dimension-3-style-5-items)
5. [Dimension 4: Reproducibility (3 items)](#dimension-4-reproducibility-3-items)
6. [Dimension 5: Caption (3 items)](#dimension-5-caption-3-items)
7. [Evaluation Output Template](#evaluation-output-template)
8. [Revision Protocol](#revision-protocol)

---

## Scoring Rules

- Each item: PASS (1 point) or FAIL (0 points)
- Total: 20 points maximum
- Pass threshold: 18/20 (90%)
- Maximum revision rounds: 3
- Each revision must address ALL flagged FAIL items
- Items not applicable to the figure type: auto-PASS (e.g., error bars for conceptual figures)
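The scoring rules above reduce to a short tally. The sketch below is a minimal Python illustration. It assumes each item's result is recorded as `True` (PASS), `False` (FAIL), or `None` (not applicable, counted as an auto-PASS). The `score_figure` helper and this result format are hypothetical and are not part of the skill's interface.

```python
from typing import Dict, Optional

# The 20 checklist IDs, grouped by dimension (C, A, S, R, P).
ITEM_IDS = (
    ["C1", "C2", "C3", "C4", "C5"]
    + ["A1", "A2", "A3", "A4"]
    + ["S1", "S2", "S3", "S4", "S5"]
    + ["R1", "R2", "R3"]
    + ["P1", "P2", "P3"]
)
PASS_THRESHOLD = 18


def score_figure(results: Dict[str, Optional[bool]]) -> dict:
    """Tally PASS/FAIL; None marks an item as not applicable (auto-PASS)."""
    missing = [i for i in ITEM_IDS if i not in results]
    if missing:
        raise ValueError(f"unscored items: {missing}")
    failed = [i for i in ITEM_IDS if results[i] is False]
    total = len(ITEM_IDS) - len(failed)
    return {
        "total": total,
        "failed": failed,
        "verdict": "PASS" if total >= PASS_THRESHOLD else "REVISE",
    }


results = {i: True for i in ITEM_IDS}
results["A4"] = None   # conceptual figure without error bars: auto-PASS
results["S2"] = False
print(score_figure(results))  # {'total': 19, 'failed': ['S2'], 'verdict': 'PASS'}
```

Treating `None` separately from `True` keeps the record of which items were skipped. The record still matters when a later revision turns a conceptual figure into a data plot.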

---

## Dimension 1: Clarity (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| C1 | Caption self-explanatory | Reader understands figure without reading paper body | Caption says only "Results" with no detail |
| C2 | Labels readable | All text >= the venue's minimum font size at final print dimensions | 6pt text on IEEE single-column figure |
| C3 | Visual hierarchy | Most important data visually prominent (size, color, position) | All lines same weight, "Ours" indistinguishable |
| C4 | No overlapping elements | Labels, legends, data points, annotations do not occlude each other | Legend sits on top of data region |
| C5 | Adequate whitespace | Margins around figure, between subfigures, around legends | Elements crammed to edges, no breathing room |

### How to Fix Common FAIL

- **C1**: Add "showing that [method] achieves [X]% improvement over [baseline]" to caption
- **C2**: Increase font size or reduce figure complexity (fewer elements)
- **C3**: Use thicker lines / brighter colors for "Ours", thinner / muted for baselines
- **C4**: Relocate legend (outside plot area, or to whitespace region)
- **C5**: Add `plt.tight_layout(pad=1.5)` or increase figure dimensions (without exceeding the venue width limit checked in S5)
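The following minimal matplotlib sketch applies the C3, C4, and C5 fixes to made-up accuracy curves:

- "Ours" gets a thicker, fully opaque line drawn on top of the baselines.
- The legend moves above the axes so it cannot cover data.
- `tight_layout` adds padding around the plot.

The data, the 3.5 x 2.4 in size, and the specific widths and alpha values are illustrative assumptions, not venue requirements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted figure generation
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: accuracy (%) per epoch for two baselines and our method.
epochs = np.arange(1, 11)
curves = {
    "Baseline A": 60 + 2.0 * np.log(epochs),
    "Baseline B": 62 + 2.5 * np.log(epochs),
    "Ours": 64 + 3.5 * np.log(epochs),
}

fig, ax = plt.subplots(figsize=(3.5, 2.4))
for name, y in curves.items():
    is_ours = name == "Ours"
    # C3: thick, opaque line for "Ours"; thin, muted lines for baselines.
    ax.plot(epochs, y, label=name,
            linewidth=2.5 if is_ours else 1.0,
            alpha=1.0 if is_ours else 0.6,
            zorder=3 if is_ours else 2)
ax.set_xlabel("Epoch")
ax.set_ylabel("Accuracy (%)")
# C4: place the legend above the axes so it cannot occlude the data region.
ax.legend(loc="lower center", bbox_to_anchor=(0.5, 1.02), ncol=3, frameon=False)
# C5: add padding between plot elements and the figure edge.
fig.tight_layout(pad=1.5)
```

The sketch puts the legend above the plot rather than to its right. This keeps the figure within a narrow single-column width.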

---

## Dimension 2: Accuracy (4 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| A1 | Data fidelity | Plotted values exactly match source data (spot-check 3 points) | Transposed rows/columns, wrong method assigned to line |
| A2 | Encoding faithful | Visual encoding matches data semantics (e.g., larger bar = larger value) | Inverted y-axis without indicator, log scale unlabeled |
| A3 | Axes honest | No truncated axes without break markers; scale is fair | Y-axis starts at 90% making 1% difference look huge |
| A4 | Error bars correct | Error bars represent stated metric (std, sem, CI) and are labeled | Error bars present but unlabeled, or wrong metric |

### How to Fix Common FAIL

- **A1**: Cross-reference code data arrays with source table, print values to verify
- **A2**: Check axis direction, scale type (linear/log), and legend-to-data mapping
- **A3**: Start y-axis at 0, or use axis break (`//`) marker if truncated
- **A4**: Add "(mean +/- std, n=5)" to caption or legend
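For A1 and A4, one approach is to read the plotted values back from the figure and compare them with the source table. The legend should also name the statistic that the error bars show. The sketch below assumes a hypothetical table of per-method means and standard deviations over 5 seeds. `spot_check` is an illustrative helper, not a matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical source table: method -> (mean accuracy, std) over 5 seeds.
source = {
    "Baseline": (np.array([71.2, 74.8, 76.1]), np.array([0.4, 0.5, 0.3])),
    "Ours": (np.array([73.5, 77.9, 79.4]), np.array([0.3, 0.4, 0.2])),
}
x = np.array([1, 2, 3])

fig, ax = plt.subplots(figsize=(3.5, 2.4))
data_lines = {}
for name, (mean, std) in source.items():
    # A4: the legend label states which statistic the error bars show.
    container = ax.errorbar(x, mean, yerr=std, capsize=2,
                            label=f"{name} (mean +/- std, n=5)")
    data_lines[name] = container.lines[0]  # the data line, without caps/bars
ax.legend()


def spot_check(data_lines, source, n_points=3):
    """A1: compare the first n_points plotted y-values with the source table."""
    mismatches = []
    for name, line in data_lines.items():
        plotted = np.asarray(line.get_ydata())
        expected = source[name][0]
        for i in range(min(n_points, len(expected))):
            if not np.isclose(plotted[i], expected[i]):
                mismatches.append((name, i, float(plotted[i]), float(expected[i])))
    return mismatches


print(spot_check(data_lines, source))  # [] when every checked point matches
```

A non-empty result shows which series and index disagree. This is how transposed rows or a method assigned to the wrong line show up.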

---

## Dimension 3: Style (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| S1 | Palette matches venue | Colors from venue-specific palette in academic-styles.md | Using matplotlib default colors instead of venue palette |
| S2 | Colorblind safe | All series distinguishable under simulated deuteranopia | Red and green lines only differ by hue |
| S3 | Grayscale compatible | Figure readable when desaturated (for B&W printing) | Two series both map to medium gray |
| S4 | Font compliant | Font family and sizes match venue specification | Sans-serif used for NeurIPS (should be serif) |
| S5 | Size compliant | Figure fits within venue column width at stated dimensions | 8" wide figure for IEEE single-column (max 3.5") |

### How to Fix Common FAIL

- **S1**: Replace colors with hex values from `references/academic-styles.md`
- **S2**: Add secondary encoding (markers: o, s, ^, D, v) alongside color
- **S3**: Add line styles (solid, dashed, dotted, dash-dot) as tertiary encoding
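- **S4**: Set `matplotlib.rcParams['font.family']` (e.g., `'serif'`) and `matplotlib.rcParams['font.size']` to the venue values
- **S5**: Set `figsize` (in inches) to the venue column width, e.g., at most 3.5 in wide for IEEE single-column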
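S3 can be screened numerically before printing. Convert each palette color to an approximate gray level, then flag pairs that land too close together. The conversion uses Rec. 601 luma weights and an assumed minimum separation of 0.15. Both are choices, not venue rules. The sketch also pairs every color with a marker and a line style, following the S2 and S3 fixes. It does not simulate deuteranopia, so S2 still needs its own check.

```python
from itertools import combinations


def gray_level(hex_color: str) -> float:
    """Approximate gray level (0 = black, 1 = white) using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_conflicts(palette, min_delta=0.15):
    """S3: return color pairs whose gray levels differ by less than min_delta."""
    return [(a, b) for a, b in combinations(palette, 2)
            if abs(gray_level(a) - gray_level(b)) < min_delta]


# S2/S3: give each series a distinct marker and line style as backup encodings.
MARKERS = ["o", "s", "^", "D", "v"]
LINESTYLES = ["solid", "dashed", "dotted", "dashdot"]


def series_styles(palette):
    """Return one kwargs dict per color, usable as ax.plot(x, y, **style)."""
    return [{"color": c,
             "marker": MARKERS[i % len(MARKERS)],
             "linestyle": LINESTYLES[i % len(LINESTYLES)]}
            for i, c in enumerate(palette)]


# matplotlib's default red and green differ by hue but are close in gray.
print(grayscale_conflicts(["#D62728", "#2CA02C"]))  # [('#D62728', '#2CA02C')]
```

A flagged pair can stay in the palette if its series also differ by marker and line style. The gray-level check only says that color alone will not separate them.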

---

## Dimension 4: Reproducibility (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| R1 | Code complete | Code runs without modification (all data inline, no external files) | References `data.csv` that doesn't exist |
| R2 | Imports present | All required imports listed at top of code | Missing `import seaborn as sns` |
| R3 | Output configured | Explicit DPI, savefig with bbox_inches='tight', both PNG and PDF | No savefig, or missing DPI, or only PNG |

**Scope**: Only applies to code-based figures (data-plot, comparison-chart, result-visualization). Auto-PASS for AI-generated figures.

### How to Fix Common FAIL

- **R1**: Replace file reads with inline data arrays (`np.array([...])`)
- **R2**: Run code mentally — every function call must have its import
- **R3**: Add `plt.savefig('fig.png', dpi=300, bbox_inches='tight')` + PDF variant
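This small helper covers R3. One call writes a PNG at an explicit DPI and a vector PDF, both with `bbox_inches='tight'`. The `save_figure` name, the `fig` file stem, and the 300 DPI default are illustrative assumptions. Use the venue's required resolution if it specifies one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from pathlib import Path


def save_figure(fig, stem, dpi=300):
    """R3: save PNG (raster, explicit DPI) and PDF (vector), both tightly cropped."""
    paths = []
    for ext in ("png", "pdf"):
        path = Path(f"{stem}.{ext}")
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


fig, ax = plt.subplots(figsize=(3.5, 2.4))
ax.plot([0, 1, 2], [0.0, 0.5, 0.8], marker="o")
saved = save_figure(fig, "fig")
```

Routing every export through one helper makes it hard to end up with "only PNG" or "no DPI". Both of those are common R3 failures.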

---

## Dimension 5: Caption (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| P1 | Content described | Caption describes WHAT the figure shows (not just "Results") | "Figure 1: Comparison." — no detail |
| P2 | Key finding stated | Caption highlights the main takeaway for the reader | Describes setup but not conclusion |
| P3 | Subfigures referenced | Multi-panel figures: each panel described in caption (a, b, c) | 4 panels but caption only mentions "left" and "right" |

### How to Fix Common FAIL

- **P1**: Structure as "Figure N: [what is shown] [context: methods, dataset, setting]." e.g., "Figure 3: Comparison of training convergence across 5 methods on CIFAR-10."
- **P2**: Add final sentence: "Method A converges 2x faster than the strongest baseline."
- **P3**: Add "(a) Training loss. (b) Validation accuracy. (c) Inference latency."
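Most caption checks are judgment calls. P3 can be checked mechanically, though, and a crude P1 screen catches the "Figure 1: Comparison." case. Both helpers below are hypothetical sketches. The two-word cutoff in `is_generic` is an arbitrary heuristic, and panel labels are assumed to be written as `(a)`, `(b)`, and so on.

```python
import re


def missing_panel_refs(caption: str, n_panels: int) -> list:
    """P3: list panel labels (a), (b), ... that the caption never mentions."""
    labels = [f"({chr(ord('a') + i)})" for i in range(n_panels)]
    return [lab for lab in labels if lab not in caption]


def is_generic(caption: str) -> bool:
    """P1 heuristic: flag captions that are a figure number plus <= 2 words."""
    body = re.sub(r"^Figure\s+\d+:\s*", "", caption.strip())
    return len(body.rstrip(".").split()) <= 2


caption = ("Figure 2: Training dynamics. (a) Training loss. "
           "(b) Validation accuracy. (c) Inference latency.")
print(missing_panel_refs(caption, 4))  # ['(d)']
```

Neither check can confirm P2. Whether the caption states the main takeaway still needs a human or model reading.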

---

## Evaluation Output Template

```
## Critic Evaluation — Round N/3

| Dim.            | Items | Score | Failed IDs | Issues |
|-----------------|-------|-------|------------|--------|
| Clarity         | 5     | X/5   | [CX]       | [description] |
| Accuracy        | 4     | X/4   | [AX]       | [description] |
| Style           | 5     | X/5   | [SX]       | [description] |
| Reproducibility | 3     | X/3   | [RX]       | [description] |
| Caption         | 3     | X/3   | [PX]       | [description] |
| **Total**       | **20**| **X/20** |         |        |

Verdict: PASS (>= 18) / REVISE (< 18)

[If REVISE:]
Revision actions:
1. [Fix for failed item 1]
2. [Fix for failed item 2]
...
```

---

## Revision Protocol

### Round Management

```
Round 1: Initial evaluation
  ↓ PASS → proceed to Step 6
  ↓ REVISE → apply fixes
Round 2: Re-evaluate ALL items (not just failed ones)
  ↓ PASS → proceed to Step 6
  ↓ REVISE → apply fixes
Round 3: Final evaluation
  ↓ PASS → proceed to Step 6
  ↓ REVISE → output with [!] Quality warning
```

### Rules

1. Each round evaluates ALL 20 items fresh (fixes can introduce new issues)
2. Revision must address every FAIL item — no "defer to next round"
3. After round 3, if still < 18/20:
   - Output the figure with explicit warning
   - List all remaining FAIL items
   - Recommend that the user review the figure manually
4. A PASS in any round immediately proceeds to Step 6 (no unnecessary iterations)
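The round management and rules above amount to a bounded loop. In this sketch, `evaluate` and `revise` are placeholder callables standing in for the critic and the fixer. They are assumptions, not functions the skill defines. Following the diagram, fixes are applied after rounds 1 and 2 only. A REVISE verdict in round 3 leads directly to output with the quality warning.

```python
from typing import Callable, Dict, List, Tuple

MAX_ROUNDS = 3
PASS_THRESHOLD = 18


def critic_loop(figure, evaluate: Callable, revise: Callable) -> Tuple[object, int, List[str]]:
    """Return (figure, rounds_used, remaining_fail_ids).

    evaluate(figure) -> {item_id: passed} for all 20 items (N/A counted as True).
    revise(figure, failed_ids) -> revised figure.
    """
    failed: List[str] = []
    for round_no in range(1, MAX_ROUNDS + 1):
        results: Dict[str, bool] = evaluate(figure)  # Rule 1: re-score all items
        failed = [item for item, ok in results.items() if not ok]
        if len(results) - len(failed) >= PASS_THRESHOLD:
            return figure, round_no, []  # Rule 4: PASS proceeds to Step 6
        if round_no < MAX_ROUNDS:
            figure = revise(figure, failed)  # Rule 2: address every FAIL item
    return figure, MAX_ROUNDS, failed  # Rule 3: output with a warning


def quality_warning(failed: List[str], notes: Dict[str, str]) -> str:
    """Format the [!] warning for items still failing after the last round."""
    lines = [f"[!] Quality warning: Figure output with {len(failed)} unresolved "
             f"issues after {MAX_ROUNDS} revision rounds.", "", "Remaining issues:"]
    lines += [f"- [{item}] {notes.get(item, '')}".rstrip() for item in failed]
    lines += ["", "Recommendation: Manual review before submission."]
    return "\n".join(lines)
```

Re-scoring every item each round is what catches regressions. For example, a legend moved to fix C4 might push the figure past the S5 width limit.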

### Quality Warning Format

```
[!] Quality warning: Figure output with 2 unresolved issues after 3 revision rounds.

Remaining issues:
- [S2] Colorblind safety: series 3 and 5 may be hard to distinguish under deuteranopia
- [C4] Minor legend overlap with rightmost data point

Recommendation: Manual review before submission.
```

---

*Cross-reference: SKILL.md §Step 5, §NEVER Rule #7*