Critic Checklist Reference
Loaded by Step 5 (Critic Evaluation). 5 dimensions, 20 items. Each scored PASS/FAIL. Threshold: 18/20 to pass. Maximum 3 revision rounds.
Table of Contents
- Scoring Rules
- Dimension 1: Clarity (5 items)
- Dimension 2: Accuracy (4 items)
- Dimension 3: Style (5 items)
- Dimension 4: Reproducibility (3 items)
- Dimension 5: Caption (3 items)
- Evaluation Output Template
- Revision Protocol
Scoring Rules
- Each item: PASS (1 point) or FAIL (0 points)
- Total: 20 points maximum
- Pass threshold: 18/20 (90%)
- Maximum revision rounds: 3
- Each revision must address ALL flagged FAIL items
- Items not applicable to the figure type: auto-PASS (e.g., error bars for conceptual figures)
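The scoring rules above can be sketched as a small tally function. This is a minimal illustration, not part of the skill itself: the name `tally`, the result encoding (`True`/`False`/`None`), and the example run are assumptions made for the sketch.

```python
from typing import Dict, Optional

TOTAL_ITEMS = 20
PASS_THRESHOLD = 18  # 18/20, per the scoring rules


def tally(results: Dict[str, Optional[bool]]) -> dict:
    """Score one checklist run.

    `results` maps item IDs (e.g. "C1") to True (PASS), False (FAIL),
    or None (not applicable, which counts as an auto-PASS).
    """
    if len(results) != TOTAL_ITEMS:
        raise ValueError(f"expected {TOTAL_ITEMS} items, got {len(results)}")
    failed = sorted(item for item, ok in results.items() if ok is False)
    score = TOTAL_ITEMS - len(failed)
    return {
        "score": score,
        "failed": failed,
        "verdict": "PASS" if score >= PASS_THRESHOLD else "REVISE",
    }


# Hypothetical run: two failures, and A4 not applicable (conceptual figure).
ITEM_IDS = (
    [f"C{i}" for i in range(1, 6)]
    + [f"A{i}" for i in range(1, 5)]
    + [f"S{i}" for i in range(1, 6)]
    + [f"R{i}" for i in range(1, 4)]
    + [f"P{i}" for i in range(1, 4)]
)
example = {item: True for item in ITEM_IDS}
example.update({"C4": False, "S2": False, "A4": None})
print(tally(example))
```

With two FAIL items the run still passes at exactly 18/20; a third failure would flip the verdict to REVISE.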
Dimension 1: Clarity (5 items)
| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| C1 | Caption self-explanatory | Reader understands figure without reading paper body | Caption says only "Results" with no detail |
| C2 | Labels readable | All text >= minimum font size at print dimensions | 6pt text on IEEE single-column figure |
| C3 | Visual hierarchy | Most important data visually prominent (size, color, position) | All lines same weight, "Ours" indistinguishable |
| C4 | No overlapping elements | Labels, legends, data points, annotations do not occlude each other | Legend sits on top of data region |
| C5 | Adequate whitespace | Margins around figure, between subfigures, around legends | Elements crammed to edges, no breathing room |
How to Fix Common FAIL
- C1: Add "showing that [method] achieves [X]% improvement over [baseline]" to caption
- C2: Increase font size or reduce figure complexity (fewer elements)
- C3: Use thicker lines / brighter colors for "Ours", thinner / muted for baselines
- C4: Relocate legend (outside plot area, or to whitespace region)
- C5: Add plt.tight_layout(pad=1.5) or increase figure dimensions
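For C2, the font size that matters is the one at print dimensions, which shrinks whenever a figure is drawn wide and then scaled into a narrow column. The sketch below shows that arithmetic. The 7pt minimum is an assumed placeholder (the real value comes from the venue specification), and the function name is hypothetical.

```python
def printed_font_size(font_pt: float, design_width_in: float, print_width_in: float) -> float:
    """Font size in points after the figure is scaled to its printed width."""
    return font_pt * print_width_in / design_width_in


MIN_FONT_PT = 7.0  # assumed minimum; take the real value from the venue spec

# A figure drawn 7" wide with 10pt labels, then printed in a 3.5" column.
size = printed_font_size(10.0, design_width_in=7.0, print_width_in=3.5)
print(f"{size:.1f}pt -> {'PASS' if size >= MIN_FONT_PT else 'FAIL'}")
```

Drawing the figure at its final printed width avoids the rescaling entirely, which is usually simpler than compensating with larger fonts.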
Dimension 2: Accuracy (4 items)
| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| A1 | Data fidelity | Plotted values exactly match source data (spot-check 3 points) | Transposed rows/columns, wrong method assigned to line |
| A2 | Encoding faithful | Visual encoding matches data semantics (e.g., larger bar = larger value) | Inverted y-axis without indicator, log scale unlabeled |
| A3 | Axes honest | No truncated axes without break markers; scale is fair | Y-axis starts at 90% making 1% difference look huge |
| A4 | Error bars correct | Error bars represent stated metric (std, sem, CI) and are labeled | Error bars present but unlabeled, or wrong metric |
How to Fix Common FAIL
- A1: Cross-reference code data arrays with source table, print values to verify
- A2: Check axis direction, scale type (linear/log), and legend-to-data mapping
- A3: Start y-axis at 0, or use axis break (//) marker if truncated
- A4: Add "(mean +/- std, n=5)" to caption or legend
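To see why A3 treats a truncated axis as a failure, it can help to compare the ratio of two values with the ratio of their bar heights as drawn. The sketch below does that for the "y-axis starts at 90%" case. The accuracy values and the function name are hypothetical.

```python
def visual_ratio(a: float, b: float, axis_min: float) -> float:
    """Ratio of two bar heights as drawn when the axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("both values must lie above axis_min")
    return (a - axis_min) / (b - axis_min)


ours, baseline = 92.0, 91.0  # hypothetical accuracies in percent

true_ratio = ours / baseline                                # about 1.011
drawn_ratio = visual_ratio(ours, baseline, axis_min=90.0)   # 2.0
print(f"true {true_ratio:.3f}x, drawn {drawn_ratio:.1f}x")
```

A one-point difference is drawn as a bar twice as tall. Starting the axis at 0 makes the drawn ratio equal to the true ratio.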
Dimension 3: Style (5 items)
| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| S1 | Palette matches venue | Colors from venue-specific palette in academic-styles.md | Using matplotlib default colors instead of venue palette |
| S2 | Colorblind safe | All series distinguishable under simulated deuteranopia | Red and green lines only differ by hue |
| S3 | Grayscale compatible | Figure readable when desaturated (for B&W printing) | Two series both map to medium gray |
| S4 | Font compliant | Font family and sizes match venue specification | Sans-serif used for NeurIPS (should be serif) |
| S5 | Size compliant | Figure fits within venue column width at stated dimensions | 8" wide figure for IEEE single-column (max 3.5") |
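One rough way to screen for the S3 failure ("two series both map to medium gray") is to convert each series color to an approximate gray level and flag pairs that land too close together. The sketch below uses Rec. 601 luma weights as the approximation. The palette, the 0.15 gap threshold, and the function names are assumptions for illustration, and this does not simulate deuteranopia, so it says nothing about S2.

```python
from itertools import combinations


def luma(hex_color: str) -> float:
    """Approximate gray level in [0, 1] using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_collisions(palette: dict, min_gap: float = 0.15) -> list:
    """Return pairs of series whose gray levels differ by less than min_gap."""
    return [
        (a, b)
        for (a, color_a), (b, color_b) in combinations(palette.items(), 2)
        if abs(luma(color_a) - luma(color_b)) < min_gap
    ]


# Hypothetical palette: a red, a green, and black.
palette = {"Ours": "#d62728", "Baseline A": "#2ca02c", "Baseline B": "#000000"}
print(gray_collisions(palette))
```

Here the red and green series come out at similar gray levels and are reported as a colliding pair, while black stays clearly separate.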
How to Fix Common FAIL
- S1: Replace colors with hex values from references/academic-styles.md
- S2: Add secondary encoding (markers: o, s, ^, D, v) alongside color
- S3: Add line styles (solid, dashed, dotted, dash-dot) as tertiary encoding
- S4: Update matplotlib.rcParams['font.family']
- S5: Adjust figsize to venue constraints
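The S2 and S3 fixes (markers and line styles as redundant encodings) combine naturally with the C3 fix (heavier line for "Ours"). A minimal sketch follows that builds per-series keyword arguments without importing Matplotlib. The function name and the line widths are assumptions. The marker and line-style codes are standard Matplotlib format strings.

```python
from itertools import cycle

MARKERS = ["o", "s", "^", "D", "v"]   # secondary encoding (S2)
LINESTYLES = ["-", "--", ":", "-."]   # solid, dashed, dotted, dash-dot (S3)


def series_styles(names, highlight="Ours"):
    """Build per-series keyword arguments suitable for Matplotlib's Axes.plot."""
    styles = {}
    for name, marker, linestyle in zip(names, cycle(MARKERS), cycle(LINESTYLES)):
        emphasized = name == highlight
        styles[name] = {
            "marker": marker,
            "linestyle": linestyle,
            "linewidth": 2.5 if emphasized else 1.2,  # assumed weights
            "zorder": 3 if emphasized else 2,         # draw the highlight on top
        }
    return styles


styles = series_styles(["Ours", "Baseline A", "Baseline B"])
for name, style in styles.items():
    print(name, style)
```

Each entry can be passed straight through, for example `ax.plot(x, y, label=name, **styles[name])`. Colors are left out on purpose, since the venue palette is defined in academic-styles.md.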
Dimension 4: Reproducibility (3 items)
| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| R1 | Code complete | Code runs without modification (all data inline, no external files) | References data.csv that doesn't exist |
| R2 | Imports present | All required imports listed at top of code | Missing import seaborn as sns |
| R3 | Output configured | Explicit DPI, savefig with bbox_inches='tight', both PNG and PDF | No savefig, or missing DPI, or only PNG |
Scope: Only applies to code-based figures (data-plot, comparison-chart, result-visualization). Auto-PASS for AI-generated figures.
How to Fix Common FAIL
- R1: Replace file reads with inline data arrays (np.array([...]))
- R2: Run code mentally — every function call must have its import
- R3: Add plt.savefig('fig.png', dpi=300, bbox_inches='tight') + PDF variant
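The R2 check can be approximated mechanically instead of by reading alone. The sketch below parses figure code with the standard `ast` module and reports names that are read but never imported, assigned, or built in. It is a rough, scope-insensitive heuristic (for example, it ignores `from x import *` and names bound by `except ... as`), and the function name is hypothetical.

```python
import ast
import builtins


def unresolved_names(source: str) -> set:
    """Return names that are read but never imported, assigned, or built in."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                defined.add(node.id)
    return used - defined


# The snippet is only parsed, never executed.
snippet = """
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
sns.lineplot(x=[1, 2], y=[3, 4], ax=ax)
"""
print(unresolved_names(snippet))
```

For this snippet the only unresolved name is `sns`, which matches the "missing import seaborn as sns" failure in the table.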
Dimension 5: Caption (3 items)
| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| P1 | Content described | Caption describes WHAT the figure shows (not just "Results") | "Figure 1: Comparison." — no detail |
| P2 | Key finding stated | Caption highlights the main takeaway for the reader | Describes setup but not conclusion |
| P3 | Subfigures referenced | Multi-panel figures: each panel described in caption (a, b, c) | 4 panels but caption only mentions "left" and "right" |
How to Fix Common FAIL
- P1: Structure as "Figure N: [verb] [what]. [context]." e.g., "Figure 3: Comparison of training convergence across 5 methods on CIFAR-10."
- P2: Add final sentence: "Method A converges 2x faster than the strongest baseline."
- P3: Add "(a) Training loss. (b) Validation accuracy. (c) Inference latency."
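The P3 check lends itself to a simple text test: every panel letter should appear in the caption as "(a)", "(b)", and so on. A minimal sketch, assuming panels are always labeled with lowercase letters in parentheses. The function name is hypothetical and the caption reuses the example wording above.

```python
import re


def missing_panels(caption: str, n_panels: int) -> list:
    """Return panel letters (a, b, c, ...) that the caption never labels as "(x)"."""
    expected = [chr(ord("a") + i) for i in range(n_panels)]
    found = set(re.findall(r"\(([a-z])\)", caption))
    return [panel for panel in expected if panel not in found]


caption = (
    "Figure 3: Comparison of training convergence across 5 methods on CIFAR-10. "
    "(a) Training loss. (b) Validation accuracy."
)
print(missing_panels(caption, 3))
```

For a three-panel figure this caption is missing panel "c". A caption that says only "left" and "right" would report every panel as missing.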
Evaluation Output Template
## Critic Evaluation — Round N/3
| Dim. | Items | Score | Failed IDs | Issues |
|-----------------|-------|-------|------------|--------|
| Clarity | 5 | X/5 | [CX] | [description] |
| Accuracy | 4 | X/4 | [AX] | [description] |
| Style | 5 | X/5 | [SX] | [description] |
| Reproducibility | 3 | X/3 | [RX] | [description] |
| Caption | 3 | X/3 | [PX] | [description] |
|-----------------|-------|-------|------------|--------|
| **Total** | **20**| **X/20** | | |
Verdict: PASS (>= 18) / REVISE (< 18)
[If REVISE:]
Revision actions:
1. [Fix for failed item 1]
2. [Fix for failed item 2]
...
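The table portion of the template above can be filled in programmatically from a map of failed item IDs to issue descriptions. The sketch below is one way to do it. The function name and input shape are assumptions, and it renders only the table and the verdict line, not the heading or the revision actions.

```python
DIMENSIONS = [
    ("Clarity", "C", 5),
    ("Accuracy", "A", 4),
    ("Style", "S", 5),
    ("Reproducibility", "R", 3),
    ("Caption", "P", 3),
]


def render_report(failed: dict) -> str:
    """Render the score table. `failed` maps item ID (e.g. "C4") to issue text."""
    rows = [
        "| Dim. | Items | Score | Failed IDs | Issues |",
        "|------|-------|-------|------------|--------|",
    ]
    total = 0
    for name, prefix, count in DIMENSIONS:
        ids = sorted(item for item in failed if item.startswith(prefix))
        score = count - len(ids)
        total += score
        issues = "; ".join(failed[item] for item in ids)
        rows.append(f"| {name} | {count} | {score}/{count} | {', '.join(ids)} | {issues} |")
    rows.append(f"| **Total** | **20** | **{total}/20** | | |")
    rows.append(f"Verdict: {'PASS' if total >= 18 else 'REVISE'}")
    return "\n".join(rows)


report = render_report({"C4": "legend overlaps data", "S2": "red/green differ by hue only"})
print(report)
```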
Revision Protocol
Round Management
Round 1: Initial evaluation
↓ REVISE → apply fixes
Round 2: Re-evaluate ALL items (not just failed ones)
↓ REVISE → apply fixes
Round 3: Final evaluation
↓ REVISE → output with [!] Quality warning
↓ PASS → proceed to Step 6
Rules
- Each round evaluates ALL 20 items fresh (fixes can introduce new issues)
- Revision must address every FAIL item — no "defer to next round"
- After round 3, if still < 18/20:
  - Output the figure with explicit warning
  - List all remaining FAIL items
  - Suggest user manual review
- A PASS in any round immediately proceeds to Step 6 (no unnecessary iterations)
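The round management and rules above amount to a bounded evaluate-and-revise loop. The sketch below captures that control flow. `evaluate` and `revise` are hypothetical callbacks supplied by the caller, and the toy "figures" are just lists of failing item IDs.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 3
PASS_THRESHOLD = 18


def run_critic(
    figure,
    evaluate: Callable[[object], List[str]],
    revise: Callable[[object, List[str]], object],
) -> Tuple[object, List[str], int]:
    """Evaluate, then revise and re-evaluate, for at most MAX_ROUNDS rounds.

    `evaluate` returns the IDs of all FAIL items among the 20;
    `revise` returns a new figure that addresses every one of them.
    """
    failed: List[str] = []
    for round_no in range(1, MAX_ROUNDS + 1):
        failed = evaluate(figure)            # all 20 items, fresh each round
        if 20 - len(failed) >= PASS_THRESHOLD:
            return figure, failed, round_no  # PASS: proceed to Step 6
        if round_no < MAX_ROUNDS:
            figure = revise(figure, failed)  # address every FAIL item
    return figure, failed, MAX_ROUNDS        # still failing: emit the warning


# Toy stand-ins: each "figure" is simply the list of item IDs it fails.
drafts = iter([["C4"], []])


def evaluate(fig):
    return list(fig)


def revise(fig, failed):
    return next(drafts)


final, failed, rounds = run_critic(["C2", "C4", "S2"], evaluate, revise)
print(rounds, failed)
```

In the toy run the first draft scores 17/20, the revision leaves one failure, and round 2 passes at 19/20, so no third round happens.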
Quality Warning Format
[!] Quality warning: Figure output with 2 unresolved issues after 3 revision rounds.
Remaining issues:
- [S2] Colorblind safety: series 3 and 5 may be confusing under deuteranopia
- [C4] Minor legend overlap with rightmost data point
Recommendation: Manual review before submission.
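A small formatter can produce this warning block from the remaining FAIL items. This is a sketch only: the function name and the ID-to-description mapping are assumptions, and the example input reuses the two issues shown above.

```python
def quality_warning(remaining: dict, rounds: int = 3) -> str:
    """Format the warning block. `remaining` maps item ID to a description."""
    n = len(remaining)
    lines = [
        f"[!] Quality warning: Figure output with {n} unresolved "
        f"issue{'s' if n != 1 else ''} after {rounds} revision rounds.",
        "Remaining issues:",
    ]
    lines += [f"- [{item}] {text}" for item, text in remaining.items()]
    lines.append("Recommendation: Manual review before submission.")
    return "\n".join(lines)


warning = quality_warning({
    "S2": "Colorblind safety: series 3 and 5 may be confusing under deuteranopia",
    "C4": "Minor legend overlap with rightmost data point",
})
print(warning)
```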
Cross-reference: SKILL.md §Step 5, §NEVER Rule #7