# Critic Checklist Reference

> Loaded by Step 5 (Critic Evaluation). 5 dimensions, 20 items. Each scored PASS/FAIL. Threshold: 18/20 to pass. Maximum 3 revision rounds.

## Table of Contents

1. [Scoring Rules](#scoring-rules)
2. [Dimension 1: Clarity (5 items)](#dimension-1-clarity-5-items)
3. [Dimension 2: Accuracy (4 items)](#dimension-2-accuracy-4-items)
4. [Dimension 3: Style (5 items)](#dimension-3-style-5-items)
5. [Dimension 4: Reproducibility (3 items)](#dimension-4-reproducibility-3-items)
6. [Dimension 5: Caption (3 items)](#dimension-5-caption-3-items)
7. [Evaluation Output Template](#evaluation-output-template)
8. [Revision Protocol](#revision-protocol)

---

## Scoring Rules

- Each item: PASS (1 point) or FAIL (0 points)
- Total: 20 points maximum
- Pass threshold: 18/20 (90%)
- Maximum revision rounds: 3
- Each revision must address ALL flagged FAIL items
- Items not applicable to the figure type: auto-PASS (e.g., error bars for conceptual figures)

---

## Dimension 1: Clarity (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| C1 | Caption self-explanatory | Reader understands figure without reading paper body | Caption says only "Results" with no detail |
| C2 | Labels readable | All text >= minimum font size at print dimensions | 6pt text on IEEE single-column figure |
| C3 | Visual hierarchy | Most important data visually prominent (size, color, position) | All lines same weight, "Ours" indistinguishable |
| C4 | No overlapping elements | Labels, legends, data points, annotations do not occlude each other | Legend sits on top of data region |
| C5 | Adequate whitespace | Margins around figure, between subfigures, around legends | Elements crammed to edges, no breathing room |

### How to Fix Common FAIL

- **C1**: Add "showing that [method] achieves [X]% improvement over [baseline]" to caption
- **C2**: Increase font size or reduce figure complexity (fewer elements)
- **C3**: Use thicker lines / brighter colors for "Ours", thinner / muted for baselines
- **C4**: Relocate legend (outside plot area, or to whitespace region)
- **C5**: Add `plt.tight_layout(pad=1.5)` or increase figure dimensions

---

## Dimension 2: Accuracy (4 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| A1 | Data fidelity | Plotted values exactly match source data (spot-check 3 points) | Transposed rows/columns, wrong method assigned to line |
| A2 | Encoding faithful | Visual encoding matches data semantics (e.g., larger bar = larger value) | Inverted y-axis without indicator, log scale unlabeled |
| A3 | Axes honest | No truncated axes without break markers; scale is fair | Y-axis starts at 90% making 1% difference look huge |
| A4 | Error bars correct | Error bars represent stated metric (std, sem, CI) and are labeled | Error bars present but unlabeled, or wrong metric |

### How to Fix Common FAIL

- **A1**: Cross-reference code data arrays with source table, print values to verify
- **A2**: Check axis direction, scale type (linear/log), and legend-to-data mapping
- **A3**: Start y-axis at 0, or use axis break (`//`) marker if truncated
- **A4**: Add "(mean +/- std, n=5)" to caption or legend

---

## Dimension 3: Style (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| S1 | Palette matches venue | Colors from venue-specific palette in academic-styles.md | Using matplotlib default colors instead of venue palette |
| S2 | Colorblind safe | All series distinguishable under simulated deuteranopia | Red and green lines only differ by hue |
| S3 | Grayscale compatible | Figure readable when desaturated (for B&W printing) | Two series both map to medium gray |
| S4 | Font compliant | Font family and sizes match venue specification | Sans-serif used for NeurIPS (should be serif) |
| S5 | Size compliant | Figure fits within venue column width at stated dimensions | 8" wide figure for IEEE single-column (max 3.5") |

### How to Fix Common FAIL

- **S1**: Replace colors with hex values from `references/academic-styles.md`
- **S2**: Add secondary encoding (markers: o, s, ^, D, v) alongside color
- **S3**: Add line styles (solid, dashed, dotted, dash-dot) as tertiary encoding
- **S4**: Update `matplotlib.rcParams['font.family']`
- **S5**: Adjust `figsize` to venue constraints

---

## Dimension 4: Reproducibility (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| R1 | Code complete | Code runs without modification (all data inline, no external files) | References `data.csv` that doesn't exist |
| R2 | Imports present | All required imports listed at top of code | Missing `import seaborn as sns` |
| R3 | Output configured | Explicit DPI, savefig with bbox_inches='tight', both PNG and PDF | No savefig, or missing DPI, or only PNG |

**Scope**: Only applies to code-based figures (data-plot, comparison-chart, result-visualization). Auto-PASS for AI-generated figures.

### How to Fix Common FAIL

- **R1**: Replace file reads with inline data arrays (`np.array([...])`)
- **R2**: Run code mentally: every function call must have its import
- **R3**: Add `plt.savefig('fig.png', dpi=300, bbox_inches='tight')` + PDF variant

---

## Dimension 5: Caption (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| P1 | Content described | Caption describes WHAT the figure shows (not just "Results") | "Figure 1: Comparison." with no detail |
| P2 | Key finding stated | Caption highlights the main takeaway for the reader | Describes setup but not conclusion |
| P3 | Subfigures referenced | Multi-panel figures: each panel described in caption (a, b, c) | 4 panels but caption only mentions "left" and "right" |

### How to Fix Common FAIL

- **P1**: Structure as "Figure N: [verb] [what]. [context]." e.g., "Figure 3: Comparison of training convergence across 5 methods on CIFAR-10."
- **P2**: Add final sentence: "Method A converges 2x faster than the strongest baseline."
- **P3**: Add "(a) Training loss. (b) Validation accuracy. (c) Inference latency."

---

## Evaluation Output Template

```
## Critic Evaluation — Round N/3

| Dim.            | Items | Score | Failed IDs | Issues |
|-----------------|-------|-------|------------|--------|
| Clarity         | 5     | X/5   | [CX]       | [description] |
| Accuracy        | 4     | X/4   | [AX]       | [description] |
| Style           | 5     | X/5   | [SX]       | [description] |
| Reproducibility | 3     | X/3   | [RX]       | [description] |
| Caption         | 3     | X/3   | [PX]       | [description] |
|-----------------|-------|-------|------------|--------|
| **Total**       | **20**| **X/20** |         |        |

Verdict: PASS (>= 18) / REVISE (< 18)

[If REVISE:]
Revision actions:
1. [Fix for failed item 1]
2. [Fix for failed item 2]
...
```

---

## Revision Protocol

### Round Management

```
Round 1: Initial evaluation
    ↓ REVISE → apply fixes
Round 2: Re-evaluate ALL items (not just failed ones)
    ↓ REVISE → apply fixes
Round 3: Final evaluation
    ↓ REVISE → output with [!] Quality warning
    ↓ PASS → proceed to Step 6
```

### Rules

1. Each round evaluates ALL 20 items fresh (fixes can introduce new issues)
2. Revision must address every FAIL item, with no "defer to next round"
3. After round 3, if still < 18/20:
   - Output the figure with explicit warning
   - List all remaining FAIL items
   - Suggest user manual review
4. A PASS in any round immediately proceeds to Step 6 (no unnecessary iterations)

### Quality Warning Format

```
[!] Quality warning: Figure output with 2 unresolved issues after 3 revision rounds.
Remaining issues:
- [S2] Colorblind safety: series 3 and 5 may be confusing under deuteranopia
- [C4] Minor legend overlap with rightmost data point
Recommendation: Manual review before submission.
```

---

*Cross-reference: SKILL.md §Step 5, §NEVER Rule #7*
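
The scoring rules and round management above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the names `score`, `run_critic`, `evaluate`, and `revise` are hypothetical, and representing a not-applicable item as `None` (auto-PASS) is an assumption made for this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

# The 20 checklist item IDs, in dimension order (C, A, S, R, P).
ITEM_IDS: List[str] = [
    "C1", "C2", "C3", "C4", "C5",
    "A1", "A2", "A3", "A4",
    "S1", "S2", "S3", "S4", "S5",
    "R1", "R2", "R3",
    "P1", "P2", "P3",
]
PASS_THRESHOLD = 18
MAX_ROUNDS = 3

# True = PASS, False = FAIL, None = not applicable (assumed encoding).
Results = Dict[str, Optional[bool]]


def score(results: Results) -> Tuple[int, List[str]]:
    """Return (total points, failed IDs). Not-applicable items auto-PASS."""
    failed = [item for item in ITEM_IDS if results[item] is False]
    return len(ITEM_IDS) - len(failed), failed


def run_critic(
    evaluate: Callable[[], Results],
    revise: Callable[[List[str]], None],
) -> Tuple[str, int, int, List[str]]:
    """Run up to MAX_ROUNDS evaluations; return (verdict, round, total, failed)."""
    total, failed = 0, list(ITEM_IDS)
    for round_no in range(1, MAX_ROUNDS + 1):
        # Every round re-evaluates all 20 items, since fixes can add new issues.
        total, failed = score(evaluate())
        if total >= PASS_THRESHOLD:
            return "PASS", round_no, total, failed
        if round_no < MAX_ROUNDS:
            # A revision must address every flagged FAIL item.
            revise(failed)
    # Still below threshold after the final round: emit a quality warning.
    return "WARN", MAX_ROUNDS, total, failed
```

For an AI-generated figure, the caller would set `R1`, `R2`, and `R3` to `None` so the Reproducibility dimension auto-PASSes; the `failed` list returned with a `"WARN"` verdict is what would populate the "Remaining issues" section of the quality warning.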