Critic Checklist Reference

Loaded by Step 5 (Critic Evaluation). 5 dimensions, 20 items. Each scored PASS/FAIL. Threshold: 18/20 to pass. Maximum 3 revision rounds.

Table of Contents

  1. Scoring Rules
  2. Dimension 1: Clarity (5 items)
  3. Dimension 2: Accuracy (4 items)
  4. Dimension 3: Style (5 items)
  5. Dimension 4: Reproducibility (3 items)
  6. Dimension 5: Caption (3 items)
  7. Evaluation Output Template
  8. Revision Protocol

Scoring Rules

  • Each item: PASS (1 point) or FAIL (0 points)
  • Total: 20 points maximum
  • Pass threshold: 18/20 (90%)
  • Maximum revision rounds: 3
  • Each revision must address ALL flagged FAIL items
  • Items not applicable to the figure type: auto-PASS (e.g., error bars for conceptual figures)
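
The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `score_checklist`, the `None`-means-not-applicable convention, and the example run are all assumptions made for the sketch.

```python
from typing import Dict, Optional

PASS_THRESHOLD = 18  # out of 20, per the scoring rules


def score_checklist(results: Dict[str, Optional[bool]]) -> dict:
    """Score one checklist run.

    `results` maps item IDs (e.g. "C1") to True (PASS), False (FAIL),
    or None (not applicable to this figure type, counted as auto-PASS).
    """
    failed = sorted(item for item, ok in results.items() if ok is False)
    total = len(results) - len(failed)  # not-applicable items count as PASS
    verdict = "PASS" if total >= PASS_THRESHOLD else "REVISE"
    return {"total": total, "failed": failed, "verdict": verdict}


# All 20 item IDs across the five dimensions.
ITEM_IDS = (
    [f"C{i}" for i in range(1, 6)]
    + [f"A{i}" for i in range(1, 5)]
    + [f"S{i}" for i in range(1, 6)]
    + [f"R{i}" for i in range(1, 4)]
    + [f"P{i}" for i in range(1, 4)]
)

# Hypothetical run: A4 not applicable (conceptual figure), three FAILs.
example = {item: True for item in ITEM_IDS}
example["A4"] = None
example.update({"C4": False, "S2": False, "P2": False})
print(score_checklist(example))
```

With three FAIL items the total is 17/20, so the verdict is REVISE; two or fewer FAIL items would pass.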

Dimension 1: Clarity (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| C1 | Caption self-explanatory | Reader understands figure without reading paper body | Caption says only "Results" with no detail |
| C2 | Labels readable | All text >= minimum font size at print dimensions | 6pt text on IEEE single-column figure |
| C3 | Visual hierarchy | Most important data visually prominent (size, color, position) | All lines same weight, "Ours" indistinguishable |
| C4 | No overlapping elements | Labels, legends, data points, annotations do not occlude each other | Legend sits on top of data region |
| C5 | Adequate whitespace | Margins around figure, between subfigures, around legends | Elements crammed to edges, no breathing room |

How to Fix Common FAIL

  • C1: Add "showing that [method] achieves [X]% improvement over [baseline]" to caption
  • C2: Increase font size or reduce figure complexity (fewer elements)
  • C3: Use thicker lines / brighter colors for "Ours", thinner / muted for baselines
  • C4: Relocate legend (outside plot area, or to whitespace region)
  • C5: Add plt.tight_layout(pad=1.5) or increase figure dimensions
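
The C3, C4, and C5 fixes can be combined in one matplotlib sketch. The data values are invented for illustration, and the colors are placeholders rather than a venue palette; only the line-weight, legend-placement, and padding calls are the point.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up illustrative values, not results from any paper.
epochs = [1, 2, 3, 4, 5]
series = {
    "Baseline A": [0.62, 0.68, 0.71, 0.73, 0.74],
    "Baseline B": [0.60, 0.66, 0.70, 0.72, 0.73],
    "Ours": [0.65, 0.74, 0.79, 0.82, 0.84],
}
linestyles = {"Baseline A": "--", "Baseline B": ":", "Ours": "-"}

fig, ax = plt.subplots(figsize=(3.5, 2.6))
for name, values in series.items():
    is_ours = name == "Ours"
    ax.plot(
        epochs,
        values,
        label=name,
        linestyle=linestyles[name],
        linewidth=2.5 if is_ours else 1.0,        # C3: thicker line for "Ours"
        color="tab:blue" if is_ours else "gray",  # C3: muted baselines
    )
ax.set_xlabel("Epoch")
ax.set_ylabel("Accuracy")

# C4: place the legend outside the plot area so it cannot cover data.
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), frameon=False)

# C5: add padding around the axes.
fig.tight_layout(pad=1.5)
```

The two baselines share a color but differ in line style, so they stay distinguishable without competing with "Ours" for attention.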

Dimension 2: Accuracy (4 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| A1 | Data fidelity | Plotted values exactly match source data (spot-check 3 points) | Transposed rows/columns, wrong method assigned to line |
| A2 | Encoding faithful | Visual encoding matches data semantics (e.g., larger bar = larger value) | Inverted y-axis without indicator, log scale unlabeled |
| A3 | Axes honest | No truncated axes without break markers; scale is fair | Y-axis starts at 90% making 1% difference look huge |
| A4 | Error bars correct | Error bars represent stated metric (std, sem, CI) and are labeled | Error bars present but unlabeled, or wrong metric |

How to Fix Common FAIL

  • A1: Cross-reference code data arrays with source table, print values to verify
  • A2: Check axis direction, scale type (linear/log), and legend-to-data mapping
  • A3: Start y-axis at 0, or use axis break (//) marker if truncated
  • A4: Add "(mean +/- std, n=5)" to caption or legend
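
The A1 spot-check can be automated along these lines. This is a sketch under assumptions: the helper name `spot_check`, the `(method, dataset)` key layout, and the numbers in the example are hypothetical and stand in for whatever the source table and plotting code actually contain.

```python
import math
import random


def spot_check(source, plotted, n_points=3, seed=0, tol=1e-9):
    """Compare n_points randomly chosen cells between the source table and
    the values used in the plotting code. Returns a list of mismatches as
    (key, source_value, plotted_value); an empty list means the sample agrees.
    """
    keys = sorted(source)
    rng = random.Random(seed)  # fixed seed keeps the check repeatable
    sample = rng.sample(keys, min(n_points, len(keys)))
    mismatches = []
    for key in sample:
        plotted_value = plotted.get(key)
        if plotted_value is None or not math.isclose(
            source[key], plotted_value, abs_tol=tol
        ):
            mismatches.append((key, source[key], plotted_value))
    return mismatches


# Hypothetical source table keyed by (method, dataset).
source_table = {
    ("Ours", "Set-1"): 0.84,
    ("Ours", "Set-2"): 0.79,
    ("Baseline", "Set-1"): 0.74,
    ("Baseline", "Set-2"): 0.71,
}
# Plot data with the two methods accidentally swapped (a typical A1 FAIL).
swapped = {(("Baseline" if m == "Ours" else "Ours"), d): v
           for (m, d), v in source_table.items()}

print(spot_check(source_table, dict(source_table)))          # no mismatches
print(spot_check(source_table, swapped, n_points=4))         # every cell differs
```

A three-point sample can miss an isolated error, so checking every cell is the safer choice when the table is small.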

Dimension 3: Style (5 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| S1 | Palette matches venue | Colors from venue-specific palette in academic-styles.md | Using matplotlib default colors instead of venue palette |
| S2 | Colorblind safe | All series distinguishable under simulated deuteranopia | Red and green lines only differ by hue |
| S3 | Grayscale compatible | Figure readable when desaturated (for B&W printing) | Two series both map to medium gray |
| S4 | Font compliant | Font family and sizes match venue specification | Sans-serif used for NeurIPS (should be serif) |
| S5 | Size compliant | Figure fits within venue column width at stated dimensions | 8" wide figure for IEEE single-column (max 3.5") |

How to Fix Common FAIL

  • S1: Replace colors with hex values from references/academic-styles.md
  • S2: Add secondary encoding (markers: o, s, ^, D, v) alongside color
  • S3: Add line styles (solid, dashed, dotted, dash-dot) as tertiary encoding
  • S4: Update matplotlib.rcParams['font.family']
  • S5: Adjust figsize to venue constraints
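
For S3, a rough grayscale check can be run on the palette before plotting. The sketch below approximates desaturation with Rec. 601 luma weights applied directly to the hex values; the helper names and the `min_gap` threshold of 0.10 are assumptions for illustration, not a venue requirement.

```python
from itertools import combinations


def hex_to_gray(hex_color: str) -> float:
    """Approximate gray level in [0, 1] using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_conflicts(palette, min_gap=0.10):
    """Return pairs of series whose gray levels differ by less than min_gap.

    min_gap is an assumed threshold chosen for this sketch.
    """
    grays = {name: hex_to_gray(color) for name, color in palette.items()}
    return [
        (a, b)
        for a, b in combinations(grays, 2)
        if abs(grays[a] - grays[b]) < min_gap
    ]


# Pure red and a mid green land on nearly the same gray level.
palette = {"Method A": "#FF0000", "Method B": "#008000", "Method C": "#0000FF"}
print(grayscale_conflicts(palette))
```

Any pair this reports is a candidate for the S2 and S3 fixes: distinct markers or line styles in addition to color.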

Dimension 4: Reproducibility (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| R1 | Code complete | Code runs without modification (all data inline, no external files) | References data.csv that doesn't exist |
| R2 | Imports present | All required imports listed at top of code | Missing import seaborn as sns |
| R3 | Output configured | Explicit DPI, savefig with bbox_inches='tight', both PNG and PDF | No savefig, or missing DPI, or only PNG |

Scope: Only applies to code-based figures (data-plot, comparison-chart, result-visualization). Auto-PASS for AI-generated figures.

How to Fix Common FAIL

  • R1: Replace file reads with inline data arrays (np.array([...]))
  • R2: Trace the code line by line and confirm that every module or function it uses is imported at the top
  • R3: Add plt.savefig('fig.png', dpi=300, bbox_inches='tight') + PDF variant
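
R3 can be checked statically with Python's `ast` module, without running the figure code. This is one possible sketch: the function name `check_savefig` and the problem messages are hypothetical, and it only recognizes filenames passed as string literals.

```python
import ast


def check_savefig(source: str) -> list:
    """Return R3 problems found in figure code; an empty list means PASS."""
    extensions = set()
    incomplete_call = False
    for node in ast.walk(ast.parse(source)):
        is_savefig = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "savefig"
        )
        if not is_savefig:
            continue
        # Record the file extension when the filename is a string literal.
        if node.args and isinstance(node.args[0], ast.Constant):
            name = node.args[0].value
            if isinstance(name, str) and "." in name:
                extensions.add(name.rsplit(".", 1)[-1].lower())
        kwargs = {kw.arg: kw.value for kw in node.keywords}
        tight = kwargs.get("bbox_inches")
        has_tight = isinstance(tight, ast.Constant) and tight.value == "tight"
        if "dpi" not in kwargs or not has_tight:
            incomplete_call = True

    problems = []
    for ext in ("png", "pdf"):
        if ext not in extensions:
            problems.append(f"no savefig call writing a .{ext} file")
    if incomplete_call:
        problems.append("savefig call missing dpi= or bbox_inches='tight'")
    return problems


snippet = """
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.savefig('fig.png', dpi=300, bbox_inches='tight')
"""
print(check_savefig(snippet))  # reports the missing PDF variant
```

Because the check parses the code rather than executing it, it works even when the figure code itself would fail on a missing import or data file.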

Dimension 5: Caption (3 items)

| ID | Check | PASS Criteria | Common FAIL |
|----|-------|---------------|-------------|
| P1 | Content described | Caption describes WHAT the figure shows (not just "Results") | "Figure 1: Comparison." with no detail |
| P2 | Key finding stated | Caption highlights the main takeaway for the reader | Describes setup but not conclusion |
| P3 | Subfigures referenced | Multi-panel figures: each panel described in caption (a, b, c) | 4 panels but caption only mentions "left" and "right" |

How to Fix Common FAIL

  • P1: Structure as "Figure N: [verb] [what]. [context]." e.g., "Figure 3: Comparison of training convergence across 5 methods on CIFAR-10."
  • P2: Add final sentence: "Method A converges 2x faster than the strongest baseline."
  • P3: Add "(a) Training loss. (b) Validation accuracy. (c) Inference latency."
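
The P3 check is mechanical enough to script. The sketch below assumes panels are labeled with lowercase letters in parentheses, as in the P3 fix above; the function name `missing_panel_labels` is hypothetical.

```python
import re
import string


def missing_panel_labels(caption: str, n_panels: int) -> list:
    """Return panel labels such as "(c)" that the caption never mentions."""
    expected = [f"({letter})" for letter in string.ascii_lowercase[:n_panels]]
    found = set(re.findall(r"\([a-z]\)", caption))
    return [label for label in expected if label not in found]


caption = "(a) Training loss. (b) Validation accuracy. (c) Inference latency."
print(missing_panel_labels(caption, 3))  # nothing missing for three panels
print(missing_panel_labels(caption, 4))  # a fourth panel would be undescribed
```

A caption that uses only "left" and "right" for a four-panel figure returns all four labels as missing, which matches the common FAIL for P3.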

Evaluation Output Template

## Critic Evaluation — Round N/3

| Dim.            | Items | Score | Failed IDs | Issues |
|-----------------|-------|-------|------------|--------|
| Clarity         | 5     | X/5   | [CX]       | [description] |
| Accuracy        | 4     | X/4   | [AX]       | [description] |
| Style           | 5     | X/5   | [SX]       | [description] |
| Reproducibility | 3     | X/3   | [RX]       | [description] |
| Caption         | 3     | X/3   | [PX]       | [description] |
|-----------------|-------|-------|------------|--------|
| **Total**       | **20**| **X/20** |         |        |

Verdict: PASS (>= 18) / REVISE (< 18)

[If REVISE:]
Revision actions:
1. [Fix for failed item 1]
2. [Fix for failed item 2]
...

Revision Protocol

Round Management

Round 1: Initial evaluation
  ↓ REVISE → apply fixes
Round 2: Re-evaluate ALL items (not just failed ones)
  ↓ REVISE → apply fixes
Round 3: Final evaluation
  ↓ REVISE → output with [!] Quality warning
  ↓ PASS → proceed to Step 6

Rules

  1. Each round evaluates ALL 20 items fresh (fixes can introduce new issues)
  2. Revision must address every FAIL item — no "defer to next round"
  3. After round 3, if still < 18/20:
    • Output the figure with explicit warning
    • List all remaining FAIL items
    • Suggest user manual review
  4. A PASS in any round immediately proceeds to Step 6 (no unnecessary iterations)
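
The round management and rules above amount to a bounded loop. The sketch below is illustrative only: `run_critic_loop`, `evaluate`, and `revise` are hypothetical names, with `evaluate` standing in for a full 20-item evaluation and `revise` for whatever applies the fixes.

```python
MAX_ROUNDS = 3
TOTAL_ITEMS = 20
PASS_THRESHOLD = 18


def run_critic_loop(figure, evaluate, revise):
    """Run up to MAX_ROUNDS evaluations.

    evaluate(figure) -> list of failed item IDs, checking all 20 items.
    revise(figure, failed) -> new figure addressing every failed item.
    Returns (figure, rounds_used, warning); warning is None on PASS.
    """
    failed = []
    for round_no in range(1, MAX_ROUNDS + 1):
        failed = evaluate(figure)  # rule 1: all items evaluated fresh
        if TOTAL_ITEMS - len(failed) >= PASS_THRESHOLD:
            return figure, round_no, None  # rule 4: stop on the first PASS
        if round_no < MAX_ROUNDS:
            figure = revise(figure, failed)  # rule 2: fix every FAIL item

    # Rule 3: still below threshold after the final round.
    lines = [
        f"[!] Quality warning: Figure output with {len(failed)} unresolved "
        f"issues after {MAX_ROUNDS} revision rounds.",
        "",
        "Remaining issues:",
        *[f"- [{item}]" for item in failed],
        "",
        "Recommendation: Manual review before submission.",
    ]
    return figure, MAX_ROUNDS, "\n".join(lines)


# Hypothetical stand-ins: the "figure" is a revision counter, and the
# evaluator reports three FAILs until the figure has been revised twice.
def fake_evaluate(fig):
    return ["C4", "S2", "P2"] if fig < 2 else ["S2"]


def fake_revise(fig, failed):
    return fig + 1


print(run_critic_loop(0, fake_evaluate, fake_revise))
```

Here the figure passes in round 3 with one remaining FAIL (19/20), so no warning is produced; an evaluator that never drops below three FAILs would return the warning text instead.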

Quality Warning Format

[!] Quality warning: Figure output with 3 unresolved issues after 3 revision rounds.

Remaining issues:
- [S2] Colorblind safety: series 3 and 5 may be confusing under deuteranopia
- [C4] Minor legend overlap with rightmost data point
- [P2] Caption describes the setup but does not state the key finding

Recommendation: Manual review before submission.

Cross-reference: SKILL.md §Step 5, §NEVER Rule #7