Problem domain / PD-505

LLM Subjective Review System

An LLM-based blind-review pipeline that evaluates code design quality in batches and merges the results of multiple review rounds.

Sub-problems

1. Review packet generation with file context

2. Batch splitting and parallel subagent execution

3. Evidence-weighted score merging across batches

4. Finding deduplication and concept matching

5. Dimension-specific prompt contract injection for review consistency

6. Score-independent evidence weighting prevents score-aware bias in merge

7. Dynamic score cap based on finding severity prevents high-score-with-findings contradiction
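Sub-problems 1 and 2 can be illustrated with a minimal Python sketch. It is not taken from any project's code: the field names (`score`, `previous_score`, `dimension_scores`, `theme`, `path`) and the helpers `make_blind_packet`, `split_batches`, and `run_batches` are hypothetical, and `review_fn` stands in for whatever actually calls a subagent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical key names; the real packet schema is not given in the source.
SCORE_FIELDS = {"score", "previous_score", "dimension_scores"}


def make_blind_packet(packet):
    """Return a copy of the packet with score-bearing keys removed at every level."""
    def strip(value):
        if isinstance(value, dict):
            return {k: strip(v) for k, v in value.items() if k not in SCORE_FIELDS}
        if isinstance(value, list):
            return [strip(v) for v in value]
        return value
    return strip(packet)


def split_batches(files):
    """Group file paths by their 'theme' label, one batch per theme."""
    batches = {}
    for entry in files:
        batches.setdefault(entry["theme"], []).append(entry["path"])
    return batches


def run_batches(batches, review_fn, max_workers=4):
    """Run review_fn(theme, paths) for every batch in parallel threads."""
    names = list(batches)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = list(pool.map(lambda name: review_fn(name, batches[name]), names))
    return dict(zip(names, results))
```

Stripping scores before the packet reaches a subagent is what keeps the review blind; grouping by theme lets each subagent see related files together.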

Solutions by project (1 solution)


Cross-project comparison

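| Dimension | Desloppify |
| --- | --- |
| Review method | Blind review by LLM subagents, with 9 themed batches run in parallel |
| Evaluation dimensions | JSON-driven configurable dimensions (abstraction_fitness, design_coherence, and others; 15+ dimensions) |
| Evaluation granularity | Dimension-level score from 0 to 100, evidence-weighted merge, and finding-pressure downward adjustment |
| Iteration mechanism | Auto-resolve of old findings, plus incremental reruns that fill in missing dimensions |
| Blind-review isolation | The blind packet strips existing scores, removing the LLM anchoring effect |
| Merge algorithm | 70% weighted mean + 30% lowest batch score + severity-based pressure penalty + dynamic score cap |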
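The merge algorithm row can be sketched as follows. Only the 70/30 split between the weighted mean and the lowest batch score comes from the table; the penalty and cap values per severity, the severity labels, and the function name `merge_dimension_score` are assumptions chosen for illustration. Evidence weights are passed in by the caller and are assumed to be derived from something other than the score itself (for example, how much evidence a batch examined).

```python
# Hypothetical values; the source names the mechanisms but not the numbers.
PRESSURE_PENALTY = {"low": 1.0, "medium": 2.5, "high": 5.0}
SCORE_CAP = {"low": 95.0, "medium": 85.0, "high": 70.0}


def merge_dimension_score(batch_scores, finding_severities):
    """Merge per-batch scores for one dimension into a single 0-100 score.

    batch_scores: list of (score, evidence_weight) pairs, weights > 0.
    finding_severities: list of severity labels for that dimension's findings.
    """
    total_weight = sum(weight for _, weight in batch_scores)
    weighted_mean = sum(score * weight for score, weight in batch_scores) / total_weight
    lowest = min(score for score, _ in batch_scores)

    # 70% weighted mean + 30% lowest batch score (from the comparison table).
    blended = 0.7 * weighted_mean + 0.3 * lowest

    # Each finding pushes the score down by a severity-dependent amount.
    penalty = sum(PRESSURE_PENALTY[sev] for sev in finding_severities)

    # The most severe finding sets the ceiling; no findings means no cap.
    cap = min((SCORE_CAP[sev] for sev in finding_severities), default=100.0)

    return max(0.0, min(blended - penalty, cap))
```

With these assumed numbers, two batches that both score 90 but carry one high-severity finding end at the cap of 70 rather than 85, which is the kind of high-score-with-findings contradiction the cap is meant to rule out.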

Best practices

1. Fail-closed import validation prevents low-quality reviews

2. Blind review isolation prevents score-aware bias

3. 70/30 weighted-mean/floor blend prevents a single outlier batch from dominating

4. Positive observation filter rejects non-defect findings at import time

5. Prompt contract ensures score-finding consistency across all subagents
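Practices 1 and 4 can be sketched together as an import step that rejects the whole payload on any structural problem and silently drops findings that are praise rather than defects. The payload shape, the required keys, the marker phrases, and the names `import_review` and `ImportRejected` are all hypothetical; a real filter would likely need something more robust than substring matching.

```python
# Hypothetical schema and phrases, chosen only to illustrate the idea.
REQUIRED_FINDING_KEYS = {"dimension", "summary", "severity"}
POSITIVE_MARKERS = ("well structured", "good use of", "no issues")


class ImportRejected(ValueError):
    """Raised when a review payload fails validation; nothing is imported."""


def is_positive_observation(finding):
    """True if the summary reads as praise instead of a defect."""
    text = finding["summary"].lower()
    return any(marker in text for marker in POSITIVE_MARKERS)


def import_review(payload, allowed_dimensions):
    """Validate a subagent's review; raise on any problem (fail closed)."""
    scores = payload.get("scores")
    findings = payload.get("findings")
    if not isinstance(scores, dict) or not isinstance(findings, list):
        raise ImportRejected("payload must contain 'scores' and 'findings'")

    for dimension, score in scores.items():
        if dimension not in allowed_dimensions:
            raise ImportRejected(f"unknown dimension: {dimension}")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            raise ImportRejected(f"non-numeric score for {dimension}")
        if not 0 <= score <= 100:
            raise ImportRejected(f"score out of range for {dimension}")

    kept = []
    for finding in findings:
        if not isinstance(finding, dict) or not REQUIRED_FINDING_KEYS <= finding.keys():
            raise ImportRejected("finding is missing required keys")
        if is_positive_observation(finding):
            continue  # praise is not a defect, so it is not imported
        kept.append(finding)

    return {"scores": dict(scores), "findings": kept}
```

Failing closed means a malformed review never contributes a partial score; the batch has to be rerun instead.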