ProveIQ
Primary PMF Gate · Constitution §6.1

How often does our AI agree with humans?

Every employer who views a ProveIQ-evaluated submission is asked to independently rate the same work — before they see Claude's score. We track the delta, publish the rate, and don't hide bad days. Our target is ≥80% sustained. This is the single number that matters in the first 90 days.

Agreement (demo): AI-employer agreement, 30-day rolling and by domain.
Real submissions: 0 · demo data shown while the pilot cohort onboards.

How we measure it

  1. Candidate submits work. The submission enters our pipeline and is scored by our frontier evaluation layer against the employer's rubric.
  2. Employer is prompted to rate independently. Before the AI score is revealed, the employer scores the submission themselves on the same 0-100 scale.
  3. We compute the delta. Agreement = |AI score − employer score| ≤ 1.0 points. Both scores are stored in EmployerScoreRating.
  4. We publish the rate over a rolling 90-day window. A per-domain breakdown is available to employers at /admin/agreement-rate, and the public aggregate is served at /api/public/agreement-rate.
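The measurement steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not ProveIQ's implementation: the `ScorePair` record is a hypothetical stand-in for an EmployerScoreRating row (whose real schema is not documented here), and `agrees`, `agreement_rate`, and `agreement_by_domain` are illustrative names. The 1.0-point tolerance and the 90-day rolling window come from steps 3 and 4.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


# Hypothetical stand-in for one EmployerScoreRating row; the real schema may differ.
@dataclass
class ScorePair:
    domain: str
    ai_score: float        # AI score, 0-100 scale
    employer_score: float  # employer's blind score, same scale
    rated_at: datetime     # when the employer submitted their score


AGREEMENT_TOLERANCE = 1.0    # points, from step 3
WINDOW = timedelta(days=90)  # rolling window, from step 4


def agrees(pair: ScorePair, tolerance: float = AGREEMENT_TOLERANCE) -> bool:
    """A pair agrees when |AI score - employer score| is within the tolerance."""
    return abs(pair.ai_score - pair.employer_score) <= tolerance


def agreement_rate(pairs: Iterable[ScorePair], now: datetime,
                   window: timedelta = WINDOW) -> Optional[float]:
    """Share of in-window pairs that agree, or None if the window is empty."""
    recent = [p for p in pairs if now - window <= p.rated_at <= now]
    if not recent:
        return None
    return sum(agrees(p) for p in recent) / len(recent)


def agreement_by_domain(pairs: Iterable[ScorePair], now: datetime,
                        window: timedelta = WINDOW) -> Dict[str, Optional[float]]:
    """Per-domain agreement rate over the same rolling window."""
    grouped: Dict[str, List[ScorePair]] = defaultdict(list)
    for p in pairs:
        grouped[p.domain].append(p)
    return {domain: agreement_rate(ps, now, window) for domain, ps in grouped.items()}


if __name__ == "__main__":
    now = datetime(2025, 1, 31)
    pairs = [
        ScorePair("backend", 82.0, 81.5, now - timedelta(days=3)),   # gap 0.5: agrees
        ScorePair("backend", 70.0, 74.0, now - timedelta(days=10)),  # gap 4.0: disagrees
        ScorePair("design", 90.0, 90.0, now - timedelta(days=200)),  # outside the window
    ]
    print(agreement_rate(pairs, now))       # 0.5
    print(agreement_by_domain(pairs, now))  # {'backend': 0.5, 'design': None}
```

In this sketch a domain whose ratings all fall outside the window reports `None` rather than 0.0, so a domain with no recent data is not mistaken for one with total disagreement.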

What the thresholds mean

≥80%

PMF gate met. AI scoring is trusted by employers. Marketing spend unfrozen.

75–80%

Warning band. Per-domain investigation. No marketing scale-up.

<75%

Alert. Domain flagged for AI retraining. Marketing spend reassessed.
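The three bands can be expressed as a small classifier. This sketch assumes the bands are half-open: exactly 80% counts as the gate being met, and exactly 75% falls in the warning band, since only rates below 75% trigger an alert. The `Band` enum, `ACTIONS` map, and `classify` function are hypothetical names. The "sustained" requirement from the introduction is not modeled; how long the rate must hold at or above 80% is a separate policy choice.

```python
from enum import Enum


class Band(Enum):
    GATE_MET = "PMF gate met"
    WARNING = "Warning band"
    ALERT = "Alert"


# Actions paraphrased from the threshold descriptions above.
ACTIONS = {
    Band.GATE_MET: "AI scoring trusted; marketing spend unfrozen",
    Band.WARNING: "Per-domain investigation; no marketing scale-up",
    Band.ALERT: "Domain flagged for AI retraining; marketing spend reassessed",
}


def classify(rate: float) -> Band:
    """Map an agreement rate in [0.0, 1.0] onto the published bands."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be between 0 and 1, got {rate}")
    if rate >= 0.80:
        return Band.GATE_MET
    if rate >= 0.75:  # 75% up to, but not including, 80%
        return Band.WARNING
    return Band.ALERT


if __name__ == "__main__":
    for r in (0.83, 0.78, 0.70):
        band = classify(r)
        print(f"{r:.0%}: {band.value} ({ACTIONS[band]})")
```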

Why publish this?

The only durable proof that AI evaluation works is showing the delta between AI and the humans who also graded the same work. Every other metric — signups, NPS, headline scores — is downstream of this one.

We publish the rate even on bad days. Constitution §6.2 mandates it. If the rate drops, we tell you, we investigate, and we publish the fix.