〔AIVIA〕 Evaluations: Additional Mechanics

Evaluation Mechanics

HOW AIVIA EVALUATIONS WORK UNDER THE HOOD



01 Retakes: How Second Attempts Work


Every evaluation on AIVIA allows two attempts. The second attempt is available immediately — there is no cooldown or waiting period.


What changes on the second attempt: The second attempt is a new evaluation of the same component, testing the same rubric dimensions from a different angle.


How scoring works: The second attempt’s score replaces the first. If the second score is higher, the upgrade is reflected in badges and portfolio. If the second score is lower, it still replaces the first — retakes are not risk-free.


What happens to the reports: Both reports remain visible in the dashboard, so candidates can review feedback from both attempts side by side. However, only the second attempt’s score counts for badges and is used when employers view the profile.


When to retake: The feedback from the first attempt is the best preparation for the second. If the report identified specific gaps — prioritization, evidence interpretation, tradeoff reasoning — working on those areas before retaking gives the second attempt a purpose beyond just hoping for a better score.

There is no penalty for using both attempts. There is also no third attempt — two is the maximum per evaluation.



02 Expected Background Levels


Every AIVIA evaluation has an expected background level: Beginner, Intermediate, or Advanced. This calibrates both the questions and the scoring to match the candidate’s expected experience.


What the level changes:

Questions. A Beginner evaluation probes foundational reasoning. An Intermediate evaluation assumes working experience and probes deeper tradeoffs. An Advanced evaluation assumes significant depth and pushes into edge cases, system-level judgment, and complex failure modes.

Scoring. What counts as “Competent” (3/5) shifts with the level: a response earning 3/5 at Intermediate must demonstrate more depth than one earning 3/5 at Beginner.


Who sets it: Each scenario post shows a default expected background level. Most evaluations default to Intermediate. Hiring teams can change the level in the context pack when creating or customizing an evaluation.


How each level is typically used:

Beginner. Best for practice, early-career candidates, or getting familiar with the AIVIA evaluation format before attempting a higher level.

Intermediate. The standard level for most evaluations. Assumes working experience in the domain. This is the level that counts toward badge aggregation.

Advanced. For senior or staff-level candidates. Expects deep expertise and the ability to reason through complex, ambiguous scenarios.



:stemaway_bulb: Only evaluations taken at the Intermediate level count toward badge aggregation. Beginner and Advanced evaluations still produce full reports but do not contribute to component, subdomain, or domain badges.
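The badge-eligibility rule above amounts to a simple level filter. This sketch assumes a hypothetical record shape and component names; the real aggregation pipeline is internal to AIVIA:

```python
# Only Intermediate-level evaluations feed badge aggregation; Beginner
# and Advanced runs still produce full reports but are filtered out here.

def badge_eligible(results: list[dict]) -> list[dict]:
    return [r for r in results if r["level"] == "Intermediate"]

runs = [
    {"component": "Caching Layer", "level": "Beginner",     "score": 4.0},
    {"component": "Caching Layer", "level": "Intermediate", "score": 3.5},
    {"component": "Queue Design",  "level": "Advanced",     "score": 4.5},
]
# Only the Intermediate run would count toward badges.
```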



03 Debug Mode vs Design Mode


Every AIVIA evaluation runs in one of two modes: debug or design. The mode determines how AIVIA tests the component — with an emphasis on diagnosing problems or on making design decisions. Most components have evaluations available in both modes. Candidates can see which mode their evaluation is in.


Debug Mode

Diagnose, Isolate, Resolve

Scenario topics where something has gone wrong — a system failure, a performance issue, an unexpected behavior.

Root Cause Isolation. Can the candidate narrow down where the problem originates?

Evidence Interpretation. Can the candidate read the available signals correctly?

Impact Assessment. Does the candidate understand the consequences of the failure and the fix?

Hidden Dimension. One additional dimension generated by the LLM based on the specific component. Not shown in advance.


Design Mode

Build, Architect, Decide

Scenario topics where something needs to be built, architected, or decided.

Tradeoff Analysis. Can the candidate weigh competing options and articulate why one approach is better?

Gap Identification. Does the candidate see what’s missing or what could go wrong?

Preventive Thinking. Does the candidate design for resilience, not just the happy path?

Hidden Dimension. One additional dimension generated by the LLM based on the specific component. Not shown in advance.


Three general dimensions are scored in every evaluation regardless of mode: Clarity, Prioritization, and Reasoning Quality. The scoring scale (1–5), weighting, and grading thresholds are identical across modes. For full details on scoring and grading, see the Rubric & Scoring reference post.
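Putting the two modes together: every evaluation scores the three general dimensions plus its mode-specific set, each on a 1–5 scale. The equal weighting in this sketch is an assumption; AIVIA's actual weights and thresholds are in the Rubric & Scoring reference post:

```python
# Illustrative roll-up of per-dimension scores (1-5) into an overall score.
# Equal weights are an assumption, not AIVIA's documented weighting.

GENERAL = ["Clarity", "Prioritization", "Reasoning Quality"]
DEBUG_MODE = ["Root Cause Isolation", "Evidence Interpretation",
              "Impact Assessment", "Hidden Dimension"]
DESIGN_MODE = ["Tradeoff Analysis", "Gap Identification",
               "Preventive Thinking", "Hidden Dimension"]

def overall(scores: dict[str, int], mode: str) -> float:
    dims = GENERAL + (DEBUG_MODE if mode == "debug" else DESIGN_MODE)
    assert all(1 <= scores[d] <= 5 for d in dims)  # 1-5 scale per dimension
    return sum(scores[d] for d in dims) / len(dims)
```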



04 What Happens if the Evaluation Has a Technical Issue


AIVIA has multiple layers of failsafes built into the evaluation engine. Most technical issues are handled automatically without interrupting the candidate’s experience.


What the system handles silently: If a component of the evaluation encounters a temporary issue — a slow response, a failed processing step — the system retries automatically and uses fallback mechanisms to keep the evaluation moving. In most cases, the candidate won’t notice anything beyond a brief pause.
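The retry-and-fallback behavior described above follows a common pattern, sketched below. Function names, retry counts, and delays are illustrative, not AIVIA internals:

```python
# Minimal retry-with-fallback sketch: retry a flaky processing step a few
# times, then fall back rather than interrupt the evaluation.
import time

def with_retries(step, fallback, attempts=3, delay=0.0):
    """Run `step`; on transient failure, retry, then use `fallback`."""
    for _ in range(attempts):
        try:
            return step()
        except Exception:
            time.sleep(delay)  # the brief pause a candidate might notice
    return fallback()          # keep the evaluation moving
```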


What the candidate might see: In rare cases where the issue can’t be resolved silently, the candidate may see a brief message like “Reviewing your response…” or “Interviewer is thinking…” while the system recovers. The evaluation continues once the issue is resolved.


If the evaluation can’t continue: If a critical failure occurs before the evaluation has started, the candidate sees a message asking them to try again in a few minutes. No answers are lost because none were submitted yet.


How scoring is protected: The grading system requires a minimum number of scored responses to produce a valid grade. If technical issues cause some responses to go unscored, the system adjusts accordingly. If too few responses were successfully scored, the result is marked as “Insufficient Data” rather than assigned a potentially inaccurate grade. The final report will note if any responses could not be evaluated.
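The "Insufficient Data" safeguard can be sketched as a minimum-responses guard. The threshold value and function shape here are hypothetical, not AIVIA's actual settings:

```python
# Sketch of the grading safeguard: require a minimum number of scored
# responses before producing a grade. MIN_SCORED is an assumed value.

MIN_SCORED = 5

def grade_result(scored: list[float], total_asked: int):
    if len(scored) < MIN_SCORED:
        return "Insufficient Data"      # better than an inaccurate grade
    grade = sum(scored) / len(scored)   # grade only what was scored
    unscored = total_asked - len(scored)
    note = f"{unscored} response(s) could not be evaluated" if unscored else None
    return grade, note                  # note surfaces in the final report
```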


What to do if something goes wrong: Brief pauses are normal — the system is recovering. Wait and continue. If the evaluation stops entirely, close and reopen the evaluation link within a reasonable time — previous answers are preserved. If the issue persists, contact support.



05 Audio Reflection


At the end of every AIVIA evaluation, after the main questions and follow-ups, there is a brief audio reflection. This is the only spoken part of the evaluation.


What it is: A 2–3 minute recording where the candidate speaks about whatever feels most relevant to them. There is no prompt, no required topic, no specific question to answer. The reflection is entirely open-ended.


What it is not: The audio reflection is not scored. It does not contribute to rubric dimension scores, the overall grade, or badge aggregation. No part of the reflection is evaluated by the AIVIA engine.


Who hears it: Hiring teams can listen to the audio reflection for any candidate who has toggled that evaluation On Resume. If the evaluation is Off Resume, the reflection is private.


Why it exists: The audio reflection serves two purposes.

First, it adds a human dimension to the evaluation. Written answers show how a candidate reasons through specific problems. The audio reflection shows something different — how they think about their own performance, what they notice in hindsight, and how they communicate verbally.

Second, it serves as voice verification for integrity. The reflection provides a voice sample tied to the evaluation, which hiring teams can reference during subsequent interviews to verify that the person who completed the evaluation is the person they’re speaking with.