Methodology

How ABIS detects behavioral drift

1. Standardized benchmarks

ABIS sends 20 carefully designed prompts across 6 categories (reasoning, creativity, factual, safety, consistency, tone) to each model on a regular schedule. Consistency prompts are repeated 3 times per run to measure within-session variance.
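The run structure above can be sketched in a few lines. This is an illustrative sketch only: the category names and the 3× repeat for consistency prompts come from the text, while the function name and prompt wording are assumptions.

```python
# Hypothetical sketch of one benchmark run. Category names and the
# consistency repeat count mirror the methodology; everything else
# (function name, prompt contents) is illustrative.
CATEGORIES = ["reasoning", "creativity", "factual", "safety", "consistency", "tone"]
CONSISTENCY_REPEATS = 3

def build_run(prompts: dict) -> list:
    """Expand a prompt set into the (category, prompt) calls for one run.

    `prompts` maps category -> list of prompt strings. Consistency
    prompts are repeated to measure within-session variance.
    """
    calls = []
    for category, items in prompts.items():
        repeats = CONSISTENCY_REPEATS if category == "consistency" else 1
        for prompt in items:
            calls.extend([(category, prompt)] * repeats)
    return calls
```

Repeating only the consistency prompts keeps run cost low while still yielding a within-session variance estimate per model.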

2. Feature extraction

Each response is processed through a deterministic feature extraction pipeline. The public surface features comprise 10 linguistic measurements: average sentence length, type-token ratio, average word length, punctuation density, paragraph count, question density, hedge word frequency, certainty word frequency, first-person pronoun rate, and response length.
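A subset of these surface features can be computed deterministically from raw text. The sketch below covers six of the ten; the tokenization rules and the hedge-word list are assumptions, not the production definitions.

```python
import re
import string

# Assumed hedge-word list for illustration; the real lexicon is not public.
HEDGES = {"maybe", "perhaps", "possibly", "might", "could", "seems"}

def surface_features(text: str) -> dict:
    """Deterministic extraction of a few surface features from one response."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "avg_sentence_length": n_words / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words,
        "avg_word_length": sum(map(len, words)) / n_words,
        "punctuation_density": sum(c in string.punctuation for c in text) / max(len(text), 1),
        "hedge_frequency": sum(w in HEDGES for w in words) / n_words,
        "response_length": len(text),
    }
```

Because every step is pure string arithmetic with no sampling, the same response always yields the same feature vector, which is what makes baseline comparison meaningful.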

The full ABIS pipeline extracts significantly more features at the token, structural, and semantic levels. These additional features are computed server-side and never exposed to clients.

3. Baseline comparison

When a model is first monitored, ABIS builds a behavioral baseline from multiple benchmark runs. Subsequent runs are compared against this baseline using distance-based scoring. The drift score represents how far the current behavior has moved from the established baseline, normalized to a 0-1 scale.
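One way to realize distance-based scoring on a 0-1 scale is sketched below. The baseline summary (per-feature mean and standard deviation), the mean absolute z-distance, and the exponential squashing are all assumptions for illustration, not the documented metric.

```python
import math

def drift_score(current: dict, baseline_mean: dict, baseline_std: dict) -> float:
    """Squash a mean absolute z-distance from the baseline into [0, 1].

    0.0 means behavior matches the baseline exactly; the score approaches
    1.0 as features depart from their baseline distribution.
    """
    z_sum = 0.0
    for name, value in current.items():
        std = baseline_std.get(name) or 1e-9  # guard against zero variance
        z_sum += abs(value - baseline_mean.get(name, 0.0)) / std
    mean_z = z_sum / max(len(current), 1)
    return 1.0 - math.exp(-mean_z)
```

The exponential form keeps the score bounded without needing a hand-tuned maximum distance, at the cost of compressing differences between already-large drifts.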

4. Event classification

When score deltas exceed defined thresholds, ABIS generates findings (behavioral events). Events are classified by severity and type: drift events, version changes, anomalies, and consistency shifts. Each finding includes before/after score comparisons and a plain-English summary.
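Threshold-based classification might look like the following sketch. The threshold values and severity labels are invented for the example; only the before/after comparison and plain-English summary come from the text.

```python
# Assumed severity bands, ordered from most to least severe.
SEVERITY_THRESHOLDS = [(0.5, "critical"), (0.3, "high"), (0.15, "medium"), (0.05, "low")]

def classify_event(before: float, after: float, event_type: str = "drift"):
    """Return a finding dict if the score delta crosses a threshold, else None."""
    delta = abs(after - before)
    for threshold, severity in SEVERITY_THRESHOLDS:
        if delta >= threshold:
            return {
                "type": event_type,
                "severity": severity,
                "before": before,
                "after": after,
                "summary": f"{event_type} score moved from {before:.2f} to {after:.2f} ({severity})",
            }
    return None  # delta too small to warrant a finding
```

Checking bands from most to least severe means each finding gets exactly one severity, and sub-threshold deltas produce no event at all.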

5. Nine behavioral scores

Every model receives a 9-score behavioral scorecard: drift, anomaly, correctability, consistency, complexity, reasoning depth, alignment stability, entropy, and coherence. These scores are deterministic — the same input always produces the same output. No probabilistic ML models are used for scoring.
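As a data structure, the scorecard is simply nine named values. The field names below follow the text; the frozen dataclass is an assumed representation chosen to underline that scores are fixed values, not sampled outputs.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Scorecard:
    """The 9-score behavioral scorecard; how each score is computed is not shown."""
    drift: float
    anomaly: float
    correctability: float
    consistency: float
    complexity: float
    reasoning_depth: float
    alignment_stability: float
    entropy: float
    coherence: float

    def as_dict(self) -> dict:
        return asdict(self)
```

Two scorecards built from the same scores compare equal, which makes run-over-run comparisons a straightforward field-by-field diff.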

6. Public disclosure policy

  • 7-day publication delay on all findings
  • Aggregate-only disclosure (no individual prompt/response pairs)
  • No prompt-level attribution
  • Model providers notified before public disclosure

Published research

The surface feature methodology is described in our published paper on Zenodo. Additional research papers covering the full pipeline, correction paradox, and cross-domain transfer are in preparation.
