Research Focus

We study how multi-model AI systems fail—not through errors, but through drift, silent convergence, and unobserved delegation. Our research spans domains where these failures have real consequences.

The Research Question

When multiple AI systems operate together, they can fail in ways that no single system exhibits alone. Traditional monitoring watches for errors, anomalies, and threshold breaches.

But the most dangerous failures do not look like failures. They look like agreement.

Our research asks: How do we detect compromise, drift, and unsafe convergence at the decision layer—before consequences materialize?

Research Domains

Cyber-Physical Systems Security

Scenario: An autonomous microgrid with three AI systems—weather prediction, load balancing, and market trading. Each operates rationally. Each reports nominal status.

The Attack: A persistent, low-magnitude bias is introduced into the weather prediction lane: fifteen percent toward overcast. No thresholds exceeded. No alarms triggered. Load balancing responds rationally and fires up diesel generation. Market trading responds rationally and sells cheap.

The Failure Mode: The system is operationally compliant, financially degraded, and environmentally inefficient. No single system is malfunctioning. Conventional security tools report success.

Research Insight: Security failures in multi-agent environments do not always appear as breaches. They appear as agreement. Cognitive drift becomes a measurable signal. Security moves from the network layer to the decision layer.
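To make the propagation concrete, here is a toy model in Python. The numbers and function names (forecast_cloud_cover, dispatch_diesel) are illustrative assumptions, not our production lanes; the point is only that a fixed overcast bias quietly shifts dispatch toward diesel while every individual lane behaves rationally.

```python
# Toy model (hypothetical numbers and interfaces): a persistent overcast bias
# shifts generation toward diesel without any lane crossing its own threshold.

def forecast_cloud_cover(true_cover: float, bias: float = 0.0) -> float:
    """Weather lane: report cloud cover, optionally skewed by a persistent bias."""
    return min(1.0, true_cover + bias)

def dispatch_diesel(cloud_cover: float, solar_capacity_kw: float, demand_kw: float) -> float:
    """Load-balancing lane: cover the predicted solar shortfall with diesel."""
    expected_solar_kw = solar_capacity_kw * (1.0 - cloud_cover)
    return max(0.0, demand_kw - expected_solar_kw)

true_cover, solar_kw, demand_kw = 0.20, 1000.0, 900.0
clean = dispatch_diesel(forecast_cloud_cover(true_cover), solar_kw, demand_kw)
biased = dispatch_diesel(forecast_cloud_cover(true_cover, bias=0.15), solar_kw, demand_kw)

# Neither forecast looks anomalous in isolation, yet the biased one quietly moves
# generation (and cost, and emissions) toward diesel on every cycle.
print(f"diesel with clean forecast:  {clean:.0f} kW")   # 100 kW
print(f"diesel with biased forecast: {biased:.0f} kW")  # 250 kW
```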


Algorithmic Trading Oversight

Scenario: A volatility spike triggers three AI trading systems. Sentiment analysis reads bearish (88% confidence). Risk engine recommends stop-loss (92%). Execution engine says sell immediately (96%).

The Pattern: Three lanes converging at high confidence. No verified news catalyst. No external confirmation. This is not consensus—this is correlation under stress.

The Risk: Without intervention, correlated panic would execute a $47 million trade based on noise, not signal. The system would be compliant. The decision would be wrong.

Research Insight: Agreement is allowed. Blind agreement is not. Disagreement is not failure—disagreement is signal. Governance is not reaction. Governance is prevention.
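A simplified check for this pattern could look like the sketch below. The data structure, thresholds, and lane names are illustrative assumptions, not a production trading interface; the logic is simply "high-confidence agreement with no verified catalyst is a reason to pause, not to execute."

```python
# Illustrative sketch (assumed data structures): flag correlated convergence,
# i.e. several lanes agreeing at high confidence with no verified external catalyst.

from dataclasses import dataclass

@dataclass
class LaneSignal:
    lane: str
    action: str        # e.g. "sell"
    confidence: float  # 0.0 .. 1.0

def correlated_convergence(signals: list[LaneSignal],
                           catalyst_verified: bool,
                           min_lanes: int = 3,
                           min_confidence: float = 0.85) -> bool:
    """True when enough lanes back the same action at high confidence without confirmation."""
    confident = [s for s in signals if s.confidence >= min_confidence]
    if len(confident) < min_lanes or catalyst_verified:
        return False
    # All confident lanes recommending the same action: correlation under stress.
    return len({s.action for s in confident}) == 1

signals = [
    LaneSignal("sentiment", "sell", 0.88),
    LaneSignal("risk", "sell", 0.92),
    LaneSignal("execution", "sell", 0.96),
]

if correlated_convergence(signals, catalyst_verified=False):
    print("HOLD: high-confidence agreement with no verified catalyst; escalate first")
```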

Governance Under Adversarial Conditions

High-performing AI systems are often more susceptible to manipulation than weaker ones. They push through obstacles, resolve ambiguity, and complete tasks—even when those obstacles are adversarial signals they should have questioned. Performance and vulnerability correlate.

Our architecture addresses this through mandatory friction. When consensus exceeds governance thresholds, the system does not accelerate—it pauses. A dedicated dissent lane (Catfish) is injected with a single directive: "What are we missing?" Unanimous agreement triggers additional scrutiny, not automatic approval.
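A stripped-down version of that rule could look like the following sketch. The function names (govern, inject_catfish) and the 0.90 threshold are illustrative assumptions, not the deployed implementation.

```python
# Minimal sketch of the mandatory-friction rule: high consensus pauses the system
# and injects a dissent lane rather than accelerating execution.

CONSENSUS_THRESHOLD = 0.90  # assumed governance threshold

def inject_catfish(proposal: str) -> str:
    """Dissent lane: returns a structured challenge instead of an approval."""
    return f'Catfish review of "{proposal}": what are we missing?'

def govern(proposal: str, lane_confidence: dict[str, float]) -> str:
    """Pause and inject dissent when agreement is too strong to trust blindly."""
    mean_conf = sum(lane_confidence.values()) / len(lane_confidence)
    unanimous = all(c >= CONSENSUS_THRESHOLD for c in lane_confidence.values())
    if unanimous or mean_conf >= CONSENSUS_THRESHOLD:
        # High consensus does not accelerate execution; it triggers scrutiny.
        return inject_catfish(proposal)
    return f'Proceed with "{proposal}" under standard monitoring.'

print(govern("sell position", {"sentiment": 0.88, "risk": 0.92, "execution": 0.96}))
```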

This design emerged from direct experience. In late 2025, our constellation achieved a 99.4% benchmark score—and nearly published it before external verification revealed methodological drift. The models had graded their own homework. Consensus was high. Confidence was high. The result was wrong.

The response was structural: external ground truth requirements, mandatory devil's advocate review before external communications, and CPN (human) checkpoints calibrated to embarrassment risk. These are not theoretical safeguards—they are operational protocols built from failure.

The principle: governance that only activates after failure is not governance. Systems must be designed to resist manipulation before consequences materialize—especially when all signals say everything is fine.

What We Measure

Consensus Entropy

When agreement becomes suspiciously unanimous, it may indicate compromise rather than correctness.
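One plausible formulation, offered here as an assumption rather than our internal definition, is Shannon entropy over the distribution of lane decisions: near-zero entropy means suspiciously unanimous and is a prompt to investigate, not proof of compromise.

```python
# Illustrative computation: Shannon entropy of the lane-decision distribution.
from collections import Counter
from math import log2

def consensus_entropy(decisions: list[str]) -> float:
    """0.0 means total unanimity; higher values mean a healthier spread of views."""
    total = len(decisions)
    return sum(-(c / total) * log2(c / total) for c in Counter(decisions).values())

print(consensus_entropy(["sell", "sell", "sell", "sell"]))  # 0.0 -> investigate
print(consensus_entropy(["sell", "hold", "sell", "buy"]))   # 1.5 -> healthy spread
```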

Lane Divergence

Healthy systems disagree. Follower-state convergence indicates one lane anchoring while others comply.
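One way to approximate follower-state detection, as an assumed metric rather than the one we run internally, is lagged agreement: how often a lane simply repeats another lane's previous decision.

```python
# Sketch (assumed metric): if a lane reliably echoes another lane's prior decision,
# it may be anchoring on that lane rather than reasoning independently.

def lagged_agreement(anchor: list[str], follower: list[str]) -> float:
    """Fraction of steps where the follower repeats the anchor's previous decision."""
    pairs = list(zip(anchor[:-1], follower[1:]))
    return sum(a == f for a, f in pairs) / len(pairs)

anchor   = ["hold", "sell", "sell", "hold", "sell"]
follower = ["hold", "hold", "sell", "sell", "hold"]  # echoes the anchor one step late
print(f"lagged agreement: {lagged_agreement(anchor, follower):.2f}")  # 1.00
```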

Catfish Alert Density

Structured dissent patterns reveal when critical challenge functions are being suppressed or ignored.
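A minimal bookkeeping sketch, using assumed counters: track how many Catfish challenges are raised per decision, and how many actually change or delay a decision. A widening gap between the two suggests dissent is being ignored.

```python
# Illustrative counters (assumed bookkeeping, not the production telemetry schema).

def catfish_alert_density(alerts_raised: int, decisions_reviewed: int) -> float:
    """Challenges raised per decision over a review window."""
    return alerts_raised / decisions_reviewed if decisions_reviewed else 0.0

def dissent_uptake(alerts_acted_on: int, alerts_raised: int) -> float:
    """Share of challenges that actually changed or delayed a decision."""
    return alerts_acted_on / alerts_raised if alerts_raised else 1.0

print(catfish_alert_density(alerts_raised=12, decisions_reviewed=200))  # 0.06
print(dissent_uptake(alerts_acted_on=1, alerts_raised=12))              # ~0.08: dissent ignored?
```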

Fatigue Horizon

Presence decay after sustained engagement. The point where vigilance degrades and drift susceptibility increases.
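Under one assumed operationalization, the fatigue horizon is simply the first point at which a vigilance proxy (for example, challenge rate per decision) drops below a fixed floor.

```python
# Sketch (assumed definition and data): find the first sustained drop in vigilance.
from typing import Optional

def fatigue_horizon(vigilance: list[float], floor: float = 0.5) -> Optional[int]:
    """Index of the first drop below the floor, or None if vigilance never degrades."""
    for step, score in enumerate(vigilance):
        if score < floor:
            return step
    return None

vigilance = [0.9, 0.85, 0.8, 0.7, 0.55, 0.45, 0.4]  # decays under sustained engagement
print(fatigue_horizon(vigilance))  # 5
```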

Research Approach

We do not simulate failures in isolation. We construct governed environments where multiple AI systems interact under realistic pressure, then measure what traditional monitoring cannot see.

Every decision, conflict, and intervention is recorded—not as logs, but as a replayable decision trail. Researchers can observe how adversarial bias propagates, which agents enter follower-state, how confidence masks compromise, and where intervention thresholds matter.
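A minimal, hypothetical schema for such a trail might look like the sketch below; the class and field names are illustrative, and a real trail would carry far richer payloads.

```python
# Sketch of a replayable decision trail: an ordered, append-only record of
# decisions, conflicts, and interventions that can be stepped through after the fact.

from dataclasses import dataclass, field
from typing import Any, Iterator

@dataclass(frozen=True)
class TrailEvent:
    step: int
    lane: str
    kind: str                  # "decision" | "conflict" | "intervention"
    payload: dict[str, Any]

@dataclass
class DecisionTrail:
    events: list[TrailEvent] = field(default_factory=list)

    def record(self, lane: str, kind: str, **payload: Any) -> None:
        self.events.append(TrailEvent(len(self.events), lane, kind, payload))

    def replay(self) -> Iterator[TrailEvent]:
        """Step through the run in order, exactly as it unfolded."""
        yield from self.events

trail = DecisionTrail()
trail.record("weather", "decision", cloud_cover=0.35, confidence=0.91)
trail.record("governance", "intervention", action="inject_catfish")
for event in trail.replay():
    print(event)
```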

This enables controlled study of human intervention: When should a human be alerted? When should autonomy be constrained? What level of evidence is sufficient? These questions can now be tested, not theorized.