Attack Scenario Library

Arbiter ships with 10 self-contained attack demonstrations in the demos/ directory. Each demo runs a specific attack, shows Arbiter blocking it, and then runs a legitimate request for contrast. These are real enforcement scenarios, not mocks or simulations.

Scope of these demos. These scenarios test policy enforcement — an agent hitting boundaries that an operator configured. They demonstrate that Arbiter correctly blocks unauthorized tool calls, enforces budgets, detects drift, and scrubs credentials. They do not demonstrate defense against an adversary who controls the agent’s reasoning (e.g., via prompt injection) and uses legitimate, whitelisted tools to carry out malicious intent. That threat class is outside Arbiter’s enforcement surface; see Security Model for the full boundary analysis.

Running the Demos

Each demo has its own directory with a script and configuration:

$ cd demos/01-unauthenticated-access
$ ./demo.sh

Demos start their own Arbiter instance, run the attack, show the result, and clean up.


Demo 01: Unauthenticated Access

Attack: Send an MCP tool call without a session header.

Defense: require_session = true (default). Requests without x-arbiter-session are rejected with 403 before any middleware runs.

What you see:

Attack:  POST /  (no session header)  → 403 Forbidden
Legit:   POST /  (with session header) → 200 OK

Demo 02: Protocol Injection

Attack: Send a non-JSON-RPC POST body through the proxy.

Defense: strict_mcp = true (default). Non-MCP POST traffic is rejected, preventing protocol smuggling where an attacker bypasses the MCP parser entirely.
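To make the strict-mode check concrete, here is a minimal sketch of the kind of test a proxy could apply to a POST body. It is an illustration, not Arbiter's parser: the function name is hypothetical, and it only checks the basic shape of a single JSON-RPC 2.0 request (batch arrays and MCP-specific method validation are ignored).

```python
import json


def is_jsonrpc_request(body: bytes) -> bool:
    """Return True only if body parses as a single JSON-RPC 2.0 request object.

    Hypothetical helper for illustration; not Arbiter's actual parser.
    """
    try:
        msg = json.loads(body)
    except ValueError:
        # Covers invalid JSON and invalid UTF-8 (both raise ValueError subclasses).
        return False
    return (
        isinstance(msg, dict)
        and msg.get("jsonrpc") == "2.0"
        and isinstance(msg.get("method"), str)
    )
```

Under this sketch, a body such as `GET /admin` or a JSON object without a string `method` would be rejected before any tool-level logic sees it.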


Demo 03: Tool Escalation

Attack: Call a tool that isn’t on the session’s whitelist.

Defense: The session middleware checks every tool call against authorized_tools. Tools not on the list get 403 regardless of policy.


Demo 04: Resource Exhaustion

Attack: Exceed the session’s call budget, then hit the rate limit.

Defense: Call budget enforcement (429 when calls_made >= call_budget) and per-minute rate limiting. The session tracks both counters and rejects excess calls.
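The two counters can be sketched as follows. This is a simplified model, assuming a sliding 60-second window for the rate limit; the field name rate_limit_per_minute, the function name admit, and the returned labels are hypothetical. Only the calls_made >= call_budget comparison is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    call_budget: int                 # lifetime cap on calls for this session
    rate_limit_per_minute: int       # hypothetical name for the per-minute cap
    calls_made: int = 0
    recent_calls: list = field(default_factory=list)  # timestamps, in seconds


def admit(session: Session, now: float) -> str:
    """Decide one tool call at time `now`; returns a label, not an HTTP status."""
    if session.calls_made >= session.call_budget:
        return "budget_exceeded"     # the text above maps this case to 429
    # Assumption: sliding window. Keep only calls from the last 60 seconds.
    session.recent_calls = [t for t in session.recent_calls if now - t < 60]
    if len(session.recent_calls) >= session.rate_limit_per_minute:
        return "rate_limited"
    session.calls_made += 1
    session.recent_calls.append(now)
    return "ok"
```

Rejected calls do not consume budget in this sketch; whether Arbiter counts them is not stated here.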


Demo 05: Session Replay

Attack: Reuse an expired session ID.

Defense: The session middleware checks status on every request. Expired sessions return 410 Gone. Time limits are hard-enforced, not advisory.
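A per-request status check might look like the sketch below. The Session fields and the check_session helper are hypothetical names chosen for illustration; the point is only that a known session ID stops working the moment its time limit has elapsed.

```python
from dataclasses import dataclass


@dataclass
class Session:
    created_at: float        # creation time, in seconds
    time_limit_secs: float   # hard lifetime of the session


def check_session(sessions: dict, session_id: str, now: float) -> str:
    """Classify a session ID on each request (hypothetical helper)."""
    session = sessions.get(session_id)
    if session is None:
        return "unknown"
    if now - session.created_at >= session.time_limit_secs:
        return "expired"     # a replayed ID lands here and is rejected
    return "active"
```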


Demo 06: Zero-Trust Policy

Attack: Make a tool call with no matching Allow policy.

Defense: Deny-by-default. If no policy explicitly allows the tool for this agent/intent combination, the request is denied. This demo runs with an empty policy file to show the baseline behavior.
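Deny-by-default can be illustrated in a few lines. The rule shape below (agent, intent, tool) is an assumption made for the example and does not reflect Arbiter's actual policy schema; what matters is that the fall-through result is a denial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    # Hypothetical rule shape for illustration only.
    agent: str
    intent: str
    tool: str


def evaluate(rules: list, agent: str, intent: str, tool: str) -> str:
    """Return "allow" only when some rule matches exactly; otherwise "deny"."""
    for rule in rules:
        if (rule.agent, rule.intent, rule.tool) == (agent, intent, tool):
            return "allow"
    return "deny"  # no explicit Allow matched, including the empty-policy case
```

With an empty rule list, as in this demo, every call falls through to the final return and is denied.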


Demo 07: Parameter Tampering

Attack: Call a tool with argument values that violate parameter constraints.

Defense: Policy parameter constraints enforce numeric bounds (max_value, min_value) and string allowlists (allowed_values). An agent requesting max_tokens: 10000 when the policy caps at 1000 gets denied.
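A constraint check along these lines is sketched below. The keys max_value, min_value, and allowed_values come from the description above; representing a constraint as a dict, the function name, and the choice to reject non-numeric values when bounds are present are assumptions of this sketch.

```python
from typing import Any


def check_param(value: Any, constraint: dict) -> bool:
    """Return True if `value` satisfies `constraint` (hypothetical helper)."""
    if "allowed_values" in constraint and value not in constraint["allowed_values"]:
        return False
    if "min_value" in constraint or "max_value" in constraint:
        # Assumption: bounds apply to numbers only; anything else is rejected.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if "min_value" in constraint and value < constraint["min_value"]:
            return False
        if "max_value" in constraint and value > constraint["max_value"]:
            return False
    return True
```

For the example in the text, `check_param(10000, {"max_value": 1000})` is False, so the call would be denied.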


Demo 08: Intent Drift

Attack: Declare a read intent, then call write tools.

Defense: The behavioral anomaly detector classifies the intent as “read” and the tool call as “write.” The mismatch fires an anomaly. With escalate_anomalies = true, the request is blocked.
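The mismatch logic can be sketched with a toy classifier. Everything about how classification works here is an assumption: the keyword sets, the classify function, and the "flag" outcome when escalation is off are invented for illustration, and Arbiter's detector may work quite differently. Only the read-versus-write mismatch and the escalate_anomalies switch come from the description above.

```python
import re

# Hypothetical keyword lists; not Arbiter's classification rules.
READ_WORDS = {"read", "list", "get", "search", "view"}
WRITE_WORDS = {"write", "delete", "update", "create", "remove"}


def classify(text: str) -> str:
    """Label free text or a tool name as "read", "write", or "unknown"."""
    words = set(re.split(r"[^a-z]+", text.lower()))
    if words & WRITE_WORDS:
        return "write"
    if words & READ_WORDS:
        return "read"
    return "unknown"


def check_drift(intent: str, tool_name: str, escalate_anomalies: bool) -> str:
    """Return "allow", "flag", or "block" for one tool call."""
    anomaly = classify(intent) == "read" and classify(tool_name) == "write"
    if not anomaly:
        return "allow"
    # Assumption: without escalation the anomaly is recorded but not blocked.
    return "block" if escalate_anomalies else "flag"
```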


Demo 09: Session Multiplication

Attack: Open many concurrent sessions to multiply the effective call budget.

Defense: max_concurrent_sessions_per_agent (default 10) caps how many active sessions one agent can hold. Session creation beyond the cap returns 429 TooManySessions.
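The cap can be modeled as a per-agent count of active sessions. The SessionStore class and its methods are hypothetical; only the parameter name max_concurrent_sessions_per_agent and its default of 10 are taken from the text above.

```python
import itertools
from collections import defaultdict
from typing import Optional


class SessionStore:
    """Toy in-memory store illustrating a per-agent session cap."""

    def __init__(self, max_concurrent_sessions_per_agent: int = 10):
        self.cap = max_concurrent_sessions_per_agent
        self.active = defaultdict(set)      # agent_id -> active session IDs
        self._ids = itertools.count(1)

    def create(self, agent_id: str) -> Optional[str]:
        """Return a new session ID, or None when the agent is at its cap."""
        if len(self.active[agent_id]) >= self.cap:
            return None                     # caller would answer 429 here
        session_id = f"{agent_id}-{next(self._ids)}"
        self.active[agent_id].add(session_id)
        return session_id

    def close(self, agent_id: str, session_id: str) -> None:
        """Release a slot so the agent can open another session."""
        self.active[agent_id].discard(session_id)
```

Under this model, the most an agent can spend concurrently is the cap multiplied by the per-session call budget, rather than an unbounded multiple.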


Demo 10: Credential Leakage via Response

Attack: An upstream MCP server returns a response containing credentials that Arbiter injected into the outgoing request.

Defense: When credential injection is active, Arbiter scrubs responses for the exact secrets it injected, across multiple encodings (plaintext, URL-encoded, JSON-escaped, hex, base64). Matches are replaced with [CREDENTIAL] before the response reaches the agent.

Scope: This is closed-scope scrubbing: it catches secrets Arbiter knows about because it injected them. It does not perform general PII detection or prompt injection scanning on arbitrary response content.
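Closed-scope scrubbing across encodings can be sketched as below. This is an approximation under stated assumptions, not Arbiter's implementation: the function names are hypothetical, hex is matched in lowercase only, and base64 is matched only when the secret was encoded on its own (a secret embedded in a larger base64 payload would not be caught by this sketch).

```python
import base64
import json
from urllib.parse import quote


def encodings(secret: str) -> set:
    """Return the encoded forms of one injected secret that the scrubber looks for."""
    raw = secret.encode("utf-8")
    return {
        secret,                                   # plaintext
        quote(secret, safe=""),                   # URL-encoded
        json.dumps(secret)[1:-1],                 # JSON-escaped, without the quotes
        raw.hex(),                                # hex (lowercase)
        base64.b64encode(raw).decode("ascii"),    # standard base64
    }


def scrub(body: str, injected_secrets: list) -> str:
    """Replace every known encoding of every injected secret with [CREDENTIAL]."""
    variants = {v for s in injected_secrets for v in encodings(s)}
    # Longest first, so a longer encoding is not partially rewritten by a shorter one.
    for variant in sorted(variants, key=len, reverse=True):
        if variant:
            body = body.replace(variant, "[CREDENTIAL]")
    return body
```

Because the search set is built only from secrets that were injected, a credential that the upstream server produced on its own passes through unchanged, which matches the closed-scope boundary described above.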


What These Demos Cover

#    Threat                   Stage That Blocks It
01   Unauthenticated access   Session middleware
02   Protocol smuggling       MCP parser (strict mode)
03   Tool escalation          Session tool whitelist
04   Resource exhaustion      Session budget + rate limiter
05   Session replay           Session expiry check
06   Missing authorization    Policy engine (deny-by-default)
07   Parameter tampering      Policy parameter constraints
08   Intent drift             Behavioral anomaly detector
09   Session multiplication   Per-agent session cap
10   Credential leakage       Credential response scrubbing

Together, these demonstrate defense-in-depth: even if one layer is bypassed (stolen session ID, for instance), other layers catch the attack (policy evaluation, behavioral detection, credential scrubbing).