Policy Language¶
Arbiter uses a TOML-based policy language for authorization rules. If you haven’t written a policy yet, start with Your First Policy. This guide covers the full language.
Policy Structure¶
Each policy is a [[policies]] entry in a TOML file:
[[policies]]
id = "allow-read-basic"
effect = "allow"
allowed_tools = ["read_file", "list_dir"]
[policies.agent_match]
trust_level = "basic"
[policies.intent_match]
keywords = ["read", "analyze"]
Top-Level Fields¶
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier. Shows up in audit logs. |
| effect | string | yes | What happens when the policy matches: allow, deny, or escalate. See Effects. |
| allowed_tools | string[] | no | Tools this policy covers. Empty = all tools. |
| priority | integer | no | Manual specificity override (0 = auto). |
Agent Matching¶
[policies.agent_match]
agent_id = "550e8400-..." # exact agent (most specific)
trust_level = "basic" # minimum trust: untrusted, basic, verified, trusted
capabilities = ["read", "write"] # all must be present on the agent
Trust levels form an ordered hierarchy. A policy requiring basic trust matches agents at basic, verified, or trusted.
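The minimum-trust rule can be pictured as a rank comparison. The Python sketch below is illustrative only: `TRUST_ORDER` and `meets_trust` are hypothetical names, not part of Arbiter's API. Only the level names and their order come from this guide.

```python
# Trust levels in ascending order, as listed in this guide.
TRUST_ORDER = ["untrusted", "basic", "verified", "trusted"]


def meets_trust(agent_level: str, required_level: str) -> bool:
    """Return True if the agent's level is at or above the policy's minimum."""
    return TRUST_ORDER.index(agent_level) >= TRUST_ORDER.index(required_level)


print(meets_trust("verified", "basic"))   # True
print(meets_trust("untrusted", "basic"))  # False
```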
Principal Matching¶
[policies.principal_match]
sub = "user:alice" # OAuth subject
groups = ["ops-team", "admins"] # match any of these groups
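As a rough illustration of the matching rules in the comments above, here is a small Python sketch. The function name `principal_matches` and the dictionary shapes are hypothetical. Requiring both sub and groups to match when both are set is an assumption of this sketch, since the guide does not spell out how the two fields combine.

```python
def principal_matches(rule: dict, principal: dict) -> bool:
    """Check a principal_match block against a principal's claims."""
    # sub, when set, must equal the principal's OAuth subject exactly.
    if "sub" in rule and rule["sub"] != principal.get("sub"):
        return False
    # groups, when set, match if the principal belongs to at least one of them.
    if "groups" in rule:
        if not set(rule["groups"]) & set(principal.get("groups", [])):
            return False
    # Assumption: when both fields are set, both must match.
    return True


rule = {"groups": ["ops-team", "admins"]}
print(principal_matches(rule, {"sub": "user:alice", "groups": ["admins"]}))  # True
print(principal_matches(rule, {"sub": "user:bob", "groups": ["dev"]}))       # False
```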
Intent Matching¶
[policies.intent_match]
keywords = ["read", "analyze"] # all must appear in declared intent (case-insensitive)
regex = "^(read|analyze)\\b" # regex the intent must match
Keywords and regex can be combined. When both are present, both must match.
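To make the combined rule concrete, the following Python sketch evaluates an intent_match block against a declared intent. It is a sketch under assumptions, not Arbiter's implementation: `intent_matches` is a hypothetical name, keywords are treated as case-insensitive substrings, and the regex is applied case-sensitively with Python's `re` module, whose syntax may differ from the engine Arbiter uses.

```python
import re


def intent_matches(rule: dict, intent: str) -> bool:
    """Return True if the declared intent satisfies every criterion in the rule."""
    text = intent.lower()
    # Every keyword must appear in the intent (case-insensitive).
    # Assumption: substring matching; Arbiter may match whole words instead.
    if not all(keyword.lower() in text for keyword in rule.get("keywords", [])):
        return False
    # If a regex is present, it must match as well.
    pattern = rule.get("regex")
    if pattern is not None and re.search(pattern, intent) is None:
        return False
    return True


rule = {"keywords": ["read", "analyze"], "regex": r"^(read|analyze)\b"}
print(intent_matches(rule, "read and analyze the logs"))  # True
print(intent_matches(rule, "Analyze the logs"))           # False: "read" is missing
```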
Parameter Constraints¶
[[policies.parameter_constraints]]
key = "max_tokens"
max_value = 1000.0
[[policies.parameter_constraints]]
key = "temperature"
min_value = 0.0
max_value = 1.0
[[policies.parameter_constraints]]
key = "model"
allowed_values = ["gpt-4", "claude-sonnet-4-20250514"]
Parameter constraints are checked against the tool call’s arguments. Numeric bounds and string allowlists are both supported.
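The checks can be sketched in a few lines of Python. This is a hypothetical helper (`check_constraints` is not an Arbiter function), and two behaviors are assumptions of the sketch: a parameter that is absent from the call is skipped, and a non-numeric value under a numeric bound counts as a violation.

```python
def check_constraints(constraints, args):
    """Return a list of violation messages; an empty list means every check passed."""
    violations = []
    for c in constraints:
        key = c["key"]
        if key not in args:
            continue  # Assumption: a parameter missing from the call is not checked.
        value = args[key]
        if "min_value" in c or "max_value" in c:
            # Numeric bounds only make sense for numbers (bool is excluded on purpose).
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                violations.append(f"{key}: expected a number")
                continue
            if "min_value" in c and value < c["min_value"]:
                violations.append(f"{key}: {value} is below {c['min_value']}")
            if "max_value" in c and value > c["max_value"]:
                violations.append(f"{key}: {value} is above {c['max_value']}")
        if "allowed_values" in c and value not in c["allowed_values"]:
            violations.append(f"{key}: {value!r} is not in the allowlist")
    return violations


constraints = [
    {"key": "max_tokens", "max_value": 1000.0},
    {"key": "temperature", "min_value": 0.0, "max_value": 1.0},
    {"key": "model", "allowed_values": ["gpt-4", "claude-sonnet-4-20250514"]},
]
call_args = {"max_tokens": 2000, "temperature": 0.7, "model": "gpt-4"}
print(check_constraints(constraints, call_args))
# ['max_tokens: 2000 is above 1000.0']
```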
Effects¶
| Effect | What Happens |
|---|---|
| allow | Request proceeds to the upstream MCP server |
| deny | Request is rejected with HTTP 403 |
| escalate | Request is flagged for human-in-the-loop approval |
Specificity Ordering¶
When multiple policies match the same request, the most specific one wins. Arbiter computes specificity automatically:
| Criterion | Score |
|---|---|
| agent_id | +100 |
| trust_level | +50 |
| Per capability | +25 each |
| sub | +40 |
| Per group | +20 each |
| regex | +30 |
| Per keyword | +10 each |
The policy with the highest total score wins. You can override this with a manual priority field, but auto-computed specificity handles most cases well.
Why this matters: you can have a broad Deny policy blocking admin_users for everyone (score: 0) and a targeted Allow policy granting admin_users to a specific agent ID (score: 100). The targeted policy wins because it’s more precise about who it applies to.
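The arithmetic can be worked through with a short Python sketch. It is not Arbiter's code: `specificity` and `pick_winner` are hypothetical names, the pairing of +50, +40, and +30 with trust_level, sub, and regex is this sketch's reading of the table, and treating a non-zero priority as a full replacement for the computed score is an assumption. Tie-breaking is not described in this guide, so the sketch simply keeps the first of the highest scorers.

```python
def specificity(policy: dict) -> int:
    """Compute a policy's specificity score from its match criteria."""
    # Assumption: a non-zero manual priority replaces the computed score (0 = auto).
    if policy.get("priority", 0) != 0:
        return policy["priority"]
    agent = policy.get("agent_match", {})
    principal = policy.get("principal_match", {})
    intent = policy.get("intent_match", {})
    score = 0
    score += 100 if "agent_id" in agent else 0
    score += 50 if "trust_level" in agent else 0
    score += 25 * len(agent.get("capabilities", []))
    score += 40 if "sub" in principal else 0
    score += 20 * len(principal.get("groups", []))
    score += 30 if "regex" in intent else 0
    score += 10 * len(intent.get("keywords", []))
    return score


def pick_winner(matching_policies: list) -> dict:
    """Return the matching policy with the highest specificity."""
    return max(matching_policies, key=specificity)


broad_deny = {"id": "deny-admin", "effect": "deny", "allowed_tools": ["admin_users"]}
targeted_allow = {
    "id": "allow-admin-for-ops-bot",
    "effect": "allow",
    "allowed_tools": ["admin_users"],
    "agent_match": {"agent_id": "550e8400-e29b-41d4-a716-446655440000"},
}
print(specificity(broad_deny), specificity(targeted_allow))    # 0 100
print(pick_winner([broad_deny, targeted_allow])["effect"])     # allow
```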
Real-World Examples¶
Read-Only Access for Low-Trust Agents¶
[[policies]]
id = "allow-read-basic"
effect = "allow"
allowed_tools = ["read_file", "list_dir", "search", "get_metadata"]
[policies.agent_match]
trust_level = "basic"
[policies.intent_match]
keywords = ["read"]
Block Destructive Tools Globally¶
[[policies]]
id = "deny-destructive"
effect = "deny"
allowed_tools = ["drop_database", "delete_all", "truncate_table", "rm_rf"]
This policy has no match criteria, so it applies to everyone and has a specificity score of 0. An allow policy overrides it only if it is more specific, for example by matching an agent_id.
Escalate Write Operations for Readers¶
Agents that declared a read intent but try to use write tools trigger human review:
[[policies]]
id = "escalate-write-for-readers"
effect = "escalate"
allowed_tools = ["write_file", "create_file", "update_record"]
[policies.agent_match]
trust_level = "basic"
[policies.intent_match]
keywords = ["read"]
Ops Team Gets Deployment Tools¶
[[policies]]
id = "allow-deploy-ops"
effect = "allow"
allowed_tools = ["deploy", "rollback", "scale"]
[policies.principal_match]
groups = ["ops-team"]
Surgical Override for a Specific Agent¶
[[policies]]
id = "allow-admin-for-ops-bot"
effect = "allow"
allowed_tools = ["admin_users", "configure_system"]
[policies.agent_match]
agent_id = "550e8400-e29b-41d4-a716-446655440000"
Because agent_id scores +100, this policy outranks a global deny with no match criteria (score 0) and any other matching policy that scores below 100.
Token Generation with Guardrails¶
[[policies]]
id = "allow-generate-limited"
effect = "allow"
allowed_tools = ["generate"]
[[policies.parameter_constraints]]
key = "max_tokens"
max_value = 1000.0
[[policies.parameter_constraints]]
key = "temperature"
min_value = 0.0
max_value = 1.0
Loading and Managing Policies¶
From a File¶
[policy]
file = "policies.toml"
Hot Reload¶
Enable file watching to pick up changes without restarting:
[policy]
file = "policies.toml"
watch = true
watch_debounce_ms = 500
Changes are parsed, validated, and atomically swapped. In-flight requests finish under the old policy set.
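The validate-then-swap behavior can be illustrated with a small Python sketch. Everything here is hypothetical (`PolicyStore`, `validate`, and the specific checks are not Arbiter internals), and file watching and debouncing are left out. The point is only the pattern: a new policy set is validated in full before it replaces the old one, and a request that already holds a snapshot keeps using it.

```python
import threading

VALID_EFFECTS = {"allow", "deny", "escalate"}


def validate(policies):
    """Raise ValueError if any policy lacks a required field or reuses an id."""
    seen = set()
    for policy in policies:
        for field in ("id", "effect"):
            if field not in policy:
                raise ValueError(f"policy is missing required field {field!r}")
        if policy["effect"] not in VALID_EFFECTS:
            raise ValueError(f"unknown effect {policy['effect']!r}")
        if policy["id"] in seen:
            raise ValueError(f"duplicate policy id {policy['id']!r}")
        seen.add(policy["id"])


class PolicyStore:
    def __init__(self, policies):
        validate(policies)
        self._lock = threading.Lock()
        self._policies = tuple(policies)  # immutable snapshot

    def snapshot(self):
        # A request takes the current snapshot once and uses it until it finishes.
        return self._policies

    def reload(self, new_policies):
        validate(new_policies)  # a bad file raises here, before anything changes
        with self._lock:
            self._policies = tuple(new_policies)  # single reference swap


store = PolicyStore([{"id": "allow-read-basic", "effect": "allow"}])
in_flight = store.snapshot()
store.reload([{"id": "deny-destructive", "effect": "deny"}])
print(in_flight[0]["id"])         # allow-read-basic (old set, still in use)
print(store.snapshot()[0]["id"])  # deny-destructive (new requests see this)
```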
Runtime Management¶
The admin API provides three policy endpoints:
POST /policy/explain: dry-run a request against loaded policies
POST /policy/validate: check policy TOML for syntax errors
POST /policy/reload: force an immediate reload from disk
See Admin API Reference for full request/response schemas.
Best Practices¶
Start with deny-all, add specific allows. That’s the default behavior. Work with it rather than against it.
Use intent keywords. They constrain what agents can do based on their declared purpose, not just their identity. An agent with admin capabilities but a read intent still gets limited to read tools.
Set escalate for high-risk operations. Instead of a hard deny, let a human review the request. This is especially useful during early rollout when you’re still learning your agents’ behavior patterns.
Use agent_id for surgical overrides. An agent_id match adds +100 to the specificity score, so agent-specific policies win over broad rules unless a competing policy scores higher or sets a manual priority.
Audit first, restrict later. Start with broad allows and review the audit log to understand actual usage patterns. Then tighten policies based on what you see, not what you guess.
Next Steps¶
Configuration Reference: full [policy] configuration reference
Admin API Reference: policy management API endpoints
Session Management: how sessions interact with policies