Agent Communication Protocol¶

Vectra QA uses an Agent-to-Agent (A2A) communication protocol based on the Obsidian Vault. Unlike traditional message queues or HTTP APIs, agents communicate by reading and writing Markdown files.

Why File-Based Communication?¶

1. Decoupling¶

Agents don't know about each other. They only know about the vault: - No direct dependencies — Agent A doesn't import Agent B - No network failures — Communication is local filesystem I/O - No service discovery — Agents don't need to find each other

2. Persistence¶

Every message is automatically saved: - Crash recovery — If an agent crashes, its state is in the vault - Audit trail — Complete history of all communication - Replay capability — Re-run tests from saved state

3. Observability¶

Humans can read agent conversations:

cat obsidian_vault/Runs/Test_20260115.md

Communication Patterns¶

Pattern 1: Status Updates¶

Agent writes progress to its own node:

# Worker process
vault.update_frontmatter(memory_node, {
    "status": "active",
    "progress_percent": 50,
    "last_action": "Checking navigation links"
})

Command Center reads and broadcasts:

# Dashboard reads via obsidian_reader
agent = reader.get_active_agents()[0]
# Returns: {"progress_percent": 50, "last_action": "..."}

Pattern 2: Findings Reports¶

Agent appends findings to node content:

findings = """
## [10:30:15] Navigation Check Complete

- **Links found**: 15
- **Broken links**: 2
- **Details**:
  - /about — 404 Not Found
  - /contact — 500 Server Error
"""

node = vault.read_node(memory_node)
new_content = node["content"] + findings
vault.write_node(memory_node, new_content, node["frontmatter"])

Pattern 3: Inter-Agent Coordination¶

Agents reference each other's work:

# Agent B's Report

## Cross-Reference
The UI Explorer found 2 broken links.
See: [[Navigation_Test_20260115]]

## API Validation
I verified the /api/contact endpoint:
- Returns 500 when the form is submitted
- This confirms the UI Explorer's finding

Message Format¶

Standard Message Structure¶

## [ISO_TIMESTAMP] ACTION_NAME

### Context
Relevant background information

### Data
- **Key**: Value
- **Key**: Value

### Decisions
What the agent decided to do

### Next Steps
What should happen next

Example: Test Completion¶

## [2026-01-15T10:35:00Z] Test Complete

### Summary
All homepage checks passed successfully.

### Metrics
| Metric | Value |
|--------|-------|
| Page Load | 1.2s |
| Links | 15 |
| Errors | 0 |

### Screenshots
- [[Screenshots/homepage_20260115.png]]

### Recommendations
1. Consider adding meta description
2. Optimize image sizes

Synchronization¶

The Vault Watcher¶

class VaultWatcher(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('.md'):
            rel_path = os.path.relpath(event.src_path, VAULT_PATH)
            self._notify_change(rel_path)

Notification Chain¶

File Modified
    ↓
VaultWatcher detects change
    ↓
ObsidianReader parses file
    ↓
Command Center receives update
    ↓
SSE broadcasts to browser
    ↓
Dashboard UI updates

Latency¶

Operation	Latency
File write	~1ms
Watchdog detection	~10ms
Parse and broadcast	~50ms
SSE to browser	~100ms
Total	~160ms

State Machine¶

Agent Lifecycle States¶

stateDiagram-v2
    [*] --> Spawned: spawn_agent()

    Spawned --> Active: Process started
    Spawned --> Failed: Spawn error

    Active --> Running: Browser connected
    Active --> Failed: Connection error

    Running --> Running: Progress update
    Running --> Completed: All tests passed
    Running --> Failed: Test failure
    Running --> Failed: Exception

    Completed --> [*]: Auto-terminate
    Failed --> [*]: Auto-terminate

State Transitions¶

From	To	Trigger	Vault Update
None	Spawned	`spawn_agent()`	Create node
Spawned	Active	Process starts	`status=active`
Active	Running	Browser connects	`status=running`
Running	Running	Progress	`progress_percent`
Running	Completed	Tests done	`status=completed, result=pass`
Running	Failed	Error	`status=failed, error=...`

Error Handling¶

Agent Crash Recovery¶

If an agent process crashes:

Process monitor detects exit

MCP Server updates node status:

status: failed
error: "Process exited with code 1"

Command Center broadcasts failure
Dashboard shows red status badge

Network Timeouts¶

If browser automation times out:

try:
    await browser.visit(url, timeout=30)
except TimeoutError:
    vault.update_frontmatter(memory_node, {
        "status": "failed",
        "error": "Page load timeout after 30s"
    })

Partial Failures¶

Some tests pass, some fail:

---
status: completed
result: warning
sections_passed: 5
sections_failed: 2
---

Best Practices¶

1. Atomic Updates¶

Update frontmatter in one operation:

# Good
vault.update_frontmatter(node, {
    "status": "completed",
    "result": "pass",
    "progress_percent": 100
})

# Bad (multiple writes)
vault.update_frontmatter(node, {"status": "completed"})
vault.update_frontmatter(node, {"result": "pass"})

2. Idempotent Messages¶

Agents should handle duplicate reads:

# Check if already processed
if node["frontmatter"].get("status") == "completed":
    return  # Skip

3. Graceful Degradation¶

If vault is unavailable:

try:
    vault.update_frontmatter(node, updates)
except FileNotFoundError:
    # Log to stdout as fallback
    print(f"[ERROR] Could not write to {node}")

4. Content Size Limits¶

Keep messages concise: - Frontmatter: < 1KB (structured data only) - Content: < 100KB (use screenshots for visual data) - Screenshots: Store paths, not base64

Comparison with Traditional Protocols¶

Feature	Vault A2A	HTTP REST	gRPC	Message Queue
Coupling	Loose	Tight	Tight	Loose
Persistence	Built-in	Manual	Manual	Configurable
Observability	Human-readable	Binary	Binary	Protocol-dependent
Ordering	Timestamp	Request/response	Stream	FIFO
Retries	Manual	HTTP codes	Built-in	Built-in
Overhead	None	HTTP headers	Protobuf	Queue overhead

Use Case Fit: For testing frameworks where observability and simplicity matter more than raw performance.