Architecture

Internal architecture of Qualys MCP Server v0.1.6 — async tools, layered workflow design, KB semaphore, caching, concurrency, and request deduplication.

System Overview

Qualys MCP v0.1.6 uses a four-layer architecture. Seven async MCP tool wrappers dispatch to five workflow modules via asyncio.to_thread. The workflows orchestrate 42 aggregator functions, and the aggregators call the Qualys APIs through a shared HTTP and caching layer.

AI Assistant (Claude, etc.)
    |
    v  MCP tool call (one of 7 async tools)
FastMCP Server (qualys_mcp.py)
    |
    +-- 7 async @mcp.tool() wrappers (asyncio.to_thread)
    |     investigate, assess_risk, check_compliance,
    |     plan_remediation, security_overview, reports, cache_status
    |
    +-- qualys/workflows/ (5 workflow modules)
    |     investigate.py, assess_risk.py, compliance.py,
    |     remediation.py, overview.py
    |
    +-- qualys/aggregators.py (42 aggregator functions)
    |     Each aggregator wraps one or more Qualys API calls
    |     New: TotalAI, Policy Audit, SaaSDR, OCI aggregators
    |
    +-- qualys/api.py (HTTP + caching)
          _run_concurrent() -- ThreadPoolExecutor(max_workers=8)
          _get_or_fetch() -- request deduplication
          KB semaphore -- prevents 409 conflicts
          Tiered in-memory caches

Workflow Layer

Each workflow module receives parameters from its MCP tool wrapper and orchestrates multiple aggregator calls in parallel. The workflow handles:

Intent classification — determines which aggregators to call based on parameters (e.g., target="CVE-2024-3400" triggers CVE investigation aggregators)
Parallel dispatch — fires multiple aggregators concurrently via _run_concurrent()
Cross-source correlation — identifies connections across data sources (e.g., a CVE affecting assets that also have cloud misconfigurations)
Response envelope assembly — structures results into summary, data, correlations, and actions sections
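The four responsibilities above can be sketched as a single function. This is a minimal illustration, not the server's actual code: classify_intent, AGGREGATORS_BY_INTENT, run_workflow, and the fake_* aggregators are hypothetical names, and the CVE pattern is an assumed classification rule. Correlation is left empty because the real correlation rules are not described here.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed rule: a target shaped like "CVE-YYYY-NNNN" is a CVE investigation.
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)


def classify_intent(target):
    """Pick an intent label from the target string (hypothetical rules)."""
    return "cve" if CVE_PATTERN.match(target) else "asset"


# Hypothetical stand-ins for real aggregators; each returns a structured dict.
def fake_kb_lookup(target):
    return {"source": "kb", "target": target}


def fake_affected_assets(target):
    return {"source": "assets", "target": target, "hosts": ["host-a", "host-b"]}


def fake_asset_profile(target):
    return {"source": "asset_profile", "target": target}


AGGREGATORS_BY_INTENT = {
    "cve": [fake_kb_lookup, fake_affected_assets],
    "asset": [fake_asset_profile],
}


def run_workflow(target):
    """Classify, dispatch aggregators in parallel, and build the envelope."""
    intent = classify_intent(target)
    aggregators = AGGREGATORS_BY_INTENT[intent]
    # Parallel dispatch: one pool task per aggregator.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda fn: fn(target), aggregators))
    return {
        "summary": f"{intent} investigation of {target}",
        "data": {r["source"]: r for r in results},
        "correlations": [],  # cross-source correlation omitted in this sketch
        "actions": [],
    }
```

Calling run_workflow("CVE-2024-3400") selects the CVE aggregators and returns an envelope with the four sections named above.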

Workflow Modules

Module	MCP Tool	Scope
`workflows/investigate.py`	`investigate`	CVE deep-dive, threat actors, TotalAI detections, asset investigation, EDR/FIM events, KB search
`workflows/assess_risk.py`	`assess_risk`	VMs, cloud (AWS/Azure/GCP/OCI), containers, web apps, certificates, SaaSDR, assets
`workflows/compliance.py`	`check_compliance`	Framework posture, failing controls, policy audit library (1,247 policies), risk acceptances
`workflows/remediation.py`	`plan_remediation`	Patch priorities, deployment status, mitigation coverage
`workflows/overview.py`	`security_overview`	Daily/weekly/monthly briefing, scanner health, ETM findings

Aggregator Layer

The 42 aggregator functions in qualys/aggregators.py are the building blocks of every workflow. Each aggregator wraps one or more Qualys API calls, normalizes the response data, and returns a structured dict. Aggregators are stateless and composable — any workflow can call any aggregator. New aggregators added in v0.1.x include TotalAI model detections (/tai/api/1.0/), Policy Audit library (/pcas/v1/library/), SaaS Detection & Response controls (/sdr/api/controls/), and OCI cloud resources via TotalCloud v2.
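To make "stateless and composable" concrete, here is a rough sketch of what an aggregator of this shape might look like. The function name summarize_detections, the injected fetch_fn, and the SEVERITY field are assumptions for illustration; the real aggregators and response fields may differ.

```python
def summarize_detections(fetch_fn):
    """Hypothetical aggregator: fetch raw records, normalize, return a dict.

    fetch_fn is a zero-argument callable standing in for one or more API
    calls. No state is kept between calls, so any workflow can reuse this.
    """
    raw_items = fetch_fn()
    by_severity = {}
    for item in raw_items:
        # Normalize severity to an int; raw payloads often carry strings.
        severity = int(item.get("SEVERITY", 0))
        by_severity[severity] = by_severity.get(severity, 0) + 1
    return {
        "source": "detections",
        "total": len(raw_items),
        "by_severity": by_severity,
    }
```

Because the data source is passed in, the same function can be exercised with canned records in tests and with a real API call in a workflow.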

Cache Architecture

Tiered in-memory cache with TTLs matched to data volatility:

Cache	TTL	Key Strategy
Bearer token	3.5 hours	Single global token
KB entries	1 hour	Per-QID
VMDR detections	5 minutes	Per severity/days/qds_min combination
QDS scores	5 minutes	Per-QID
WAS findings	10 minutes	Per query params
Scanner list	5 minutes	Single global list
ETM results	1 hour	Single global (unfiltered)

Cache warmup runs on startup via a background thread (_warmup_vmdr_cache()) so the first real query hits warm caches.
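A tiered TTL cache with background warmup could be sketched roughly as follows. The TTL values come from the table above; everything else (CACHE_TTLS, cache_set, cache_get, warm_cache, and the "4_30_0" key) is a hypothetical name chosen for this example, and the server's own cache internals may be organized differently.

```python
import threading
import time

# TTLs in seconds, taken from the table above.
CACHE_TTLS = {
    "bearer_token": int(3.5 * 3600),
    "kb": 3600,
    "vmdr": 300,
    "qds": 300,
    "was": 600,
    "scanners": 300,
    "etm": 3600,
}

_CACHE = {}
_CACHE_LOCK = threading.Lock()


def cache_set(tier, key, value, now=None):
    """Store value under (tier, key) with the tier's TTL."""
    now = time.monotonic() if now is None else now
    with _CACHE_LOCK:
        _CACHE[(tier, key)] = (now + CACHE_TTLS[tier], value)


def cache_get(tier, key, now=None):
    """Return (hit, value); expired entries are evicted and count as misses."""
    now = time.monotonic() if now is None else now
    with _CACHE_LOCK:
        entry = _CACHE.get((tier, key))
        if entry is None or entry[0] <= now:
            _CACHE.pop((tier, key), None)
            return False, None
        return True, entry[1]


def warm_cache(fetch_fn, key="4_30_0"):
    """Populate the VMDR tier from a daemon thread (key format is assumed)."""
    def _worker():
        cache_set("vmdr", key, fetch_fn())

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread
```

The optional now parameter exists only so expiry can be exercised without sleeping; production code would rely on the monotonic clock.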

Request Deduplication

The _get_or_fetch() helper uses per-key locks to ensure that when multiple aggregators request the same data concurrently, only one thread makes the API call while others wait for the result:

def _get_or_fetch(cache_key, fetch_fn, ttl=300):
    # Fast path -- cache hit
    hit, data = _cache_get(cache_key)
    if hit:
        return data

    # Slow path -- per-key lock, double-check
    with FETCH_LOCKS[cache_key]:
        hit, data = _cache_get(cache_key)
        if hit:
            return data
        data = fetch_fn()
        _cache_set(cache_key, data, ttl)
        return data
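The excerpt above depends on helpers that are not shown. The following self-contained sketch fills them in with assumed implementations so the pattern can be run and observed: the cache dict, the guarded defaultdict behind FETCH_LOCKS, and the slow_fetch and demo functions are all illustrative stand-ins rather than the server's actual code.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

_CACHE = {}
_CACHE_LOCK = threading.Lock()
_LOCKS_GUARD = threading.Lock()
FETCH_LOCKS = defaultdict(threading.Lock)  # assumed: one lock per cache key


def _cache_get(key):
    with _CACHE_LOCK:
        entry = _CACHE.get(key)
    if entry is None or entry[0] <= time.monotonic():
        return False, None
    return True, entry[1]


def _cache_set(key, data, ttl):
    with _CACHE_LOCK:
        _CACHE[key] = (time.monotonic() + ttl, data)


def _lock_for(key):
    # Guard lock creation so two threads never get different locks for one key.
    with _LOCKS_GUARD:
        return FETCH_LOCKS[key]


def get_or_fetch(cache_key, fetch_fn, ttl=300):
    hit, data = _cache_get(cache_key)  # fast path
    if hit:
        return data
    with _lock_for(cache_key):  # slow path: serialize per key
        hit, data = _cache_get(cache_key)  # double-check after waiting
        if hit:
            return data
        data = fetch_fn()
        _cache_set(cache_key, data, ttl)
        return data


calls = []  # records each real fetch; list.append is atomic in CPython


def slow_fetch():
    calls.append(1)
    time.sleep(0.05)  # simulate API latency so callers overlap
    return {"qid": 12345}


def demo():
    """Eight concurrent callers request the same key."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda _: get_or_fetch("kb:12345", slow_fetch), range(8)))
```

Running demo() returns eight identical results while slow_fetch executes only once: the first thread to take the per-key lock populates the cache, and the others find the entry on their second check.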

Concurrency Model

MCP tool handlers are async functions that dispatch blocking workflow calls with asyncio.to_thread, which keeps the event loop responsive. Within each workflow, _run_concurrent() parallelizes independent aggregator calls on a ThreadPoolExecutor(max_workers=8). Cloud provider fetches (AWS, Azure, GCP, OCI) run in parallel rather than sequentially. A KB semaphore guards KnowledgeBase requests so that concurrent calls do not trigger 409 conflicts.
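The three mechanisms in this section (async handler, thread pool, semaphore) fit together roughly as in the sketch below. It assumes Python 3.9+ for asyncio.to_thread. The function names, the semaphore limit of 1, and the stubbed fetchers are assumptions for illustration; the real _run_concurrent() signature and the actual semaphore count are not documented here.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: at most one KnowledgeBase request in flight at a time.
KB_SEMAPHORE = threading.Semaphore(1)


def run_concurrent(tasks, max_workers=8):
    """Run a dict of name -> zero-arg callable in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


def fetch_kb(qid):
    # The semaphore serializes KB access across all worker threads.
    with KB_SEMAPHORE:
        return {"qid": qid}


def fetch_cloud(provider):
    # Stub standing in for a blocking per-provider API call.
    return {"provider": provider, "resources": 0}


def assess_cloud_workflow():
    """Blocking workflow: all four providers plus one KB lookup in parallel."""
    tasks = {p: (lambda p=p: fetch_cloud(p)) for p in ("aws", "azure", "gcp", "oci")}
    tasks["kb"] = lambda: fetch_kb(12345)
    return run_concurrent(tasks)


async def assess_risk_tool():
    """Async tool wrapper: push the blocking workflow off the event loop."""
    return await asyncio.to_thread(assess_cloud_workflow)
```

The p=p default in the lambda binds each provider name at definition time; without it, every task would see the last value of the loop variable.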

Performance Budget

Target: all tool responses under 15 seconds (89K asset environment). Measured performance:

Tool	Typical Latency	Notes
`security_overview` (quick)	~1.7s	CSAM-heavy, cached
`assess_risk` (cloud)	~1.3s	Parallel cloud providers
`assess_risk` (containers)	~3.1s	Container image scan
`check_compliance` (cached)	<1ms	Cached compliance data
`plan_remediation` (patches)	~2.6s	PM + CSAM parallel
`assess_risk` (all)	~4.9s	All domains in parallel
`investigate` (CVE)	~33s	KB + CSAM + threat intel; exceeds the 15-second target but no longer times out
