Implementing Shadow AI Discovery to Find Unregistered Agents in the Microsoft Agent Governance Toolkit
The ShadowDiscoveryScanner provides an SDK-level engine that locates unregistered AI agents by analyzing process commands, filesystem artifacts, inline text, and GitHub repositories using regex-based discovery rules and confidence scoring.
The Microsoft Agent Governance Toolkit includes a lightweight discovery engine designed to identify shadow AI: agents running in your environment that lack proper registration or governance. The ShadowDiscoveryScanner, implemented in Go, offers four distinct scanning surfaces to detect unregistered agents wherever they hide, from running processes to buried configuration files.
How Shadow AI Discovery Works
The scanner operates as a stateless, rule-based engine that aggregates evidence across multiple data sources. According to the Microsoft Agent Governance Toolkit source code, the ShadowDiscoveryScanner in agent-governance-golang/packages/agentmesh/discovery.go exposes methods to scan text, processes, directories, and remote repositories.
Each scanning method returns DiscoveredAgent objects that aggregate multiple DiscoveryEvidence signals. Each piece of evidence carries a DetectionBasis (DetectionProcess, DetectionConfigFile, DetectionGitHubRepo, or inline text), and each agent carries an AgentStatus of registered, unregistered, shadow, or unknown.
Scanning Inline Text and Process Commands
The ScanText method walks line-by-line through arbitrary strings, matching against built-in discoveryRules regular expressions to identify API keys, framework imports, or agent signatures. For live process discovery, ScanProcessCommands feeds command-line strings into the same text scanner, while ScanProcesses handles deduplication using process IDs.
In discovery.go, the process scanner builds fingerprints from PID and command-line combinations so that the same running agent isn't counted twice. The text and process-command entry points are called like this:
// Create the scanner once and reuse it for every surface
scanner := agentmesh.NewShadowDiscoveryScanner()

// Scanning arbitrary text for agent signatures
textFindings := scanner.ScanText("sample.env",
	"OPENAI_API_KEY=sk-test-not-real\nimport langchain\n")

// Scanning process command lines
cmds := []string{
	"/usr/bin/python -m crewai.run",
	"node /opt/agent/mcp-server.js",
}
procFindings := scanner.ScanProcessCommands(cmds)
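To make the line-by-line matching concrete, here is a minimal standalone sketch of a rule-based text scan. It does not reproduce the toolkit's code: the rule and finding types, the scanLines function, the patterns, and the confidence values are all hypothetical stand-ins for whatever discoveryRules actually contains.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// rule is a hypothetical stand-in for one entry in a discovery rule table.
type rule struct {
	name       string
	pattern    *regexp.Regexp
	confidence float64
}

// finding records which rule matched on which line of which source.
type finding struct {
	source     string
	line       int
	rule       string
	confidence float64
}

// Illustrative patterns only; the toolkit's real discoveryRules may differ.
var rules = []rule{
	{"openai-api-key", regexp.MustCompile(`OPENAI_API_KEY\s*=`), 0.6},
	{"langchain-import", regexp.MustCompile(`\b(import|from)\s+langchain\b`), 0.5},
	{"crewai-module", regexp.MustCompile(`\bcrewai\b`), 0.5},
}

// scanLines walks text one line at a time and tests every rule against it.
func scanLines(source, text string) []finding {
	var out []finding
	sc := bufio.NewScanner(strings.NewReader(text))
	for n := 1; sc.Scan(); n++ {
		for _, r := range rules {
			if r.pattern.MatchString(sc.Text()) {
				out = append(out, finding{source, n, r.name, r.confidence})
			}
		}
	}
	return out
}

func main() {
	text := "OPENAI_API_KEY=sk-test-not-real\nimport langchain\n"
	for _, f := range scanLines("sample.env", text) {
		fmt.Printf("%s:%d matched %s (%.2f)\n", f.source, f.line, f.rule, f.confidence)
	}
}
```

Feeding process command lines through the same function, one command per call, would mirror the relationship the article describes between ScanProcessCommands and ScanText.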
Scanning Filesystem Configuration
The ScanConfigPaths method walks directory trees while skipping known large or irrelevant directories, hunting for agent-specific configuration files, dependency manifests, and content matching discovery rules. Each match generates a DiscoveredAgent with attached evidence pointing to the specific file path and line number.
The implementation resides in discovery.go lines 96-135, where the scanner recursively traverses supplied paths and applies both filename patterns (configPatterns) and content signatures (processSignatures).
Scanning GitHub Repositories
For shadow AI discovery in source control, ScanGitHubRepositories uses a thin GitHubDiscoveryClient wrapper around the GitHub Contents API. The scanner fetches raw files from the specified repositories, then applies the same config-file and dependency-file logic used in local filesystem scans.
This enables organizations to identify unregistered agents defined in infrastructure-as-code repositories or microservice definitions before they deploy to production:
client := agentmesh.NewGitHubDiscoveryClient("<TOKEN>")
client.BaseURL = "https://api.github.com"
scanner := agentmesh.NewShadowDiscoveryScanner()
result := scanner.ScanGitHubRepositories(client, []string{"octo/demo", "myorg/agent-repo"})
Core Types and Confidence Aggregation
The discovery engine employs a probabilistic model to combine weak signals into high-confidence findings. Four core types drive this aggregation:
- DetectionBasis – Enum describing how evidence was gathered (process, config file, or GitHub repo)
- AgentStatus – Classification: registered, unregistered, shadow, or unknown
- DiscoveryEvidence – Individual signal containing scanner name, basis, confidence score, timestamp, and raw data
- DiscoveredAgent – Aggregated view with fingerprint, status, and computed confidence
The Noisy-OR Confidence Model
The AddEvidence method in discovery.go (lines 82-92) implements a noisy-OR aggregation formula that allows multiple low-confidence signals to combine into high-confidence findings. Each incoming confidence value is clamped to [0, 1] and combined with existing evidence using:
combined = 1 - (1 - prior) * (1 - new)
Under this formula the combined score never decreases and never exceeds 1, so a single weak signal stays weak on its own while several independent signals accumulate toward a high-confidence finding. Unit tests in discovery_test.go verify this behavior across edge cases.
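A short worked example makes the accumulation visible. The sketch below applies the formula to three signals of 0.4 each; clamp01 and noisyOR are hypothetical helper names, not the toolkit's AddEvidence method.

```go
package main

import "fmt"

// clamp01 restricts x to the closed interval [0, 1].
func clamp01(x float64) float64 {
	if x < 0 {
		return 0
	}
	if x > 1 {
		return 1
	}
	return x
}

// noisyOR folds one new confidence value into the prior:
// combined = 1 - (1 - prior) * (1 - new).
func noisyOR(prior, next float64) float64 {
	return 1 - (1-clamp01(prior))*(1-clamp01(next))
}

func main() {
	combined := 0.0
	for i, c := range []float64{0.4, 0.4, 0.4} {
		combined = noisyOR(combined, c)
		fmt.Printf("after signal %d: %.3f\n", i+1, combined)
	}
}
```

The running value goes 0.400, 0.640, 0.784: each new signal closes 40% of the remaining gap to 1, which is why no finite number of weak signals ever reaches full certainty.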
Fingerprinting and Deduplication
To prevent duplicate entries for the same logical agent, the scanner generates stable hashes from merge keys—combinations of PID plus truncated command lines for processes, or normalized file paths for configuration files. This fingerprinting logic ensures that an agent detected via process scanning and again via filesystem scanning merges into a single DiscoveredAgent record with combined evidence from both sources.
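One way to derive a stable fingerprint from a merge key is to hash it, as sketched below. Every specific here is an assumption made for illustration: SHA-256 as the hash, the 64-byte command truncation, the key prefixes, and the function names. The article does not state which hash or key layout discovery.go uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// maxCmdLen is a hypothetical truncation length for command lines.
const maxCmdLen = 64

// processKey builds a merge key from a PID and a truncated command line.
// Truncation is byte-based, which is adequate for a sketch.
func processKey(pid int, cmd string) string {
	cmd = strings.TrimSpace(cmd)
	if len(cmd) > maxCmdLen {
		cmd = cmd[:maxCmdLen]
	}
	return fmt.Sprintf("process:%d:%s", pid, cmd)
}

// configKey builds a merge key from a normalized file path.
func configKey(path string) string {
	return "config:" + filepath.ToSlash(filepath.Clean(path))
}

// fingerprint hashes a merge key into a short, stable hex identifier.
func fingerprint(mergeKey string) string {
	sum := sha256.Sum256([]byte(mergeKey))
	return hex.EncodeToString(sum[:8])
}

func main() {
	a := fingerprint(processKey(4242, "node /opt/agent/mcp-server.js"))
	b := fingerprint(processKey(4242, "  node /opt/agent/mcp-server.js  "))
	fmt.Println(a == b) // same PID and command after trimming

	c := fingerprint(configKey("repo/./svc/../mcp.json"))
	d := fingerprint(configKey("repo/mcp.json"))
	fmt.Println(c == d) // both paths normalize to repo/mcp.json
}
```

Because equal merge keys always hash to the same value, a second sighting of the same process or file can be folded into the existing record as extra evidence rather than creating a new entry.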
Practical Implementation
Below is a complete example covering the three local scanning surfaces: text, process commands, and the filesystem. This mirrors the reference implementation in examples/shadow-discovery/main.go:
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	agentmesh "github.com/microsoft/agent-governance-toolkit/agent-governance-golang/packages/agentmesh"
)

func main() {
	// Create the scanner with built-in heuristics
	scanner := agentmesh.NewShadowDiscoveryScanner()

	// 1. Scan arbitrary text (environment files or code snippets)
	textFindings := scanner.ScanText("sample.env",
		"OPENAI_API_KEY=sk-test-not-real\nimport langchain\n")
	fmt.Printf("Text findings: %d\n", len(textFindings))

	// 2. Scan process command lines
	cmds := []string{
		"/usr/bin/python -m crewai.run",
		"node /opt/agent/mcp-server.js",
	}
	procFindings := scanner.ScanProcessCommands(cmds)
	fmt.Printf("Process findings: %d\n", len(procFindings))

	// 3. Scan directory tree
	root := buildFixture()
	defer os.RemoveAll(root)
	cfgResult := scanner.ScanConfigPaths([]string{root}, 5)
	fmt.Printf("Config scan: %d agents discovered, %d errors\n",
		len(cfgResult.Agents), len(cfgResult.Errors))
	for _, a := range cfgResult.Agents {
		fmt.Printf(" %-30s type=%-12s confidence=%.2f evid=%d\n",
			a.Name, a.AgentType, a.Confidence, len(a.Evidence))
	}
}

// buildFixture creates a temporary directory tree with sample agent files
// and returns its path. It exits the program if the fixture cannot be written.
func buildFixture() string {
	dir, err := os.MkdirTemp("", "shadow-discovery-*")
	if err != nil {
		log.Fatalf("create fixture dir: %v", err)
	}
	files := map[string]string{
		"agentmesh.yaml": "agent_id: did:agentmesh:demo\n",
		"src/handler.py": "from langchain import agents\n",
		"mcp.json":       `{"server":"demo-mcp"}`,
	}
	for rel, body := range files {
		path := filepath.Join(dir, rel)
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			log.Fatalf("create %s: %v", filepath.Dir(path), err)
		}
		if err := os.WriteFile(path, []byte(body), 0o644); err != nil {
			log.Fatalf("write %s: %v", path, err)
		}
	}
	return dir
}
Running this program prints the number of text and process findings, then one line per DiscoveredAgent from the config scan showing its name, type, evidence count, and the confidence score derived from the noisy-OR formula.
Summary
- ShadowDiscoveryScanner provides four scanning surfaces: text, process commands, filesystem paths, and GitHub repositories.
- Fingerprinting using merge keys (PID + command line or file paths) prevents duplicate agent entries across data sources.
- Noisy-OR confidence aggregation (1 - (1-prior)*(1-new)) combines weak signals into actionable high-confidence findings.
- The scanner is stateless and extensible: add custom regex rules to discoveryRules or extend configPatterns for new frameworks.
- All core logic resides in agent-governance-golang/packages/agentmesh/discovery.go with comprehensive tests in discovery_test.go.
Frequently Asked Questions
How does Shadow AI Discovery differentiate between registered and unregistered agents?
The scanner assigns AgentStatus based on evidence context. Registered agents typically appear in governance registries with stable identifiers, while shadow agents lack these markers. The ScanConfigPaths method specifically looks for governance metadata files (like agentmesh.yaml) to mark agents as registered; their absence combined with process signatures or API key patterns indicates unregistered or shadow status.
Can I extend the discovery rules to detect custom agent frameworks?
Yes. The ShadowDiscoveryScanner accepts extensions to discoveryRules, configPatterns, and processSignatures. Because the scanner is stateless, you can instantiate multiple scanners with different rule sets for different environments, or modify the global rule sets in discovery.go before scanner instantiation to recognize proprietary frameworks or internal agent patterns.
What confidence threshold should I use for production alerts?
The noisy-OR formula in AddEvidence allows tuning based on your risk tolerance. In practice, a confidence threshold of 0.7 or higher indicates strong evidence of a shadow agent, while 0.4-0.6 suggests investigation-worthy leads requiring manual verification. The unit tests in discovery_test.go demonstrate how three low-confidence signals (0.4 each) combine to 0.784, illustrating why thresholds below 0.8 may be appropriate for discovery scenarios.
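The bands above translate directly into a small triage function. This is only a sketch of one possible policy: the triage name and the three labels are hypothetical, and treating the unstated 0.6 to 0.7 range as investigation-worthy is an assumption rather than anything the toolkit prescribes.

```go
package main

import "fmt"

// triage maps a combined confidence score to a hypothetical action label.
// Scores between 0.6 and 0.7 are treated as "investigate" by assumption.
func triage(confidence float64) string {
	switch {
	case confidence >= 0.7:
		return "alert"
	case confidence >= 0.4:
		return "investigate"
	default:
		return "ignore"
	}
}

func main() {
	for _, c := range []float64{0.25, 0.4, 0.64, 0.784} {
		fmt.Printf("%.3f -> %s\n", c, triage(c))
	}
}
```

Under this policy a single 0.4 signal or two combined (0.64) prompt manual review, while three combined (0.784) cross the alert line.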
Does the scanner require elevated privileges to detect running agents?
Reading other processes' command lines, whether to feed ScanProcessCommands or through ScanCurrentHostProcessList, requires appropriate OS-level permissions. However, filesystem scanning (ScanConfigPaths) and text scanning (ScanText) operate without elevation, making them suitable for CI/CD pipelines or developer workstations where admin rights aren't available.