Implementing TrustScore with Five-Dimension Scoring in the Agent Governance Toolkit
The Microsoft Agent Governance Toolkit evaluates agent reliability using a dynamic TrustScore that aggregates five independent behavioral dimensions—Policy Compliance, Resource Efficiency, Output Quality, Security Posture, and Collaboration Health—into a weighted 0-1000 scale with automatic tier mapping and revocation.
The Agent Governance Toolkit (AGT) monitors AI agent behavior through multi-dimensional trust scoring. This article shows how to implement the five-dimension TrustScore system using the RewardEngine and the signal aggregation mechanisms in the open-source microsoft/agent-governance-toolkit repository.
Understanding the Five-Dimension TrustScore Model
The TrustScore system evaluates agents across five distinct behavioral dimensions, each producing signals normalized to values between 0.0 and 1.0. Policy Compliance measures adherence to active governance rules, while Resource Efficiency tracks token and compute consumption against allocated budgets. Output Quality captures downstream acceptance rates from result consumers, Security Posture monitors boundary violations during execution, and Collaboration Health records the success rate of inter-agent handoffs.
The RewardEngine in agent-governance-python/agent-mesh/src/agentmesh/reward/engine.py processes these signals using an exponential moving average (α = 0.1) to smooth volatility, producing per-dimension scores on a 0-100 scale. These individual scores are then aggregated using configurable weights defined in agent-governance-python/agent-mesh/src/agentmesh/constants.py to generate a composite TrustScore ranging from 0 to 1000.
Architecture and Data Flow
Understanding the internal mechanics of trust calculation requires examining four core architectural components that transform raw telemetry into actionable governance decisions.
Signal Ingestion Layer
The RewardEngine exposes specialized wrapper methods that tag incoming telemetry with the appropriate DimensionType enum defined in reward/scoring.py. Applications invoke record_policy_compliance(), record_resource_usage(), record_output_quality(), record_security_event(), and record_collaboration() to emit behavioral signals. Each method internally calls the generic record_signal() function, routing data to the agent-specific RewardDimension instance for processing.
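The routing pattern described above can be sketched without the toolkit installed. Everything in this block is a hypothetical stand-in: the enum member names, the IngestionSketch class, the simplified signatures, and in particular the rule that turns token usage into a 0.0-1.0 value are assumptions for illustration, not the toolkit's actual code.

```python
from collections import defaultdict
from enum import Enum


class DimensionType(Enum):
    # Member names are assumptions; the real enum lives in reward/scoring.py.
    POLICY_COMPLIANCE = "policy_compliance"
    RESOURCE_EFFICIENCY = "resource_efficiency"
    OUTPUT_QUALITY = "output_quality"
    SECURITY_POSTURE = "security_posture"
    COLLABORATION_HEALTH = "collaboration_health"


class IngestionSketch:
    """Hypothetical stand-in showing wrappers that funnel into record_signal()."""

    def __init__(self):
        # (agent_did, dimension) -> list of normalized signal values
        self.signals = defaultdict(list)

    def record_signal(self, agent_did, dimension, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signal must be normalized to [0.0, 1.0]")
        self.signals[(agent_did, dimension)].append(value)

    def record_policy_compliance(self, agent_did, compliant):
        self.record_signal(agent_did, DimensionType.POLICY_COMPLIANCE,
                           1.0 if compliant else 0.0)

    def record_resource_usage(self, agent_did, tokens_used, tokens_budget):
        # Assumed mapping: within budget scores 1.0, overruns decay linearly to 0.0.
        ratio = tokens_used / tokens_budget
        value = 1.0 if ratio <= 1.0 else max(0.0, 2.0 - ratio)
        self.record_signal(agent_did, DimensionType.RESOURCE_EFFICIENCY, value)

    def record_output_quality(self, agent_did, accepted):
        self.record_signal(agent_did, DimensionType.OUTPUT_QUALITY,
                           1.0 if accepted else 0.0)

    def record_security_event(self, agent_did, within_boundary):
        self.record_signal(agent_did, DimensionType.SECURITY_POSTURE,
                           1.0 if within_boundary else 0.0)

    def record_collaboration(self, agent_did, handoff_successful):
        self.record_signal(agent_did, DimensionType.COLLABORATION_HEALTH,
                           1.0 if handoff_successful else 0.0)


engine = IngestionSketch()
engine.record_policy_compliance("did:mesh:worker-007", compliant=True)
engine.record_resource_usage("did:mesh:worker-007",
                             tokens_used=1500, tokens_budget=1000)
```

Keeping a single generic entry point means validation and storage live in one place, while the wrappers only decide which dimension a signal belongs to and how to normalize it.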
Dimension Scoring with Exponential Moving Average
Within the RewardDimension.add_signal() method, incoming signals undergo smoothing using an exponential moving average with a fixed alpha of 0.1. This calculation mitigates the impact of temporary fluctuations while preserving trend detection. The class simultaneously tracks signal counts, positivity/negativity ratios, and directional trends to provide comprehensive behavioral visibility beyond simple averages.
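A minimal sketch of the smoothing step follows. The DimensionSketch class is hypothetical and is not the toolkit's RewardDimension; the neutral starting score of 50 and the rule that values of 0.5 or higher count as positive are assumptions made so the example can run on its own.

```python
from dataclasses import dataclass

ALPHA = 0.1  # smoothing factor stated in the article


@dataclass
class DimensionSketch:
    """Hypothetical stand-in for RewardDimension."""
    score: float = 50.0      # assumed neutral start on the 0-100 scale
    signal_count: int = 0
    positive_count: int = 0

    def add_signal(self, value: float) -> float:
        """Blend one 0.0-1.0 signal into the 0-100 score with an EMA."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("signal must be in [0.0, 1.0]")
        # New observation gets weight ALPHA, accumulated history gets 1 - ALPHA.
        self.score = ALPHA * (value * 100.0) + (1.0 - ALPHA) * self.score
        self.signal_count += 1
        if value >= 0.5:  # assumed cut-off for a "positive" signal
            self.positive_count += 1
        return self.score


dim = DimensionSketch()
for value in (1.0, 1.0, 0.0):
    dim.add_signal(value)
print(f"{dim.score:.2f}")  # 53.55
```

Starting from 50, two perfect signals lift the score to 55.0 and then 59.5, and a single failing signal pulls it back to 53.55, which illustrates how one bad event dents the score without erasing the history.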
Weighted Aggregation
The private _recalculate_score() method in the RewardEngine implements the core aggregation logic. It retrieves current scores from all five dimensions, multiplies each by its corresponding weight constant (WEIGHT_POLICY_COMPLIANCE, WEIGHT_RESOURCE_EFFICIENCY, etc.), sums the weighted values, and scales the result to the 0-1000 range. Operators can override default weights through the RewardConfig class to align scoring with organizational risk tolerance.
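The aggregation arithmetic can be illustrated with a short standalone sketch. The composite_score function and the dictionary keys are hypothetical, the weights are the custom values used in this article's configuration example rather than the toolkit's defaults, and rounding to the nearest integer is an assumption about how the final value is produced.

```python
from typing import Mapping

# Custom weights from this article's RewardConfig example (not the defaults).
WEIGHTS = {
    "policy_compliance": 0.30,
    "resource_efficiency": 0.10,
    "output_quality": 0.20,
    "security_posture": 0.30,
    "collaboration_health": 0.10,
}


def composite_score(dimension_scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> int:
    """Combine 0-100 dimension scores into a 0-1000 composite."""
    weighted = sum(dimension_scores[name] * weight
                   for name, weight in weights.items())
    return round(weighted * 10)  # scale the 0-100 weighted mean to 0-1000


scores = {
    "policy_compliance": 90.0,
    "resource_efficiency": 60.0,
    "output_quality": 80.0,
    "security_posture": 100.0,
    "collaboration_health": 70.0,
}
total = composite_score(scores, WEIGHTS)
print(total)  # 860
```

With these inputs the weighted terms are 27 + 6 + 16 + 30 + 7 = 86, which scales to 860. Because Policy Compliance and Security Posture each carry 30% of the weight here, a drop in either moves the composite three times as far as the same drop in Resource Efficiency.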
Tier Mapping and Automatic Revocation
The TrustScore._update_tier() method maps aggregated scores to five distinct trust tiers: verified_partner, trusted, standard, probationary, and untrusted. Threshold boundaries are configurable via the constants file. When a score falls below the revocation_threshold (default 300), the engine automatically flags the agent as revoked and executes registered callback functions, enabling immediate response to trust degradation.
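The tier lookup and revocation trigger might look roughly like the sketch below. The tier names and the default revocation threshold of 300 come from this article; the tier boundaries (900, 700, 500, 300), the RevocationWatcher class, and the choice to fire callbacks only once per agent are hypothetical and chosen for illustration, since the real boundaries live in constants.py.

```python
from typing import Callable, List, Set

# Hypothetical lower bounds, highest tier first.
TIER_THRESHOLDS = [
    (900, "verified_partner"),
    (700, "trusted"),
    (500, "standard"),
    (300, "probationary"),
]
REVOCATION_THRESHOLD = 300  # default stated in the article


def tier_for(score: int) -> str:
    """Return the first tier whose lower bound the score meets."""
    for lower_bound, name in TIER_THRESHOLDS:
        if score >= lower_bound:
            return name
    return "untrusted"


class RevocationWatcher:
    """Hypothetical stand-in for the engine's revocation handling."""

    def __init__(self, threshold: int = REVOCATION_THRESHOLD):
        self.threshold = threshold
        self.revoked: Set[str] = set()
        self._callbacks: List[Callable[[str, str], None]] = []

    def on_revocation(self, callback: Callable[[str, str], None]) -> None:
        self._callbacks.append(callback)

    def update(self, agent_did: str, score: int) -> str:
        # Fire callbacks only on the first drop below the threshold (assumed).
        if score < self.threshold and agent_did not in self.revoked:
            self.revoked.add(agent_did)
            reason = f"score {score} below threshold {self.threshold}"
            for callback in self._callbacks:
                callback(agent_did, reason)
        return tier_for(score)


events = []
watcher = RevocationWatcher()
watcher.on_revocation(lambda did, reason: events.append((did, reason)))
healthy_tier = watcher.update("did:mesh:worker-007", 860)
degraded_tier = watcher.update("did:mesh:worker-007", 250)
```

In this sketch the callback receives the agent DID and a human-readable reason, matching the two-argument callback shape used later in this article.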
Implementing TrustScore in Python
The following examples demonstrate practical implementation patterns using the RewardService wrapper class.
Initialize a Custom RewardService
Configure dimension weights and thresholds during instantiation to match operational requirements.
from agentmesh.services.reward_engine import RewardService
from agentmesh.reward.engine import RewardConfig

config = RewardConfig(
    revocation_threshold=250,
    warning_threshold=450,
    policy_compliance_weight=0.30,
    resource_efficiency_weight=0.10,
    output_quality_weight=0.20,
    security_posture_weight=0.30,
    collaboration_health_weight=0.10,
)
reward = RewardService(config=config)
Record Behavioral Signals
Emit signals across all five dimensions after agent task execution.
AGENT_ID = "did:mesh:worker-007"

# Policy dimension
reward.record_policy_compliance(
    AGENT_ID,
    compliant=True,
    policy_name="no-external-api",
)

# Resource dimension
reward.record_resource_usage(
    AGENT_ID,
    tokens_used=800,
    tokens_budget=1000,
    compute_ms=900,
    compute_budget_ms=1000,
)

# Quality dimension
reward.record_output_quality(
    AGENT_ID,
    accepted=True,
    consumer="reporting-service",
)

# Security dimension
reward.record_security_event(
    AGENT_ID,
    within_boundary=True,
    event_type="data_read",
)

# Collaboration dimension
reward.record_collaboration(
    AGENT_ID,
    handoff_successful=True,
    peer_did="did:mesh:peer-003",
)
Retrieve Aggregated Trust Metrics
Access the composite score and dimensional breakdowns for monitoring dashboards.
score = reward.get_score(AGENT_ID)
print(f"Total Score: {score.total_score}")  # 0-1000
print(f"Trust Tier: {score.tier}")

# Per-dimension breakdown: smoothed score, signal count, and trend
for name, dim in score.dimensions.items():
    print(f"{name}: {dim.score:.1f} "
          f"(signals={dim.signal_count}, trend={dim.trend})")
Handle Revocation Events
Register callbacks to trigger remediation workflows when trust falls below threshold.
def on_revoked(agent_did: str, reason: str):
    # Called by the engine when the agent's score drops below the threshold
    print(f"Agent {agent_did} revoked: {reason}")

reward.on_revocation(on_revoked)
Key Source Files
The five-dimension TrustScore implementation spans several critical modules:
- agent-governance-python/agent-mesh/src/agentmesh/reward/engine.py: Contains the RewardEngine class that orchestrates signal aggregation and score calculation.
- agent-governance-python/agent-mesh/src/agentmesh/reward/scoring.py: Defines the DimensionType, RewardSignal, RewardDimension, and TrustScore data models.
- agent-governance-python/agent-mesh/src/agentmesh/constants.py: Stores default weights, tier thresholds, and revocation_threshold values.
- docs/tutorials/17-advanced-trust-and-behavior.md: Provides end-to-end walkthroughs of multi-dimensional scoring workflows.
Summary
- The Agent Governance Toolkit implements five-dimensional trust scoring through Policy Compliance, Resource Efficiency, Output Quality, Security Posture, and Collaboration Health metrics.
- Signal processing uses exponential moving averages (α = 0.1) in RewardDimension.add_signal() to produce stable per-dimension scores on a 0-100 scale.
- Weighted aggregation occurs in _recalculate_score() within the RewardEngine, mapping individual dimensions to a composite 0-1000 TrustScore.
- The system automatically categorizes agents into five trust tiers and triggers revocation callbacks when scores breach configured thresholds.
- Implementation requires configuring RewardConfig weights and invoking dimension-specific recording methods from the RewardService wrapper.
Frequently Asked Questions
What is the default revocation threshold for the TrustScore?
By default, the revocation threshold is set to 300 on the 0-1000 scale, as defined in agent-governance-python/agent-mesh/src/agentmesh/constants.py. When an agent's aggregated score falls below this value, the RewardEngine automatically flags the agent as revoked and executes any registered revocation callbacks.
How does the exponential moving average affect dimension scoring?
The RewardDimension.add_signal() method applies an exponential moving average with α = 0.1 to incoming signals, meaning each new signal contributes 10% to the updated score while previous history retains 90% weight. This smoothing factor reduces noise from temporary behavioral fluctuations while maintaining sensitivity to genuine trend shifts.
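To make the 10%/90% split concrete, the sketch below estimates how many consecutive perfect signals an agent needs to climb back from a low dimension score. The signals_to_reach helper is hypothetical and relies only on the standard EMA closed form, s_n = signal - (signal - start) * (1 - α)^n; it assumes the score is updated exactly as described above, with no clamping or decay applied elsewhere.

```python
import math

ALPHA = 0.1  # smoothing factor stated in the article


def signals_to_reach(start: float, target: float, signal: float = 100.0) -> int:
    """Count identical signals needed to lift an EMA from start to target.

    Assumes start < signal; an EMA approaches the signal value asymptotically.
    """
    if start >= target:
        return 0
    if target >= signal:
        raise ValueError("an EMA approaches the signal value but never reaches it")
    remaining = (signal - target) / (signal - start)
    return math.ceil(math.log(remaining) / math.log(1.0 - ALPHA))


recovery = signals_to_reach(start=20.0, target=70.0)
print(recovery)  # 10
```

Under these assumptions, an agent whose dimension score has fallen to 20 needs ten consecutive perfect signals to get back above 70, since nine leave it at roughly 69.0. The same inertia works in the other direction: a well-behaved agent is not knocked down by one isolated failure.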
Can operators customize the weight of individual trust dimensions?
Yes, the RewardConfig class allows complete customization of dimension weights through parameters like policy_compliance_weight and security_posture_weight. These weights must sum to 1.0 and are applied during the _recalculate_score() aggregation phase, enabling organizations to prioritize specific risk domains according to their operational requirements.
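Because the weights are expected to sum to 1.0, it can be worth checking a custom set before building a configuration. The validate_weights helper below is a hypothetical pre-flight check, not part of the toolkit; whether RewardConfig performs its own validation is not stated here. The key names mirror the RewardConfig parameters shown earlier in this article.

```python
import math
from typing import Mapping

EXPECTED_KEYS = (
    "policy_compliance_weight",
    "resource_efficiency_weight",
    "output_quality_weight",
    "security_posture_weight",
    "collaboration_health_weight",
)


def validate_weights(weights: Mapping[str, float]) -> None:
    """Raise ValueError unless all five weights are present, non-negative, and sum to 1.0."""
    missing = [key for key in EXPECTED_KEYS if key not in weights]
    if missing:
        raise ValueError(f"missing weights: {missing}")
    if any(weights[key] < 0 for key in EXPECTED_KEYS):
        raise ValueError("weights must be non-negative")
    total = sum(weights[key] for key in EXPECTED_KEYS)
    # Compare with a tolerance because decimal fractions are inexact in binary floats.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")


article_weights = {
    "policy_compliance_weight": 0.30,
    "resource_efficiency_weight": 0.10,
    "output_quality_weight": 0.20,
    "security_posture_weight": 0.30,
    "collaboration_health_weight": 0.10,
}
validate_weights(article_weights)  # passes silently
```

The tolerance matters in practice: adding 0.30, 0.10, 0.20, 0.30, and 0.10 as floats does not necessarily produce exactly 1.0, so a strict equality check could reject a valid configuration.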
Where are the TrustScore tier thresholds defined?
Tier boundaries mapping scores to categories (verified_partner, trusted, standard, probationary, untrusted) are specified in agent-governance-python/agent-mesh/src/agentmesh/constants.py. The TrustScore._update_tier() method references these constants to determine tier membership whenever the aggregated score changes.