Implementing TrustScore with Five-Dimension Scoring in the Agent Governance Toolkit

The Microsoft Agent Governance Toolkit evaluates agent reliability with a dynamic TrustScore. It aggregates five behavioral dimensions (Policy Compliance, Resource Efficiency, Output Quality, Security Posture, and Collaboration Health) into a weighted composite on a 0-1000 scale, with automatic tier mapping and revocation.

The Agent Governance Toolkit (AGT) monitors AI agent behavior through multi-dimensional trust scoring. This article shows how to implement the five-dimension TrustScore system using the RewardEngine and the signal aggregation mechanisms in the open-source microsoft/agent-governance-toolkit repository.

Understanding the Five-Dimension TrustScore Model

The TrustScore system evaluates agents across five distinct behavioral dimensions, each producing signals normalized to values between 0.0 and 1.0. Policy Compliance measures adherence to active governance rules, while Resource Efficiency tracks token and compute consumption against allocated budgets. Output Quality captures downstream acceptance rates from result consumers, Security Posture monitors boundary violations during execution, and Collaboration Health records the success rate of inter-agent handoffs.

The RewardEngine in agent-governance-python/agent-mesh/src/agentmesh/reward/engine.py processes these signals using an exponential moving average (α = 0.1) to smooth volatility, producing per-dimension scores on a 0-100 scale. These individual scores are then aggregated using configurable weights defined in agent-governance-python/agent-mesh/src/agentmesh/constants.py to generate a composite TrustScore ranging from 0 to 1000.

Architecture and Data Flow

Understanding the internal mechanics of trust calculation requires examining four core architectural components that transform raw telemetry into actionable governance decisions.

Signal Ingestion Layer

The RewardEngine exposes specialized wrapper methods that tag incoming telemetry with the appropriate DimensionType enum defined in reward/scoring.py. Applications invoke record_policy_compliance(), record_resource_usage(), record_output_quality(), record_security_event(), and record_collaboration() to emit behavioral signals. Each method internally calls the generic record_signal() function, routing data to the agent-specific RewardDimension instance for processing.
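The delegation pattern described above can be sketched roughly as follows. This is an illustration only, not the toolkit's code: the enum member names, the record_signal() signature, the in-memory store, and the mapping of booleans to 1.0/0.0 are all assumptions. Only the method names come from the text.

```python
from collections import defaultdict
from enum import Enum


class DimensionType(Enum):
    """Assumed member names; the real enum lives in reward/scoring.py."""
    POLICY_COMPLIANCE = "policy_compliance"
    SECURITY_POSTURE = "security_posture"


class EngineSketch:
    """Hypothetical stand-in for the RewardEngine ingestion layer."""

    def __init__(self) -> None:
        # agent DID -> dimension -> list of normalized signals
        self.signals = defaultdict(lambda: defaultdict(list))

    def record_signal(self, agent_did: str, dimension: DimensionType, value: float) -> None:
        """Generic entry point: validate and store one normalized signal."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("signal must be within [0.0, 1.0]")
        self.signals[agent_did][dimension].append(value)

    def record_policy_compliance(self, agent_did: str, compliant: bool) -> None:
        """Wrapper: tag the signal with the policy dimension and delegate."""
        self.record_signal(agent_did, DimensionType.POLICY_COMPLIANCE, 1.0 if compliant else 0.0)

    def record_security_event(self, agent_did: str, within_boundary: bool) -> None:
        """Wrapper: tag the signal with the security dimension and delegate."""
        self.record_signal(agent_did, DimensionType.SECURITY_POSTURE, 1.0 if within_boundary else 0.0)
```

Keeping one generic entry point means validation and routing live in a single place, while the wrappers only decide which dimension a signal belongs to.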

Dimension Scoring with Exponential Moving Average

Within the RewardDimension.add_signal() method, incoming signals undergo smoothing using an exponential moving average with a fixed alpha of 0.1. This calculation mitigates the impact of temporary fluctuations while preserving trend detection. The class simultaneously tracks signal counts, positivity/negativity ratios, and directional trends to provide comprehensive behavioral visibility beyond simple averages.
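The smoothing step can be sketched as below. The class is a hypothetical stand-in rather than the toolkit's RewardDimension: the neutral starting score of 50.0 and the scaling of a 0.0-1.0 signal to the 0-100 range by multiplying by 100 are assumptions. Only α = 0.1 is taken from the text.

```python
from dataclasses import dataclass

ALPHA = 0.1  # smoothing factor stated in the article


@dataclass
class DimensionSketch:
    """Hypothetical stand-in for RewardDimension; not the toolkit's class."""
    score: float = 50.0   # assumed neutral starting point on the 0-100 scale
    signal_count: int = 0

    def add_signal(self, value: float) -> float:
        """Blend one normalized signal (0.0-1.0) into the running score."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("signal must be within [0.0, 1.0]")
        # New observation gets weight ALPHA, accumulated history keeps 1 - ALPHA.
        self.score = ALPHA * (value * 100.0) + (1.0 - ALPHA) * self.score
        self.signal_count += 1
        return self.score
```

Under these assumptions, one fully positive signal moves the score from 50.0 to 55.0, and a fully negative signal after that moves it to 49.5.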

Weighted Aggregation

The private _recalculate_score() method in the RewardEngine implements the core aggregation logic. It retrieves current scores from all five dimensions, multiplies each by its corresponding weight constant (WEIGHT_POLICY_COMPLIANCE, WEIGHT_RESOURCE_EFFICIENCY, etc.), sums the weighted values, and scales the result to the 0-1000 range. Operators can override default weights through the RewardConfig class to align scoring with organizational risk tolerance.
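A minimal sketch of this aggregation, assuming per-dimension scores on a 0-100 scale and weights that sum to 1.0, so that multiplying the weighted sum by 10 lands on the 0-1000 range. The function name, dictionary keys, weight values, and rounding are illustrative assumptions, not the toolkit's constants or code.

```python
# Illustrative weights only; the toolkit's defaults live in its constants file.
EXAMPLE_WEIGHTS = {
    "policy_compliance": 0.30,
    "resource_efficiency": 0.10,
    "output_quality": 0.20,
    "security_posture": 0.30,
    "collaboration_health": 0.10,
}


def composite_score(dimension_scores: dict, weights: dict) -> int:
    """Weight each 0-100 dimension score, sum, and scale to 0-1000."""
    weighted_sum = sum(weights[name] * dimension_scores[name] for name in weights)
    return round(weighted_sum * 10)


scores = {
    "policy_compliance": 90.0,
    "resource_efficiency": 80.0,
    "output_quality": 70.0,
    "security_posture": 100.0,
    "collaboration_health": 60.0,
}
print(composite_score(scores, EXAMPLE_WEIGHTS))  # 27 + 8 + 14 + 30 + 6 = 85, scaled to 850
```

With these example weights, a perfect security score contributes three times as much to the composite as a perfect collaboration score.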

Tier Mapping and Automatic Revocation

The TrustScore._update_tier() method maps aggregated scores to five distinct trust tiers: verified_partner, trusted, standard, probationary, and untrusted. Threshold boundaries are configurable via the constants file. When a score falls below the revocation_threshold (default 300), the engine automatically flags the agent as revoked and executes registered callback functions, enabling immediate response to trust degradation.
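The tier lookup and revocation check can be sketched as follows. The tier names and the default revocation threshold of 300 come from the text; the tier lower bounds (900, 700, 500, 300) are hypothetical placeholders, since the actual boundaries are defined in the toolkit's constants file.

```python
# Hypothetical lower bounds, highest tier first; NOT the toolkit's real values.
TIER_FLOORS = [
    (900, "verified_partner"),
    (700, "trusted"),
    (500, "standard"),
    (300, "probationary"),
]
REVOCATION_THRESHOLD = 300  # default stated in the article


def tier_for(score: int) -> str:
    """Return the first tier whose lower bound the score meets."""
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    return "untrusted"


def is_revoked(score: int, threshold: int = REVOCATION_THRESHOLD) -> bool:
    """An agent is revoked when its score falls below the threshold."""
    return score < threshold
```

Ordering the table from the highest floor down lets a single pass return the correct tier without explicit upper bounds.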

Implementing TrustScore in Python

The following examples demonstrate practical implementation patterns using the RewardService wrapper class.

Initialize a Custom RewardService

Configure dimension weights and thresholds during instantiation to match operational requirements.

from agentmesh.services.reward_engine import RewardService
from agentmesh.reward.engine import RewardConfig

config = RewardConfig(
    revocation_threshold=250,
    warning_threshold=450,
    policy_compliance_weight=0.30,
    resource_efficiency_weight=0.10,
    output_quality_weight=0.20,
    security_posture_weight=0.30,
    collaboration_health_weight=0.10,
)

reward = RewardService(config=config)

Record Behavioral Signals

Emit signals across all five dimensions after agent task execution.

AGENT_ID = "did:mesh:worker-007"

# Policy dimension
reward.record_policy_compliance(
    AGENT_ID,
    compliant=True,
    policy_name="no-external-api",
)

# Resource dimension
reward.record_resource_usage(
    AGENT_ID,
    tokens_used=800,
    tokens_budget=1000,
    compute_ms=900,
    compute_budget_ms=1000,
)

# Quality dimension
reward.record_output_quality(
    AGENT_ID,
    accepted=True,
    consumer="reporting-service",
)

# Security dimension
reward.record_security_event(
    AGENT_ID,
    within_boundary=True,
    event_type="data_read",
)

# Collaboration dimension
reward.record_collaboration(
    AGENT_ID,
    handoff_successful=True,
    peer_did="did:mesh:peer-003",
)

Retrieve Aggregated Trust Metrics

Access the composite score and dimensional breakdowns for monitoring dashboards.

score = reward.get_score(AGENT_ID)

print(f"Total Score: {score.total_score}")  # 0-1000
print(f"Trust Tier: {score.tier}")

for name, dim in score.dimensions.items():
    print(f"{name}: {dim.score:.1f} "
          f"(signals={dim.signal_count}, trend={dim.trend})")

Handle Revocation Events

Register callbacks to trigger remediation workflows when trust falls below threshold.

def on_revoked(agent_did: str, reason: str):
    print(f"Agent {agent_did} revoked: {reason}")

reward.on_revocation(on_revoked)
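To illustrate how registered callbacks might be dispatched, here is a minimal sketch. The RevocationNotifier class, its notify() method, and the error-isolation behavior are hypothetical and do not describe the toolkit's internals. Only the (agent_did, reason) callback shape and the on_revocation name come from the example above.

```python
from typing import Callable, List

RevocationCallback = Callable[[str, str], None]


class RevocationNotifier:
    """Hypothetical dispatcher; not the toolkit's implementation."""

    def __init__(self) -> None:
        self._callbacks: List[RevocationCallback] = []

    def on_revocation(self, callback: RevocationCallback) -> None:
        """Register a callback to run when an agent is revoked."""
        self._callbacks.append(callback)

    def notify(self, agent_did: str, reason: str) -> int:
        """Invoke every callback; return how many completed without raising."""
        succeeded = 0
        for callback in self._callbacks:
            try:
                callback(agent_did, reason)
                succeeded += 1
            except Exception as exc:
                # One failing handler should not block the remaining handlers.
                print(f"revocation callback failed: {exc}")
        return succeeded
```

Whatever the dispatch mechanism, keeping callbacks short and idempotent is a sensible default, since a remediation workflow may be triggered more than once for the same agent.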

Key Source Files

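The five-dimension TrustScore implementation spans the following modules referenced in this article:

  • agent-governance-python/agent-mesh/src/agentmesh/reward/engine.py: the RewardEngine and RewardConfig, including the dimension-specific recording methods, record_signal(), and _recalculate_score().
  • reward/scoring.py: the DimensionType enum used to tag incoming signals.
  • agent-governance-python/agent-mesh/src/agentmesh/constants.py: default dimension weights, tier thresholds, and the default revocation threshold.
  • agentmesh.services.reward_engine: the module from which the examples import the RewardService wrapper.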

Summary

  • The Agent Governance Toolkit implements five-dimensional trust scoring through Policy Compliance, Resource Efficiency, Output Quality, Security Posture, and Collaboration Health metrics.
  • Signal processing utilizes exponential moving averages (α = 0.1) in RewardDimension.add_signal() to produce stable per-dimension scores on a 0-100 scale.
  • Weighted aggregation occurs in _recalculate_score() within the RewardEngine, mapping individual dimensions to a composite 0-1000 TrustScore.
  • The system automatically categorizes agents into five trust tiers and triggers revocation callbacks when scores breach configured thresholds.
  • Implementation requires configuring RewardConfig weights and invoking dimension-specific recording methods from the RewardService wrapper.

Frequently Asked Questions

What is the default revocation threshold for the TrustScore?

By default, the revocation threshold is set to 300 on the 0-1000 scale, as defined in agent-governance-python/agent-mesh/src/agentmesh/constants.py. When an agent's aggregated score falls below this value, the RewardEngine automatically flags the agent as revoked and executes any registered revocation callbacks.

How does the exponential moving average affect dimension scoring?

The RewardDimension.add_signal() method applies an exponential moving average with α = 0.1 to incoming signals, meaning each new signal contributes 10% to the updated score while previous history retains 90% weight. This smoothing factor reduces noise from temporary behavioral fluctuations while maintaining sensitivity to genuine trend shifts.
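To make the 10%/90% split concrete, the sketch below counts how many consecutive fully negative signals it takes for a dimension score to decay below a target. It assumes the update rule score = α × (signal × 100) + (1 − α) × score on a 0-100 scale; the function name and that scaling are assumptions for illustration.

```python
ALPHA = 0.1  # smoothing factor stated in the article


def signals_until_below(start: float, target: float, signal: float = 0.0) -> int:
    """Count consecutive identical signals needed to push the score below target."""
    if signal * 100.0 >= target:
        raise ValueError("the score converges to signal * 100 and never drops below target")
    score, count = start, 0
    while score >= target:
        score = ALPHA * (signal * 100.0) + (1.0 - ALPHA) * score
        count += 1
    return count


print(signals_until_below(100.0, 50.0))  # 7, since 100 * 0.9**7 is about 47.8
```

Under these assumptions a single bad signal costs a perfect dimension only 10 points, while a sustained run of seven failures halves it.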

Can operators customize the weight of individual trust dimensions?

Yes, the RewardConfig class allows complete customization of dimension weights through parameters like policy_compliance_weight and security_posture_weight. These weights must sum to 1.0 and are applied during the _recalculate_score() aggregation phase, enabling organizations to prioritize specific risk domains according to their operational requirements.
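A pre-flight check for the sum-to-1.0 constraint can be sketched as below. The validate_weights helper is hypothetical and not part of the toolkit; it simply guards a weight dictionary before the values are passed to a configuration object.

```python
import math


def validate_weights(weights: dict) -> None:
    """Raise ValueError unless the weights sum to 1.0 within float tolerance."""
    total = math.fsum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")


# The weights from the configuration example in this article pass the check.
validate_weights({
    "policy_compliance_weight": 0.30,
    "resource_efficiency_weight": 0.10,
    "output_quality_weight": 0.20,
    "security_posture_weight": 0.30,
    "collaboration_health_weight": 0.10,
})
```

Comparing with a tolerance rather than exact equality avoids false failures from floating-point rounding when the weights are written as decimals.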

Where are the TrustScore tier thresholds defined?

Tier boundaries mapping scores to categories (verified_partner, trusted, standard, probationary, untrusted) are specified in agent-governance-python/agent-mesh/src/agentmesh/constants.py. The TrustScore._update_tier() method references these constants to determine tier membership whenever the aggregated score changes.
