How the InstanceLabel Service Enables Service Discovery in Linkis and InsLabelCacheConfiguration Options Explained

The InstanceLabel service in Apache Linkis provides database-backed, label-driven service discovery. Components register themselves with key-value labels, callers look up service instances by those labels, and configurable Guava caches hold hot data to reduce database round-trips.

The service lets engine executors, context servers, and public-service components advertise their capabilities through arbitrary key-value labels. The labels are persisted to relational tables and fronted by an in-memory cache governed by InsLabelCacheConfiguration, so lookups stay fast without overloading the underlying database.

Architecture of the InstanceLabel Service

The service operates as a lightweight RPC server that manages the lifecycle of ServiceInstance metadata. Components interact with it through InstanceLabelClient to register labels, while discovery callers query the service to resolve instances matching specific criteria.

Label Registration and Refresh Workflow

When a component such as the Context Service (CS) server starts, it constructs a Map<String,Object> of labels describing its role and invokes InstanceLabelClient.refreshLabelsToInstance. This RPC reaches DefaultInsLabelService.refreshLabelsToInstance in linkis-public-enhancements/linkis-instance-label-server/src/main/java/org/apache/linkis/instance/label/service/impl/DefaultInsLabelService.java.

The implementation follows a transactional pattern:

  1. Remove existing relations for the instance via removeLabelsFromInstance
  2. Convert labels to InsPersistenceLabel objects using toInsPersistenceLabels
  3. Persist labels to the instance_label table and instances to instance_info via doInsertInsLabels and doInsertInstance
  4. Create relations in the ins_label_relation many-to-many join table using insLabelRelationDao.insertRelations

Batch operations respect the InsLabelConf.DB_PERSIST_BATCH_SIZE limit, which defaults to 100 rows per batch insert.
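The four steps above can be sketched with an in-memory stand-in for the tables. This is a minimal illustration, not the Linkis implementation: the class RefreshSketch, its partition helper, and the "key=value" string encoding are all hypothetical, and only the batch size of 100 comes from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the label tables; not Linkis code. */
final class RefreshSketch {
    static final int BATCH_SIZE = 100; // mirrors the documented default

    // instance id -> "key=value" labels (plays the role of ins_label_relation)
    private final Map<String, Set<String>> relations = new HashMap<>();
    // counts batch writes so that batching is observable
    int batchWrites = 0;

    /** Splits a list into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /** Replace-style refresh: drop old relations, then insert the new set in batches. */
    void refreshLabelsToInstance(String instance, Map<String, Object> labels) {
        relations.remove(instance);                  // step 1: remove existing relations
        List<String> converted = new ArrayList<>();  // step 2: convert to a persistable form
        for (Map.Entry<String, Object> e : labels.entrySet()) {
            converted.add(e.getKey() + "=" + e.getValue());
        }
        Set<String> target = relations.computeIfAbsent(instance, k -> new LinkedHashSet<>());
        for (List<String> batch : partition(converted, BATCH_SIZE)) {
            target.addAll(batch);                    // steps 3-4: persist labels and relations
            batchWrites++;
        }
    }

    /** Returns the labels currently related to the instance (empty if none). */
    Set<String> labelsOf(String instance) {
        return relations.getOrDefault(instance, new LinkedHashSet<>());
    }
}
```

Because the old relations are dropped before the new ones are written, a second refresh for the same instance leaves only the labels from the latest call. The real service performs these steps against the database inside a transaction, which this sketch does not model.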

Database Schema and Persistence Model

The persistence layer uses three core tables defined in linkis-dist/package/db/module/linkis_instance_label.sql:

  • instance_label: Stores label definitions (key, value, string value)
  • instance_info: Stores service instance records (application name, instance identifier)
  • ins_label_relation: Maps label IDs to instance IDs for many-to-many relationships

The DefaultInsLabelService uses asynchronous cleanup to manage orphan labels. An internal consumer queue, sized and paced through InsLabelConf (by default a capacity of 1000, batches of 100 labels, and a 10-second interval), removes stale entries in batches.
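The batch-draining idea behind the cleanup queue can be sketched with a bounded queue from the JDK. This is an assumption-laden illustration: OrphanCleanupSketch, submit, and consumeOnce are hypothetical names, and the actual consumer in Linkis may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of a bounded cleanup queue drained in batches; not Linkis code. */
final class OrphanCleanupSketch {
    private final BlockingQueue<String> queue;
    private final int batchSize;
    // records what was "deleted", standing in for the database delete
    final List<String> removed = new ArrayList<>();

    OrphanCleanupSketch(int capacity, int batchSize) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.batchSize = batchSize;
    }

    /** Non-blocking enqueue; returns false when the queue is already full. */
    boolean submit(String labelId) {
        return queue.offer(labelId);
    }

    /** One consumption cycle: drains at most batchSize ids and returns how many were taken. */
    int consumeOnce() {
        List<String> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        removed.addAll(batch);
        return batch.size();
    }

    /** Number of ids still waiting for cleanup. */
    int pending() {
        return queue.size();
    }
}
```

With the documented defaults this would be constructed as new OrphanCleanupSketch(1000, 100), and a scheduler would call consumeOnce every 10 seconds. The bounded capacity means producers get a rejection rather than unbounded memory growth when cleanup falls behind.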

Service Discovery Query Mechanism

Discovery callers invoke DefaultInsLabelService.searchInstancesByLabels, which converts query labels into InsPersistenceLabel objects and performs a relational lookup against ins_label_relation. The method returns a List<ServiceInstance> containing all matching instances, enabling load balancing and routing decisions based on label criteria such as engineType=spark or route keys.
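The relational lookup can be pictured as an index from label to instances. The sketch below is illustrative only: LabelLookupSketch and relate are hypothetical, and it assumes that an instance must carry every queried label to match, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory version of a label-to-instance lookup; not Linkis code. */
final class LabelLookupSketch {
    // "key=value" -> instances carrying that label (stand-in for the join table)
    private final Map<String, Set<String>> instancesByLabel = new HashMap<>();

    /** Records that the instance carries the given label. */
    void relate(String instance, String key, String value) {
        instancesByLabel.computeIfAbsent(key + "=" + value, k -> new HashSet<>()).add(instance);
    }

    /**
     * Returns the instances that carry every queried label, sorted for stable output.
     * ASSUMPTION: all-labels-must-match semantics; the real query may differ.
     */
    List<String> searchInstancesByLabels(Map<String, String> query) {
        Set<String> result = null;
        for (Map.Entry<String, String> e : query.entrySet()) {
            Set<String> holders =
                instancesByLabel.getOrDefault(e.getKey() + "=" + e.getValue(), new HashSet<>());
            if (result == null) {
                result = new TreeSet<>(holders); // first label seeds the candidate set
            } else {
                result.retainAll(holders);       // later labels narrow it down
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }
}
```

A caller doing load balancing would then pick one entry from the returned list, for example round-robin across all instances labelled engineType=spark.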

InsLabelCacheConfiguration Options

The InsLabelConf class centralizes all tunable parameters for the InstanceLabel service, controlling both database persistence behavior and in-memory caching characteristics. These configurations allow administrators to balance data freshness against system throughput.

Persistence and Async Processing Settings

Configuration Key | Default | Description
wds.linkis.instance.label.persist.batch.size | 100 | Maximum rows per batch insert when persisting labels and instances to the database.
wds.linkis.instance.label.async.queue.capacity | 1000 | Capacity of the internal queue that buffers orphan label cleanup tasks.
wds.linkis.instance.label.async.queue.batch.size | 100 | Number of labels processed per asynchronous cleanup batch.
wds.linkis.instance.label.async.queue.interval-in-seconds | 10 | Interval, in seconds, between asynchronous queue consumption cycles.
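To make the key-and-default relationship concrete, the following sketch resolves the four keys above from a java.util.Properties object, falling back to the documented defaults. It is not how Linkis itself loads configuration: InsLabelSettingsSketch and intOf are hypothetical names used only for illustration.

```java
import java.util.Properties;

/** Hypothetical holder for the persistence and queue settings; not Linkis code. */
final class InsLabelSettingsSketch {
    final int persistBatchSize;
    final int queueCapacity;
    final int queueBatchSize;
    final int queueIntervalSeconds;

    InsLabelSettingsSketch(Properties props) {
        // Keys and defaults are taken from the table above.
        persistBatchSize = intOf(props, "wds.linkis.instance.label.persist.batch.size", 100);
        queueCapacity = intOf(props, "wds.linkis.instance.label.async.queue.capacity", 1000);
        queueBatchSize = intOf(props, "wds.linkis.instance.label.async.queue.batch.size", 100);
        queueIntervalSeconds =
            intOf(props, "wds.linkis.instance.label.async.queue.interval-in-seconds", 10);
    }

    /** Reads an integer property, returning the fallback when the key is absent. */
    static int intOf(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }
}
```

Passing an empty Properties object yields the defaults, and any key present in the object overrides just that one setting.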

Cache Tuning Parameters

To minimize database load during high-frequency discovery operations, the service maintains a Guava-style cache with three namespaces: instance, label, and appInstance.

Configuration Key | Default | Description
wds.linkis.instance.label.cache.expire.time-in-seconds | 10 | Time-to-live, in seconds, for cached entries before automatic eviction.
wds.linkis.instance.label.cache.maximum.size | 1000 | Maximum number of entries retained across all cache namespaces.
wds.linkis.instance.label.cache.names | instance,label,appInstance | Comma-separated list of cache namespaces managed by the service.
linkis.discovery.server-address | http://localhost:20303 | URL of the Linkis service registry used to resolve remote instance addresses.

These properties are read at service startup and injected into the cache builder within DefaultInsLabelService.
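The effect of the expiry and maximum-size settings can be shown with a small JDK-only stand-in. This is not Guava and not the Linkis cache: TtlCacheSketch is a hypothetical class that mimics expire-after-write with a size bound, evicts the oldest written entry rather than using Guava's approximate LRU policy, and takes an injectable clock so the behaviour can be exercised without sleeping.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical expire-after-write cache with a size bound; a stand-in, not Guava. */
final class TtlCacheSketch<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAtMillis;
        Entry(V value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long ttlMillis;
    private final int maximumSize;
    private final LongSupplier clock;     // supplies "now" in milliseconds
    private final Function<K, V> loader;  // stands in for the database query
    private final LinkedHashMap<K, Entry<V>> entries = new LinkedHashMap<>();
    int loads = 0;                        // counts fallbacks to the loader

    TtlCacheSketch(long ttlSeconds, int maximumSize, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlSeconds * 1000L;
        this.maximumSize = maximumSize;
        this.clock = clock;
        this.loader = loader;
    }

    /** Returns a fresh cached value, or reloads it when missing or expired. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> hit = entries.get(key);
        if (hit != null && now - hit.writtenAtMillis < ttlMillis) {
            return hit.value;
        }
        entries.remove(key);              // drop the expired entry, if any
        V loaded = loader.apply(key);
        loads++;
        if (maximumSize > 0 && ttlMillis > 0) {
            entries.put(key, new Entry<>(loaded, now));
            Iterator<K> oldestFirst = entries.keySet().iterator();
            while (entries.size() > maximumSize && oldestFirst.hasNext()) {
                oldestFirst.next();
                oldestFirst.remove();     // evict the oldest written entry
            }
        }
        return loaded;
    }
}
```

With a 10-second expiry, two reads inside the window cost one load, and a read at or after the 10-second mark triggers a second load. When either the size or the expiry is 0 the sketch retains nothing, so every read goes to the loader.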

Practical Implementation Examples

Registering Labels from a Component

Components like the CS server register labels during initialization to advertise their availability:

// Inside CSInstanceLabelClient.init(...)
Map<String, Object> labels = new HashMap<>(1);
labels.put(LabelKeyConstant.ROUTE_KEY, "cs_1_" + ContextServerConf.CS_LABEL_SUFFIX);

InsLabelRefreshRequest request = new InsLabelRefreshRequest();
request.setLabels(labels);
request.setServiceInstance(Sender.getThisServiceInstance());

// RPC call to InstanceLabel Server
InstanceLabelClient.getInstance().refreshLabelsToInstance(request);

This triggers DefaultInsLabelService.refreshLabelsToInstance, which atomically replaces existing labels and persists the new set to the database.

Discovering Services by Label

Client applications query for specific service types using label-based filters:

// Create a label filter
Label<String> engineLabel = new EngineInstanceLabel();
engineLabel.setLabelKey("engineType");
engineLabel.setStringValue("spark");

// Query matching instances
List<ServiceInstance> engines = 
    InstanceLabelClient.getInstance().searchInstancesByLabels(
        Collections.singletonList(engineLabel));

// Process discovered Spark engine instances
engines.forEach(instance -> 
    System.out.println("Found engine: " + instance.getInstance()));

Under the hood, DefaultInsLabelService.searchInstancesByLabels queries the ins_label_relation table and returns hydrated ServiceInstance objects.

Configuring Cache Behavior

Override default cache settings in application.conf or linkis-instance-label.properties:

# Extend cache lifetime for stable environments
wds.linkis.instance.label.cache.expire.time-in-seconds = 30

# Increase capacity for high-scale deployments
wds.linkis.instance.label.cache.maximum.size = 5000

# Maintain default namespaces
wds.linkis.instance.label.cache.names = instance,label,appInstance

These settings directly configure the Guava cache instances used by the service to store hot lookup data.

Summary

  • DefaultInsLabelService provides the core implementation for label attachment, persistence, and discovery queries in linkis-instance-label-server.
  • The service uses three relational tables (instance_label, instance_info, ins_label_relation) to maintain many-to-many relationships between labels and service instances.
  • InsLabelConf exposes configuration keys for batch persistence sizes, asynchronous cleanup queues, and Guava cache parameters including expiration time and maximum size.
  • Client components interact via InstanceLabelClient to register labels during startup and query instances during runtime.
  • Default cache expiration is 10 seconds with a 1000-entry maximum, suitable for dynamic cloud environments but tunable for stable production clusters.

Frequently Asked Questions

How does the InstanceLabel service handle concurrent label updates from multiple instances?

The DefaultInsLabelService.refreshLabelsToInstance implementation atomically removes existing relations before inserting new ones within a transactional boundary. While individual instance updates are isolated, the service does not implement global locking across different instances, relying on the database's transactional consistency to maintain relation integrity.

What happens when the cache expires during an active service discovery query?

Cache expiration in the InstanceLabel service uses Guava's time-based eviction, which occurs during access or maintenance operations rather than on a dedicated timer. If a query finds that an entry has expired, the service transparently falls back to the database via searchInstancesByLabels and repopulates the cache with fresh data on completion. With the default 10-second TTL, a cached result can be stale for at most about 10 seconds after a service state change.

Can I disable the in-memory caching entirely for debugging purposes?

InsLabelConf does not provide an explicit "disable cache" flag. However, Guava treats a maximum size of 0, or a zero expiry duration, as "evict immediately after loading", so setting wds.linkis.instance.label.cache.maximum.size to 0 or wds.linkis.instance.label.cache.expire.time-in-seconds to 0 effectively prevents caching and forces every searchInstancesByLabels call to hit the database. Expect noticeably higher database load and lookup latency while caching is off, so reserve this for debugging.

Where are the database table schemas defined for the InstanceLabel service?

The DDL for instance_label, instance_info, and ins_label_relation tables is located in linkis-dist/package/db/module/linkis_instance_label.sql within the Linkis repository. These tables store the persistent state of all registered labels and instance relationships.
