How the InstanceLabel Service Enables Service Discovery in Linkis and InsLabelCacheConfiguration Options Explained
The InstanceLabel service in Apache Linkis provides a database-backed, label-driven service discovery mechanism where components register themselves with key-value metadata labels, enabling dynamic lookup of service instances while caching hot data in configurable Guava caches to reduce database round-trips.
The InstanceLabel service serves as the backbone of service discovery in Apache Linkis, allowing engine executors, context servers, and public-service components to advertise their capabilities through arbitrary key-value labels. By persisting these labels to relational tables and maintaining an in-memory cache governed by InsLabelCacheConfiguration, the system enables efficient, real-time service location without overwhelming the underlying database.
Architecture of the InstanceLabel Service
The service operates as a lightweight RPC server that manages the lifecycle of ServiceInstance metadata. Components interact with it through InstanceLabelClient to register labels, while discovery callers query the service to resolve instances matching specific criteria.
Label Registration and Refresh Workflow
When a component such as the Context Service (CS) server starts, it constructs a Map<String,Object> of labels describing its role and invokes InstanceLabelClient.refreshLabelsToInstance. This RPC reaches DefaultInsLabelService.refreshLabelsToInstance in linkis-public-enhancements/linkis-instance-label-server/src/main/java/org/apache/linkis/instance/label/service/impl/DefaultInsLabelService.java.
The implementation follows a transactional pattern:
- Remove existing relations for the instance via removeLabelsFromInstance.
- Convert labels to InsPersistenceLabel objects using toInsPersistenceLabels.
- Persist labels to the instance_label table and instances to instance_info via doInsertInsLabels and doInsertInstance.
- Create relations in the ins_label_relation many-to-many join table using insLabelRelationDao.insertRelations.
Batch inserts are capped by InsLabelConf.DB_PERSIST_BATCH_SIZE (wds.linkis.instance.label.persist.batch.size), which defaults to 100 rows per batch.
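The four steps can be modeled in memory to make the replace semantics and the batching concrete. This is a minimal sketch, not the Linkis implementation: InMemoryInsLabelStore, its map-based "tables", and the "key=value" label encoding are hypothetical, and a synchronized method stands in for the database transaction.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.*;
import java.util.stream.Collectors;

/**
 * Hypothetical in-memory model of the refresh workflow; not a Linkis class.
 * The three maps stand in for instance_label, instance_info and ins_label_relation.
 */
final class InMemoryInsLabelStore {
    static final int DB_PERSIST_BATCH_SIZE = 100; // documented default batch size

    // instance_label: "key=value" -> label id
    final Map<String, Integer> instanceLabel = new HashMap<>();
    // instance_info: instance identifier -> application name
    final Map<String, String> instanceInfo = new HashMap<>();
    // ins_label_relation: (label id, instance) pairs for the many-to-many join
    final Set<Map.Entry<Integer, String>> insLabelRelation = new HashSet<>();
    int batchesWritten = 0;
    private int nextLabelId = 1;

    // synchronized stands in for the database transaction
    synchronized void refreshLabelsToInstance(String appName, String instance, Map<String, Object> labels) {
        // 1. Remove existing relations for the instance
        insLabelRelation.removeIf(relation -> relation.getValue().equals(instance));

        // 2. Convert labels to a persistable form (hypothetical "key=value" encoding)
        List<String> persistenceLabels = labels.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());

        // 3. Persist labels in batches of DB_PERSIST_BATCH_SIZE, then the instance row
        for (int from = 0; from < persistenceLabels.size(); from += DB_PERSIST_BATCH_SIZE) {
            int to = Math.min(from + DB_PERSIST_BATCH_SIZE, persistenceLabels.size());
            for (String label : persistenceLabels.subList(from, to)) {
                instanceLabel.computeIfAbsent(label, k -> nextLabelId++);
            }
            batchesWritten++;
        }
        instanceInfo.put(instance, appName);

        // 4. Create the label-instance relations
        for (String label : persistenceLabels) {
            insLabelRelation.add(new SimpleImmutableEntry<>(instanceLabel.get(label), instance));
        }
    }
}
```

In this sketch, refreshing an instance with a new label set replaces its relations but leaves the old label row in instance_label with no relation pointing at it, which is the kind of orphan the asynchronous cleanup described below is meant to remove.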
Database Schema and Persistence Model
The persistence layer uses three core tables defined in linkis-dist/package/db/module/linkis_instance_label.sql:
- instance_label: Stores label definitions (key, value, string value)
- instance_info: Stores service instance records (application name, instance identifier)
- ins_label_relation: Maps label IDs to instance IDs for many-to-many relationships
The DefaultInsLabelService uses asynchronous cleanup to manage orphan labels. An internal consumer queue configured via InsLabelConf periodically removes stale entries using a batch strategy.
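A bounded cleanup queue of this kind can be sketched with java.util.concurrent. OrphanLabelCleaner, the Integer label IDs, and the deleteBatch callback (a stand-in for a DAO batch delete) are hypothetical, and draining exactly one batch per cycle is an assumption. The constructor arguments and the start interval correspond to the async-queue capacity, batch size, and interval settings listed under Persistence and Async Processing Settings.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

/**
 * Hypothetical bounded cleanup queue; not the Linkis implementation.
 * ASSUMPTION: each cycle drains and deletes at most one batch.
 */
final class OrphanLabelCleaner {
    private final BlockingQueue<Integer> queue;
    private final int batchSize;
    private final Consumer<List<Integer>> deleteBatch; // stand-in for a DAO batch delete

    OrphanLabelCleaner(int capacity, int batchSize, Consumer<List<Integer>> deleteBatch) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.deleteBatch = deleteBatch;
    }

    /** Enqueues an orphan label id; returns false instead of blocking when the queue is full. */
    boolean submit(int labelId) {
        return queue.offer(labelId);
    }

    /** One consumption cycle: drain up to batchSize ids and delete them together. */
    int consumeOnce() {
        List<Integer> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            deleteBatch.accept(batch);
        }
        return batch.size();
    }

    /** Schedules consumeOnce every intervalSeconds on a single background thread. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::consumeOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

With the documented defaults this would be new OrphanLabelCleaner(1000, 100, deleteBatch).start(10). Because offer does not block, a burst of refreshes cannot stall the calling thread; this sketch simply reports the rejection to the caller when the queue is full.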
Service Discovery Query Mechanism
Discovery callers invoke DefaultInsLabelService.searchInstancesByLabels, which converts query labels into InsPersistenceLabel objects and performs a relational lookup against ins_label_relation. The method returns a List<ServiceInstance> containing all matching instances, enabling load balancing and routing decisions based on label criteria such as engineType=spark or route keys.
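The relational lookup can be illustrated with an in-memory version of the join. Two assumptions are worth flagging: an instance matches only when it carries every query label (AND semantics), and results are plain instance strings rather than ServiceInstance objects. LabelSearch and relate are hypothetical names.

```java
import java.util.*;
import java.util.stream.Collectors;

/**
 * Hypothetical in-memory model of the label-to-instance join; not a Linkis class.
 * ASSUMPTION: an instance matches only when it carries every query label (AND).
 */
final class LabelSearch {
    // instance -> its "key=value" labels, i.e. instance_label joined through ins_label_relation
    private final Map<String, Set<String>> labelsByInstance = new HashMap<>();

    void relate(String instance, String labelKey, String stringValue) {
        labelsByInstance.computeIfAbsent(instance, i -> new HashSet<>()).add(labelKey + "=" + stringValue);
    }

    List<String> searchInstancesByLabels(Map<String, String> queryLabels) {
        Set<String> wanted = queryLabels.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toSet());
        return labelsByInstance.entrySet().stream()
                .filter(e -> e.getValue().containsAll(wanted))
                .map(Map.Entry::getKey)
                .sorted() // deterministic order for callers and tests
                .collect(Collectors.toList());
    }
}
```

A query for engineType=spark returns every Spark instance; adding a route label to the query narrows the result to instances carrying both labels.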
InsLabelCacheConfiguration Options
The InsLabelConf class centralizes all tunable parameters for the InstanceLabel service, controlling both database persistence behavior and in-memory caching characteristics. These configurations allow administrators to balance data freshness against system throughput.
Persistence and Async Processing Settings
| Configuration Key | Default | Description |
|---|---|---|
| wds.linkis.instance.label.persist.batch.size | 100 | Maximum rows per batch insert when persisting labels and instances to the database. |
| wds.linkis.instance.label.async.queue.capacity | 1000 | Capacity of the internal queue that buffers orphan label cleanup tasks. |
| wds.linkis.instance.label.async.queue.batch.size | 100 | Number of labels processed per asynchronous cleanup batch. |
| wds.linkis.instance.label.async.queue.interval-in-seconds | 10 | Interval, in seconds, between asynchronous queue consumption cycles. |
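Each of these keys resolves to a configured value or, failing that, to its documented default. The sketch below is a hypothetical stand-in for InsLabelConf that reads the four keys from a java.util.Properties source; Linkis itself resolves them through its own configuration classes rather than this one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical stand-in for InsLabelConf's persistence and async settings:
 * each key falls back to its documented default when it is not set.
 */
final class InsLabelPersistSettings {
    final int persistBatchSize;
    final int asyncQueueCapacity;
    final int asyncQueueBatchSize;
    final long asyncQueueIntervalSeconds;

    InsLabelPersistSettings(Properties props) {
        persistBatchSize = intValue(props, "wds.linkis.instance.label.persist.batch.size", 100);
        asyncQueueCapacity = intValue(props, "wds.linkis.instance.label.async.queue.capacity", 1000);
        asyncQueueBatchSize = intValue(props, "wds.linkis.instance.label.async.queue.batch.size", 100);
        asyncQueueIntervalSeconds = intValue(props, "wds.linkis.instance.label.async.queue.interval-in-seconds", 10);
    }

    private static int intValue(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Parses properties-file text, e.g. the contents of linkis-instance-label.properties. */
    static InsLabelPersistSettings fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new InsLabelPersistSettings(props);
    }
}
```

Overriding one key leaves the others at their defaults, so a deployment only needs to list the values it actually changes.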
Cache Tuning Parameters
To minimize database load during high-frequency discovery operations, the service maintains a Guava-style cache with three namespaces: instance, label, and appInstance.
| Configuration Key | Default | Description |
|---|---|---|
| wds.linkis.instance.label.cache.expire.time-in-seconds | 10 | Time-to-live, in seconds, for cached entries before automatic eviction. |
| wds.linkis.instance.label.cache.maximum.size | 1000 | Maximum number of entries retained across all cache namespaces. |
| wds.linkis.instance.label.cache.names | instance,label,appInstance | Comma-separated list of cache namespaces managed by the service. |
| linkis.discovery.server-address | http://localhost:20303 | URL of the Linkis service registry used to resolve remote instance addresses. |
These properties are read at service startup and injected into the cache builder within DefaultInsLabelService.
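How the three cache settings combine can be shown with a stdlib-only sketch, used here instead of Guava so the example runs without dependencies. NamespacedLabelCache is hypothetical; it assumes expire-after-write timing and least-recently-used eviction, and it shares one maximumSize bound across all namespaces, as the table above describes. The clock is injected so expiry can be exercised deterministically.

```java
import java.util.*;
import java.util.function.LongSupplier;

/**
 * Hypothetical stdlib stand-in for the service's Guava caches; not a Linkis class.
 * ASSUMPTIONS: expire-after-write, LRU eviction, one size bound shared by all namespaces.
 */
final class NamespacedLabelCache<V> {
    private static final class Stamped<T> {
        final T value;
        final long writtenAt;

        Stamped(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Set<String> namespaces = new LinkedHashSet<>();
    private final long ttlMillis;
    private final int maximumSize;
    private final LongSupplier clockMillis;
    private final LinkedHashMap<String, Stamped<V>> entries;

    NamespacedLabelCache(String names, long expireSeconds, int maximumSize, LongSupplier clockMillis) {
        for (String name : names.split(",")) { // e.g. "instance,label,appInstance"
            if (!name.trim().isEmpty()) {
                namespaces.add(name.trim());
            }
        }
        this.ttlMillis = expireSeconds * 1000L;
        this.maximumSize = maximumSize;
        this.clockMillis = clockMillis;
        // accessOrder = true: the eldest entry is the least recently used one
        this.entries = new LinkedHashMap<String, Stamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Stamped<V>> eldest) {
                return size() > NamespacedLabelCache.this.maximumSize;
            }
        };
    }

    Set<String> namespaces() {
        return Collections.unmodifiableSet(namespaces);
    }

    private String compositeKey(String namespace, String key) {
        if (!namespaces.contains(namespace)) {
            throw new IllegalArgumentException("unknown cache namespace: " + namespace);
        }
        return namespace + "|" + key; // assumes namespace names contain no '|'
    }

    synchronized void put(String namespace, String key, V value) {
        entries.put(compositeKey(namespace, key), new Stamped<>(value, clockMillis.getAsLong()));
    }

    synchronized Optional<V> getIfPresent(String namespace, String key) {
        String k = compositeKey(namespace, key);
        Stamped<V> stamped = entries.get(k);
        if (stamped == null) {
            return Optional.empty();
        }
        if (clockMillis.getAsLong() - stamped.writtenAt >= ttlMillis) {
            entries.remove(k); // expired entries are dropped lazily, on access
            return Optional.empty();
        }
        return Optional.of(stamped.value);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With the defaults this would be new NamespacedLabelCache<>("instance,label,appInstance", 10, 1000, System::currentTimeMillis).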
Practical Implementation Examples
Registering Labels from a Component
Components like the CS server register labels during initialization to advertise their availability:
```java
// Inside CSInstanceLabelClient.init(...)
Map<String, Object> labels = new HashMap<>(1);
labels.put(LabelKeyConstant.ROUTE_KEY, "cs_1_" + ContextServerConf.CS_LABEL_SUFFIX);
InsLabelRefreshRequest request = new InsLabelRefreshRequest();
request.setLabels(labels);
request.setServiceInstance(Sender.getThisServiceInstance());
// RPC call to InstanceLabel Server
InstanceLabelClient.getInstance().refreshLabelsToInstance(request);
```
This triggers DefaultInsLabelService.refreshLabelsToInstance, which atomically replaces existing labels and persists the new set to the database.
Discovering Services by Label
Client applications query for specific service types using label-based filters:
```java
// Create a label filter
Label<String> engineLabel = new EngineInstanceLabel();
engineLabel.setLabelKey("engineType");
engineLabel.setStringValue("spark");
// Query matching instances
List<ServiceInstance> engines =
    InstanceLabelClient.getInstance().searchInstancesByLabels(
        Collections.singletonList(engineLabel));
// Process discovered Spark engine instances
engines.forEach(instance ->
    System.out.println("Found engine: " + instance.getInstance()));
```
Under the hood, DefaultInsLabelService.searchInstancesByLabels queries the ins_label_relation table and returns hydrated ServiceInstance objects.
Configuring Cache Behavior
Override default cache settings in application.conf or linkis-instance-label.properties:
```properties
# Extend cache lifetime for stable environments
wds.linkis.instance.label.cache.expire.time-in-seconds = 30
# Increase capacity for high-scale deployments
wds.linkis.instance.label.cache.maximum.size = 5000
# Maintain default namespaces
wds.linkis.instance.label.cache.names = instance,label,appInstance
```
These settings directly configure the Guava cache instances used by the service to store hot lookup data.
Summary
- DefaultInsLabelService provides the core implementation for label attachment, persistence, and discovery queries in linkis-instance-label-server.
- The service uses three relational tables (instance_label, instance_info, ins_label_relation) to maintain many-to-many relationships between labels and service instances.
- InsLabelConf exposes configuration keys for batch persistence sizes, asynchronous cleanup queues, and Guava cache parameters including expiration time and maximum size.
- Client components interact via InstanceLabelClient to register labels during startup and query instances during runtime.
- Default cache expiration is 10 seconds with a 1000-entry maximum, suitable for dynamic cloud environments but tunable for stable production clusters.
Frequently Asked Questions
How does the InstanceLabel service handle concurrent label updates from multiple instances?
The DefaultInsLabelService.refreshLabelsToInstance implementation atomically removes existing relations before inserting new ones within a transactional boundary. While individual instance updates are isolated, the service does not implement global locking across different instances, relying on the database's transactional consistency to maintain relation integrity.
What happens when the cache expires during an active service discovery query?
Cache expiration in the InstanceLabel service uses Guava's time-based eviction, which is performed during normal cache reads and writes rather than by a dedicated background thread. If an entry has expired when a query arrives, searchInstancesByLabels transparently falls back to the database and repopulates the cache with the fresh result. The default 10-second TTL ensures rapid convergence of service state changes.
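The read-through behavior can be illustrated with a small, dependency-free sketch. ReadThroughLookup is hypothetical, the loader function stands in for the database query, and the TTL is assumed to count from the time an entry was written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical read-through lookup: fresh entries are served from memory; missing or
 * expired ones are reloaded through the loader (a stand-in for the database query).
 * ASSUMPTION: the TTL counts from when an entry was written.
 */
final class ReadThroughLookup<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> writtenAt = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis;
    private final Function<K, V> loader;

    ReadThroughLookup(long ttlMillis, LongSupplier clockMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
        this.loader = loader;
    }

    synchronized V get(K key) {
        long now = clockMillis.getAsLong();
        Long stamp = writtenAt.get(key);
        if (stamp != null && now - stamp < ttlMillis) {
            return values.get(key); // fresh: no database round-trip
        }
        V fresh = loader.apply(key); // missing or expired: query the database
        values.put(key, fresh);
        writtenAt.put(key, now);
        return fresh;
    }
}
```

With a 10-second TTL, a second lookup 9.999 seconds after the first is served from memory, while one at exactly 10 seconds reloads. With a TTL of 0, every call reaches the loader.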
Can I disable the in-memory caching entirely for debugging purposes?
While InsLabelConf does not provide an explicit "disable cache" flag, setting wds.linkis.instance.label.cache.maximum.size to 0 or wds.linkis.instance.label.cache.expire.time-in-seconds to 0 effectively prevents caching, forcing every searchInstancesByLabels call to hit the database. Note that this significantly impacts performance under load.
Where are the database table schemas defined for the InstanceLabel service?
The DDL for instance_label, instance_info, and ins_label_relation tables is located in linkis-dist/package/db/module/linkis_instance_label.sql within the Linkis repository. These tables store the persistent state of all registered labels and instance relationships.