How Linkis ContextService Manages Unified Variables and UDFs Across Engine Types

The Linkis ContextService treats variables and UDFs as typed context entries, using a unified key-value store with ContextType discrimination to enable cross-engine sharing through persistent MySQL storage and in-memory caching.

The Apache Linkis project provides a computation middleware layer that standardizes access to diverse big data engines. Its ContextService (CS) component solves the fragmentation problem of variable and UDF management by abstracting these resources as first-class context objects. This article examines how Linkis ContextService manages unified variables and UDFs across different engine types through a type-safe, scope-aware architecture.

Unified Data Model for Context Entries

ContextKey and ContextValue Abstraction

Every piece of shared data is wrapped as a context entry composed of a ContextKey and ContextValue. In linkis-public-enhancements/linkis-pes-common/src/main/java/org/apache/linkis/cs/common/entity/source/ContextKey.java, the key encapsulates the identifier string, ContextType (VARIABLE or UDF), and ContextScope (global or engine-specific). The corresponding ContextValue defined in linkis-public-enhancements/linkis-pes-common/src/main/java/org/apache/linkis/cs/common/entity/source/ContextValue.java holds the serialized payload plus optional keyword metadata for search indexing.
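To make the shape of a context entry concrete, here is a minimal, self-contained sketch. The names EntryKey, EntryValue, EntryType, and EntryScope are hypothetical stand-ins invented for this illustration; they are not the Linkis ContextKey and ContextValue classes, and the real entities carry more fields. The point is only that identifier, type, and scope together form the key, so a variable and a UDF can share a name without colliding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified stand-ins for the concepts above; these are NOT the Linkis classes.
enum EntryType { VARIABLE, UDF }
enum EntryScope { GLOBAL, ENGINE }

final class EntryKey {
    final String key;
    final EntryType type;
    final EntryScope scope;

    EntryKey(String key, EntryType type, EntryScope scope) {
        this.key = Objects.requireNonNull(key);
        this.type = Objects.requireNonNull(type);
        this.scope = Objects.requireNonNull(scope);
    }

    // Two keys are equal only if identifier, type, and scope all match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EntryKey)) return false;
        EntryKey other = (EntryKey) o;
        return key.equals(other.key) && type == other.type && scope == other.scope;
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, type, scope);
    }
}

final class EntryValue {
    final String payload;   // serialized value
    final String keywords;  // optional search metadata; may be null

    EntryValue(String payload, String keywords) {
        this.payload = payload;
        this.keywords = keywords;
    }
}

class ContextEntryDemo {
    // Builds a tiny store holding a variable and a UDF under the same identifier.
    static Map<EntryKey, EntryValue> sampleStore() {
        Map<EntryKey, EntryValue> store = new HashMap<>();
        store.put(new EntryKey("etl.date", EntryType.VARIABLE, EntryScope.GLOBAL),
                  new EntryValue("2024-01-01", "date"));
        store.put(new EntryKey("etl.date", EntryType.UDF, EntryScope.GLOBAL),
                  new EntryValue("resource-123", null));
        return store;
    }

    public static void main(String[] args) {
        System.out.println("entries stored: " + sampleStore().size());
    }
}
```

Because the type is part of the key's identity, a lookup must name the type it expects, which is what lets one store serve both kinds of entry.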

Type Safety with ContextType Enumeration

The ContextType enum defined in linkis-public-enhancements/linkis-pes-common/src/main/java/org/apache/linkis/cs/common/entity/enumeration/ContextType.java categorizes entries as VARIABLE, UDF, METADATA, or other types. This classification allows the service to apply uniform storage logic while preserving type-specific handling requirements.

Variable and UDF Storage Strategies

For variables, Linkis uses the LinkisVariable class (in linkis-public-enhancements/linkis-pes-common/src/main/java/org/apache/linkis/cs/common/entity/object/LinkisVariable.java) to wrap configuration parameters or SQL settings. For UDFs, the system stores metadata in UDFInfo and UDFVersion entities (in linkis-public-enhancements/linkis-udf-service) while binaries reside in BML (BigData Material Library). The CS holds a reference pointer (resourceId) as the ContextValue with ContextType.UDF.

Core Workflow and API Operations

Storing and Retrieving Context Entries

The ContextServiceImpl class in linkis-public-enhancements/linkis-cs-server/src/main/java/org/apache/linkis/cs/server/service/impl/ContextServiceImpl.java handles persistence through ContextMapPersistenceImpl and caching via ContextCacheService. When invoking setValueByKey, the system serializes the value, updates the cache, and writes to MySQL. Retrieval via getContextValue checks the cache first, falling back to the database on miss.
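The write-then-read path described above can be sketched with two in-memory maps. This is an illustration under stated assumptions, not the ContextServiceImpl code: the class WriteThroughContextStore is hypothetical, a plain Map stands in for the MySQL table, and the method names only mirror setValueByKey and getContextValue from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a write-through cache in front of a persistent store.
class WriteThroughContextStore {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database; // stand-in for the MySQL table
    int databaseReads = 0;                      // counts cache misses, for demonstration

    WriteThroughContextStore(Map<String, String> database) {
        this.database = database;
    }

    // Write path: update the cache and the backing store together.
    void setValueByKey(String key, String serializedValue) {
        cache.put(key, serializedValue);
        database.put(key, serializedValue);
    }

    // Read path: serve from the cache, fall back to the store on a miss.
    String getContextValue(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String stored = database.get(key);
        if (stored != null) {
            cache.put(key, stored); // populate the cache for later reads
        }
        return stored;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("existing.key", "persisted");
        WriteThroughContextStore store = new WriteThroughContextStore(db);
        store.getContextValue("existing.key"); // miss, loaded from the store
        store.getContextValue("existing.key"); // hit, no store access
        System.out.println("database reads: " + store.databaseReads);
    }
}
```

A real service also has to handle cache eviction and concurrent writers, which this sketch leaves out.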

To collect variables from ancestor nodes in a workflow DAG, CSVariableService uses DefaultSearchService (in linkis-public-enhancements/linkis-pes-client/src/main/java/org/apache/linkis/cs/client/service/DefaultSearchService.java). The search walks the node lineage and aggregates every entry whose ContextKey has a contextType of VARIABLE, making upstream Spark SQL configurations available to downstream Flink or Hive engines.
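The lineage walk can be pictured as a breadth-first traversal over parent edges. The sketch below is a generic illustration, not the DefaultSearchService algorithm: the class UpstreamVariableSearch, the parents map, and the "node:name=value" output format are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: gathers the variables defined on every ancestor of a node.
class UpstreamVariableSearch {
    static List<String> collect(String node,
                                Map<String, List<String>> parents,
                                Map<String, Map<String, String>> variablesByNode) {
        List<String> found = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        // Start from the direct parents so the node's own variables are excluded.
        Deque<String> queue =
            new ArrayDeque<String>(parents.getOrDefault(node, Collections.<String>emptyList()));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!visited.add(current)) {
                continue; // reached through another path already
            }
            Map<String, String> vars =
                variablesByNode.getOrDefault(current, Collections.<String, String>emptyMap());
            // Sort by variable name so the output order is deterministic.
            for (Map.Entry<String, String> e : new TreeMap<String, String>(vars).entrySet()) {
                found.add(current + ":" + e.getKey() + "=" + e.getValue());
            }
            queue.addAll(parents.getOrDefault(current, Collections.<String>emptyList()));
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("flinkNode", Arrays.asList("hiveNode"));
        parents.put("hiveNode", Arrays.asList("sparkNode"));
        Map<String, Map<String, String>> vars = new HashMap<>();
        vars.put("sparkNode", Collections.singletonMap("shuffle", "200"));
        vars.put("hiveNode", Collections.singletonMap("db", "sales"));
        System.out.println(collect("flinkNode", parents, vars));
    }
}
```

The visited set matters when two branches of the DAG share an ancestor; without it, that ancestor's variables would be reported twice.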

UDF Registration and Resolution

The UDFServiceImpl (in linkis-public-enhancements/linkis-udf-service/src/main/java/org/apache/linkis/udf/service/impl/UDFServiceImpl.java) manages UDF lifecycle, uploading JARs to BML and recording metadata. Engines resolve UDFs by querying the CS for entries with ContextType.UDF, then loading the associated binaries from BML using the stored resource references.
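The two-step resolution (look up the reference, then fetch the binary) can be modelled with a pair of maps. This is a hedged sketch only: UdfReferenceResolver is a hypothetical class, the in-memory materialLibrary map stands in for BML, and the resource ID format is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical model: context entries hold references, a separate library holds bytes.
class UdfReferenceResolver {
    private final Map<String, String> udfReferences = new HashMap<>();   // UDF name -> resourceId
    private final Map<String, byte[]> materialLibrary = new HashMap<>(); // resourceId -> binary

    // Registration stores the binary once and records only a pointer to it.
    void register(String udfName, String resourceId, byte[] jarBytes) {
        materialLibrary.put(resourceId, jarBytes);
        udfReferences.put(udfName, resourceId);
    }

    // Resolution follows the pointer from the context entry to the stored binary.
    byte[] resolve(String udfName) {
        String resourceId = udfReferences.get(udfName);
        if (resourceId == null) {
            throw new NoSuchElementException("No UDF entry for " + udfName);
        }
        byte[] bytes = materialLibrary.get(resourceId);
        if (bytes == null) {
            throw new IllegalStateException("Dangling reference " + resourceId);
        }
        return bytes;
    }

    public static void main(String[] args) {
        UdfReferenceResolver resolver = new UdfReferenceResolver();
        resolver.register("myUdf", "res-1", new byte[] {1, 2, 3});
        System.out.println("loaded bytes: " + resolver.resolve("myUdf").length);
    }
}
```

Keeping only the pointer in the context entry means the entry stays small no matter how large the JAR is.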

Cross-Engine Propagation Mechanism

ContextID and Scope Management

Each engine instance operates within a logical ContextID. Entries marked with ContextScope.GLOBAL are visible to any engine sharing that ID, while scoped entries remain engine-specific. This allows fine-grained control over resource visibility across heterogeneous environments.
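The visibility rule can be illustrated with a small filter over the entries of a single context. The sketch is an assumption-laden model, not Linkis code: ScopedContext is a hypothetical class, a null owner marks a global entry, and what happens when a global and a scoped entry share a key is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the entries held under one context ID.
class ScopedContext {
    private static final class Entry {
        final String key;
        final String ownerEngine; // null marks a globally visible entry

        Entry(String key, String ownerEngine) {
            this.key = key;
            this.ownerEngine = ownerEngine;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void putGlobal(String key) {
        entries.add(new Entry(key, null));
    }

    void putForEngine(String engine, String key) {
        entries.add(new Entry(key, engine));
    }

    // An engine sees every global entry plus the entries scoped to itself.
    List<String> visibleKeys(String engine) {
        List<String> keys = new ArrayList<>();
        for (Entry e : entries) {
            if (e.ownerEngine == null || e.ownerEngine.equals(engine)) {
                keys.add(e.key);
            }
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        ScopedContext ctx = new ScopedContext();
        ctx.putGlobal("shared.var");
        ctx.putForEngine("spark", "spark.only");
        System.out.println("spark sees " + ctx.visibleKeys("spark"));
        System.out.println("hive sees " + ctx.visibleKeys("hive"));
    }
}
```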

DAG-Aware Value Inheritance

The upstream search logic automatically merges values from parent workflow nodes. An engine receives the latest variable definitions regardless of its concrete implementation (Spark, Flink, Hive, Trino), ensuring consistency across the execution graph.

Practical Implementation Examples

Setting a Global Variable

// contextID is assumed to be an existing ContextID for the current workflow.
// Serialize the ID and the key to their string forms.
String ctxId = SerializeHelper.serializeContextID(contextID);
String ctxKey = SerializeHelper.serializeContextKey(
    ContextKeyBuilder.builder()
        .key("my.spark.sql.conf")
        .contextType(ContextType.VARIABLE)
        .contextScope(ContextScope.GLOBAL)
        .build());

// Wrap the setting in a LinkisVariable.
LinkisVariable var = new LinkisVariable();
var.setValue("spark.sql.shuffle.partitions=200");

// Deserialize the ID and key again and write the value through the client.
ContextClient client = ContextClientFactory.getOrCreateContextClient();
client.update(
    SerializeHelper.deserializeContextID(ctxId),
    SerializeHelper.deserializeContextKey(ctxKey),
    new CommonContextValue(var));

Retrieving Upstream Variables

// contextIDStr and nodeName are assumed to identify the current context and workflow node.
List<LinkisVariable> vars = CSVariableService.getInstance()
        .getUpstreamVariables(contextIDStr, nodeName);
for (LinkisVariable v : vars) {
    System.out.println("Upstream var: " + v.getValue());
}

Registering a Cross-Engine UDF

UDFAddVo addVo = new UDFAddVo();
addVo.setUdfName("myUdf");
addVo.setUdfType(ConstantVar.UDF_JAR);
addVo.setPath("hdfs:///udfs/myUdf.jar");
addVo.setRegisterFormat("CREATE FUNCTION myUdf AS 'com.example.MyUdf' USING JAR 'myUdf.jar'");
addVo.setLoad(true);

UDFService udfService = SpringContextUtil.getBean(UDFService.class);
Long udfId = udfService.addUDF(addVo, "alice");

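// Create context entry for cross-engine visibility
ContextKey udfKey = new DefaultContextKey("myUdf", ContextType.UDF, ContextScope.GLOBAL);
ContextValue udfValue = new CommonContextValue();
udfValue.setValue(udfId);
// Obtain the client before use; contextID is assumed to be defined by the caller.
ContextClient contextClient = ContextClientFactory.getOrCreateContextClient();
contextClient.update(contextID, udfKey, udfValue);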

Removing Typed Entries by Prefix

contextService.removeAllValueByKeyPrefixAndContextType(
        contextID, ContextType.UDF, "myUdf");

Summary

  • The ContextService abstracts variables and UDFs as unified context entries using ContextKey and ContextValue pairs distinguished by ContextType.
  • Cross-engine sharing works through ContextID scoping and ContextScope.GLOBAL visibility, supported by DAG-aware upstream search in DefaultSearchService.
  • The architecture combines MySQL persistence (ContextMapPersistenceImpl) with in-memory caching (ContextCacheService), so reads are served from the cache and fall back to the database only on a miss.
  • UDF binaries reside in BML while the CS stores only a resourceId reference, so any engine can load a function on demand; UDFServiceImpl handles the upload and the metadata.
  • Operations are exposed via REST (ContextRestfulApi) and Java client (ContextClient) interfaces.

Frequently Asked Questions

How does Linkis distinguish between variables and UDFs in the ContextService?

The service uses the ContextType enum defined in linkis-public-enhancements/linkis-pes-common/src/main/java/org/apache/linkis/cs/common/entity/enumeration/ContextType.java. When creating a ContextKey, developers specify ContextType.VARIABLE for configuration parameters or ContextType.UDF for user-defined functions. This type tag determines serialization logic and search filters while both entry types share the same underlying key-value infrastructure in ContextServiceImpl.

Can a UDF registered once be shared across different engine types?

Yes. When a UDF is registered via UDFServiceImpl, it uploads the JAR to BML and stores a reference in the ContextService with ContextScope.GLOBAL. Any engine operating under the same ContextID can query this entry, retrieve the BML resourceId, and load the JAR locally. This design enables cross-engine UDF reuse without duplicate registrations.

What happens when upstream variables change during workflow execution?

The CSVariableService.getUpstreamVariables method triggers DefaultSearchService to perform a DAG traversal. It aggregates all ContextValue objects from ancestor nodes where ContextType equals VARIABLE. Engines receive the latest values at runtime, ensuring downstream tasks use current configurations regardless of when they were originally set.

Where does the ContextService store the actual UDF JAR files?

The binary artifacts are stored in Linkis BML (BigData Material Library), not directly in the ContextService. The CS entry (ContextValue) contains a resourceId pointer to the BML location. This separation allows the ContextService to remain lightweight while BML handles large binary storage and distribution, as implemented in the UDFServiceImpl workflow.
