Linkis BML Resource Versioning Mechanism: How to Implement Custom Cloud Storage Helpers

Apache Linkis BML (Base Material Library) implements a monotonic versioning system: an upload generates a unique resourceId, and every upload or update is assigned an incremental version string. Storage backends are abstracted through the ResourceHelper interface, which allows custom implementations for cloud providers such as Alibaba OSS or Google Cloud Storage.

The Base Material Library (BML) in Apache Linkis provides centralized resource management for scripts, datasets, and configuration files. Understanding the BML resource versioning mechanism is essential for building reliable data pipelines that require historical tracking and rollback capabilities. This article examines the version control implementation in BmlProtocol.scala and demonstrates how to extend BML with custom resource helpers for cloud storage by implementing the ResourceHelper interface.

How BML Resource Versioning Works Internally

Version-Aware Protocol Definitions

In linkis-public-enhancements/linkis-pes-common/src/main/scala/org/apache/linkis/bml/protocol/BmlProtocol.scala, the versioning semantics are defined through case classes. Every operation returns either BmlUploadResponse or BmlUpdateResponse, both containing resourceId and version fields. The Version case class and ResourceVersions collection enable the server to return complete version histories via BmlResourceVersionsResponse, while BmlRollbackVersionResponse supports reverting to previous states.

The version string follows a monotonically increasing pattern, typically starting at "0" for the initial upload and incrementing by one for each subsequent update.
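
The increment rule can be illustrated with a few lines of Java. This is a minimal sketch that assumes the plain numeric format described above; VersionStrings and its methods are hypothetical names, not part of Linkis.

```java
// Hypothetical helper illustrating numeric version strings; not Linkis code.
final class VersionStrings {
    private VersionStrings() {}

    /** Version assigned to a first upload, following the pattern described in the text. */
    static String initial() {
        return "0";
    }

    /** Returns the version that follows {@code current}. */
    static String next(String current) {
        // Long.parseLong throws NumberFormatException for non-numeric input.
        long n = Long.parseLong(current);
        if (n < 0) {
            throw new IllegalArgumentException("negative version: " + current);
        }
        return Long.toString(n + 1);
    }
}
```

Calling VersionStrings.next(VersionStrings.initial()) yields "1", which matches the version the article attributes to the first update.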

The Version Lifecycle

The BML server manages four primary versioned operations:

  1. Upload: client.uploadResource(user, fileName, stream) returns BmlUploadResponse(resourceId, version="0"), establishing the initial version.
  2. Update: client.updateShareResource(user, resourceId, newFileName, stream) creates BmlUpdateResponse(resourceId, version="1"), generating a new immutable version while preserving history.
  3. Query: client.downloadShareResource(user, resourceId, version) retrieves specific byte streams; passing null fetches the latest version.
  4. Rollback: client.rollbackVersion(user, resourceId, targetVersion) returns BmlRollbackVersionResponse, reverting the resource to a specified historical version.

This flow ensures immutable version history where updates never overwrite existing data.
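
To make the lifecycle concrete, the following in-memory model mirrors the four operations. It is a sketch under stated assumptions, not the BML server: InMemoryVersionedStore is a hypothetical class, versions are the list indices rendered as strings, and rollback is modeled as appending a new version that carries the old content so that no history entry is ever overwritten.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of the upload/update/query/rollback flow.
final class InMemoryVersionedStore {
    // resourceId -> contents; list index i holds version String.valueOf(i).
    private final Map<String, List<byte[]>> history = new HashMap<>();

    /** Creates a new resource whose first version is "0" and returns its id. */
    String upload(byte[] content) {
        String resourceId = UUID.randomUUID().toString();
        List<byte[]> versions = new ArrayList<>();
        versions.add(content.clone());
        history.put(resourceId, versions);
        return resourceId;
    }

    /** Appends a new version and returns its version string. */
    String update(String resourceId, byte[] content) {
        List<byte[]> versions = versionsOf(resourceId);
        versions.add(content.clone());
        return String.valueOf(versions.size() - 1);
    }

    /** Returns the requested version, or the latest one when version is null. */
    byte[] download(String resourceId, String version) {
        List<byte[]> versions = versionsOf(resourceId);
        int index = (version == null) ? versions.size() - 1 : Integer.parseInt(version);
        return versions.get(index).clone();
    }

    /** Assumption: rollback re-publishes the target content as a new version. */
    String rollback(String resourceId, String targetVersion) {
        return update(resourceId, download(resourceId, targetVersion));
    }

    private List<byte[]> versionsOf(String resourceId) {
        List<byte[]> versions = history.get(resourceId);
        if (versions == null) {
            throw new IllegalArgumentException("unknown resourceId: " + resourceId);
        }
        return versions;
    }
}
```

In this model, rolling back to "0" after one update produces version "2" with the original bytes, while versions "0" and "1" remain readable.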

Client-Side Version Management

The BMLHelper class in linkis-public-enhancements/linkis-pes-publicservice/src/main/scala/org/apache/linkis/filesystem/bml/BMLHelper.scala provides the public API for versioned operations. It constructs a BmlClient via BmlClientFactory and returns Java Maps containing the resourceId and version strings. When querying, the helper accepts an explicit version parameter; if it is null, the server returns the most recent version.

Implementing Custom Cloud Storage Resource Helpers

The ResourceHelper Interface Contract

Storage abstraction in BML is handled by the ResourceHelper interface located in linkis-public-enhancements/linkis-bml-server/src/main/java/org/apache/linkis/bml/common/ResourceHelper.java. This contract defines six critical methods that any custom implementation must satisfy:

  • upload(String path, String user, InputStream inputStream, StringBuilder md5Holder, boolean overwrite): Writes the stream to storage, records the content MD5 in md5Holder, and returns the number of bytes written.
  • generatePath(String user, String fileName, Map<String, Object> properties): Constructs unique storage paths following schema-specific conventions.
  • getSchema(): Returns the URI scheme (e.g., "oss://", "gcs://") for routing.
  • checkIfExists(String path, String user): Verifies resource availability.
  • checkBmlResourceStoragePrefixPathIfChanged(String path): Reports whether the path no longer starts with the configured storage prefix (true means the prefix has changed).
  • update(String path): Handles overwrite semantics (often delegates to upload with overwrite=true).
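
The contract above can be exercised without any cloud SDK. The sketch below restates the six methods as a local interface and backs them with a map. Everything in it is illustrative: ResourceHelperSketch, InMemoryResourceHelper, and the "mem://" scheme are invented for this example, and the checked UploadResourceException is replaced with unchecked exceptions to keep the block self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical restatement of the six methods listed above; not the Linkis source.
interface ResourceHelperSketch {
    long upload(String path, String user, InputStream in, StringBuilder md5Holder, boolean overwrite);
    String generatePath(String user, String fileName, Map<String, Object> properties);
    String getSchema();
    boolean checkIfExists(String path, String user);
    boolean checkBmlResourceStoragePrefixPathIfChanged(String path);
    void update(String path);
}

// In-memory stand-in for a storage backend, using an invented "mem://" scheme.
final class InMemoryResourceHelper implements ResourceHelperSketch {
    private static final String SCHEMA = "mem://";
    private static final String PREFIX = SCHEMA + "bml";
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    @Override
    public long upload(String path, String user, InputStream in, StringBuilder md5Holder, boolean overwrite) {
        if (!overwrite && objects.containsKey(path)) {
            throw new IllegalStateException("object already exists: " + path);
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // The digest is updated as bytes are read; the caller still owns the stream.
            DigestInputStream digesting = new DigestInputStream(in, md5);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = digesting.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            if (md5Holder != null) {
                for (byte b : md5.digest()) {
                    md5Holder.append(String.format("%02x", b));
                }
            }
            objects.put(path, out.toByteArray());
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    @Override
    public String generatePath(String user, String fileName, Map<String, Object> properties) {
        return PREFIX + "/" + user + "/" + fileName;
    }

    @Override
    public String getSchema() {
        return SCHEMA;
    }

    @Override
    public boolean checkIfExists(String path, String user) {
        return objects.containsKey(path);
    }

    @Override
    public boolean checkBmlResourceStoragePrefixPathIfChanged(String path) {
        return !path.startsWith(PREFIX);
    }

    @Override
    public void update(String path) {
        // Nothing to do for an in-memory map; a real backend may refresh metadata here.
    }
}
```

A stand-in like this is mainly useful for unit-testing code that depends on the helper contract before wiring in a real SDK.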

Step 1: Implementing the ResourceHelper Interface

To add support for a cloud provider like Alibaba Cloud OSS, create a class implementing ResourceHelper. The implementation must handle SDK initialization, path generation with date-based organization, and MD5 computation:

package org.apache.linkis.bml.common;

import org.apache.linkis.bml.conf.BmlServerConfiguration;
import java.io.InputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

public class OssResourceHelper implements ResourceHelper {
    private static final String SCHEMA = "oss://";
    
    @Override
    public long upload(String path, String user, InputStream inputStream, 
                       StringBuilder md5Holder, boolean overwrite) 
                       throws UploadResourceException {
        // Initialize OSS client using Alibaba SDK
        // Write stream to bucket, compute MD5 if md5Holder != null
        // Return bytes written
        return 0; // Actual implementation returns size
    }
    
    @Override
    public String generatePath(String user, String fileName, 
                               Map<String, Object> properties) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        String date = fmt.format(new Date());
        return SCHEMA + BmlServerConfiguration.BML_OSS_PREFIX().getValue() 
               + "/" + user + "/bml/" + date + "/" + fileName;
    }
    
    @Override
    public String getSchema() {
        return SCHEMA;
    }
    
    @Override
    public boolean checkIfExists(String path, String user) {
        // OSS SDK existence check
        return false;
    }
    
    @Override
    public boolean checkBmlResourceStoragePrefixPathIfChanged(String path) {
        String prefix = SCHEMA + BmlServerConfiguration.BML_OSS_PREFIX().getValue();
        return !path.startsWith(prefix);
    }
    
    @Override
    public void update(String path) {
        // Delegate to upload with overwrite=true
    }
}
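
The date-based path layout and the prefix check are easy to test in isolation once the clock is injectable. The following sketch assumes the same "oss://<prefix>/<user>/bml/<yyyyMMdd>/<fileName>" layout as the helper above; OssPathSketch is a hypothetical class, and it uses java.time instead of SimpleDateFormat because DateTimeFormatter is immutable and thread-safe.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical path builder mirroring the layout used by the helper above.
final class OssPathSketch {
    private static final String SCHEMA = "oss://";
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    private final String prefix;
    private final Clock clock;

    OssPathSketch(String prefix, Clock clock) {
        this.prefix = prefix;
        this.clock = clock;
    }

    /** Builds oss://<prefix>/<user>/bml/<yyyyMMdd>/<fileName> using the injected clock. */
    String generatePath(String user, String fileName) {
        String date = LocalDate.now(clock).format(DAY);
        return SCHEMA + prefix + "/" + user + "/bml/" + date + "/" + fileName;
    }

    /** True when the path does not start with the currently configured prefix. */
    boolean prefixChanged(String path) {
        return !path.startsWith(SCHEMA + prefix);
    }
}
```

With a clock fixed at 2024-01-15 UTC and the prefix "linkis-bml", generatePath("alice", "app.json") returns "oss://linkis-bml/alice/bml/20240115/app.json". Note that this layout maps two uploads of the same file name by the same user on the same day to the same key, so a production helper would likely need an additional unique component in the path.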

Step 2: Registering in ResourceHelperFactory

Register the helper in linkis-public-enhancements/linkis-bml-server/src/main/java/org/apache/linkis/bml/common/ResourceHelperFactory.java:

public class ResourceHelperFactory {
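    // HDFS_RESOURCE_HELPER, S3_RESOURCE_HELPER, and LOCAL_RESOURCE_HELPER are the
    // factory's existing static fields; their declarations are omitted here.
    private static final ResourceHelper OSS_RESOURCE_HELPER = new OssResourceHelper();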
    
    public static ResourceHelper getResourceHelper() {
        String fsType = BmlServerConfiguration.BML_FILESYSTEM_TYPE().getValue();
        if ("hdfs".equals(fsType)) {
            return HDFS_RESOURCE_HELPER;
        } else if ("s3".equals(fsType)) {
            return S3_RESOURCE_HELPER;
        } else if ("oss".equals(fsType)) {
            return OSS_RESOURCE_HELPER;  // Custom helper
        } else {
            return LOCAL_RESOURCE_HELPER;
        }
    }
}
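
As the number of backends grows, the if/else chain can be replaced by a lookup table. The sketch below shows that alternative design; it is not how the factory above is written, and HelperRegistry is a hypothetical generic class so that the example compiles without Linkis on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry: maps a filesystem type string to a helper supplier.
final class HelperRegistry<T> {
    private final Map<String, Supplier<T>> suppliers = new ConcurrentHashMap<>();
    private final Supplier<T> fallback;

    HelperRegistry(Supplier<T> fallback) {
        this.fallback = fallback;
    }

    void register(String fsType, Supplier<T> supplier) {
        suppliers.put(fsType, supplier);
    }

    /** Returns the helper for fsType, or the fallback for unknown or null types. */
    T resolve(String fsType) {
        // ConcurrentHashMap rejects null keys, so null is treated as "unknown".
        Supplier<T> supplier = (fsType == null) ? null : suppliers.get(fsType);
        return (supplier != null ? supplier : fallback).get();
    }
}
```

The fallback plays the role of the final else branch: any unregistered type resolves to the local helper, and adding a backend becomes a single register call instead of a new branch.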

Step 3: Configuring the Storage Backend

Add configuration properties in linkis-bml-server/conf/bml.properties:

wds.linkis.bml.filesystem.type=oss
wds.linkis.bml.oss.prefix=linkis-bml

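The factory reads BML_FILESYSTEM_TYPE from BmlServerConfiguration at runtime to select the matching helper. The OssResourceHelper from Step 1 also reads BmlServerConfiguration.BML_OSS_PREFIX, so a configuration entry bound to wds.linkis.bml.oss.prefix needs to exist there; add one if your Linkis version does not define it.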
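
To sanity-check a properties file before deploying it, the two keys can be parsed with the standard java.util.Properties class. This is a standalone sketch: BmlPropsSketch is a hypothetical class that does not use the Linkis configuration API, and the fallback value is supplied by the caller rather than taken from any Linkis default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for the two keys shown above; independent of Linkis.
final class BmlPropsSketch {
    static final String FS_TYPE_KEY = "wds.linkis.bml.filesystem.type";
    static final String OSS_PREFIX_KEY = "wds.linkis.bml.oss.prefix";

    private BmlPropsSketch() {}

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the configured filesystem type, or fallback when missing or blank. */
    static String fsType(Properties props, String fallback) {
        String value = props.getProperty(FS_TYPE_KEY);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }
}
```

Parsing the two lines above and calling fsType(props, "local") returns "oss"; with an empty file the caller-supplied fallback is returned instead.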

Practical Usage Examples

Uploading and Versioning Resources via BMLHelper

Using the Scala API to manage versioned resources:

import org.apache.linkis.filesystem.bml.BMLHelper

val bml = new BMLHelper()

// Initial upload creates version "0"
val uploadResult = bml.upload("alice", """{"config":"production"}""", "app.json")
val resourceId = uploadResult.get("resourceId").asInstanceOf[String]  // e.g., "bml-12345"
val version0 = uploadResult.get("version").asInstanceOf[String]       // "0"

// Update creates version "1"
val updateResult = bml.update("alice", resourceId, """{"config":"staging"}""")
val version1 = updateResult.get("version").asInstanceOf[String]       // "1"

Retrieving Specific Versions and Rolling Back

Access historical versions or revert changes:

// Query specific version
val v0Data = bml.query("alice", resourceId, "0")
val inputStream = v0Data.get("stream").asInstanceOf[java.io.InputStream]

// Restore the version 0 content by uploading it again as a new version
// (this emulates a rollback on the client; it does not call the rollback operation)
val rollbackResult = bml.update("alice", resourceId, """{"config":"production"}""")

All operations transparently use the configured ResourceHelper (OSS, HDFS, or S3) while maintaining consistent versioning semantics.

Summary

  • BML resource versioning assigns monotonically increasing version strings (starting at "0") to every upload and update operation and returns them in the response types defined in BmlProtocol.scala.
  • The version lifecycle includes upload, update, query (with optional version parameter), and rollback operations, all managed server-side and exposed through BMLHelper.scala.
  • Storage abstraction is achieved via the ResourceHelper interface in ResourceHelper.java, which decouples versioning logic from physical storage.
  • Custom implementations require implementing six methods (upload, generatePath, getSchema, checkIfExists, checkBmlResourceStoragePrefixPathIfChanged, update), registering the instance in ResourceHelperFactory, and setting wds.linkis.bml.filesystem.type.
  • The versioning mechanism remains identical regardless of storage backend (HDFS, S3, or a custom cloud provider) because version metadata is managed independently of byte storage.

Frequently Asked Questions

How does BML handle version conflicts during concurrent updates?

The BML server generates versions atomically using internal counters or timestamps, ensuring that concurrent update operations receive unique, sequential version strings without collisions. Clients receive the specific version assigned to their transaction in the BmlUpdateResponse.
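
The counter-based variant of this idea can be sketched with an AtomicLong. This is an illustration of the general technique only: the article does not show how the BML server allocates versions, and AtomicVersionCounter is a hypothetical class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-resource counter; each caller receives a distinct version.
final class AtomicVersionCounter {
    private final AtomicLong latest;

    AtomicVersionCounter(long latestVersion) {
        this.latest = new AtomicLong(latestVersion);
    }

    /** Atomically reserves and returns the next version string. */
    String nextVersion() {
        return Long.toString(latest.incrementAndGet());
    }

    /** Allocates versions from several threads and returns every value handed out. */
    static Set<String> allocateConcurrently(AtomicVersionCounter counter, int threads, int perThread) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(counter.nextVersion());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("allocation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Because incrementAndGet is atomic, eight threads requesting 1000 versions each receive 8000 distinct values with no gaps. A counter held in process memory only works for a single server instance; a clustered deployment would need the same guarantee from shared storage, for example a database transaction.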

Can I migrate existing resources from HDFS to a custom cloud storage helper?

Yes. Since the versioning metadata (resourceId and version strings) is stored separately from the physical bytes, you can implement a migration script that reads existing resources via the HDFS helper and re-uploads them using your custom helper. The resources will receive new version histories in the target storage while maintaining the same logical resourceIds.

What happens to historical versions when I delete a resource?

Resource deletion behavior depends on the server implementation. The BML protocol supports version-specific deletion, but typically, deleting a resource removes all its versions from storage. Note that the six-method ResourceHelper contract described earlier does not include a delete method, so if you need to preserve historical versions after resource deletion, the archival logic has to be added alongside your helper rather than by overriding an existing ResourceHelper method.

Is the version string format configurable or strictly numeric?

While the default implementation uses numeric strings (0, 1, 2...), the version field in BmlProtocol.scala is a String type, allowing custom server implementations to use timestamp-based or semantic versioning formats. However, the standard BMLHelper assumes monotonic ordering, so custom formats require corresponding client adjustments.
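
One practical consequence of string-typed versions is that ordering must be numeric rather than lexicographic, since "10" sorts before "9" as plain text. The sketch below assumes purely numeric version strings; VersionOrdering is a hypothetical helper, not part of BMLHelper.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for ordering numeric version strings.
final class VersionOrdering {
    /** Compares version strings by numeric value rather than character order. */
    static final Comparator<String> NUMERIC =
            Comparator.comparingLong((String v) -> Long.parseLong(v));

    private VersionOrdering() {}

    /** Returns the numerically greatest version in the list. */
    static String latest(List<String> versions) {
        if (versions.isEmpty()) {
            throw new IllegalArgumentException("no versions");
        }
        return Collections.max(versions, NUMERIC);
    }
}
```

For the versions "0", "9", and "10", latest returns "10", whereas a plain String comparison would rank "9" highest. A timestamp-based or semantic format would need its own comparator in place of NUMERIC.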
