# How to Implement Multi-Cloud File System Abstraction in Linkis Using FsPath, HDFSFileSystem, and S3FileSystem

> Implement multi-cloud file system abstraction in Linkis using FsPath, HDFSFileSystem, and S3FileSystem. Seamlessly manage diverse cloud storage without altering business logic.

- Repository: [The Apache Software Foundation/linkis](https://github.com/apache/linkis)
- Tags: tutorial
- Published: 2026-02-24

---

**Linkis provides a pluggable file system abstraction layer through the `Fs` interface, `FsPath` wrapper, and factory classes like `BuildHDFSFileSystem` and `BuildS3FileSystem`, enabling seamless multi-cloud storage operations without changing business logic.**

Apache Linkis decouples application code from the underlying storage implementation. Code written against the **Linkis file system abstraction** can run on HDFS, Amazon S3, or another supported backend through the same API surface. The design rests on three pieces: a minimal interface contract, a lightweight path wrapper, and runtime factories that inject cross-cutting concerns such as auditing and permission checks.

## Core Components of the Linkis File System Abstraction

### The Fs Interface Contract

At the foundation of the abstraction lies the `Fs` interface defined in `org.apache.linkis.common.io.Fs`. This minimal contract specifies the essential operations every storage backend must implement, including `fsName()`, `read()`, `write()`, `list()`, and `delete()`. By programming against this interface rather than concrete implementations, your code remains agnostic to whether data resides on HDFS, S3, or local disk.

The interface resides in [`linkis-commons/linkis-common/src/main/java/org/apache/linkis/common/io/Fs.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-common/src/main/java/org/apache/linkis/common/io/Fs.java) and serves as the entry point for all file system operations within the Linkis ecosystem.
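To illustrate what programming against such a contract looks like, here is a minimal sketch. It uses a deliberately reduced stand-in interface, `MiniFs`, and an in-memory backend, `InMemoryFs`. Both names are hypothetical and are not part of Linkis; the real `Fs` interface works with `FsPath` objects and declares more methods than shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FsContractDemo {
    // Business logic depends only on the interface, never on a concrete backend.
    static String roundTrip(MiniFs fs, String path, String text) {
        try {
            try (OutputStream out = fs.write(path, true)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = fs.read(path)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new InMemoryFs(), "/tmp/demo.txt", "Hello Linkis"));
    }
}

// Hypothetical, reduced stand-in for a storage contract (not the Linkis Fs interface).
interface MiniFs {
    String fsName();
    InputStream read(String path) throws IOException;
    OutputStream write(String path, boolean overwrite) throws IOException;
    List<String> list(String dir) throws IOException;
    boolean delete(String path) throws IOException;
}

// In-memory backend that exists only to make the sketch runnable.
final class InMemoryFs implements MiniFs {
    private final Map<String, byte[]> files = new TreeMap<>();

    @Override public String fsName() { return "memory"; }

    @Override public InputStream read(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) throw new IOException("No such file: " + path);
        return new ByteArrayInputStream(data);
    }

    @Override public OutputStream write(String path, boolean overwrite) throws IOException {
        if (!overwrite && files.containsKey(path)) {
            throw new IOException("File exists: " + path);
        }
        // The bytes become visible in the store when the stream is closed.
        return new ByteArrayOutputStream() {
            @Override public void close() { files.put(path, toByteArray()); }
        };
    }

    @Override public List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> result = new ArrayList<>();
        for (String key : files.keySet()) {
            if (key.startsWith(prefix)) result.add(key);
        }
        return result;
    }

    @Override public boolean delete(String path) { return files.remove(path) != null; }
}
```

Because `roundTrip` accepts any `MiniFs`, a second backend could be passed in without touching the method body, which is the property the `Fs` contract is meant to provide.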

### FileSystem Abstract Base Class

The `FileSystem` abstract class in `org.apache.linkis.storage.fs.FileSystem` implements most of the `Fs` contract while adding common utilities for permission handling, ownership validation, and path manipulation. Located at [`linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/FileSystem.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/FileSystem.java), this class reduces boilerplate for concrete implementations by providing default implementations of `canRead()`, `canWrite()`, and `canExecute()` based on POSIX-style permission strings.

Concrete backends only need to override low-level operations like `list()`, `mkdir()`, and `renameTo()`, while inheriting consistent security semantics from the base class.
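As a rough illustration of a permission check driven by a POSIX-style permission string, consider the sketch below. The class and method names (`PermissionCheckDemo`, `isAllowed`) are hypothetical, and the owner, then group, then other precedence is the conventional POSIX rule rather than something taken from the Linkis source.

```java
import java.util.Set;

final class PermissionCheckDemo {
    enum Access { READ, WRITE, EXECUTE }

    /**
     * Decides access from a nine-character permission string such as "rwxr-x---".
     * Owner bits apply if the user owns the path, group bits if the user belongs
     * to the path's group, and the "other" bits apply in every remaining case.
     */
    static boolean isAllowed(String perms, String owner, String group,
                             String user, Set<String> userGroups, Access access) {
        if (perms == null || perms.length() != 9) {
            throw new IllegalArgumentException("Expected 9 permission characters: " + perms);
        }
        int classOffset;
        if (user.equals(owner)) {
            classOffset = 0;
        } else if (userGroups.contains(group)) {
            classOffset = 3;
        } else {
            classOffset = 6;
        }
        // READ, WRITE, EXECUTE map to positions 0, 1, 2 within each three-character class.
        return perms.charAt(classOffset + access.ordinal()) != '-';
    }

    public static void main(String[] args) {
        String perms = "rwxr-x---";
        System.out.println(isAllowed(perms, "alice", "analysts", "alice", Set.of(), Access.WRITE));
        System.out.println(isAllowed(perms, "alice", "analysts", "bob", Set.of("analysts"), Access.WRITE));
    }
}
```

With `rwxr-x---`, the owner may write, a member of the group may read and execute but not write, and everyone else is denied.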

### FsPath as the Universal Path Wrapper

**`FsPath`** is a lightweight wrapper around string paths that carries essential metadata including owner, group, permissions, and timestamps. Defined in [`linkis-commons/linkis-common/src/main/java/org/apache/linkis/common/io/FsPath.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-common/src/main/java/org/apache/linkis/common/io/FsPath.java), this class ensures that all storage backends operate on the same data structure, eliminating the need for path format conversion when switching between HDFS and S3.

Unlike raw strings, `FsPath` objects preserve context about the file's security attributes, enabling the permission checks implemented in the abstract `FileSystem` class to function uniformly across disparate storage systems.
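The idea of a metadata-carrying path can be sketched with a small immutable class. `PathInfo` below is a hypothetical stand-in written for this article; it is not `FsPath`, whose actual fields and accessors should be checked in the Linkis source.

```java
import java.time.Instant;

final class PathInfoDemo {
    // Hypothetical metadata-carrying path wrapper (not the Linkis FsPath class).
    static final class PathInfo {
        private final String path;
        private final String owner;
        private final String group;
        private final String permissions;
        private final Instant modifiedAt;

        PathInfo(String path, String owner, String group,
                 String permissions, Instant modifiedAt) {
            this.path = path;
            this.owner = owner;
            this.group = group;
            this.permissions = permissions;
            this.modifiedAt = modifiedAt;
        }

        String getPath() { return path; }
        String getOwner() { return owner; }
        String getGroup() { return group; }
        String getPermissions() { return permissions; }
        Instant getModifiedAt() { return modifiedAt; }

        // One-line summary in the spirit of `ls -l`.
        String summary() {
            return permissions + " " + owner + " " + group + " " + path;
        }
    }

    // A consumer that needs only the wrapper, not a backend-specific path type.
    static boolean ownedBy(PathInfo info, String user) {
        return info.getOwner().equals(user);
    }

    public static void main(String[] args) {
        PathInfo info = new PathInfo(
            "/user/alice/input.txt", "alice", "analysts", "rwxr-x---", Instant.EPOCH);
        System.out.println(info.summary());
        System.out.println(ownedBy(info, "alice"));
    }
}
```

Any backend that can populate such an object lets shared code reason about ownership and permissions without knowing where the file lives.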

## Concrete Implementations for Multi-Cloud Storage

### HDFSFileSystem for Hadoop Clusters

The `HDFSFileSystem` class in [`linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/impl/HDFSFileSystem.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/impl/HDFSFileSystem.java) wraps the native Hadoop `FileSystem` API to provide full HDFS integration. This implementation respects HDFS Access Control Lists (ACLs) and integrates with Kerberos authentication when configured.

When you invoke `canWrite()` on an `HDFSFileSystem` instance, the implementation consults the NameNode to verify actual HDFS permissions against the current user and group context.

### S3FileSystem for AWS Object Storage

For Amazon S3 compatibility, Linkis provides `S3FileSystem` in [`linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/impl/S3FileSystem.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/fs/impl/S3FileSystem.java). This implementation wraps the AWS S3 SDK and emulates directory semantics by treating zero-length marker objects as folder indicators.

Because S3 does not enforce POSIX permissions, the `canRead()` and `canWrite()` methods in this implementation typically return `true`, delegating access control to AWS IAM policies configured at the bucket level. The path format remains compatible with `FsPath`, though the underlying implementation translates logical paths into S3 object keys.
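Directory emulation on a flat key space can be sketched without the AWS SDK. The example below uses a sorted in-memory map as a stand-in for a bucket; the class `FlatStoreDirDemo` and its methods are hypothetical, and the trailing-slash marker convention is an assumption about how such emulation is commonly done, not a transcription of `S3FileSystem`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class FlatStoreDirDemo {
    // Flat key -> bytes map standing in for a bucket; keys carry no leading slash.
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    // Translate a logical path ("/datasets/a.csv") into an object key ("datasets/a.csv").
    static String toKey(String logicalPath) {
        String key = logicalPath;
        while (key.startsWith("/")) key = key.substring(1);
        return key;
    }

    // "Create" a directory by storing a zero-length object whose key ends with "/".
    void mkdir(String logicalDir) {
        String key = toKey(logicalDir);
        if (!key.endsWith("/")) key += "/";
        objects.putIfAbsent(key, new byte[0]);
    }

    void put(String logicalPath, byte[] data) {
        objects.put(toKey(logicalPath), data);
    }

    boolean hasDirectoryMarker(String logicalDir) {
        String key = toKey(logicalDir);
        if (!key.endsWith("/")) key += "/";
        return objects.containsKey(key);
    }

    // List direct children of a directory by scanning the keys that share its prefix.
    List<String> list(String logicalDir) {
        String prefix = toKey(logicalDir);
        if (!prefix.isEmpty() && !prefix.endsWith("/")) prefix += "/";
        List<String> children = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break;      // sorted map: past the prefix range
            String rest = key.substring(prefix.length());
            if (rest.isEmpty()) continue;            // the directory marker itself
            int slash = rest.indexOf('/');
            // Collapse deeper keys into their first path segment (a virtual subfolder).
            String child = slash < 0 ? rest : rest.substring(0, slash + 1);
            String logical = "/" + prefix + child;
            if (!children.contains(logical)) children.add(logical);
        }
        return children;
    }

    public static void main(String[] args) {
        FlatStoreDirDemo store = new FlatStoreDirDemo();
        store.mkdir("/datasets");
        store.put("/datasets/sample.csv", new byte[] {1, 2, 3});
        store.put("/datasets/raw/a.csv", new byte[] {4});
        System.out.println(store.list("/datasets"));
    }
}
```

Note that `/datasets/raw/` shows up as a child even though no marker was written for it: in a flat store, a "folder" can exist purely because some key contains its prefix.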

## Factory Pattern and Runtime Selection

### BuildHDFSFileSystem and BuildS3FileSystem

Linkis uses factory classes to instantiate the appropriate file system implementation at runtime. The `BuildHDFSFileSystem` and `BuildS3FileSystem` classes in `linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/factory/impl/` handle construction and configuration of their respective backends.

These factories create **CGLIB proxies** around the concrete `FileSystem` instances, injecting Linkis IO method interceptors that enable transparent auditing, metrics collection, and permission validation. You obtain a file system instance by calling `getFs(String user, String proxyUser)` on the appropriate builder.
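The interception idea can be sketched with the JDK's built-in `java.lang.reflect.Proxy`, used here in place of CGLIB so the example needs no extra dependency. JDK proxies work on interfaces, whereas CGLIB generates subclasses, so this is an analogy rather than the mechanism Linkis uses. The `Storage` interface, `NoopStorage`, and `withAudit` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class AuditProxyDemo {
    // Hypothetical minimal storage interface used only for this sketch.
    interface Storage {
        boolean delete(String path);
        boolean exists(String path);
    }

    // Trivial target whose calls will be intercepted.
    static final class NoopStorage implements Storage {
        @Override public boolean delete(String path) { return true; }
        @Override public boolean exists(String path) { return false; }
    }

    // Wraps a Storage so that every call is recorded before being delegated.
    static Storage withAudit(Storage target, String user, List<String> auditLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            String arg = (args == null || args.length == 0) ? "" : String.valueOf(args[0]);
            auditLog.add(user + " called " + method.getName() + "(" + arg + ")");
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();   // surface the target's own exception to the caller
            }
        };
        return (Storage) Proxy.newProxyInstance(
            Storage.class.getClassLoader(), new Class<?>[] {Storage.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Storage storage = withAudit(new NoopStorage(), "alice", log);
        storage.delete("/tmp/x");
        storage.exists("/tmp/y");
        log.forEach(System.out::println);
    }
}
```

The caller holds a plain `Storage` reference and never sees the handler, which mirrors how a proxied file system instance can add auditing without changing client code.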

### BuildFactory for Label-Based Selection

The `BuildFactory` interface in [`linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/factory/BuildFactory.java`](https://github.com/apache/linkis/blob/main/linkis-commons/linkis-storage/src/main/java/org/apache/linkis/storage/factory/BuildFactory.java) provides a higher-level abstraction for selecting storage backends. The static method `BuildFactory.getFactory(String label)` maps string labels like `"hdfs"` or `"s3"` to their corresponding factory implementations, returning either `BuildHDFSFileSystem` or `BuildS3FileSystem` as appropriate.

This label-based resolution enables configuration-driven storage selection, allowing operations teams to change the underlying storage system for a deployment without modifying application code.
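Label-based resolution of this kind is often implemented as a small registry that maps a string to a factory. The sketch below shows that general pattern under assumed names (`LabelRegistryDemo`, `BackendFactory`, `register`, `forLabel`); it does not reproduce how Linkis discovers or loads its `BuildFactory` implementations.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

final class LabelRegistryDemo {
    // Hypothetical factory contract; describe() stands in for building a backend handle.
    interface BackendFactory {
        String fsName();
        String describe(String user);
    }

    static final class HdfsLikeFactory implements BackendFactory {
        @Override public String fsName() { return "hdfs"; }
        @Override public String describe(String user) { return "hdfs backend for " + user; }
    }

    static final class S3LikeFactory implements BackendFactory {
        @Override public String fsName() { return "s3"; }
        @Override public String describe(String user) { return "s3 backend for " + user; }
    }

    private static final Map<String, Supplier<BackendFactory>> REGISTRY = new HashMap<>();

    static {
        register("hdfs", HdfsLikeFactory::new);
        register("s3", S3LikeFactory::new);
    }

    // New backends plug in by registering a label; callers are unaffected.
    static void register(String label, Supplier<BackendFactory> supplier) {
        REGISTRY.put(label.toLowerCase(Locale.ROOT), supplier);
    }

    static BackendFactory forLabel(String label) {
        Supplier<BackendFactory> supplier = REGISTRY.get(label.toLowerCase(Locale.ROOT));
        if (supplier == null) {
            throw new IllegalArgumentException("Unsupported storage label: " + label);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        // In a deployment the label would typically come from configuration.
        String label = args.length > 0 ? args[0] : "s3";
        System.out.println(forLabel(label).describe("carol"));
    }
}
```

Failing fast on an unknown label keeps a configuration typo from silently selecting the wrong backend.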

## Practical Implementation Examples

### Reading and Writing to HDFS

The following example demonstrates basic file operations using the HDFS implementation:

```java
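import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

import org.apache.linkis.common.io.Fs;
import org.apache.linkis.common.io.FsPath;
import org.apache.linkis.storage.factory.impl.BuildHDFSFileSystem;

// 1. Build an HDFS-backed Fs (proxy mode will be used if the node has HDFS config)
Fs fs = new BuildHDFSFileSystem().getFs("alice", "proxyAlice");

// 2. Wrap the target path in an FsPath
FsPath path = new FsPath("/user/alice/input.txt");

// 3. Write data (overwrite = true)
try (OutputStream out = fs.write(path, true)) {
    out.write("Hello Linkis".getBytes(StandardCharsets.UTF_8));
}

// 4. Read the data back, decoding with the same charset that was used for writing
try (InputStream in = fs.read(path);
     BufferedReader reader =
         new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
    String content = reader.lines().collect(Collectors.joining("\n"));
    System.out.println(content);   // → Hello Linkis
}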
```

*Key classes used:* `BuildHDFSFileSystem`, `Fs`, `FsPath`, `HDFSFileSystem`

### Switching to S3 Without Code Changes

To run the same logic against S3, only the factory changes; the read, write, and list calls stay the same:

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.linkis.common.io.Fs;
import org.apache.linkis.common.io.FsPath;
import org.apache.linkis.storage.factory.impl.BuildS3FileSystem;

// Obtain an S3-backed Fs; this is the only line that differs from the HDFS example
Fs s3Fs = new BuildS3FileSystem().getFs("bob", "proxyBob");

// S3 uses the same FsPath abstraction: the bucket is configured in
// StorageConfiguration, and the path is logical, without a scheme.
FsPath s3Path = new FsPath("/datasets/sample.csv");

// Write a CSV file to S3 (overwrite = false)
try (OutputStream out = s3Fs.write(s3Path, false)) {
    out.write("id,value\n1,foo\n2,bar".getBytes(StandardCharsets.UTF_8));
}

// List objects under a directory (S3 treats "/" as a virtual folder)
List<FsPath> files = s3Fs.list(new FsPath("/datasets"));
files.forEach(fp -> System.out.println(fp.getPath()));
```

*Key classes used:* `BuildS3FileSystem`, `S3FileSystem`, `FsPath`

### Using BuildFactory for Storage-Agnostic Code

For maximum portability, use the generic factory to hide concrete implementations:

```java
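// BuildFactory decides the concrete implementation based on the label
BuildFactory factory = BuildFactory.getFactory("s3");   // returns BuildS3FileSystem
Fs fs = factory.getFs("carol", "proxyCarol");

// From here the code is identical to the HDFS example
FsPath path = new FsPath("/logs/2024/03/01.log");

// mkdir() belongs to the FileSystem base class (org.apache.linkis.storage.fs.FileSystem)
// rather than to Fs, so this assumes the returned instance extends FileSystem.
FileSystem fileSystem = (FileSystem) fs;
fileSystem.mkdir(new FsPath("/logs/2024/03"));   // creates virtual "directory" in S3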
```

### Handling Permissions Across Storage Types

Check permissions before sensitive operations, and keep in mind that the result means different things on different backends:

```java
FsPath dir = new FsPath("/secure/data");
if (fs.canWrite(dir)) {
    // Fs.create takes the destination as a String path, not an FsPath
    fs.create("/secure/data/new.txt");
} else {
    throw new IOException("Current user lacks write permission on " + dir.getPath());
}
```

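The `canWrite` implementation in `HDFSFileSystem` consults HDFS ACLs, while `S3FileSystem` typically returns `true` because S3 does not enforce POSIX permissions. On S3, a passing `canWrite` check therefore does not guarantee that the write will succeed: IAM policies can still reject the request, so the write itself should also be prepared to fail.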

## Summary

- **Linkis file system abstraction** relies on the `Fs` interface in `linkis-common` and the `FileSystem` abstract class in `linkis-storage` to provide a unified API across storage backends.
- **`FsPath`** serves as the universal path wrapper carrying metadata (owner, group, permissions) for all file systems, located in `org.apache.linkis.common.io.FsPath`.
- **Concrete implementations** like `HDFSFileSystem` and `S3FileSystem` handle protocol-specific operations while inheriting common utilities from the base class.
- **Factory classes** (`BuildHDFSFileSystem`, `BuildS3FileSystem`, and `BuildFactory`) instantiate proxied file system instances at runtime, enabling label-based storage selection and transparent interceptor injection.
- **Permission semantics** differ by backend (HDFS enforces POSIX-style ACLs, while S3 delegates to IAM), though the API remains consistent through the abstraction layer.

## Frequently Asked Questions

### How does Linkis handle directory creation differently between HDFS and S3?

In `HDFSFileSystem`, the `mkdir()` operation creates physical directories in the Hadoop namespace with proper inode allocation and permission bits. `S3FileSystem`, by contrast, emulates directories by creating zero-length marker objects whose keys end with a trailing slash, since S3 is a flat object store without native directory concepts. Both implementations expose the same `mkdir(FsPath)` signature, so callers use identical code regardless of the underlying storage architecture.

### Can I implement a custom file system for another cloud provider using Linkis?

Yes. Extend the `FileSystem` abstract class and implement its abstract methods, such as `list()`, `read()`, `write()`, and `exists()`. Place your implementation in the `org.apache.linkis.storage.fs` package or in a package of your own, then add a matching factory class that follows the pattern used by `BuildHDFSFileSystem`. If you need the Linkis interceptors for auditing or security, your factory should return a CGLIB-proxied instance.

### What is the purpose of the CGLIB proxy created by BuildHDFSFileSystem and BuildS3FileSystem?

The CGLIB proxy wraps the concrete `FileSystem` implementation to inject **Linkis IO method interceptors** at runtime. These interceptors enable cross-cutting concerns such as operation auditing, performance metrics collection, and additional permission validation without cluttering the core file system logic. The proxy is created transparently when you call `getFs()` on the factory, requiring no changes to client code.

### How does StorageUtils determine which file system to instantiate?

`StorageUtils` provides utility methods like `isHDFSNode()` and constants such as `HDFS()` and `S3()` that inspect the runtime environment and configuration properties. The factory classes consult these utilities to determine whether HDFS configuration is present on the node or whether S3 credentials are configured, defaulting to the appropriate implementation. This allows Linkis deployments to automatically adapt to their infrastructure without explicit configuration of the storage backend in application code.