Linkis Engine Lifecycle States and LinkisGatewayCoreErrorCodeSummary for Engine Connection Failures
Linkis manages engine lifecycles through the EngineState enumeration with seven distinct states and two helper predicates, while engine connection failures are reported via the LinkisGatewayCoreErrorCodeSummary enum using codes 11010–11012 and 18000 when the Gateway cannot route to engine services.
Apache Linkis orchestrates distributed compute engines through a well-defined state machine and robust error handling. Understanding how the system tracks engine lifecycle states and interprets LinkisGatewayCoreErrorCodeSummary error codes is essential for debugging connection failures and building reliable submission clients.
Linkis Engine Lifecycle State Management
Linkis models every engine instance (executor) using the EngineState enumeration defined in the protocol layer. This state machine drives scheduling decisions, resource allocation, and cleanup operations across the cluster.
The EngineState Enumeration
The canonical state definitions reside in linkis-commons/linkis-protocol/src/main/java/org/apache/linkis/protocol/engine/EngineState.java. The enum defines seven possible states:
- Starting – The engine JVM is launching but not yet ready to accept tasks.
- Idle – The engine is initialized and waiting for workload assignment.
- Busy – The engine is actively executing a task.
- ShuttingDown – The engine is stopping gracefully and rejecting new tasks.
- Error – The engine encountered an unrecoverable error and cannot continue.
- Dead – The engine process is terminated and the instance is defunct.
- Success – The engine completed its work and finished successfully.
The enum provides two critical helper methods for scheduler logic:
// Returns true for Error, Dead, and Success
public static boolean isCompleted(EngineState state)
// Returns true for Idle and Busy only
public static boolean isAvailable(EngineState state)
These predicates allow the scheduler to determine whether an engine can receive new tasks (isAvailable) or whether its execution record is final and resources can be reclaimed (isCompleted).
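To make the two predicates concrete, here is a minimal, self-contained sketch. `EngineStateSketch` is a local stand-in written for illustration, not the real `org.apache.linkis.protocol.engine.EngineState` class; it only mirrors the seven states and the membership rules described above.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative stand-in for EngineState (assumption: not the actual Linkis source).
enum EngineStateSketch {
    Starting, Idle, Busy, ShuttingDown, Error, Dead, Success;

    // States whose execution record is final, per the description above.
    private static final Set<EngineStateSketch> COMPLETED = EnumSet.of(Error, Dead, Success);

    // States in which the engine is operational, per the description above.
    private static final Set<EngineStateSketch> AVAILABLE = EnumSet.of(Idle, Busy);

    static boolean isCompleted(EngineStateSketch state) {
        return COMPLETED.contains(state);
    }

    static boolean isAvailable(EngineStateSketch state) {
        return AVAILABLE.contains(state);
    }
}
```

Under these rules, Starting and ShuttingDown are neither available nor completed, so a scheduler built on this sketch would neither dispatch to such an engine nor reclaim its resources yet.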
Scheduler Integration and State Transitions
The Linkis scheduler aliases EngineState as ExecutorState in linkis-commons/linkis-scheduler/src/main/scala/org/apache/linkis/scheduler/executer/Executor.scala:
type ExecutorState = EngineState
val Starting = EngineState.Starting
val Idle = EngineState.Idle
// ... additional aliases
def isCompleted(state: ExecutorState): Boolean = EngineState.isCompleted(state)
def isAvailable(state: ExecutorState): Boolean = EngineState.isAvailable(state)
State changes trigger the ExecutorListener interface defined in ExecutorListener.scala:
def onExecutorStateChanged(
    executor: Executor,
    fromState: EngineState,
    toState: EngineState
): Unit
Implementations of this listener react to lifecycle events, for example by releasing cluster resources when an engine reaches Dead or by updating telemetry when it transitions to Busy.
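The listener pattern can be sketched in plain Java as follows. Everything here is hypothetical: `StateListenerSketch` is a simplified Java rendering of the Scala callback, the executor is reduced to a string ID, and `RecordingListener` only records what a real implementation would act on.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative copy of the seven states (assumption: not the Linkis enum itself).
enum SketchState { Starting, Idle, Busy, ShuttingDown, Error, Dead, Success }

// Hypothetical Java counterpart of the onExecutorStateChanged callback.
interface StateListenerSketch {
    void onExecutorStateChanged(String executorId, SketchState fromState, SketchState toState);
}

// Records the two reactions mentioned above: cleanup on Dead, telemetry on Busy.
final class RecordingListener implements StateListenerSketch {
    final List<String> released = new ArrayList<>();
    int busyTransitions = 0;

    @Override
    public void onExecutorStateChanged(String executorId, SketchState fromState, SketchState toState) {
        if (toState == SketchState.Dead) {
            released.add(executorId);   // stand-in for releasing cluster resources
        } else if (toState == SketchState.Busy) {
            busyTransitions++;          // stand-in for a telemetry update
        }
    }
}
```

A caller would invoke `onExecutorStateChanged("engine-1", SketchState.Idle, SketchState.Busy)` on every transition; the listener ignores transitions it has no reaction for.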
LinkisGatewayCoreErrorCodeSummary Error Codes
When the Linkis Gateway cannot locate or instantiate an Engine Service, it raises structured exceptions using the LinkisGatewayCoreErrorCodeSummary enumeration. These codes indicate exactly where the routing chain failed.
Service Resolution Errors (11010–11012)
The Gateway Core module defines three primary error codes for service discovery failures:
11010 – CANNOT_SERVICEID: Thrown when the parsed service identifier does not match any registered service in the metadata registry. The error message format is: "Cannot find a correct serviceId for parsedServiceId:{0}, service list are:{1}".
11011 – CANNOT_ROUTE_SERVICE / NO_SERVICES_REGISTRY: Indicates routing failures due to empty registries or missing label-based routing tables. Variants include:
- "Cannot route to the corresponding service, URL:{0} RouteLabel:{1}"
- "There are no services available in the registry URL:{0}"
- "There is no route label service with the corresponding app name"
11012 – CANNOT_INSTANCE: Signifies that while the service identifier exists, none of its instances are reachable in the routing chain. Message: "Cannot find an instance in the routing chain of serviceId:{0}, please retry".
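The `{0}` and `{1}` placeholders in these messages are `java.text.MessageFormat` arguments. The sketch below shows how a code and its template combine into a final description. `GatewayErrorSketch` is a local illustration, not the real `LinkisGatewayCoreErrorCodeSummary` enum; it copies only the two templates quoted above for 11010 and 11012, and the service names in the usage note are made up.

```java
import java.text.MessageFormat;

// Illustrative stand-in for the gateway error-code enum (assumption: simplified).
enum GatewayErrorSketch {
    CANNOT_SERVICEID(11010,
        "Cannot find a correct serviceId for parsedServiceId:{0}, service list are:{1}"),
    CANNOT_INSTANCE(11012,
        "Cannot find an instance in the routing chain of serviceId:{0}, please retry");

    final int code;
    final String template;

    GatewayErrorSketch(int code, String template) {
        this.code = code;
        this.template = template;
    }

    // Fills the numbered placeholders in the template with the given arguments.
    String describe(Object... args) {
        return MessageFormat.format(template, args);
    }
}
```

For example, `GatewayErrorSketch.CANNOT_INSTANCE.describe("my-service")` yields "Cannot find an instance in the routing chain of serviceId:my-service, please retry".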
Request Handling Errors (18000)
18000 – GET_REQUESTBODY_FAILED: Occurs when the Gateway cannot deserialize the HTTP request payload. While often client-side, this surfaces during engine connection attempts when malformed requests prevent service instantiation.
Exception Handling in GatewayRouter
The GatewayRouter implementation in linkis-spring-cloud-services/linkis-service-gateway/linkis-gateway-core/src/main/scala/org/apache/linkis/gateway/route/GatewayRouter.scala constructs these exceptions using MessageFormat:
// TooManyServiceException wraps CANNOT_SERVICEID
val errorMsg = new TooManyServiceException(
  MessageFormat.format(CANNOT_SERVICEID.getErrorDesc, parsedServiceId, services)
)
throw errorMsg
For instance unavailability:
val message = MessageFormat.format(CANNOT_INSTANCE.getErrorDesc, serviceId)
throw new ErrorException(CANNOT_INSTANCE.getErrorCode, message)
TooManyServiceException is defined in TooManyServiceException.scala. The Gateway's HTTP layer captures these exceptions and returns structured JSON responses containing the integer error code and the formatted description.
Practical Implementation Examples
Checking Engine Availability
Use EngineState predicates before task dispatch:
import org.apache.linkis.protocol.engine.EngineState
def canAcceptWorkload(state: EngineState): Boolean =
  EngineState.isAvailable(state) // true only for Idle or Busy
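Because `isAvailable` is true for both Idle and Busy, a dispatcher that wants a free engine needs a stricter check. The following sketch assumes a hypothetical pool represented as a map from engine ID to state; `PoolState`, `DispatchSketch`, and `pickIdle` are illustrative names, not Linkis APIs.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative copy of the seven states (assumption: not the Linkis enum itself).
enum PoolState { Starting, Idle, Busy, ShuttingDown, Error, Dead, Success }

final class DispatchSketch {
    // Mirrors the described predicate: operational means Idle or Busy.
    static boolean isAvailable(PoolState state) {
        return state == PoolState.Idle || state == PoolState.Busy;
    }

    // Returns the first engine (in the map's iteration order) that is Idle,
    // i.e. operational and not currently executing a task.
    static Optional<String> pickIdle(Map<String, PoolState> pool) {
        return pool.entrySet().stream()
            .filter(entry -> entry.getValue() == PoolState.Idle)
            .map(Map.Entry::getKey)
            .findFirst();
    }
}
```

With a pool where `engine-a` is Busy and `engine-b` is Idle, `pickIdle` returns `engine-b` even though both engines pass `isAvailable`.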
Handling Gateway Errors in Client Code
Map LinkisGatewayCoreErrorCodeSummary codes for user-friendly error handling:
import org.apache.linkis.gateway.errorcode.LinkisGatewayCoreErrorCodeSummary._
// getErrorCode is a method, not a stable identifier, so match with guards.
// `logger` is assumed to be a logger already in scope.
def handleConnectionFailure(code: Int, description: String): Unit = code match {
  case c if c == CANNOT_SERVICEID.getErrorCode =>
    logger.error(s"Service ID not registered: $description")
    // Trigger service registry refresh
  case c if c == CANNOT_INSTANCE.getErrorCode =>
    logger.error(s"All engine instances down: $description")
    // Initiate retry with backoff
  case c if c == GET_REQUESTBODY_FAILED.getErrorCode =>
    logger.error(s"Malformed request: $description")
  case _ =>
    logger.error(s"Gateway error $code: $description")
}
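The "retry with backoff" step for 11012 can be sketched as below. This is one possible client-side policy, not something Linkis prescribes. It assumes that only 11012 is retried (its message ends with "please retry"), that a return value of 0 from the hypothetical `attemptFn` means success, and that `GatewayRetrySketch` and its methods are illustrative names.

```java
import java.util.function.IntSupplier;

final class GatewayRetrySketch {
    static final int CANNOT_INSTANCE = 11012;

    // Assumption: only "no reachable instance" is treated as transient.
    static boolean isRetryable(int code) {
        return code == CANNOT_INSTANCE;
    }

    // Exponential backoff: base * 2^attempt, capped at capMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.max(0, Math.min(attempt, 20));
        return Math.min(baseMillis << shift, capMillis);
    }

    // Calls attemptFn until it returns 0 (success), returns a non-retryable
    // code, or maxAttempts is exhausted. Returns the last code seen, or -1
    // if maxAttempts is not positive.
    static int submitWithRetry(IntSupplier attemptFn, int maxAttempts,
                               long baseMillis, long capMillis) {
        int code = -1;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            code = attemptFn.getAsInt();
            if (code == 0 || !isRetryable(code)) {
                return code;
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return code;
                }
            }
        }
        return code;
    }
}
```

Codes such as 11010 or 18000 return immediately in this sketch, since resubmitting the same request would presumably fail the same way.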
Summary
- EngineState centrally models the engine lifecycle with seven states (Starting → Idle/Busy → ShuttingDown → Dead/Success/Error) and the helper methods isAvailable() and isCompleted().
- The scheduler uses ExecutorState aliases and ExecutorListener callbacks to manage task dispatch and resource cleanup based on state transitions.
- LinkisGatewayCoreErrorCodeSummary defines error codes 11010 (unknown service), 11011 (routing failure), 11012 (no reachable instances), and 18000 (request parsing) for engine connection failures.
- Exceptions are thrown in GatewayRouter using formatted messages and captured as structured HTTP error responses.
Frequently Asked Questions
What are the terminal states in the Linkis EngineState enumeration?
The terminal states are Error, Dead, and Success. The EngineState.isCompleted() method returns true for these three states, indicating that the engine instance has finished its lifecycle and will not process additional tasks.
How does the Linkis Gateway differentiate between a missing service and an unreachable instance?
The Gateway uses distinct error codes from LinkisGatewayCoreErrorCodeSummary: 11010 (CANNOT_SERVICEID) indicates the service identifier is not registered in the metadata catalog, while 11012 (CANNOT_INSTANCE) indicates the service exists but no healthy instances are available in the routing chain.
Can an engine in the Busy state accept new tasks?
Not while it is occupied. EngineState.isAvailable() in EngineState.java returns true for both Idle and Busy, but the predicate indicates that the engine is operational (not completed or dead), not that it is free. The scheduler dispatches new tasks to Idle engines, while a Busy engine is already executing one.
Where are Linkis Gateway error codes documented in the source repository?
The human-readable documentation for gateway error codes resides in docs/errorcode/linkis-gateway-core-errorcode.md, while the programmatic enums are defined in the Java/Scala source files within linkis-spring-cloud-services/linkis-service-gateway/linkis-gateway-core/src/main/scala/org/apache/linkis/gateway/errorcode/.