What Is an Executor in Spark Standalone Mode? Key Differences from Workers and Cores

An executor in Spark is a per-application JVM process that a Worker launches to run tasks and cache data. It is distinct from the long-running Worker daemon, which manages a node’s resources, and from CPU cores, which are the compute slots the Master allocates to that executor.

When deploying Apache Spark on a standalone cluster, understanding the hierarchy between workers, executors, and cores is essential for tuning performance and resource utilization. In the apache/spark repository, these components are implemented across specific source files that handle resource advertisement, process lifecycle management, and task execution scheduling. This article examines the actual implementation to clarify how these three elements differ and collaborate within the Standalone cluster manager.

Spark Standalone Architecture Overview

In the Standalone deployment mode, three distinct entities manage compute resources: the Worker, the Executor, and CPU cores. Each plays a specific role in the cluster resource management lifecycle.

Worker Node

The Worker is a long-running daemon process that runs on each machine in the cluster. According to core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala, the Worker registers with the Master via the RegisterWorker message (lines 58‑71), advertises the total number of CPU cores and memory available on its host, and awaits instructions to launch executor processes. The Worker serves a web UI, by default at http://<host>:8081, and tracks coresUsed and memoryUsed to account for the resources its executors hold.

Executor Process

An Executor is a JVM process started by a Worker specifically for one Spark application. As implemented in core/src/main/scala/org/apache/spark/executor/Executor.scala (constructor at lines 48‑55), the Executor creates a thread pool to run tasks, holds the application’s cached RDDs and DataFrames in memory, and reports task progress and heartbeats back to the Driver. When the application finishes, the Worker terminates the executor and frees its allocated resources. Executors from different applications can coexist on the same Worker node, and a single application can have executors on several Workers.

CPU Cores

Cores represent the atomic unit of compute capacity that the Master allocates to applications. Workers report their total cores via the --cores flag or the SPARK_WORKER_CORES environment variable (documented in docs/spark-standalone.md, lines 71‑73). When the Master schedules an application, it determines how many cores each executor receives through spark.executor.cores. If this value is omitted, the executor consumes all free cores on the Worker (up to the application’s spark.cores.max limit, if one is set), which can lead to resource monopolization.

How Workers, Executors, and Cores Interact

The lifecycle from cluster startup to task execution follows a strict choreography defined in the Standalone master and worker implementations:

  1. Worker Registration – Starting a Worker via sbin/start-worker.sh initiates the daemon, which contacts the Master at spark://HOST:PORT and sends its total resource capacity (cores and memory) as stored in WorkerInfo.scala.

  2. Application Submission – When you run spark-submit --master spark://HOST:PORT, the Master evaluates the request against available resources using spark.cores.max (the total cores the application may claim) and, when the application does not set that value, the Master-side default spark.deploy.defaultCores.

  3. Executor Launch – The Master issues a LaunchExecutor RPC to a selected Worker. The Worker creates an ExecutorRunner, which sets up the executor’s working directory and log files and forks a separate JVM running CoarseGrainedExecutorBackend. That backend process instantiates the Executor object from Executor.scala.

  4. Task Execution – Inside the Executor, a thread pool runs application tasks. With the default spark.task.cpus value of 1, each active task consumes one core slot from the executor’s allocation (configured via spark.executor.cores). The Executor sends heartbeats and metrics to the Driver while maintaining local storage for shuffle and cached data.

  5. Resource Cleanup – Upon application completion or failure, the Worker terminates the Executor process and decrements its coresUsed and memoryUsed counters, making those resources available for subsequent applications as tracked in ExecutorInfo.scala.
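The bookkeeping in steps 3 and 5 can be pictured with a toy model. This is a minimal sketch, not Spark's Worker class: ToyWorker, ExecutorSlot, launch, and appFinished are hypothetical names, and the model folds the Master's capacity check and the Worker's counters into one object for brevity.

```scala
// Toy model of per-node resource accounting; all names here are hypothetical.
final case class ExecutorSlot(appId: String, execId: Int, cores: Int, memoryMb: Int)

final class ToyWorker(val totalCores: Int, val totalMemoryMb: Int) {
  // Executors currently hosted, keyed by (application id, executor id).
  private var executors = Map.empty[(String, Int), ExecutorSlot]

  def coresUsed: Int = executors.values.map(_.cores).sum
  def memoryUsed: Int = executors.values.map(_.memoryMb).sum
  def coresFree: Int = totalCores - coresUsed
  def memoryFree: Int = totalMemoryMb - memoryUsed

  /** Accepts the launch only if the request fits in the remaining capacity. */
  def launch(slot: ExecutorSlot): Boolean =
    if (slot.cores <= coresFree && slot.memoryMb <= memoryFree) {
      executors += ((slot.appId, slot.execId) -> slot)
      true
    } else false

  /** Releases every executor owned by a finished application. */
  def appFinished(appId: String): Unit =
    executors = executors.filter { case ((id, _), _) => id != appId }
}

object ToyWorkerDemo {
  def main(args: Array[String]): Unit = {
    val worker = new ToyWorker(totalCores = 8, totalMemoryMb = 16384)
    worker.launch(ExecutorSlot("app-1", 0, cores = 2, memoryMb = 2048))
    worker.launch(ExecutorSlot("app-2", 0, cores = 4, memoryMb = 4096))
    println(s"used: ${worker.coresUsed} cores, ${worker.memoryUsed} MB")  // used: 6 cores, 6144 MB
    worker.appFinished("app-1")
    println(s"free after app-1 ends: ${worker.coresFree} cores")          // 4 cores
  }
}
```

Deriving coresUsed and memoryUsed from the executor map, rather than keeping separate counters, means the totals cannot drift from the set of live executors in this sketch.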

Key Differences Between Workers and Executors

Understanding the distinction between these components prevents common configuration errors like confusing node-level settings with application-level resource requests.

Aspect: Lifecycle
  • Worker: Starts once per node and persists for the cluster’s uptime unless manually stopped.
  • Executor: Created per application when scheduled by the Master; terminated when the application finishes or fails.

Aspect: Responsibility
  • Worker: Manages node resources, registers with Master, launches and monitors executor processes, and cleans up local directories.
  • Executor: Executes tasks, stores cached data blocks, performs serialization and garbage collection, and reports status to the Driver.

Aspect: Resource Granularity
  • Worker: Holds the total CPU cores and memory of the physical host.
  • Executor: Holds a subset of the Worker’s cores (determined by spark.executor.cores) and a slice of memory (spark.executor.memory).

Aspect: Visibility
  • Worker: Appears in the Master UI under the Workers tab with links to the Worker’s web interface.
  • Executor: Appears in the Executors tab of the specific application’s UI with individual log links and task statistics.

Configuring Executors and Cores in Standalone Mode

Controlling the relationship between Workers, Executors, and cores requires setting both node-level and application-level parameters.

Starting a Worker Node

Launch the Worker daemon on each cluster node to advertise resources to the Master:

./sbin/start-worker.sh spark://master-host:7077

By default the Worker offers all detected CPU cores and the host’s total memory minus 1 GiB. The --cores and --memory flags (or the SPARK_WORKER_CORES and SPARK_WORKER_MEMORY environment variables) override those values.

Submitting an Application with Resource Controls

Specify how the Master should slice Worker resources into Executor processes:

./bin/spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 4 \
  --executor-cores 2 \
  --executor-memory 2g \
  --class org.example.MyApp \
  my-app.jar

This configuration caps the application at 4 cores in total with 2 cores per executor, so it receives at most two executors. The Master may place them on one Worker or distribute them across two Workers, depending on availability.
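The arithmetic behind that request can be sketched as follows. ExecutorMath and its functions are hypothetical helpers, not Spark APIs. The result is an upper bound that assumes the cluster has enough free cores and memory, and the slot calculation assumes spark.task.cpus is left at its default of 1 unless passed explicitly.

```scala
object ExecutorMath {
  /** Executors an application can receive: whole executors only, hence integer division. */
  def executorCount(totalExecutorCores: Int, executorCores: Int): Int =
    totalExecutorCores / executorCores

  /** Tasks one executor can run at once; taskCpus models spark.task.cpus. */
  def taskSlotsPerExecutor(executorCores: Int, taskCpus: Int = 1): Int =
    executorCores / taskCpus

  def main(args: Array[String]): Unit = {
    val executors = executorCount(totalExecutorCores = 4, executorCores = 2)
    val slots = taskSlotsPerExecutor(executorCores = 2)
    // prints: executors=2, slotsPerExecutor=2, parallelTasks=4
    println(s"executors=$executors, slotsPerExecutor=$slots, parallelTasks=${executors * slots}")
  }
}
```

Because only whole executors are launched, a total that is not a multiple of the per-executor size leaves the remainder unused in this model: 5 total cores with 2 per executor still yields two executors.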

Inspecting Executors Programmatically

From within your Spark application, you can query the Driver’s view of allocated executors:

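import org.apache.spark.sql.SparkSession

// Connect to the standalone Master; the host name is a placeholder.
val spark = SparkSession.builder()
  .master("spark://master-host:7077")
  .appName("ResourceCheck")
  .getOrCreate()

// Keys are block manager "host:port" strings; values are
// (max, remaining) memory available for caching, in bytes.
spark.sparkContext.getExecutorMemoryStatus.foreach {
  case (hostPort, (maxMem, remainingMem)) =>
    println(s"Executor $hostPort – max: $maxMem, free: $remainingMem")
}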

This returns the caching-memory status for each active block manager: one entry per executor process currently running on the Workers, plus one entry for the Driver itself.
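The raw values are byte counts, which are hard to read at a glance. The helper below is a small sketch that formats one (maxMem, remainingMem) pair in MiB. MemoryReport and describe are hypothetical names, and the sample address and sizes are made up for illustration.

```scala
object MemoryReport {
  private val MiB = 1024L * 1024L

  /** Formats one (max, remaining) entry in MiB together with the share already in use. */
  def describe(hostPort: String, maxMem: Long, remainingMem: Long): String = {
    val usedPct = if (maxMem == 0L) 0L else 100L * (maxMem - remainingMem) / maxMem
    s"$hostPort max=${maxMem / MiB} MiB free=${remainingMem / MiB} MiB used=$usedPct%"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample shaped like one entry of getExecutorMemoryStatus.
    val sample = Map("10.0.0.5:40123" -> (1024L * MiB, 768L * MiB))
    sample.foreach { case (hostPort, (maxMem, remainingMem)) =>
      // prints: 10.0.0.5:40123 max=1024 MiB free=768 MiB used=25%
      println(describe(hostPort, maxMem, remainingMem))
    }
  }
}
```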

Default Configuration Properties

Set cluster-wide defaults in conf/spark-defaults.conf to control resource allocation behavior:

spark.executor.cores=4
spark.cores.max=12
spark.deploy.defaultCores=2

spark.executor.cores and spark.cores.max act as application defaults that apply when no explicit values are given at submit time. spark.deploy.defaultCores is read by the Master process and applies only to applications that do not set spark.cores.max.
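How the two core limits combine can be sketched in a few lines. CoreCap and effectiveCoreCap are hypothetical names. The sketch assumes the Master falls back to spark.deploy.defaultCores only when the application leaves spark.cores.max unset, and that this default is effectively unbounded unless configured.

```scala
object CoreCap {
  /**
   * Total cores an application may claim.
   * appCoresMax models spark.cores.max; masterDefaultCores models
   * spark.deploy.defaultCores as configured on the Master.
   */
  def effectiveCoreCap(appCoresMax: Option[Int], masterDefaultCores: Int = Int.MaxValue): Int =
    appCoresMax.getOrElse(masterDefaultCores)

  def main(args: Array[String]): Unit = {
    println(effectiveCoreCap(Some(12), masterDefaultCores = 2)) // 12: the application value wins
    println(effectiveCoreCap(None, masterDefaultCores = 2))     // 2: falls back to the Master default
  }
}
```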

Summary

  • Workers are persistent daemons that manage node resources and spawn executor processes as directed by the Master.
  • Executors are transient JVM processes created per application to execute tasks and maintain in-memory storage, implemented in core/src/main/scala/org/apache/spark/executor/Executor.scala.
  • Cores define the parallel execution capacity; the Master allocates them to executors based on spark.executor.cores, while Workers track availability via coresUsed.
  • The Standalone Master schedules executors by matching application resource requests against Worker advertisements stored in WorkerInfo and ExecutorInfo data structures.

Frequently Asked Questions

How many executors can run on a single Worker node?

A Worker can host multiple executors simultaneously as long as it has sufficient free cores and memory. The Master schedules executors from different applications or multiple executors from the same application (if spark.executor.cores is set lower than the Worker’s total cores) until the Worker’s coresUsed and memoryUsed reach their limits.
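As a rough illustration, the number of same-sized executors that fit on one Worker is bounded by whichever resource runs out first. WorkerCapacity and maxExecutors are hypothetical names, and the figures are made-up examples.

```scala
object WorkerCapacity {
  /** Executors of a fixed size that fit into a Worker's free cores and free memory. */
  def maxExecutors(freeCores: Int, freeMemoryMb: Int, executorCores: Int, executorMemoryMb: Int): Int =
    math.min(freeCores / executorCores, freeMemoryMb / executorMemoryMb)

  def main(args: Array[String]): Unit = {
    // 16 free cores would allow four 4-core executors, but 8192 MB only fits two at 3072 MB each.
    println(maxExecutors(freeCores = 16, freeMemoryMb = 8192, executorCores = 4, executorMemoryMb = 3072)) // 2
  }
}
```

In this example memory, not cores, is the limiting resource, which is a common reason a Worker shows idle cores while refusing further executors.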

What happens if I do not specify spark.executor.cores?

If spark.executor.cores is omitted, the Standalone Master allocates all available cores on the target Worker to that single executor. This can prevent other applications from running on that node until the executor terminates, effectively monopolizing the Worker’s compute capacity for one application.

Can multiple Spark applications share the same executor process?

No. Executors are strictly bound to a single application. Each application submission triggers the Master to request new executor processes from Workers. When an application completes, its executors are terminated and their resources returned to the Worker pool for allocation to subsequent applications.

How does the Master decide which Worker hosts a new executor?

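The Master evaluates resource availability using the WorkerInfo class to compare advertised cores and memory against current coresUsed and memoryUsed values. It only considers Workers with enough free capacity to satisfy the application’s spark.executor.cores and spark.executor.memory requirements. By default (spark.deploy.spreadOut set to true) it spreads an application’s executors across as many Workers as possible; setting that property to false consolidates them onto as few Workers as possible. Executor placement does not take data locality into account. Locality is applied later, when the Driver’s task scheduler assigns tasks to the executors it already has.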
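The effect of the standalone setting spark.deploy.spreadOut (spread executors across Workers when true, pack them onto few Workers when false) can be illustrated with a simplified sketch. PlacementSketch and assign are hypothetical names. The code is loosely modeled on the Master's scheduling loop but ignores memory, worker ordering, and the case where spark.executor.cores is unset, so treat it as an approximation rather than the real algorithm.

```scala
object PlacementSketch {
  /**
   * Returns how many executors each worker receives, in input order.
   * freeCores: free cores per worker; coresPerExecutor models spark.executor.cores;
   * coresWanted: cores the application may still claim.
   */
  def assign(freeCores: Vector[Int], coresPerExecutor: Int, coresWanted: Int, spreadOut: Boolean): Vector[Int] = {
    val remaining = freeCores.toArray
    val executors = Array.fill(freeCores.length)(0)
    var toAssign = math.min(coresWanted, freeCores.sum)

    // A worker can take another executor if both the app budget and the worker have room.
    def canLaunch(i: Int): Boolean =
      toAssign >= coresPerExecutor && remaining(i) >= coresPerExecutor

    var candidates = freeCores.indices.filter(canLaunch)
    while (candidates.nonEmpty) {
      candidates.foreach { i =>
        var keepGoing = true
        while (keepGoing && canLaunch(i)) {
          toAssign -= coresPerExecutor
          remaining(i) -= coresPerExecutor
          executors(i) += 1
          // Spreading out: place one executor, then move on to the next worker.
          if (spreadOut) keepGoing = false
        }
      }
      candidates = candidates.filter(canLaunch)
    }
    executors.toVector
  }

  def main(args: Array[String]): Unit = {
    val workers = Vector(8, 8, 8) // three workers with 8 free cores each
    println(assign(workers, coresPerExecutor = 2, coresWanted = 8, spreadOut = true))  // Vector(2, 1, 1)
    println(assign(workers, coresPerExecutor = 2, coresWanted = 8, spreadOut = false)) // Vector(4, 0, 0)
  }
}
```

With spreading enabled the four executors land on all three workers; with it disabled they fill the first worker before any other is touched.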
