# What Happens When a Node Fails or Becomes Unavailable During a Signing Operation

> Learn how the fystack/mpcium stack handles node failures during MPC signing operations. Discover immediate session aborts, error codes, data cleanup, and cluster updates.

- Repository: [Fystack Labs/mpcium](https://github.com/fystack/mpcium)
- Tags: internals
- Published: 2026-03-02

---

**When a node fails during an MPC signing operation, the fystack/mpcium stack immediately aborts the session, returns an `ErrorCodePeerUnavailable` error to the client, updates the cluster readiness registry, and securely wipes all sensitive session data.**

Distributed Multi-Party Computation (MPC) signing requires continuous participation from every node in the signing session to prevent partial signature leakage and ensure cryptographic integrity. When a node becomes unavailable during a signing operation, **fystack/mpcium** follows a deterministic five-stage failure path that favors immediate termination over optimistic completion.

## Message Delivery Failure Detection

The signing protocol detects node unavailability at the network layer through point-to-point message delivery failures. During an active session, the signing party sends TSS protocol messages via the `DirectMessaging` interface implemented in [`pkg/mpc/session.go`](https://github.com/fystack/mpcium/blob/main/pkg/mpc/session.go).

When the destination node is unreachable or the NATS server reports *no responders*, the `SendToOther` call returns an error. The session wrapper immediately pushes this error onto the internal error channel:

```go
s.ErrCh <- fmt.Errorf("failed to send direct message to %s", topic)
```

This detection happens in real time, during the `s.direct.SendToOther` invocation itself, so a transport-level failure triggers an abort before the MPC protocol advances to the next round.
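The detection step can be sketched in isolation. The following is a minimal illustration, not the repository's code: the `directMessenger` interface, the `unreachablePeer` type, and the `sendRound` helper are hypothetical names, and the sketch wraps the underlying error with `%w`, which the original `fmt.Errorf` call shown above does not do.

```go
package main

import (
	"errors"
	"fmt"
)

// directMessenger is a hypothetical stand-in for the point-to-point
// transport interface the session relies on.
type directMessenger interface {
	SendToOther(topic string, payload []byte) error
}

// unreachablePeer simulates a transport whose destination never answers.
type unreachablePeer struct{}

func (unreachablePeer) SendToOther(topic string, payload []byte) error {
	return errors.New("no responders")
}

// session holds only the fields needed for this illustration.
type session struct {
	direct directMessenger
	errCh  chan error
}

// sendRound forwards one protocol message. On a delivery failure it reports
// the error on errCh and returns false so the caller stops the protocol.
func (s *session) sendRound(topic string, payload []byte) bool {
	if err := s.direct.SendToOther(topic, payload); err != nil {
		s.errCh <- fmt.Errorf("failed to send direct message to %s: %w", topic, err)
		return false
	}
	return true
}

func main() {
	// The channel is buffered so the failing send does not block the sender.
	s := &session{direct: unreachablePeer{}, errCh: make(chan error, 1)}
	if ok := s.sendRound("sign:ecdsa:direct:nodeA:nodeB:tx-abc", []byte("round-1")); !ok {
		fmt.Println(<-s.errCh)
	}
}
```

Buffering the error channel is a choice made for this sketch; how the real session sizes or drains `ErrCh` is not stated in this article.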

## Error Propagation from Session to Consumer

Once an error enters the session’s `ErrCh`, the `eventConsumer` component takes over. In [`pkg/eventconsumer/event_consumer.go`](https://github.com/fystack/mpcium/blob/main/pkg/eventconsumer/event_consumer.go) (lines 71‑90), a dedicated watcher goroutine reads from `session.ErrChan()` and routes failures to `handleSigningSessionError`.

This function acts as the bridge between low-level transport errors and application-level event handling. It captures the error context—whether it originated from message sending, protocol violation, or timeout—and prepares it for client-facing translation.
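A watcher of this kind can be sketched as follows. This is an illustration under assumptions: `erroringSession`, `fakeSession`, `transportError`, and `watchSession` are hypothetical names, and the `done` channel is one possible way to stop the watcher when a session ends normally; the repository's watcher may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// erroringSession is a hypothetical minimal view of a signing session.
type erroringSession interface {
	ErrChan() <-chan error
}

// transportError is a simple error type used only by this sketch.
type transportError string

func (e transportError) Error() string { return string(e) }

// fakeSession exposes an error channel the example can write to.
type fakeSession struct{ errCh chan error }

func (f *fakeSession) ErrChan() <-chan error { return f.errCh }

// watchSession waits for the first session error and passes it to handle.
// It returns without calling handle if done is closed first.
func watchSession(s erroringSession, done <-chan struct{}, handle func(error)) {
	select {
	case err, ok := <-s.ErrChan():
		if ok && err != nil {
			handle(err)
		}
	case <-done:
	}
}

func main() {
	s := &fakeSession{errCh: make(chan error, 1)}
	done := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		watchSession(s, done, func(err error) {
			fmt.Println("routing to signing error handler:", err)
		})
	}()

	// Simulate the session reporting a failed send.
	s.errCh <- transportError("failed to send direct message to nodeB")
	wg.Wait()
}
```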

## Error Code Translation and Result Publishing

The consumer translates internal errors into standardized response codes using the mapping logic in [`pkg/event/types.go`](https://github.com/fystack/mpcium/blob/main/pkg/event/types.go). Because the error string contains the word *“send”*, the `GetErrorCodeFromError` function classifies the failure as `ErrorCodePeerUnavailable`.

The handler then constructs a `SigningResultEvent` with `ResultType: ResultTypeError` and enqueues it onto the signing-result queue:

```go
resultQueue.Enqueue(SigningResultEvent{
    ResultType: ResultTypeError,
    ErrorCode:  ErrorCodePeerUnavailable,
    // ... session metadata
})
```

Downstream services, including the API layer, consume this queue to return deterministic failure responses to clients, ensuring that callers receive a consistent error code regardless of which specific transport layer exception occurred.
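The substring-based classification described above can be sketched as a small function. The type name, constant names, and string values below are placeholders chosen for this example and are not taken from `pkg/event/types.go`; only the idea of matching on the word "send" comes from the text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errorCode and its values are hypothetical placeholders for this sketch.
type errorCode string

const (
	codePeerUnavailable errorCode = "ERROR_PEER_UNAVAILABLE"
	codeUnknown         errorCode = "ERROR_UNKNOWN"
)

// classifyMessage maps an error message to a code by substring match.
func classifyMessage(msg string) errorCode {
	if strings.Contains(strings.ToLower(msg), "send") {
		return codePeerUnavailable
	}
	return codeUnknown
}

// classify returns an empty code for a nil error and otherwise
// classifies the error by its message.
func classify(err error) errorCode {
	if err == nil {
		return ""
	}
	return classifyMessage(err.Error())
}

func main() {
	fmt.Println(classify(errors.New("failed to send direct message to nodeB")))
	fmt.Println(classify(errors.New("invalid signature share")))
}
```

Matching on message text keeps the mapping simple, but it ties the result to the exact wording of the error: any unrelated message that happens to contain "send" would receive the same code.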

## Cluster-Wide Quorum Protection

Beyond the individual session, node failures impact cluster-wide signing availability. The `registry` component in [`pkg/mpc/registry.go`](https://github.com/fystack/mpcium/blob/main/pkg/mpc/registry.go) continuously monitors peer health through periodic “ready” keys stored in Consul via `WatchPeersReady`.

When a node disappears, its ready key expires and the registry marks the peer as unavailable (`readyMap[peerID] = false`). Before any new signing operation begins, the `signingConsumer.handleSigningEvent` function (lines 61‑66 in [`pkg/eventconsumer/sign_consumer.go`](https://github.com/fystack/mpcium/blob/main/pkg/eventconsumer/sign_consumer.go)) validates the cluster state:

```go
if !peerRegistry.AreMajorityReady() {
    return ErrorCodePeerUnavailable
}
```

This early-stage guard prevents the system from initiating new signing sessions when fewer than `t+1` peers (the quorum threshold) are available, avoiding unnecessary resource allocation and client timeouts.
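A readiness map with a quorum check can be sketched in memory as below. This is a simplified illustration: `readinessRegistry`, `setReady`, and `hasQuorum` are hypothetical names, the Consul watch is replaced by direct calls, and the check is written as "at least `t+1` ready peers" as an interpretation of the guard described above, which may differ from how `AreMajorityReady` counts peers.

```go
package main

import (
	"fmt"
	"sync"
)

// readinessRegistry is a hypothetical in-memory stand-in for the peer registry.
type readinessRegistry struct {
	mu    sync.RWMutex
	ready map[string]bool
}

// newReadinessRegistry registers every peer as not ready.
func newReadinessRegistry(peers ...string) *readinessRegistry {
	r := &readinessRegistry{ready: make(map[string]bool, len(peers))}
	for _, p := range peers {
		r.ready[p] = false
	}
	return r
}

// setReady records a peer's state, as a watcher would when a ready key
// appears or expires.
func (r *readinessRegistry) setReady(peer string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ready[peer] = ok
}

// hasQuorum reports whether at least t+1 peers are currently ready.
func (r *readinessRegistry) hasQuorum(t int) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	n := 0
	for _, ok := range r.ready {
		if ok {
			n++
		}
	}
	return n >= t+1
}

func main() {
	reg := newReadinessRegistry("node0", "node1", "node2")
	reg.setReady("node0", true)
	reg.setReady("node1", true)
	fmt.Println("can sign:", reg.hasQuorum(1)) // prints "can sign: true"

	reg.setReady("node1", false) // simulate an expired ready key
	fmt.Println("can sign:", reg.hasQuorum(1)) // prints "can sign: false"
}
```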

## Secure Session Cleanup After Abort

When a session fails, the `Close()` method in [`pkg/mpc/session.go`](https://github.com/fystack/mpcium/blob/main/pkg/mpc/session.go) tears down NATS subscriptions and releases network resources. Critically, the `security` package in [`pkg/security/zeroize.go`](https://github.com/fystack/mpcium/blob/main/pkg/security/zeroize.go) overwrites sensitive session data—including private key shares, transaction data (`s.tx`), and derived keys—before garbage collection.

This zeroization ensures that partial signature material or ephemeral protocol state is not retained in memory on the remaining nodes after a peer failure, mitigating risks associated with memory inspection or cold boot attacks.
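As a rough illustration of what zeroization involves in Go, the sketch below overwrites a byte slice and the words backing a `big.Int`. The `sessionSecrets` type and its `wipe` method are hypothetical and are not the functions in `pkg/security/zeroize.go`.

```go
package main

import (
	"fmt"
	"math/big"
)

// sessionSecrets is a hypothetical container for sensitive session state.
type sessionSecrets struct {
	share []byte   // stand-in for a private key share
	tx    *big.Int // stand-in for the transaction data being signed
}

func newSessionSecrets(share []byte, tx int64) *sessionSecrets {
	return &sessionSecrets{share: share, tx: big.NewInt(tx)}
}

// wipe overwrites the secret bytes and the words backing the big.Int.
func (s *sessionSecrets) wipe() {
	for i := range s.share {
		s.share[i] = 0
	}
	if s.tx != nil {
		// Bits returns a slice that shares its backing array with the
		// big.Int, so clearing it clears the stored magnitude.
		words := s.tx.Bits()
		for i := range words {
			words[i] = 0
		}
		s.tx.SetInt64(0)
	}
}

func main() {
	s := newSessionSecrets([]byte{0xde, 0xad, 0xbe, 0xef}, 42)
	s.wipe()
	fmt.Println(s.share, s.tx) // prints "[0 0 0 0] 0"
}
```

Overwriting in place is best effort in a garbage-collected language: earlier copies of the same data, for example from slice reallocation or intermediate arithmetic, are not reached by a routine like this.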

## Code Example: Observing Node Failure Errors

The following example demonstrates how to create a signing session and observe the error channel when a peer becomes unreachable:

```go
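// Fragment: assumes `node` and `signingResultQueue` already exist, and that
// fmt, log, math/big, and the repository's pkg/mpc package are imported.

// 1️⃣  Create a signing session (normally done by the eventConsumer)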
sess, err := node.CreateSigningSession(
    mpc.SessionTypeECDSA,   // or SessionTypeEDDSA
    "wallet-123",           // wallet ID
    "tx-abc",               // transaction ID
    "net-internal-01",      // network‑internal code
    signingResultQueue,     // queue where the final result is posted
    []uint32{44, 0, 0},     // optional derivation path
    "signing-idempotent-key", // idempotency key
)
if err != nil {
    log.Fatalf("cannot create session: %v", err)
}

// 2️⃣  Initialise the session with the transaction data
tx := new(big.Int).SetBytes([]byte{0x01, 0x02, 0x03})
if err = sess.Init(tx); err != nil {
    log.Fatalf("cannot init signing: %v", err)
}

// 3️⃣  Listen for internal errors while the protocol runs
go func() {
    for err := range sess.ErrChan() {
        // The error string will contain “send …” if a peer cannot be reached.
        fmt.Printf("Signing error observed: %v\n", err)
    }
}()

// 4️⃣  Kick off the signing process
sess.Sign(func(sig []byte) {
    fmt.Printf("Signature produced: %x\n", sig)
})

// Output when nodeB fails:
// Signing error observed: failed to send direct message to sign:ecdsa:direct:nodeA:nodeB:tx-abc
```

In production, the `eventConsumer` watcher takes the place of the goroutine in step 3: it reads the transport error from `sess.ErrChan()`, converts it to `ErrorCodePeerUnavailable`, and publishes a `SigningResultEvent` with `ResultTypeError`.

## Summary

- **Immediate Detection**: A failed `SendToOther` call in [`pkg/mpc/session.go`](https://github.com/fystack/mpcium/blob/main/pkg/mpc/session.go) surfaces the transport failure, and the session pushes the error to `ErrCh`.
- **Standardized Errors**: The `eventConsumer` maps send failures to `ErrorCodePeerUnavailable` via `GetErrorCodeFromError` in [`pkg/event/types.go`](https://github.com/fystack/mpcium/blob/main/pkg/event/types.go).
- **Quorum Enforcement**: The registry’s `AreMajorityReady()` check in [`pkg/eventconsumer/sign_consumer.go`](https://github.com/fystack/mpcium/blob/main/pkg/eventconsumer/sign_consumer.go) blocks new sessions until `t+1` peers are available.
- **Secure Termination**: Failed sessions trigger `Close()` and memory zeroization via [`pkg/security/zeroize.go`](https://github.com/fystack/mpcium/blob/main/pkg/security/zeroize.go) to prevent key material leakage.
- **Deterministic Reporting**: All failures propagate through `SigningResultEvent` structures, ensuring API clients receive consistent error codes.

## Frequently Asked Questions

### Can an MPC signing operation complete if one participant node fails mid-protocol?

No. The fystack/mpcium architecture enforces an all-or-nothing approach to signing. If any node becomes unreachable during the multi-round protocol, the `SendToOther` call fails, the error propagates through `session.ErrCh`, and the session aborts immediately. The design prevents partial signature generation that could leak information about the private key shares held by remaining nodes.

### What specific error code indicates a node is unavailable during signing?

The system returns **`ErrorCodePeerUnavailable`**. This code is assigned in [`pkg/event/types.go`](https://github.com/fystack/mpcium/blob/main/pkg/event/types.go) when `GetErrorCodeFromError` detects the substring *“send”* in the error message originating from [`pkg/mpc/session.go`](https://github.com/fystack/mpcium/blob/main/pkg/mpc/session.go). Clients receiving this code should treat the failure as transient and retry only after confirming the missing node has rejoined the cluster.
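On the client side, that retry guidance could look roughly like the sketch below. Everything here is an assumption made for illustration: the `codePeerUnavailable` string value, the `sign` and `clusterReady` callbacks, and the `signWithRetry` helper are hypothetical, and the article does not specify how a client learns that a node has rejoined.

```go
package main

import (
	"fmt"
	"time"
)

// codePeerUnavailable is a hypothetical wire value for the error code.
const codePeerUnavailable = "ERROR_PEER_UNAVAILABLE"

// signWithRetry calls sign up to maxAttempts times. After a peer-unavailable
// result it polls clusterReady up to maxPolls times and retries only once the
// cluster reports ready. It returns the last result code and the number of
// signing attempts made.
func signWithRetry(sign func() string, clusterReady func() bool,
	maxAttempts, maxPolls int, pollEvery time.Duration) (string, int) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		code := sign()
		if code != codePeerUnavailable || attempt == maxAttempts {
			return code, attempt
		}
		ready := false
		for i := 0; i < maxPolls && !ready; i++ {
			if ready = clusterReady(); !ready {
				time.Sleep(pollEvery)
			}
		}
		if !ready {
			return code, attempt // give up: the cluster never recovered
		}
	}
	return "", 0 // maxAttempts was not positive
}

func main() {
	calls := 0
	sign := func() string {
		calls++
		if calls < 3 {
			return codePeerUnavailable
		}
		return "OK"
	}
	code, attempts := signWithRetry(sign, func() bool { return true }, 5, 10, time.Millisecond)
	fmt.Println(code, attempts) // prints "OK 3"
}
```

Bounding both the attempts and the readiness polls keeps a prolonged outage from turning into an unbounded client-side wait.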

### How does the system prevent new signing requests when nodes are down?

Before processing any signing event, the `signingConsumer.handleSigningEvent` function verifies cluster health by calling `peerRegistry.AreMajorityReady()`. This check ensures at least `t+1` peers (the cryptographic threshold) are reporting ready status in Consul. If the quorum is not met, the request is rejected immediately with `ErrorCodePeerUnavailable` without allocating MPC session resources.

### Is sensitive cryptographic material exposed when a signing session fails?

No. The `session.Close()` method triggers the zeroization routines in [`pkg/security/zeroize.go`](https://github.com/fystack/mpcium/blob/main/pkg/security/zeroize.go) to overwrite memory regions containing private key shares, transaction data, and intermediate protocol state. This secure cleanup occurs on all remaining nodes before the failed session is garbage collected, ensuring that node outages do not leave exploitable artifacts in system memory.