Handling Partial Failures in Distributed Payment Systems with the Saga Pattern

Ensure transaction consistency with compensating actions and Azure Durable Functions in distributed payments.


The Problem: Partial Failures in Payments

Modern payment systems often involve multiple services:

  1. Reserve Funds (Bank API)

  2. Charge Customer (Payment Gateway)

  3. Update Inventory (Inventory Service)

  4. Send Confirmation (Email Service)

Partial Failure Example:

  • Funds are reserved, but the payment gateway fails.

  • Inventory is deducted, but the email service is down.

Without a recovery mechanism, this leaves systems in an inconsistent state.


What is the Saga Pattern?

A Saga is a sequence of local transactions where:

  • Each step has a compensating action to undo it.

  • Services collaborate via events (choreography) or a central orchestrator (orchestration).

Example Saga for Payments:

[Reserve Funds] → [Charge Customer] → [Update Inventory] → [Send Email]  
   │                  │                   │                   │  
   └─[Release Funds]  └─[Refund]          └─[Restock]         └─[N/A]
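
To make the diagram concrete, here is a minimal in-memory sketch of the idea, independent of any Azure service. The names SagaStep, SagaRunner, and SagaDemo are hypothetical and exist only for this illustration. The runner executes steps in order and, on the first failure, runs the compensations of the completed steps, most recent first.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// One saga step: a forward action plus the compensation that undoes it.
public sealed class SagaStep
{
    // Placeholder compensation for steps with nothing to undo (e.g. an email).
    public static readonly Func<Task> NothingToUndo = () => Task.CompletedTask;

    public SagaStep(string name, Func<Task> execute, Func<Task> compensate)
    {
        Name = name;
        Execute = execute;
        Compensate = compensate;
    }

    public string Name { get; }
    public Func<Task> Execute { get; }
    public Func<Task> Compensate { get; }
}

public static class SagaRunner
{
    // Returns true if every step succeeded, false if the saga was rolled back.
    public static async Task<bool> RunAsync(IEnumerable<SagaStep> steps, Action<string> log)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                await step.Execute();
                completed.Push(step);
                log($"done: {step.Name}");
            }
            catch (Exception ex)
            {
                log($"failed: {step.Name} ({ex.Message})");
                // The stack yields completed steps in reverse order.
                while (completed.Count > 0)
                {
                    var done = completed.Pop();
                    await done.Compensate();
                    log($"compensated: {done.Name}");
                }
                return false;
            }
        }
        return true;
    }
}

public static class SagaDemo
{
    // Simulates the payment saga with a failing inventory step and returns the log.
    public static async Task<List<string>> RunPaymentSagaAsync()
    {
        var log = new List<string>();
        var steps = new[]
        {
            new SagaStep("Reserve Funds", () => Task.CompletedTask, () => Task.CompletedTask),
            new SagaStep("Charge Customer", () => Task.CompletedTask, () => Task.CompletedTask),
            new SagaStep("Update Inventory",
                () => Task.FromException(new InvalidOperationException("out of stock")),
                () => Task.CompletedTask),
            new SagaStep("Send Email", () => Task.CompletedTask, SagaStep.NothingToUndo),
        };

        var succeeded = await SagaRunner.RunAsync(steps, log.Add);
        log.Add(succeeded ? "saga completed" : "saga rolled back");
        return log;
    }
}
```

In this simulated run the inventory step fails, so the charge is compensated first and the reservation second, and the email step never runs. The sketch lets an exception thrown by a compensation propagate to the caller; a production saga would need a strategy for that case, such as retries or manual intervention.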

Implementing the Saga Pattern with Azure Durable Functions

We’ll use the orchestration-based Saga for centralized control and easier debugging.

Step 1: Define the Orchestrator

[FunctionName("PaymentSagaOrchestrator")]
public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var paymentRequest = context.GetInput<PaymentRequest>();

    // Track progress outside the try block so the catch block can see
    // which steps completed and therefore need compensation.
    var sagaContext = new SagaContext { PaymentRequest = paymentRequest };

    try
    {
        // Step 1: Reserve Funds
        sagaContext.ReservationId = await context.CallActivityAsync<Guid>(
            "ReserveFundsActivity",
            paymentRequest);

        // Step 2: Charge Customer
        sagaContext.ChargeId = await context.CallActivityAsync<string>(
            "ChargeCustomerActivity",
            paymentRequest);

        // Step 3: Update Inventory
        await context.CallActivityAsync(
            "UpdateInventoryActivity",
            paymentRequest.Items);
        sagaContext.InventoryUpdated = true;

        // Step 4: Send Confirmation
        await context.CallActivityAsync(
            "SendConfirmationActivity",
            paymentRequest.UserEmail);

        return "Payment Completed Successfully";
    }
    catch (Exception)
    {
        // A step failed: compensate for the steps that completed
        await Compensate(context, sagaContext);
        return "Payment Failed - Compensated";
    }
}

private static async Task Compensate(
    IDurableOrchestrationContext context,
    SagaContext sagaContext)
{
    // Undo completed steps in reverse order.
    // "RestockInventoryActivity" is an assumed name for the Restock
    // compensation shown in the saga diagram.
    if (sagaContext.InventoryUpdated)
        await context.CallActivityAsync("RestockInventoryActivity", sagaContext.PaymentRequest.Items);

    if (sagaContext.ChargeId != null)
        await context.CallActivityAsync("RefundChargeActivity", sagaContext.ChargeId);

    if (sagaContext.ReservationId != Guid.Empty)
        await context.CallActivityAsync("ReleaseFundsActivity", sagaContext.ReservationId);
}
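
The orchestrator relies on PaymentRequest and SagaContext, which the article does not define. One possible shape is sketched below. Only Items, UserEmail, IdempotencyKey, ReservationId, ChargeId, and PaymentRequest appear in the article's code; Amount, Currency, OrderItem, and InventoryUpdated are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

// A purchased item. Hypothetical type; the article only uses paymentRequest.Items.
public class OrderItem
{
    public string Sku { get; set; }
    public int Quantity { get; set; }
}

// Input to the orchestrator. Public get/set properties keep the type
// easy to serialize when it is passed between orchestrator and activities.
public class PaymentRequest
{
    public string IdempotencyKey { get; set; }
    public decimal Amount { get; set; }      // assumed
    public string Currency { get; set; }     // assumed
    public string UserEmail { get; set; }
    public List<OrderItem> Items { get; set; } = new List<OrderItem>();
}

// Records how far the saga got, so compensation knows what to undo.
public class SagaContext
{
    public Guid ReservationId { get; set; }     // Guid.Empty until funds are reserved
    public string ChargeId { get; set; }        // null until the charge succeeds
    public bool InventoryUpdated { get; set; }  // assumed flag for the Restock compensation
    public PaymentRequest PaymentRequest { get; set; }
}
```

The default values matter here: a fresh SagaContext has an empty ReservationId and a null ChargeId, which is exactly what the checks in Compensate test for.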

Step 2: Implement Compensating Actions

Release Reserved Funds

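[FunctionName("ReleaseFundsActivity")]
public async Task ReleaseFunds(
    [ActivityTrigger] Guid reservationId,
    ILogger log)
{
    // Instance method: _bankService is assumed to be an instance field
    // supplied through constructor injection, which a static method cannot read.
    await _bankService.ReleaseReservationAsync(reservationId);
    log.LogInformation("Released funds for reservation {ReservationId}", reservationId);
}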

Refund Customer Charge

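[FunctionName("RefundChargeActivity")]
public async Task RefundCharge(
    [ActivityTrigger] string chargeId,
    ILogger log)
{
    // Instance method: _paymentGateway is assumed to be an instance field
    // supplied through constructor injection, which a static method cannot read.
    await _paymentGateway.RefundAsync(chargeId);
    log.LogInformation("Refunded charge {ChargeId}", chargeId);
}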

Key Features

1. Compensation Guarantees

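  • Rollback: When a step fails, every step that already completed is undone by its compensating action. This is a semantic undo rather than ACID atomicity, so other services can observe intermediate state until compensation finishes.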

  • Order: Compensate in reverse order (e.g., refund before releasing funds).

2. Retry Policies

Add retries to transient steps (e.g., payment gateway timeouts):

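var retryOptions = new RetryOptions(
    firstRetryInterval: TimeSpan.FromSeconds(2),
    maxNumberOfAttempts: 3)
{
    BackoffCoefficient = 2.0 // double the wait between attempts
};

// Capture the charge ID so a later failure can still be refunded
var chargeId = await context.CallActivityWithRetryAsync<string>(
    "ChargeCustomerActivity",
    retryOptions,
    paymentRequest);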
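
To show what such a policy does, the following standalone sketch implements retry with exponential backoff outside of Durable Functions. The Retry class and its parameters are hypothetical and are not the library's implementation; the parameter names merely mirror the options above. The delay function is passed in so that the behavior can be observed without real waiting.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Runs the operation up to maxNumberOfAttempts times. After each failed
    // attempt except the last, waits and then multiplies the wait by
    // backoffCoefficient. The final failure is rethrown to the caller.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxNumberOfAttempts,
        TimeSpan firstRetryInterval,
        double backoffCoefficient,
        Func<TimeSpan, Task> delay)
    {
        var wait = firstRetryInterval;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxNumberOfAttempts)
            {
                await delay(wait);
                wait = TimeSpan.FromTicks((long)(wait.Ticks * backoffCoefficient));
            }
        }
    }
}
```

With a first interval of 2 seconds, a coefficient of 2.0, and 3 attempts, this helper waits 2 seconds and then 4 seconds before giving up. Inside an orchestrator, prefer CallActivityWithRetryAsync over a helper like this, because orchestrator code must stay deterministic and should not call Task.Delay directly.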

3. Idempotency

  • Use idempotency keys to avoid duplicate charges/refunds when a step is retried or a request is resubmitted.

  • Track processed requests in Azure SQL or Cosmos DB:

var isDuplicate = await _repository.Exists(request.IdempotencyKey);  
if (isDuplicate) throw new DuplicateRequestException();
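
Rejecting a duplicate is one option. Another, which tends to combine better with retries, is to return the result of the first execution. The sketch below illustrates that variant with an in-memory store; IdempotentExecutor is a hypothetical name, and a real system would persist keys and results in a durable store such as Azure SQL or Cosmos DB rather than in process memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class IdempotentExecutor<TResult>
{
    // Maps an idempotency key to the (possibly still running) first execution.
    private readonly ConcurrentDictionary<string, Lazy<Task<TResult>>> _results =
        new ConcurrentDictionary<string, Lazy<Task<TResult>>>();

    // Runs the operation once per key. Later calls with the same key
    // receive the result of the first call instead of running it again.
    public async Task<TResult> ExecuteAsync(string idempotencyKey, Func<Task<TResult>> operation)
    {
        var entry = _results.GetOrAdd(
            idempotencyKey,
            _ => new Lazy<Task<TResult>>(operation));
        try
        {
            return await entry.Value;
        }
        catch
        {
            // Forget failed executions so that a retry can run the operation again.
            _results.TryRemove(idempotencyKey, out _);
            throw;
        }
    }
}
```

A charge activity could wrap its gateway call in ExecuteAsync with the request's idempotency key, so that a retried activity returns the original charge ID instead of charging the customer twice.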

Best Practices

1. Logging & Monitoring

  • Log saga state changes to Application Insights.

  • Use Durable Functions HTTP APIs to query orchestration status.

2. Alerting

  • Trigger alerts for:

    • Uncompensated failures.

    • Prolonged orchestrations.

3. Testing

  • Chaos Engineering: Use Azure Chaos Studio to simulate failures.

  • Unit Tests: Mock compensating actions.

4. Tooling

  • Azure Service Bus: For event-driven compensation triggers.

  • Azure Logic Apps: For human-in-the-loop approvals (e.g., manual refunds).


Real-World Use Case: Travel Booking Platform

Problem: Flight bookings failed after hotel reservations succeeded, leaving customers charged without bookings.

Solution:

  1. Saga Orchestrator:

    • Reserve Hotel → Reserve Flight → Charge Customer.

    • Compensate (reverse order): Refund → Cancel Flight → Cancel Hotel.

  2. Results:

    • Reduced customer complaints by 90%.

    • Automated recovery for 95% of partial failures.


Saga vs. Other Patterns

Pattern        | Use Case                                     | Pros                           | Cons
Saga           | Distributed transactions                     | No distributed locks, scalable | Complex compensation logic
2PC            | Atomic commit across transactional resources | Strong consistency             | Poor scalability, blocking
Event Sourcing | Audit trails, replayability                  | Temporal debugging             | High storage costs

When to Use the Saga Pattern

  • Multi-Service Transactions: Payments, order fulfillment, travel bookings.

  • Eventual Consistency: Systems where temporary inconsistency is acceptable.

  • Long-Running Processes: Transactions spanning minutes/hours.


Conclusion

The Saga Pattern, combined with Azure Durable Functions, provides a robust way to handle partial failures in payment systems. By automating compensation and leveraging Azure’s serverless ecosystem, you can build resilient, self-healing workflows that maintain data consistency without monolithic transactions.