Juspay's Orchestrator and Merchant-Controlled Routing Engine
  • The routing engine is a core component of an orchestration platform. Juspay’s routing engine has two major workflows: rule-based ordering and dynamic gateway ordering
  • 95%+ of merchants use rule-based ordering, which gives them full control over traffic routing by defining rules in Juspay’s routing software based on their business priorities
  • A few merchants use the dynamic gateway ordering workflow, which is engineered by Juspay to optimize transaction success rates
  • In both routing workflows, merchants have full control over and visibility into how their payments are routed across gateways

Payment Orchestration is a sophisticated operating system that connects with the merchant’s tech infrastructure, providing a unified layer that manages the entire payment lifecycle, from checkout to reconciliation. It includes a seamless checkout experience, no-code integrations with the gateways of the merchant’s choice, and routing of transactions between those gateways based on factors such as cost, success rates, performance and contractual commitments. An orchestration partner further empowers merchants with services such as tokenisation and card vault, recurring payment setup and execution across PSPs, unified analytics and reconciliation. In a nutshell, payment orchestration providers serve as strategic partners to merchants' payment teams, streamlining the adoption of the latest payment innovations in the ecosystem.

All these aspects of orchestration come together to give merchants the flexibility to enhance and optimize their payments to better meet business objectives such as improving conversions and optimizing payment fees and operational costs.

This blog takes a deep dive into one critical aspect of orchestration: Transaction Routing. We will explore the science and engineering behind the routing engine.

Understanding Transaction Routing

Image describing transaction routing

Payment routing is one of the core aspects of payment orchestration that drives business results for merchants. It routes each transaction to the most suitable payment gateway based on success rate, latency and other business requirements. Optimizing payment routing also improves the customer experience by reducing transaction failures and processing delays, which increases customer satisfaction, trust and repeat business.

The Building Blocks of Juspay's Payment Routing Engine

1. Eligibility Check

This step acts as the initial filter in Juspay's Routing process. It evaluates each Gateway integrated with the merchant against predefined eligibility criteria configured by the merchant in Juspay's dashboard. The result is the list of gateways eligible for the specific transaction flow initiated by the customer.

Some key eligibility criteria for merchants include enabling specific payment instruments on the Payment Gateway, configuring rules based on card issuers, and applying custom settings like geography or currency restrictions and options for EMI / installment payments, authentication modes, and split settlements.
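The eligibility filter can be pictured as a predicate applied to every gateway the merchant has integrated. The sketch below is a minimal illustration under stated assumptions, not Juspay's implementation: the `Gateway` and `Transaction` types, their field names and the particular criteria checked are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical transaction attributes used for filtering.
    instrument: str                 # e.g. "CARD", "WALLET", "NET_BANKING"
    currency: str                   # e.g. "USD", "INR"
    issuer: Optional[str] = None    # card issuer, if any
    emi: bool = False               # customer chose EMI / installments


@dataclass
class Gateway:
    # Hypothetical merchant-side configuration for one gateway.
    name: str
    instruments: set[str]
    currencies: set[str]
    blocked_issuers: set[str] = field(default_factory=set)
    supports_emi: bool = False


def is_eligible(gw: Gateway, txn: Transaction) -> bool:
    """Return True only if the gateway passes every configured criterion."""
    if txn.instrument not in gw.instruments:
        return False
    if txn.currency not in gw.currencies:
        return False
    if txn.issuer is not None and txn.issuer in gw.blocked_issuers:
        return False
    if txn.emi and not gw.supports_emi:
        return False
    return True


def eligible_gateways(gateways: list[Gateway], txn: Transaction) -> list[Gateway]:
    """Keep the gateways that can process this transaction, in input order."""
    return [gw for gw in gateways if is_eligible(gw, txn)]
```

The output of `eligible_gateways` is the candidate list that the later ordering steps work on.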

1.1 Importance of Eligibility Check:

By rigorously filtering gateways based on these criteria, the Eligibility Check ensures that only the eligible gateways are considered for processing the transaction. This significantly reduces the risk of payment failures, such as declined transactions or processing errors, leading to a smoother and more efficient payment experience for merchants and customers.

After the Eligibility Check, the next step in the Routing process involves selecting the optimal routing path for the transaction.

2. Rule-Based Ordering

Rule-based ordering relies on predefined rules set by merchants to determine the preferred payment gateway for each transaction. The routing path for each transaction is defined by these sets of rules, making the process highly predictable.

2.1 Key Characteristics:

  • Effective for business commitments: Rule-based ordering is beneficial for businesses because it ensures they meet specific commercial obligations to designated gateways. This approach maintains minimum transaction volumes and efficiently directs traffic to particular payment gateways (PGs).
  • "If-Else" Logic: The logic behind rule-based ordering often resembles a series of nested "if-else" statements, where each condition triggers a specific routing decision.

Example:

  • Rule 1: If the transaction currency is USD and the payment instrument is a Card, route it to Gateway A.
  • Rule 2: If the payment instrument is a Card and card type is CREDIT issued by Bank X, route 90% of traffic to Gateway B and 10% of traffic to Gateway C.
  • Rule 3: If the payment method is Net Banking using Bank Y, route it to Gateway C.
  • Rule 4: If the payment instrument is Wallet, route it to Gateway D.
  • Rule 5: If none of the above rules matches, route it to Gateway E as a default.

Example of rule based routing
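The five example rules can be written as an ordered list of conditions, each mapped to a weighted set of target gateways. This is a sketch for illustration only: the dictionary keys (`currency`, `instrument`, `card_type`, `issuer`, `bank`), the `route` function and the first-match-wins evaluation order are assumptions, not Juspay's rule format.

```python
import random
from typing import Callable, Optional

Txn = dict  # hypothetical transaction shape: plain key/value attributes

# Each rule pairs a condition with a weighted list of target gateways.
Rule = tuple[Callable[[Txn], bool], list[tuple[str, float]]]


def is_usd_card(t: Txn) -> bool:
    return t.get("currency") == "USD" and t.get("instrument") == "CARD"


def is_bank_x_credit_card(t: Txn) -> bool:
    return (t.get("instrument") == "CARD"
            and t.get("card_type") == "CREDIT"
            and t.get("issuer") == "Bank X")


def is_bank_y_net_banking(t: Txn) -> bool:
    return t.get("instrument") == "NET_BANKING" and t.get("bank") == "Bank Y"


def is_wallet(t: Txn) -> bool:
    return t.get("instrument") == "WALLET"


RULES: list[Rule] = [
    (is_usd_card, [("Gateway A", 1.0)]),                               # Rule 1
    (is_bank_x_credit_card, [("Gateway B", 0.9), ("Gateway C", 0.1)]),  # Rule 2
    (is_bank_y_net_banking, [("Gateway C", 1.0)]),                     # Rule 3
    (is_wallet, [("Gateway D", 1.0)]),                                 # Rule 4
]
DEFAULT_GATEWAY = "Gateway E"                                          # Rule 5


def route(txn: Txn, rng: Optional[random.Random] = None) -> str:
    """Return the gateway chosen by the first matching rule."""
    chooser = rng if rng is not None else random
    for matches, targets in RULES:
        if matches(txn):
            names = [name for name, _ in targets]
            weights = [weight for _, weight in targets]
            # A weighted random pick implements the 90% / 10% split.
            return chooser.choices(names, weights=weights, k=1)[0]
    return DEFAULT_GATEWAY
```

Because the rules are evaluated in order, a USD credit card issued by Bank X matches Rule 1 before Rule 2 is ever checked. This is the predictability the "if-else" description refers to.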

2.2 Rule-Based Ordering Workflow

The end-to-end workflow of Rule-Based ordering is as follows:

Step 1: Eligibility Check - Payment Gateways are filtered based on the predefined eligibility criteria (explained above).

Step 2: Rule-Based Ordering - The eligible Payment Gateways are re-ordered on the basis of the rules set by the merchant.

Step 3: Downtime Detection - Downtime detection runs across all these Payment Gateways. In the event of a downtime, the next Payment Gateway in the merchant-defined order is selected for routing traffic. The methodology behind this step is explained below (in section 4).

Step 4: Cascading Retries - These are added as a fail-safe mechanism in case the payment still fails in the last leg. This step immediately retries the same transaction with the subsequent Payment Gateway in the order.

Workflow of rule-based ordering
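Steps 1 to 3 of this workflow can be sketched as a single function that produces the final gateway order. The function name and parameters are hypothetical, and the sketch assumes that a gateway flagged as down is moved to the back of the list rather than removed, which is one plausible reading of "the next Payment Gateway in the order is selected".

```python
def build_routing_order(
    configured: list[str],
    eligible: set[str],
    merchant_priority: list[str],
    down: set[str],
) -> list[str]:
    """Return the gateways in the order they should be attempted."""
    # Step 1: keep only gateways that passed the eligibility check.
    candidates = [gw for gw in configured if gw in eligible]

    # Step 2: re-order by the merchant's rule-defined priority.
    # Gateways without an explicit priority keep their configured order
    # and go after the prioritized ones (sorted() is stable).
    rank = {gw: i for i, gw in enumerate(merchant_priority)}
    ordered = sorted(candidates, key=lambda gw: rank.get(gw, len(rank)))

    # Step 3: gateways flagged as down yield to the next healthy gateway.
    healthy = [gw for gw in ordered if gw not in down]
    unhealthy = [gw for gw in ordered if gw in down]
    return healthy + unhealthy
```

For example, with a merchant priority of D, A, B and gateway D flagged as down, the first attempt goes to A, the next gateway in the merchant-defined order.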

While rule-based ordering provides a structured approach, merchants sought a more adaptive and dynamic solution to optimize transaction success rates. This led to the evolution of Dynamic Ordering within the Juspay platform.

3. Dynamic Gateway Ordering

Dynamic gateway ordering is an alternative to the rule-based ordering described above. Unlike rule-based ordering, dynamic ordering takes real-time success rates into consideration for each combination of payment instrument, transaction type, network, platform, transaction origin country, etc., without extra manual effort. Merchants can extend the criteria to additional fields based on their requirements.

Dynamic Gateway Ordering leverages advanced concepts of Reinforcement Learning and Statistical Distribution, enabling real-time success rate optimization by routing the transaction to the most optimal PG.

The problem of selecting the best Gateway can be mapped to a Non-stationary Multi-Armed Bandit (MAB) problem with Delayed Feedback, where each Gateway is an "arm" with fluctuating success rates and varying latency for success and failure. The approach used to solve this problem is driven by an explore-exploit strategy, which takes a two-pronged approach:

Exploration: We continuously evaluate all gateways by sending a small percentage of traffic to ensure up-to-date performance data.

Exploitation: We continuously route most traffic to the best-performing Gateway to maximize the overall success rate.

The algorithm uses a sliding window technique to assess each Gateway's success rate over its last few transactions. This ensures that, in the absence of downtime, the highest-ranked Gateway is chosen for transaction routing.

Illustration of sliding window technique

Key parameters that control this behaviour are:

Window Size: The number of transactions considered for computing the Gateway's success rate. This parameter affects the system's responsiveness and stability.

Exploration Factor: This number determines the percentage of traffic allocated to exploration, ensuring fairness and avoiding starvation of lower-ranked gateways.

The window size and exploration factor are determined by traffic volume and the average success rate. The gateways are then ordered by this selection process to optimize the success rate.
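To make the two parameters concrete, here is a minimal sketch of a sliding-window router. The source does not name the specific bandit algorithm, so this uses a plain epsilon-greedy policy as a stand-in. The class name, the default values and the optimistic score for an empty window are all assumptions, and delayed feedback is not modelled.

```python
import random
from collections import deque


class SlidingWindowRouter:
    """Epsilon-greedy gateway selection over per-gateway sliding windows."""

    def __init__(self, gateways, window_size=100, exploration_factor=0.05, rng=None):
        # One bounded window of recent outcomes (1 = success, 0 = failure) per gateway.
        self.windows = {gw: deque(maxlen=window_size) for gw in gateways}
        self.exploration_factor = exploration_factor
        self.rng = rng if rng is not None else random.Random()

    def success_rate(self, gateway):
        window = self.windows[gateway]
        # Assumption: an empty window is scored optimistically so that a
        # gateway with no data yet still gets tried.
        return sum(window) / len(window) if window else 1.0

    def choose(self):
        gateways = list(self.windows)
        if self.rng.random() < self.exploration_factor:
            return self.rng.choice(gateways)            # explore
        return max(gateways, key=self.success_rate)     # exploit

    def record(self, gateway, success):
        # deque(maxlen=...) discards the oldest outcome automatically,
        # which is what makes the window "slide".
        self.windows[gateway].append(1 if success else 0)
```

A smaller `window_size` reacts faster to a change in gateway performance but produces noisier success-rate estimates, while a larger `exploration_factor` keeps the data on lower-ranked gateways fresher at the cost of sending more traffic away from the current best gateway.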

Special cases:

  • Starvation problem: When a gateway becomes the best-performing option, the system would route all traffic to it and stop sending traffic to other gateways. To avoid this, the system continuously evaluates all gateways by allocating a small percentage of transactions to each Gateway (approx. 5-10%). This ensures that the performance of all gateways is monitored and up to date.
  • Long-term data: Long-term data often becomes less reflective of current conditions and less responsive to recent issues or fluctuations in performance. Unlike traditional approaches relying on historical data, the system has adopted Reinforcement Learning, which continuously learns and updates ordering decisions to handle rapid changes effectively.

Transactions via Dynamic Ordering follow this workflow:

Workflow of dynamic gateway ordering

4. Downtime Detection

The downtime detection mechanism uses a "reward" and "penalize" feedback loop inspired by the Proportional–integral–derivative (PID) controller to maintain health scores of underlying payment gateways.

If the score for any gateway drops below the merchant-configured threshold, the gateway is classified as "down".
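A reward-and-penalize loop with a threshold can be sketched as follows. This illustrates only the feedback idea and does not reproduce the PID-inspired calculation, which the source does not spell out. The class name, the 0 to 1 score range and the step sizes are assumptions.

```python
class GatewayHealth:
    """Tracks a health score that rises on success and falls on failure."""

    def __init__(self, threshold=0.5, reward=0.05, penalty=0.2):
        self.score = 1.0            # assumption: gateways start fully healthy
        self.threshold = threshold  # merchant-configured "down" threshold
        self.reward = reward
        self.penalty = penalty

    def update(self, success: bool) -> None:
        if success:
            # Reward: nudge the score up, capped at 1.0.
            self.score = min(1.0, self.score + self.reward)
        else:
            # Penalize: push the score down, floored at 0.0.
            self.score = max(0.0, self.score - self.penalty)

    @property
    def is_down(self) -> bool:
        return self.score < self.threshold
```

In this sketch the penalty is larger than the reward, so a burst of failures trips the threshold quickly while recovery requires a sustained run of successes. That asymmetry is a design choice of the example, not a documented property of Juspay's system.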

In case of any downtime, if a merchant uses rule-based ordering, the gateways are re-ordered on the basis of this downtime detection mechanism. The next PG in the merchant-defined order is then selected for routing traffic.

In the case of Dynamic Ordering, the cost of exploration becomes high when a payment gateway faces downtime. Exploration is therefore stopped for that Gateway for a specific time interval, known as the cool-off period. After this cool-off period, the routing system re-evaluates the Gateway by allowing it to process a limited number of transactions for exploration purposes. If the underlying issue persists even after routing this small number of transactions, the Gateway is rapidly classified as "down".
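The cool-off behaviour can be sketched as a small gate that decides whether a gateway may receive exploration traffic. Everything here is hypothetical: the class name, the 300-second default, the probe budget of 5 and the injectable `clock` (used so the logic can be exercised without waiting).

```python
import time


class CoolOffGate:
    """Decides whether a gateway may receive exploration traffic."""

    def __init__(self, cool_off_seconds=300.0, probe_budget=5, clock=time.monotonic):
        self.cool_off_seconds = cool_off_seconds
        self.probe_budget = probe_budget
        self.clock = clock
        self.down_since = None   # None means the gateway is considered healthy
        self.probes_left = 0

    def mark_down(self):
        # Start (or restart) the cool-off period and reset the probe budget.
        self.down_since = self.clock()
        self.probes_left = self.probe_budget

    def mark_recovered(self):
        self.down_since = None

    def may_explore(self):
        if self.down_since is None:
            return True                       # healthy: explore as usual
        if self.clock() - self.down_since < self.cool_off_seconds:
            return False                      # still cooling off
        if self.probes_left > 0:
            self.probes_left -= 1             # spend one probe transaction
            return True
        return False                          # probes spent, awaiting a verdict
```

In use, the caller would call `mark_recovered()` if the probe transactions succeed and `mark_down()` again if they fail, which restarts the cool-off period.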

Downtime detection becomes even more valuable when ecosystem health is analyzed across multiple merchants. This global view helps merchants who might otherwise struggle to identify such issues promptly within their own ecosystem. Integration with a payment orchestrator enables the routing engine to leverage collective performance data across merchants, significantly enhancing the robustness and reliability of downtime detection.

5. Cascading retries

This is the last mile of the transaction routing process. With both rule-based and dynamic ordering, a transaction may still fail at the last leg (even after the health-check step), for example due to system issues, timeouts or latency.

Cascading retries are added as a fail-safe mechanism to immediately retry the same transaction with the subsequent PG in the order. This salvage mechanism happens in the back end, without an explicit retry from the customer’s end.

Workflow for cascading retries
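A cascading retry is essentially a loop over the ordered gateway list that stops at the first success. In the sketch below, `charge_with_cascade`, the `attempt` callback and the `max_attempts` cap are hypothetical names. The sketch also assumes that `attempt` returns `False` only when a failure is definitive and safe to retry on another gateway.

```python
from typing import Callable, Optional


def charge_with_cascade(
    ordered_gateways: list[str],
    attempt: Callable[[str], bool],
    max_attempts: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Try gateways in order until one succeeds.

    Returns the gateway that succeeded (or None) and the list of gateways tried.
    """
    tried: list[str] = []
    for gateway in ordered_gateways[:max_attempts]:
        tried.append(gateway)
        if attempt(gateway):
            return gateway, tried   # success: stop cascading
    return None, tried              # every attempted gateway failed
```

The customer sees a single payment attempt; the list of gateways tried is returned so that each failure can still be fed back into health scoring.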

Case studies: Experimental Results

To assess the effectiveness of the implemented dynamic ordering mechanism, Juspay conducted an in-depth analysis of its performance across multiple categories and dimensions. The results compare dynamic ordering with traditional rule-based ordering and show how dynamic ordering improves overall performance. The following sections present and discuss the outcomes, offering insights into the system's performance and practical impact.

Rule-Based vs Dynamic Gateway Ordering Success Rates over time

Comparison between rule-based and dynamic ordering


The above graph compares the success rates of rule-based and dynamic ordering over 21 days for one of India's most prominent quick-commerce merchants, highlighting dynamic ordering’s superior performance and stability. Dynamic ordering maintained an average success rate of 83.19%, adapting to real-time conditions to optimize performance, while rule-based ordering averaged 82.60% and showed larger fluctuations. Notably, on day 9, rule-based ordering lags by 5% because one gateway’s success rate is low but not below the threshold set by the merchant. Dynamic ordering offers higher success rates and less variation, leading to increased resilience and efficiency.

Business Impact of Real-Time Downtime Detection

Gateway health scores

A significant downtime event began at 01:24 AM on 7th Dec 2024, as shown above. The routing engine detected the issue within a minute, marked by a sharp drop in the gateway health score. Swift action rerouted ~8,000 transactions from Gateway A to alternative Gateways B, C and D, as shown in the table below. This proactive response minimized disruptions and added ~$15,000 in GMV for a single merchant from the online gaming industry in those 2.5 hours, demonstrating the system's capability to adapt to critical events and maintain business continuity.

Numeric representation of traffic distribution to alternate gateways

Conclusion

In conclusion, both rule-based and dynamic ordering approaches serve distinct and valuable purposes in modern payment orchestration. Rule-based ordering provides merchants with predictable, controlled transaction flows that are particularly valuable for meeting specific business commitments and handling special use cases that require deterministic routing decisions. On the other hand, dynamic ordering leverages real-time data and mathematical concepts to optimize success rates and adapt to changing conditions in the payment ecosystem.

95%+ of Juspay's merchants use rule-based ordering to manage their payment traffic across payment gateways, as it effectively meets their core business requirements. The remaining merchants, typically those processing high transaction volumes, opt for dynamic gateway ordering as their scale justifies the application of this algorithm.

A few merchants leverage the strength of both methods across payment instruments, tailoring their routing strategies to specific needs and priorities.

Juspay's orchestration platform provides the flexibility to seamlessly blend these approaches, empowering businesses to achieve the optimal balance between control and performance.