Cutting Costs on a Blockchain Indexer: What I Learned Building a Real-Time DEX Analytics Stack

If you have built high-throughput backend systems before, a blockchain indexer will feel familiar, until it does not. The patterns are similar to any event-driven pipeline, but the cost model has some sharp edges that are not obvious coming in.

This post covers what actually hurt us, and what fixed it, while building a Go-based indexer tracking swaps and liquidity across multiple chains.

The Core Problem: RPC Calls Are Your New Database Round-Trips

In a traditional backend, you optimize database queries. In a blockchain indexer, the equivalent bottleneck is RPC calls: requests to a node (your own, or a hosted provider like Alchemy or Infura) to read chain data.

The insidious part is that each call is cheap and reasonable in isolation. Together, they silently inflate your bill and add latency you cannot explain.

Here is the pattern every indexer falls into. You subscribe to on-chain events (swap logs, liquidity changes), then for each event you make a cascade of follow-up calls:

  • eth_getTransactionByHash to get who actually sent the transaction
  • eth_getBlockByNumber to get the block timestamp (events do not include it)
  • eth_call to fetch pool metadata (which tokens are in the pool)
  • External price API to convert token amounts to USD
  • DB lookups to check if this token/pool has been seen before

None of those are wrong. But synchronous, per-event enrichment at scale is death by a thousand paper cuts. A bursty block comes in, your pipeline backs up, and suddenly you are "near real-time" in the same way a traffic jam is "near the destination."
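Spelled out in Go, with stub functions standing in for the real network calls (all names and return values here are hypothetical), the cascade looks like this:

```go
package main

import "fmt"

// Hypothetical stubs standing in for real RPC / API / DB calls.
// In a real indexer, each of these is a network round-trip.
func getTxSender(txHash string) string           { return "0xsender" }
func getBlockTimestamp(block uint64) uint64      { return 1_700_000_000 }
func getPoolTokens(pool string) (string, string) { return "WETH", "USDC" }
func getUSDPrice(token string) float64           { return 1.0 }
func isKnownPool(pool string) bool               { return true }

// enrichSwap shows the synchronous per-event cascade: five
// round-trips before a single row can be written.
func enrichSwap(txHash string, block uint64, pool string) string {
	sender := getTxSender(txHash)  // eth_getTransactionByHash
	ts := getBlockTimestamp(block) // eth_getBlockByNumber
	t0, t1 := getPoolTokens(pool)  // eth_call
	price := getUSDPrice(t0)       // external price API
	_ = isKnownPool(pool)          // DB lookup
	return fmt.Sprintf("%s %s/%s @%d $%.2f", sender, t0, t1, ts, price)
}

func main() {
	fmt.Println(enrichSwap("0xabc", 19_000_000, "0xpool"))
}
```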

What Actually Helped

1. Filter at the source

Use eth_subscribe with specific event topics rather than subscribing to broad log streams and filtering in your application.

This is the blockchain equivalent of pushing predicates into your SQL query rather than fetching everything and filtering in memory.
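At the wire level this is the standard eth_subscribe "logs" subscription with an address and topic filter in the params, so the node discards irrelevant logs before they ever reach you. A sketch that builds the request payload (the address and topic values below are placeholders, not real contract addresses or event signatures):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// subscribeLogsRequest builds an eth_subscribe "logs" request that
// filters by contract address and event topic on the node side,
// instead of streaming everything and filtering locally.
func subscribeLogsRequest(address, topic0 string) ([]byte, error) {
	req := map[string]any{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "eth_subscribe",
		"params": []any{
			"logs",
			map[string]any{
				"address": address,
				"topics":  []string{topic0}, // only this event type
			},
		},
	}
	return json.Marshal(req)
}

func main() {
	b, _ := subscribeLogsRequest("0xPOOL", "0xSWAP_TOPIC")
	fmt.Println(string(b))
}
```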

2. Cache the hot metadata

Most of the follow-up calls above fetch data that either rarely changes or is requested over and over. Block timestamps, pool token pairs, transaction senders: cache them with short TTLs.

Repeat RPC calls during bursty blocks will drop dramatically. This is standard caching discipline, but it matters more here because the "DB" (the node) charges per query.

3. Targeted fallback instead of full scans

For liquidity tracking specifically, we had a fallback job that would periodically scan all pools to catch anything missed by the event stream. It was a correctness safety net, but it was running too broadly.

The fix: mark pools as "dirty" when events touch them, and only include dirty or stale pools in the fallback sweep. The event-driven path does the work; the fallback only fills gaps.

Classic pattern, apply it here.
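The bookkeeping is small. A sketch of the dirty/stale tracking (field names and the staleness rule are illustrative, not our exact implementation):

```go
package main

import "fmt"

// poolTracker marks pools dirty when an event touches them; the
// fallback sweep then visits only dirty or stale pools, not all pools.
type poolTracker struct {
	dirty    map[string]bool
	lastSeen map[string]int64 // block height of last update
}

func newPoolTracker() *poolTracker {
	return &poolTracker{dirty: map[string]bool{}, lastSeen: map[string]int64{}}
}

// Touch records that an event updated this pool.
func (t *poolTracker) Touch(pool string, block int64) {
	t.dirty[pool] = true
	t.lastSeen[pool] = block
}

// SweepTargets returns pools the fallback job should visit: dirty
// ones, plus any not updated within maxAge blocks. Flags are cleared.
func (t *poolTracker) SweepTargets(current, maxAge int64) []string {
	var out []string
	for pool, last := range t.lastSeen {
		if t.dirty[pool] || current-last > maxAge {
			out = append(out, pool)
			t.dirty[pool] = false
			t.lastSeen[pool] = current
		}
	}
	return out
}

func main() {
	tr := newPoolTracker()
	tr.Touch("poolA", 100)
	tr.Touch("poolB", 100)
	fmt.Println(len(tr.SweepTargets(101, 50))) // both dirty: sweep them
	fmt.Println(len(tr.SweepTargets(102, 50))) // nothing dirty or stale
}
```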

4. Instrument by RPC method, not just "RPC calls"

You cannot optimize what you cannot see. Track call volume, error rate, and latency broken down by method (eth_getBlockByNumber, eth_getTransactionByHash, etc.) and by chain.

You will immediately see which calls are dominating, and you will catch spikes during volatile periods that you would otherwise only notice on your bill.

Gotchas That Will Catch You Off Guard

Transaction signing schemes vary by chain and tx type

If you are recovering the sender address from the transaction (rather than trusting the event's msg.sender), you need to use the correct signer for the transaction type. Ethereum has multiple transaction formats (legacy, EIP-1559, etc.), and some L2s have their own variants.

If your signer logic is wrong or incomplete, it will panic or silently fail on certain transactions. Keep this step out of the hot path. Fail soft and fall back to the event-provided sender rather than blocking ingestion.
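The fail-soft shape looks like this, with a hypothetical recoverSender standing in for real signature recovery (real code would select a signer per transaction type, e.g. via go-ethereum's typed signers):

```go
package main

import "fmt"

// recoverSender is a hypothetical stand-in for signature recovery.
// It panics on an unsupported tx type, as incomplete signer logic can.
func recoverSender(txType uint8) string {
	if txType > 2 {
		panic(fmt.Sprintf("unsupported tx type %d", txType))
	}
	return "0xrecovered"
}

// senderOrFallback keeps recovery out of the critical path: if it
// panics or fails, fall back to the event-provided sender instead
// of blocking ingestion.
func senderOrFallback(txType uint8, eventSender string) (s string) {
	defer func() {
		if r := recover(); r != nil {
			s = eventSender // fail soft; log and move on
		}
	}()
	return recoverSender(txType)
}

func main() {
	fmt.Println(senderOrFallback(2, "0xevent"))   // recovery works
	fmt.Println(senderOrFallback(105, "0xevent")) // unknown type: fallback
}
```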

Synchronous enrichment is a hidden queue

You might not have explicit timer-based delays in your code, but if you are doing 5 synchronous RPC calls per event, your effective throughput is:

1 / (5 × avg_RPC_latency) events per second, per worker

Under load, this becomes a queue whether you designed one or not. Real-time systems degrade into queueing systems when the enrichment time per event exceeds the time between event arrivals.
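Plugging in an assumed 50ms average RPC latency (a made-up but plausible figure for a hosted provider) makes the ceiling concrete:

```go
package main

import "fmt"

func main() {
	// 5 synchronous RPC calls per event at 50ms average latency
	// caps a single worker's throughput:
	const callsPerEvent = 5
	const avgLatencySec = 0.050
	throughput := 1.0 / (callsPerEvent * avgLatencySec) // events/sec
	fmt.Printf("%.0f events/sec per worker\n", throughput)
	// Any sustained burst above that rate accumulates as an
	// implicit, unbounded queue in front of your pipeline.
}
```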

Write amplification is real

Per-event DB writes (updating indexer state, recording newly discovered tokens and newly discovered pools) add up fast. Batch or debounce them.

For example, writing indexer state once per block instead of once per trade is usually fine and dramatically reduces lock contention.
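The once-per-block debounce is a few lines. A sketch (the hypothetical checkpointer below persists the cursor on block boundaries only; persist would be a real DB write):

```go
package main

import "fmt"

// checkpointer debounces indexer-state writes: instead of persisting
// the cursor after every trade, it writes once per block boundary.
type checkpointer struct {
	lastWritten int64
	writes      int
}

// OnTrade is called for every processed trade.
func (c *checkpointer) OnTrade(block int64) {
	if block != c.lastWritten {
		c.persist(block) // one DB write per block, not per trade
	}
}

func (c *checkpointer) persist(block int64) {
	c.lastWritten = block
	c.writes++
}

func main() {
	c := &checkpointer{lastWritten: -1}
	for _, block := range []int64{100, 100, 100, 101, 101, 102} {
		c.OnTrade(block) // six trades across three blocks
	}
	fmt.Println(c.writes) // three writes, not six
}
```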

Overly broad fallback jobs are a hidden cost center

If your fallback/reconciliation job runs frequently and selects broadly ("all pools updated in the last hour"), it can dominate both DB and RPC budgets even when your primary event pipeline is healthy. Scope it down.

The Next Level: Micro-Batching

Once you have done the above, the next significant win is batching writes and resolution steps.

The pattern: instead of processing each decoded swap event immediately and independently, buffer incoming events per chain and flush every N records or T milliseconds (we used 100 records or 100ms as a starting point).

On each flush:

  • Collect all unique tokens and pools referenced in the batch
  • Resolve metadata for those unique items once, not once per trade
  • Bulk-insert all trades in a single round-trip

This is the same principle behind write batching in any high-throughput system, but it is especially effective here because the per-event discovery lookups ("is this token known?", "is this pool known?") collapse from N calls to at most 1 call per unique item per flush window.

The tradeoff is a small, predictable flush latency. For most analytics use cases, sub-200ms end-to-end is still real-time, and the throughput and cost improvement is significant.

Technical Debt to Name Explicitly

A few things worth tracking deliberately rather than letting accumulate invisibly:

  • Blocking calls in the hot path: Every synchronous network call inside event processing should be consciously categorized as critical (must block) or optional (can be done async or skipped). If you do not make this explicit, optional things quietly become critical through copy-paste.
  • Idempotency: As you move toward batching and retry logic, make sure your deduplication keys are consistent. For on-chain events, (txHash, logIndex, chainID) is a reliable composite key. Without idempotency guarantees, retries create phantom trades.
  • Error taxonomy: "Failed to process swap" as a single error category is useless. Separate decode failures, RPC failures, enrichment failures, DB failures, and publish failures. Each has different causes, different remediation, and different urgency.
  • Backpressure strategy: Define explicitly what happens when your downstream (DB, message queue) is slow. What is the queue depth limit? Do you drop? Do you retry? Do you alert?
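The idempotency key above translates directly into Go. A sketch (storage here is an in-memory map for illustration; a real indexer would back this with a unique DB constraint on the same three columns):

```go
package main

import "fmt"

// eventKey is the composite dedup key for an on-chain event:
// (txHash, logIndex, chainID) uniquely identifies a log.
type eventKey struct {
	txHash   string
	logIndex uint
	chainID  uint64
}

// dedup drops events already seen, so retries and replays
// do not create phantom trades.
type dedup struct{ seen map[eventKey]bool }

// FirstSeen reports whether this event should be processed.
func (d *dedup) FirstSeen(k eventKey) bool {
	if d.seen[k] {
		return false
	}
	d.seen[k] = true
	return true
}

func main() {
	d := &dedup{seen: map[eventKey]bool{}}
	k := eventKey{txHash: "0xabc", logIndex: 3, chainID: 1}
	fmt.Println(d.FirstSeen(k)) // first delivery: process
	fmt.Println(d.FirstSeen(k)) // retry: skip
}
```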

Not having answers is fine early on. Not having thought about it is a liability.

The Mental Model

Think of an event-driven blockchain indexer as a streaming pipeline where your external API calls happen to be billed per request and have higher latency than a local DB.

Every optimization principle you know from building high-throughput backends applies: push filters upstream, cache aggressively, batch writes, instrument everything, avoid synchronous work in the hot path.

The blockchain-specific wrinkle is that the cost of a missed optimization is visible on an invoice and in your p95 ingest lag, not just in aggregate query time. That makes the feedback loop faster, which is actually useful once you know what to look for.