Blockchain Indexer: An Efficient Data Retrieval Solution Beyond RPC

2025-07-05 03:18:32

The Evolution of Blockchain Data Retrieval: Indexers and Their Applications

The Importance of Blockchain Data

Data is the core of blockchain technology and the foundation for developing decentralized applications (dApps). Current discussions mostly focus on data availability (DA), which ensures that network participants can access recent transaction data for verification. However, another equally important yet often overlooked aspect is data accessibility.

In the era of modular blockchains, DA solutions have become indispensable. These solutions ensure that all participants can access transaction data, enabling real-time verification and maintaining the integrity of the network. However, a DA layer is more like a billboard than a database: data is not stored indefinitely but is deleted over time, just as posters on a billboard are eventually replaced by new ones.

In contrast, data accessibility focuses on the ability to retrieve historical data, which is crucial for developing dApps and conducting blockchain analysis. This aspect is particularly important for tasks that require access to historical data to ensure accurate representation and execution. Although discussions about data accessibility are less frequent, it is as important as data availability. Both play different but complementary roles in the blockchain ecosystem, and a comprehensive data management approach must address both issues simultaneously to support robust and efficient blockchain applications.

Traditional Blockchain Data Retrieval Methods

Since its inception, blockchain has fundamentally changed infrastructure, enabling the creation of decentralized applications (dApps) such as games, finance, and social networks. However, building these dApps requires access to a large amount of blockchain data, which is both difficult and expensive to obtain.

For dApp developers, one option is to host and run their own archive RPC nodes. These nodes store all historical blockchain data from the genesis block onward, allowing complete data access. However, archive nodes are costly to maintain, and their querying capabilities are limited, making it difficult to retrieve data in the format developers need. Running cheaper nodes is another option, but these nodes have limited data retrieval capabilities, which may affect the operation of the dApp.

Another approach is to use commercial RPC node providers. These providers are responsible for the cost and management of the nodes and provide data through RPC endpoints. Although public RPC endpoints are free, they have rate limits that can negatively impact the user experience of dApps. Private RPC endpoints offer better performance by reducing congestion, but even simple data retrieval requires substantial back-and-forth communication. This makes them inefficient when handling complex data queries. Furthermore, private RPC endpoints are often difficult to scale and lack compatibility across different networks.
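To make the "back-and-forth" cost concrete, here is a minimal sketch using a toy in-memory stand-in for an RPC endpoint. The class `ToyRpcNode`, its methods `get_block` and `get_receipt`, and the sample data are all hypothetical and do not correspond to any real provider's API; the point is only to count how many requests a simple question ("which transactions were sent to this address?") can require when data must be fetched block by block.

```python
from dataclasses import dataclass


@dataclass
class ToyRpcNode:
    """Hypothetical in-memory stand-in for an RPC endpoint; counts every request."""
    blocks: dict    # block number -> list of transaction hashes
    receipts: dict  # transaction hash -> {"from": ..., "to": ...}
    calls: int = 0

    def get_block(self, number):
        self.calls += 1
        return self.blocks[number]

    def get_receipt(self, tx_hash):
        self.calls += 1
        return self.receipts[tx_hash]


def transfers_to(node, address, first_block, last_block):
    """Scan a block range one request at a time, as a naive RPC client would."""
    found = []
    for number in range(first_block, last_block + 1):
        for tx_hash in node.get_block(number):       # one request per block
            receipt = node.get_receipt(tx_hash)      # one request per transaction
            if receipt["to"] == address:
                found.append(tx_hash)
    return found


node = ToyRpcNode(
    blocks={1: ["0xa1", "0xa2"], 2: ["0xb1"], 3: []},
    receipts={
        "0xa1": {"from": "alice", "to": "bob"},
        "0xa2": {"from": "carol", "to": "dave"},
        "0xb1": {"from": "dave", "to": "bob"},
    },
)
hits = transfers_to(node, "bob", 1, 3)
print(hits, "found using", node.calls, "requests")
```

In this toy model the request count grows with the number of blocks plus the number of transactions scanned, which is the kind of overhead the paragraph above describes.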

Better Choice: Blockchain Indexer

Blockchain indexers organize on-chain data and load it into databases for easier querying, which is why they are often referred to as "the Google of blockchain." They work by indexing blockchain data and making it readily available through a SQL-like query language (via APIs such as GraphQL). By providing a unified query interface, indexers let developers quickly and accurately retrieve the information they need using standardized query languages, greatly simplifying the entire process.
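The index-then-query idea can be sketched in a few lines. This example is an illustration only: it uses Python's built-in `sqlite3` module and plain SQL instead of GraphQL so that it stays self-contained, and the block layout, the `events` table, and the pool names are assumptions made up for the sketch rather than any indexer's real schema.

```python
import sqlite3

# Hypothetical raw chain data: each block carries a list of decoded contract events.
RAW_BLOCKS = [
    {"number": 100, "events": [{"pool": "ETH/USDC", "kind": "Swap", "amount": 5.0}]},
    {"number": 101, "events": [
        {"pool": "ETH/USDC", "kind": "Swap", "amount": 2.5},
        {"pool": "WBTC/ETH", "kind": "Mint", "amount": 1.0},
    ]},
]


def build_index(blocks):
    """Flatten block events into a queryable table with an index on the pool column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (block INTEGER, pool TEXT, kind TEXT, amount REAL)")
    db.execute("CREATE INDEX idx_events_pool ON events (pool)")
    rows = [
        (block["number"], event["pool"], event["kind"], event["amount"])
        for block in blocks
        for event in block["events"]
    ]
    db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return db


db = build_index(RAW_BLOCKS)

# One declarative query replaces a block-by-block scan.
eth_usdc_swaps = db.execute(
    "SELECT block, amount FROM events WHERE pool = ? AND kind = 'Swap' ORDER BY block",
    ("ETH/USDC",),
).fetchall()
print(eth_usdc_swaps)
```

The ingestion step runs once per block, while every later question becomes a single filtered query against the table.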

Different types of indexers optimize data retrieval in various ways:

Full Node Indexers: These indexers run full blockchain nodes and extract data directly from them, ensuring data integrity and accuracy, but they require significant storage and processing power.
Lightweight Indexers: These indexers rely on full nodes to fetch specific data as needed, thereby reducing storage requirements but potentially increasing query time.
Specialized Indexers: These indexers are optimized for certain types of data or specific blockchains, providing more efficient retrieval for specific use cases ( such as NFT data or DeFi transactions ).
Aggregated Indexers: These indexers extract data from multiple blockchains and sources, including off-chain information, providing a unified query interface, which is particularly useful for multi-chain dApps.

An Erigon archive node for Ethereum alone requires about 3TB of storage, and that volume will keep increasing as the blockchain continues to grow. Indexer protocols deploy multiple indexers that can index and query large amounts of data at high speed, which is not achievable through RPC.

Indexers also allow for complex queries, easy filtering of data based on different criteria, and extraction for subsequent analysis. Some indexers can also aggregate data from multiple sources, avoiding the need to deploy multiple APIs in multi-chain dApps. By being distributed across multiple nodes, indexers provide enhanced security and performance, while RPC providers may experience interruptions and downtime due to their centralized nature.

Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval while also reducing the cost of deploying a single node. This makes blockchain indexer protocols the preferred solution for dApp developers.

Application Scenarios of Indexers

As mentioned earlier, a dApp must retrieve and read blockchain data in order to operate its services. This applies to all types of dApps, such as DeFi, NFT platforms, games, and even social networks, because these platforms need to read data first in order to execute subsequent transactions.

DeFi

DeFi protocols require different information to quote specific prices, rates, fees, and so on. An automated market maker (AMM) needs price and liquidity information from specific liquidity pools to calculate swap rates, while lending protocols need utilization rates to determine lending rates and liquidation debt ratios. This information must be fed into the dApp before it can calculate the rates at which users' transactions execute.
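As a rough illustration of why these inputs matter, the sketch below computes a swap quote from pool reserves and a borrow rate from utilization. It assumes a constant-product AMM (one common design; the article does not say which pricing model is meant) and a simple linear interest model. The function names, the 0.3% fee, and the rate parameters are illustrative assumptions, not any protocol's actual values.

```python
def constant_product_quote(reserve_in, reserve_out, amount_in, fee=0.003):
    """Output amount for a constant-product pool (x * y = k), after a swap fee.

    The reserves are exactly the 'price and liquidity information' a dApp
    would read from indexed pool data. The fee default is an assumption.
    """
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def utilization_rate(total_borrowed, total_supplied):
    """Share of supplied liquidity that is currently borrowed."""
    if total_supplied == 0:
        return 0.0
    return total_borrowed / total_supplied


def linear_borrow_rate(utilization, base_rate=0.02, slope=0.10):
    """Hypothetical linear rate model: the rate rises as utilization rises."""
    return base_rate + slope * utilization


quote_no_fee = constant_product_quote(1000.0, 1000.0, 100.0, fee=0.0)
quote_with_fee = constant_product_quote(1000.0, 1000.0, 100.0)
utilization = utilization_rate(total_borrowed=800.0, total_supplied=1000.0)
borrow_rate = linear_borrow_rate(utilization)
print(quote_no_fee, quote_with_fee, utilization, borrow_rate)
```

Every argument here is a value that has to be read from chain state first, so stale or slow reads translate directly into wrong quotes.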

Game

GameFi requires quick indexing and access to data to ensure a smooth gaming experience for users. Only through rapid data retrieval and execution can Web3 games compete in performance with Web2 games, thereby attracting more users. These games need data such as land ownership, in-game token balances, and in-game operations. By using indexers, they can better ensure a stable data flow and consistent uptime to provide a perfect gaming experience.

NFT

NFT markets and lending platforms require indexed data to access various information, such as NFT metadata, ownership and transfer data, royalty information, etc. Quickly indexing such data can avoid browsing each NFT individually to find ownership or NFT attribute data.
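A minimal sketch of what such an ownership index might look like: transfer events are replayed once in chain order, after which "who owns token X" and "what does address Y hold" become simple lookups. The event fields (`block`, `log_index`, `token_id`, `to`) and the sample data are assumptions chosen for illustration.

```python
def build_ownership_index(transfers):
    """Replay transfer events in chain order and keep only the latest owner."""
    owner_of = {}
    ordered = sorted(transfers, key=lambda t: (t["block"], t["log_index"]))
    for transfer in ordered:
        owner_of[transfer["token_id"]] = transfer["to"]

    # Reverse mapping so that a wallet's holdings are also a single lookup.
    tokens_by_owner = {}
    for token_id, owner in owner_of.items():
        tokens_by_owner.setdefault(owner, set()).add(token_id)
    return owner_of, tokens_by_owner


# Hypothetical decoded transfer events for a single NFT collection.
transfers = [
    {"block": 12, "log_index": 0, "token_id": 1, "to": "bob"},    # alice -> bob
    {"block": 10, "log_index": 0, "token_id": 1, "to": "alice"},  # mint
    {"block": 11, "log_index": 3, "token_id": 2, "to": "alice"},  # mint
]
owner_of, tokens_by_owner = build_ownership_index(transfers)
print(owner_of, tokens_by_owner)
```

Without an index like this, answering the same question means inspecting each token individually.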

Whether it is a DeFi automated market maker (AMM) that requires price and liquidity information, or a social application that needs to load new user posts, the ability to quickly retrieve data is crucial for the normal operation of dApps. With the help of indexers, they can retrieve data efficiently and accurately, thereby providing a smooth user experience.

Analysis

An indexer provides a way to extract specific data from raw blockchain data (including smart contract events within each block). This opens the door to more targeted data analysis and, in turn, more comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and which tokens generate fees, and use that to decide whether to list these tokens as perpetual contracts on their platform. DEX developers can create dashboards for their products to see which liquidity pools offer the highest returns or the deepest liquidity. They can also create public dashboards that allow developers to freely and flexibly query any type of data they want to display on the charts.
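The listing decision described above boils down to an aggregation over indexed trade records. The following sketch shows one way this could look; the record fields, token symbols, and the volume threshold are all hypothetical.

```python
from collections import defaultdict


def summarize_trades(trades):
    """Total volume and fees per token from indexed trade records."""
    totals = defaultdict(lambda: {"volume": 0.0, "fees": 0.0})
    for trade in trades:
        totals[trade["token"]]["volume"] += trade["volume"]
        totals[trade["token"]]["fees"] += trade["fee"]
    return dict(totals)


def listing_candidates(totals, min_volume):
    """Tokens at or above the volume threshold, highest volume first."""
    eligible = [token for token, stats in totals.items() if stats["volume"] >= min_volume]
    return sorted(eligible, key=lambda token: totals[token]["volume"], reverse=True)


# Hypothetical trade records as an indexer might return them.
trades = [
    {"token": "AAA", "volume": 100.0, "fee": 0.5},
    {"token": "AAA", "volume": 50.0, "fee": 0.25},
    {"token": "BBB", "volume": 20.0, "fee": 0.125},
    {"token": "CCC", "volume": 500.0, "fee": 1.0},
]
totals = summarize_trades(trades)
candidates = listing_candidates(totals, min_volume=100.0)
print(candidates)
```

The same per-token totals could feed a dashboard directly, for example to rank pools by fees instead of volume.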

Due to the availability of multiple blockchain indexers, identifying the differences between indexing protocols is crucial to ensure that developers choose the indexer that best suits their needs.

Overview of Main Blockchain Indexers

The Graph

The Graph is one of the earliest indexing protocols launched on Ethereum, enabling easy queries of previously hard-to-access transaction data. It uses subgraphs to define and filter subsets of data collected from the blockchain, such as all transactions related to a specific liquidity pool.

Using proof of indexing, indexers stake the native token GRT to provide indexing and query services, and delegators can choose to delegate their tokens to these indexers. Curators signal on high-quality subgraphs to help indexers determine which subgraphs to index in order to earn the best query fees. In its transition towards greater decentralization, The Graph will eventually shut down its hosted service and require subgraphs to upgrade to its network, while providing an upgrade indexer.

Its infrastructure brings the average cost per million queries to $40, which is much lower than the cost of self-hosted nodes. By using file data sources, it also supports parallel indexing of on-chain and off-chain data for efficient data retrieval.

The rewards for indexers of The Graph have been steadily increasing over the past few quarters. This is partly due to the increase in query volume, but also attributed to the rise in token prices, as they plan to integrate AI-assisted queries in the future.

Subsquid

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates a large amount of on-chain and off-chain data and protects it through zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing data from a specific block subset, accelerating the data retrieval process by quickly identifying the nodes that hold the required data.
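The idea of locating the node that holds a given block can be sketched with a sorted range table and a binary search. This is not Subsquid's actual routing algorithm, which the article does not describe; `RangeRouter`, the worker IDs, and the block ranges are hypothetical and serve only to show why range-partitioned storage makes lookups fast.

```python
from bisect import bisect_right


class RangeRouter:
    """Maps block numbers to the worker that stores them (hypothetical sketch)."""

    def __init__(self, assignments):
        # assignments: (first_block, last_block, worker_id) tuples, non-overlapping
        self._ranges = sorted(assignments)
        self._starts = [first for first, _, _ in self._ranges]

    def worker_for(self, block):
        """Binary-search for the range whose start is the closest one <= block."""
        i = bisect_right(self._starts, block) - 1
        if i >= 0:
            first, last, worker = self._ranges[i]
            if first <= block <= last:
                return worker
        return None  # no worker holds this block

    def workers_for_range(self, first_block, last_block):
        """All workers whose stored range overlaps the requested range."""
        return [
            worker
            for first, last, worker in self._ranges
            if first <= last_block and last >= first_block
        ]


router = RangeRouter([
    (0, 999, "worker-a"),
    (1000, 1999, "worker-b"),
    (2000, 2999, "worker-c"),
])
print(router.worker_for(1500), router.workers_for_range(900, 1100))
```

A query spanning several ranges can then be split and sent to the relevant workers in parallel instead of being served by a single node.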

Subsquid also supports real-time indexing, allowing data to be indexed before a block is finalized. It also supports storing data in formats chosen by developers, such as Parquet or CSV, making it easier to analyze with tools like BigQuery. Additionally, subgraphs can be deployed on the Subsquid network without the need to migrate to the Squid SDK, enabling no-code deployment.

Despite still being in the testnet phase, Subsquid has achieved impressive statistics, with over 80,000 testnet users, more than 60,000 Squid indexers deployed, and over 20,000 verified developers on the network. Recently, Subsquid launched the mainnet of its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analytics, ZK/TEE co-processors, AI agents, and Oracles.

SubQuery

SubQuery is a decentralized middleware infrastructure network that provides RPC and indexed data services. It originally supported the Polkadot and Substrate networks and has now expanded to over 200 chains. Its operation is similar to The Graph's, which uses proof of indexing: indexers index data and serve query requests, while delegators delegate their stake to indexers. However, instead of curators, it introduces consumers who submit purchase orders, which guarantees income for the indexers.

It will introduce SubQuery data nodes that support sharding to prevent continuous synchronization of new data between each node, thus optimizing query efficiency while moving towards greater decentralization. Users can choose to pay a computation fee of about 1 SQT token for every 1000 requests or set custom fees for the indexer through the protocol.
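The fee arithmetic is simple enough to sketch directly. The default below uses the article's approximate figure of 1 SQT per 1,000 requests; the function name and the custom rate in the example are hypothetical, and real pricing is set through the protocol.

```python
def query_cost_sqt(requests, sqt_per_thousand=1.0):
    """Estimated fee in SQT for a number of requests.

    The default rate follows the article's rough figure (about 1 SQT per
    1,000 requests). Pass a different rate to model a custom fee.
    """
    return requests / 1000 * sqt_per_thousand


default_cost = query_cost_sqt(250_000)                       # at ~1 SQT per 1,000
custom_cost = query_cost_sqt(250_000, sqt_per_thousand=0.5)  # hypothetical custom fee
print(default_cost, custom_cost)
```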

SubQuery only launched its token earlier this year, yet the issuance rewards for nodes and delegators have increased in USD value month over month, reflecting a continuous increase in the query services offered on its platform. Since the TGE, the total amount of staked SQT has increased from 6 million to 125 million, highlighting the growth of its network participation.

Covalent

Covalent is a decentralized indexing network in which block specimen producer (BSP) nodes create copies of blockchain data through bulk export and publish proofs on the Covalent L1 blockchain. This data is then refined by block result producer (BRP) nodes according to set rules, filtering out the data that meets the requirements.

Through a unified API, developers can easily extract relevant blockchain data in a consistent request and response format without writing complex custom queries. The CQT token, settled on Moonbeam, can be used as a payment method to obtain these pre-configured data sets from network operators.

The rewards of Covalent seem to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the increase in the price of Covalent token CQT.

Factors to Consider When Choosing an Indexer
