Decoding Solana Data

Ozgur Akkurt, Yule Andrade | 2025-05-13

Cherry is fully open-source. Dive into the code, browse the docs, or contribute at https://github.com/steelcake/cherry.

If you're familiar with Ethereum (EVM), decoding transaction data feels almost effortless. You get a standardized ABI, consistent function and event signatures, and plenty of tooling support. Need to know what inputs a transaction carries? Just look up the function signature. Solana, however, is a very different landscape. There's no universal ABI, no standard for encoding instructions, and no built-in way to decode transaction data, even if you have the program's IDL.

At Cherry, we faced this problem head-on. Supporting both EVM and Solana on our platform meant Solana decoding had to be as seamless and reliable as it is on Ethereum. So we built it. cherry-core includes a Rust library with Python bindings that decodes raw Solana instruction data at runtime using user-defined instruction signatures. It speaks Arrow in and out, making it plug-and-play with Cherry's data pipeline.

Decoding 101

Decoding is the process of converting raw, serialized data—such as transaction inputs, logs, or instruction payloads—into a human-readable format. During serialization, values are packed into a byte stream, stripping away critical context like function names, parameter names, types, and how the data should be interpreted. Decoding restores that context, making sense of each field and ensuring proper type casting along the way.

To decode data correctly, you need a translation guide—a precise specification of the data's structure.
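As a minimal illustration of the idea, the sketch below serializes two values into anonymous bytes and then recovers them with an explicit layout. The field names and the layout are hypothetical and chosen only for this example.

```python
import struct

# The "translation guide": a little-endian u64 followed by a bool.
# Both the layout and the field names are made up for this example.
LAYOUT = "<Q?"
FIELD_NAMES = ("amount", "is_final")

# Serialization keeps only the values; names and types are gone.
raw = struct.pack(LAYOUT, 1_500_000, True)

def decode(payload: bytes) -> dict:
    """Re-attach names and types to the raw bytes using the layout."""
    values = struct.unpack(LAYOUT, payload)
    return dict(zip(FIELD_NAMES, values))

decoded = decode(raw)
print(raw.hex())
print(decoded)
```

Without LAYOUT and FIELD_NAMES, the nine bytes in raw could be read in many different ways, which is exactly the problem a decoder has to solve.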

On Ethereum

This translation guide is standardized as an ABI (Application Binary Interface). The ABI describes how external applications interact with smart contracts and, in addition to other information, provides the function and event signatures needed to decode inputs and outputs. For example, the following signature defines the data structure of a Transfer event:

Transfer(address indexed from, address indexed to, uint256 amount)

Breakdown:

1st value – "from"; type - address (20 bytes), indexed
2nd value – "to"; type - address (20 bytes), indexed
3rd value – "amount"; type - uint256 (32 bytes), not indexed, stored in the data field
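The breakdown above can be sketched in a few lines of Python. This is an illustrative decoder for a single log, not part of Cherry: indexed values are read from the log's topics (each padded to 32 bytes), and the non-indexed amount is read from the data field. The sample addresses and the zeroed first topic are placeholders.

```python
def decode_transfer(topics: list[bytes], data: bytes) -> dict:
    """Decode one Transfer log: topics[0] is the event signature hash,
    topics[1] and topics[2] hold the indexed addresses, data holds the amount."""
    if len(topics) != 3 or len(data) != 32:
        raise ValueError("unexpected Transfer log shape")
    return {
        # An address occupies the last 20 bytes of its 32-byte topic.
        "from": "0x" + topics[1][-20:].hex(),
        "to": "0x" + topics[2][-20:].hex(),
        # uint256 values are big-endian.
        "amount": int.from_bytes(data, "big"),
    }

# Placeholder inputs; a real log carries the keccak-256 signature hash in topics[0].
sender = bytes.fromhex("11" * 20)
receiver = bytes.fromhex("22" * 20)
topics = [bytes(32), bytes(12) + sender, bytes(12) + receiver]
data = (10**18).to_bytes(32, "big")

transfer = decode_transfer(topics, data)
print(transfer)
```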

On Solana

Solana takes a more flexible, and more chaotic, approach to instruction encoding. Unlike Ethereum, which uses a standardized ABI to describe how smart contracts serialize and deserialize data, Solana leaves that responsibility entirely to each program. There's no standard ABI, no consistent encoding scheme, and no built-in way to decode instruction data unless the program author publishes enough information about the layout.

Some programs—particularly those built with the Anchor framework—automatically generate an IDL (Interface Definition Language file). IDLs describe a program's public interface: its instructions, arguments, accounts, return types, and events. They serve a similar purpose to Ethereum's ABI, acting as a bridge between raw bytes and meaningful structure. However, unlike ABIs, IDLs are not universally adopted in the Solana ecosystem. And even when available, they often lack the detail needed for complete runtime decoding—especially when programs use custom serialization or non-standard patterns.

This lack of a universal guide makes decoding Solana data more challenging—especially when building general-purpose data pipelines or explorers—but it's still entirely possible with the proper tooling.

Cherry SVM decoding

cherry-svm-decode, part of Cherry's cherry-core toolkit, is a Rust library with Python bindings that decodes raw Solana instruction data at runtime using user-defined instruction signatures.

These signature objects serve as decoding blueprints: they describe the expected structure of an instruction, including the name, type, and length of each parameter, allowing the pipeline to parse and interpret the raw bytes reliably.

Users can construct these signatures by gathering information from a variety of sources:

Published IDLs (when available)
Program source code (typically in Rust)
Manual inspection of raw instructions on a Solana explorer

Here's an example of an instruction signature for decoding Jupiter swap instructions. We'll break down each part below:

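# Import path assumed from the cherry-core Python package; adjust if your version differs.
from cherry_core.svm_decode import DynType, FixedArray, InstructionSignature, ParamInput

instruction_signature = InstructionSignature(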
    discriminator="0xe445a52e51cb9a1d40c6cde8260871e2",
    params=[
        ParamInput(
            name="Amm",
            param_type=FixedArray(DynType.U8, 32),
        ),
        ParamInput(
            name="InputMint",
            param_type=FixedArray(DynType.U8, 32),
        ),
        ParamInput(
            name="InputAmount",
            param_type=DynType.U64,
        ),
        ParamInput(
            name="OutputMint",
            param_type=FixedArray(DynType.U8, 32),
        ),
        ParamInput(
            name="OutputAmount",
            param_type=DynType.U64,
        ),
    ],
    accounts_names=[],
)

An example of Cherry's Python Instruction Signature

The same Event Instruction in Solscan
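To make the layout concrete, the signature above can be read as a fixed 128-byte record: a 16-byte discriminator, three 32-byte keys, and two u64 amounts. The sketch below decodes that layout by hand with the standard library. It is not Cherry's implementation, and it assumes little-endian integers with no padding (as in Borsh, which Anchor programs use). The sample payload is made up.

```python
import struct

DISCRIMINATOR = bytes.fromhex("e445a52e51cb9a1d40c6cde8260871e2")

# Discriminator, Amm, InputMint, InputAmount, OutputMint, OutputAmount.
# "<" means little-endian with no padding (an assumption, see above).
LAYOUT = struct.Struct("<16s32s32sQ32sQ")
FIELDS = ("Amm", "InputMint", "InputAmount", "OutputMint", "OutputAmount")

def decode_swap(data: bytes) -> dict:
    """Decode one instruction payload that follows the layout above."""
    if len(data) < LAYOUT.size:
        raise ValueError(f"expected at least {LAYOUT.size} bytes, got {len(data)}")
    disc, *values = LAYOUT.unpack_from(data)
    if disc != DISCRIMINATOR:
        raise ValueError("discriminator mismatch")
    return dict(zip(FIELDS, values))

# Made-up payload: constant-byte "keys" and small amounts.
sample = (
    DISCRIMINATOR
    + b"\x01" * 32
    + b"\x02" * 32
    + struct.pack("<Q", 1000)
    + b"\x03" * 32
    + struct.pack("<Q", 2500)
)
swap = decode_swap(sample)
print(swap["InputAmount"], swap["OutputAmount"])
```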

Discriminators

A discriminator is a fixed sequence of bytes at the beginning of serialized data that identifies which instruction, struct, or event the data represents. During decoding, the discriminator matches raw data to the correct signature definition.
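The matching step can be pictured as a prefix lookup. The sketch below is a hypothetical registry, not a Cherry API: it maps discriminator bytes to a label and returns the label together with the remaining payload. The one-byte entry is invented to show discriminators of different lengths side by side.

```python
# Hypothetical registry: discriminator bytes -> signature label.
SIGNATURES = {
    bytes.fromhex("e445a52e51cb9a1d40c6cde8260871e2"): "jupiter_swap_event",
    bytes([0x01]): "hypothetical_transfer",
}

def match_signature(data: bytes):
    """Return (label, remaining_payload), or None if nothing matches."""
    # Try longer discriminators first so a short one cannot shadow a long one.
    for disc in sorted(SIGNATURES, key=len, reverse=True):
        if data.startswith(disc):
            return SIGNATURES[disc], data[len(disc):]
    return None

print(match_signature(b"\x01\x02"))
```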

Discriminators are one of the most challenging parts to reverse-engineer because Solana has no standard for defining them. Here are some common patterns observed in real-world programs:

Sequential values: Some programs use simple, ordered values (e.g., 0x00, 0x01, 0x02, ...) as discriminators.
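Anchor conventions: Anchor programs typically use the first 8 bytes of the SHA-256 hash of a namespaced name as the discriminator (for example, the hash of "global:initialize" for an instruction named initialize, or "event:" followed by the struct name for an event), which makes collisions unlikely.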
Nested Anchor logs: Some Anchor-based programs use a two-level discriminator — the first 8 bytes identify a CPI log instruction, and the next 8 bytes identify a specific data structure inside the log (for a total of 16 bytes).
Completely custom formats: Some programs define arbitrarily structured discriminators that don't follow any public pattern.
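For the Anchor convention in particular, a discriminator can be recomputed from a name with the standard library. The sketch below assumes Anchor's usual namespaces ("global" for instructions, "event" for events); the instruction and event names are examples only, so check them against the program you are decoding.

```python
import hashlib

def anchor_discriminator(namespace: str, name: str) -> bytes:
    """First 8 bytes of sha256("<namespace>:<name>")."""
    preimage = f"{namespace}:{name}".encode()
    return hashlib.sha256(preimage).digest()[:8]

# Example names, not taken from any specific program's IDL.
ix_disc = anchor_discriminator("global", "initialize")
event_disc = anchor_discriminator("event", "SwapEvent")

print(ix_disc.hex())
print(event_disc.hex())
```

Comparing the printed values with the first bytes of observed instruction data is a quick way to confirm whether a program follows this convention.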

If you can reliably identify a particular instruction from observed transactions, you may be able to deduce its discriminator by finding repeated byte sequences at the start of the instruction data.
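One way to look for such a repeated sequence is to take the longest common prefix of several payloads known to belong to the same instruction. The sketch below uses made-up sample data. Note that the prefix can come out longer than the real discriminator if the first arguments happen to agree across all samples, so more (and more varied) samples give a better estimate.

```python
def common_prefix(samples: list[bytes]) -> bytes:
    """Longest byte prefix shared by every sample."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    for i in range(len(shortest)):
        if any(s[i] != shortest[i] for s in samples):
            return shortest[:i]
    return shortest

# Made-up payloads: the same 8-byte prefix followed by different arguments.
prefix = bytes.fromhex("0102030405060708")
observed = [
    prefix + bytes([0x0A, 0x00]),
    prefix + bytes([0x0B, 0x01, 0x02]),
    prefix + bytes([0x0A, 0xFF]),
]

candidate = common_prefix(observed)
print(candidate.hex())
```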

Params

The params field in the signature defines the expected values within the instruction data — in the exact order they appear. Each param can include a name, a type, and in the case of composite types, a list of fields or variants. These parameters are ordered and interpreted sequentially during decoding.

Supported types include:

Primitives: Uint, Int, and Bool
Complex types:
- FixedArray: A fixed-length array of another type (e.g., a public key is 32 bytes, i.e. [u8; 32]).
- Array: A dynamic-length array. The data is prefixed with a length indicator that tells the decoder how many elements to read.
- Struct: A composite of named fields, each with its own type (like a dictionary).
- Enum: A type representing one of several variants. Each variant may optionally carry its own nested value.
- Option: A nullable value that either holds a nested type or is empty.

All complex types can be nested arbitrarily — for example, an array of structs, an option of an enum, or a struct containing other structs.
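To show how nested types are consumed sequentially, here is a small recursive decoder written for this article. It is a sketch, not Cherry's decoder: the tuple-based type descriptors are invented for the example, and the wire format assumes Borsh conventions (little-endian integers, a u32 length prefix for arrays, a one-byte tag for options and enum variants).

```python
import struct

def decode(ty, buf: bytes, pos: int = 0):
    """Decode one value of type `ty` starting at `pos`; return (value, new_pos)."""
    kind = ty[0]
    if kind == "u8":
        return buf[pos], pos + 1
    if kind == "u64":
        return struct.unpack_from("<Q", buf, pos)[0], pos + 8
    if kind == "bool":
        return buf[pos] != 0, pos + 1
    if kind == "fixed_array":          # ("fixed_array", inner_type, length)
        items = []
        for _ in range(ty[2]):
            value, pos = decode(ty[1], buf, pos)
            items.append(value)
        return items, pos
    if kind == "array":                # ("array", inner_type), u32 length prefix
        (length,) = struct.unpack_from("<I", buf, pos)
        return decode(("fixed_array", ty[1], length), buf, pos + 4)
    if kind == "option":               # ("option", inner_type), tag 0 = empty
        if buf[pos] == 0:
            return None, pos + 1
        return decode(ty[1], buf, pos + 1)
    if kind == "struct":               # ("struct", [(field_name, type), ...])
        fields = {}
        for name, field_ty in ty[1]:
            value, pos = decode(field_ty, buf, pos)
            fields[name] = value
        return fields, pos
    if kind == "enum":                 # ("enum", [(variant_name, type or None), ...])
        name, payload_ty = ty[1][buf[pos]]
        pos += 1
        if payload_ty is None:
            return name, pos
        value, pos = decode(payload_ty, buf, pos)
        return {name: value}, pos
    raise ValueError(f"unknown type kind: {kind}")

# A struct containing an array, an option, and an enum with a payload.
ORDER = ("struct", [
    ("amounts", ("array", ("u64",))),
    ("memo", ("option", ("u8",))),
    ("side", ("enum", [("Buy", None), ("Sell", ("u64",))])),
])
payload = struct.pack("<IQQ", 2, 5, 7) + bytes([1, 9]) + bytes([1]) + struct.pack("<Q", 42)

order, consumed = decode(ORDER, payload)
print(order, consumed)
```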

Accounts Names

In Solana, each instruction includes a list of accounts it interacts with, passed as a separate data structure from the instruction data itself. The accounts_names field allows you to assign meaningful names to these account indices, making decoded output easier to read and analyze.

While the decoder doesn't interpret account data contents, having named accounts helps clarify the role each address plays in the instruction (e.g., "user", "token_account", "vault", etc.).
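Conceptually, naming accounts is a positional pairing of names with the instruction's account list. The helper below is a hypothetical illustration of that pairing, with made-up names and addresses. How Cherry itself represents accounts beyond the named ones is not covered here, so the "rest" list is only an assumption for the example.

```python
def name_accounts(names: list[str], accounts: list[str]):
    """Pair names with accounts by position; return (named, unnamed_rest)."""
    named = dict(zip(names, accounts))
    rest = accounts[len(names):]
    return named, rest

# Made-up account addresses in the order the instruction lists them.
accounts = ["UserPubkey111", "TokenAcct222", "Vault333", "Extra444"]
named, rest = name_accounts(["user", "token_account", "vault"], accounts)

print(named)
print(rest)
```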

Pipeline decoding

In cherry-core, decoding becomes a simple function call once a signature is defined. You pass in an input containing raw Solana instruction data (in Apache Arrow RecordBatch format, a table-like data structure), and the decoder returns a new RecordBatch with structured, typed values.

def svm_decode_instructions(
    signature: InstructionSignature,
    batch: RecordBatch,
    allow_decode_fail: bool = False,
) -> RecordBatch
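The allow_decode_fail flag can be understood through a plain-Python analogue. The sketch below is not the library's code, and it assumes the flag means "emit a null for rows that fail to decode instead of raising an error"; check the cherry-core documentation for the exact behavior. The row format (a single little-endian u64) is invented for the example.

```python
import struct

def decode_rows(rows: list[bytes], allow_decode_fail: bool = False) -> list:
    """Decode each row as one little-endian u64; handle failures per the flag."""
    decoded = []
    for index, data in enumerate(rows):
        try:
            (amount,) = struct.unpack("<Q", data)
            decoded.append(amount)
        except struct.error as exc:
            if not allow_decode_fail:
                raise ValueError(f"row {index} failed to decode: {exc}") from exc
            decoded.append(None)  # keep the row, mark the value as missing
    return decoded

rows = [struct.pack("<Q", 1), b"\x00", struct.pack("<Q", 3)]
lenient = decode_rows(rows, allow_decode_fail=True)
print(lenient)
```

Keeping failed rows as nulls preserves row alignment with the input, which matters when decoded columns are placed next to the original ones.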

Cherry supports a sequence of built-in transformation steps to make this even easier in pipelines. These steps abstract away everyday tasks and can be composed declaratively. Users can define a list of operations, which are then applied sequentially during pipeline execution.

steps = [
    cc.Step(
        kind=cc.StepKind.SVM_DECODE_INSTRUCTIONS,
        config=cc.SvmDecodeInstructionsConfig(
            instruction_signature=instruction_signature,
            hstack=True,
            allow_decode_fail=True,
            output_table="jup_swaps_decoded_instructions",
        ),
    ),
    cc.Step(
        kind=cc.StepKind.BASE58_ENCODE,
        config=cc.Base58EncodeConfig(),
    ),
    ...

These steps allow decoded instruction data to be joined, transformed, and stored with minimal manual intervention — streamlining Solana data workflows.
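The idea of sequential steps can be sketched without Cherry at all: each step is a function from rows to rows, and the pipeline applies them in order. The example below is a toy analogue with an invented row format, followed by a Base58 step similar in spirit to BASE58_ENCODE. The function names are hypothetical.

```python
import struct

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(raw: bytes) -> str:
    """Base58-encode bytes (the alphabet used for Solana addresses)."""
    number = int.from_bytes(raw, "big")
    digits = ""
    while number:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    # Each leading zero byte is written as a leading "1".
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + digits

def decode_step(rows):
    """Toy decode: split 40 bytes into a 32-byte mint and a u64 amount."""
    out = []
    for row in rows:
        mint, amount = struct.unpack("<32sQ", row["data"])
        out.append({"mint": mint, "amount": amount})
    return out

def base58_step(rows):
    """Replace every bytes value with its Base58 string."""
    return [
        {key: b58encode(value) if isinstance(value, bytes) else value
         for key, value in row.items()}
        for row in rows
    ]

def run_steps(rows, steps):
    """Apply each step to the output of the previous one."""
    for step in steps:
        rows = step(rows)
    return rows

raw_rows = [{"data": bytes(32) + struct.pack("<Q", 7)}]
result = run_steps(raw_rows, [decode_step, base58_step])
print(result)
```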

Putting It Together

Decoding Solana instruction data involves more effort than on Ethereum, mainly because there is no standardized source of information comparable to the EVM's ABI and because programs use a wide variety of encoding formats. While Anchor-based programs may provide helpful metadata through IDLs, many programs lack published or complete interfaces. As a result, decoding often involves reverse-engineering types and structures from multiple sources.

cherry-core simplifies this process by offering a runtime decoding engine powered by user-defined signatures. Once defined, these signatures can be reused across pipelines and shared across projects. Combined with Cherry's transformation system, this enables structured, repeatable, and scalable decoding workflows — even in a complex ecosystem like Solana.

What's Next?

👉 Full example of decoding Jupiter swaps using cherry

We can now build a database of instruction signatures and reliably decode any Solana program for which we have a signature. While the format isn't as compact as EVM's, it's flexible and robust.

Our next goals:

Convert IDL files into instruction signatures, allowing us to reuse well-structured, existing IDLs.
Build a public, open database of signatures so anyone can decode Solana data easily using Cherry — without needing to define or search for signatures manually.