tl;dr: MegaETH is real-time Ethereum. It provides a developer and user experience more akin to a modern computer while retaining every benefit and familiarity of Ethereum. MegaETH is enabled by two key techniques: node specialization, which concentrates the heavy lifting onto just a few nodes, and a hyper-optimized EVM execution environment that pushes throughput, latency, and resource efficiency closer to the hardware limit.


What is MegaETH?

MegaETH is an Ethereum layer-2 that, for the first time, provides sufficiently high throughput, low latency, and low cost to support sophisticated real-time decentralized applications. Expect a throughput of more than 100,000 transactions per second and sub-millisecond latency.

How are existing “high-performance” chains doing?

Unfortunately, no existing chains are truly high-performance by Web2 standards. They can deliver at most a few thousand TPS, a tiny fraction of what modern servers are capable of.

To understand how much performance is being sacrificed, let’s work out the blockchain throughput upper bound imposed by each of the three primary hardware resources: computing power, disk IO, and network bandwidth. First, executing transactions consumes CPU cycles. Today, EVM interpreters like evmone can already crunch >100k ERC20 transfers or >6k token swaps per second with a single CPU core, and almost all CPUs have multiple cores. Second, updates to the blockchain state must be written to disk. Given that a modern NVMe SSD can sustain >300k random writes per second, and that an ETH transfer updates two accounts, one would expect the storage system to handle >150k ETH transfers per second. Moreover, it is relatively easy to scale the total IOPS by adding more SSDs. Finally, propagating state updates consumes network bandwidth. If each state update can be encoded in 20 bytes, a 100 Mbps network connection can receive 625,000 unique updates (i.e., 312,500 ETH transfers) per second. In sum, our back-of-the-envelope calculation suggests that even a modest server can achieve 100k TPS.
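To make the arithmetic concrete, here is a minimal Rust sketch that reproduces these bounds. The core count and the two-writes-per-transfer figure are illustrative assumptions; the remaining constants are the numbers quoted above, not measurements.

```rust
// Back-of-the-envelope throughput bounds implied by each hardware resource.
// Constants mirror the figures quoted in the text; the core count and the
// "two writes per transfer" assumption are illustrative, not measurements.

fn main() {
    // CPU: an evmone-style interpreter on a single core, times the cores available.
    let swaps_per_core_per_sec = 6_000u64; // >6k token swaps per core per second
    let cores = 16u64;                     // a modest modern server (assumption)
    println!("CPU bound:     ~{} swaps/s", swaps_per_core_per_sec * cores);

    // Disk: random-write IOPS of one NVMe SSD, two account writes per ETH transfer.
    let ssd_random_writes_per_sec = 300_000u64;
    let writes_per_transfer = 2u64;
    println!(
        "Disk bound:    ~{} ETH transfers/s",
        ssd_random_writes_per_sec / writes_per_transfer
    );

    // Network: a 100 Mbps link receiving 20-byte state updates, two updates per transfer.
    let link_bits_per_sec = 100_000_000u64;
    let bytes_per_update = 20u64;
    let updates_per_sec = link_bits_per_sec / 8 / bytes_per_update; // 625,000
    println!(
        "Network bound: ~{} updates/s (~{} ETH transfers/s)",
        updates_per_sec,
        updates_per_sec / 2
    );
}
```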

So, why is there such a big gap between the potential and the reality? The short answer is software inefficiency. The execution clients of existing chains simply cannot effectively utilize the underlying hardware.

At MegaETH, we are pushing the envelope to close this gap. How do we achieve it? Let us dive in!

Node Specialization

Every blockchain has two key components: consensus and execution. Consensus (also known as sequencing) is the process of collecting users’ transactions and giving them an ordering. Execution is the process of executing transactions in that order to compute the blockchain state, such as account balances and storage. Both steps are crucial to every blockchain, L1 and L2 alike.
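The split is easy to see in code. The sketch below separates the two stages; the Tx and State types and the sequence/execute functions are illustrative placeholders, not any chain’s actual interfaces.

```rust
// Minimal sketch of the two stages: sequencing fixes an order, execution
// applies that order to the state. All types here are placeholders.

use std::collections::HashMap;

type Address = [u8; 20];
type State = HashMap<Address, u64>; // account -> balance (grossly simplified)

struct Tx {
    from: Address,
    to: Address,
    value: u64,
}

/// Consensus / sequencing: collect pending transactions and fix an ordering.
fn sequence(mempool: Vec<Tx>) -> Vec<Tx> {
    // A real sequencer orders by fees, arrival time, etc.; arrival order here.
    mempool
}

/// Execution: apply the ordered transactions to compute the new state.
fn execute(state: &mut State, ordered: &[Tx]) {
    for tx in ordered {
        let from_balance = *state.get(&tx.from).unwrap_or(&0);
        if from_balance >= tx.value {
            state.insert(tx.from, from_balance - tx.value);
            *state.entry(tx.to).or_insert(0) += tx.value;
        }
    }
}

fn main() {
    let alice: Address = [1u8; 20];
    let bob: Address = [2u8; 20];

    let mut state = State::new();
    state.insert(alice, 100);

    let ordered = sequence(vec![Tx { from: alice, to: bob, value: 40 }]);
    execute(&mut state, &ordered);

    assert_eq!(state[&alice], 60);
    assert_eq!(state[&bob], 40);
}
```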

In an L1, every node serves the same role, so there is no specialization. Every node participates in a distributed protocol to reach consensus, and then executes every transaction locally. This is inherently inefficient since the same work is duplicated across all nodes. This also creates stragglers: some nodes will be slower than others because they may have worse hardware, internet, or luck; everyone else will have to slow down to wait for them, so the entire blockchain becomes slow.

Node specialization is the idea of concentrating the heavy lifting in consensus and/or execution onto a small set of nodes. Because fewer nodes perform the work, less work is duplicated, and the blockchain becomes more efficient. More importantly, we can now focus on this small set of nodes and optimize their hardware and software to the extreme. For example, almost all L2s concentrate consensus: one or a few sequencer nodes decide the ordering of transactions, and the rest of the network listens.

However, reducing consensus overhead does not immediately result in a significant speedup, as evidenced by existing L2s, since the overall performance is now limited by execution. MegaETH is the first blockchain that doubles down on node specialization and concentrates execution as well. In MegaETH, at any given time, there is one active sequencer that executes every transaction; the rest of the network listens for state deltas (diffs) from this sequencer. Importantly, this is done without impacting the liveness and security of the network.
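The sketch below illustrates what “listening for state deltas” looks like from a non-sequencer node’s point of view: it applies the changes the sequencer already computed instead of re-executing the block. The StateDelta and Replica types and the encoding are made up for illustration, not MegaETH’s actual wire format.

```rust
// Illustrative sketch of state-diff propagation: the sequencer executes the
// transactions and broadcasts only the resulting state changes; other nodes
// apply those diffs directly instead of re-executing. Placeholder encoding.

use std::collections::HashMap;

type Address = [u8; 20];
type Slot = [u8; 32];
type Value = [u8; 32];

/// One changed storage entry produced by the sequencer: (account, slot) -> new value.
struct StateDelta {
    account: Address,
    slot: Slot,
    new_value: Value,
}

/// A non-sequencer node's view of the state, kept in sync by applying deltas.
#[derive(Default)]
struct Replica {
    storage: HashMap<(Address, Slot), Value>,
}

impl Replica {
    /// Apply a block's worth of deltas received from the sequencer.
    fn apply(&mut self, deltas: &[StateDelta]) {
        for d in deltas {
            self.storage.insert((d.account, d.slot), d.new_value);
        }
    }
}

fn main() {
    // Deltas the sequencer would broadcast after executing a block (stubbed here).
    let deltas = vec![StateDelta {
        account: [0xaa; 20],
        slot: [0x01; 32],
        new_value: [0x2a; 32],
    }];

    let mut node = Replica::default();
    node.apply(&deltas);
    assert_eq!(node.storage[&([0xaa; 20], [0x01; 32])], [0x2a; 32]);
}
```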

Hyper-optimizing the EVM execution environment

When a centralized sequencer executes every transaction, we can focus on it and heavily optimize its hardware and software. We aim to build the fastest world computer that excels at one thing—executing EVM transactions verifiably. And here is how.

Efficient state root updates. The EVM differs from a generic virtual machine in that the world state is authenticated via a state trie such as the Merkle Patricia Trie (MPT). Given the state root and a short storage proof, a light client can inspect any data within the EVM storage without trusting the full node that serves the data. However, updating the MPT is by far the biggest bottleneck in block production due to the excessive random disk IO generated during tree traversal; this step takes up >90% of the time in Reth’s live-sync experiments. While several projects are independently optimizing the underlying databases to better serve the MPT, we ask ourselves: how can we design an optimal state trie from a clean slate to address the biggest bottleneck in block production? Our answer is a novel data structure that (1) incurs the absolute minimum of random disk IO during updates, (2) scales smoothly to terabytes of state, and (3) supports light clients very efficiently. Cheap state root updates are one of the most important innovations of MegaETH, because only now can we start to optimize the EVM meaningfully (the optimizations below would not matter if over 90% of the time were spent updating the state root).
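To see why trie traversal dominates, here is a rough cost model in Rust. Every number in it (account count, updates per block, per-read latency) is an illustrative assumption, and it deliberately ignores caching and shared path prefixes; it is a sketch of the problem, not a measurement of Reth or of our data structure.

```rust
// Rough cost model for naive MPT updates: each updated key touches roughly
// one trie node per level, and every node touch is a random database access.
// All numbers are illustrative assumptions, not measurements of any client.

fn main() {
    let accounts: f64 = 250_000_000.0;           // Ethereum-scale key count (assumption)
    let branching: f64 = 16.0;                   // hexary Merkle Patricia Trie
    let depth = accounts.log(branching).ceil();  // ~7 levels of trie nodes

    let updates_per_block: f64 = 10_000.0;       // a hypothetical busy block
    let random_ios = updates_per_block * depth;  // ignores caching and shared paths

    println!("trie depth           ≈ {depth}");
    println!("random IOs per block ≈ {random_ios}");
    // At ~100 µs per uncached SSD read, issuing these serially would cost
    // several seconds per block, which is why the state trie itself, not just
    // the database underneath it, is worth redesigning.
}
```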

In-memory computing. Fetching account and contract state from storage via the SLOAD opcode often slows down execution. If the target state is not in memory, the EVM has to stall and wait for the data to load from disk. Unfortunately, SSD latency is on the order of 100 µs, which is 10,000 times more expensive than most other opcodes. Unlike Solana, the EVM does not know beforehand which state a transaction needs to read, so it cannot prefetch that state to hide the disk latency. However, since MegaETH requires only one sequencer to execute every transaction, we can afford to run the sequencer on a beefy server whose RAM is large enough to hold the entire blockchain state. The latest generation of server CPUs supports 4 TB of RAM, and upcoming far-memory technologies like Compute Express Link (CXL) will expand that capacity by over 10x. In comparison, the Ethereum state is just a little over 100 GB. In other words, we can fit a blockchain state 40 times larger than Ethereum’s in a commodity server today and execute transactions without ever hitting the disk! This technique, known as in-memory computing, was pioneered by high-performance data-intensive applications in Web2. We are bringing it to Web3, enabling data-intensive dApps for the first time.
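A minimal sketch of what this means for the execution hot path, assuming the whole state is resident in RAM: SLOAD turns into a hash-map lookup on the order of 100 ns instead of a ~100 µs SSD round-trip. The InMemoryState type and its layout are illustrative, not the sequencer’s actual data structures; merkleization and persistence are handled outside this sketch.

```rust
// Minimal sketch of an in-memory state backend: with the full state resident
// in RAM, SLOAD is a hash-map lookup rather than a disk read. Illustrative
// types only; merkleization and persistence happen elsewhere.

use std::collections::HashMap;

type Address = [u8; 20];
type Slot = [u8; 32];
type Word = [u8; 32];

/// Entire world state pinned in the sequencer's RAM.
struct InMemoryState {
    storage: HashMap<(Address, Slot), Word>,
}

impl InMemoryState {
    /// SLOAD: a pure memory lookup, so execution never stalls on disk IO.
    fn sload(&self, account: Address, slot: Slot) -> Word {
        self.storage
            .get(&(account, slot))
            .copied()
            .unwrap_or([0u8; 32]) // untouched slots read as zero, as in the EVM
    }

    /// SSTORE: the update also stays in RAM on the hot path.
    fn sstore(&mut self, account: Address, slot: Slot, value: Word) {
        self.storage.insert((account, slot), value);
    }
}

fn main() {
    let mut state = InMemoryState { storage: HashMap::new() };
    state.sstore([0xaa; 20], [0x01; 32], [0x07; 32]);
    assert_eq!(state.sload([0xaa; 20], [0x01; 32]), [0x07; 32]);
    assert_eq!(state.sload([0xbb; 20], [0x02; 32]), [0u8; 32]);
}
```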

Compiling smart contracts. It’s no secret that performance is not the EVM’s strong suit. This is understandable because the VM was never the bottleneck; other factors, such as consensus, state accesses, and merkleization, had much larger impacts on a chain’s end-to-end performance. Recall that evmone can execute >6,000 token swaps per second with a single core, which is over 200x higher than Ethereum’s maximum TPS. However, as we make significant improvements to the other components, the compute efficiency of the EVM is becoming a limitation for MegaETH. To understand why the EVM is slow, let’s walk through what happens when evmone adds two integers. Evmone decodes the bytecode and jumps to the correct handler for the ADD operation; it pops two integers from the EVM stack and adjusts the stack pointer accordingly; it performs a 256-bit addition and pushes the result back onto the stack; finally, it deducts gas from the remaining balance and handles potential error conditions. So a simple ADD operation in the EVM translates into a long sequence of x86 instructions, full of expensive memory accesses and conditional jumps, while the real work is just adding two big integers! The high cost of interpreting opcodes has become the most critical bottleneck of sequential computation in the EVM. To tackle this problem, we plan to compile EVM bytecode to native machine code for execution, reducing the CPU instructions executed to the bare minimum. Our preliminary results suggest that some contracts may see speedups of up to 100x. We are excited about this direction because it will enable a new generation of compute-intensive dApps that are unthinkable today!
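The toy Rust sketch below makes the overhead visible: op_add mirrors the kind of bookkeeping described above (gas accounting, stack traffic, 256-bit math, error handling; dispatch is elided), while compiled_add shows what the same operation can boil down to on a hot path. The Vm, op_add, and compiled_add names are made up for illustration; this is not evmone’s implementation and not our compiler’s output.

```rust
// Toy interpreter handler for ADD versus the bare addition a compiler could
// emit. Illustrates the interpretation overhead only; not evmone's code.

type U256 = [u64; 4]; // 256-bit word as four 64-bit limbs, little-endian

struct Vm {
    stack: Vec<U256>,
    gas_left: i64,
}

/// 256-bit addition with carry propagation, wrapping modulo 2^256 like EVM ADD.
fn u256_add(a: U256, b: U256) -> U256 {
    let mut out = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry);
        out[i] = s2;
        carry = (c1 as u64) + (c2 as u64);
    }
    out
}

/// Interpreted ADD: every step here is extra work wrapped around one addition.
fn op_add(vm: &mut Vm) -> Result<(), &'static str> {
    vm.gas_left -= 3; // charge gas
    if vm.gas_left < 0 {
        return Err("out of gas"); // error handling
    }
    let a = vm.stack.pop().ok_or("stack underflow")?; // stack pops
    let b = vm.stack.pop().ok_or("stack underflow")?;
    vm.stack.push(u256_add(a, b)); // push the result back
    Ok(())
}

/// What compiled code can reduce the same operation to: just the math.
#[inline(always)]
fn compiled_add(a: U256, b: U256) -> U256 {
    u256_add(a, b)
}

fn main() {
    let mut vm = Vm { stack: vec![[1, 0, 0, 0], [2, 0, 0, 0]], gas_left: 100 };
    op_add(&mut vm).unwrap();
    assert_eq!(vm.stack.pop().unwrap(), compiled_add([1, 0, 0, 0], [2, 0, 0, 0]));
}
```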