The internal mechanism of Cairo 1.0: Exploring Sierra

Translator's Introduction#

Cairo is a Turing-complete ZK-friendly high-level language and the contract development language for Ethereum L2-Starknet, which is undergoing a revision upgrade. This article is the first in a series analyzing Cairo 1.0, where the author Mathieu discusses the design motivations and implementation principles of Sierra as an intermediate layer from the high-level language Cairo to Cairo assembly. The article mentions numerous issues present in Cairo 0, improvements in Cairo 1, and includes rich code details, recommending Cairo developers to read the full article for an in-depth understanding of Cairo 1.0.

Original link

TL;DR#

Sierra plays an important intermediary role between the high-level Cairo programming language and more primitive compilation targets (like CASM), ensuring that the generated CASM can run safely on Starknet. Its design is security-centric, using a set of functions to generate secure CASM code, combined with a powerful compiler and linear type system to prevent runtime errors, as well as a built-in Gas system to prevent infinite loops. In the following sections, we will focus on understanding the structure of Sierra programs, providing the basic requirements needed to read and understand Sierra programs.

Introduction#

I (Mathieu) recently attended two sessions at Starkware Sessions, namely Shahar Papini's “Enforcing Safety Using Typesystems” and Ori Ziv's “Not Stopping at the Halting Problem”. If you want to learn more about the Cairo stack, I highly recommend watching these videos. The following article is the first in a series where I will delve into Sierra to better understand Cairo, its mechanisms, and the entire Starknet.

Sierra (Safe Intermediate Representation) is an intermediate layer between the high-level language Cairo and compilation targets like Cairo Assembly (CASM). The language is designed to ensure safety and prevent runtime errors. It uses a compiler to detect operations that may fail at compile time, ensuring that every function returns and that there are no infinite loops. Sierra employs a simple yet powerful type system to express intermediate layer code while ensuring safety. This allows for efficient compilation into CASM.

Motivation#

In Cairo 0, developers would write Starknet contracts using Cairo, compile them into CASM, and directly deploy the compiled output to Starknet. Users interact with Starknet contracts by calling smart contract functions, signing transactions, and sending them to the sequencer. The sequencer runs the transactions to collect user transaction fees, and the prover (SHARP) generates ZK proofs for batches that include these transactions, with the sequencer collecting transaction fees that include the transactions.

Cairo 0 Transaction Flow

However, this Cairo 0 process presents several issues:

Only valid statements in Cairo can be proven, so failed transactions cannot be proven. Invalid statements, such as assert 0 = 1, cannot be proven as they translate into unsatisfiable polynomial constraints.
Transaction execution may fail, resulting in the transaction not being included in a block. In this case, the sequencer does unpaid work. Since failed transactions do not have valid proofs, they cannot be included, and there is no way to enforce the sequencer to charge fees.
The sequencer may be subjected to DDoS attacks, where attackers use invalid transactions to waste its efforts, and the sequencer cannot collect any fees for running these transactions.
It is impossible to distinguish between censorship (when the sequencer deliberately decides not to include certain transactions) and invalid transactions, as both types of transactions will not be included in blocks.

On Ethereum, all failed transactions are marked as reverted, but are still included in blocks, allowing validators to collect transaction fees upon failure. To prevent malicious users from bombarding the network with invalid transactions and overwhelming the sequencer, thus preventing legitimate transactions from being processed, Starknet needs a similar system that allows the sequencer to charge fees for failed transactions. To address the above issues, the Starknet network needs to achieve two goals: integrity and validity. Integrity ensures that transaction execution can always be proven, even if it is expected to fail. Validity ensures that valid transactions are not rejected, thus preventing censorship.

Sierra is constructed correctly (Correct-by-construction), allowing the sequencer to charge for all transactions. We can deploy branching code (like if/else) instead of potentially failing code (like asserts). The asserts in Cairo 1 are translated into branching Sierra code, allowing errors to propagate back to the original entry point that returns a boolean value, indicating whether the transaction succeeded or failed. If the entry point returns true/false, the Starknet operating system can determine whether the transaction is valid and decide whether to apply state updates if the transaction succeeds.

Cairo 1 offers syntax similar to Rust and creates a provable, developer-friendly programming language by abstracting Sierra's safety constructs. It compiles to Sierra, which is a correct-by-construction intermediate representation of Cairo code that contains no failure semantics. This ensures that no Sierra code will fail, and it ultimately compiles to a safe subset of CASM. Developers will compile their Cairo 1 code to Sierra and deploy Sierra programs to Starknet, rather than deploying CASM code to Starknet. When declaring transactions, the sequencer will be responsible for compiling Sierra code to CASM to ensure that failing code cannot be deployed on Starknet.

Correct-by-construction#

To design a language that does not fail, we must first identify the unsafe operations in Cairo 0. These include:

Illegal memory address references; attempting to access unallocated memory units
Assertions, as they may fail and cannot recover
Multiple writes to the same memory address due to Cairo's single-write memory model
Infinite loops, making it impossible to determine whether the program will exit

Ensuring Dereferencing Does Not Fail#

In Cairo 0, developers can write the following code, attempting to access the contents of an unallocated memory unit.

let (ptr:felt*) = alloc();

tempvar x = [ptr];

Sierra's type system prevents common pointer-related errors by enforcing strict ownership rules and utilizing smart pointers like Box, allowing invalid pointer dereferencing to be detected and prevented at compile time. The Box<T> type serves as a pointer to valid and initialized pointer instances and provides two functions for instantiation and dereferencing: box_new() and box_deref(). By using the type system to catch dereferencing errors at compile time, CASM compiled from Sierra avoids invalid pointer dereferencing.

Ensuring No Memory Unit Is Written to Multiple Times#

In Cairo 0, users will use arrays like this:

let (array:felt*) = alloc();

assert array[0] = 1;

assert array[1] = 2;

assert array[1] = 3; // fails

However, attempting to write to the same array index twice will result in a runtime error, as memory units can only be written once. To avoid this issue, Sierra introduces an Array<T> type and a function array_append<T>(Array<T>, value:T) -> Array<T>. This function takes an array instance and a value to append, returning an updated array instance pointing to the new next free memory unit. Thus, values are appended to the end of the array in order, without worrying about conflicts due to already written memory units.

To ensure that previously used array instances that have been appended are not reused, Sierra uses a linear type system to ensure that objects are used only once. Therefore, any Array instance that has already been appended cannot be reused in another array_append call.

The following code shows a snippet of a Sierra program that creates a felt array and uses the array_append library function to append the value 1 twice. In the code, the first array_append call uses the array variable with id [0] as input and returns a variable with id [2] representing the updated array. This variable is then used as the input parameter for the next array_append call. It is important to note that once the variable with id [0] is used by the library function, it cannot be reused; attempting to use the variable with id [0] as an input parameter for array_append will result in a compilation error.

array_new<felt>() -> ([0]);

felt_const<1>() -> ([1]);

store_temp<felt>([1]) -> ([1]);

array_append<felt>([0], [1]) -> ([2]);

felt_const<1>() -> ([4]);

store_temp<felt>([4]) -> ([4]);

array_append<felt>([2], [4]) -> ([5]);

For objects that can be reused multiple times, like felts, Sierra provides the dup<T>(T) -> (T,T) function, which returns two instances of the same object that can be used for different operations. This function is only applicable to safely copyable types, which typically do not include arrays or dictionaries.

Non-failure Assertions#

Assertions are typically used to evaluate the result of a boolean expression at a specific point in the code. If the evaluation does not match, an error is raised. Unlike in Cairo 0, the compilation of assertion instructions in Cairo 1 will generate branching Sierra code. If the assertion is not satisfied, this code will terminate the current function execution early and continue executing the next instruction.

Ensuring the Soundness of Programs Using Dictionaries#

Dictionaries, like arrays, face the issue of multiple value additions, which can be addressed by introducing a special Dict<K,V> type and a set of utility functions to instantiate, retrieve, and set values in the dictionary. However, dictionaries have a soundness issue. Each Dict must call the dict_squash(Dict<K,V>) -> () function at the end of the program to compress it, verifying the consistency of the key update sequence. Uncompressed dictionaries are dangerous because malicious provers can prove the correctness of inconsistent updates.

As we have seen before, the linear type system enforces that objects can only be used once. The only way to "use" a Dict is to call the dict_squash function, which takes a dictionary instance and does not return anything. This means that when compiling Sierra code to CASM, uncompressed dictionaries will be detected and will raise an error at compile time. For other types that do not require one-time use, typically types that do not include Dict, Sierra introduces the drop<T>(T)->() function, which uses an instance of the object and does not return anything.

It is worth noting that both drop and dup do not produce any CASM code. They only provide type safety at the Sierra level, ensuring that variables are used only once.

Preventing Infinite Loops#

Determining whether a program will eventually stop or run forever is a fundamental problem in computer science known as the halting problem, which is generally unsolvable. In a distributed environment like Starknet, where users can deploy and run arbitrary code, it is important to prevent users from running infinite loop code, such as the following Cairo code.

fn foo() { foo() }

Since recursive functions can lead to infinite loops if their stopping condition is never met, the Cairo-to-Sierra compiler will inject the withdraw_gas method at the beginning of recursive functions. As this functionality has not yet been implemented, developers still need to call withdraw_gas within recursive functions and handle the results themselves, although it should be included in future versions of the compiler.

The withdraw_gas function will deduct the amount of Gas required to run the function from the total available Gas for the transaction by calculating the running cost of each instruction in the function. The cost is analyzed by determining how many steps each operation requires, with the steps for most operations being known at compile time. During the execution of the Cairo program, if the withdraw_gas call returns null or a negative value, the current function execution will stop, and all pending variables will be consumed by calling dict_squash on uncompressed dictionaries and drop on other variables, and will be considered a failure of execution. Since all transactions on Starknet have a finite amount of available Gas to execute the transaction, this avoids infinite loops and ensures that there is still enough Gas available to delete variables and stop execution, allowing the sequencer to charge fees for transaction failures.

Achieving Safe CASM Through a Limited Set of Instructions#

The main goal of Sierra is to ensure that the generated CASM code does not fail. To achieve this, Sierra programs consist of statements that call libfuncs. These are a set of built-in library functions, and the CASM code generated for these functions is guaranteed to be safe. For example, the CASM code generated for the array_append library function is safe for appending values to an array.

This approach to achieving code safety by only allowing a set of safe and trusted library functions is similar to the philosophy of the Rust programming language. By providing a set of safe and trusted abstractions, both languages help avoid common programming errors and increase the safety and reliability of the code. Cairo 1 employs a similar ownership and borrowing system to Rust, providing developers with a way to infer code safety at compile time, which helps prevent errors and improve overall code quality.

Disclaimer#

This article aims to provide general information and understanding to readers and does not represent Nethermind's endorsement of any specific asset, project, or team, nor does it guarantee their safety. Nethermind makes no representations or warranties, express or implied, regarding the accuracy or completeness of the information or opinions contained in this article. No third party should rely on this article in any way, including but not limited to financial, investment, tax, regulatory, legal, or other advice, or interpret this article as any form of advice. Please note that while Nethermind provides services to Starkware, this article is not part of those services.

About Us#

Nethermind is a team of world-class builders and researchers. We provide global enterprises and developers with the ability to access and build the decentralized web. Our work touches every part of the Web3 ecosystem, from Nethermind nodes to foundational cryptographic research and infrastructure for the Starknet ecosystem. Discover our Starknet toolkit: Solidity to Cairo Compiler Warp, StarkNet Block Explorer Voyager, Open-source Formal Verification Tool Horus for StarkNet Smart Contracts, StarkNet Client Implementation Juno, and Cairo Smart Contract Security Audit Services. If you are interested in solving some of the toughest problems in blockchain, visit our job openings!