The goal of this post is to illustrate the need for secure operations processes in the web3 space, and showcase some of the problems that we have identified by interviewing and working with hundreds of teams throughout the development of OpenZeppelin Defender.
Security in web3 development has improved remarkably over the past few years. Even though we still see frequent hacks due to coding errors, development security practices have evolved significantly across web3 teams. Full test coverage, static analysis tools, fuzzing, and mandatory peer reviews are now requirements for most teams, while audits and bug bounties add an extra layer of security.
However, security in web3 operations is far from this level of maturity. Few projects have dedicated ops teams, and there is a general lack of awareness of best practices for operations in the web3 space. Security does not end when the code is frozen and audited: deployments, upgrades, integration testing, administration, automation, and monitoring also need to be bulletproof to prevent hacks.
A glance at the Rekt leaderboard is evidence enough: many of the top incidents were caused not by development mishaps but by operational oversights, and several of them could have been mitigated by proper monitoring and incident response practices. Examples include compromised frontends, hacks that go unnoticed for days, incorrect initializations, compromised keys, and more. In more recent news, a missing initialization in Arbitrum Nova led to a 400 ETH bounty payout, while Wintermute was hacked for $160M due to a compromised private key.
At OpenZeppelin, we first started looking into this issue in early 2020, when we set out to interview many projects in the space to better understand their security needs. While training their Solidity developers in security was top of mind for most teams, we noted that operational concerns were often relegated to the background:
“We write a one-off script whenever we need to upgrade”
“Transactions are sent from a private key stored in a file on a shared server”
“We’d want to use a multisig or timelock eventually but tooling is lacking”
“I don’t know what to monitor”
“The lead developer just deploys the contracts from his working copy”
“I open etherscan every morning to see if everything looks fine”
This research led to the development of OpenZeppelin Defender, which is now in widespread use across the space for securing web3 operations. But the goal of this text is not to shill a platform; it is to share the lessons collected along the way and the pain points we found in the space. We'll focus on five operational areas: Integration, Deployment, Administration, Transaction Automation, and Monitoring.
The DeFi Cambrian explosion of a few years back was made possible by a highly interconnected ecosystem, with a synergy only attainable by building protocols on top of one another and leveraging the available primitives to build more powerful instruments.
However, testing in such an interconnected environment is challenging: how do you test your protocol when it depends on so many live systems and moving pieces? For example, Compound's recent cETH price feed incident was triggered by perfectly valid code, audited by three different companies, that interacted with a token implementation that had not been considered.
Testnet deployments are far from representative of an actual production setup. Few protocols deploy their systems to both testnet and mainnet, and even fewer bother to deploy exactly the same versions to both. Testnet fragmentation doesn't help either, though the upcoming deprecation of Rinkeby, Kovan, and Ropsten should improve things. Furthermore, even if the code is the same on mainnet and testnet, the state of the system differs widely due to real production usage.
Mainnet forks help a lot here, but it is still difficult to test how a system would behave in hypothetical scenarios that differ from the current mainnet state. To take an extreme case, hardly any team has tested how their protocol would behave should MakerDAO hit the emergency shutdown button on DAI.
Another issue is that automated integration tests on mainnet forks typically only focus on contracts. Nowadays, a web3 system is rarely composed of just smart contracts: it integrates multiple off-chain components as well, such as oracles, bots, and indexing engines, not to mention the frontends that users depend on.
These issues call for new solutions in integration testing. First and foremost, live staging environments that are as representative as possible (both on and off chain) of the actual mainnet environment are a must for manual testing and for ensuring that all components work as expected. Long-lived mainnet forks are a good place to start.
Reliable deployment packages for popular protocols can provide the means to set up environments with hypothetical scenarios. Imagine being able to spin up a version of Maker going through a Black Thursday scenario on any network with a few commands, so you can test how your system reacts to it. While this requires a massive community effort to cover the many protocols and the many possible scenarios, it is worth the investment. This endeavor also requires a declarative deployment standard to build on.
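A declarative standard would let tooling derive how to deploy a system from a description of it, rather than from a hand-written script. As a minimal sketch, assume a hypothetical spec format in which each contract lists the contracts whose addresses it needs at construction time; Python's standard `graphlib` module can then compute a valid deployment order and reject circular dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical declarative spec: each contract lists the contracts whose
# addresses it needs at construction time.
DEPLOYMENT_SPEC: Dict[str, List[str]] = {
    "Token": [],
    "PriceOracle": [],
    "Vault": ["Token", "PriceOracle"],
    "Liquidator": ["Vault", "PriceOracle"],
}


def deployment_order(spec: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every contract comes after its dependencies.

    TopologicalSorter takes a mapping of node -> predecessors and raises
    graphlib.CycleError if the spec contains a circular dependency.
    """
    return list(TopologicalSorter(spec).static_order())


print(deployment_order(DEPLOYMENT_SPEC))
```

Contracts with no dependency between them (here `Token` and `PriceOracle`) may come out in either order. A real standard would also need to capture constructor arguments, initializers, and per-network configuration.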
Last, blockchain determinism and the public transaction record give us a unique opportunity to test new implementations without even having to put them in production (the opposite of the "test in prod" meme). As an example from traditional development, GitHub tests new code paths by executing them against live production requests and comparing their results with the legacy code until they are satisfied with the outcome. Today, we have all the building blocks needed to replay live transactions against a new version of a system before upgrading, without even having to deploy it to mainnet.
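The comparison step of such a replay can be sketched as follows, assuming each historical input has already been captured together with the output the live system produced. In practice the candidate would execute on a fork pinned to the state before each transaction; here it is abstracted as a plain Python function, and the fee formulas and transaction IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class RecordedCall:
    """A historical call: its input and the output the live system produced."""
    tx_id: str
    amount: int
    observed_fee: int


def replay(
    calls: Iterable[RecordedCall], candidate: Callable[[int], int]
) -> List[Tuple[str, int, int]]:
    """Run every recorded input through the candidate implementation and
    return (tx_id, observed, candidate_result) for each divergence."""
    mismatches = []
    for call in calls:
        result = candidate(call.amount)
        if result != call.observed_fee:
            mismatches.append((call.tx_id, call.observed_fee, result))
    return mismatches


# Hypothetical history recorded from the live fee logic: 30 bps, rounded down.
history = [
    RecordedCall("tx-1", 1_000, 3),
    RecordedCall("tx-2", 50, 0),
    RecordedCall("tx-3", 123_456, 370),
]


def new_fee(amount: int) -> int:
    # Candidate implementation that (perhaps unintentionally) rounds up.
    return (amount * 3 + 999) // 1000


print(replay(history, new_fee))  # [('tx-2', 0, 1), ('tx-3', 370, 371)]
```

Both divergences come from a rounding change. Whether that is a bug or an intended fix is a judgment call, but the replay surfaces it before the upgrade ships rather than after.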
While most web2 systems are deployed from CI/CD pipelines, contract deployment is still very much a manual process. It usually falls on a developer to check out the code, compile it, write a Hardhat deployment script, deploy from their workstation, and then verify the source code on Etherscan or Sourcify.
This has several downsides. Besides the hassle for the developer, who also needs to keep a funded private key handy to pay for deployment gas, the main problem is a lack of traceability: the process for going from the vetted source code to a live instance is opaque. This can lead to deployed code that does not match the audited version, inadvertently introducing a vulnerability. And, in terms of transparency, an audit report that refers to a git commit offers little information to a user who is interacting with an address on MetaMask.
Lack of reproducibility is also a problem as more protocols go multi-chain. Ideally you want to keep the same version of the code running on all the chains where you operate, but to do this, you need to be able to reliably re-deploy exactly the same version. This means starting from the same version of the source code, building with the same settings and compiler version, and deploying with the same configuration.
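One way to make "the same version" checkable is to fingerprint every input that determines the compiled output. The sketch below assumes those inputs are the source files, the compiler version, and the compiler settings (roughly what solc's standard JSON input captures, plus the compiler version); the function name and the use of SHA-256 rather than keccak256 are illustrative choices.

```python
import hashlib
import json
from typing import Dict


def build_fingerprint(sources: Dict[str, str], compiler_version: str, settings: dict) -> str:
    """Hash every input that determines the compiled bytecode into one digest."""
    payload = {
        "compiler": compiler_version,
        # Hash each file so the payload stays small but changes with any edit.
        "sources": {
            path: hashlib.sha256(text.encode()).hexdigest() for path, text in sources.items()
        },
        "settings": settings,
    }
    # sort_keys and fixed separators give a canonical encoding: the same inputs
    # produce the same digest regardless of dictionary ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


sources = {"contracts/Vault.sol": "pragma solidity 0.8.17;\ncontract Vault {}\n"}
settings = {"optimizer": {"enabled": True, "runs": 200}, "evmVersion": "london"}

first_chain = build_fingerprint(sources, "0.8.17", settings)
# Same settings, different key order: same fingerprint.
second_chain = build_fingerprint(
    sources, "0.8.17", {"evmVersion": "london", "optimizer": {"runs": 200, "enabled": True}}
)
# Different optimizer runs: different fingerprint.
tweaked = build_fingerprint(
    sources, "0.8.17", {"optimizer": {"enabled": True, "runs": 1000}, "evmVersion": "london"}
)

print(first_chain == second_chain)  # True
print(first_chain == tweaked)  # False
```

Recording this digest next to each deployment lets a multi-chain team check that every chain runs a build from identical inputs; constructor arguments and other deployment configuration could be folded into the payload the same way.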
The low-hanging fruit here is to move deployments to a transparent CI/CD pipeline, where anyone can follow the automated process from the source code to compiled bytecode, and from there to the deployed instance. Online tools can also facilitate auditability by verifying that the code at an address matches a specific artifact. The end goal is to have a reproducible process and auditable trail from a git commit to a deployed instance.
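Part of that verification can be sketched in a few lines. Solidity appends a CBOR-encoded metadata blob to the runtime bytecode, and the last two bytes give that blob's length, so two builds of the same program can be compared with the metadata stripped (Sourcify's "partial match" rests on the same idea, while a "full match" also requires the metadata to agree). This is a simplified sketch with hypothetical helper names: it assumes the contract has no immutable variables or linked libraries, both of which make deployed bytecode differ from the compiler artifact.

```python
def strip_metadata(runtime_code: bytes) -> bytes:
    """Drop the CBOR metadata trailer that solc appends to runtime bytecode.

    The final two bytes encode, big-endian, the length of the CBOR blob
    immediately before them.
    """
    if len(runtime_code) < 2:
        return runtime_code
    cbor_length = int.from_bytes(runtime_code[-2:], "big")
    if cbor_length + 2 > len(runtime_code):
        # Not a trailer we recognize; compare the code unmodified.
        return runtime_code
    return runtime_code[: -(cbor_length + 2)]


def matches_artifact(onchain_hex: str, artifact_hex: str) -> bool:
    """Compare deployed code with a compiled artifact, ignoring metadata.

    Assumes no immutable variables or linked libraries.
    """
    onchain = bytes.fromhex(onchain_hex.removeprefix("0x"))
    artifact = bytes.fromhex(artifact_hex.removeprefix("0x"))
    return strip_metadata(onchain) == strip_metadata(artifact)


# Toy bytecode: same program body, different (fake) 4-byte metadata blobs.
body = bytes.fromhex("6080604052")
deployed = body + bytes.fromhex("a1010203") + (4).to_bytes(2, "big")
compiled = body + bytes.fromhex("a1090909") + (4).to_bytes(2, "big")
print(matches_artifact(deployed.hex(), "0x" + compiled.hex()))  # True
```

Ignoring metadata also ignores differences in comments or file names, which is why a full match is the stronger guarantee when it is available.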
With very few exceptions, smart contract systems keep a set of administrative functions that allow a governing body to tweak how the protocol operates. These range from adjusting a parameter to a full-blown code upgrade. Treasury management can also be considered part of a protocol's administration. Given how critical these functions are, it is imperative that they are controlled by secure and trusted entities.
The state of the art in governance has improved remarkably over the last few years. Not long ago, it was common to see protocols with millions of dollars managed by a single private key sitting in the MetaMask wallet of the project's lead developer.
Fortunately, multi-signature wallets are now commonplace, with Gnosis Safe being the most popular option. However, multisig contract implementations are only one part of the picture: good tooling for creating and reviewing transactions is needed to guarantee that operations run through these contracts are secure.
Tools for creating transactions need to account not just for the multisig, but also for any other governance and security mechanisms in place, such as timelocks or vote-based governance contracts. Governance contracts decentralize decision-making in a community, though many projects still restrict who can create new proposals, keeping that power behind a multisig. Timelocks provide two benefits at once: they give the project's team time to react if their setup is compromised, and they protect the community against rugpulls by giving users time to exit the protocol if a malicious proposal is put forward. While a multisig transaction builder is good enough for creating transactions sent directly to the application, more sophisticated tools are needed as these building blocks are combined.
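The timelock guarantee is simple enough to model in a few lines. The sketch below is loosely based on the schedule, cancel, and execute flow of on-chain timelocks such as OpenZeppelin's `TimelockController`, but it is a toy Python model with hypothetical names, not that contract's interface; timestamps are plain integers in seconds.

```python
from dataclasses import dataclass, field
from typing import Dict


class TimelockError(Exception):
    pass


@dataclass
class Timelock:
    """Toy timelock: a scheduled operation can run only after its delay elapses."""

    min_delay: int  # seconds
    ready_at: Dict[str, int] = field(default_factory=dict)

    def schedule(self, op_id: str, delay: int, now: int) -> None:
        if delay < self.min_delay:
            raise TimelockError("delay below minimum")
        if op_id in self.ready_at:
            raise TimelockError("operation already scheduled")
        self.ready_at[op_id] = now + delay

    def cancel(self, op_id: str) -> None:
        # During the delay, a compromised or malicious proposal can be dropped.
        self.ready_at.pop(op_id, None)

    def execute(self, op_id: str, now: int) -> None:
        if op_id not in self.ready_at:
            raise TimelockError("operation not scheduled")
        if now < self.ready_at[op_id]:
            raise TimelockError("operation not ready")
        # A real timelock would perform the call here; the model just consumes the entry.
        del self.ready_at[op_id]


TWO_DAYS = 2 * 24 * 3600
timelock = Timelock(min_delay=TWO_DAYS)
timelock.schedule("upgrade-vault-v2", TWO_DAYS, now=0)
try:
    timelock.execute("upgrade-vault-v2", now=TWO_DAYS // 2)
except TimelockError as err:
    print(err)  # operation not ready
timelock.execute("upgrade-vault-v2", now=TWO_DAYS)  # allowed once the delay has passed
```

The two-day window is what gives both the team and the users their reaction time; in a combined setup, the multisig or governance contract would be the only account allowed to call `schedule`.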
Reviewing administrative proposals is particularly challenging from a UX perspective. Many project stakeholders are not developers, so they cannot run scripts to decode or analyze a transaction before approving it. All too often, an N-of-M multisig effectively becomes 1-of-M: the lead developer pushes a proposal and asks the other M-1 signers to approve it, and they blindly hit the approve button.
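Even a small decoding step helps signers see what a transaction does instead of a hex blob. The sketch below decodes calldata for a hypothetical allowlist of functions with static arguments only, using the well-known 4-byte selectors for ERC-20 `transfer` and `approve` and for `upgradeTo(address)`. It prints addresses in lowercase without EIP-55 checksums and flags anything not on the list rather than guessing; a production decoder would use full ABI decoding from a library such as ethers or web3.py.

```python
from typing import Union

# Well-known 4-byte selectors (first 4 bytes of keccak256 of the signature).
# Hypothetical allowlist: only functions whose arguments are all static types.
KNOWN_FUNCTIONS = {
    "a9059cbb": ("transfer(address,uint256)", ["address", "uint256"]),
    "095ea7b3": ("approve(address,uint256)", ["address", "uint256"]),
    "3659cfe6": ("upgradeTo(address)", ["address"]),
}


def decode_word(word: bytes, abi_type: str) -> Union[str, int]:
    """Decode one 32-byte ABI word of a static type."""
    if abi_type == "address":
        return "0x" + word[12:].hex()  # address = low 20 bytes, lowercase
    if abi_type == "uint256":
        return int.from_bytes(word, "big")
    raise ValueError(f"unsupported type: {abi_type}")


def describe(calldata_hex: str) -> str:
    data = bytes.fromhex(calldata_hex.removeprefix("0x"))
    selector, args = data[:4].hex(), data[4:]
    if selector not in KNOWN_FUNCTIONS:
        return f"UNKNOWN function 0x{selector}: do not sign without further review"
    signature, types = KNOWN_FUNCTIONS[selector]
    if len(args) != 32 * len(types):
        return f"MALFORMED call to {signature}"
    values = [decode_word(args[32 * i : 32 * (i + 1)], t) for i, t in enumerate(types)]
    name = signature.split("(")[0]
    return f"{name}({', '.join(str(v) for v in values)})"


recipient = "11" * 20
calldata = "0xa9059cbb" + "00" * 12 + recipient + (10**18).to_bytes(32, "big").hex()
print(describe(calldata))
```

For this input, `describe` returns the function name, the recipient address, and the raw amount in base units; showing it in token units would need the token's decimals, which the calldata does not carry.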
Simulation is the best resource we have for analyzing the outcome of a transaction, but the simulation tools we use as developers may not cut it for less tech-savvy users. There is a need for simulation capabilities that are accessible and easy to understand for everyone, so signers can verify for themselves instead of trusting.
Contract upgrades deserve a separate paragraph. Approving an upgrade as a multisig signer means agreeing to a potentially massive change with only the 40 hex characters of the new implementation's address as context. And while having the source code verified on Etherscan is a requirement, linking that code back to a verified or audited git commit is cumbersome (if not impossible) for most users. Easy-to-audit deployments, as described in the previous section, are a requirement for secure upgrade operations. It also helps to have a strong suite of integration tests for the new implementation that signers can review before approving.
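Given a trustworthy release record, part of an upgrade review can be automated. The sketch assumes the proxy follows EIP-1967, so the current implementation address sits in the low 20 bytes of the storage word at the standard implementation slot (which review tooling could read with the `eth_getStorageAt` RPC method). It also assumes a hypothetical release manifest, produced by the deployment pipeline, that maps implementation addresses to audited commits; the manifest entries below are placeholders.

```python
from typing import Dict

# EIP-1967 implementation slot:
# bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1)
IMPLEMENTATION_SLOT = "0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc"

# Hypothetical manifest emitted by the deployment pipeline (placeholder values).
RELEASE_MANIFEST: Dict[str, str] = {
    "0x" + "11" * 20: "v1.0.0 @ commit 1a2b3c4 (audited)",
    "0x" + "22" * 20: "v1.1.0 @ commit 5d6e7f8 (audited)",
}


def implementation_from_slot(storage_word_hex: str) -> str:
    """An address occupies the low 20 bytes of a 32-byte storage word."""
    word = bytes.fromhex(storage_word_hex.removeprefix("0x")).rjust(32, b"\x00")
    return "0x" + word[-20:].hex()


def review_upgrade(current_slot_word: str, proposed_impl: str) -> str:
    current = implementation_from_slot(current_slot_word)
    proposed = proposed_impl.lower()
    if proposed not in RELEASE_MANIFEST:
        return f"REJECT: {proposed} does not match any audited release"
    before = RELEASE_MANIFEST.get(current, "unknown build")
    return f"UPGRADE {current} [{before}] -> {proposed} [{RELEASE_MANIFEST[proposed]}]"


# Value a tool would read with eth_getStorageAt(proxy, IMPLEMENTATION_SLOT).
slot_word = "0x" + "00" * 12 + "11" * 20
print(review_upgrade(slot_word, "0x" + "22" * 20))
print(review_upgrade(slot_word, "0x" + "33" * 20))
```

The check is only as strong as the manifest, which is why it depends on the reproducible, auditable deployment process discussed above.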
Any automated script that submits transactions to the network needs access to a funded hot wallet. Gasless transactions are one of the most popular use cases here, and compromising the key of a meta-transaction relayer would mean losing all the funds allocated to paying gas fees. However, there are scenarios where these off-chain scripts need to manage keys with sensitive privileges, such as updating an oracle or managing a bridge. A compromised private key in this scenario can have dire consequences, as was the case with the Harmony bridge hack for over $100M.
This makes private keys more sensitive than your average application secret, so storing them in plaintext in a .env file on a server where half the dev team has SSH access is not a good idea. It is key (pun intended) to leverage key management solutions that keep the key safely stored. There are managed cloud-based options such as GCP KMS or AWS KMS, self-managed ones like HashiCorp Vault, and even hardware security modules (HSMs) for hosting your keys.
Most of these solutions also expose cryptographic operations in their API, so you can send payloads to be signed by the KMS while the private key never leaves the secure vault. Keeping keys in a vault also lets you maintain an audit trail of every request to use them, so you can easily detect or trace back any unauthorized usage.
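The interface this implies is narrow: callers submit payloads and get signatures back, and every request leaves a record. The toy sketch below illustrates only that shape. Its class and caller names are hypothetical, the key is a plain attribute (a real KMS or HSM keeps it in hardware), and HMAC-SHA256 stands in for the secp256k1 ECDSA signatures an Ethereum key would actually produce.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class ToyVault:
    """Toy stand-in for a KMS: callers get signatures, never the key.

    Simplifications: the key is an ordinary attribute here (a real KMS or HSM
    keeps it in hardware), and HMAC-SHA256 replaces secp256k1 ECDSA signing.
    """

    allowed_callers: FrozenSet[str]
    # (timestamp, caller, sha256 of payload, allowed?) for every request.
    audit_log: List[Tuple[float, str, str, bool]] = field(default_factory=list)
    _key: bytes = field(default_factory=lambda: secrets.token_bytes(32), repr=False)

    def sign(self, payload: bytes, caller: str) -> bytes:
        allowed = caller in self.allowed_callers
        # Record the request before enforcing the permission check,
        # so rejected attempts leave a trace too.
        self.audit_log.append((time.time(), caller, hashlib.sha256(payload).hexdigest(), allowed))
        if not allowed:
            raise PermissionError(f"{caller} is not allowed to use this key")
        return hmac.new(self._key, payload, hashlib.sha256).digest()


vault = ToyVault(allowed_callers=frozenset({"relayer-bot"}))
signature = vault.sign(b"hypothetical tx payload", caller="relayer-bot")
try:
    vault.sign(b"drain everything", caller="ssh-user")
except PermissionError as err:
    print(err)
for entry in vault.audit_log:
    print(entry[1:])
```

The rejected `ssh-user` request shows up in the log alongside the legitimate one, which is the property that makes unauthorized usage traceable.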
Reliability is also a concern in transaction automation. On a blockchain, it is not enough to send a transaction, since there is no guarantee that it will get mined. Any automation infrastructure needs to monitor all in-flight transactions so it can reprice and resubmit them if network congestion increases, and, because transactions from an account are processed in nonce order, ensure that a single underpriced transaction does not block every subsequent one from the same sender. There are multiple managed solutions for tackling this problem, though you can also write your own.
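Repricing usually means sending a replacement transaction with the same nonce and higher fees. The sketch below assumes EIP-1559 fee fields and geth's default replacement rule, under which both the fee cap and the priority fee must rise by at least 10% (other clients and configurations may use different thresholds). The `2 * base_fee + tip` fee cap is a common heuristic rather than a protocol rule, and a production relayer would also enforce an upper bound.

```python
from dataclasses import dataclass

GWEI = 10**9
MIN_BUMP_PERCENT = 10  # geth's default replacement threshold; other clients may differ


@dataclass(frozen=True)
class Fees:
    max_fee_per_gas: int  # wei
    max_priority_fee_per_gas: int  # wei


def bumped(value: int, percent: int = MIN_BUMP_PERCENT) -> int:
    """Smallest integer at least `percent` % above `value` (integer ceiling)."""
    return (value * (100 + percent) + 99) // 100


def replacement_fees(stuck: Fees, base_fee: int, suggested_tip: int) -> Fees:
    """Fees for a transaction that reuses the stuck transaction's nonce."""
    # Both fields must clear the bump threshold for the node to accept the replacement.
    tip = max(bumped(stuck.max_priority_fee_per_gas), suggested_tip)
    # Heuristic fee cap: room for the base fee to double, plus the tip.
    max_fee = max(bumped(stuck.max_fee_per_gas), 2 * base_fee + tip)
    return Fees(max_fee_per_gas=max_fee, max_priority_fee_per_gas=tip)


stuck = Fees(max_fee_per_gas=30 * GWEI, max_priority_fee_per_gas=1 * GWEI)
# Congestion pushed the base fee to 40 gwei since the transaction was sent.
print(replacement_fees(stuck, base_fee=40 * GWEI, suggested_tip=3 * GWEI // 2))
# Fees(max_fee_per_gas=81500000000, max_priority_fee_per_gas=1500000000)
```

Reusing the nonce is what unclogs the sender: once the replacement is mined, the original can never be, and later transactions can proceed.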
Monitoring and alerting are commonplace in any web2 application, but their adoption in the web3 space started only recently, with platforms like Defender, Forta, or Tenderly. For an embarrassingly long time, the most widely adopted approach to security alerts was samczsun's trademarked "u up?". Such was the state of monitoring that the largest crypto hack ever ($624M) went unnoticed for almost a week.
Reacting quickly to a security event matters not just for staying on top of the incident and keeping your community reassured, but also because many hacks don't consist of a single action. Attacks usually require multiple transactions for preparation and execution, spread over time. As an example, BadgerDAO's hack lasted 10 hours until all user wallets were drained.
Receiving an early notice of an attack can give you the time to pause the protocol and stop the bleeding. Circuit breakers are a particularly interesting building block here: scripts triggered by security monitoring alerts that rely on transaction automation for pausing the system when a threat is detected.
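The detection half of a circuit breaker can be as simple as a sliding-window threshold. In the sketch below, the limit, the window, and the `pause` hook are hypothetical; in a real setup the hook would submit a transaction to a pausable contract through the transaction automation infrastructure described earlier, ideally signed by a key that holds only the pauser role.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class OutflowCircuitBreaker:
    """Calls `pause` once when outflows within a sliding window exceed `limit`.

    Assumes outflows are observed in timestamp order.
    """

    def __init__(self, limit: int, window_seconds: int, pause: Callable[[], None]) -> None:
        self.limit = limit
        self.window = window_seconds
        self.pause = pause  # hypothetical hook, e.g. sends pause() via a relayer
        self.events: Deque[Tuple[int, int]] = deque()  # (timestamp, amount)
        self.total = 0
        self.tripped = False

    def observe(self, timestamp: int, amount: int) -> None:
        """Record one outflow (e.g. from a monitoring alert) and check the limit."""
        self.events.append((timestamp, amount))
        self.total += amount
        # Forget outflows that are older than the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_amount = self.events.popleft()
            self.total -= old_amount
        if self.total > self.limit and not self.tripped:
            self.tripped = True
            self.pause()


pause_calls = []
breaker = OutflowCircuitBreaker(limit=100, window_seconds=60, pause=lambda: pause_calls.append("pause"))
for ts, amount in [(0, 60), (30, 30), (70, 50), (80, 30)]:
    breaker.observe(ts, amount)
    print(ts, breaker.total, breaker.tripped)
```

The window totals run 60, 90, then 80 (the first outflow has aged out), and the breaker trips at 110. A multi-step attack spread over hours would have to stay under the limit in every window, which buys time for a human response.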
Last, custom monitoring scripts are also becoming common practice, but it's important to monitor the monitoring. A failing alert script can go undetected, since it is expected to be silent most of the time, and that can mean a missed opportunity to stop a hack.
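A common way to monitor the monitoring is a dead man's switch: every monitor sends a periodic heartbeat, and a separate watchdog alerts when one goes quiet. A minimal sketch, with hypothetical monitor names and timestamps in seconds; the watchdog itself should run on separate infrastructure, or it can fail silently together with the monitors it watches.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Watchdog:
    """Dead man's switch: reports monitors whose heartbeats have stopped."""

    expected: FrozenSet[str]  # monitors that must check in
    max_silence: int  # seconds allowed between heartbeats
    last_seen: Dict[str, int] = field(default_factory=dict)

    def heartbeat(self, monitor: str, now: int) -> None:
        self.last_seen[monitor] = now

    def silent_monitors(self, now: int) -> List[str]:
        # A monitor that never checked in is as suspicious as one that stopped.
        return sorted(
            name
            for name in self.expected
            if name not in self.last_seen or now - self.last_seen[name] > self.max_silence
        )


watchdog = Watchdog(expected=frozenset({"large-withdrawal", "oracle-deviation"}), max_silence=300)
watchdog.heartbeat("oracle-deviation", now=1_000)
print(watchdog.silent_monitors(now=1_200))  # ['large-withdrawal']
watchdog.heartbeat("large-withdrawal", now=1_200)
print(watchdog.silent_monitors(now=1_400))  # ['oracle-deviation']
```

Any non-empty result should page someone through a channel that does not depend on the silent monitor.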
As recent hacks prove, security in operations is as critical as in development. Security does not end with Solidity: the most secure smart contract is worth nothing if its onlyOwner wallet gets compromised. There is a need for a DevSecOps approach for the web3 ecosystem, powered by platforms and tools that provide the means to implement its processes and best practices.