APNIC Pty Ltd.

10/01/2024 | Press release | Distributed by Public on 09/30/2024 19:33

RPKI: Deployed is better than perfect

Haya Schulmann and Niklas Vogel co-authored this work

Starting as an experimental technology in the mid-2000s, the Resource Public Key Infrastructure (RPKI) has become a central component of the Internet and already affects a significant fraction of networks. Today, over 50% of announced prefixes are covered by Route Origin Authorizations (ROAs), and about 25% of networks enforce Route Origin Validation (ROV).

Until recently, few outside the Internet operational, engineering, and research communities were aware of RPKI. That changed in September 2024 when the White House identified RPKI as the key component for securing Internet routing, pushing RPKI from niche to mainstream. Mainstream technologies are generally expected to be fully mature, in particular stable and secure. As a niche technology, RPKI developed organically in many small steps, each inching a bit closer to maturity. However, studies show that RPKI is far from being fully mature.

We review the problems and hurdles in RPKI specification, implementations, and deployments and discuss what these issues actually mean in practice. Can RPKI align with the expectations outlined in the White House roadmap? Did the White House push for adopting an immature technology, potentially doing more harm than good? Or did the White House promote the best available, good enough technology, motivating research and industry to speed up and put more resources behind improving RPKI?

Certainly, RPKI is not perfect, but is it good enough?

The answer, as we will explore, varies depending on one's viewpoint.

Challenges in RPKI specification, implementations, and deployments

Specification has conflicting and vague requirements

The IETF has defined about 40 RPKI-related RFCs, which are generally complementary, with each document addressing a specific aspect of the RPKI ecosystem. Although the multiple standards aim to fill in missing or vague details and guide developers in implementing RPKI, their large number and their often conflicting or vague requirements increase complexity and hence the risk of bugs and vulnerabilities.

Vague or under-specified requirements can lead to routing instabilities, rejected routes, and even security vulnerabilities because the conflicting rules create small windows where a network could unintentionally accept hijacked prefixes or block legitimate ones.

RPKI software packages are not sufficiently stable and contain vulnerabilities

The RPKI specifications, RPKI software packages, and RPKI repository implementations are still not sufficiently stable and contain critical vulnerabilities. Overall, at least 53 vulnerabilities in RPKI software packages were disclosed, including persistent DoS, authentication bypass, cache poisoning, and remote code execution. While the large majority of these vulnerabilities were swiftly fixed, they still raise questions about the resilience of implementations and the potential existence of other zero-day vulnerabilities.

We expect that software security will increase in the future, with improved secure coding patterns and the growing availability of tooling for developers to test their software. For now, however, the state of software security in RPKI makes it attractive to attackers: vulnerabilities are relatively abundant, they can have devastating consequences for RPKI validation, and they might even open a backdoor into the local network running the vulnerable software component.

As RPKI is gaining traction, the risk of intentional backdoors will also grow. All popular RPKI software implementations are open-source and accept code contributions by the community; the threat of intentional backdoors is substantial in the context of RPKI.

Deployment hurdles

In our research, we discuss several deployment hurdles ranging from errors to lack of documentation. Here, we review one important aspect: the experience of operating RPKI in strict validation mode.

RFC 7115 recommends that operators' policies should not be too strict; operators should use RPKI to prefer Valid announcements, assign a lower preference to NotFound announcements, and either discard Invalid announcements or give them a very low preference. To support this cautious 'test mode', RPKI was designed to 'fail open': if ROAs cannot be fetched, for example, because they do not exist or the RPKI repository that hosts them is unreachable, RPKI validation for those resources yields the status 'NotFound' and the announcements are hence accepted.
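The following is a minimal Python sketch of how these three states and the recommended preferences fit together. It follows the general origin-validation logic (a route is Valid if some covering ROA entry matches its origin AS and prefix length, Invalid if it is covered but nothing matches, and NotFound if nothing covers it). The names `VRP`, `validate_origin`, and `local_pref`, as well as the preference values, are assumptions made for illustration and are not taken from any RPKI software; special cases such as AS 0 are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """One validated ROA entry: prefix, maximum length, authorized origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Classify an announcement as 'Valid', 'Invalid', or 'NotFound'."""
    route = ip_network(prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # Skip entries of another address family or that do not cover the route.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if route.prefixlen <= vrp.max_length and vrp.asn == origin_asn:
            return "Valid"
    # Covered but no match: Invalid. No covering entry at all: NotFound.
    return "Invalid" if covered else "NotFound"


# Hypothetical preference values; real deployments choose their own.
PREFERENCE = {"Valid": 200, "NotFound": 100}


def local_pref(state: str, drop_invalid: bool = True) -> int | None:
    """Map a validation state to a preference; None means 'discard'."""
    if state == "Invalid":
        return None if drop_invalid else 10
    return PREFERENCE[state]


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64500)]
    for pfx, asn in [("192.0.2.0/24", 64500), ("192.0.2.0/24", 64501),
                     ("198.51.100.0/24", 64502)]:
        state = validate_origin(pfx, asn, vrps)
        print(pfx, asn, state, local_pref(state))
```

Note that calling `validate_origin` with an empty list of entries always returns 'NotFound', which is exactly the fail-open behaviour: when no RPKI data is available, no announcement can be classified as Invalid.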

Operating RPKI validation in test mode is critical for the stability of the Internet since not all address space is covered by ROAs yet, and adversaries may be able to prevent relying party validators from fetching RPKI objects. In fact, failures to access the RPKI repositories may occur even under benign network conditions. If strict validation is applied, prefixes for which RPKI objects could not be retrieved will be filtered, impairing reachability to those Autonomous Systems (ASes). The risk of losing legitimate traffic is a substantial concern for network operators and one of the main obstacles hindering the wide adoption of RPKI validation. Operating in fail-open test mode facilitates incremental RPKI deployment, reducing failures and traffic loss.

The downside of the fail-open test mode is that networks that invest in deploying RPKI filtering with ROV may still be vulnerable to routing hijacks if adversaries can disable RPKI validation, for example, by blocking access to the RPKI repositories or by preventing the relying parties from fetching fresh RPKI objects. As a result, adversaries may be able to hijack network resources covered with ROAs.

Undoubtedly, the fail-open mode does not offer sufficient security guarantees for Internet routing, and in the long term, there should be a transition to strict RPKI validation. However, enforcing strict validation, which RFC 7115 acknowledges is not realistic in the near future, exposes networks to DoS attacks. For example, suppose the existence of a valid ROA is mandated but a router does not have the required RPKI data. In that case, it will eventually drop all announcements, leading to a full DoS and loss of reachability to the affected routes. Thus, while the impact of attacks on availability differs between fail-open and strict validation, the availability of RPKI data remains a core concern in RPKI deployments that still needs to be addressed.
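The trade-off can be made concrete with a small Python sketch. It assumes that 'strict' means accepting only announcements with a Valid state, as in the example above, and that a relying party without any usable RPKI data classifies every announcement as NotFound. The function names, routes, and AS numbers are hypothetical, and the sketch ignores that validators typically keep using cached objects until they expire.

```python
from __future__ import annotations


def accept(state: str, strict: bool) -> bool:
    """Decide whether to accept an announcement with the given state."""
    if state == "Valid":
        return True
    if state == "Invalid":
        return False
    # NotFound: accepted when failing open, rejected under strict validation.
    return not strict


def accepted_routes(states: dict[str, str], strict: bool) -> list[str]:
    """Return the routes that survive filtering under the chosen mode."""
    return [route for route, state in states.items() if accept(state, strict)]


# Hypothetical routes and their states while RPKI data is available.
WITH_DATA = {
    "legitimate 192.0.2.0/24 from AS64500": "Valid",
    "hijack 192.0.2.0/24 from AS64511": "Invalid",
    "uncovered 198.51.100.0/24 from AS64501": "NotFound",
}

# Without RPKI data nothing can be matched, so every route is NotFound.
WITHOUT_DATA = {route: "NotFound" for route in WITH_DATA}

if __name__ == "__main__":
    for label, states in [("with data", WITH_DATA), ("without data", WITHOUT_DATA)]:
        for strict in (False, True):
            mode = "strict" if strict else "fail-open"
            print(f"{label:12} {mode:9} -> {accepted_routes(states, strict)}")
```

In this sketch, losing the RPKI data lets the hijack through in fail-open mode, while in strict mode the same loss causes every route, including the legitimate one, to be dropped.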

Perfect is the enemy of the good

These and other problems indicate that the RPKI implementations are not sufficiently stable and lack resilience to existing and future cyberattacks. The RPKI validation exhibits inconsistent results. The RPKI standard specifications have not yet been finalized. The developers and operators lack documentation and automated tools for the development and configuration of RPKI technology. All these indicate that RPKI is not sufficiently mature.

But so what? Systems in the real world are never fully mature

Arguably, demanding full maturity before large-scale deployment is a very academic expectation; in real life, there is nothing like full maturity and perfection, only more or less good enough.

The Internet, like many information and communication technologies, is not mature. This applies to its applications, protocols, and widely used security mechanisms, such as SSL/TLS. Many Internet systems started as collaborative efforts between researchers and operators and grew organically. Over time, these efforts matured from experimental research prototypes and individual initiatives into deployments by large networks. The software is improved 'on-the-fly' with periodic patches that fix bugs or add new features. The maturity of this organic system doesn't fully align with academic definitions and frameworks. In reality, systems are never completely perfect or mature; instead, they evolve gradually over time.

The BGP example: Mature or not, it connects the Internet

Examples of immature but nevertheless heavily used technologies exist in abundance. Internet routing with BGP is among the most prominent ones. Nowadays, BGP enables all Internet activities. As its role became central to any online activity, the complexity of BGP also grew. BGP was designed on three napkins to connect different Internet domains; it was not designed to be a robust and secure protocol that one would rely on for critical infrastructure, which is what the Internet has become.

Since then, BGP has evolved in terms of its computation steps, processes, attributes, and the number of supported networks. However, these aspects have become more complex, harder to configure, and more vulnerable. Indeed, software bugs and issues in protocol specification are common in inter-domain routing with BGP and may lead to outages, failures, and attacks. For instance, Free Range Routing (FRR) routers crashed and disconnected large networks from the Internet because they could not parse standard-compliant BGP attributes in routing announcements. In addition, the complexity of BGP may create a chain of side effects, such that a small failure or misconfiguration in one part of the Internet can have devastating global consequences.

Despite all the problems, outages, and attacks, the triple napkin protocol connects the Internet. Moreover, the applications of BGP have evolved far beyond its original purpose and include many new and emerging uses.

Our analysis should be used as a TODO list

Academic analysis is important because it allows us to identify directions for improving the security and stability of systems, but the way its findings are applied needs to be adapted to how systems evolve and mature in the real world. An academic analysis provides a TODO list that guides adopters, operators, and developers in prioritizing their actions and addressing the problems one at a time, improving the maturity of a system that is already in operation. The list of problems, however, does not reflect the state of maturity of a system.

The roadmap is a huge leap forward

The US government's recognition of RPKI as a critical security measure in its 2024 roadmap is an important step forward. Until recently, RPKI was mostly experimental, but the cybersecurity strategy of the White House, the Notice of Proposed Rulemaking (NPRM) of the FCC, and the recent roadmap made a huge push towards securing the routing infrastructure with RPKI. Now it is important to identify the hurdles that need to be resolved to reach this goal. In this work, we outlined several such challenges.

Conclusion

RPKI implementations started as collaborative efforts between researchers, operators, and the broader IETF community. Over time, these efforts matured from experimental research projects and individual operator initiatives into deployments by some of the largest networks in the world. Our research shows that RPKI still suffers from problems and is not sufficiently stable. Nevertheless, RPKI already delivers benefits and it is an essential part of the Internet's ongoing efforts to improve routing security.

Research shows that RPKI can substantially limit the propagation of invalid BGP announcements, hence mitigating traffic hijacks. RPKI also provides an important prerequisite for prospective routing security solutions, including origin validation, path validation, and route leak prevention. The roadmap of the White House is a huge push for RPKI to truly mature and meet the expectations of security, reliability, and scalability for production-level deployments across the global Internet.

Moreover, routing security is a global issue that necessitates an international approach, requiring collaboration across economies. We hope our insights and recommendations will support and guide these international endeavours.

This article is based on a full preprint draft: 'RPKI: Not Perfect But Good Enough', by Haya Schulmann, Niklas Vogel and Michael Waidner.

Dr Michael Waidner is a Professor of Computer Science at the Technische Universität Darmstadt, director of the Fraunhofer Institute for Secure Information Technology, and the CEO of Athene.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC.