
Oxy is Cloudflare's Rust-based next generation proxy framework

2023-03-02

10 min read

In this blog post, we are proud to introduce Oxy - our modern proxy framework, developed using the Rust programming language. Oxy is a foundation of several Cloudflare projects, including the Zero Trust Gateway, the iCloud Private Relay second hop proxy, and the internal egress routing service.

Oxy leverages our years of experience building high-load proxies to implement the latest communication protocols, enabling us to effortlessly build sophisticated services that can accommodate massive amounts of daily traffic.

We will be exploring Oxy in greater detail in upcoming technical blog posts, providing a comprehensive and in-depth look at its capabilities and potential applications. For now, let us embark on this journey and discover what Oxy is and how we built it.

What Oxy does

We refer to Oxy as our "next-generation proxy framework". But what do we really mean by “proxy framework”? Picture a server (such as NGINX, which many readers may be familiar with) that can proxy traffic with an array of protocols, including various predefined common traffic flow scenarios that enable you to route traffic to specific destinations, or even egress with a different protocol than the one used for ingress. This server can be configured in many ways for specific flows, and boasts tight integration with the surrounding infrastructure, whether telemetry consumers or networking services.

Now, take all of that and add in the ability to programmatically control every aspect of the proxying: protocol decapsulation, traffic analysis, routing, tunneling logic, DNS resolution, and so much more. That is what the Oxy proxy framework is: a feature-rich proxy server tightly integrated with our internal infrastructure that's customizable to meet application requirements, allowing engineers to tweak every component.

This design is in line with our belief in an iterative approach to development, where a basic solution is built first and then gradually improved over time. With Oxy, you can start with a basic solution that can be deployed to our servers and then add additional features as needed, taking advantage of the many extensibility points offered by Oxy. In fact, you can avoid writing any code, besides a few lines of bootstrap boilerplate, and get a production-ready server with a wide variety of startup configuration options and traffic flow scenarios.

High-level Oxy architecture

For example, suppose you'd like to implement an HTTP firewall. With Oxy, you can proxy HTTP(S) requests right out of the box, eliminating the need to write any code related to production services, such as request metrics and logs. You simply need to implement an Oxy hook handler for HTTP requests and responses. If you've used Cloudflare Workers before, then you should be familiar with this extensibility model.
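To make the model concrete, here is a deliberately simplified, self-contained sketch of what such a hook might look like. The trait and type names below are illustrative inventions for this post, not Oxy's actual API:

```rust
// Hypothetical sketch of the hook-based extensibility model; all names
// here are illustrative, not Oxy's real API surface.

#[derive(Debug, PartialEq)]
enum Action {
    Allow,
    Block,
}

#[allow(dead_code)]
struct HttpRequest {
    host: String,
    path: String,
}

/// The application implements a hook and supplies only business logic;
/// the framework takes care of proxying, metrics, and logging.
trait HttpRequestHook {
    fn on_request(&self, req: &HttpRequest) -> Action;
}

/// A toy HTTP firewall: block requests to denied hosts.
struct DenylistFirewall {
    denied_hosts: Vec<String>,
}

impl HttpRequestHook for DenylistFirewall {
    fn on_request(&self, req: &HttpRequest) -> Action {
        if self.denied_hosts.iter().any(|h| h == &req.host) {
            Action::Block
        } else {
            Action::Allow
        }
    }
}
```

The framework would invoke the hook for every proxied request and apply the returned verdict, so the application ships nothing but the decision logic.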

Similarly, you can implement a layer 4 firewall by providing application hooks that handle ingress and egress connections. This goes beyond a simple block/accept scenario, as you can build authentication functionality or a traffic router that sends traffic to different destinations based on the geographical information of the ingress connection. The capabilities are incredibly rich, and we've made the extensibility model as ergonomic and flexible as possible. As an example, if information obtained from layer 4 is insufficient to make an informed firewall decision, the app can simply ask Oxy to decapsulate the traffic and process it with the HTTP firewall.

The aforementioned scenarios are prevalent in many products we build at Cloudflare, so having a foundation that incorporates ready solutions is incredibly useful. This foundation has absorbed lots of experience we've gained over the years, taking care of many sharp and dark corners of high-load service programming. As a result, application implementers can stay focused on the business logic of their application with Oxy taking care of the rest. In fact, we've been able to create a few privacy proxy applications using Oxy that now serve massive amounts of traffic in production with less than a couple of hundred lines of code. This is something that would have taken multiple orders of magnitude more time and lines of code before.

As previously mentioned, we'll dive deeper into the technical aspects in future blog posts. However, for now, we'd like to provide a brief overview of Oxy's capabilities. This will give you a glimpse of the many ways in which Oxy can be customized and used.

On-ramps

An on-ramp defines a combination of transport-layer socket type and protocols that server listeners can use for ingress traffic.

Oxy supports a wide variety of traffic on-ramps:

  • HTTP 1/2/3 (including various CONNECT protocols for layer 3 and 4 traffic)

  • TCP and UDP traffic over Proxy Protocol

  • general purpose IP traffic, including ICMP

With Oxy, you have the ability to analyze and manipulate traffic at multiple layers of the OSI model - from layer 3 to layer 7. This allows for a wide range of possibilities in terms of how you handle incoming traffic.

One of the most notable and powerful features of Oxy is the ability for applications to force decapsulation. This means that an application can analyze traffic at a higher level, even if it originally arrived at a lower level. For example, if an application receives IP traffic, it can choose to analyze the UDP traffic encapsulated within the IP packets. With just a few lines of code, the application can tell Oxy to upgrade the IP flow to a UDP tunnel, effectively allowing the same code to be used for different on-ramps.

The application can even go further and ask Oxy to sniff UDP packets and check if they contain HTTP/3 traffic. In this case, Oxy can upgrade the UDP traffic to HTTP and handle HTTP/3 requests that were originally received as raw IP packets. This allows for the simultaneous processing of traffic at all three layers (L3, L4, L7), enabling applications to analyze, filter, and manipulate the traffic flow from multiple perspectives. This provides a robust toolset for developing advanced traffic processing applications.

Multi-layer traffic processing in Oxy applications
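That sniffing step can start from a simple wire-format heuristic. The helper below is an illustrative sketch rather than Oxy's actual detector: per RFC 9000, QUIC long-header packets (which carry the initial handshake that HTTP/3 rides on) begin with a byte whose two most significant bits are set, followed by a 32-bit version field.

```rust
/// Illustrative heuristic (not Oxy's real detector): does this UDP
/// payload look like a QUIC long-header packet, i.e. a candidate for
/// upgrading the flow to HTTP/3 handling?
///
/// Per RFC 9000, a long-header packet starts with a byte whose two
/// most significant bits are 1 (header form = long, fixed bit = 1),
/// followed by a 32-bit version field.
fn looks_like_quic_long_header(payload: &[u8]) -> bool {
    // Need at least the first byte plus the 4-byte version.
    if payload.len() < 5 {
        return false;
    }
    (payload[0] & 0xC0) == 0xC0
}
```

A real detector would go further (validating the version and packet structure), but even this check is enough to decide whether attempting an HTTP/3 upgrade on a UDP flow is worthwhile.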

Off-ramps

An off-ramp defines a combination of transport-layer socket type and protocols that proxy server connectors can use for egress traffic.

Oxy offers versatility in its egress methods, supporting a range of protocols including HTTP 1 and 2, UDP, TCP, and IP. It is equipped with internal DNS resolution and caching, as well as customizable resolvers, with automatic fallback options for maximum system reliability. Oxy implements happy eyeballs for TCP and advanced tunnel timeout logic, and it can route traffic to internal services with accompanying metadata.
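The happy eyeballs idea (RFC 8305) is to give the preferred address family a short head start before racing the fallback, then take whichever attempt succeeds first. Oxy's implementation is asynchronous and considerably more sophisticated; the thread-based sketch below only illustrates the core racing logic, and all names are ours:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Race two connection attempts, preferring the first (typically IPv6)
/// and only starting the fallback (typically IPv4) after a short head
/// start. A simplified sketch of the happy eyeballs idea, not Oxy's
/// actual implementation.
fn happy_eyeballs<T, E>(
    primary: impl FnOnce() -> Result<T, E> + Send + 'static,
    fallback: impl FnOnce() -> Result<T, E> + Send + 'static,
    head_start: Duration,
) -> Option<T>
where
    T: Send + 'static,
    E: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    thread::spawn(move || {
        let _ = tx.send(primary());
    });
    thread::spawn(move || {
        // Give the primary attempt a head start before racing.
        thread::sleep(head_start);
        let _ = tx2.send(fallback());
    });
    // Take the first successful result from either attempt.
    for _ in 0..2 {
        if let Ok(Ok(conn)) = rx.recv() {
            return Some(conn);
        }
    }
    None
}
```

In practice the closures would call `TcpStream::connect` against the resolved v6 and v4 addresses; here they are left generic so the racing logic stands on its own.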

Additionally, through collaboration with one of our internal services (which is an Oxy application itself!) Oxy is able to offer geographical egress — allowing applications to route traffic to the public Internet from various locations in our extensive network covering numerous cities worldwide. This complex and powerful feature can be easily utilized by Oxy application developers at no extra cost, simply by adjusting configuration settings.

Tunneling and request handling

We've discussed Oxy's communication capabilities with the outside world through on-ramps and off-ramps. In the middle, Oxy handles efficient stateful tunneling of various traffic types including TCP, UDP, QUIC, and IP, while giving applications full control over traffic blocking and redirection.

Additionally, Oxy effectively handles HTTP traffic, providing full control over requests and responses, and allowing it to serve as a direct HTTP or API service. With built-in tools for streaming analysis of HTTP bodies, Oxy makes it easy to extract and process data, such as form data from uploads and downloads.
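A core primitive for that kind of streaming analysis is matching a pattern (say, a multipart boundary) across chunk borders without buffering the whole body. The sketch below is our own illustration of the idea, not Oxy's implementation:

```rust
/// Minimal sketch of streaming pattern search across chunk boundaries,
/// the kind of primitive a streaming HTTP body analyzer needs (e.g. to
/// locate multipart/form-data boundaries without buffering whole
/// bodies). Illustrative only; not Oxy's actual code.
struct StreamScanner {
    pattern: Vec<u8>,
    tail: Vec<u8>, // carry-over bytes from the previous chunk
    found: bool,
}

impl StreamScanner {
    fn new(pattern: &[u8]) -> Self {
        assert!(!pattern.is_empty());
        Self { pattern: pattern.to_vec(), tail: Vec::new(), found: false }
    }

    /// Feed the next body chunk; returns true once the pattern has
    /// been seen, even if it straddled a chunk boundary.
    fn feed(&mut self, chunk: &[u8]) -> bool {
        if self.found {
            return true;
        }
        // Prepend the carried-over tail so matches spanning chunks are caught.
        let mut window = std::mem::take(&mut self.tail);
        window.extend_from_slice(chunk);
        if window.windows(self.pattern.len()).any(|w| w == self.pattern) {
            self.found = true;
        } else {
            // Keep the last pattern.len()-1 bytes for the next chunk.
            let keep = self.pattern.len().saturating_sub(1).min(window.len());
            self.tail = window[window.len() - keep..].to_vec();
        }
        self.found
    }
}
```

The key design point is that memory usage is bounded by the pattern length rather than the body size, which is what makes the analysis safe for arbitrarily large uploads and downloads.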

In addition to its multi-layer traffic processing capabilities, Oxy also supports advanced HTTP tunneling methods, such as CONNECT-UDP and CONNECT-IP, using the latest extensions to the HTTP/3 and HTTP/2 protocols. It can even process HTTP CONNECT request payloads on layer 4 and recursively process the payload as HTTP if the encapsulated traffic is HTTP.

Recursive processing of HTTP CONNECT body payload in HTTP pipeline

TLS

The modern Internet is unimaginable without traffic encryption, and Oxy, of course, provides this essential aspect. Oxy's cryptography and TLS are based on BoringSSL, providing both a FIPS-compliant version with a limited set of certified features and the latest version that supports all the currently available TLS features. Oxy also allows applications to switch between the two versions in real-time, on a per-request or per-connection basis.

Oxy's TLS client is designed to make HTTPS requests to upstream servers, with the functionality and security of a browser-grade client. This includes the reconstruction of certificate chains, certificate revocation checks, and more. In addition, Oxy applications can be secured with TLS v1.3, and optionally mTLS, allowing for the extraction of client authentication information from x509 certificates.

Oxy has the ability to inspect and filter HTTPS traffic, including HTTP/3, and provides the means for dynamically generating certificates, serving as a foundation for implementing data loss prevention (DLP) products. Additionally, Oxy's internal fork of BoringSSL, which is not FIPS-compliant, supports the use of raw public keys as an alternative to WebPKI, making it ideal for internal service communication. This allows for all the benefits of TLS without the hassle of managing root certificates.

Gluing everything together

Oxy is more than just a set of building blocks for network applications. It acts as a cohesive glue, handling the bootstrapping of the entire proxy application with ease, including parsing and applying configurations, setting up an asynchronous runtime, applying seccomp hardening, and providing automated graceful restart functionality.

With built-in support for panic reporting to Sentry, Prometheus metrics with a Rust-macro based API, Kibana logging, distributed tracing, memory and runtime profiling, Oxy offers comprehensive monitoring and analysis capabilities. It can also generate detailed audit logs for layer 4 traffic, useful for billing and network analysis.

To top it off, Oxy includes an integration testing framework, allowing for easy testing of application interactions using TypeScript-based tests.

Extensibility model

To take full advantage of Oxy's capabilities, one must understand how to extend and configure its features. Oxy applications are configured using YAML configuration files, offering numerous options for each feature. Additionally, application developers can extend these options by leveraging the convenient macros provided by the framework, making customization a breeze.

Suppose an Oxy application uses a key-value database to retrieve user information. In that case, it would be beneficial to expose a YAML configuration settings section for this purpose. With Oxy, defining a structure and annotating it with the #[oxy_app_settings] attribute is all it takes to accomplish this:

/// Application's key-value (KV) database settings
#[oxy_app_settings]
pub struct MyAppKVSettings {
    /// Key prefix.
    pub prefix: Option<String>,
    /// Path to the UNIX domain socket for the appropriate KV 
    /// server instance.
    pub socket: Option<String>,
}

Oxy can then generate a default YAML configuration file listing available options and their default values, including those extended by the application. The configuration options are automatically documented in the generated file from the Rust doc comments, following best Rust practices.
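For the MyAppKVSettings structure above, the generated section of such a file might look roughly like this (the shape is illustrative, not actual Oxy output):

```yaml
# Illustrative sketch of a generated default configuration section.

# Application's key-value (KV) database settings
kv:
  # Key prefix.
  prefix: ~
  # Path to the UNIX domain socket for the appropriate KV
  # server instance.
  socket: ~
```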

Moreover, Oxy supports multi-tenancy, allowing a single application instance to expose multiple on-ramp endpoints, each with a unique configuration. But sometimes even a YAML configuration file is not enough to build the desired application; this is where Oxy's comprehensive set of hooks comes in handy. These hooks can be used to extend the application with Rust code and cover almost all aspects of traffic processing.

To give you an idea of how easy it is to write an Oxy application, here is an example of basic Oxy code:

struct MyApp;

// Defines types for various application extensions to Oxy's data
// types. Contexts provide information and control knobs for the
// different parts of the traffic flow, and applications can extend
// all of them with their custom data. As was mentioned before,
// applications can also define their own configuration: it's just
// a matter of defining a configuration object with the
// `#[oxy_app_settings]` attribute and providing the object type here.
impl OxyExt for MyApp {
    type AppSettings = MyAppKVSettings;
    type EndpointAppSettings = ();
    type EndpointContext = ();
    type IngressConnectionContext = MyAppIngressConnectionContext;
    type RequestContext = ();
    type IpTunnelContext = ();
    type DnsCacheItem = ();
}

#[async_trait]
impl OxyApp for MyApp {
    fn name() -> &'static str {
        "My app"
    }

    fn version() -> &'static str {
        env!("CARGO_PKG_VERSION")
    }

    fn description() -> &'static str {
        "This is an example of Oxy application"
    }

    async fn start(
        settings: ServerSettings<MyAppKVSettings, ()>
    ) -> anyhow::Result<Hooks<Self>> {
        // Here the application initializes various hooks, with each
        // hook being a trait implementation containing multiple
        // optional callbacks invoked during the lifecycle of the
        // traffic processing.
        let ingress_hook = create_ingress_hook(&settings);
        let egress_hook = create_egress_hook(&settings);
        let tunnel_hook = create_tunnel_hook(&settings);
        let http_request_hook = create_http_request_hook(&settings);
        let ip_flow_hook = create_ip_flow_hook(&settings);

        Ok(Hooks {
            ingress: Some(ingress_hook),
            egress: Some(egress_hook),
            tunnel: Some(tunnel_hook),
            http_request: Some(http_request_hook),
            ip_flow: Some(ip_flow_hook),
            ..Default::default()
        })
    }
}

// The entry point of the application
fn main() -> OxyResult<()> {
    oxy::bootstrap::<MyApp>()
}

Technology choice

Oxy leverages the safety and performance benefits of Rust as its implementation language. At Cloudflare, Rust has emerged as a popular choice for new product development, and there are ongoing efforts to migrate some of the existing products to the language as well.

Rust offers memory and concurrency safety through its ownership and borrowing system, preventing issues like null pointers and data races. This safety is achieved without sacrificing performance, as Rust provides low-level control and the ability to write code with minimal runtime overhead. Rust's balance of safety and performance has made it popular for building safe performance-critical applications, like proxies.
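As a small illustration: to share mutable state across threads, Rust forces you to reach for explicitly synchronized types; handing out a bare mutable reference to several threads simply does not compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Shared mutable state must be explicitly synchronized: passing a bare
// `&mut u64` to several threads would be rejected at compile time, so
// a data race cannot be written by accident.
fn parallel_count(threads: usize, per_thread: usize) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The same guarantees hold throughout a large codebase like Oxy's, which is precisely why the language suits high-load proxies.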

We intentionally tried to stand on the shoulders of giants with this project and avoid reinventing the wheel. Oxy heavily relies on open-source dependencies, with hyper and tokio being the backbone of the framework. Our philosophy is that we should pull from existing solutions as much as we can, allowing for faster iteration, but also use widely battle-tested code. If something doesn't work for us, we try to collaborate with maintainers and contribute back our fixes and improvements. In fact, we now have two team members who are core team members of the tokio and hyper projects.

Even though Oxy is a proprietary project, we try to give back some love to the open-source community, without which the project wouldn't be possible, by open-sourcing some of its building blocks, such as https://github.com/cloudflare/boring and https://github.com/cloudflare/quiche.

The road to implementation

At the beginning of our journey, we set out to implement a proof of concept for an HTTP firewall using Rust for what would eventually become the Zero Trust Gateway product. This project was originally part of the WARP service repository. However, as the PoC rapidly advanced, it became clear that it needed to be separated into its own Gateway proxy for both technical and operational reasons.

Later on, when tasked with implementing a relay proxy for iCloud Private Relay, we saw the opportunity to reuse much of the code from the Gateway proxy. The Gateway project could also benefit from the HTTP/3 support that was being added for the Private Relay project. In fact, early iterations of the relay service were forks of the Gateway server.

It was then that we realized we could extract common elements from both projects to create a new framework, Oxy. The history of Oxy can be traced back to its origins in the commit history of the Gateway and Private Relay projects, up until its separation as a standalone framework.

Since our inception, we have leveraged the power of Oxy to efficiently roll out multiple projects that would have required a significant amount of time and effort without it. Our iterative development approach has been a strength of the project, as we have been able to identify common, reusable components through hands-on testing and implementation.

Our small core team is supplemented by internal contributors from across the company, ensuring that the best subject-matter experts are working on the relevant parts of the project. This contribution model also allows us to shape the framework's API to meet the functional and ergonomic needs of its users, while the core team ensures that the project stays on track.

Relation to Pingora

Although Pingora, another proxy server developed by us in Rust, shares some similarities with Oxy, it was intentionally designed as a separate proxy server with a different objective. Pingora was created to serve traffic from millions of our clients' upstream servers, including those with ancient and unusual configurations: non-UTF-8 URLs and TLS settings unsupported by most TLS libraries are just a few such quirks among many others. This focus on handling technically challenging, unusual configurations sets Pingora apart from other proxy servers.

The concept of Pingora came about during the same period when we were beginning to develop Oxy, and we initially considered merging the two projects. However, we quickly realized that their objectives were too different to do that. Pingora is specifically designed to establish Cloudflare’s HTTP connectivity with the Internet, even in its most technically obscure corners. On the other hand, Oxy is a multipurpose platform that supports a wide variety of communication protocols and aims to provide a simple way to develop high-performance proxy applications with business logic.

Conclusion

Oxy is a proxy framework that we have developed to meet the demanding needs of modern services. It has been designed to provide a flexible and scalable solution that can be adapted to the unique requirements of each project, and by leveraging the power of Rust we have made it both safe and fast.

Looking forward, Oxy is poised to play a critical role in our company's larger effort to modernize and improve our architecture. It provides a solid building block in the foundation on which we can keep building a better Internet.

As the framework continues to evolve and grow, we remain committed to our iterative approach to development, constantly seeking out new opportunities to reuse existing solutions and improve our codebase. This collaborative, community-driven approach has already yielded impressive results, and we are confident that it will continue to drive the future success of Oxy.

Stay tuned for more technical blog posts on the subject!

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.
Tags: Proxying, Rust, Performance, Edge, iCloud Private Relay, Cloudflare Gateway, Oxy
