On Netlify’s bandwidth bill & saving thousands of dollars

A short rant, less about Netlify’s billing and more about engineering teams’ inertia around SaaS tools, which leads to expensive bills.

A few things make me feel icky at work. Lack of communication, mixed signals, credit stealing. But recently, it has been ignorant spending on supposedly industry-leading SaaS solutions that offer no competitive benefit yet cost almost half of someone’s monthly salary.

These products go by many names. They slip through quarterly reviews of SaaS subscriptions for historical reasons or because there is little context about how the tool is used. This is where the “What ain’t broke …” thought process actually starts losing money. All it takes is someone willing to dig.

This is the story of being a firestarter.

  • It’s all about identifying a place to make a fire (Spec an improvement)
  • Starting the fire (Start the work)
  • Rallying the team around it to keep it going (Making iterations)
  • In the end, delivering the cooked improvement, feature, or product (Release MVP)

I enjoy the role and have loved keeping the team’s energy (this fire) high, even if I was a contractor at this particular organization.

Surprisingly, this time it was Netlify.

“Worrying” doesn’t begin to describe how much hosting a few static sites had been costing the company by the time I found out about it. Full disclaimer, and credit to Netlify: someone else’s historically bad decisions are not the product’s fault. Their pricing is clear, the docs are informative, and their developer experience is at least better than most competitors’.

The decision to start the work & build on Netlify was initially made with good intentions. Still, I didn’t see any reason good enough to keep paying low five figures to host a bunch of React/Gatsby websites. I follow Chesterton’s fence closely in the firestarter line of work. With all things considered, I was determined to wipe Netlify out of the organization’s stack. Even if that’s not what I was contracted to do.

Ladies and gentlemen, we have identified a fire.

Whoa! But why did Netlify cost this much?

Context. The company hosted the marketing site, blog, landing page, and docs on Netlify, along with 30 or so other projects. Projects were hosted as website.netlify.app or blog.netlify.app. Each project was assigned a domain, for example, company.io, which points to the website. Since these are the company’s docs and blog, they naturally wanted the other components under the same `company.io` domain as subdomains.

Examples:

company.io 
product.io
blog.company.io
docs.company.io

To enable this structure, you need to point the blog.company.io subdomain at the actual blog.netlify.app Netlify site. This can be achieved with CNAME records added to Netlify DNS. To make matters go from less than ideal to worse, the organization used Cloudflare rather than Netlify DNS to manage its domains. Hence, the same behavior was achieved using 301 redirects to a different domain. Here’s an example:

COMPANY.io --> company.netlify.app [200 success]
SOMEPRODUCT.io  --> someproduct.netlify.app [200 success]
docs.COMPANY.io --> company.netlify.app --> docs.herokuapp.com [301 redirect]
blog.COMPANY.io --> company.netlify.app --> blog.netlify.app [301 redirect]

Issues

Firstly, we misused Netlify as a makeshift redirect store. Redirecting was a widespread problem, with some sites going through as many as four hops, which hurt first content load and response time. Additionally, all these redirects counted toward bandwidth consumption on Netlify for the COMPANY.io domain, which did the redirecting.
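
To make the hop problem concrete, here is a minimal Python sketch of how one could audit such chains. It is not what we ran in production. The hostnames are the placeholders from the example above, and the `/blog` and `/docs` paths, the `REDIRECTS` table, and both function names are assumptions made up for illustration.

```python
# Hypothetical redirect table modeled on the example chains above.
# Keys are the requested location, values are where the 301 points next.
REDIRECTS = {
    "blog.company.io": "company.netlify.app/blog",
    "company.netlify.app/blog": "blog.netlify.app",
    "docs.company.io": "company.netlify.app/docs",
    "company.netlify.app/docs": "docs.herokuapp.com",
}


def resolve_chain(start, redirects, max_hops=10):
    """Follow redirects from `start` and return every location visited."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        if len(chain) - 1 >= max_hops:
            raise ValueError(f"more than {max_hops} hops from {start}")
        nxt = redirects[chain[-1]]
        if nxt in seen:
            raise ValueError(f"redirect loop detected at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


def hop_count(start, redirects):
    """Number of redirects a visitor sits through before content is served."""
    return len(resolve_chain(start, redirects)) - 1


if __name__ == "__main__":
    for host in ("company.io", "blog.company.io", "docs.company.io"):
        print(host, hop_count(host, REDIRECTS), resolve_chain(host, REDIRECTS))
```

Every entry with a hop count above zero is an extra round trip for the visitor, and in our setup an extra response billed against the redirecting site.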

Bandwidth is obscenely expensive on Netlify. The organization used a lot of it. An absolutely mind-boggling 4.5 TB per month without breaking a sweat. As I am writing this today, we have already served over 124 gigs of data, as I can see from the migrated solution’s dashboard.

Secondly, the reason for 95% of the bandwidth consumption was an unoptimized dynamic landing page hosted on Netlify, which users are shown when the tool succeeds. This tool had millions of users each month who saw this landing page. As the tool’s popularity grew over the years, bandwidth consumption also grew exponentially. With that one landing page gone, the bill could have been swept under the rug, but the fire had already started.
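
For a feel of how metered bandwidth turns into a bill, here is a rough Python sketch. Only the 4.5 TB per month and the 95% share come from this post. The included quota, the block size, and the price per block are placeholder values I picked for illustration and are not Netlify’s actual rates, so plug in the numbers from your own plan.

```python
import math


def monthly_overage_cost(used_gb, included_gb, block_gb, price_per_block):
    """Cost of bandwidth beyond the included quota, billed in whole blocks."""
    extra_gb = max(0.0, used_gb - included_gb)
    blocks = math.ceil(extra_gb / block_gb)
    return blocks * price_per_block


# From the post: roughly 4.5 TB per month, 95% of it from one landing page.
USED_GB = 4500
LANDING_PAGE_SHARE = 0.95

# Placeholder plan numbers (assumptions, not real pricing).
INCLUDED_GB = 1000
BLOCK_GB = 100
PRICE_PER_BLOCK = 50.0

if __name__ == "__main__":
    with_page = monthly_overage_cost(
        USED_GB, INCLUDED_GB, BLOCK_GB, PRICE_PER_BLOCK
    )
    without_page = monthly_overage_cost(
        USED_GB * (1 - LANDING_PAGE_SHARE), INCLUDED_GB, BLOCK_GB, PRICE_PER_BLOCK
    )
    print("overage with landing page:", with_page)
    print("overage without landing page:", without_page)
```

Under these made-up plan numbers, the traffic that remains once the landing page is gone fits inside the included quota, which is why that one page dominated the bill.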

Thirdly, ignorance. Here are the questions I asked, with no satisfying answers. Why have we been paying this bill when free alternatives like GitHub Pages exist? Why haven’t we considered moving to another platform with low-cost bandwidth for the 95% use case? Why is the organization using multiple cloud platforms like Heroku, Netlify, and GitHub Pages? Netlify itself offers no reliability benefits, and there are several community posts asking about patchy reliability or false downtime events that have nothing to do with us.

I have had enough.

Start the fire: Looking for alternatives

I participated in and even maintained these systems until I learned the atrocious cost we were paying for a subpar solution in the making. Heroku’s month-long outage and Netlify’s constant reliability issues, which resulted in false positives, had already irritated our on-call team. GitHub Pages doesn’t come across as a production-ready solution with support behind it. I took ownership of finding alternatives in the dense forest of static site hosting, weighing the pros and cons of several products.

By talking directly to product owners across the organization, I identified a list of features we wanted in our next solution.

  1. Regex Redirect support for intra and inter domains
  2. Integrations with GitHub (CD)
  3. Custom domain support
  4. CDN support if possible
  5. Deployment & revision management with preview builds. 
  6. Uptime and reliability. Essential.
  7. Pricing plans, especially for bandwidth, builds, and members.
  8. SSL support by default.

Optional Features:

  1. Analytics source 
  2. Automated testing possibility (CI)
  3. Better performance 

Finally, we found an alternative.

After a lot of research and recommendations, Cloudflare Pages was the one that managed to check all the boxes for us. Key wins for CF Pages:

  • Generous free plan with unlimited bandwidth usage, unlimited admin users, and 500 builds per month.
  • Cloudflare being Cloudflare, you know reliability will be good.
  • First-class Cloudflare Workers support to build full-stack applications if needed.
  • Tight integration with GitHub.
  • Actually provides support for the product. Update: Enterprise support sucks; the community is much better.
  • We were already on their enterprise plan using other products.
  • Domain configuration taken care of.
  • Clear documentation, hundreds of tutorials, and active community.

We took time to assess the product. We wanted to ensure it satisfied our company’s static site hosting needs. Three months later and thousands of dollars saved, I just finished migrating the company’s complete documentation to Cloudflare Pages. Around 30 sites total. They have many products.

We also migrated our documentation to Docusaurus with this move. My colleague and I built a tool called DocuBuilder. It uses GitHub Actions, Docusaurus, and Cloudflare Pages to deploy, maintain, and configure all Docusaurus sites owned by our company. You can check it out here. Let me know if you have any feedback.

Experience with Cloudflare Pages

The satisfaction of shutting down our Heroku and Netlify accounts was like winning a marathon. I had to convince and negotiate with many stakeholders to migrate their deployments to CF Pages ASAP, and I even helped them with it. I documented the process and improved several things in the pipeline. We are now running even more static sites on Cloudflare Pages. Everything just works. Here are some statistics.

Reliability has been patchy, with minor incidents that were quickly resolved and have never affected our production Cloudflare Pages deployments. My support experience during the migration could have been better. It took them a month to resolve a thread I opened that blocked our migration. So your mileage may vary.

With the move, the performance improvement has also been shocking, since there is no longer a redirect hell. Many other people rave about the caching performance, too.

First image: Documentation website on Netlify vs on Cloudflare Pages.

Here’s a feature comparison between the two platforms.

Source: https://www.grm.digital/en/blog/cloudflare-pages-compared-to-netlify/ dated March 2022.

Important Note (Update 18/04/23)

The table above is outdated. Cloudflare now provides unlimited admins and rollback to any version, and it has tutorials for adding form handling as well. I have kept the table around for its clear head-to-head comparison. Refer to the website below for a hopefully up-to-date comparison.

Updated comparison: https://bejamas.io/compare/cloudflare-pages-vs-netlify/

Thanks Saumya for pointing this out on Twitter.


Conclusion: Don’t let bills creep in. Get better performance per dollar and migrate if necessary.

Static site hosting alternatives are a dime a dozen in the market. You can easily find something better if you look hard enough. By removing multiple static site solutions, I am glad to have unintentionally provided clarity on what to use for static site hosting all over the organization. The existing team has been loving the new deployment pipeline, with other teams deploying new sites exclusively on Cloudflare Pages.

My recommendation: Do your own research with the tons of blogs, videos, and reviews on each platform. You are bound to find something you like.

If you like this, I will write a blog explaining our Cloudflare Pages + Docusaurus setup all over the organization, allowing us to deploy 20 distinct documentation sites from one configuration.

Follow the blog to stay tuned for that and the conversation on Twitter. Till then, live in the mix.

https://twitter.com/vipulgupta2048/status/1647599605104906241?s=20
