Methodology
Why I Built This
Web page bloat has been a hot topic lately, and I got curious. How much data are popular sites actually loading on a first visit? I couldn't find a good side-by-side comparison, so I built one. I crawl dozens of well-known websites every few hours and publish what I find.
How It Works
Every 6 hours, I load each site in a real Chromium browser via Cloudflare's Browser Rendering API. I capture every network request, recording the URL, transfer size, resource type, and status code. I also take a screenshot. No ad blockers, no login, just a clean first visit.
For each site, I track total page weight, request count, resource breakdown (JS, CSS, images, fonts, media), and first-party vs third-party split. Third-party domains get classified into categories like advertising, analytics, social, and CDN using a curated list I maintain.
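To make the bookkeeping concrete, here is a minimal sketch of how captured requests could be rolled up into those numbers. It is not the project's actual code: the `RequestRecord` and `PageSummary` shapes, the function names, and the four-entry `DOMAIN_CATEGORIES` table are all assumptions for illustration, and the first-party check is a simple hostname suffix match rather than a proper registrable-domain lookup.

```typescript
// Hypothetical record shape; field names are assumptions, not the project's schema.
type ResourceType = "script" | "stylesheet" | "image" | "font" | "media" | "other";

interface RequestRecord {
  url: string;
  transferSize: number; // bytes transferred over the network
  resourceType: ResourceType;
  status: number;
}

interface PageSummary {
  totalBytes: number;
  requestCount: number;
  bytesByType: Record<ResourceType, number>;
  firstPartyBytes: number;
  thirdPartyBytes: number;
  thirdPartyBytesByCategory: Record<string, number>;
}

// Tiny illustrative sample standing in for a curated domain list.
const DOMAIN_CATEGORIES: Record<string, string> = {
  "doubleclick.net": "advertising",
  "google-analytics.com": "analytics",
  "facebook.net": "social",
  "cloudfront.net": "cdn",
};

// True when `host` equals `domain` or is a subdomain of it.
function matchesDomain(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

// Look up a third-party host in the sample table.
function categorize(host: string): string {
  for (const [domain, category] of Object.entries(DOMAIN_CATEGORIES)) {
    if (matchesDomain(host, domain)) return category;
  }
  return "uncategorized";
}

// Roll one page load's requests up into totals and splits.
function summarize(siteUrl: string, records: RequestRecord[]): PageSummary {
  // Simplification: treat the site's hostname minus "www." as the first-party domain.
  const siteHost = new URL(siteUrl).hostname.replace(/^www\./, "");
  const summary: PageSummary = {
    totalBytes: 0,
    requestCount: records.length,
    bytesByType: { script: 0, stylesheet: 0, image: 0, font: 0, media: 0, other: 0 },
    firstPartyBytes: 0,
    thirdPartyBytes: 0,
    thirdPartyBytesByCategory: {},
  };
  for (const r of records) {
    const host = new URL(r.url).hostname;
    summary.totalBytes += r.transferSize;
    summary.bytesByType[r.resourceType] += r.transferSize;
    if (matchesDomain(host, siteHost)) {
      summary.firstPartyBytes += r.transferSize;
    } else {
      summary.thirdPartyBytes += r.transferSize;
      const category = categorize(host);
      summary.thirdPartyBytesByCategory[category] =
        (summary.thirdPartyBytesByCategory[category] ?? 0) + r.transferSize;
    }
  }
  return summary;
}
```

The suffix match keeps the sketch short, but it would count a site's separately branded asset domain as third-party. A public suffix list or a per-site allowlist would be more accurate.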
Some sites detect headless browsers and serve lighter pages, block content, or show challenge screens. Sites like CNN.com and Chevy.com actively block automated browsers, so I couldn't include them. More advanced techniques like residential proxies or Browserbase could help, and I may explore those later. For now, I'm using a straightforward Chromium session from Cloudflare's datacenter. No tricks.
Because some sites serve lighter pages to automated browsers, the numbers here probably understate real-world bloat. I also measure only the homepage, and results can vary between runs because of A/B testing, ad rotation, and cookie consent dialogs.
The Stack
Everything runs on Cloudflare. The crawler is a Worker on a cron trigger that enqueues one job per site to a Queue. Each job launches a browser session via Browser Rendering. Results go into KV, screenshots into R2. The site you're reading is an Astro app on Workers with Tailwind CSS. Infrastructure is managed with OpenTofu. It's a TypeScript pnpm monorepo, simple and cheap to run.
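The fan-out step (one queue message per site on each cron tick) can be sketched roughly like this. The `CrawlJob` payload, the `JobQueue` interface, and both function names are assumptions for illustration. `JobQueue` is a minimal stand-in for the queue producer, so the sketch runs without any Cloudflare types.

```typescript
// Hypothetical job payload; field names are assumptions.
interface CrawlJob {
  url: string;
  scheduledAt: string; // ISO 8601 timestamp of the cron tick
}

// Minimal stand-in for the part of a queue producer this sketch needs.
interface JobQueue {
  send(job: CrawlJob): Promise<void>;
}

// Build one job per site, all stamped with the same tick time (ms since epoch).
function buildCrawlJobs(sites: string[], scheduledTime: number): CrawlJob[] {
  const scheduledAt = new Date(scheduledTime).toISOString();
  return sites.map((url) => ({ url, scheduledAt }));
}

// Send every job to the queue and report how many were enqueued.
async function enqueueCrawlJobs(
  sites: string[],
  scheduledTime: number,
  queue: JobQueue,
): Promise<number> {
  const jobs = buildCrawlJobs(sites, scheduledTime);
  await Promise.all(jobs.map((job) => queue.send(job)));
  return jobs.length;
}
```

In a Worker, something like `enqueueCrawlJobs` would presumably be called from the `scheduled` handler with the Queue binding passed in as `queue`. Stamping every job with the same tick time makes it easy to group one run's results later.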
About Me
I'm Chris Ebert. This is a hobby project, and I plan to open-source the code on GitHub soon. If you have feedback, want to suggest a site, or just want to chat:
- Blog: chrisebert.net
- GitHub: cebert (personal) / ceberttylertech (work)
- X: @realchrisebert
- Bluesky: @realchrisebert.bsky.social