A few weeks ago, one of the SDF’s Mastodon instances had its TLS cert expire. My understanding is that the cert had been renewed, but the server had to be restarted to pick up the change, and for whatever reason, that just didn’t happen. In the conversation about the outage on BBOARD (the SDF’s virtual bulletin board) was this post:
TACKER: mcornick (Mark Cornick)
SUBJECT: .. mastodon cert
DATE: 02-Feb-24 19:56:47
HOST: iceland
I really hope that this gets addressed soon, Mastodon is essentially “down” for a lot of people who depend on SDF right now.
As cseiler noted, a new certificate is in place, but the server is still catching up with its backlog.
(my opinions follow, feel free to disregard)
I would advise not “depend(ing) on SDF” for anything. This is not a professional, businesslike service with SLAs or guarantees. It’s a best-effort deal, and SSL certificates are a recurring issue that SDF does not seem to be interested in addressing beyond just telling everyone to be patient. If your use of Mastodon, or any other SDF service, depends on you being able to access it RIGHT NOW when a cert has expired, I hate to say it but you should look elsewhere for that kind of usage. SDF offers a lot of fun stuff, but that’s all it is - fun, not serious.
That resonated with me. These are services run by people, for people, on the timeline that people can keep, not an industry-driven timeline. There’s something amiss when services intended to be for a community, by a community, are placed under industry demands, and I think that how we got here is part of the problem.
I grew up with a web that was a little bit DIY. A little bit stitched together. Somewhere along the way, with the rise of the corporate web, we saw a shift towards five nines availability, the promise of 24/7/365 (and better) uptime of sites, the idea that you should always be able to get the shiny new thing, from a shiny website. The expectations of the commercial world have trickled down into community spaces, and we see it in all the myriad ways people try to replicate large commercial setups for their small, community projects.
I’m coming to realize that I don’t actually want that. I don’t think we need 100% uptime and availability, because that perfection has a real, tangible cost. I see it all around me, and I’ll be honest, it’s just not necessary. We can all deal if a website’s down for a bit, or a service doesn’t update instantly. Capitalism might not like it, but this isn’t about making money. It’s about having sustainable communities that are viable in the long term, and services to support them.
Sustainable services
Money talks
There are a lot of ways to talk about sustainability, so I’m going to start with the most readily quantified: money. An expensive service is inherently less sustainable in the long term, because it necessarily draws more resources from its community to operate. Festivals and feast-days are rare because they require intense resource allocation, drawing from the community’s available funds without replacement in the near term, but they are themselves important. If you operate like every day is a feast-day, however, you’re draining resources constantly.
When you look at what it takes to operate a high-availability, high-reliability service around the clock, there’s a financial burden that grows superlinearly with the level of availability you’re targeting. If you want two instances of a service running in parallel to ensure better, faster service, well… now you’re not only paying to operate two instances of the service, you’re also paying for the infrastructure on top of those instances to keep them both maintained and to enable handoff between them. (I’ll admit this example is a little weak, as I can easily see a reason to have a cold, or even warm, spare for handoff, but beyond that, you’re headed for commercial-ops scale.)
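To make the shape of that trade-off concrete, here’s a back-of-the-envelope sketch. Every number in it is an assumption I made up for illustration, not anyone’s real hosting bill:

```python
# Rough cost-vs-uptime math for running N redundant copies of a service.
# Every figure here is an illustrative assumption, not a real price or SLA.

instance_cost = 20.0   # assumed $/month for one small VPS
overhead_cost = 15.0   # assumed $/month for the load balancer, health checks, etc.
p_down = 0.01          # assumed chance that any single instance is down at a given moment

for n in (1, 2, 3):
    monthly = n * instance_cost + (overhead_cost if n > 1 else 0.0)
    # Treating failures as independent: the service is only unreachable
    # when all n copies happen to be down at the same time.
    downtime_pct = (p_down ** n) * 100
    print(f"{n} instance(s): ~${monthly:.0f}/month, ~{downtime_pct:.4f}% expected downtime")
```

Under those assumed numbers, each extra copy costs about as much as the first but removes only a hundredth as much downtime as the step before it; the bill keeps climbing while the practical benefit shrinks, which is the shape the superlinear-cost argument is pointing at.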
Virtual machines, physical hardware
On top of burning money for a level of reliability we generally don’t need, that overspend means you’re driving many more cores and dedicating more physical resources to the replication; that’s an environmental burden imposed by your desire for high availability. Keeping the lights on for each running copy of your service means allocating physical machines somewhere to your uptime. Is three times the CPU time worth it?
In the era of blockchains and wasteful power consumption for “proof of work”, it feels like I’m screaming into the void about reducing power usage, but realistically, we can’t just green-energy our way out of the cloud’s power budget. We need to reduce overall CPU usage for the future of our environment, and choosing to run your service just that little bit less reliably may just be the thing to do.
Social pressures
Beyond the monetary and environmental costs, there’s a human cost to very high uptime, which we experience most as burnout and all the social tolls of “on-call”. Lots of my friends, if they want to go out on a weekend while they’re on call, have to bring a backpack along. The idea that we should all just work a 24/7 job every once in a while is absurd. By and large, websites are not so fundamentally critical as to require this sort of uptime, and … well, I’m writing this in Notion. Maybe stuff should be designed to work offline, too. (Side note: Notion actually doesn’t work for anything beyond simple note-taking, and this is reminding me why.)
As to how a rougher, somewhat less reliable web benefits its community: it builds in some need for patience, and a bit of concern for each other (and the welfare of the community as a whole). If you’re petulant because your service is down, that can play out as a negative interaction with your sysadmins, even though you both want the same outcome. That’s no fun for the people doing the maintenance, and the users are still mad. If the people using sites understand that the people running them are themselves human, with their own lives and needs, and that the team is probably far smaller than they’d imagine (even commercial services are run by fewer people than you’d think), it’s easier to accept when something isn’t fixed right away.
Simpler websites, lighter footprints
There’s another aspect of the smaller web that I want to touch on. Alongside the conversation about unsustainable operating costs, we should also be encouraging site developers to build smaller sites with less tech. Websites are simply too big and too flashy, and most of it is completely unnecessary. Even my blog routinely has a load size in the hundreds of kilobytes because I want to include a particular font, and I agonize over that. Meanwhile, news sites serve up megabytes of JavaScript just to … make something bounce as you mouse over it? (And serve ads. Can’t forget the ads, really.)
We live in an era with fantastically capable devices in our pockets. Why does my phone take so long to load some pages, and why are they so hard to read? Web design lost its way, and we’ve never really recovered.
Part of why I’ve built my website to be so minimal is that, while I like some styling, I want to recapture some of that TUI look and feel, and I wanted a tiny footprint, ideally served by only my host (thank you, SDF). The result is a site that looks the way I want, and which loads in a pretty acceptable timespan, roughly a third of a second on my (admittedly fast) internet connection. The most embarrassing aspect of it is the fonts. I use Font Awesome Free for the Mastodon and Creative Commons logos, and for the RSS icon; that’s 259kB, more than my entire (un-minified) homepage, and only a little bit more than the Cascadia Code webfont that I include. I’ve considered abandoning any one specific font and just saying “typeset this in monospace!” but… I can have nice things too, right? (Looking at the MetaARPA transfer limits… I might well switch to monospace and minify my Font Awesome setup, honestly. Or see what else I can trim; it’s all pretty heavy.)
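If you want to see where your own bytes are going, a few lines of Python are enough for a rough audit. This is only a sketch, and the URLs in it are placeholders rather than my actual asset paths:

```python
# Rough page-weight audit: fetch each asset and report its size in bytes.
# The URLs are placeholders for illustration, not my site's real paths.
import urllib.error
import urllib.request

assets = [
    "https://example.com/index.html",
    "https://example.com/css/fontawesome.min.css",
    "https://example.com/fonts/CascadiaCode.woff2",
]

total = 0
for url in assets:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            size = len(resp.read())
    except urllib.error.URLError as err:
        print(f"   error  {url} ({err})")
        continue
    total += size
    print(f"{size:>8}  {url}")

print(f"{total:>8}  total bytes")
```

Running something like that against a page and its assets makes it painfully obvious when the icon font outweighs the page it decorates.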
Critical services
I’ve been workshopping this commentary for a few days, and there’s one argument that I’ve heard a few times: “but this doesn’t apply to critical things!” That’s true. If lives depend on your service, you should be maintaining that high availability. Medical services, key communications backbones, and in some cases financial mechanisms: all of these deserve high uptime. To achieve that, however, those services should be paying enough, and staffed well enough, to keep on-call periods reasonable. Where teams are globally distributed, follow-the-sun on-call models can reduce off-hours calls; where they’re not, either have enough staff to handle shifts, or reduce the workload during on-call.
There’s no excuse, in an industry so flush with cash, for on-call rotations that destroy the people working them. Human effort is one of the most limited resources we have; we should be cautious about wasting it needlessly.
Against mediocrity
This is emphatically not a call to accept bad experiences served up in a lackluster manner. I’d argue that the vast majority of the commercial internet is exactly that right now: on a daily basis, I deal with slow-loading sites that show special interstitial graphics just to tell me something is working real hard in the background to load the thing I asked for. (The local movie theatre, for instance, took a long time to give me showtimes for Dune Part 2 when I was booking tickets yesterday.) Instead, I want excellent, simple, easy experiences that take almost no time to load, even if they’ve got some styling, and that just work. Maybe they’re offline when it’s 03:00 because they have to update a database. Maybe I should be offline at 03:00, too.
Fundamentally, I think the usability and utility of websites can be made better, with far less resource consumption. This isn’t a new gripe or statement, but dammit, I gotta say it myself too. Make the web more human-scale, for everyone.