The new 0.19.0 release of Lemmy is out! 🎉

To take advantage of some of the new features, such as the previously mentioned account import/export feature, The Outpost will be performing some maintenance at 9PM EST to start this upgrade - that is about 11 hours from the time of this post. The release notes indicate that the upgrade should take less than 30 minutes, but given the hardware that The Outpost is running on, it might run a bit over that 30-minute window once the backup and the actual database migration are accounted for. Of course, we’ll keep you updated on the progress over at our status page just so that you’re kept in the loop!

Keep in mind that 0.19.0 has some breaking API changes which may impact third-party clients and alternative frontends that have not been updated in a while; however, most have already published updates beforehand to take this into account.

The BitForged Space will also be updating to this release about an hour before The Outpost, just to ensure that there are no major problems that would force us to fully reverse course (at the moment it is just Myth and I on that instance, so the worst-case scenario would only impact us). This also means that, assuming all goes well, you can import your settings from The Outpost over there once the upgrade has completed on both instances.
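For those who want to do the migration by hand (or script it), the flow should look roughly like the sketch below. To be clear, the buttons under your profile settings in the web UI are the easy route; the endpoint paths, the bearer-token header, and the placeholder domain for The BitForged Space are my understanding of the 0.19 API rather than anything official, so treat this as a sketch.

```sh
# Rough sketch of exporting settings from The Outpost and importing them on another
# instance. Assumes 0.19's header-based auth and the user settings export/import
# endpoints; <bitforged-domain> is a placeholder - use the actual instance address.

OUTPOST_JWT="..."     # login token for your account on The Outpost
curl -s -H "Authorization: Bearer $OUTPOST_JWT" \
  "https://outpost.zeuslink.net/api/v3/user/export_settings" > my-settings.json

BITFORGED_JWT="..."   # login token for your account on The BitForged Space
curl -s -X POST \
  -H "Authorization: Bearer $BITFORGED_JWT" \
  -H "Content-Type: application/json" \
  --data @my-settings.json \
  "https://<bitforged-domain>/api/v3/user/import_settings"
```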

On an additional note, even though 0.19.0 is out now, I don’t have any plans to immediately decommission The Outpost afterwards, of course. There is still no timeline on this as of yet, given that we’re still sorting out what hardware we’re going to keep and what will be getting dropped - only once that’s settled will a timeline start to take shape. Once it has been drafted up, it’ll be posted here to keep everyone updated as well.

If you have any questions, please don’t hesitate to let me know!

  • russjr08@outpost.zeuslink.net (OP) · 11 months ago

    Well, that took a lot more blood, sweat, and tears than I thought it would.

    Usually when performing an update, I do the following (there’s a rough command sketch right after this list):

    • Take a snapshot of the VM
    • Change the version number in the Lemmy docker-compose.yml file for both the lemmy and lemmy-ui containers
    • Re-create the containers, and start following the logs
    • If the database migration (if any) appears to be taking longer than expected, I temporarily disable the reverse-proxy so that Lemmy isn’t getting slammed while trying to perform database migrations (and then re-enable it once complete)
    • Upon any issues, examine where things might’ve gone wrong, adjust if needed, and in the worst-case scenario roll back to the snapshot created at the start
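
    To make steps 2 and 3 a bit more concrete, here’s a rough sketch of what that looks like (the image names are the ones published on Docker Hub; the service names and file layout are assumptions, so adjust to your own compose file):

    ```sh
    # Bump the image tags in docker-compose.yml, e.g. (service names assumed):
    #   lemmy:    image: dessalines/lemmy:0.19.0
    #   lemmy-ui: image: dessalines/lemmy-ui:0.19.0

    docker-compose pull lemmy lemmy-ui     # fetch the new images
    docker-compose up -d lemmy lemmy-ui    # re-create the containers on the new version
    docker-compose logs -f lemmy           # follow the logs (database migrations show up here)
    ```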

    Everything was going to plan until step 4, the database migrations. After about 30 minutes of migrations running, I shut off external access to the instance. Once we hit the hour-and-a-half mark, I went ahead and stopped the VM and began rolling back to the snapshot… Except a snapshot restore doesn’t normally take all that long (maybe an hour at most), so when I came back 3 hours later and saw it had only completed about 20% of the restore, that’s where things really started going wrong. It seemed like the whole hypervisor was practically buckling while attempting to perform the restore.

    So I thought, okay, I’ll move it back to hypervisor “A” (“Zeus”)… except I had forgotten why I originally migrated it to hypervisor “B” (“Atlas”) in the first place: Zeus was running critically low on storage, and could no longer host the VM for the instance. I thought “Okay, sure, we’ll continue running it on Atlas then, let me go re-enable the reverse-proxy (which is what allows external traffic into Lemmy, since the containers/VM are on an internal network)”… which then led me to find out that the reverse-proxy VM was… dead.

    It was running Nginx, and nothing seemed to show any errors, but I figured “Let’s try out Caddy” (which I’ve started using on our new systems) - that didn’t work either. It was at that point that I realized I couldn’t even ping that VM from its public IP address, even after dropping the firewall. Outbound traffic worked fine, none of the configs had changed, no other firewalls were in place… just nothing. Except I could get two replies to a continuous ping in the window between the VM initializing and finishing startup; after that it was once again silent.
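
    For reference, the Caddy attempt was about as minimal as it gets - something along these lines, with the upstream address and port as placeholders for wherever the Lemmy stack listens on the internal network (so the config itself really shouldn’t have been the problem):

    ```sh
    # Hypothetical minimal Caddyfile for the reverse-proxy VM; the upstream IP/port
    # are placeholders for the Lemmy VM on the internal network.
    cat > /etc/caddy/Caddyfile <<'EOF'
    outpost.zeuslink.net {
        reverse_proxy 10.0.0.10:80
    }
    EOF
    systemctl reload caddy

    # And the connectivity test that ultimately gave it away - the VM only answered
    # a couple of pings while booting, then went silent:
    ping -c 4 <public-ip-of-proxy-vm>
    ```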

    So, I went ahead and made some more storage available on Zeus by deleting some VMs (including my personal Mastodon instance - thankfully, I had already migrated my account over to our new Mastodon instance a week before) and attempted to restore Lemmy onto Zeus. Still, the same slow-restore behavior was happening even on this hypervisor, and everything on it came to a crawl while the restore was ongoing.

    This time I just let the restore run, which took numerous hours. It finally completed, so I shut down just about every other VM and container on the hypervisor, once again followed my normal upgrade steps, and crossed my fingers. It still took about 30 minutes for the database migrations to complete, but they did end up completing. I enabled the reverse-proxy config, updated the DNS record for the domain to point back to Zeus, and within 30 seconds I could see federation traffic coming in once again.
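
    For anyone curious, the post-recovery sanity check boils down to something like this (/api/v3/site is Lemmy’s standard site-info endpoint; the log filter is just a loose way to eyeball inbound federation traffic, not an exact match on Lemmy’s log format):

    ```sh
    dig +short outpost.zeuslink.net     # confirm the DNS record points back at Zeus
    curl -s -o /dev/null -w '%{http_code}\n' https://outpost.zeuslink.net/api/v3/site   # is the API answering?
    docker-compose logs -f lemmy | grep -iE 'inbox|activity'   # rough filter for inbound federation traffic
    ```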

    What an adventure, to say the least. I still haven’t been able to determine why both hypervisors come to a crawl with very little running on them. I suspect one or more drives are failing, but it’s odd for that to occur on both hypervisors at around the same time, and none of the drives’ SMART data shows any indication of failure (or even precursors to failure), so I honestly do not know. It does, however, tell me that it’s pretty much time to sunset these systems sooner rather than later, since the combination of the systems and the range of IP addresses that I have for them comes out to roughly $130 a month. While I could probably request that most of the hardware be swapped out and completely rebuild them from scratch, it’s just not worth the hassle considering that my friend and I have picked up a much newer system (the one mentioned in my previous announcement post), and with us splitting the cost it comes out to about the same price.
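
    For reference, the drive checks were the usual smartctl variety (requires smartmontools; /dev/sda below is just an example device name - run it against whatever disks the hypervisor actually has):

    ```sh
    smartctl -H /dev/sda    # overall health self-assessment (PASSED/FAILED)
    smartctl -A /dev/sda    # attribute table - reallocated, pending, and uncorrectable
                            # sector counts are the usual early-warning signs
    ```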

    Given this, the plan at this point is to renew these two systems for one more month when the 5th comes around, meaning that they will both be decommissioned on the 5th of February. This is to give everyone a chance to migrate their profile settings from The Outpost over to The BitForged Space as both instances are now running Lemmy 0.19.0 (to compare, the instance over at BitForged took not even five minutes to complete its database migrations - I spent more time verifying everything was alright) and to also give myself a bit more time to ensure I can get all of my other personal services migrated over, along with any important data.

    I’ve had these systems for about three years now, and they’ve served me quite well! However, it’s very clear from the combination of dated specs and the lack of a more coherent setup (I was quite new to server administration at the time) that it’s time to close out this chapter and turn the page.

    Oh, and to top off the whole situation, my status page completely died during the process too - the container was running (I was still receiving numerous notifications as various services went up and down), but inbound access wasn’t working either… so I couldn’t even provide an update on what was going on. I am sorry to have inconvenienced everyone with how long the update process took, and it wasn’t my intention to make it seem as if The Outpost had completely vanished off the planet. However, I figured it was worth spending my time on bringing the instance back online instead of side-tracking to investigate what happened to the status page.

    Anyways, with all that being said, we’re back for now! But it is time for everyone to finish their last drink while we wrap things up.