I have a load-bearing Raspberry Pi on my network: it runs a DNS server, zigbee2mqtt, a UniFi controller, and a restic REST server. This Raspberry Pi, as is tradition, boots from a microSD card. As we all know, microSD cards suck a little bit and die pretty often; I had one fail on me not all that long ago.
I'd like to keep a reasonably up-to-date hot spare ready, so when it does give up the ghost I can just swap them out and move on with my life. I can think of a few ways to accomplish this, but I'm not really sure what's the best:
- The simplest is probably cron + dd, but I'm worried about filesystem corruption from imaging a running system. Could this also wear out the spare card?
- Recreate the partition structure, create an fstab with new UUIDs, and rsync everything else. Backups are incremental and we won't get filesystem corruption, but we still aren't taking a point-in-time backup, which means data files could be inconsistent with each other (honestly unlikely with the services I'm running).
- Migrate to BTRFS or ZFS, send/receive snapshots. This would be annoying to set up because I'd need to switch the rpi's filesystem, but once done I think this might be the best option? We get incremental updates, point-in-time backups, and even rollback on the original card if I want it.
I'm thinking out loud a little bit here, but do y'all have any thoughts? I think I'm leaning towards ZFS or BTRFS.
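For the rsync option, here is a rough sketch of what the sync step might look like. It assumes the spare card is already partitioned, has its own fstab, and has its root partition mounted somewhere; the function name sync_root and the mount point in the usage comment are made up for illustration. Because --one-file-system stops at mount boundaries, a separate boot partition would need its own run.

```shell
#!/bin/sh
# sync_root SRC DST
# Mirrors the tree at SRC into DST, preserving hard links, ACLs and
# extended attributes. /etc/fstab is excluded so the spare keeps its
# own copy (with its own UUIDs); excluded files are also protected
# from --delete on the receiving side.
sync_root() {
    src=$1
    dst=$2
    rsync -aHAX --delete --one-file-system \
        --exclude=/etc/fstab \
        "$src"/ "$dst"/
}

# Hypothetical usage, with the spare's root partition mounted at /mnt/spare:
#   sync_root / /mnt/spare
```

This is still not a point-in-time copy, so files that change mid-run can end up inconsistent with each other, as noted in the list above.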
I've always used dd + sshfs to back up the entire SD card daily at midnight to an SSH server, retaining 2 weeks of backups.
Should the card die, I've just gotta write the last backup to a new card and pop it in. If that one's not good, I've got 13 others I can try.
I've only had to use it once and that went smoothly. I've tested half a dozen backups though and no issues there either.
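A retention policy like this could be sketched as follows. This is not necessarily how the poster does it; it assumes each image carries a sortable date in its name (for example host.YYYY.MM.DD.img), and prune_images is a made-up helper name.

```shell
#!/bin/sh
# prune_images DIR KEEP
# Deletes all but the KEEP newest *.img files in DIR. Shell globs expand
# in sorted order, so with a YYYY.MM.DD date in each name the oldest
# images come first in the list.
prune_images() {
    dir=$1
    keep=$2
    set -- "$dir"/*.img
    # If nothing matched, the pattern is left as-is and does not exist.
    [ -e "$1" ] || return 0
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}

# Hypothetical usage, keeping two weeks of daily images:
#   prune_images /mnt/backup 14
```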
Do you dd the device directly, while the system is running?
Yeah, "dd if=/dev/mmcblk0 of=$HOSTNAME.$(date +%Y.%m.%d).img", and while it's running. (!!! Make sure the output is NOT going to the SD card you are backing up.)
I deliberately chose a time when it's not very active to perform the backup. Never had an issue, going on 6 years now.
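For anyone wanting to script this, a minimal sketch of that dd job with a guard against writing the image onto the card being imaged. The function name image_device and the mount point in the usage comment are assumptions, and the guard is only best-effort: it compares the device name df reports for the destination against the source, so it will miss systems that report the root filesystem under a generic name such as /dev/root. The bs, conv=fsync and status=none options assume GNU dd.

```shell
#!/bin/sh
# image_device SRC DEST_DIR
# Copies SRC to DEST_DIR/<hostname>.<YYYY.MM.DD>.img.
image_device() {
    src=$1
    dest_dir=$2
    [ -d "$dest_dir" ] || { echo "no such directory: $dest_dir" >&2; return 1; }

    # Best-effort guard: find the filesystem backing DEST_DIR and refuse
    # if its device name starts with SRC (e.g. /dev/mmcblk0p2 vs /dev/mmcblk0).
    dest_fs=$(df -P "$dest_dir" | awk 'NR == 2 { print $1 }')
    case $dest_fs in
        "$src"*) echo "refusing: $dest_dir lives on $src" >&2; return 1 ;;
    esac

    out="$dest_dir/$(uname -n).$(date +%Y.%m.%d).img"
    # Write under a temporary name and rename only on success, so an
    # interrupted run never leaves a file that looks like a finished image.
    dd if="$src" of="$out.part" bs=4M conv=fsync status=none || return 1
    mv "$out.part" "$out"
}

# Hypothetical usage, with an sshfs mount at /mnt/backup:
#   image_device /dev/mmcblk0 /mnt/backup
```

Imaging a mounted, running filesystem can still capture it mid-write, which is why picking a quiet time matters.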