• Re: backup of backup or alternating backups?

    From Henrik Ahlgren@21:1/5 to Default User on Mon Sep 30 19:00:02 2024
    On Mon, 2024-09-30 at 12:39 -0400, Default User wrote:
    But of course, any errors on drive A propagate daily to drive B.

    Having both drives connected and spinning simultaneously creates a
    window of opportunity for some nasty ransomware (or a software bug,
    mistake, power surge, whatever) to destroy both backups. Of course it
    is safer to always have one copy offline.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Woodall@21:1/5 to Default User on Mon Sep 30 20:30:01 2024
    On Mon, 30 Sep 2024, Default User wrote:

    Hi!

    On a thread at another mailing list, someone mentioned that they, each
    day, alternate doing backups between two external usb drives. That got
    me to thinking (which is always dangerous) . . .

    I have a full backup on usb external drive A, "refreshed" daily using rsnapshot. Then, every day, I use rsync to make usb external drive B an "exact" copy of usb external drive A. It seemed to be a good idea,
    since if drive A fails, I can immediately plug in drive B to replace
    it, with no down time, and nothing lost.
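
    (Concretely, such a mirror step is roughly the following; the mount
    points are placeholders:)

      # make drive B an exact copy of drive A; -H preserves the hard links
      # rsnapshot creates between snapshots, --delete removes anything that
      # is no longer on A
      rsync -aH --delete /mnt/backup-a/ /mnt/backup-b/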

    But of course, any errors on drive A propagate daily to drive B.

    So, is there a consensus on which would be better: 
    1) continue to "mirror" drive A to drive B?
    or,
    2) alternate backups daily between drives A and B?


    IMO it can take days, weeks even months to discover that something has
    got corrupted and/or deleted in error.

    I don't think either strategy is "better", they have different pros and
    cons. But in particular, your strategy doesn't require both drives to be
    online at once and at least gives you a one day window to discover that
    you've synced corruption.

    I think my strategy would be something more akin to the following (I
    think rsnapshot can do this but I've not actually used it)

    1. Alternate disks (as you are doing)
    2. Create a new directory YYYYMMDD and backup into that directory,
    creating hard links to the files from the previous backup (two days
    before)
    3. Delete the oldest directories as/when you start running out of space.
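
    A rough, untested sketch of that with plain rsync and --link-dest
    (mount points, source path and retention count are all made up):

      #!/bin/bash
      set -eu

      # 1. Alternate disks: even days since the epoch -> A, odd days -> B.
      if (( $(date +%s) / 86400 % 2 == 0 )); then
          dest=/mnt/backup-a
      else
          dest=/mnt/backup-b
      fi

      today=$(date +%Y%m%d)
      # Newest dated directory already on this disk (from two days before).
      prev=$(ls -1d "$dest"/2???????/ 2>/dev/null | tail -n 1)

      # 2. Back up into a new YYYYMMDD directory, hard-linking files that
      #    are unchanged relative to the previous backup on this disk.
      rsync -aH ${prev:+--link-dest="$prev"} /home/ "$dest/$today/"

      # 3. Prune the oldest directories as space runs out, e.g. keep 30.
      ls -1d "$dest"/2???????/ | head -n -30 | xargs -r rm -rf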


    On a slight tangent, how does rsnapshot deal with ext4 uninited extents?
    These are subtly different to sparse files, they're still not written to
    disk but the disk blocks are explicitly reserved for the file:

    truncate (sparse file) vs fallocate (blocks reserved)

    I've noticed that, at least on bookworm, lseek for SEEK_HOLE/SEEK_DATA
    treats fallocate as a hole similar to a sparse file. I haven't tested
    tar with the --sparse option but I suspect it will treat the two types
    of file the same too.
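
    A quick way to set the two cases up side by side for testing (file
    names are arbitrary):

      truncate  -s 1G sparse.img     # hole only: no blocks reserved
      fallocate -l 1G prealloc.img   # uninited extents: blocks reserved, not written

      du -h --apparent-size sparse.img prealloc.img   # both report 1G
      du -h sparse.img prealloc.img                   # on-disk usage differs
      filefrag -v prealloc.img       # on ext4 the extents are flagged "unwritten"

    How lseek(SEEK_DATA), tar --sparse or rsnapshot handle the second file
    is exactly the open question.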

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jonathan Dowland@21:1/5 to Default User on Mon Sep 30 23:00:01 2024
    On Mon Sep 30, 2024 at 5:39 PM BST, Default User wrote:
    So, is there a consensus on which would be better:
    1) continue to "mirror" drive A to drive B?
    or,
    2) alternate backups daily between drives A and B?

    I'd go for (2), especially if you're continuing to do daily backups, so
    the older backup with an alternating pattern is at most 2 days old.

    I do daily backups to a permanently-connected drive (I used to use
    rsnapshot for that, then rdiff-backup, now I use borg); monthly syncs of
    that drive to one of two external drives (rsync of the borg repository),
    which live off-site and I alternate those. If I lose my
    permanently-connected backup drive, one of the external drives is a
    month old and the other two months old, which I am happy with.


    --
    Please do not CC me for listmail.

    πŸ‘±πŸ» Jonathan Dowland
    ✎ jmtd@debian.org
    πŸ”— https://jmtd.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Kjörling@21:1/5 to All on Mon Sep 30 23:40:01 2024
    On 30 Sep 2024 13:12 -0400, from hunguponcontent@gmail.com (Default User):
    Having both drives connected and spinning simultaneously creates a
    window of opportunity for some nasty ransomware (or a software bug,
    mistake, power surge, whatever) to destroy both backups.

    Also why I would not want all backup-storage devices connected
    simultaneously. All it takes is one piece of software going haywire
    and you may have a situation where both the original and all backups
    are corrupted simultaneously.


    Of course it is safer to always have one copy offline.

    True. But easier (and cheaper) said than done. [...]

    Not at all. Backup to one of those external drives one day; the other
    one the next; the first one the day after that; and so on.

    It seems to me that you already have everything you need to remove
    this particular failure mode. You just need to tweak your usage
    slightly.

    I do that myself, except I switch approximately weekly rather than
    daily.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se
    “Remember when, on the Internet, nobody cared that you were a dog?”

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Christensen@21:1/5 to Default User on Tue Oct 1 03:20:01 2024
    On 9/30/24 09:39, Default User wrote:
    Hi!

    On a thread at another mailing list, someone mentioned that they, each
    day, alternate doing backups between two external usb drives. That got
    me to thinking (which is always dangerous) . . .

    I have a full backup on usb external drive A, "refreshed" daily using rsnapshot. Then, every day, I use rsync to make usb external drive B an "exact" copy of usb external drive A. It seemed to be a good idea,
    since if drive A fails, I can immediately plug in drive B to replace
    it, with no down time, and nothing lost.

    But of course, any errors on drive A propagate daily to drive B.

    So, is there a consensus on which would be better:
    1) continue to "mirror" drive A to drive B?
    or,
    2) alternate backups daily between drives A and B?


    I migrated my data to a dedicated ZFS file server several years ago, in
    part due to advanced ZFS backup features -- snapshots, compression, de-duplication, replication, etc. I used FreeBSD, but Debian has ZFS
    and should be able to do the same thing.


    My live server has a ZFS pool with two striped mirrors of two 3 TB HDD's
    each and a special mirror of two 180 GB SSD's:

    2024-09-30 16:44:38 toor@f5 ~
    # zpool iostat -v p5
                                      capacity     operations     bandwidth
    pool                            alloc   free   read  write   read  write
    ------------------------------  -----  -----  -----  -----  -----  -----
    p5                              3.19T  2.39T     49      2  28.4M  69.2K
      mirror-0                      1.58T  1.14T     21      0  14.0M  10.7K
        gpt/hdd1.eli                    -      -      8      0  6.99M  5.35K
        gpt/hdd2.eli                    -      -     12      0  6.99M  5.35K
      mirror-1                      1.58T  1.13T     20      0  14.0M  10.4K
        gpt/hdd3.eli                    -      -     10      0  7.00M  5.20K
        gpt/hdd4.eli                    -      -      9      0  7.00M  5.20K
    special                             -      -      -      -      -      -
      mirror-2                      29.4G   120G      7      2   408K  48.1K
        gpt/ssd1.eli                    -      -      3      1   204K  24.1K
        gpt/ssd2.eli                    -      -      3      1   204K  24.1K
    ------------------------------  -----  -----  -----  -----  -----  -----


    The 'special' SSD mirror stores metadata, which improves overall
    performance.


    I create ZFS filesystems for groups of data -- Samba users, CVS
    repository, rsync(1) backups of various non-ZFS filesystems, raw disk
    image backups, etc.


    ZFS has various properties that you can tune for each filesystem. Here
    is the filesystem for my Samba data:

    2024-09-30 16:50:07 toor@f5 ~
    # zfs get all p5/samba/dpchrist | sort | egrep 'NAME|inherited'
    NAME               PROPERTY               VALUE                      SOURCE
    p5/samba/dpchrist  atime                  off                        inherited from p5
    p5/samba/dpchrist  com.sun:auto-snapshot  true                       inherited from p5
    p5/samba/dpchrist  compression            on                         inherited from p5
    p5/samba/dpchrist  dedup                  verify                     inherited from p5
    p5/samba/dpchrist  mountpoint             /var/local/samba/dpchrist  inherited from p5/samba
    p5/samba/dpchrist  special_small_blocks   16K                        inherited from p5


    'atime' is off to eliminate metadata writes when files and directories
    are read.


    'com.sun:auto-snapshot' is true so that zfs-auto-snapshot(8) run via
    crontab(1) will find this filesystem, take snapshots periodically
    (daily, monthly, yearly), and manage (prune) those snapshots:

    2024-09-30 16:54:00 toor@f5 ~
    # crontab -l
    9 3 * * * /usr/local/sbin/zfs-auto-snapshot -k d 40
    21 3 1 * * /usr/local/sbin/zfs-auto-snapshot -k m 99
    27 3 1 1 * /usr/local/sbin/zfs-auto-snapshot -k y 99


    I currently have 96 snapshots (i.e. backups) of the above filesystem
    going back three and a half years:

    2024-09-30 16:59:48 dpchrist@f5 ~
    $ ls -d /var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_[dmy]* | wc -l
    96

    2024-09-30 17:01:12 dpchrist@f5 ~
    $ ls -dt /var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_[dmy]* | tail -n 1
    /var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_m-2020-03-01-00h21


    'compression' is on so that compressible files are compressed. (The
    default compression algorithm will skip files that are incompressible.)


    'dedup' is on so that duplicate blocks are saved only once within the
    pool. De-duplication metadata is stored on the pool 'special' SSD
    mirror, which improves de-duplication performance.


    'special_small_blocks' is set to 16K so that files of size 16 KiB and
    smaller are stored on the pool 'special' SSD mirror, which improves
    small file read and write performance.
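
    (Those values all show as inherited from the pool root, so they would
    have been set once at the top -- roughly like this, as a sketch rather
    than the actual command history:)

      # zfs set atime=off p5
      # zfs set compression=on p5
      # zfs set dedup=verify p5
      # zfs set special_small_blocks=16K p5
      # zfs set com.sun:auto-snapshot=true p5
      # zfs set mountpoint=/var/local/samba p5/samba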


    I have a backup server with matching pool construction. I periodically replicate live server snapshots to the backup server (via SSH pull and pre-shared keys). I would like to automate this task.
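
    (That replication step amounts to something like the following, run on
    the backup server as a pull; the host, pool, and snapshot names here
    are placeholders:)

      # ssh root@live zfs send -R -I p5@2024-09-01 p5@2024-09-30 \
            | zfs receive -Fdu backuppool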


    Both servers have SATA HDD mobile rack bays:

    https://www.startech.com/en-us/hdd/drw150satbk


    I have a pair of 6 TB HDD's in corresponding mobile rack trays, one for near-site backups and one for off-site backups. Each HDD contains one ZFS
    pool. I periodically insert the near-site HDD into the backup server
    and replicate the live server snapshots to the removable HDD. I
    periodically rotate the near-site HDD and the off-site HDD.


    Be warned that ZFS has a non-trivial learning curve. I suggest the
    Lucas books if you are interested:

    https://mwl.io/nonfiction/os#fmzfs

    https://mwl.io/nonfiction/os#fmaz


    David

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Monnier@21:1/5 to All on Tue Oct 1 16:20:01 2024
    Also why I would not want all backup-storage devices connected simultaneously. All it takes is one piece of software going haywire
    and you may have a situation where both the original and all backups
    are corrupted simultaneously.

    You can minimize this risk by having them both connected simultaneously
    but to different machines (this is also necessary if you want A and B to
    be in different physical locations, e.g. to survive disasters), and then
    make sure the machine which copies from A to B doesn't have write access
    to A.
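
    In practice that can be as simple as having the B machine pull (host
    name and paths here are illustrative):

      # run on the machine holding B; it only ever reads from A over SSH
      rsync -aH --delete hostA:/srv/backup-a/ /srv/backup-b/
      # a forced command such as "rrsync -ro /srv/backup-a" on hostA can
      # additionally restrict that SSH key to read-only access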


    Stefan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dan Purgert@21:1/5 to Default User on Wed Oct 2 17:40:01 2024
    On Sep 30, 2024, Default User wrote:
    (...)
    So, is there a consensus on which would be better: 
    1) continue to "mirror" drive A to drive B?
    or,
    2) alternate backups daily between drives A and B?

    Primarily, I do (1); though every so often I do a variation of (2).

    Backups from all the PCs in the house go to drive "A" (a spare desktop
    in the basement playing server) as a daily process. This is performed
    with rsync in a cronjob on the PCs dumping to dated directories and
    symlinking the "current", so the next run just hardlinks anything that
    hasn't changed.
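
    That looks roughly like this per client (paths and the host name are
    placeholders, not the actual cron job):

      today=$(date +%F)
      rsync -aH --link-dest=/srv/backup/$HOSTNAME/current \
          /home/ "server:/srv/backup/$HOSTNAME/$today/"
      ssh server ln -sfn "$today" "/srv/backup/$HOSTNAME/current"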

    "Drive A" is backed up to "Drive B" (an external USB SSD; only mounted
    for the copy job, then unmounted afterwards). Every 6 months or so
    (yes, this should be more frequent, but meh) I do this with "Drive C",
    which otherwise lives at the parents' house.


    --
    |_|O|_|
    |_|_|O| Github: https://github.com/dpurgert
    |O|O|O| PGP: DDAB 23FB 19FA 7D85 1CC1 E067 6D65 70E5 4CE7 2860


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jonathan Dowland@21:1/5 to Default User on Sun Oct 6 20:50:01 2024
    On Wed Oct 2, 2024 at 12:33 AM BST, Default User wrote:
    May I ask why you decided to switch from rsnapshot to rdiff-backup, and
    then to borg?

    Sure!

    At the time I was using rsnapshot, I was subscribed to some very high
    traffic mailing lists (such as LKML), and storing the mail in Maildir
    format (=1 file per email). rsnapshot's design of lots of hardlinks for
    files that are present in more than one backup increment proved very
    expensive at the time (I switched to rdiff-backup in around 2006-2007).

    I have a lot of time for rdiff-backup, I think it's very well designed.
    It addressed the problem I had with rsnapshot, and the backup format is
    simple enough and well documented that you could feasibly write other
    tools to read from it, should you need to. That gave me confidence.

    The main issue I hit with rdiff-backup was if I wanted to move files
    or directories containing large files around on my storage: that
    resulted in the new locations being considered "new", and the next
    backup increment being comparatively large. This reduced the number
    of increments I could fit into my backup storage (= shorter horizon
    for restores, although I've never had to restore back in time a great
    deal), and I found I started limiting the amount of moving around I
    was doing of large file (size) trees, to avoid that happening.

    I switched to Borg in Summer 2020 mainly to address that (Borg
    de-duplicates files and stores them content-addressed, after a fashion,
    so file moves don't grow increment sizes). At the time rdiff-backup was
    not being actively developed; that has changed. I was nervous about
    Borg's significant increase in complexity, but I've been running it for
    four years now and it's been fine.


    --
    Please do not CC me for listmail.

    πŸ‘±πŸ» Jonathan Dowland
    ✎ jmtd@debian.org
    πŸ”— https://jmtd.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From eben@gmx.us@21:1/5 to Jonathan Dowland on Sun Oct 6 22:30:01 2024
    On 10/6/24 14:44, Jonathan Dowland wrote:
    On Wed Oct 2, 2024 at 12:33 AM BST, Default User wrote:
    May I ask why you decided to switch from rsnapshot to rdiff-backup, and
    then to borg?

    The main issue I hit with rdiff-backup was if I wanted to move files
    or directories containing large files around on my storage: that
    resulted in the new locations being considered "new", and the next
    backup increment being comparatively large.


    I use rdiff to do the backups on the "server" (its job is serving video
    content to the TV box over NFS) and ran into that problem, so what I did was write a series of scripts that relinked identical files. It's not perfect,
    I suspect there are still bugs. It tries to be efficient (by not comparing files that can't possibly be the same because they have different sizes, or
    are already linked), but it gets the job done. Eventually. Running it
    takes about as long as running the backup in the first place. But hey,
    we're talking about 1 GiB of filespace which might change by 10-20 MiB
    between backups, so not a big deal.
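
    The core of such a relinking pass, very roughly (not the actual
    scripts; paths are placeholders):

      # given two candidate paths in consecutive increments: skip pairs that
      # differ in size or are already the same inode, otherwise compare and
      # re-link
      old=prev/some/file; new=curr/some/file
      if [ "$(stat -c %s "$old")" -eq "$(stat -c %s "$new")" ] &&
         [ "$(stat -c %i "$old")" -ne "$(stat -c %i "$new")" ] &&
         cmp -s -- "$old" "$new"
      then
          ln -f -- "$old" "$new"   # replace the duplicate with a hard link
      fi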

    --
    We're standing there pounding a dead parrot on the counter,
    and the management response is to frantically swap in new counters
    to see if that fixes the problem.
    -- Peter Gutmann

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michel Verdier@21:1/5 to Jonathan Dowland on Mon Oct 7 10:40:08 2024
    On 2024-10-06, Jonathan Dowland wrote:

    At the time I was using rsnapshot, I was subscribed to some very high
    traffic mailing lists (such as LKML), and storing the mail in Maildir
    format (=1 file per email). rsnapshot's design of lots of hardlinks for files that are present in more than one backup increment proved very expensive at the time (I switched to rdiff-backup in around 2006-2007).

    Do you mean expensive in inodes? Which filesystem did you use?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jonathan Dowland@21:1/5 to Michel Verdier on Mon Oct 7 21:50:01 2024
    On Mon Oct 7, 2024 at 9:37 AM BST, Michel Verdier wrote:
    Do you mean expensive in inodes? Which filesystem did you use?

    It was 18 years ago so I can't remember that clearly, but I think it was
    a mixture of inodes expense and an enlarged amount of CPU time with the
    file churn (mails moved from new to cur, and later to a separate archive Maildir, that sort of thing). It was probably ext3 given the time.


    --
    Please do not CC me for listmail.

    πŸ‘±πŸ» Jonathan Dowland
    ✎ jmtd@debian.org
    πŸ”— https://jmtd.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jonathan Dowland@21:1/5 to eben on Mon Oct 7 21:50:01 2024
    On Sun Oct 6, 2024 at 9:24 PM BST, eben wrote:
    I use rdiff to do the backups on the "server" (its job is serving video content to the TV box over NFS) and ran into that problem, so what I did was write a series of scripts that relinked identical files. It's not perfect,
    I suspect there are still bugs. It tries to be efficient (by not comparing files that can't possibly be the same because they have different sizes, or are already linked), but it gets the job done. Eventually.

    That's a neat solution!



    --
    Please do not CC me for listmail.

    πŸ‘±πŸ» Jonathan Dowland
    ✎ jmtd@debian.org
    πŸ”— https://jmtd.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dan Ritter@21:1/5 to eben@gmx.us on Mon Oct 7 22:30:01 2024
    eben@gmx.us wrote:

    I use rdiff to do the backups on the "server" (its job is serving video content to the TV box over NFS) and ran into that problem, so what I did was write a series of scripts that relinked identical files. It's not perfect,
    I suspect there are still bugs. It tries to be efficient (by not comparing files that can't possibly be the same because they have different sizes, or are already linked), but it gets the job done. Eventually. Running it
    takes about as long as running the backup in the first place. But hey,
    we're talking about 1 GiB of filespace which might change by 10-20 MiB between backups, so not a big deal.


    Possibly of interest: Debian package rdfind:

    Description: find duplicate files utility
    rdfind is a program to find duplicate files and optionally list, delete
    them or replace them with symlinks or hard links. It is a command
    line program written in c++, which has proven to be pretty quick compared
    to its alternatives.
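
    e.g. to replace duplicates under a backup tree with hard links (path is
    illustrative; do a dry run first):

      rdfind -dryrun true /srv/backup
      rdfind -makehardlinks true /srv/backup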

    -dsr-

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From eben@gmx.us@21:1/5 to Dan Ritter on Mon Oct 7 22:50:01 2024
    On 10/7/24 16:06, Dan Ritter wrote:
    eben@gmx.us wrote:

    I use rdiff to do the backups on the "server" ... and ran into that
    problem, so what I did was write a series of scripts that relinked
    identical files.

    Possibly of interest: Debian package rdfind:

    Description: find duplicate files utility rdfind is a program to find duplicate files and optionally list, delete them or replace them with symlinks or hard links. It is a command line program written in c++,
    which has proven to be pretty quick compared to its alternatives.

    That's cool. I wonder if the apt subsystem on there still works. The installation's pretty old. It's off 90+% of the time and behind double-NAT
    the rest, so I'm not excessively worried.

    --
    I'm an apatheist. The question is no longer
    interesting, and the answer no longer matters.

    -- petro on ASR

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Charles Curley@21:1/5 to Jonathan Dowland on Tue Oct 8 04:00:02 2024
    On Mon, 07 Oct 2024 20:44:44 +0100
    "Jonathan Dowland" <jmtd@debian.org> wrote:

    It was 18 years ago so I can't remember that clearly, but I think it
    was a mixture of inodes expense and an enlarged amount of CPU time
    with the file churn (mails moved from new to cur, and later to a
    separate archive Maildir, that sort of thing). It was probably ext3
    given the time.

    Interesting.

    I've used rsnapshot for several years now with no such issue. My
    rsnapshot repository resides on ext4, on its own LVM logical volume, on
    top of an encrypted RAID 5 array on four four terabyte spinning rust
    drives.

    root@hawk:~# df /crc/rsnapshot/
    Filesystem                            Size  Used Avail Use% Mounted on
    /dev/mapper/hawk--vg--raid-rsnapshot  247G  179G   55G  77% /crc/rsnapshot
    root@hawk:~# df -i /crc/rsnapshot/
    Filesystem                           Inodes IUsed IFree IUse% Mounted on
    /dev/mapper/hawk--vg--raid-rsnapshot    16M  3.2M   13M   21% /crc/rsnapshot
    root@hawk:~#

    As you can see, I am not greatly worried about running out of inodes.

    I have 11G of mail, also in maildir format, to back up. Since the
    archive goes back a year, I probably have more than 11G in the archive.
    Plus other stuff: /etc, etc., etc..

    As for the churn, that should be less of an issue now than it might have
    been 18 years ago, even though my motherboard dates to 2015. I
    definitely notice it (the hard drive activity light if nothing else),
    but it doesn't slow me down at all.

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From tomas@tuxteam.de@21:1/5 to Jonathan Dowland on Tue Oct 8 06:40:01 2024
    On Mon, Oct 07, 2024 at 08:44:44PM +0100, Jonathan Dowland wrote:
    On Mon Oct 7, 2024 at 9:37 AM BST, Michel Verdier wrote:
    Do you mean expensive in inodes? Which filesystem did you use?

    It was 18 years ago so I can't remember that clearly, but I think it was
    a mixture of inodes expense and an enlarged amount of CPU time with the
    file churn (mails moved from new to cur, and later to a separate archive Maildir, that sort of thing). It was probably ext3 given the time.

    Note that the transition to Ext4 must have been around 2006, making
    huge directories viable (HTree). So perhaps this is a factor too.

    Cheers
    --
    t


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michel Verdier@21:1/5 to Jonathan Dowland on Tue Oct 8 10:50:02 2024
    On 2024-10-07, Jonathan Dowland wrote:

    It was 18 years ago so I can't remember that clearly, but I think it was
    a mixture of inodes expense and an enlarged amount of CPU time with the
    file churn (mails moved from new to cur, and later to a separate archive Maildir, that sort of thing). It was probably ext3 given the time.

    OK, I see. I use nnml so I have no new/cur moves. Also I add the dateext
    parameter for logrotate so old logs keep the same name.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Smith@21:1/5 to Michel Verdier on Tue Oct 8 20:00:01 2024
    Hi,

    On Tue, Oct 08, 2024 at 10:41:33AM +0200, Michel Verdier wrote:
    I add the dateext parameter for logrotate so old logs keep the same name.

    This is another drawback to the design of rsnapshot. It doesn't matter
    that the files in your backup retain the same path: if they differ in
    any way, you'll get a new copy.

    i.e. if you have a 1GiB log file /var/log/somelog and you append one
    byte to it, rsync will take care of only transferring one byte, but both
    the old 1GiB file and the new 1GiB-and-1-byte version will be stored in
    their entirety in your backups.

    Other backup systems would chunk these files and recognise that the vast
    majority of the new file is the same as the old file and only store
    those chunks once. But they would be more complicated than rsnapshot.

    "Differing at all" can also include mere metadata changes such as
    permissions, ownership or times, since all hardlinked versions of a file
    share all this metadata.
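
    You can see this directly by comparing inode numbers across two
    snapshot directories after touching a large file (the snapshot root and
    backup-point names below depend on rsnapshot.conf and are invented):

      printf x >> /var/log/somelog      # change the 1GiB file by one byte
      rsnapshot daily                   # assumes a "daily" retain level
      ls -i /srv/rsnapshot/daily.1/localhost/var/log/somelog \
            /srv/rsnapshot/daily.0/localhost/var/log/somelog
      # different inode numbers: two complete, unshared copies on disk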

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Smith@21:1/5 to Charles Curley on Wed Oct 9 00:20:01 2024
    Hi,

    On Mon, Oct 07, 2024 at 07:52:55PM -0600, Charles Curley wrote:
    I've used rsnapshot for several years now with no such issue. My
    rsnapshot repository resides on ext4, on its own LVM logical volume, on
    top of an encrypted RAID 5 array on four four terabyte spinning rust
    drives.

    root@hawk:~# df -i /crc/rsnapshot/
    Filesystem                           Inodes IUsed IFree IUse% Mounted on
    /dev/mapper/hawk--vg--raid-rsnapshot    16M  3.2M   13M   21% /crc/rsnapshot

    This really isn't that much data and you have four drives to spread
    random reads across, so I'm not surprised that you don't really feel it
    yet.

    When you have hundreds of millions of files in rsnapshot it really
    starts to hurt because every backup run involves:

    - Deleting the oldest tree of files;
    - Walking the entire tree of the most recent backup once to cp -l it and
    then;
    - Walking it all again when rsync compares the new data to your previous
    iteration.

    Worse, it's all small, largely random IO which is worst case for
    spinning media. It easily gets to the point where the copy and compare
    steps take much longer than the actual data transfer.

    Other backup solutions get better performance by using some sort of
    index, manifest or other database, not just by walking every inode in
    the filesystem. But they are then more complicated.

    This rsnapshot I have is really quite slow with only two 7200rpm HDDs.
    It spends way longer walking its data store than actually backing up any
    data. I could definitely make it speedier by switching to something
    else. But I like rsnapshot for this particular case.

    $ sudo find /data/backup/rsnapshot -print0 | grep -zc '.'
    202326554

    (This is a btrfs filesystem which doesn't report an inode count with df
    -i)

    Although it probably matters most how many files you have only in the
    most recent backup iteration rather than the entire rsnapshot store. For
    me that is approx 5.8 million.

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Wright@21:1/5 to tomas@tuxteam.de on Wed Oct 9 04:30:01 2024
    On Tue 08 Oct 2024 at 06:37:43 (+0200), tomas@tuxteam.de wrote:
    On Mon, Oct 07, 2024 at 08:44:44PM +0100, Jonathan Dowland wrote:
    On Mon Oct 7, 2024 at 9:37 AM BST, Michel Verdier wrote:
    Do you mean expensive in inodes? Which filesystem did you use?

    It was 18 years ago so I can't remember that clearly, but I think it was
    a mixture of inodes expense and an enlarged amount of CPU time with the file churn (mails moved from new to cur, and later to a separate archive Maildir, that sort of thing). It was probably ext3 given the time.

    Note that the transition to Ext4 must have been around 2006, making
    huge directories viable (HTree). So perhaps this is a factor too.

    Perhaps you're on the inside track with respect to Debian.
    I didn't use ext4 at all until it was added to the squeeze
    installer (Feb 2011), and only when I was sure that a lenny
    ext3 installation would not need to read a file from a
    squeeze-written ext4 partition on the same machine.

    I think I eliminated my last ext3 partition at the end of 2014.
    (I was extremely conservative with my laptop during part of
    2013/2014, as I was totally reliant on this sole machine to be
    trouble-free.)

    Cheers,
    David.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michel Verdier@21:1/5 to Andy Smith on Wed Oct 9 11:00:01 2024
    On 2024-10-08, Andy Smith wrote:

    When you have hundreds of millions of files in rsnapshot it really
    starts to hurt because every backup run involves:

    - Deleting the oldest tree of files;

    rsnapshot can rename it out of the way and delete it after the backup is
    done, so the deletion only involves the backup system.
    - Walking the entire tree of the most recent backup once to cp -l it and
    then;

    rsnapshot only renames directories when rotating backups then does rsync
    with hard links to the newest

    This rsnapshot I have is really quite slow with only two 7200rpm HDDs.
    It spends way longer walking its data store than actually backing up any data. I could definitely make it speedier by switching to something
    else. But I like rsnapshot for this particular case.

    On 7200rpm HDDs I was using xfs over RAID1 and the slowest/blocking part
    was the deletion

    Although it probably matters most how many files you have only in the
    most recent backup iteration rather than the entire rsnapshot store. For
    me that is approx 5.8 million.

    I don't remember exactly, but I was probably around your volume.
    rsync uses metadata so it also depends on the filesystem. Some are
    quicker. I think metadata is quite like the index used by other backup
    systems.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Smith@21:1/5 to Michel Verdier on Wed Oct 9 12:40:01 2024
    Hi,

    On Wed, Oct 09, 2024 at 10:57:12AM +0200, Michel Verdier wrote:
    On 2024-10-08, Andy Smith wrote:

    When you have hundreds of millions of files in rsnapshot it really
    starts to hurt because every backup run involves:

    - Deleting the oldest tree of files;

    rsnapshot can rename it out of the way and delete it after the backup is
    done, so the deletion only involves the backup system.

    Yes but this is still a necessary part of each backup cycle. You can't
    do another backup run while that job is still outstanding and the load
    it puts on the system is still there regardless of the timing within the
    backup procedure.

    - Walking the entire tree of the most recent backup once to cp -l it and
    then;

    rsnapshot only renames directories when rotating backups then does rsync
    with hard links to the newest

    Okay yes when you set link_dest to 1 in rsnapshot.conf then rsync will
    do that bit during its run, but having to hard link a directory tree of
    5 million files is not speedy. Other backup designs do not do this
    because they don't need to take any form of copies of what is already
    there. The point is that this step is "compare and hard link if
    unchanged" whereas usually it is "compare and do nothing if unchanged".
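
    (For reference, that is the following line in rsnapshot.conf, where
    fields must be separated by tabs:)

      link_dest	1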

    rsync uses metadata so it also depends on the filesystem. Some are
    quicker. I think metadata is quite like the index used by other backup systems.

    The big difference is that to read the metadata of a tree of files in
    the filesystem you have to walk through all the inodes which is a lot of
    small random access.

    70 years of database theory has tried to make queries efficient and
    minimise random access, maximise cache locality etc. Otherwise all
    databases would just be filesystems!

    Like I say I like and use rsnapshot in some places, but speed and
    resource efficiency are not its winning points.

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Monnier@21:1/5 to All on Wed Oct 9 17:00:02 2024
    Like I say I like and use rsnapshot in some places, but speed and
    resource efficiency are not its winning points.

    I have never used Rsnapshot, but I used Rsync backups for many years and
    then moved to Bup. The time to perform backups has been *very*
    substantially shortened by moving to Bup.

    The size of the backup repository is also nicely reduced (probably a mix
    of compression and of deduplication between files on different hosts
    that are backed up to the same repository).

    It is also much less demanding on the backup server, both in terms of
    RAM use and CPU time (I use low-power SBCs for that job).

    A full restore from Bup can be fairly slow, OTOH. Luckily, I've only
    ever had to fetch a few files from the backup (via `fuse`), but it does
    make it more costly to *use* your backup (e.g. I used to have a script
    which tracked the size of the last set of backed up files, as a way to
    detect unexpected changes in this size, but that is now impractical).
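
    For reference, a minimal Bup cycle plus the fuse-style restore looks
    roughly like this (repository path and file names are made up, and the
    exact layout under the mount point may differ):

      export BUP_DIR=/srv/bup
      bup init                      # once, to create the repository
      bup index /home               # scan for changed files
      bup save -n home /home        # store a new "home" backup set

      # pull a few files back out through the FUSE view of the repository
      mkdir -p /mnt/bup && bup fuse /mnt/bup
      cp /mnt/bup/home/latest/home/me/notes.txt /tmp/
      fusermount -u /mnt/bup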


    Stefan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)