• Re: rsync: source and destination drive data used sizes differ

    From Andy Smith@21:1/5 to Default User on Sun Jan 19 03:20:01 2025
    Hi Default,

    On Sat, Jan 18, 2025 at 08:36:42PM -0500, Default User wrote:
    So, back to the original question: what in the world am I supposed to
    do to have rsync copy so that the size change in the two drives is
    equal, and DRIVE2 has (theoretically) the same data, taking up the same space, as DRIVE1?

    -----------------------------------------------------------------------

    BTW, I forgot to mention, FWIW, that the Borgbackup and rsnapshot
    backups are of /home/user and its subdirectories only. Everything else
    (all the system stuff) is backed up using Timeshift. The Timeshift
    data, like everything else, is part of the stuff on DRIVE1.

    I do not think that Borgbackup uses sparse files or hardlinks so there shouldn't be any difference there, but rsnapshot definitely does use
    hardlinks and if you used the recommended rsync arguments in rsnapshot
    then it would also retain the sparseness of any sparse files from
    whatever you backed up. I don't know about Timeshift.

    So, all that to say, unless you use -H and -S on your new rsync there
    will definitely be a discrepancy.
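
    To see why -H and -S matter for the "used" figures, here is a minimal sketch that runs in a throwaway directory. It assumes GNU coreutils (truncate, du --apparent-size, stat -c); the file names are made up for the demo and have nothing to do with the actual drives.

```shell
#!/bin/sh
# Demo in a temporary directory: how sparse files and hard links affect used space.
demo=$(mktemp -d)

# A 100 MiB sparse file: large apparent size, few or no allocated blocks.
truncate -s 100M "$demo/sparse.img"
apparent_kb=$(du -k --apparent-size "$demo/sparse.img" | cut -f1)
allocated_kb=$(du -k "$demo/sparse.img" | cut -f1)
echo "sparse.img: apparent ${apparent_kb} KiB, allocated ${allocated_kb} KiB"

# Two names for one inode: the data is stored only once.
printf 'some data\n' > "$demo/original"
ln "$demo/original" "$demo/hardlink"
link_count=$(stat -c %h "$demo/original")
echo "original has ${link_count} links"

rm -rf "$demo"
```

    A copy made without -S writes the holes out as real zeros, and a copy made without -H stores each hard-linked name as a separate file, so in both cases the destination uses more space than the source.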

    I expect different fragmentation levels to also cause a discrepancy,
    with the new drive using less space as it started off empty and with no fragmentation.

    Beyond that, if I were worried I would compare the file tree file by
    file with a tool like sha512sum or better still xxhsum (from xxhash
    package) as that will likely be MUCH faster on an amd64 machine¹. If all
    the files were the same content then I would stop worrying.
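
    A minimal sketch of that file-by-file comparison, assuming GNU find, sort and xargs. The helper name tree_sums and the temporary directories are made up for the demo; on the real drives the two arguments would be the DRIVE1 and DRIVE2 mount points. It uses sha512sum because it is always installed; xxhsum should be a drop-in swap if the xxhash package is present.

```shell
#!/bin/sh
# tree_sums DIR: print "checksum  ./relative/path" for every regular file, sorted by path.
tree_sums() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha512sum )
}

# Stand-ins for the two drives.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
printf 'hello\n' > "$work/src/a.txt"
printf 'world\n' > "$work/src/sub/b.txt"
cp -a "$work/src/." "$work/dst/"

tree_sums "$work/src" > "$work/src.sums"
tree_sums "$work/dst" > "$work/dst.sums"
if diff -q "$work/src.sums" "$work/dst.sums" >/dev/null; then same=identical; else same=different; fi
echo "fresh copy: $same"

# Change one file on the copy; the comparison should now notice it.
printf 'changed\n' > "$work/dst/sub/b.txt"
tree_sums "$work/dst" > "$work/dst.sums"
if diff -q "$work/src.sums" "$work/dst.sums" >/dev/null; then changed=identical; else changed=different; fi
echo "after edit: $changed"

rm -rf "$work"
```

    Running diff without -q on the two checksum lists shows which paths differ or are missing.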

    Thanks,
    Andy

    ¹ Last year I wrote this:

    https://strugglers.net/~andy/mothballed-blog/2024/04/20/for-file-integrity-testing-youre-wasting-your-time-with-md5/

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From eben@gmx.us@21:1/5 to Default User on Sun Jan 19 04:00:01 2025
    On 1/18/25 21:50, Default User wrote:
    Hi Andy!

    Thanks for the reply.

    I may just delete everything on DRIVE2 overnight,

    Might be faster to mkfs than to rm *.

  • From Charles Curley@21:1/5 to Default User on Sun Jan 19 05:10:01 2025
    On Sat, 18 Jan 2025 20:36:42 -0500
    Default User <hunguponcontent@gmail.com> wrote:

    So, back to the original question: what in the world am I supposed to
    do to have rsync copy so that the size change in the two drives is
    equal, and DRIVE2 has (theoretically) the same data, taking up the
    same space, as DRIVE1?

    I suggest that instead of using rsync directly you use rsnapshot. You
    can set it up so that it only copies if DRIVE2 is there. The cron
    entries let it happen automatically.
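
    In rsnapshot itself the relevant setting is no_create_root in rsnapshot.conf. For a plain rsync job the same "only if the drive is there" guard can be sketched in shell as below. This is an illustration, not part of rsnapshot: drive_present is a made-up helper name, and it assumes the mountpoint utility from util-linux.

```shell
#!/bin/sh
# drive_present DIR: succeed only if DIR is currently a mount point.
drive_present() {
    mountpoint -q "$1"
}

target=/media/user/DRIVE2
if drive_present "$target"; then
    echo "would run the backup to $target"
else
    echo "$target is not mounted; skipping"
fi
```

    Without such a guard, a copy started while the drive is unplugged writes into the empty mount directory on the internal disk instead.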

    What I do is different from what you are doing. I have an SSD where all
    my live stuff exists. I have four 4TB drives set up as a RAID. I use
    amanda for backups from the SSD to the RAID, analogous, I take it, to
    your DRIVE1. I have three USB 4TB drives which I rotate between my
    offsite location and take with me on trips, analogous to your DRIVE2. I
    use rsnapshot to back part of the RAID up to the USB drives in rotation.

    My offsite drives do vary in free space, but diffing between the RAID
    and the offsites indicates that the data is intact.

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/

  • From David Christensen@21:1/5 to Default User on Sun Jan 19 05:50:01 2025
    On 1/18/25 17:27, Default User wrote:
    Hi!

    I have two identical 4TB USB external drives, Western Digital Model WDC WD40NDZW-11A8JS1. My computer is a Dell Inspiron 15 3000 Model 3511 (a
    very modest laptop), from early 2024, running Debian 12 Stable, always
    kept updated.


    Thank you for that information.



    The first drive, Drive 1, is my "backup drive". I backup daily using Borgbackup Version 1.2.4 from the Debian Stable repositories, and rsnapshot Version 1.4.5-1, also from the Debian Stable repositories.

    It also has a whole bunch of other archival programs, data and image
    files on it as well.

    sudo df -h /media/user/DRIVE1
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1       3.6T  1.5T  2.0T  42% /media/user/DRIVE1

    sudo du -sh /media/user/DRIVE1
    1.5T /media/user/DRIVE1

    I am trying to use the other drive, DRIVE2, as an exact copy of DRIVE1 (except for the unused space on the drive). The only difference between
    the two drives is that I just re-formatted DRIVE2 and set it up so that
    the disk is now LUKS 1 encrypted.

    sudo df -h /media/user/DRIVE2
    Filesystem                                             Size  Used Avail Use% Mounted on
    /dev/mapper/luks-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  3.6T  1.2T  2.3T  35% /media/user/DRIVE2

    sudo du -sh /media/user/DRIVE2
    1.2T /media/user/DRIVE2

    Every night, I have been using rsync to copy from DRIVE1 to DRIVE2,
    doing:

    time sudo rsync -avvv --human-readable --delete --numeric-ids \
        --info=progress2,stats2,name2 \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        /media/user/DRIVE1/ /media/user/DRIVE2/ ; date

    And each time, DRIVE2 would be using more and more space, much more
    than DRIVE1!

    Finally, I decided to re-format DRIVE2 and set it up for LUKS. Then I
    tried again to use rsync to write everything from DRIVE1 to DRIVE2, all
    at once. It ran for 39 hours, using more and more space until it ran
    out of space, without finishing.

    So, I deleted everything from DRIVE2 (which surprisingly took many
    hours), keeping it set up for LUKS, and used rsync again to copy
    everything from DRIVE1 to DRIVE2, this time trying:

    time sudo rsync -aHAXUSxvvv --human-readable --delete --numeric-ids \
        --info=progress2,stats2,name2 \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        /media/user/DRIVE1/ /media/user/DRIVE2/ ; date

    which again took many hours, so it was done over multiple sessions.

    (BTW, I don't know if the -U option helps; it may just screw things
    up.)

    Now, DRIVE2 is still using considerably LESS space than DRIVE1!

    Is that due to using -H for hard links? Or -S for sparse files? Or
    both? Or neither?

    I have not yet done any further copying with rsync to see if the rate
    of change to DRIVE1 and DRIVE2 will be equal. But I really doubt it.

    So, back to the original question: what in the world am I supposed to
    do to have rsync copy so that the size change in the two drives is
    equal, and DRIVE2 has (theoretically) the same data, taking up the same space, as DRIVE1?



    IF:

    You have two HDDs with the same number of blocks (sectors).


    AND

    "I am trying to use the other drive, DRIVE2, as an exact copy of DRIVE1".


    THEN:

    1. Unmount DRIVE1

    2. Unmount DRIVE2.

    3. Use dd(1) to copy *ALL* of the blocks of DRIVE1 to DRIVE2.

    4. Mount DRIVE1 *READONLY*, run df(1), run du(1), and unmount DRIVE1.

    5. Mount DRIVE2 *READONLY*, run df(1), run du(1), and unmount DRIVE2.

    6. The outputs of #4 and #5 should be identical.
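
    Step 3 can be tried out safely first. The sketch below uses two ordinary files as stand-ins for the drives, so nothing real is overwritten; the image names are made up, and status=none assumes GNU dd. On real hardware the if= and of= arguments would be the two whole-disk devices, and getting them the wrong way round destroys the source.

```shell
#!/bin/sh
work=$(mktemp -d)

# Two "drives" of identical size, as plain files (8 MiB each).
dd if=/dev/urandom of="$work/drive1.img" bs=1M count=8 status=none
dd if=/dev/zero    of="$work/drive2.img" bs=1M count=8 status=none

# Step 3: copy every block of drive1 onto drive2.
dd if="$work/drive1.img" of="$work/drive2.img" bs=1M conv=notrunc status=none

# Verify the copy byte for byte.
if cmp -s "$work/drive1.img" "$work/drive2.img"; then clone_ok=yes; else clone_ok=no; fi
echo "clone identical: $clone_ok"

rm -rf "$work"
```

    Note that a block-level copy onto the raw target device replaces everything on it, including a LUKS header, so the result is a clone of DRIVE1 as it is rather than an encrypted copy.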


    David

  • From eben@gmx.us@21:1/5 to Default User on Sun Jan 19 06:50:01 2025
    On 1/18/25 22:21, Default User wrote:
    Hi, Eben!

    I hate to sound stupid, but how would I do that? I have never used mkfs before.

    I've never used LUKS before, so we're even. With a non-encrypted
    filesystem, you would
        unmount the partition
        mkfs -t whatever <other options> /dev/whatever
        mount it again

    Or whatever you did the first time to format it? Do it again. Make sure
    it's unmounted first.

  • From tomas@tuxteam.de@21:1/5 to Default User on Sun Jan 19 08:50:01 2025
    On Sat, Jan 18, 2025 at 08:27:17PM -0500, Default User wrote:
    Hi!

    [...]

    Every night, I have been using rsync to copy from DRIVE1 to DRIVE2,
    doing:

    time sudo rsync -avvv --human-readable --delete --numeric-ids \
        --info=progress2,stats2,name2 \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        /media/user/DRIVE1/ /media/user/DRIVE2/ ; date

    And each time, DRIVE2 would be using more and more space, much more
    than DRIVE1!

    I can think of two things which might keep rsync from deleting
    stuff in your target dir.

    - Errors. Look for "skipping file deletion" in the logs.
      You might want to add --ignore-errors, but this obviously
      has a downside -- I'd rather understand why there are
      errors :)
    - Files piling up in excluded subdirs (however they might end
      up there in the first place). --delete-excluded will take
      care of that (although I'd be very curious and look into
      those excluded dirs first).

    Cheers

  • From Michel Verdier@21:1/5 to eben@gmx.us on Sun Jan 19 10:30:01 2025
    On 2025-01-19, eben@gmx.us wrote:

    I've never used LUKS before, so we're even. With a non-encrypted
    filesystem, you would
        unmount the partition
        mkfs -t whatever <other options> /dev/whatever
        mount it again

    It's the same with luks and the device used is a mapping in /dev/mapper
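
    Spelled out for this case, the three steps might look like the dry run below. It only prints the commands and touches nothing. The helper name print_reformat_steps is made up, the mapping name is the placeholder from the df output earlier in the thread, and ext4 is purely an assumption: use whatever filesystem the drive was formatted with the first time.

```shell
#!/bin/sh
# print_reformat_steps MAPPING MOUNTPOINT FSTYPE
# Dry run: only prints the commands; nothing is unmounted or formatted.
print_reformat_steps() {
    mapping=$1 mountpoint=$2 fstype=$3
    echo "sudo umount $mountpoint"
    echo "sudo mkfs -t $fstype $mapping"
    echo "sudo mount $mapping $mountpoint"
}

# Placeholder mapping name; ext4 is an assumption, not something stated in the thread.
steps=$(print_reformat_steps /dev/mapper/luks-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /media/user/DRIVE2 ext4)
printf '%s\n' "$steps"
```

    The mapping under /dev/mapper only exists while the LUKS container is unlocked, and running mkfs on the mapping rather than on the underlying partition leaves the LUKS header in place.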

  • From Charles Curley@21:1/5 to Charles Curley on Sun Jan 19 15:30:01 2025
    On Sat, 18 Jan 2025 21:01:14 -0700
    Charles Curley <charlescurley@charlescurley.com> wrote:

    I suggest that instead of using rsync directly you use rsnapshot. You
    can set it up so that it only copies if DRIVE2 is there. The cron
    entries let it happen automatically.

    Another advantage to rsnapshot is that you don't have to fiddle with
    all of rsync's options. They can get quite fiddly.

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/

  • From Eduardo M KALINOWSKI@21:1/5 to All on Sun Jan 19 17:10:01 2025
    On 19/01/2025 08:57, David wrote:
    On Sun, 19 Jan 2025 at 02:51, Default User <hunguponcontent@gmail.com> wrote:
    time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids \
        --info=progress2,stats2,name2 \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        /media/user/DRIVE1/ /media/user/DRIVE2/ ; date

    Also, your use of --exclude looks completely wrong to me.

    I do not see anything in 'man rsync' (version 3.2.7) that agrees with your use of brace characters {}.

    In fact the manpage says: "--exclude options take one rule/pattern each",
    as I have shown above, which is not what you have.

    That's a shell feature, it will expand to multiple --exclude options.

    That does seem somewhat complicated, though. --exclude=/dev should be
    more concise (and perhaps slightly more efficient).

    And for those specific exclusions (except /lost+found),
    --one-file-system is probably even better, unless there is a second
    filesystem mounted somewhere inside /media/user/DRIVE1 (which seems
    unlikely).

  • From Greg Wooledge@21:1/5 to Eduardo M KALINOWSKI on Sun Jan 19 17:30:01 2025
    On Sun, Jan 19, 2025 at 12:43:51 -0300, Eduardo M KALINOWSKI wrote:
    On 19/01/2025 08:57, David wrote:
    On Sun, 19 Jan 2025 at 02:51, Default User <hunguponcontent@gmail.com> wrote:
    time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids \
        --info=progress2,stats2,name2 \
        --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
        /media/user/DRIVE1/ /media/user/DRIVE2/ ; date

    In fact the manpage says: "--exclude options take one rule/pattern each", as I have shown above, which is not what you have.

    That's a shell feature, it will expand to multiple --exclude options.

    It's called brace expansion, if you're looking for it in the bash manual.
    In this case, it could be written more concisely:

    --exclude=/{dev,proc,sys,tmp,run,mnt,media}/\* --exclude=/lost+found

    The only character that needs to be quoted is the *, so there's no
    need to include all those extra double quotes. The starting and ending
    slash, and the ending *, are common to all the patterns except for
    /lost+found, so those can be moved outside of the braces. Then, since

    And for those specific exclusions (except /lost+found), --one-file-system is probably even better, unless there is a second filesystem mounted somewhere inside /media/user/DRIVE1 (which seems unlikely).

    They might have separate /var and /usr or something. Who knows.
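
    The two brace-expansion spellings of the exclude list can be checked against each other without running rsync at all, by letting bash print the words it would pass. A small sketch; the explicit bash -c is there only because brace expansion is a bash feature and is not part of POSIX sh.

```shell
#!/bin/sh
# The long form from the original command.
long_form=$(bash -c 'printf "%s\n" --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"}')

# The shorter form with the common parts moved outside the braces.
short_form=$(bash -c 'printf "%s\n" --exclude=/{dev,proc,sys,tmp,run,mnt,media}/\* --exclude=/lost+found')

# One --exclude option per line, exactly as rsync would receive them.
printf '%s\n' "$short_form"

if [ "$long_form" = "$short_form" ]; then
    echo "both forms expand to the same options"
fi
```

    Each pattern ends up as its own --exclude=... argument, which is consistent with the manpage's "one rule/pattern each".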

  • From Greg Wooledge@21:1/5 to David on Mon Jan 20 01:40:02 2025
    On Mon, Jan 20, 2025 at 00:08:54 +0000, David wrote:
    I would have recognised this
    echo a{1..5}b
    as brace expansion, but I hadn't absorbed the extra glorious
    capabilities of its commas.

    The commas were the original form. The .. range feature was added in
    bash version 3.0.
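
    For comparison, the two forms side by side, as a small sketch (again run through bash -c, since neither form exists in plain POSIX sh):

```shell
#!/bin/sh
# Range form: every integer from 1 to 5.
range_form=$(bash -c 'echo a{1..5}b')

# Comma form: an explicit list of alternatives.
comma_form=$(bash -c 'echo a{1,3,5}b')

echo "$range_form"
echo "$comma_form"
```

    The first prints a1b a2b a3b a4b a5b and the second a1b a3b a5b.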
