• USB SSD randomly unmounting

    From Pancho@3:770/3 to All on Fri Jul 7 11:45:24 2023
    I had a 500 GB Samsung EVO 850, connected to an rPi4 via a USB to SATA adapter, shared via Samba. It had worked this way for years.

    A couple of weeks ago it went offline, and wouldn't remount. I switched
    off power and restarted, and it worked OK, for a bit, but it dismounted
    again, after about 24 hours. I changed the USB/SATA cable, but the
    problem persisted.

    As this was my main server, which I needed to work, I bought a new SSD,
    and copied everything across. Everything now works fine.

    The thing is, I can't now see anything wrong with the problematic SSD.
    fsck says it is OK. smartctl says it is OK, but I can't run a long test
    (the long test always says "Aborted by host" with 90% remaining). The
    SSD now stays mounted on another machine, still via USB.

    My guess is there was something like a bad block, which caused the SSD
    to dismount when it was accessed. Now that the SSD isn't being used
    heavily, the problem just doesn't show up and it stays mounted.
    Previously, it was being used for a security camera, so a fair bit of
    writing.

    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    --- SoupGate-Win32 v1.05
    * Origin: Agency HUB, Dunedin - New Zealand | Fido<>Usenet Gateway (3:770/3)
  • From Jan Panteltje@3:770/3 to Pancho.Jones@proton.me on Fri Jul 7 11:36:24 2023
    On a sunny day (Fri, 7 Jul 2023 11:45:24 +0100) it happened Pancho <Pancho.Jones@proton.me> wrote in <u88qc5$1a435$1@dont-email.me>:

    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    I had a similar problem with an old Pi,
    and it turned out to be the power supply module.
    It is now replaced with an original raspi supply, no more problems.
    What did / does
    dmesg
    show?
    I sometimes see this on my Pi4 8GB:
    [1484748.627699] hwmon hwmon1: Voltage normalised
    [1484750.707706] hwmon hwmon1: Undervoltage detected!
    [1484754.867040] hwmon hwmon1: Voltage normalised
    that is with a 3.8 TB USB Toshiba harddisk on a Siecom USB hub,
    so far no crashes... Runs on a UPS..
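
    To see how long each undervoltage episode lasts, the hwmon lines can be paired up. What follows is a minimal Python sketch, assuming dmesg lines in the format shown above; undervoltage_episodes is an illustrative helper name, not an existing tool.

```python
import re

# Matches kernel lines such as:
#   [1484750.707706] hwmon hwmon1: Undervoltage detected!
EVENT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+hwmon \S+: (Undervoltage detected!|Voltage normalised)"
)


def undervoltage_episodes(lines):
    """Return (start, duration) pairs in seconds for each completed episode."""
    episodes = []
    start = None
    for line in lines:
        m = EVENT.search(line)
        if not m:
            continue
        stamp, what = float(m.group(1)), m.group(2)
        if what == "Undervoltage detected!":
            start = stamp
        elif start is not None:
            # "Voltage normalised" closes the episode that is currently open.
            episodes.append((start, stamp - start))
            start = None
    return episodes


# The three lines quoted above.
sample = [
    "[1484748.627699] hwmon hwmon1: Voltage normalised",
    "[1484750.707706] hwmon hwmon1: Undervoltage detected!",
    "[1484754.867040] hwmon hwmon1: Voltage normalised",
]

for start, duration in undervoltage_episodes(sample):
    print(f"undervoltage at {start:.6f}s lasted {duration:.3f}s")
```

    On a Pi the same function could be fed the lines of saved dmesg output. A "Voltage normalised" line with no preceding detection, like the first sample line, is simply skipped.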

  • From Pancho@3:770/3 to Computer Nerd Kev on Fri Jul 7 13:58:34 2023
    On 07/07/2023 13:44, Computer Nerd Kev wrote:
    Perhaps there was some service unmounting it automatically because
    it thought the drive had been disconnected? Anyway it shouldn't be
    able to happen due to read/write errors.

    Thx, I was just guessing. I posted a bit of syslog in response to Jan.
    It didn't make much sense to me, but maybe it will to someone else.

  • From Computer Nerd Kev@3:770/3 to Pancho on Fri Jul 7 22:44:25 2023
    Pancho <Pancho.Jones@proton.me> wrote:
    My guess is there was something like a bad block, which caused the SSD
    to dismount when it was accessed.

    That isn't a normal response in Linux. For ext file systems the
    behaviour is:

    errors={continue|remount-ro|panic}
    Define the behavior when an error is encountered. (Either ignore
    errors and just mark the filesystem erroneous and continue, or
    remount the filesystem read-only, or panic and halt the system.)
    The default is set in the filesystem superblock, and can be
    changed using tune2fs(8).
    (from EXT4(5))

    For FAT filesystems it's:

    errors={panic|continue|remount-ro}
    Specify FAT behavior on critical errors: panic, continue without
    doing anything, or remount the partition in read-only mode
    (default behavior).
    (from MOUNT(8))

    So between continuing and halting the system, the only other option
    should be remounting read-only, not unmounting the filesystem.

    Perhaps there was some service unmounting it automatically because
    it thought the drive had been disconnected? Anyway it shouldn't be
    able to happen due to read/write errors.
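
    To check which of these policies a mounted filesystem is actually using, the mount options can be read back. The sketch below is only an illustration: it assumes text in the /proc/mounts layout (device, mount point, type, options), the sample lines are hypothetical, and ext_errors_policy is a made-up helper name.

```python
def ext_errors_policy(mounts_text):
    """Map each ext2/3/4 mount point to its errors= mount option, if one is listed."""
    policies = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("ext2", "ext3", "ext4"):
            continue
        mount_point, options = fields[1], fields[3].split(",")
        policy = None
        for opt in options:
            if opt.startswith("errors="):
                policy = opt.split("=", 1)[1]
        policies[mount_point] = policy
    return policies


# Hypothetical /proc/mounts content; on a real system one could pass
# open("/proc/mounts").read() instead.
sample = """\
/dev/mmcblk0p2 / ext4 rw,noatime 0 0
/dev/sda1 /mnt/ssd ext4 rw,relatime,errors=remount-ro 0 0
/dev/mmcblk0p1 /boot vfat rw,relatime,errors=remount-ro 0 0
"""

for mount_point, policy in ext_errors_policy(sample).items():
    print(mount_point, policy or "(not listed; superblock default applies)")
```

    When no errors= option is listed, the default stored in the superblock applies, as the man page excerpt says; tune2fs -l reports it as "Errors behavior".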


  • From Pancho@3:770/3 to Jan Panteltje on Fri Jul 7 13:55:10 2023
    On 07/07/2023 12:36, Jan Panteltje wrote:
    I had a similar problem with an old Pi,
    and it turned out to be the power supply module,
    now replaced with original raspi supply, no more problems.
    What did / does
    dmesg
    show?
    I sometimes see this on my Pi4 8GB:
    [1484748.627699] hwmon hwmon1: Voltage normalised
    [1484750.707706] hwmon hwmon1: Undervoltage detected!
    [1484754.867040] hwmon hwmon1: Voltage normalised
    that is with a 3.8 TB USB Toshiba harddisk on a Siecom USB hub,
    so far no crashes... Runs on a UPS..



    I don't think undervoltage is the problem.

    Here is a bit of /var/log/syslog.1 which I think is relevant. The
    rsnapshot line is fine; I think the lines after it are where things
    go bad.

    Jun 25 16:00:15 rpi4 rsnapshot[1261324]: /usr/bin/rsnapshot alpha:
    completed successfully
    Jun 25 16:05:01 rpi4 CRON[1261432]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Jun 25 16:05:28 rpi4 kernel: [6195820.664216] sd 0:0:0:0: [sda] tag#20 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD OUT
    Jun 25 16:05:28 rpi4 kernel: [6195820.664245] sd 0:0:0:0: [sda] tag#20
    CDB: Write(10) 2a 00 0d 14 09 5f 00 00 08 00
    Jun 25 16:05:28 rpi4 kernel: [6195820.664781] sd 0:0:0:0: [sda] tag#19 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD OUT
    Jun 25 16:05:28 rpi4 kernel: [6195820.664794] sd 0:0:0:0: [sda] tag#19
    CDB: Write(10) 2a 00 0d 14 09 7f 00 00 08 00
    Jun 25 16:05:28 rpi4 kernel: [6195820.665104] sd 0:0:0:0: [sda] tag#18 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD OUT
    Jun 25 16:05:28 rpi4 kernel: [6195820.665115] sd 0:0:0:0: [sda] tag#18
    CDB: Write(10) 2a 00 25 c9 01 8f 00 00 18 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.024416] sd 0:0:0:0: [sda] tag#27 uas_eh_abort_handler 0 uas-tag 11 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.024451] sd 0:0:0:0: [sda] tag#27
    CDB: Write(10) 2a 00 0b f3 35 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.024927] sd 0:0:0:0: [sda] tag#26 uas_eh_abort_handler 0 uas-tag 10 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.024942] sd 0:0:0:0: [sda] tag#26
    CDB: Write(10) 2a 00 0b f3 31 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.025287] sd 0:0:0:0: [sda] tag#25 uas_eh_abort_handler 0 uas-tag 9 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.025301] sd 0:0:0:0: [sda] tag#25
    CDB: Write(10) 2a 00 0b f3 2d 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.025639] sd 0:0:0:0: [sda] tag#24 uas_eh_abort_handler 0 uas-tag 8 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.025652] sd 0:0:0:0: [sda] tag#24
    CDB: Write(10) 2a 00 0b f3 29 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.025991] sd 0:0:0:0: [sda] tag#23 uas_eh_abort_handler 0 uas-tag 7 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.026004] sd 0:0:0:0: [sda] tag#23
    CDB: Write(10) 2a 00 0b f3 25 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.026327] sd 0:0:0:0: [sda] tag#22 uas_eh_abort_handler 0 uas-tag 6 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.026340] sd 0:0:0:0: [sda] tag#22
    CDB: Write(10) 2a 00 0b f3 21 5f 00 04 00 00
    Jun 25 16:05:43 rpi4 kernel: [6195836.026677] sd 0:0:0:0: [sda] tag#21 uas_eh_abort_handler 0 uas-tag 5 inflight: CMD OUT
    Jun 25 16:05:43 rpi4 kernel: [6195836.026690] sd 0:0:0:0: [sda] tag#21
    CDB: Write(10) 2a 00 0b f3 21 57 00 00 08 00
    Jun 25 16:05:46 rpi4 kernel: [6195838.328427] sd 0:0:0:0: [sda] tag#12 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD
    Jun 25 16:05:46 rpi4 kernel: [6195838.328464] sd 0:0:0:0: [sda] tag#12
    CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
    Jun 25 16:05:46 rpi4 kernel: [6195838.344423] scsi host0: uas_eh_device_reset_handler start
    Jun 25 16:05:51 rpi4 kernel: [6195843.437090] usb 2-1: Disable of device-initiated U1 failed.
    Jun 25 16:05:56 rpi4 kernel: [6195848.556846] usb 2-1: Disable of device-initiated U2 failed.
    Jun 25 16:05:56 rpi4 kernel: [6195848.685697] usb 2-1: reset SuperSpeed
    USB device number 2 using xhci_hcd
    Jun 25 16:05:56 rpi4 kernel: [6195848.705275] usb 2-1: device firmware
    changed
    Jun 25 16:05:56 rpi4 kernel: [6195848.713529] scsi host0: uas_eh_device_reset_handler FAILED err -19
    Jun 25 16:05:56 rpi4 kernel: [6195848.713558] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713569] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713579] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713587] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713596] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713604] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713612] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713620] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713628] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713637] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713645] sd 0:0:0:0: Device
    offlined - not ready after error recovery
    Jun 25 16:05:56 rpi4 kernel: [6195848.713681] sd 0:0:0:0: [sda] tag#12
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=70s
    Jun 25 16:05:56 rpi4 kernel: [6195848.713696] sd 0:0:0:0: [sda] tag#12
    CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.713714] blk_update_request: I/O
    error, dev sda, sector 487857615 op 0x1:(WRITE) flags 0x800 phys_seg 1
    prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.724869] Aborting journal on device sda1-8.
    Jun 25 16:05:56 rpi4 kernel: [6195848.724965] usb 2-1: USB disconnect,
    device number 2
    Jun 25 16:05:56 rpi4 kernel: [6195848.727015] sd 0:0:0:0: [sda] tag#21
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727033] sd 0:0:0:0: [sda] tag#21
    CDB: Write(10) 2a 00 0b f3 21 57 00 00 08 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727042] blk_update_request: I/O
    error, dev sda, sector 200483159 op 0x1:(WRITE) flags 0x0 phys_seg 1
    prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727056] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 2885595 starting
    block 25060395)
    Jun 25 16:05:56 rpi4 kernel: [6195848.727071] Buffer I/O error on device
    sda1, logical block 25052203
    Jun 25 16:05:56 rpi4 kernel: [6195848.727120] sd 0:0:0:0: [sda] tag#22
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727130] sd 0:0:0:0: [sda] tag#22
    CDB: Write(10) 2a 00 0b f3 21 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727137] blk_update_request: I/O
    error, dev sda, sector 200483167 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727163] sd 0:0:0:0: [sda] tag#23
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727172] sd 0:0:0:0: [sda] tag#23
    CDB: Write(10) 2a 00 0b f3 25 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727179] blk_update_request: I/O
    error, dev sda, sector 200484191 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727204] sd 0:0:0:0: [sda] tag#24
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727212] sd 0:0:0:0: [sda] tag#24
    CDB: Write(10) 2a 00 0b f3 29 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727219] blk_update_request: I/O
    error, dev sda, sector 200485215 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727243] sd 0:0:0:0: [sda] tag#25
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727251] sd 0:0:0:0: [sda] tag#25
    CDB: Write(10) 2a 00 0b f3 2d 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727258] blk_update_request: I/O
    error, dev sda, sector 200486239 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727279] sd 0:0:0:0: [sda] tag#26
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727287] sd 0:0:0:0: [sda] tag#26
    CDB: Write(10) 2a 00 0b f3 31 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727294] blk_update_request: I/O
    error, dev sda, sector 200487263 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727317] sd 0:0:0:0: [sda] tag#27
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=42s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727325] sd 0:0:0:0: [sda] tag#27
    CDB: Write(10) 2a 00 0b f3 35 5f 00 04 00 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727332] blk_update_request: I/O
    error, dev sda, sector 200488287 op 0x1:(WRITE) flags 0x4000 phys_seg
    128 prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727351] sd 0:0:0:0: [sda] tag#18
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=58s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727359] sd 0:0:0:0: [sda] tag#18
    CDB: Write(10) 2a 00 25 c9 01 8f 00 00 18 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727366] blk_update_request: I/O
    error, dev sda, sector 633930127 op 0x1:(WRITE) flags 0x0 phys_seg 3
    prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727376] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 12588833 starting
    block 79241266)
    Jun 25 16:05:56 rpi4 kernel: [6195848.727387] Buffer I/O error on device
    sda1, logical block 79233074
    Jun 25 16:05:56 rpi4 kernel: [6195848.727417] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 12588833 starting
    block 79241268)
    Jun 25 16:05:56 rpi4 kernel: [6195848.727428] Buffer I/O error on device
    sda1, logical block 79233075
    Jun 25 16:05:56 rpi4 kernel: [6195848.727437] Buffer I/O error on device
    sda1, logical block 79233076
    Jun 25 16:05:56 rpi4 kernel: [6195848.727460] sd 0:0:0:0: [sda] tag#19
    FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=58s
    Jun 25 16:05:56 rpi4 kernel: [6195848.727468] sd 0:0:0:0: [sda] tag#19
    CDB: Write(10) 2a 00 0d 14 09 7f 00 00 08 00
    Jun 25 16:05:56 rpi4 kernel: [6195848.727475] blk_update_request: I/O
    error, dev sda, sector 219416959 op 0x1:(WRITE) flags 0x0 phys_seg 1
    prio class 0
    Jun 25 16:05:56 rpi4 kernel: [6195848.727485] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 12588834 starting
    block 27427120)
    Jun 25 16:05:56 rpi4 kernel: [6195848.727496] Buffer I/O error on device
    sda1, logical block 27418928
    Jun 25 16:05:56 rpi4 kernel: [6195848.727513] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 12588834 starting
    block 27427116)
    Jun 25 16:05:56 rpi4 kernel: [6195848.727523] Buffer I/O error on device
    sda1, logical block 27418924
    Jun 25 16:05:56 rpi4 kernel: [6195848.727599] sd 0:0:0:0: rejecting I/O
    to offline device
    Jun 25 16:05:56 rpi4 kernel: [6195848.729583] EXT4-fs (sda1):
    ext4_writepages: jbd2_start: 3115 pages, ino 2885595; err -30
    Jun 25 16:05:56 rpi4 kernel: [6195848.729637] Buffer I/O error on dev
    sda1, logical block 60850176, lost sync page write
    Jun 25 16:05:56 rpi4 kernel: [6195848.729648] EXT4-fs error (device
    sda1): ext4_journal_check_start:83: comm MQTT-rpi3a: Detected aborted
    journal
    Jun 25 16:05:56 rpi4 kernel: [6195848.729663] JBD2: Error -5 detected
    when updating journal superblock for sda1-8.
    Jun 25 16:05:56 rpi4 kernel: [6195848.729731] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:56 rpi4 kernel: [6195848.729771] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:56 rpi4 kernel: [6195848.729780] EXT4-fs (sda1): Remounting filesystem read-only
    Jun 25 16:05:56 rpi4 kernel: [6195848.930557] EXT4-fs warning (device
    sda1): ext4_end_bio:344: I/O error 10 writing to inode 2885595 starting
    block 25061375)
    Jun 25 16:05:56 rpi4 kernel: [6195848.930642] EXT4-fs (sda1): failed to
    convert unwritten extents to written extents -- potential data loss!
    (inode 2885595, error -30)
    Jun 25 16:05:56 rpi4 kernel: [6195848.943721] Buffer I/O error on device
    sda1, logical block 25052204
    Jun 25 16:05:56 rpi4 kernel: [6195848.950710] Buffer I/O error on device
    sda1, logical block 25052205
    Jun 25 16:05:56 rpi4 kernel: [6195848.957371] Buffer I/O error on device
    sda1, logical block 25052206
    Jun 25 16:05:56 rpi4 kernel: [6195848.964194] Buffer I/O error on device
    sda1, logical block 25052207
    Jun 25 16:05:57 rpi4 kernel: [6195849.358845] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #2949122: comm tail: reading
    directory lblock 0
    Jun 25 16:05:57 rpi4 kernel: [6195849.358841] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #2949122: comm tail: reading
    directory lblock 0
    Jun 25 16:05:57 rpi4 kernel: [6195849.358919] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:57 rpi4 kernel: [6195849.388672] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:57 rpi4 kernel: [6195849.394937] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:57 rpi4 kernel: [6195849.402628] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:57 rpi4 kernel: [6195849.859353] EXT4-fs error (device
    sda1): ext4_wait_block_bitmap:531: comm MQTT-rpi3a: Cannot read block
    bitmap - block_group = 2418, block_bitmap = 79167490
    Jun 25 16:05:57 rpi4 kernel: [6195849.873920] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:57 rpi4 kernel: [6195849.881928] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:57 rpi4 kernel: [6195849.881954] EXT4-fs error (device
    sda1): ext4_discard_preallocations:5036: comm MQTT-rpi3a: Error -5
    loading buddy information for 2418
    Jun 25 16:05:57 rpi4 kernel: [6195849.901153] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:57 rpi4 kernel: [6195849.908875] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:57 rpi4 systemd[1]: docker-000b2ff9ea29a56854ce883f927b06ef2a3445ee58dc2d4d0638abe4e9d043c0.scope: Deactivated successfully.
    Jun 25 16:05:57 rpi4 systemd[1]: docker-000b2ff9ea29a56854ce883f927b06ef2a3445ee58dc2d4d0638abe4e9d043c0.scope: Consumed 4h 33min 20.171s CPU time.
    Jun 25 16:05:58 rpi4 kernel: [6195850.370638] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #2949122: comm tail: reading
    directory lblock 0
    Jun 25 16:05:58 rpi4 kernel: [6195850.382076] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:58 rpi4 kernel: [6195850.389058] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #2949122: comm tail: reading
    directory lblock 0
    Jun 25 16:05:58 rpi4 kernel: [6195850.390048] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:58 rpi4 kernel: [6195850.407332] Buffer I/O error on dev
    sda1, logical block 0, lost sync page write
    Jun 25 16:05:58 rpi4 kernel: [6195850.415395] EXT4-fs (sda1): I/O error
    while writing superblock
    Jun 25 16:05:58 rpi4 containerd[909]: time="2023-06-25T16:05:58.855133545+01:00" level=info msg="shim
    disconnected" id=000b2ff9ea29a56854ce883f927b06ef2a3445ee58dc2d4d0638abe4e9d043c0
    Jun 25 16:05:58 rpi4 containerd[909]: time="2023-06-25T16:05:58.875593006+01:00" level=warning msg="cleaning
    up after shim disconnected" id=000b2ff9ea29a56854ce883f927b06ef2a3445ee58dc2d4d0638abe4e9d043c0 namespace=moby
    Jun 25 16:05:58 rpi4 containerd[909]: time="2023-06-25T16:05:58.875654302+01:00" level=info msg="cleaning up
    dead shim"
    Jun 25 16:05:58 rpi4 dockerd[218433]: time="2023-06-25T16:05:58.873599939+01:00" level=info msg="ignoring
    event" container=000b2ff9ea29a56854ce883f927b06ef2a3445ee58dc2d4d0638abe4e9d043c0 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
    Jun 25 16:05:58 rpi4 containerd[909]: time="2023-06-25T16:05:58.976245851+01:00" level=warning msg="cleanup
    warnings time=\"2023-06-25T16:05:58+01:00\" level=info msg=\"starting
    signal loop\" namespace=moby pid=1261463 runtime=io.containerd.runc.v2\n"
    Jun 25 16:05:59 rpi4 kernel: [6195851.105773] EXT4-fs warning (device
    sda1): htree_dirblock_to_tree:1072: inode #3145730: lblock 0: comm
    python3: error -5 reading directory block
    Jun 25 16:05:59 rpi4 kernel: [6195851.139008] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #5898243: comm dockerd: reading
    directory lblock 0
    Jun 25 16:05:59 rpi4 dockerd[218433]: time="2023-06-25T16:05:59.100406766+01:00" level=warning msg="failed to
    get endpoint_count map for scope local: open /mnt/ssd/var/lib/docker/network/files/local-kv.db: input/output error"
    Jun 25 16:05:59 rpi4 kernel: [6195851.184699] sd 0:0:0:0: [sda]
    Synchronizing SCSI cache
    Jun 25 16:05:59 rpi4 kernel: [6195851.195331] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #5898243: comm dockerd: reading
    directory lblock 0
    Jun 25 16:05:59 rpi4 kernel: [6195851.216288] EXT4-fs error (device
    sda1): __ext4_find_entry:1663: inode #5898243: comm dockerd: reading
    directory lblock 0
    Jun 25 16:05:59 rpi4 kernel: [6195851.285295] sd 0:0:0:0: [sda]
    Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR
    driverbyte=DRIVER_OK
    Jun 25 16:05:59 rpi4 systemd[1]: Unmounting /mnt/ssd/var/lib/docker/overlay2/5629a6e8d4bea9ddc2810428b2930991bd61f40a11a9ed9dc92b92c8b84f7b09/merged...
    Jun 25 16:05:59 rpi4 dockerd[218433]: time="2023-06-25T16:05:59.142223677+01:00" level=warning msg="failed to
    get endpoint_count map for scope local: open /mnt/ssd/var/lib/docker/network/files/local-kv.db: input/output error"
    Jun 25 16:05:59 rpi4 dockerd[218433]: time="2023-06-25T16:05:59.162758619+01:00" level=warning msg="failed to
    get endpoint_count map for scope local: open /mnt/ssd/var/lib/docker/network/files/local-kv.db: input/output error"
    Jun 25 16:05:59 rpi4 multipathd[470]: sda: path already removed
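
    The CDB bytes in the log above can be cross-checked against the sector numbers the block layer reports. This is a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a, big-endian 4-byte LBA in bytes 2-5, 2-byte transfer length in bytes 7-8); decode_write10 is an illustrative name, not a tool used in this thread.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDBs copied from the syslog excerpt above.
for cdb in (
    "2a 00 0b f3 21 57 00 00 08 00",
    "2a 00 0b f3 21 5f 00 04 00 00",
    "2a 00 0d 14 09 7f 00 00 08 00",
    "2a 00 25 c9 01 8f 00 00 18 00",
):
    lba, blocks = decode_write10(cdb)
    print(f"{cdb} -> LBA {lba}, {blocks} blocks")
```

    The decoded LBAs line up with the "I/O error, dev sda, sector ..." lines (for example 200483159 and 633930127). The timed-out writes are spread over widely separated parts of the disk, which may fit a stalled device or link better than one bad block, though that alone proves nothing.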



  • From Vincent Coen@2:250/1 to Pancho on Fri Jul 7 14:32:21 2023
    Hello Pancho!

    Friday July 07 2023 11:45, you wrote to All:

    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    Just a weak guess, but do you have the package that provides fstrim installed (util-linux on most distros)? It might have a slightly different name on your distro, as you have not said what you are using.

    If so run sudo fstrim -av and see what you get once it is online.

    Let it run for 30 minutes then rerun and see what the size is (you should ignore the first one after a reboot but make a note of the size which should
    be the same as the partition size - more or less).

    This does not work when using an M.2 device, at least here - it may be treating it differently but I have no idea why.

    I use fstrim on every Linux system that has an SSD installed. I run it every 24 hours on systems that are up 24/7, and every 12 hours on one that gets a lot of new or amended data in that period.

    The above is to service garbage collection. This is not needed for Windows as that does it internally and very well.
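
    Comparing the two fstrim runs described above can be done by pulling the byte counts out of the verbose output. This is a rough Python sketch, assuming lines of the form "<mountpoint>: <size> (<n> bytes) trimmed", optionally followed by "on <device>"; the exact wording may differ between util-linux versions, and the sample figures are hypothetical, not from this thread.

```python
import re

# Assumed line shape: "<mountpoint>: <human size> (<n> bytes) trimmed[ on <device>]"
TRIMMED = re.compile(r"^(\S+): .*\((\d+) bytes\) trimmed")


def trimmed_bytes(output):
    """Map mount point -> bytes trimmed, from the text printed by `fstrim -av`."""
    result = {}
    for line in output.splitlines():
        m = TRIMMED.match(line.strip())
        if m:
            result[m.group(1)] = int(m.group(2))
    return result


# Illustrative output only (made-up numbers).
first_run = "/mnt/ssd: 420.5 GiB (451508436992 bytes) trimmed on /dev/sda1"
second_run = "/mnt/ssd: 1.2 GiB (1288490188 bytes) trimmed on /dev/sda1"

before, after = trimmed_bytes(first_run), trimmed_bytes(second_run)
for mount in before:
    print(mount, before[mount], "->", after.get(mount))
```

    If the pattern matches nothing on a given system, the output format is different there and the regular expression needs adjusting.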

    Vincent

    --- Mageia Linux v8 X64/Mbse v1.0.8.3/GoldED+/LNX 1.1.5-b20180707
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1)
  • From The Natural Philosopher@3:770/3 to Pancho on Fri Jul 7 14:44:08 2023
    On 07/07/2023 11:45, Pancho wrote:
    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?


    compare its SMART with the good one...
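
    One way to do that comparison, sketched under the assumption that both
    drives report the classic ten-column ATA attribute table from
    smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE,
    UPDATED, WHEN_FAILED, RAW_VALUE). The function names are hypothetical.

```python
def parse_smart_attributes(text):
    """Parse `smartctl -A` text into {attribute_name: {"value": int, "raw": str}}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "value": int(fields[3]),        # normalised value, e.g. 083 -> 83
                "raw": " ".join(fields[9:]),    # raw value may contain spaces
            }
    return attrs


def diff_attributes(suspect_text, good_text):
    """Return attributes present on both drives whose value or raw field differs."""
    suspect = parse_smart_attributes(suspect_text)
    good = parse_smart_attributes(good_text)
    shared = sorted(suspect.keys() & good.keys())
    return {name: (suspect[name], good[name])
            for name in shared if suspect[name] != good[name]}
```

    Counters such as power-on hours will always differ; the interesting
    rows are error and reallocation counts that are non-zero on only one
    drive.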
    --
    There’s a mighty big difference between good, sound reasons and reasons
    that sound good.

    Burton Hillis (William Vaughn, American columnist)

  • From Theo@3:770/3 to Pancho on Fri Jul 7 15:30:42 2023
    Pancho <Pancho.Jones@proton.me> wrote:
    Here is a bit of /var/log/syslog.1 that I think is relevant. The rsnapshot
    line is good; I think what follows is things going bad.

    Jun 25 16:00:15 rpi4 rsnapshot[1261324]: /usr/bin/rsnapshot alpha: completed successfully
    Jun 25 16:05:01 rpi4 CRON[1261432]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Jun 25 16:05:28 rpi4 kernel: [6195820.664216] sd 0:0:0:0: [sda] tag#20 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD OUT

    That looks sick. uas is USB Attached SCSI, so the above means the Linux
    kernel issued a UAS command and, for some reason, it failed.

    Eventually the errors stack up and Linux tries things like resetting
    the device in the hope that will fix it, but it doesn't.
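
    To see how often this happens, the log can be scanned for the same
    message. The sketch below assumes syslog lines shaped like the one
    quoted above; the regular expression and function name are
    illustrative, not anything the kernel documents.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   ... kernel: [6195820.664216] sd 0:0:0:0: [sda] tag#20 uas_eh_abort_handler 0 ...
UAS_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] sd (?P<scsi>[\d:]+): \[(?P<dev>\w+)\] "
    r"tag#\d+ uas_eh_abort_handler"
)


def count_uas_aborts(lines):
    """Count uas_eh_abort_handler messages per block device name (e.g. 'sda')."""
    counts = Counter()
    for line in lines:
        match = UAS_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    # Path taken from the post; adjust for the system being checked.
    with open("/var/log/syslog.1", errors="replace") as log:
        for dev, n in count_uas_aborts(log).items():
            print(f"{dev}: {n} aborted UAS commands")
```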

    The first thing I'd do is try updating the firmware: https://wiki.gentoo.org/wiki/Samsung_SSD_Firmware

    That may not work on a Pi as you can't run x86 binaries, so you may have to resort to the Samsung Magician Windows tool. As well as firmware updates,
    that supposedly has a health check feature. Maybe that will tell you what's going on.

    If you've been doing a lot of security camera writes, it's possible that has worn out the SSD.

    Theo

  • From crn@nospam.com@3:770/3 to Pancho on Fri Jul 7 16:12:59 2023
    Pancho <Pancho.Jones@proton.me> wrote:
    I had an 500 GB Samsung EVO 850, connected to an rPi4 via a USB to SATA adapter, shared via Samba. It had worked this way for years.

    A couple of weeks ago it went offline, and wouldn't remount. I switched
    off power and restarted, and it worked OK, for a bit, but it dismounted again, after about 24 hours. I changed the USB/SATA cable, but the
    problem persisted.

    As this was my main server, which I needed to work, I bought a new SSD,
    and copied everything across. Everything all now works fine.

    The thing is, I can't now see anything wrong with the problematic SSD.
    fsck says it is OK. Smartctl says it is OK, but can't run a long test
    (the long test always says "Aborted by host" 90% remaining). The SSD now stays mounted on another machine, still USB.

    My guess is there was something like a bad block, which caused the SSD
    to dismount when it was accessed. Now that the SSD isn't being used
    heavily, the problem just doesn't show up, it stays mounted. Previously,
    It was being used for a security camera, so a fair bit of writing.

    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    Badblocks
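
    For anyone curious what a read-only scan amounts to, here is a rough
    Python sketch of the idea. It is not badblocks: it reads through the
    page cache rather than bypassing it, so it can miss errors that
    badblocks would catch, and the function name is made up.

```python
import os


def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` start to end; return (bytes_read, offsets where a read failed)."""
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # file or block device size in bytes
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # An I/O error here is the symptom a bad block would produce.
                bad_offsets.append(offset)
                data = b""
            bytes_read += len(data)
            # Skip a whole chunk after a failure; otherwise advance by what was read.
            offset += len(data) if data else chunk_size
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

    Pointed at the raw device (for example /dev/sda, as root), a non-empty
    list of offsets would show where reads fail.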

    --
    http://www.netunix.com/

  • From Jim Jackson@3:770/3 to crn@nospam.com on Sat Jul 8 09:40:31 2023
    On 2023-07-07, crn@nospam.com <crn@nospam.com> wrote:
    Pancho <Pancho.Jones@proton.me> wrote:
    I had an 500 GB Samsung EVO 850, connected to an rPi4 via a USB to SATA
    adapter, shared via Samba. It had worked this way for years.

    A couple of weeks ago it went offline, and wouldn't remount. I switched
    off power and restarted, and it worked OK, for a bit, but it dismounted
    again, after about 24 hours. I changed the USB/SATA cable, but the
    problem persisted.

    As this was my main server, which I needed to work, I bought a new SSD,
    and copied everything across. Everything all now works fine.

    The thing is, I can't now see anything wrong with the problematic SSD.
    fsck says it is OK. Smartctl says it is OK, but can't run a long test
    (the long test always says "Aborted by host" 90% remaining). The SSD now
    stays mounted on another machine, still USB.

    My guess is there was something like a bad block, which caused the SSD
    to dismount when it was accessed. Now that the SSD isn't being used
    heavily, the problem just doesn't show up, it stays mounted. Previously,
    It was being used for a security camera, so a fair bit of writing.

    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    Badblocks


    Don't SSDs reserve some spare "blocks" (whatever they are called) that
    get used to replace faulty ones? Won't smartctl tell you if that has
    happened?

  • From druck@3:770/3 to Jim Jackson on Sat Jul 8 20:42:49 2023
    On 08/07/2023 10:40, Jim Jackson wrote:
    On 2023-07-07, crn@nospam.com <crn@nospam.com> wrote:
    Pancho <Pancho.Jones@proton.me> wrote:
    I know I should just bin the drive, but I'm curious, is there a better
    way of testing it, finding a fault?

    Badblocks


    Don't SSDs reserve some spare "blocks" (whatever they are called) that
    get used to replace faulty ones? Won't smartctl tell you if that has happened?

    smartctl was designed for hard discs, which are far simpler, only having
    sectors on spinning rust. It may or may not show the reallocated sector
    count for SSDs, and even that might not be the whole story.

    An SSD may have multiple types of flash, with the main bulk being slower
    and cheaper 2, 3 or 4 bits per cell, but also a cache of faster
    expensive single bit per cell flash. There is also the storage which
    contains the crucial mapping of logical addresses to flash blocks.

    A failure of a certain percentage of the main flash can normally be
    accommodated by over-provisioning, but a failure of the cache or mapping
    store may cause the SSD to go into read-only mode.

    Sometimes a complete reformat of flash memory devices can appear to
    temporarily clear problems, but the wear life doesn't magically come
    back, and would you want to trust it again?

    ---druck

  • From Pancho@3:770/3 to Theo on Sat Jul 8 20:52:41 2023
    On 7/7/23 15:30, Theo wrote:
    Pancho <Pancho.Jones@proton.me> wrote:
    Here is a bit of /var/log/syslog.1 that I think is relevant. The rsnapshot
    line is good; I think what follows is things going bad.

    Jun 25 16:00:15 rpi4 rsnapshot[1261324]: /usr/bin/rsnapshot alpha: completed successfully
    Jun 25 16:05:01 rpi4 CRON[1261432]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
    Jun 25 16:05:28 rpi4 kernel: [6195820.664216] sd 0:0:0:0: [sda] tag#20 uas_eh_abort_handler 0 uas-tag 4 inflight: CMD OUT

    That looks sick. uas is USB Attached SCSI, so the above means the Linux kernel issued a UAS command and, for some reason, it failed.

    Eventually the errors stack up and Linux tries things like resetting the device in the hope that will fix it, but it doesn't.

    The first thing I'd do is try updating the firmware: https://wiki.gentoo.org/wiki/Samsung_SSD_Firmware

    That may not work on a Pi as you can't run x86 binaries, so you may have to resort to the Samsung Magician Windows tool. As well as firmware updates, that supposedly has a health check feature. Maybe that will tell you what's going on.


    I upgraded the firmware with Samsung Magician. Samsung Magician does not
    have diagnostics for the EVO 850. All in all, it is pretty crap, for
    anything apart from the firmware update.


    If you've been doing a lot of security camera writes, it's possible that has worn out the SSD.


    SMART says 83% healthy, going by the Wear Levelling Count. The drive has
    59 TB written, which is less than the 150 TBW warranty limit. These
    stats were better than those of the system SSD in the Windows machine I
    put it into.

    The warranty is also 5 years. Coincidentally, this problem seems to have occurred at almost exactly 5 years Power-on Hours, but I guess the
    warranty was ownership time, not power on time.
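
    As a rough sanity check on those figures, the arithmetic can be
    written out. This sketch only uses the numbers quoted above and
    treats "5 years" as 5 x 365 days of power-on time, which is an
    approximation.

```python
TB_WRITTEN = 59        # total written, from the SMART data quoted above
RATED_TBW = 150        # warranty endurance figure quoted above
POWER_ON_YEARS = 5     # "almost exactly 5 years" of power-on hours

endurance_used = TB_WRITTEN / RATED_TBW                # fraction of rated writes
power_on_hours = POWER_ON_YEARS * 365 * 24             # ignoring leap days
gb_per_day = TB_WRITTEN * 1000 / (POWER_ON_YEARS * 365)

print(f"rated endurance used: {endurance_used:.1%}")   # 39.3%
print(f"power-on hours: {power_on_hours}")             # 43800
print(f"average writes: {gb_per_day:.1f} GB/day")      # 32.3 GB/day
```

    So the drive has seen roughly 39% of its rated writes, at an average
    of about 32 GB a day.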

    One suggestion to check health was to clone the drive, and look for
    errors, but I'd already used rsync to copy the entire drive.

    It all seems like more hard work than it should be. I'll see if I can
    find a simple use for it where I care even less about reliability. Some
    kind of cache or sync drive. Maybe I'll see if I can run 3 drives off
    the same rpi4 :-).

  • From John Aldridge@3:770/3 to All on Tue Jul 18 17:21:06 2023
    In article <u88tbp$26mf3$1@solani.org>, alien@comet.invalid says...
    I had an 500 GB Samsung EVO 850, connected to an rPi4 via a USB to SATA
    adapter, shared via Samba. It had worked this way for years.

    A couple of weeks ago it went offline, and wouldn't remount. I switched
    off power and restarted, and it worked OK, for a bit, but it dismounted
    again, after about 24 hours. I changed the USB/SATA cable, but the
    problem persisted.

    I had a similar problem with an old Pi,
    and it turned out to be the power supply module,
    now replaced with original raspi supply, no more problems.

    FWIW, I've seen behaviour like that too. In the case I saw, it was a USB
    stick which would dismount every few minutes. And the fix, like yours,
    was to replace a presumably dodgy PSU with an official RPi one.

    John

  • From Pancho@3:770/3 to John Aldridge on Tue Jul 18 18:01:27 2023
    On 18/07/2023 17:21, John Aldridge wrote:
    In article <u88tbp$26mf3$1@solani.org>, alien@comet.invalid says...
    I had an 500 GB Samsung EVO 850, connected to an rPi4 via a USB to SATA
    adapter, shared via Samba. It had worked this way for years.

    A couple of weeks ago it went offline, and wouldn't remount. I switched
    off power and restarted, and it worked OK, for a bit, but it dismounted
    again, after about 24 hours. I changed the USB/SATA cable, but the
    problem persisted.

    I had a similar problem with an old Pi,
    and it turned out to be the power supply module,
    now replaced with original raspi supply, no more problems.

    FWIW, I've seen behaviour like that too. In the case I saw, it was a USB stick which would dismount every few minutes. And the fix, like yours,
    was to replace a presumably dodgy PSU with an official RPi one.

    John

    I, too, have had problems with power and USB drives. One of my SATA
    adapters is powered, though not the one for the drive that failed. I
    think the drive also failed in the powered adapter, but I wasn't
    methodical enough to be sure. I was getting uptight about losing some
    of my essential services.

    In this instance, I suspect power isn't the problem, given that the setup
    has worked for years, I have a 20 amp USB power supply, and there are no
    under-voltage entries in the log.
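
    Besides the log, the Pi firmware keeps a record of under-voltage
    events that vcgencmd get_throttled reports as a hex bitmask. A small
    sketch of decoding it follows; the function name is made up, and the
    bit meanings are the ones given in the Raspberry Pi documentation.

```python
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a string such as 'throttled=0x50005' into a list of flag descriptions."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    import subprocess
    # Only works on a Raspberry Pi with vcgencmd installed.
    out = subprocess.run(["vcgencmd", "get_throttled"],
                         capture_output=True, text=True, check=True).stdout
    print(decode_throttled(out) or "no throttling or under-voltage flags set")
```

    A result of throttled=0x0 since the last boot would back up the
    "not a power problem" reading.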

    Anyway, I ummed and ahhed about it. I couldn't decide if the SSD is crap
    or not. The new, working replacement SSD draws less power.

    So earlier today, I ordered a USB/SATA dual docking station. I'm going
    to put the old drive in there and just use it for CCTV, at a higher
    refresh rate than before, because I no longer care about it.

    We'll see what happens :-).
