• Is there a test suite for Pi2/armv7?

    From bp@www.zefox.net@3:770/3 to All on Wed Jun 19 21:56:02 2024
    I've got a couple of old-ish Pi2 v1.1 32 bit hosts. Both
    have started to behave strangely, one having trouble booting
    and the other reporting what could be thought of as "off by one"
    errors that cause processes to report errors.

    They're the first two Pi2s purchased, in 2015. They've been
    powered on close to 24/7 since and I'm starting to wonder
    if they're wearing out. I didn't think it possible, but I'm
    running out of other ideas that make sense.

    Is there a hardware test suite for the Raspberry Pi that can
    identify faulty hardware? I know, this sounds a bit like the
    halting problem, which is insoluble, but I think it's slightly
    more tractable, maybe. Perhaps what I'm looking for is a kind
    of fuzzing test, though fuzzing usually tests software error
    handling and I'm looking for hardware errors (I think!).

    Test suites for i386 PCs used to be common, especially for
    memory. Something similar for the Pi2 would be a start.

    Thanks for reading,

    bob prohaska

    --- SoupGate-Win32 v1.05
    * Origin: Agency HUB, Dunedin - New Zealand | Fido<>Usenet Gateway (3:770/3)
  • From Pancho@3:770/3 to The Natural Philosopher on Thu Jun 20 07:12:26 2024
    On 20/06/2024 06:49, The Natural Philosopher wrote:

    Test suites for i386 PCs used to be common, especially for
    memory. Something similar for the Pi2 would be a start.

    Thanks for reading,

    bob prohaska



    The only item that I would suspect has 'wear' issues would be the SD cards.




    We always suspect SD cards, rightly so, but it could be other stuff. It
    is quite common for routers to die of old age.

    A quick google reveals there are test tools, such as memtest, or maybe stress-ng.
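
    For a first look before installing a dedicated tool, the idea behind a
    memory test can be sketched in a few lines of Python. This is only an
    illustration: the function name `pattern_test` and the byte patterns are
    assumptions, it exercises a small user-space buffer rather than all of
    RAM, and CPU caches may satisfy the reads, so a clean result proves
    little. It is not a substitute for memtester or stress-ng.

```python
def pattern_test(size_mib=16, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each byte pattern and read it back.

    Returns a list of (pattern, bad_byte_count) tuples; an empty list
    means every byte read back as written.
    """
    size = size_mib * 1024 * 1024
    buf = bytearray(size)
    failures = []
    for p in patterns:
        buf[:] = bytes([p]) * size        # write the pattern everywhere
        good = buf.count(bytes([p]))      # read back: count matching bytes
        if good != size:
            failures.append((p, size - good))
    return failures
```

    Calling `pattern_test(size_mib=256)` in a loop while the board is warm
    would be one way to use it; any non-empty result is worth following up
    with a proper tool.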

  • From The Natural Philosopher@3:770/3 to bp@www.zefox.net on Thu Jun 20 06:49:05 2024
    On 19/06/2024 22:56, bp@www.zefox.net wrote:
    I've got a couple of old-ish Pi2 v1.1 32 bit hosts. Both
    have started to behave strangely, one having trouble booting
    and the other reporting what could be thought of as "off by one"
    errors that cause processes to report errors.

    They're the first two Pi2s purchased, in 2015. They've been
    powered on close to 24/7 since and I'm starting to wonder
    if they're wearing out. I didn't think it possible, but I'm
    running out of other ideas that make sense.

    Is there a hardware test suite for the Raspberry Pi that can
    identify faulty hardware? I know, this sounds a bit like the
    halting problem, which is insoluble, but I think it's slightly
    more tractable, maybe. Perhaps what I'm looking for is a kind
    of fuzzing test, though fuzzing usually tests software error
    handling and I'm looking for hardware errors (I think!).

    Test suites for i386 PCs used to be common, especially for
    memory. Something similar for the Pi2 would be a start.

    Thanks for reading,

    bob prohaska



    The only item that I would suspect has 'wear' issues would be the SD cards.


    --
    “Some people like to travel by train because it combines the slowness of
    a car with the cramped public exposure of an airplane.”

    Dennis Miller

  • From Richard Kettlewell@3:770/3 to bp@www.zefox.net on Thu Jun 20 09:05:56 2024
    <bp@www.zefox.net> writes:
    I've got a couple of old-ish Pi2 v1.1 32 bit hosts. Both
    have started to behave strangely, one having trouble booting
    and the other reporting what could be thought of as "off by one"
    errors that cause processes to report errors.

    I don’t know of a test suite as such but if it can’t reliably boot a
    Linux kernel from known-good storage (e.g. fresh install on a uSD card)
    then you’ve got the core of your answer: it’s broken.

    As others have said, SD card failure should be right at the top of the
    list of possible causes.

    “Processes reporting errors” is harder to assess with so little
    information; certainly there’s not enough presented here to assume a
    hardware fault.

    They're the first two Pi2s purchased, in 2015. They've been
    powered on close to 24/7 since and I'm starting to wonder
    if they're wearing out. I didn't think it possible, but I'm
    running out of other ideas that make sense.

    Is there a hardware test suite for the Raspberry Pi that can
    identify faulty hardware? I know, this sounds a bit like the
    halting problem, which is insoluble, but I think it's slightly
    more tractable, maybe. Perhaps what I'm looking for is a kind
    of fuzzing test, though fuzzing usually tests software error
    handling and I'm looking for hardware errors (I think!).

    Fuzzing is a technique for discovering bugs you didn’t yet know
    existed. In this case it seems you can already reproduce the issues...

    --
    https://www.greenend.org.uk/rjk/

  • From The Natural Philosopher@3:770/3 to Pancho on Thu Jun 20 08:44:33 2024
    On 20/06/2024 07:12, Pancho wrote:
    On 20/06/2024 06:49, The Natural Philosopher wrote:

    Test suites for i386 PCs used to be common, especially for
    memory. Something similar for the Pi2 would be a start.

    Thanks for reading,

    bob prohaska



    The only item that I would suspect has 'wear' issues would be the SD
    cards.




    We always suspect SD cards, rightly so, but it could be other stuff. It
    is quite common for routers to die of old age.


    The only time I have had routers fail is after lightning strikes on or
    near DSL lines.

    A quick google reveals there are test tools, such as memtest, or maybe stress-ng.

    Well all semiconductors are subject to dopant migration and/or
    contamination after sealing failures. Especially at elevated
    temperatures. But since nothing on a Pi is 'user replaceable' knowing
    what is faulty is of academic interest only.

    SD cards are cheap and so are Pis. Selective replacement would seem to
    be a simpler method to achieve a working setup again.


    --
    "It is an established fact to 97% confidence limits that left wing
    conspirators see right wing conspiracies everywhere"

  • From Single Stage to Orbit@3:770/3 to The Natural Philosopher on Thu Jun 20 10:47:21 2024
    On Thu, 2024-06-20 at 06:49 +0100, The Natural Philosopher wrote:
    The only item that I would suspect has 'wear' issues would be the SD
    cards.

    We have smartctl for SATA and NVME drives, that tells us how healthy
    these devices are. Is there something similar for SD cards?
    --
    Tactical Nuclear Kittens

  • From Pancho@3:770/3 to The Natural Philosopher on Thu Jun 20 10:58:13 2024
    On 20/06/2024 08:44, The Natural Philosopher wrote:
    On 20/06/2024 07:12, Pancho wrote:
    On 20/06/2024 06:49, The Natural Philosopher wrote:

    Test suites for i386 PCs used to be common, especially for
    memory. Something similar for the Pi2 would be a start.

    Thanks for reading,

    bob prohaska



    The only item that I would suspect has 'wear' issues would be the SD
    cards.




    We always suspect SD cards, rightly so, but it could be other stuff.
    It is quite common for routers to die of old age.


    Only time I have had routers fail is after lightning strikes on or near
    DSL lines



    I've had failures of an ADSL router, a VDSL modem and a cable router. I
    dunno what goes wrong, but these are devices running close to design
    capacity for many years, whereas most desktop PCs rarely break a sweat.
    We know electronic components like electrolytic capacitors do wear out.

    Which reminds me, an even more common failure is USB wall warts, which
    should probably show up in dmesg or journalctl as an "under voltage" error.
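
    A minimal sketch of scanning kernel log text for such messages. The
    exact wording of the message varies between kernels and operating
    systems, so the pattern below is an assumption, and `find_under_voltage`
    is a hypothetical helper name.

```python
import re

# Assumed spellings: "under-voltage", "undervoltage", "under voltage".
UNDER_VOLTAGE = re.compile(r"under[- ]?voltage", re.IGNORECASE)


def find_under_voltage(log_lines):
    """Return the log lines that mention an under-voltage condition."""
    return [line.rstrip("\n") for line in log_lines
            if UNDER_VOLTAGE.search(line)]
```

    It takes any iterable of lines, so the output of `dmesg` or
    `journalctl -k` can be piped in and passed as `sys.stdin`.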

  • From Computer Nerd Kev@3:770/3 to Single Stage to Orbit on Fri Jun 21 09:08:06 2024
    Single Stage to Orbit <alex.buell@munted.eu> wrote:
    On Thu, 2024-06-20 at 06:49 +0100, The Natural Philosopher wrote:
    The only item that I would suspect has 'wear' issues would be the SD
    cards.

    We have smartctl for SATA and NVME drives, that tells us how healthy
    these devices are. Is there something similar for SD cards?

    Only tools that will write data to them and then read it back to
    check if the data is still correct. There's no standard like SMART
    for the SD card controller to report stats back over the card
    interface.
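
    The write-then-read-back idea can be sketched as follows. This is an
    illustration under stated assumptions, not how any particular tool
    works: `write_and_verify`, its parameters and the seed are hypothetical,
    and it tests a file on a mounted filesystem rather than the raw device.

```python
import hashlib
import os


def write_and_verify(path, blocks=256, block_size=4096, seed=b"sd-test"):
    """Write deterministic pseudo-random blocks to `path`, then re-read them.

    Returns the indices of blocks whose contents differ from what was
    written; an empty list means everything read back correctly.
    """
    def block(i):
        # Derive block i from the seed so it can be regenerated on read.
        return hashlib.shake_256(seed + i.to_bytes(8, "big")).digest(block_size)

    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(block(i))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != block(i):
                bad.append(i)
    return bad
```

    One caveat: the read pass may be served from the operating system's
    page cache rather than from the card, so a clean result right after
    writing is weaker evidence than one obtained after a remount or reboot.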

    --
    __ __
    #_ < |\| |< _#

  • From Lawrence D'Oliveiro@3:770/3 to Computer Nerd Kev on Tue Jul 16 00:57:04 2024
    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to report
    stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O errors.

  • From The Natural Philosopher@3:770/3 to Lawrence D'Oliveiro on Tue Jul 16 10:31:11 2024
    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to report
    stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O errors.

    That's what you use SMART *for*. Amongst other things


    --
    Karl Marx said religion is the opium of the people.
    But Marxism is the crack cocaine.

  • From Lawrence D'Oliveiro@3:770/3 to The Natural Philosopher on Wed Jul 17 01:30:04 2024
    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

  • From The Natural Philosopher@3:770/3 to Lawrence D'Oliveiro on Wed Jul 17 10:35:01 2024
    On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:
    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

    So does SMART.

    And with SSDs you don't want to be actually doing any superfluous read or
    write ops.

    I am curious. Have you ever used SMART at all?


    --
    “It is not the truth of Marxism that explains the willingness of intellectuals to believe it, but the power that it confers on
    intellectuals, in their attempts to control the world. And since...it is
    futile to reason someone out of a thing that he was not reasoned into,
    we can conclude that Marxism owes its remarkable power to survive every criticism to the fact that it is not a truth-directed but a
    power-directed system of thought.”
    Sir Roger Scruton

  • From druck@3:770/3 to All on Wed Jul 17 12:10:22 2024
    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to report
    stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    By the time any type of storage device is reporting errors during actual
    use, it's already in a really bad way, and should have been replaced.

    Both spinning discs and flash media are over provisioned with a number
    of spare sectors/blocks which they will silently map in, either over
    sectors which have started giving read errors, or any flash blocks which
    have reached their write limits and could be unreliable.

    The SMART information on the drive will tell you when this happens, long
    before the OS finds the disc has started to become corrupted. Use this
    as the first warning to replace the disc before data loss or complete
    failure.

    ---druck

  • From The Natural Philosopher@3:770/3 to druck on Wed Jul 17 13:43:07 2024
    On 17/07/2024 12:10, druck wrote:
    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to report
    stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    By the time any type of storage device is reporting errors during actual
    use, it's already in a really bad way, and should have been replaced.

    Both spinning discs and flash media are over provisioned with a number
    of spare sectors/blocks which they will silently map in, either over
    sectors which have started giving read errors, or any flash blocks which
    have reached their write limits and could be unreliable.

    The SMART information on the drive will tell you when this happens, long
    before the OS finds the disc has started to become corrupted. Use this
    as the first warning to replace the disc before data loss or complete
    failure.

    ---druck

    Exactly.
    ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x0032 100   100   050    Old_age  Always  -           0
      5 Reallocated_Sector_Ct   0x0032 100   100   050    Old_age  Always  -           0
      9 Power_On_Hours          0x0032 100   100   050    Old_age  Always  -           23833
     12 Power_Cycle_Count       0x0032 100   100   050    Old_age  Always  -           113
    160 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           0
    161 Unknown_Attribute       0x0033 100   100   050    Pre-fail Always  -           100
    163 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           8
    164 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           70687
    165 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           326
    166 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           115
    167 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           142
    168 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           7000
    169 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           98
    175 Program_Fail_Count_Chip 0x0032 100   100   050    Old_age  Always  -           0
    176 Erase_Fail_Count_Chip   0x0032 100   100   050    Old_age  Always  -           0
    177 Wear_Leveling_Count     0x0032 100   100   050    Old_age  Always  -           0
    178 Used_Rsvd_Blk_Cnt_Chip  0x0032 100   100   050    Old_age  Always  -           0
    181 Program_Fail_Cnt_Total  0x0032 100   100   050    Old_age  Always  -           0
    182 Erase_Fail_Count_Total  0x0032 100   100   050    Old_age  Always  -           0
    192 Power-Off_Retract_Count 0x0032 100   100   050    Old_age  Always  -           51
    194 Temperature_Celsius     0x0022 100   100   050    Old_age  Always  -           40
    195 Hardware_ECC_Recovered  0x0032 100   100   050    Old_age  Always  -           1397810
    196 Reallocated_Event_Count 0x0032 100   100   050    Old_age  Always  -           0
    197 Current_Pending_Sector  0x0032 100   100   050    Old_age  Always  -           0
    198 Offline_Uncorrectable   0x0032 100   100   050    Old_age  Always  -           0
    199 UDMA_CRC_Error_Count    0x0032 100   100   050    Old_age  Always  -           0
    232 Available_Reservd_Space 0x0032 100   100   050    Old_age  Always  -           100
    241 Total_LBAs_Written      0x0030 100   100   050    Old_age  Offline -           247432
    242 Total_LBAs_Read         0x0030 100   100   050    Old_age  Offline -           56254
    245 Unknown_Attribute       0x0032 100   100   050    Old_age  Always  -           346410

    The raw read error rate is there, along with things you wouldn't find
    outside SMART: reallocated sector count, plus erase fail and wear
    levelling counts.
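
    A listing in this layout is easy to check mechanically. The sketch below
    assumes the ten-column attribute layout shown in this post; the helper
    names and the choice of attribute IDs to watch are illustrative
    assumptions, not a recommendation from smartmontools.

```python
def parse_smart_attributes(text):
    """Parse attribute rows into {attribute_id: row_dict}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            rows[int(fields[0])] = {
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }
    return rows


# Reallocation and pending-sector counters (an assumed watch list).
WATCH = (5, 196, 197, 198)


def nonzero_watched(rows, watch=WATCH):
    """Return names of watched attributes whose raw value is not zero."""
    return [rows[i]["name"] for i in watch
            if i in rows and rows[i]["raw"] != "0"]
```

    Applied to the listing in this post, every watched counter has a raw
    value of 0, so `nonzero_watched` would return an empty list.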




    --
    Ideas are more powerful than guns. We would not let our enemies have
    guns, why should we let them have ideas?

    Josef Stalin

  • From Lawrence D'Oliveiro@3:770/3 to The Natural Philosopher on Sun Jul 21 08:05:20 2024
    On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:

    On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:

    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual
    I/O errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

    So does SMART.

    No, it extrapolates from its internal firmware behaviour. It tries to
    predict failures before they happen.

  • From Lawrence D'Oliveiro@3:770/3 to druck on Sun Jul 21 08:06:21 2024
    On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to
    report stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    By the time any type of storage device is reporting errors during actual
    use, it's already in a really bad way, and should have been replaced.

    This is why you have redundant systems. That’s how the pros do it.

  • From The Natural Philosopher@3:770/3 to Lawrence D'Oliveiro on Sun Jul 21 09:31:03 2024
    On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:

    There's no standard like SMART for the SD card controller to
    report stats back over the card interface.

    SMART isn’t much use, anyway. I test my storage devices for actual I/O
    errors.

    By the time any type of storage device is reporting errors during actual
    use, it's already in a really bad way, and should have been replaced.

    This is why you have redundant systems. That’s how the pros do it.

    No. It's why you use SMART.

    AND redundant storage, which SSDs already have built in


    --
    WOKE is an acronym... Without Originality, Knowledge or Education.

  • From The Natural Philosopher@3:770/3 to Lawrence D'Oliveiro on Sun Jul 21 09:29:51 2024
    On 21/07/2024 09:05, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:

    On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:

    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual
    I/O errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

    So does SMART.

    No, it extrapolates from its internal firmware behaviour. It tries to
    predict failures before they happen.

    No it doesn't. It predicts from stored values of errors, I/O errors
    amongst them.

    Do try to keep up.


    --
    If I had all the money I've spent on drink...
    ..I'd spend it on drink.

    Sir Henry (at Rawlinson's End)

  • From Ahem A Rivet's Shot@3:770/3 to The Natural Philosopher on Sun Jul 21 10:22:38 2024
    On Sun, 21 Jul 2024 09:31:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:

    By the time any type of storage device is reporting errors during
    actual use, it's already in a really bad way, and should have been
    replaced.

    This is why you have redundant systems. That’s how the pros do it.

    No. It's why you use SMART.

    AND redundant storage, which SSDs already have built in

    Yes but the kind of redundancy I thought of first was some kind of
    RAID or live backup (I do both and monitor SMART) so that there isn't just
    one copy of the data ready to be used.

    --
    Steve O'Hara-Smith
    Odds and Ends at http://www.sohara.org/
    For forms of government let fools contest
    Whate're is best administered is best - Alexander Pope

  • From The Natural Philosopher@3:770/3 to Ahem A Rivet's Shot on Sun Jul 21 10:44:03 2024
    On 21/07/2024 10:22, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 09:31:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:

    By the time any type of storage device is reporting errors during
    actual use, it's already in a really bad way, and should have been
    replaced.

    This is why you have redundant systems. That’s how the pros do it.

    No. It's why you use SMART.

    AND redundant storage, which SSDs already have built in

    Yes but the kind of redundancy I thought of first was some kind of
    RAID or live backup (I do both and monitor SMART) so that there isn't just
    one copy of the data ready to be used.

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data


    --
    Labour - a bunch of rich people convincing poor people to vote for rich
    people by telling poor people that "other" rich people are the reason
    they are poor.

    Peter Thompson

  • From Ahem A Rivet's Shot@3:770/3 to The Natural Philosopher on Sun Jul 21 11:47:53 2024
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

    You have a mirror - that's RAID. RAID is about smoothly surviving
    drive failures. With any storage system there are two important factors -
    mean time to data loss and probability of data unavailability.

    My NAS has two mirrored 1TB NVMe SSDs and two 10TB mirrored hard
    discs holding ZFS filesystems with regular snapshots enabled. The archive
    server has four 4TB hard discs in a ZFS RAIDZ1 (FEC RAID with 3 data and
    1 parity), and it's in a different building. There is a continuous cycle
    to the archive server keeping the archive up to date within a minute or two.

    End result
    - Fast and slow (relatively) stores
    - Snapshots for protection against silliness or corruption
    - RAID for protection against drive failure
    - Archive for protection against machine/building loss

    According to an MTTDL calculator I found once (no idea how
    trustworthy it is) I should be good for about a century.
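
    For a rough idea of what such a calculator does, the usual textbook
    approximation for a two-drive mirror is MTTDL = MTTF^2 / (2 * MTTR),
    which assumes independent failures and ignores unrecoverable read
    errors. The sketch below is that approximation only; the function name
    is hypothetical and any figures fed to it are assumptions, not
    measurements of the setup described here.

```python
HOURS_PER_YEAR = 24 * 365


def mirror_mttdl_hours(mttf_hours, mttr_hours):
    """Approximate mean time to data loss for a two-drive mirror.

    mttf_hours: mean time to failure of one drive.
    mttr_hours: mean time to replace and resync a failed drive.
    Assumes the two drives fail independently of each other.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


def hours_to_years(hours):
    """Convert hours to years of 365 days."""
    return hours / HOURS_PER_YEAR
```

    The result is very sensitive to the repair time and to the independence
    assumption, which is one reason to treat any single MTTDL figure with
    caution.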

    --
    Steve O'Hara-Smith
    Odds and Ends at http://www.sohara.org/
    For forms of government let fools contest
    Whate're is best administered is best - Alexander Pope

  • From The Natural Philosopher@3:770/3 to Ahem A Rivet's Shot on Sun Jul 21 15:52:36 2024
    On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

    You have a mirror - that's RAID.

    No, it is not RAID.

    I back up once a night. In between, if I erase a file it's still there on
    the backup.
    That is not RAID.



    --
    Gun Control: The law that ensures that only criminals have guns.

  • From Ahem A Rivet's Shot@3:770/3 to The Natural Philosopher on Sun Jul 21 16:06:51 2024
    On Sun, 21 Jul 2024 15:52:36 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

    You have a mirror - that's RAID.
    No, it is not RAID.

    I back up once a night. In between if I erase a file it's still there on
    the backup
    That is not RAID

    Ah - it's not a mirror then either, it's a backup copy.

    --
    Steve O'Hara-Smith
    Odds and Ends at http://www.sohara.org/
    For forms of government let fools contest
    Whate're is best administered is best - Alexander Pope

  • From The Natural Philosopher@3:770/3 to Ahem A Rivet's Shot on Sun Jul 21 18:38:13 2024
    On 21/07/2024 16:06, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 15:52:36 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

    You have a mirror - that's RAID.
    No, it is not RAID.

    I back up once a night. In between if I erase a file it's still there on
    the backup
    That is not RAID

    Ah - it's not a mirror then either, it's a backup copy.


    --
    "In our post-modern world, climate science is not powerful because it is
    true: it is true because it is powerful."

    Lucas Bergkamp

  • From druck@3:770/3 to Ahem A Rivet's Shot on Sun Jul 21 21:28:27 2024
    On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

    You have a mirror - that's RAID. RAID is about smoothly surviving
    drive failures. With any storage system there are two important factors -
    mean time to data loss and probability of data unavailability.

    Ignoring whether it's RAID or not, mirroring will protect you against a
    random failure of one of the drives, which was more useful in the
    spinning rust days when random mechanical failures were an issue.

    With SSD, write life is the main issue, and if you have two identical
    mirrored drives, you may find any write life issues, which are not
    random, occur at exactly the same time.

    So with any type of mirrored arrangement, make sure they are different
    makes or models of drive, so it is less likely they fail together.

    ---druck

  • From The Natural Philosopher@3:770/3 to druck on Mon Jul 22 09:50:21 2024
    On 21/07/2024 21:28, druck wrote:
    On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
    On Sun, 21 Jul 2024 10:44:03 +0100
    The Natural Philosopher <tnp@invalid.invalid> wrote:

    Oh indeed. My new server will feature two SMART enabled SSDs...one a
    mirror of the other.
    I am not interested in RAID. RAID increases availability, but does not
    archive data

        You have a mirror - that's RAID. RAID is about smoothly surviving
    drive failures. With any storage system there are two important factors -
    mean time to data loss and probability of data unavailability.

    Ignoring whether it's RAID or not, mirroring will protect you against a
    random failure of one of the drives, which was more useful in the
    spinning rust days when random mechanical failures were an issue.

    I agree.

    With SSD, write life is the main issue, and if you have two identical
    mirrored drives, you may find any write life issues, which are not
    random, occur at exactly the same time.

    But never at exactly the SAME time.

    Remember the primary drive gets written to all day long as stuff like
    this post is downloaded, read and deleted.

    The secondary gets a once-a-day rsync.

    And, whilst I have never had an SSD wear out from writes in the past 8
    years, I have had one fail in a quite different way shortly after purchase.


    So with any type of mirrored arrangement, make sure they are different
    makes or models of drive, so it is less likely they fail together.


    They are not subject to the same usage pattern, and they are not made
    from the same components.

    As long as they don't fail within 24 hours of each other.


    --
    In a Time of Universal Deceit, Telling the Truth Is a Revolutionary Act.

    - George Orwell

  • From Lawrence D'Oliveiro@3:770/3 to The Natural Philosopher on Wed Jul 24 00:33:18 2024
    On Sun, 21 Jul 2024 09:29:51 +0100, The Natural Philosopher wrote:

    On 21/07/2024 09:05, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:

    On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:

    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual
    I/O errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

    So does SMART.

    No, it extrapolates from its internal firmware behaviour. It tries to
    predict failures before they happen.

    No it doesn't. It predicts ...

    s/predicts/tries to predict/. It’s not a prophet, you know.

    And it only catches about 30% of failures.
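
    For what "doing actual I/O" might look like, a minimal sketch, not
    necessarily the method used by anyone in this thread: write random
    data to a file on the device under test, force it out with fsync, read
    it back and compare digests. The function name and sizes are made up
    for illustration. Note the read-back may be served from the page
    cache, so a pass here is weaker evidence than a full surface scan.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str,
                     total_bytes: int = 8 * 1024 * 1024,
                     block_size: int = 1024 * 1024) -> bool:
    """Write random data to `path`, read it back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            block = os.urandom(min(block_size, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            read_back.update(chunk)
    return written.digest() == read_back.digest()


if __name__ == "__main__":
    # Uses a temporary directory here; to exercise a particular drive,
    # point the path at a file on that drive's filesystem instead.
    with tempfile.TemporaryDirectory() as tmp:
        ok = write_and_verify(os.path.join(tmp, "probe.bin"))
        print("read-back matches" if ok else "MISMATCH: possible I/O error")
```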

  • From Lawrence D'Oliveiro@3:770/3 to The Natural Philosopher on Wed Jul 24 00:34:03 2024
    On Sun, 21 Jul 2024 09:31:03 +0100, The Natural Philosopher wrote:

    On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:

    This is why you have redundant systems. That’s how the pros do it.

    No. It's why you use SMART.

    Companies whose business it is to ensure data integrity do not rely on
    SMART.

  • From The Natural Philosopher@3:770/3 to Lawrence D'Oliveiro on Wed Jul 24 07:34:21 2024
    On 24/07/2024 01:33, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 09:29:51 +0100, The Natural Philosopher wrote:

    On 21/07/2024 09:05, Lawrence D'Oliveiro wrote:
    On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:

    On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:

    On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:

    On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:

    SMART isn’t much use, anyway. I test my storage devices for actual
    I/O errors.

    That's what you use SMART *for*.

    No, I test doing actual I/O.

    So does SMART.

    No, it extrapolates from its internal firmware behaviour. It tries to
    predict failures before they happen.

    No it doesn't. It predicts ...

    s/predicts/tries to predict/. It’s not a prophet, you know.

    And it only catches about 30% of failures.

    *plonk*

    --
    "Corbyn talks about equality, justice, opportunity, health care, peace, community, compassion, investment, security, housing...."
    "What kind of person is not interested in those things?"

    "Jeremy Corbyn?"

  • From druck@3:770/3 to All on Wed Jul 24 21:35:02 2024
    On 24/07/2024 01:34, Lawrence D'Oliveiro wrote:
    On Sun, 21 Jul 2024 09:31:03 +0100, The Natural Philosopher wrote:

    On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:

    This is why you have redundant systems. That’s how the pros do it.

    No. It's why you use SMART.

    Companies whose business it is to ensure data integrity do not rely on
    SMART.

    No, they use hardware RAID for redundancy, extensive performance
    monitoring, and retire most disks before they fail based on the small
    percentage of failures of thousands of other discs of the same type.

    But that's not what the typical person with a Raspberry Pi and a couple
    of discs is able to do. The SMART information gives valuable warning of
    potential failures, to ignore it would be to employ the STUPID feature
    of the user.

    ---druck

  • From Lawrence D'Oliveiro@3:770/3 to druck on Thu Jul 25 00:45:40 2024
    On Wed, 24 Jul 2024 21:35:02 +0100, druck wrote:

    On 24/07/2024 01:34, Lawrence D'Oliveiro wrote:

    Companies whose business it is to ensure data integrity do not rely on
    SMART.

    No, they use hardware RAID for redundancy, extensive performance
    monitoring, and retire most disks before they fail based on the small
    percentage of failures of thousands of other discs of the same type.

    Actually, no. They wait until the disks actually fail before replacing
    them.

    But that's not what the typical person with a Raspberry Pi and a couple
    of discs is able to do. The SMART information gives valuable warning of
    potential failures, to ignore it would be to employ the STUPID feature
    of the user.

    Unfortunately, SMART only catches about 30% of potential failures. That’s
    why relying on it is not smart.

  • From David Higton@3:770/3 to Lawrence D'Oliveiro on Thu Jul 25 20:20:38 2024
    In message <v7s77k$1v8pi$3@dont-email.me>
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Wed, 24 Jul 2024 21:35:02 +0100, druck wrote:

    On 24/07/2024 01:34, Lawrence D'Oliveiro wrote:

    Companies whose business it is to ensure data integrity do not rely on
    SMART.

    No, they use hardware RAID for redundancy, extensive performance
    monitoring, and retire most disks before they fail based on the small
    percentage of failures of thousands of other discs of the same type.

    Actually, no. They wait until the disks actually fail before replacing
    them.

    Anyone with any sense would replace them before the bathtub failure
    curve starts to rise, which is usually not long after the end of the
    warranty period.

    But that's not what the typical person with a Raspberry Pi and a couple
    of discs is able to do. The SMART information gives valuable warning of
    potential failures, to ignore it would be to employ the STUPID feature
    of the user.

    Unfortunately, SMART only catches about 30% of potential failures. That's
    why relying on it is not smart.

    It's smarter than catching 0% of potential failures by waiting until they
    have already happened.

    David

  • From Single Stage to Orbit@3:770/3 to David Higton on Thu Jul 25 21:20:00 2024
    On Thu, 2024-07-25 at 20:20 +0100, David Higton wrote:
    Actually, no. They wait until the disks actually fail before replacing
    them.

    Anyone with any sense would replace them before the bathtub failure
    curve starts to rise, which is usually not long after the end of the
    warranty period.

    In my previous job, we replaced them when they failed, not before.
    Customers don't like paying for hardware that will sit there doing
    nothing.
    --
    Tactical Nuclear Kittens

  • From The Natural Philosopher@3:770/3 to David Higton on Fri Jul 26 09:18:30 2024
    On 25/07/2024 20:20, David Higton wrote:
    Anyone with any sense would replace them before the bathtub failure
    curve starts to rise, which is usually not long after the end of the
    warranty period.

    It entirely depends on usage pattern. A disk which is being accessed
    and having huge amounts of data written and erased, yes.

    A disk in a desktop computer that loads the OS and does bugger all?
    Many times longer.



    --
    “Ideas are inherently conservative. They yield not to the attack of
    other ideas but to the massive onslaught of circumstance"

    - John K Galbraith

  • From Lawrence D'Oliveiro@3:770/3 to David Higton on Sun Jul 28 07:46:04 2024
    On Thu, 25 Jul 2024 20:20:38 +0100, David Higton wrote:

    Anyone with any sense would replace them before the bathtub failure
    curve starts to rise, which is usually not long after the end of the
    warranty period.

    BackBlaze is a company whose business is data integrity. Every 3 months
    they publish a blog post on the reliability stats of the drives that they
    use. They don’t replace drives until they fail. Individual failures have
    essentially zero impact on their business, while replacing drives is a
    cost.

    Unfortunately, SMART only catches about 30% of potential failures.
    That's why relying on it is not smart.

    It's smarter than catching 0% of potential failures by waiting until
    they have already happened.

    The problem is being caught by surprise 70% of the time.
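
    Both sides of that argument can be put into numbers with a short
    sketch. The 30% detection figure is the one claimed in this thread;
    the fleet size and annual failure rate are hypothetical, and the
    function name is made up for illustration.

```python
def expected_failures(n_drives: int, afr: float,
                      detection_rate: float = 0.30) -> dict:
    """Split expected yearly drive failures into warned and unwarned."""
    if n_drives < 0:
        raise ValueError("n_drives must be non-negative")
    if not 0.0 <= afr <= 1.0 or not 0.0 <= detection_rate <= 1.0:
        raise ValueError("afr and detection_rate must be in [0, 1]")
    total = n_drives * afr              # expected failures per year
    warned = total * detection_rate     # failures flagged in advance
    return {"total": total, "warned": warned, "surprise": total - warned}


if __name__ == "__main__":
    # Hypothetical fleet: 100 drives at a 2% annual failure rate.
    for name, value in expected_failures(100, 0.02).items():
        print(f"{name:>8}: {value:.2f} drives/year")
```

    Some advance warning is better than none, but most failures still
    arrive unannounced, so a backup or mirror is needed either way.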

  • From druck@3:770/3 to druck on Wed Aug 7 15:52:30 2024
    On 24/07/2024 21:35, druck wrote:
    No, they use hardware RAID for redundancy, extensive performance
    monitoring, and retire most disks before they fail based on the small
    percentage of failures of thousands of other discs of the same type.

    This is informative:
    https://www.theregister.com/2024/08/06/backblaze_sees_drive_failure_rates/

    ---druck
