I've got a couple of old-ish Pi2 v1.1 32 bit hosts. Both
have started to behave strangely, one having trouble booting
and the other reporting what could be thought of as "off by one"
errors that cause processes to report errors.
They're the first two Pi2s purchased, in 2015. They've been
powered on close to 24/7 since and I'm starting to wonder
if they're wearing out. I didn't think it possible, but I'm
running out of other ideas that make sense.
Is there a hardware test suite for the Raspberry Pi that can
identify faulty hardware? I know, this sounds a bit like the
halting problem, which is insoluble, but I think it's slightly
more tractable, maybe. Perhaps what I'm looking for is a kind
of fuzzing test, though fuzzing usually tests software error
handling and I'm looking for hardware errors (I think!).
Test suites for i386 PCs used to be common, especially for
memory. Something similar for the Pi2 would be a start.
Thanks for reading,
bob prohaska
On 20/06/2024 06:49, The Natural Philosopher wrote:
The only item that I would suspect has 'wear' issues would be the SD
cards.
We always suspect SD cards, rightly so, but it could be other stuff. It
is quite common for routers to die of old age.
A quick google reveals there are test tools, such as memtest, or maybe stress-ng.
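For anyone curious what a memtest-style check boils down to, here is a minimal Python sketch of the write-pattern, read-back idea. It is not memtest or stress-ng, the function names are made up for illustration, and it only exercises whatever memory the OS hands the process (through the caches), so a clean result proves very little about the hardware.

```python
def count_mismatches(buf, pattern):
    """Count bytes in buf that differ from the single-byte pattern."""
    return len(buf) - buf.count(pattern)


def pattern_test(size_bytes=1 << 20, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each pattern in turn and count bytes that read back wrong."""
    buf = bytearray(size_bytes)
    bad = 0
    for p in patterns:
        buf[:] = bytes([p]) * size_bytes   # write pass
        bad += count_mismatches(buf, p)    # read-back pass
    return bad


if __name__ == "__main__":
    print("mismatched bytes:", pattern_test())
```

A real memory tester runs for hours, walks addresses in different orders and tries to defeat the caches; this only shows the principle.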
On 20/06/2024 07:12, Pancho wrote:
On 20/06/2024 06:49, The Natural Philosopher wrote:
The only item that I would suspect has 'wear' issues would be the SD
cards.
We always suspect SD cards, rightly so, but it could be other stuff.
It is quite common for routers to die of old age.
The only time I have had routers fail is after lightning strikes on or near
DSL lines.
On Thu, 2024-06-20 at 06:49 +0100, The Natural Philosopher wrote:
The only item that I would suspect has 'wear' issues would be the SD
cards.
We have smartctl for SATA and NVMe drives, which tells us how healthy
these devices are. Is there something similar for SD cards?
There's no standard like SMART for them
the SD card controller to report stats back over the card interface.
On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:
There's no standard like SMART for them
the SD card controller to report stats back over the card interface.
SMART isn’t much use, anyway. I test my storage devices for actual I/O errors.
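As an illustration of what testing with actual I/O can look like, here is a minimal Python sketch that writes random data to a file, forces it out with fsync, reads it back and compares SHA-256 digests. The function name, sizes and the choice of a temporary file are assumptions for the example. The read-back may well be served from the page cache rather than the card, so a serious test would use a file larger than RAM or drop caches first.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, total_bytes=4 * 1024 * 1024, chunk=64 * 1024):
    """Write random data to path, read it back, return True if the digests match."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            read_back.update(block)
    return written.digest() == read_back.digest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ok = write_and_verify(os.path.join(d, "iotest.bin"))
        print("data verified" if ok else "MISMATCH")
```

Point the path at a file on the SD card (not a tmpfs) if the card is what you want to exercise, and remember that every run adds write wear.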
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual I/O
errors.
That's what you use SMART *for*.
On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual I/O
errors.
That's what you use SMART *for*.
No, I test doing actual I/O.
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:
There's no standard like SMART for them
the SD card controller to report stats back over the card interface.
SMART isn’t much use, anyway. I test my storage devices for actual I/O
errors.
By the time any type of storage device is reporting errors during actual
use, it's already in a really bad way, and should have been replaced.
Both spinning discs and flash media are over provisioned with a number
of spare sectors/blocks which they will silently map in, either over
sectors which have started giving read errors, or any flash blocks which
have reached their write limits and could be unreliable.
The SMART information on the drive will tell you when this happens, long
before the OS finds the disc has started to become corrupted. Use this
as the first warning to replace the disc before data loss or complete
failure.
---druck
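To make the remapping point concrete, here is a minimal Python sketch that looks for non-zero raw counts in a few ATA attributes commonly associated with remapped or pending sectors. It assumes the JSON layout that smartmontools 7.x is understood to emit for `smartctl --json -A` (an `ata_smart_attributes.table` list whose entries carry `id`, `name` and `raw.value`). The choice of attributes 5, 197 and 198 and the function name are assumptions for the example; NVMe drives report health differently, and SD cards generally report nothing at all.

```python
import json
import sys

# ATA attribute IDs commonly watched for remapping activity (an assumption
# for this sketch; vendors differ in what they report and how).
WATCHED_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def remapping_warnings(report):
    """Return {attribute name: raw value} for watched attributes with non-zero raw counts."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    warnings = {}
    for attr in table:
        attr_id = attr.get("id")
        if attr_id in WATCHED_IDS:
            raw = attr.get("raw", {}).get("value", 0)
            if raw > 0:
                warnings[attr.get("name", WATCHED_IDS[attr_id])] = raw
    return warnings


if __name__ == "__main__":
    # Example use: smartctl --json -A /dev/sda | python3 this_script.py
    found = remapping_warnings(json.load(sys.stdin))
    print(found if found else "no remapping activity reported")
```

A non-zero count is not proof of imminent failure, but a count that keeps climbing is the early warning described above.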
On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:
On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual
I/O errors.
That's what you use SMART *for*.
No, I test doing actual I/O.
So does SMART.
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:
There's no standard like SMART for them the SD card controller to
report stats back over the card interface.
SMART isn’t much use, anyway. I test my storage devices for actual I/O
errors.
By the time any type of storage device is reporting errors during actual
use, it's already in a really bad way, and should have been replaced.
Exactly.
On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
On 21 Jun 2024 09:08:06 +1000, Computer Nerd Kev wrote:
There's no standard like SMART for them the SD card controller to
report stats back over the card interface.
SMART isn’t much use, anyway. I test my storage devices for actual I/O
errors.
By the time any type of storage device is reporting errors during actual
use, it's already in a really bad way, and should have been replaced.
This is why you have redundant systems. That’s how the pros do it.
On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:
On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:
On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual
I/O errors.
That's what you use SMART *for*.
No, I test doing actual I/O.
So does SMART.
No, it extrapolates from its internal firmware behaviour. It tries to
predict failures before they happen.
On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:
By the time any type of storage device is reporting errors during
actual use, it's already in a really bad way, and should have been
replaced.
This is why you have redundant systems. That’s how the pros do it.
No. It's why you use SMART.
AND redundant storage, which SSDs already have built in.
On Sun, 21 Jul 2024 09:31:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
On Wed, 17 Jul 2024 12:10:22 +0100, druck wrote:
By the time any type of storage device is reporting errors during
actual use, it's already in a really bad way, and should have been
replaced.
This is why you have redundant systems. That’s how the pros do it.
No. It's why you use SMART.
AND redundant storage, which SSDs already have built in.
Yes but the kind of redundancy I thought of first was some kind of
RAID or live backup (I do both and monitor SMART) so that there isn't just one copy of the data ready to be used.
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data.
On Sun, 21 Jul 2024 10:44:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data
You have a mirror - that's RAID.
On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
On Sun, 21 Jul 2024 10:44:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data
You have a mirror - that's RAID.
No, it is not RAID.
I back up once a night. In between, if I erase a file it's still there on
the backup.
That is not RAID.
On Sun, 21 Jul 2024 15:52:36 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
On Sun, 21 Jul 2024 10:44:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data
You have a mirror - that's RAID.
No, it is not RAID.
I back up once a night. In between if I erase a file it's still there on
the backup
That is not RAID
Ah - it's not a mirror then, it's a backup copy.
On Sun, 21 Jul 2024 10:44:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data
You have a mirror - that's RAID. RAID is about smoothly surviving
drive failures. With any storage system there are two important factors - mean time to data loss and probability of data unavailability.
On 21/07/2024 11:47, Ahem A Rivet's Shot wrote:
On Sun, 21 Jul 2024 10:44:03 +0100
The Natural Philosopher <tnp@invalid.invalid> wrote:
Oh indeed. My new server will feature two SMART enabled SSDs...one a
mirror of the other.
I am not interested in RAID. RAID increases availability, but does not
archive data
You have a mirror - that's RAID. RAID is about smoothly surviving
drive failures. With any storage system there are two important factors -
mean time to data loss and probability of data unavailability.
Ignoring whether it's RAID or not, mirroring will protect you against a
random failure of one of the drives, which was more useful in the
spinning rust days when random mechanical failures were an issue.
With SSD, write life is the main issue, and if you have two identical
mirrored drives, you may find any write life issues, which are not
random, occur at exactly the same time.
So with any type of mirrored arrangement, make sure they are different
makes or models of drive, so it is less likely they fail together.
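To put rough numbers on the two factors mentioned above, here is a small Python sketch using the textbook approximation for a two-drive mirror, MTTDL ≈ MTTF² / (2 × MTTR), together with the steady-state unavailability of a single drive, MTTR / (MTTF + MTTR). Both formulas assume independent, random failures, which is exactly the assumption that breaks down when two identical SSDs wear out together. The function names and the figures in the example are hypothetical.

```python
def mttdl_mirror(mttf_hours, mttr_hours):
    """Approximate mean time to data loss for a two-drive mirror.

    Assumes independent, exponentially distributed failures and that data is
    lost only if the second drive fails while the first is being replaced.
    """
    return mttf_hours ** 2 / (2 * mttr_hours)


def unavailability(mttf_hours, mttr_hours):
    """Steady-state fraction of time a single drive is down."""
    return mttr_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    mttf = 500_000.0  # hypothetical drive MTTF in hours
    mttr = 48.0       # hypothetical time to notice, replace and resync
    print("mirror MTTDL (hours):", mttdl_mirror(mttf, mttr))
    print("single-drive unavailability:", unavailability(mttf, mttr))
```

If both drives hit their write limit in the same week, the real figure is far worse than the formula suggests, hence the advice to mix makes or models.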
On 21/07/2024 09:05, Lawrence D'Oliveiro wrote:
On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:
On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:
On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual
I/O errors.
That's what you use SMART *for*.
No, I test doing actual I/O.
So does SMART.
No, it extrapolates from its internal firmware behaviour. It tries to
predict failures before they happen.
No it doesn't. It predicts ...
On 21/07/2024 09:06, Lawrence D'Oliveiro wrote:
This is why you have redundant systems. That’s how the pros do it.
No. It's why you use SMART.
On Sun, 21 Jul 2024 09:29:51 +0100, The Natural Philosopher wrote:
On 21/07/2024 09:05, Lawrence D'Oliveiro wrote:
On Wed, 17 Jul 2024 10:35:01 +0100, The Natural Philosopher wrote:
On 17/07/2024 02:30, Lawrence D'Oliveiro wrote:
On Tue, 16 Jul 2024 10:31:11 +0100, The Natural Philosopher wrote:
On 16/07/2024 01:57, Lawrence D'Oliveiro wrote:
SMART isn’t much use, anyway. I test my storage devices for actual
I/O errors.
That's what you use SMART *for*.
No, I test doing actual I/O.
So does SMART.
No, it extrapolates from its internal firmware behaviour. It tries to
predict failures before they happen.
No it doesn't. It predicts ...
s/predicts/tries to predict/. It’s not a prophet, you know.
And it only catches about 30% of failures.
On 24/07/2024 01:34, Lawrence D'Oliveiro wrote:
Companies whose business it is to ensure data integrity do not rely on
SMART.
No, they use hardware RAID for redundancy, extensive performance
monitoring, and retire most disks before they fail based on the small percentage of failures of thousands of other discs of the same type.
But that's not what the typical person with a Raspberry Pi and a couple
of discs is able to do. The SMART information gives valuable warning of
potential failures; to ignore it would be to employ the STUPID feature
of the user.
On Wed, 24 Jul 2024 21:35:02 +0100, druck wrote:
On 24/07/2024 01:34, Lawrence D'Oliveiro wrote:
Companies whose business it is to ensure data integrity do not rely on SMART.
No, they use hardware RAID for redundancy, extensive performance monitoring, and retire most disks before they fail based on the small percentage of failures of thousands of other discs of the same type.
Actually, no. They wait until the disks actually fail before replacing
them.
But that's not what the typical person with a Raspberry Pi and a couple
of discs is able to do. The SMART information gives valuable warning of potential failures, to ignore it would be to employ the STUPID feature of the user.
Unfortunately, SMART only catches about 30% of potential failures. That's
why relying on it is not smart.
Actually, no. They wait until the disks actually fail before replacing
them.
Anyone with any sense would replace them before the bathtub failure
curve starts to rise, which is usually not long after the end of the
warranty period.
Unfortunately, SMART only catches about 30% of potential failures.
That's why relying on it is not smart.
It's smarter than catching 0% of potential failures by waiting until
they have already happened.