Michael wrote:
Second drive gone bad within a few weeks, but it is a 2.5" HDD this time.
I ran a couple of smartctl tests and sector 33428384 was reported to have
a read failure; e.g.:
...
# 5 Extended offline Completed: read failure 90% 1351 33428384
# 6 Extended offline Completed: read failure 90% 1350 33428384
# 7 Extended offline Completed: read failure 90% 1350 33428384
# 8 Short offline Completed without error 00% 1350
# 9 Conveyance offline Completed without error 00% 1350

Then I took the drive out of the PC and ran some tests again in a USB docking station. First, 'hdparm --read-sector 33428384' returned
success. Then a short offline test returned "Completed without error",
followed by a long test with the same result. Interestingly, smartctl now shows:
"3 of 3 failed self-tests are outdated by newer successful extended offline self-test # 1"
However, the smartctl Thresholds table is warning "FAILING_NOW":
184 End-to-End_Error 0x0032 099 099 099 Old_age Always FAILING_NOW 1
I'm running a reading test on it now to see if it reports any errors.
Given the results so far, is it worth keeping it around? Perhaps for duplicate, non-mission-critical data?
I found this command long ago and from what I read, if this reports
zeros, it is considered OK. I'm not familiar with the 184 end-to-end
you show tho. May have to look into that.
smartctl -a /dev/sdX | egrep '(^ID|Reallocated_Sector_Ct|Reported_Uncorrectable_Er|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable)'
When I have a drive that has some sort of errors, I try not to use it
for anything important. I did have one drive that reported some
corrected errors, number 5 in the list, that I used but only after I ran shred on it and the error count didn't change. As we know, bad spots
can be marked bad and the drive knows not to use those. If you do the
same and the error count goes up, I'd ditch the drive. If it is stable,
then maybe use it to play with or something.
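To make that before/after comparison easier, the filter can be wrapped in a small function that prints just the attribute name and raw value. This is only a sketch: it assumes the usual ATA attribute table layout where the raw value is the tenth column (check your own output), the function name smart_counters and the file names are made up, and End-to-End_Error is added to the list only because it was flagged on this drive.

```shell
# Print "ATTRIBUTE_NAME RAW_VALUE" for the counters worth watching.
# Reads `smartctl -A` output on stdin. Assumes RAW_VALUE is column 10,
# which holds for the common attribute table layout but is not guaranteed.
smart_counters() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Reported_Uncorrect.*|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable|End-to-End_Error)$/ {
        print $2, $10
    }'
}

# Typical use (as root; /dev/sdX is a placeholder for the real device):
#   smartctl -A /dev/sdX | smart_counters > before.txt
#   ... run the write test ...
#   smartctl -A /dev/sdX | smart_counters > after.txt
#   diff before.txt after.txt && echo "counters unchanged"
```

If diff prints nothing, the counters did not move during the test.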
Michael wrote:
The 184 End-to-End-Error SMART attribute was developed by Hewlett Packard to check if corruption took place as data was transferred from/to the buffer of the drive. Some drives report it and this one does. I suppose if the RAM data buffer is a bit unreliable, this kind of error will come and go. Bearing in mind this particular drive was in a laptop,
overheating may have also contributed.
I've never seen that one before. It doesn't show up on the drives I
have but another manufacturer may use that. May even be helpful.
I just finished writing to it and no more errors showed up. I'll format it, with a slow read-write test for bad-blocks and recheck the smartctl attributes to see if more errors show up. Then I could play with it with some temporary media files which I don't mind if any are lost and see how it behaves.

The drive I mentioned had some small number, single digit number, of
those too. It wasn't much but it did have to correct an error. I'd do
some serious testing and see if it is stable for sure. I guess
badblocks is a good way to go but I've been known to use shred.
Basically, anything that writes to the whole drive should catch errors.
If you can write to it twice and it stays the same, it might be OK.
Basically, test it well until you are comfy that it is stable and can be
at least fairly trusted. I might also add, I stuck a post-it note on it
that it has bad sectors/blocks. So I don't forget. :/
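The "write to it twice and see if it stays the same" check could be sketched roughly like this. It assumes you saved the counters as plain "ATTRIBUTE_NAME RAW_VALUE" lines before and after each pass; that file format and the function name check_counters_stable are made up for illustration, not something smartctl produces by itself.

```shell
# check_counters_stable BEFORE_FILE AFTER_FILE
# Each file holds "ATTRIBUTE_NAME RAW_VALUE" lines (hypothetical format).
# Prints every attribute whose raw value went up between the two files.
# Returns 0 if nothing rose, 1 if at least one counter increased.
check_counters_stable() {
    awk 'NR == FNR { before[$1] = $2; next }          # first file: remember values
         ($1 in before) && ($2 + 0 > before[$1] + 0) {
             print $1, before[$1], "->", $2           # second file: report increases
             rose = 1
         }
         END { exit rose + 0 }' "$1" "$2"
}

# Typical use around a DESTRUCTIVE write test (wipes the whole drive;
# /dev/sdX is a placeholder):
#   badblocks -wsv /dev/sdX
#   check_counters_stable before.txt after.txt || echo "counters went up, ditch it"
```

Run the write test a second time and compare again; if both comparisons come back clean, the drive is at least not getting worse under load.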
Post back with what you get. Curious to see if it is stable or not. If
so, maybe a drive that has a small number of these errors and is stable
is safe to use. Maybe.
Dale
:-) :-)