How to diagnose disk errors when disk appears to be ok?


QUESTION :

I have a six-month-old 1TB Seagate drive formatted into 2 NTFS partitions, and the disk appeared to be failing with Windows dropping down from UDMA to PIO mode, reporting Delayed Write Errors, and hanging Explorer when browsing directories. My initial suspicion was that the disk was dying.

However, on further examination it appears that Ubuntu, which doesn’t write to the volume frequently like Windows does, was able to read the disk properly and retrieve all the data intact, saving me from having to use an older backup. Finally, running the SeaTools DOS diagnostic reported that the disk has no problems, i.e. no SMART errors and no bad sectors, apparently.

This, in combination with the relative youth of the disk, suggests that something else is broken. The cable? The PSU? The integrated disk controller? But what would be a good way to diagnose the problem without risking damaging the data? I intend to extract the disk and try it in an external eSATA enclosure and see if the write errors cease, but in the event of the disk appearing to be fine, I would like to be able to confirm what part of the hardware is actually broken here in order to know just what needs replacing.

Are there any good ways to go about this?

ANSWER :

Get a copy of HD Tune, SpeedFan, or Hard Disk Sentinel (my preference) and evaluate the SMART data that’s been stored on the drive. Look in particular at attributes like Ultra ATA CRC Error Count and Reallocated Sector Count, and compare them to the values from a known good system.

FYI: SMART values vary, and certain manufacturers use seemingly random data for particular attributes, which can make a drive appear to have rampant errors when in fact all drives of that make/model report high values (I’ve seen this with Raw Read Error Rate). So be careful not to jump to conclusions. But if Ultra ATA CRC Error Count is above 2 and Reallocated Sector Count is above 2, I would feel pretty confident saying something is going wrong. Reallocated Sector Count points to the drive; Ultra ATA CRC Error Count points to the cable or controller.
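
As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes you have already read the raw values off one of the tools above and typed them into a dictionary; the function name, the sample readings, and the choice to test each attribute separately are my own assumptions, and the threshold of 2 is only the figure quoted above, not a standard.

```python
# Rule-of-thumb threshold quoted in the answer above; not an official limit.
THRESHOLD = 2


def suspects(raw_values, threshold=THRESHOLD):
    """Map SMART raw values (attribute name -> int) to the parts worth suspecting."""
    found = []
    # Reallocated sectors are remapped by the drive itself, so they implicate the drive.
    if raw_values.get("Reallocated Sector Count", 0) > threshold:
        found.append("drive")
    # CRC errors occur on the link between drive and host, so they implicate
    # the cable or the controller.
    if raw_values.get("Ultra ATA CRC Error Count", 0) > threshold:
        found.append("cable or controller")
    return found


# Hypothetical readings, entered by hand for illustration only.
readings = {"Reallocated Sector Count": 0, "Ultra ATA CRC Error Count": 37}
print(suspects(readings))
```

An empty result only means these two attributes look unremarkable; it does not prove the hardware is healthy.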

Before you do anything else, back up all the data on the disk. Then you can do whatever you want without risking the data. You should, of course, have a backup anyway, because a disk can fail with no warning whatsoever. But when you have a warning, there’s no excuse for not immediately backing up everything you care about.

“SMART errors” is too vague. There are on the order of 50 SMART attributes that can be monitored, and only some of them may indicate a bad sector.

One advantage of SMART over one-off diagnostics is that it can surface issues over time that may otherwise be intermittent and elusive.

If you are not monitoring or measuring the attributes, you may be ignoring the information that could answer your question.
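
To show what measuring over time can mean in practice, here is a small Python sketch that compares two snapshots of raw SMART values and reports which watched attributes have grown. The snapshot dictionaries, the function name, and the default list of watched attributes are hypothetical; how you collect the values depends on the tool you use.

```python
# Attributes mentioned elsewhere on this page; adjust the list to taste.
WATCHED = ("Reallocated Sector Count", "Ultra ATA CRC Error Count")


def grown(earlier, later, watched=WATCHED):
    """Return {attribute: increase} for watched raw values that rose between snapshots."""
    changes = {}
    for name in watched:
        # Skip attributes missing from either snapshot rather than guessing a value.
        if name in earlier and name in later and later[name] > earlier[name]:
            changes[name] = later[name] - earlier[name]
    return changes


# Hypothetical snapshots taken a week apart, entered by hand.
last_week = {"Reallocated Sector Count": 0, "Ultra ATA CRC Error Count": 12}
today = {"Reallocated Sector Count": 0, "Ultra ATA CRC Error Count": 40}
print(grown(last_week, today))
```

A count that keeps climbing between snapshots is a stronger hint than any single reading, since some drives ship with a non-zero value that never changes.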

There are several monitoring applications available:

http://www.ntfs.com/disk-monitor.htm

http://www.ariolic.com/activesmart
