At the risk of offering advice that could reduce demand for MicroCom's RAID recovery expertise, it's fair to say that if all RAID-5 data storage subsystem operators had a sound understanding of the information on this page, we (as well as all our competitors) would be answering a good many fewer calls for help.
The chances are, however, that if you are reading this page something has already gone awry: multiple drives have gone off-line, your digital asset repository is threatened and in need of repair, and you're already looking for help. Please read on. You've found knowledgeable advice on how not to unintentionally make matters worse. If you're lucky (not to mention smart) and you're doing research in ADVANCE of actually applying remedies, you're taking advantage of the best possible kind of support: preparedness. The following information could be immensely valuable in saving both time and money. As failed user remedies stack up, the time and expense of professional data recovery go up right along with them, while the statistical prognosis for complete recovery goes in the opposite direction.
The RAID-5 disk drive array is one of the most prevalent means of providing mass data storage within a fault tolerant repository. In short, this means that if any member of the array, that is, any one of the hard disk drives in the array, suffers a complete failure, no data is lost: the drive failure has been "tolerated" (the RAID-5 is then "degraded": it keeps running, but with no more fault tolerance than a RAID-0). Other expressions for "fault tolerant" common within this context are "enhanced data availability" as well as more general words in the vein of "increased reliability" and "data loss prevention". This design leads many to think that they are immune to loss of data. Not so.
An accurate conceptual understanding of RAID-5 architecture and technology, which can be acquired in less than an hour, can save a lot of torment. To maintain the health of your data systems, a good place to start is by gaining a firm grasp of just two facts and one law.
There are two fundamental facts that seem quite obvious on the surface. There is, however, substantial insight available to those who reflect on their consequences. These points merit thoughtful analysis in order to appreciate their full impact on the operation of any RAID-5 subsystem:
The Two Facts
(1) The array must consist of three or more hard disk drive elements or members.
(2) The reliable availability of stored data can tolerate the failure of no more than one member (in the case of RAID-6, the limit is two members).
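The two facts follow from how parity works. What follows is a minimal sketch, assuming a single stripe on a hypothetical three-member array with byte-wise XOR parity; the helper `xor_blocks` and the sample blocks are illustrative names only, and a real controller also rotates the parity position from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical three-member array: two data blocks plus parity.
d0 = b"RAID"
d1 = b"five"
parity = xor_blocks([d0, d1])

# One failed member (here, the drive holding d1): the missing block is
# simply the XOR of every surviving block in the stripe.
rebuilt_d1 = xor_blocks([d0, parity])

# Two failed members: only one block of the stripe survives, and a single
# block is consistent with any value of the lost data, so there is nothing
# left to compute the missing blocks from.
```

With one member missing, the stripe is fully determined by the survivors; with two missing, it is not. That is the whole of the single-failure limit.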
The One Law
Beyond the seemingly obvious facts, there is ONE fundamental law that should be tattooed on the brain of every computer system operator responsible for RAID-5 data storage:
* A RAID-5 rebuild procedure will ALWAYS FAIL unless ALL of the subsystem components, i.e. the controller and all hard disk drive members, are present and accounted for, which means verifiably working properly. Failed rebuild attempts are lamentable: they make successful data recovery more difficult and potentially impossible.
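To see why the law holds, consider what a rebuild computes. The sketch below assumes byte-wise XOR parity on a hypothetical four-member stripe; `xor_blocks` and the sample data are illustrative only. It shows that when a surviving member returns bad data, the rebuild still runs but writes wrong data onto the replacement drive.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-member array: three data blocks plus parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# The drive holding d2 has been replaced and must be rebuilt.
# Case 1: every surviving member is verifiably working.
good_rebuild = xor_blocks([d0, d1, parity])

# Case 2: the drive holding d1 silently returns one wrong byte.
faulty_d1 = b"BBxB"
bad_rebuild = xor_blocks([d0, faulty_d1, parity])
```

In the second case the error in the surviving member is copied into the rebuilt block at the same byte position, so the array now holds two damaged blocks in that stripe instead of one.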
Here is the list of general "Do's" and "Don'ts" you need to know if you are using RAID-5 for data storage. If you're reading this list for education, you'll be wiser for the experience. If you're already dealing with an emergency situation, do read on, but as you'll see, it may be too late for some of the advice provided.
- Fully document your storage array configuration during initial setup, including the physical arrangement and order of connection for all component devices, especially the disk drives.
- Maintain sequence of array members by tagging the physical disk drive units (not simply the tray the drive is mounted in) while everything is fully functional, and certainly, if trouble does arise, BEFORE any trouble-shooting begins.
- Test the subsystem's ability to recover from a drive failure. With all data backed up, remove one of the drives from the subsystem while it's running, and bring it back to full, undegraded operation using a blank hard drive replacement.
- Understand the fundamental concepts behind RAID-5 functioning, so that when any anomaly occurs remedies are always focused on preserving the data.
- Recognize that if two or more hard drive elements have failed (that is, any more than ONE), there is no possibility of regaining access to stored data without high-level data recovery expertise (call MicroCom right away).
- Understand the distinctions between RAID-5 and other common levels of implementation, i.e., RAID-0, RAID-1, RAID-6, and RAID-10; knowing what your subsystem is NOT will help you understand better what it really is.
- Be familiar with any RAID subsystem manufacturer's documentation and technical descriptions. At the same time, don't rely on manufacturer service help to protect your data: their focus is on proper functioning of the equipment and they generally assume your data is backed up.
- Immediately replace a failed hard disk drive element upon fault detection; the replacement must be a pre-tested, known working device, fully conforming to subsystem designer specifications.
- Fully understand what "fault tolerance" really means in terms applicable to your specific subsystem, and always take action without delay to protect its full, complete functioning.
- When and if you begin in-house troubleshooting on failed RAID equipment, be meticulous and get assistance to document each and every remedial step taken.
- Know that rebuilding a RAID that reports a corrupted file system will not repair it; the rebuild will further corrupt your data, possibly making it unrecoverable.
- As with all data storage, back up regularly!
- Do not think of a RAID-5 as a data recovery solution (it's really only a scheme to increase data availability).
- Never continue remedial actions after it becomes known that more than one disk drive has a fault; if this is the case you must seek professional data recovery help (call MicroCom right away).
- Do not EVER attempt to rebuild a RAID unless ALL of the hard disk drive members are present, accounted for, confirmed to be fully functional individually, and connected in the correct sequential order, and the RAID controller also performs its POST and any other test routines flawlessly.
- Never ignore a RAID-5 data storage subsystem fault warning, or any behavioral anomaly; if a drive has failed, first back up the data in the array, then replace the drive, and then rebuild. Even though your data remains completely available, you are running in a degraded mode: the fault tolerance is lost immediately when one drive malfunctions. Limit your RAID degradation or loss of fault tolerance to the shortest reasonable time.
- Don't remove more than one disk drive at a time from its mounted or physically installed position; this way you can't lose track of the drive connection sequencing.
- Do not allow the RAID controller to execute write operations on any known working drives; doing so can overwrite the striping data needed for RAID-5 reconstruction.
- CHKDSK or the like should generally not be used; if it is used at all, use it only with extreme caution. Successful uses with RAID-5 are very limited; do not guess.
- Don't try disk drives in different "slots" or shuffle the physical ordering of the array unless you are (and you probably cannot be) certain that the subsystem controller can recognize the changes; do not guess.
- Don't keep trying numerous "experiments" in efforts to bring inaccessible data back on line; call MicroCom sooner rather than later.
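Several items in the list above (documenting the array, tagging the drives, keeping the connection order, and never rebuilding with an unverified member) can be combined into one pre-rebuild check. This is only a sketch under stated assumptions: the `Member` record, the `rebuild_blockers` function, and the serial numbers are hypothetical, and the health flag is assumed to come from whatever drive testing you have already done outside this code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    slot: int        # physical position as documented at setup
    serial: str      # serial number from the tag on the drive itself
    healthy: bool    # result of your own prior drive test (assumed input)

def rebuild_blockers(documented, observed, replacement_slot=None):
    """Return the reasons a rebuild should not be started.

    documented: dict of slot -> serial recorded while the array was healthy.
    observed: list of Member records describing the array as it stands now.
    replacement_slot: slot holding the new blank drive, whose serial is
    expected to differ from the documented one.
    """
    problems = []
    seen = {m.slot: m for m in observed}
    for slot, serial in sorted(documented.items()):
        member = seen.get(slot)
        if member is None:
            problems.append(f"slot {slot}: member missing")
            continue
        if slot != replacement_slot and member.serial != serial:
            problems.append(f"slot {slot}: documented {serial}, found {member.serial}")
        if not member.healthy:
            problems.append(f"slot {slot}: drive {member.serial} not verified healthy")
    return problems

# Hypothetical example: slot 2 holds the replacement, but slot 1 has been shuffled.
documented = {0: "SN-A", 1: "SN-B", 2: "SN-C"}
observed = [Member(0, "SN-A", True), Member(1, "SN-C", True), Member(2, "SN-NEW", True)]
blockers = rebuild_blockers(documented, observed, replacement_slot=2)
```

An empty result means only that these particular checks found nothing; it does not cover the controller's own POST and test routines, which still have to pass separately.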
We are often presented with cases where our client incorrectly believes that their data is stored within a RAID-5 subsystem. In casual talk about RAID equipment, "RAID-5" is a very common expression, but there may be fewer of them than people think! The number of jobs we receive that are said to be RAID-5s exceeds the number of RAID-5s we actually work on, because what the client thought was a RAID-5 was actually a different configuration. Using RAID-5 remedies in an attempt to put right a RAID-10 tends strongly to create disastrous outcomes. A word to the wise IT service professional: never assume your client knows what RAID level is in use.
As every IT service professional knows, there are numerous actions that can be taken when working with valuable disk drive data that cannot be undone. Know also that expert consultation at MicroCom is always free -- call us and increase the available brain power you can apply to your problem or emergency situation.
Author: S.E. Fowler / Steve Fowler