RAID data rescue - RAID0, RAID1, RAID5, RAID6
Every day, Attingo data rescue reconstructs data from a wide variety of RAID systems. Continuous research and development, in-depth knowledge of how
RAID controllers work, gained through reverse engineering, and more than 20 years of experience make Attingo data rescue one of the
leading providers of RAID reconstruction services in Europe.
Data reconstruction of defective SAN systems
We reconstruct data from all types of RAID systems, irrespective of the number of storage media, capacity or manufacturer.
Our specialists will assist you
- via telephone at +43 1 236 01 01, +49 40 54 88 75 60 or +31 252 621 625
- via e-mail to firstname.lastname@example.org
- after you have sent us a non-binding diagnosis request online
Data rescue and data recovery by RAID manufacturer
Common causes for data loss in RAID systems
Recovery of data following the failure of several storage media in a RAID system
Depending on the RAID level used, a RAID system can tolerate the failure of one or more storage media. For instance, a single storage medium can fail in a RAID5 system; in a RAID6 system, even two can fail. The RAID system is then in the state “CRITICAL” or “DEGRADED”. The data should remain accessible as normal, but the defective storage medium should be replaced as soon as possible. Often, monitoring via e-mail or SNMP is configured incorrectly or not at all. The failure of a storage medium is then only noticed when it is too late and the data can no longer be accessed. Attingo data rescue can still reconstruct data from RAID systems when more than the maximum permitted number of storage media are defective or contain invalid data.
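The tolerance rule described here can be summarised in a few lines. This is only an illustrative sketch: the function names and the state labels "OPTIMAL" and "OFFLINE" are assumptions made for this example, and real controllers use their own terminology.

```python
def fault_tolerance(level: str, members: int) -> int:
    """Number of failed storage media the array survives.

    Assumption for RAID1: every member holds a full copy of the data.
    """
    if level == "RAID0":
        return 0
    if level == "RAID1":
        return members - 1
    if level == "RAID5":
        return 1
    if level == "RAID6":
        return 2
    raise ValueError(f"unknown RAID level: {level}")


def array_state(level: str, members: int, failed: int) -> str:
    """Classify the array by comparing failures with the tolerated number."""
    if failed == 0:
        return "OPTIMAL"   # hypothetical label
    if failed <= fault_tolerance(level, members):
        return "DEGRADED"  # data still accessible, redundancy reduced or gone
    return "OFFLINE"       # hypothetical label: more failures than tolerated
```

For example, `array_state("RAID5", 4, 1)` yields "DEGRADED", while a second failure in the same array yields "OFFLINE".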
Data rescue after fatal RAID rebuilds
Data loss in RAID systems most commonly occurs during the so-called rebuild. When a storage medium fails and is replaced, the RAID controller calculates the content of the new storage medium from the data on the remaining storage media. To do this, the RAID controller must read all sectors of all remaining hard disks in the RAID system. The probability that one of these hard disks has defective sectors is very high. As soon as the RAID controller detects such errors, it often drops this additional, seemingly defective storage medium from the array as well. The RAID system is then offline and the data is no longer available. Recovery of the data is required.
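The rebuild described here can be illustrated with a toy model of a single-parity (RAID5-style) array. The sketch assumes that each disk is a list of equally sized sectors and that `None` marks an unreadable sector; the names `ReadError`, `read_sector` and `rebuild` are hypothetical and do not correspond to any controller's firmware.

```python
from functools import reduce


class ReadError(Exception):
    """Raised when a sector on a surviving disk cannot be read."""


def read_sector(disk, index):
    sector = disk[index]
    if sector is None:  # None marks a defective sector in this toy model
        raise ReadError(f"unreadable sector {index}")
    return sector


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_disks):
    """Recompute the replaced disk sector by sector.

    Every sector of every surviving disk must be readable; a single bad
    sector aborts the whole rebuild, as described in the text.
    """
    replacement = []
    for index in range(len(surviving_disks[0])):
        blocks = [read_sector(disk, index) for disk in surviving_disks]
        replacement.append(xor_blocks(blocks))
    return replacement


# Two data disks plus parity; the second data disk is "replaced".
d0 = [b"\x01\x02", b"\x10\x20"]
d1 = [b"\x04\x08", b"\x40\x80"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]
restored = rebuild([d0, parity])  # equals d1
```

If one sector of `d0` were `None`, `rebuild` would raise `ReadError` instead of returning the restored disk, which corresponds to the array going offline.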
RAID controller and RAID firmware
RAID controllers function as small computers and run dozens of megabytes of software to provide their functionality. Software is susceptible to errors, and bugs in the firmware of RAID controllers are a common cause of data loss. Through years of experience and reverse engineering of practically all controllers available on the market, Attingo has found that even expensive brand-name equipment contains faulty software. Firmware updates are a common trigger. Many firmware updates come with a recommendation to perform a backup first, but the reason is only very rarely explained. Sometimes an update causes the RAID system to use a different algorithm for storing data. As a result, all data could be lost. If no backup has been performed or the recommendation was ignored, the data is irrevocably lost.
RAID manufacturer support
Large manufacturers often use support call centres. Their support staff often have no technical training and instead rely on a catalogue of prepared questions and answers. When a support agent believes they have recognised the customer's question, they simply read out the prepared answer. This often leads to fatal actions: the support department of one large manufacturer, for instance, routinely recommends that customers experiencing difficulties with RAID systems simply delete the RAID configuration and create it again. Afterwards, the RAID system should work again. Usually, this is indeed the case. However, all data will be lost.
RAID system is not available
Although hardware damage is often the cause of failures in RAID systems, software damage occurs as well. Because several virtual servers often run simultaneously on a RAID system, both the likelihood of a failure and the damage it causes are often greater than expected.
Resizing – modifying the RAID level
So-called resizing poses an additional risk of RAID failure. Here, the RAID level is modified, for instance from RAID5 to RAID6, or the RAID is expanded with additional hard disks to obtain greater storage capacity. These processes are extremely sensitive, and problems resulting from them are quite common. Even the smallest error in the software or hardware can render the RAID system non-functional. For this reason, we advise performing such processes only with a fully verified backup.
Failed RAID – rebuild
An advantage of RAID systems is that storage media can fail without causing data loss. In RAID5 systems a single hard disk can fail; in RAID6 systems two can. Time and again, the failure of one or two hard disks goes unnoticed, either because it is ignored or because monitoring of the RAID system has not been set up correctly. In the majority of cases we encounter, the RAID system continues to run in so-called degraded mode for hours or even days, leading to massive damage to the defective storage medium. Action is only taken when an additional storage medium fails and the RAID system comes to a standstill. As a rule, a failed hard disk should be replaced. However, this is where the next source of errors hides. To perform a rebuild and calculate the content of the newly installed storage medium, the RAID system must read every sector on all the other hard disks. If even the smallest read error occurs, the process is aborted and the faulty disk is dropped as well. As a result, the RAID can no longer be accessed at all.
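A rough back-of-the-envelope estimate shows why a read error during a rebuild is not unlikely. The sketch below assumes an unrecoverable read error rate of one error per 10^14 bits read and treats errors as independent; both are simplifying assumptions made for this example, not measured values, and real disks (especially aged disks from the same series) may behave very differently.

```python
import math


def rebuild_read_error_probability(surviving_disks: int,
                                   disk_bytes: float,
                                   bit_error_rate: float = 1e-14) -> float:
    """Probability of at least one unreadable bit while reading all survivors.

    Computes 1 - (1 - p)^n with log1p/expm1 so that the tiny per-bit
    probability p does not vanish in floating-point rounding.
    """
    bits_read = surviving_disks * disk_bytes * 8
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))


# Example: RAID5 with four 4 TB disks, one replaced, three must be read in full.
probability = rebuild_read_error_probability(3, 4e12)
```

Under these assumptions the example comes out at roughly 0.6, i.e. a read error during the rebuild is more likely than not.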
Failure reason: Series fault
Hard disks of the same series are often installed together in a RAID system. For this reason, RAID systems are particularly susceptible to series faults. When one hard disk develops a fault, it will not take long until another disk develops the same or a similar issue. The probability is therefore very high that an error will occur during a rebuild and the RAID system will no longer function. When a RAID system is running in so-called degraded mode (i.e. when disks have already failed) and additional disks fail, the RAID controller will drop these as well and the RAID system will go offline.
Data loss and insufficient knowledge
A further cause of RAID failure that should not be underestimated is “tinkering” with the controller settings and the firmware by support staff or by the internal or external IT department without the proper expert knowledge. In the worst case, incorrect configurations can lead to a physical loss of data, for instance when individual stripes are overwritten. It would then no longer be possible to recover the data in full. For this reason, Attingo strongly recommends not performing “experiments” but instead immediately consulting a specialist who handles this subject matter on a daily basis.
Additional reasons
When a RAID system is relocated, or is switched off or back on after a long period, the server may no longer start up. Possible reasons are series faults or firmware errors of the installed hard disks.
Additionally, file system checks (chkdsk, fsck) or similar verification and repair programs that run automatically when the server restarts can cause fatal damage if they are not aborted soon enough. This occurs, for instance, when storage media have been swapped, and is independent of the RAID level.
Overview of the most common RAID levels
RAID0 systems do not operate redundantly; no data is mirrored. RAID0 systems offer more storage space as well as faster access and transfer speeds. When one storage medium in a RAID0 system is defective, the entire system is no longer available. Most RAID0 algorithms distribute the data alternately in so-called stripes across all storage media in the system. A further variant is spanned volumes or JBOD, in which the data is written sequentially to the storage media in the system. Even when data loss occurs in a RAID0 system, the data can be recovered irrespective of the cause of the error.
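The difference between striping and spanning can be shown as a mapping from a logical block number to a disk and a block offset on that disk. This is a simplified sketch; the function names are hypothetical, and real controllers additionally use metadata areas, start offsets and varying stripe sizes.

```python
def striped_location(logical_block: int, disk_count: int, stripe_blocks: int):
    """Striping: consecutive stripes rotate across all disks."""
    stripe_index, offset_in_stripe = divmod(logical_block, stripe_blocks)
    row, disk = divmod(stripe_index, disk_count)
    return disk, row * stripe_blocks + offset_in_stripe


def spanned_location(logical_block: int, disk_sizes: list):
    """Spanning/JBOD: the first disk is filled completely, then the next."""
    for disk, size in enumerate(disk_sizes):
        if logical_block < size:
            return disk, logical_block
        logical_block -= size
    raise ValueError("block lies beyond the end of the volume")
```

With three disks and a stripe size of four blocks, logical block 4 lands on the second disk in the striped case, whereas in a spanned volume it would still be on the first disk as long as that disk holds more than four blocks.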
RAID1 mirrors the data. This means that the same data is present on each hard disk, irrespective of how many hard disks are installed. As a result, data is not immediately lost when a hard disk fails. However, the failure of a hard disk often goes unnoticed and is only discovered when the RAID system no longer works.
The advantages of RAID5 are higher access speeds and redundancy of one hard disk. This means that one storage medium can fail completely without data being lost, and the system continues to operate (degraded mode). This is made possible by a parity calculation using XOR; however, the way the data is laid out on the disks is not standardised. Through research, Attingo has reverse-engineered practically all controllers and can also simulate them virtually.
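To illustrate why the missing standard matters, the sketch below places data and parity for two layouts. The layout names "left-asymmetric" and "left-symmetric" follow the convention of the Linux software RAID driver and are used here as an assumption; hardware controllers may use these or entirely different arrangements.

```python
def raid5_location(stripe_unit: int, disk_count: int, layout: str):
    """Return (row, data_disk, parity_disk) for a logical stripe unit."""
    data_per_row = disk_count - 1
    row, index_in_row = divmod(stripe_unit, data_per_row)
    # "Left" layouts: parity starts on the last disk and moves one disk
    # to the left with every row.
    parity_disk = (disk_count - 1) - (row % disk_count)
    if layout == "left-asymmetric":
        # Data fills the disks in ascending order, skipping the parity disk.
        data_disk = index_in_row if index_in_row < parity_disk else index_in_row + 1
    elif layout == "left-symmetric":
        # Data starts on the disk after the parity disk and wraps around.
        data_disk = (parity_disk + 1 + index_in_row) % disk_count
    else:
        raise ValueError(f"unknown layout: {layout}")
    return row, data_disk, parity_disk
```

In a four-disk array, stripe unit 3 lies on the first disk in the left-asymmetric layout but on the last disk in the left-symmetric layout. Reassembling the disks with the wrong layout therefore produces scrambled data even though every disk is intact.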
In its basic principles, RAID6 is very similar to RAID5, with the main difference that up to two hard disks can fail safely. The hardware requirements for the controller are much higher, because the additional parity is calculated using highly complex mathematical functions; as a result, data rescue is also extremely complex. Here too, however, Attingo has reverse-engineered practically all controllers and can simulate them virtually.
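As an illustration of what such a second parity can look like, the sketch below uses a P+Q scheme over the finite field GF(2^8) with reduction polynomial 0x11d and generator 2, as used for example by the Linux software RAID driver. This is an assumption made for the example: the text does not say which scheme a given controller uses, and other RAID6 variants exist. The code works on a single byte position across the data disks.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^255 == 1 for every non-zero a


def pq_parity(data):
    """P is the plain XOR; Q weights data disk i with 2^i in GF(2^8)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, x, y, p, q):
    """Recover the bytes of two lost data disks x and y from the rest."""
    p_rest = q_rest = 0
    for i, d in enumerate(data):
        if i not in (x, y):
            p_rest ^= d
            q_rest ^= gf_mul(gf_pow(2, i), d)
    p_xy = p ^ p_rest  # equals Dx ^ Dy
    q_xy = q ^ q_rest  # equals 2^x * Dx ^ 2^y * Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(q_xy ^ gf_mul(gy, p_xy), gf_inv(gx ^ gy))
    return dx, p_xy ^ dx


data = [0x12, 0x34, 0x56, 0x78]
p, q = pq_parity(data)
lost = [0x12, None, 0x56, None]  # data disks 1 and 3 have failed
recovered = recover_two(lost, 1, 3, p, q)  # (0x34, 0x78)
```

Unlike the XOR parity of RAID5, recovering two disks requires solving a small system of equations in the finite field, which is one reason why both the controller workload and the reconstruction effort are higher.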
SAN - Storage Area Network or iSCSI storage
SANs are generally RAID systems that are connected to servers or workstations via iSCSI. The most common causes of errors are usually those described above, as well as bugs in the iSCSI software.
Your next steps
Assistance in the case of data loss
In most cases, data loss leads to a state of emergency. Please contact us! Our job is to reduce the damage you experience resulting from data loss to a minimum.
Data rescue in High Priority 24/7
In the case of data loss, you can reach us around the clock, Monday to Sunday. Qualified engineers will handle your data rescue case at any time of day or night.