Synology SHR1 Data Recovery After Two Drives Failed

Two drives have failed in your Synology SHR-1 array. DSM reports the storage pool as Crashed and states that data cannot be recovered. SHR-1 data recovery after a double drive failure is the subject of this article — and the situation is more nuanced than DSM’s message suggests. Whether your data is recoverable depends not on the count of failed drives alone, but on how they failed and what state the surviving drives are in.

Find Your Situation First

Before reading further, use this table to locate your specific scenario and skip directly to the relevant section:

What you observe | Likely situation | Recovery outlook | Go to
Both drives appear in OS, superblocks readable | Logical failure: drives physically intact | Possible | Level 1 → Level 2
One drive readable, one not detected | One logical + one physical failure | Partial | Level 2 → Level 3
Second drive failed during rebuild after first | Classic double failure: partial data loss likely | Partial | Level 2
One or both drives click, grind, not detected | Physical failure: heads or platters | Lab required | Level 3

Why Two Failures Break SHR-1 Mathematically

SHR-1 uses a single parity set — conceptually equivalent to RAID 5 in terms of fault tolerance. For each stripe across the array, one drive holds the XOR parity of the others. When one drive fails, any missing data block can be recalculated: take the parity block and XOR it against all surviving data blocks, and the missing value is recovered. This works because there is one unknown and one equation.

With two drives failed, there are two unknowns per stripe and still only one parity equation. The system is underdetermined: for every byte position, 256 different pairs of values satisfy the same XOR result, and nothing in the surviving data says which pair is the real one. No software, regardless of how it is marketed, can reconstruct two independently unknown values from a single XOR relationship. This is not a limitation of any particular tool; it is a property of the parity scheme itself.
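
The one-equation argument can be checked with a few lines of shell arithmetic. This is only a toy sketch: it assumes a stripe of three data blocks of one byte each, with arbitrary example values, whereas a real array applies the same XOR to whole chunks.

```shell
#!/bin/sh
# Toy stripe: three data blocks of one byte each (values are arbitrary examples).
d0=$((0x3C)); d1=$((0xA5)); d2=$((0x0F))

# The parity block is the XOR of all data blocks in the stripe.
parity=$(( d0 ^ d1 ^ d2 ))

# One member lost (d1): XOR the parity with every surviving data block.
recovered_d1=$(( parity ^ d0 ^ d2 ))
printf 'one failure:  d1 = 0x%02X\n' "$recovered_d1"

# Two members lost (d1 and d2): the survivors only reveal d1 XOR d2.
d1_xor_d2=$(( parity ^ d0 ))
printf 'two failures: d1 ^ d2 = 0x%02X\n' "$d1_xor_d2"

# Brute-force count of the byte pairs (a, b) that fit that single known value.
candidates=0
a=0
while [ "$a" -lt 256 ]; do
    b=0
    while [ "$b" -lt 256 ]; do
        if [ $(( a ^ b )) -eq "$d1_xor_d2" ]; then
            candidates=$(( candidates + 1 ))
        fi
        b=$(( b + 1 ))
    done
    a=$(( a + 1 ))
done
printf 'byte pairs consistent with the parity: %d\n' "$candidates"
```

With one block missing the XOR returns the original byte. With two missing, every one of the 256 possible values for the first block has a matching partner for the second, so the parity alone cannot single out the real pair.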

What software can do is recover data from stripes where only one of the two failed drives contributed. If your files happen to be stored on stripes that do not touch both failed drives, they are recoverable. Files that span stripes involving both failed members are not. The proportion of recoverable data depends on the size and distribution of the failed drives within the array — which is why partial recovery is often possible even when full recovery is not.

📌 SHR-2 note

SHR-2 stores two independent parity sets per stripe — analogous to RAID 6 — and tolerates any two simultaneous drive failures without data loss. However, SHR-2 is not immune to this scenario: three simultaneous failures break its parity math in the same way two failures break SHR-1. If you are running SHR-2 and have lost three drives, the same logic below applies. For a comparison of SHR-1 and SHR-2 fault tolerance in practice, see our article on SHR/SHR-2 recovery after Synology hardware failure.

How the Two Drives Failed Matters

Not all double failures are equivalent. The sequence and nature of the failures determine what recovery is possible:

Simultaneous failure — power event or controller

Both drives stopped responding at the same moment. The data on each drive is likely physically intact — the array simply lost quorum. This is the most recoverable version of a double failure because neither drive’s data was partially overwritten.

Second drive failed during rebuild

The first drive failed, the array ran degraded, a rebuild started — and the second drive failed partway through. New parity was being written to the replacement drive as the second original drive died. Stripes that were rebuilt before the second failure are intact; stripes that were being rebuilt at the time of the second failure are corrupted. Partial recovery is typically possible.

First drive failed long ago, unnoticed

The array ran in degraded mode — without redundancy — for an extended period, and then the second drive failed. If the first failure went undetected, there may have been additional writes to degraded stripes in the interim. Recovery chances depend entirely on the physical state of the second failed drive. See our article on recognizing early signs of hard drive failure for how to catch this earlier.

Assess the Physical State of Each Drive

Before any software attempt, determine whether the drives are physically readable. Power down the NAS cleanly if it is still running — do not force-reboot, and do not insert a replacement drive, which would trigger DSM to attempt another rebuild. Connect each drive individually to a recovery machine.

If any drive produces audible clicking, grinding, or repeated failed spin-up attempts — power it off immediately and do not attempt further reads. Each spin-up cycle on a drive with damaged heads increases platter damage. Go directly to Level 3.

For drives that power up silently, run mdadm --examine on each device partition:

	# Check superblock state on each drive
	mdadm --examine /dev/sdb3
	          Magic : a92b4efc
	        Version : 1.2
	    Feature Map : 0x0
	     Array UUID : 4b2f8e1a:7c3d9f02:1a4b8c3d:9e2f7b01
	           Name : DiskStation:2
	  Creation Time : Fri Mar 10 14:22:31 2023
	     Raid Level : raid5
	   Raid Devices : 3
	 Avail Dev Size : 7813959680
	     Array Size : 15627862016
	  Used Dev Size : 7813931008
	    Data Offset : 2048 sectors
	   Super Offset : 8 sectors
	   Unused Space : before=1968 sectors, after=0 sectors
	          State : clean, degraded
	
	    Device UUID : 9d3f1c2a:4e8b7f03:2c1d9e4b:7a3f8c01
	    Update Time : Mon Jun  2 09:14:22 2025
	  Bad Block Log : 512 entries available at offset 16 sectors
	       Checksum : 3f8a1b2c - correct
	         Events : 247

Look for three things: the Array UUID must match across all drives — if it does not, a drive is from a different array or its superblock is corrupt. The Events counter should be similar across members — a large divergence indicates one drive missed a significant number of write operations. The State field will show clean, degraded on a drive that was part of an array that lost a member.
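
Comparing these fields by eye across several drives is error-prone, so a small helper can tabulate them. The sketch below assumes the output of mdadm --examine has been saved to one text file per member; the function names and file names are hypothetical, and only the Array UUID and Events lines shown in the sample output above are parsed.

```shell
#!/bin/sh
# summarize_examine FILE...
# Each FILE is assumed to hold saved `mdadm --examine` output for one member,
# for example: mdadm --examine /dev/sdb3 > sdb3.txt   (file names are examples).
summarize_examine() {
    for f in "$@"; do
        awk -v src="$f" '
            /Array UUID :/ { uuid = $NF }
            /Events :/     { events = $NF }
            END            { printf "%s\t%s\t%s\n", src, uuid, events }
        ' "$f"
    done
}

# distinct_uuids FILE... prints how many different Array UUIDs were seen.
# A value of 1 means every member claims the same array.
distinct_uuids() {
    summarize_examine "$@" | cut -f2 | sort -u | wc -l | tr -d ' '
}

# events_spread FILE... prints the gap between the highest and lowest Events counter.
events_spread() {
    summarize_examine "$@" | cut -f3 | sort -n |
        awk 'NR == 1 { min = $1 } { max = $1 } END { print max - min }'
}
```

Running summarize_examine sdb3.txt sdc3.txt sdd3.txt prints one tab-separated line per member (file, Array UUID, Events). A distinct_uuids result other than 1, or a large events_spread, are the warning signs described above.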

Also open the built-in S.M.A.R.T. monitor in RS RAID Retrieve at this point. Reallocated Sector Count and Pending Sectors on a surviving drive matter: a drive with accumulating bad sectors during an already-degraded period may have produced silent read errors that affected data integrity before the second failure occurred. For background on interpreting S.M.A.R.T. data, see our article on hard drive failure signs.
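
If the assessment is being done from a Linux terminal, the same counters can also be read with smartctl from the smartmontools package. That tool is not part of the workflow described here, so treat it as an assumed alternative. The sketch filters the attribute table printed by smartctl -A for attribute IDs 5, 197 and 198 (reallocated, pending and uncorrectable sectors) and prints those with a non-zero raw value; the function name is hypothetical.

```shell
#!/bin/sh
# flag_smart: reads `smartctl -A` attribute-table text on stdin and prints the
# watched attributes (IDs 5, 197, 198) whose raw value is above zero.
# Assumes the usual ten-column table, where column 1 is the ID, column 2 the
# attribute name and column 10 the raw value.
flag_smart() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0) > 0 {
        printf "%s raw=%s\n", $2, $10
    }'
}

# Typical use on the recovery machine (the device name is an example):
#   smartctl -A /dev/sdb | flag_smart
```

No output means none of the three counters has started climbing. Any line printed for a supposedly healthy survivor is a reason to image that drive before reading it further.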

Three Levels of Recovery

Level 1: Manual assembly via Linux terminal
Difficulty: Low

Only attempt this path if mdadm --examine returns valid superblocks with matching UUIDs on all drives. The terminal approach gives you direct control over every step but has hard limits: it requires all drives to be physically present and readable, it has no fallback when superblocks are damaged, and it cannot indicate in advance which files are corrupted. Image every drive with ddrescue before running any of the commands below — work exclusively from the images.

Step 1 — Image each drive before anything else

	# Install ddrescue if not present
	apt-get install -y gddrescue
	# Image each drive to a separate file — run for each member
	ddrescue -d -r3 /dev/sdb /mnt/images/sdb.img /mnt/images/sdb.log
	ddrescue -d -r3 /dev/sdc /mnt/images/sdc.img /mnt/images/sdc.log
	ddrescue -d -r3 /dev/sdd /mnt/images/sdd.img /mnt/images/sdd.log
        

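The -d flag reads the source drive with direct disc access, bypassing the kernel cache, and -r3 tells ddrescue to retry unreadable sectors three times before marking them as failed. The .log file is ddrescue's mapfile: it records which areas have already been read, which allows the imaging to be resumed if interrupted. Do not skip this step: a drive with hidden bad sectors will degrade further under the sustained read load of array reconstruction.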
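
The .log file written by ddrescue (its mapfile) also shows how much of each drive was actually read. The sketch below totals the bytes per status from a mapfile. It assumes the standard GNU ddrescue layout of comment lines starting with #, one status line, then rows of position, size and status in hexadecimal, where + means rescued and - means bad sector. The function name is hypothetical.

```shell
#!/bin/sh
# mapfile_summary FILE
# Totals the byte counts in a ddrescue mapfile by block status:
#   +  rescued      -  bad sector      anything else  not yet (fully) tried
mapfile_summary() {
    rescued=0; bad=0; pending=0
    while read -r pos size status; do
        # Skip comment lines and empty lines.
        case $pos in '#'*|'') continue ;; esac
        # Skip the current-position status line: its second field is not a hex size.
        case $size in 0x*) ;; *) continue ;; esac
        case $status in
            +) rescued=$(( rescued + $size )) ;;
            -) bad=$(( bad + $size )) ;;
            *) pending=$(( pending + $size )) ;;
        esac
    done < "$1"
    printf 'rescued=%d bad=%d pending=%d\n' "$rescued" "$bad" "$pending"
}

# Example: mapfile_summary /mnt/images/sdb.log
```

The three totals are in bytes. A non-zero bad figure means the image has holes, and any file whose blocks fall in those holes will be damaged no matter how the array is assembled afterwards.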

Step 2 — Verify superblocks across all members

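	# A whole-disk image holds the md superblock inside partition 3, so map its partitions first
	for img in /mnt/images/sdb.img /mnt/images/sdc.img /mnt/images/sdd.img; do
		loop=$(losetup --find --show --partscan --read-only "$img")
		# Confirm UUID and Events match across members
		mdadm --examine "${loop}p3"
		losetup --detach "$loop"
	done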
        

Confirm three things before proceeding. The Array UUID must be identical across all images — a mismatch means a drive is from a different array or its superblock is corrupt. The Events counter should be close across all members — a divergence of hundreds or thousands means one drive missed a significant sequence of writes. The Raid Devices count must match the expected number of members.

Step 3 — Force assembly from image files

	# Attach each image as a loop device; -P scans the partition table so /dev/loopNp3 exists
	losetup -P /dev/loop0 /mnt/images/sdb.img
	losetup -P /dev/loop1 /mnt/images/sdc.img
	losetup -P /dev/loop2 /mnt/images/sdd.img
	# Force assembly — specify the data partitions, not the whole device
	mdadm --assemble --force /dev/md0 /dev/loop0p3 /dev/loop1p3 /dev/loop2p3
	# Check assembly result
	cat /proc/mdstat
	mdadm --detail /dev/md0
        

--force assembles the array even without sufficient members for a consistent state. The resulting array is degraded. Stripes that involved both failed drives contain incorrect parity-reconstructed data — files on those stripes will be corrupt or zero-byte when copied. Files on stripes that touched only one failed drive are intact and readable. There is no way to know from the outside which files fall into which category before attempting to copy them.

Step 4 — Activate LVM and mount read-only

	# Activate LVM volume group on the assembled array
	pvscan
	vgchange -ay
	# List logical volumes to find the correct device path
	lvs
	# Mount read-only — never mount rw on a damaged array
	mount /dev/vg1/volume_1 /mnt/data -o ro
        

Always mount with -o ro. Writing to a forced-assembled array in this state will extend corruption into previously intact stripes. Once mounted, verify directory structure and file sizes before starting any copy operation. Corrupted files surface as read errors, I/O errors, or zero-byte output during copy — there is no prior indicator.

Step 5 — Copy with error handling

	# Copy with rsync and log everything; unreadable files are reported and skipped
	rsync -av /mnt/data/ /mnt/destination/ 2>&1 | tee /mnt/rsync.log
	# Review what was skipped
	grep -i "error\|failed\|cannot" /mnt/rsync.log
        

When rsync hits a read error it reports the affected file and continues with the next one, maximising the total amount of data recovered. It then finishes with exit code 23 (partial transfer) if any file failed. The --ignore-errors option is not needed for this: it only controls whether --delete goes ahead after I/O errors. Review the log afterward to identify which files could not be copied. These are the ones that landed on stripes involving both failed drives.

When to stop and move to Level 2: mdadm --examine shows mismatched UUIDs between drives; --force assembly fails or produces an inactive array in /proc/mdstat; LVM activation finds no volume group; the filesystem mounts but shows an empty or corrupted directory tree. Any of these means the superblock or LVM metadata is too damaged for manual recovery — RS RAID Retrieve’s RAID Constructor handles these cases.

Level 2: RS RAID Retrieve (automatic detection and manual reconstruction)
Difficulty: Low

RS RAID Retrieve handles the full sequence — S.M.A.R.T. assessment, drive imaging, array reconstruction, LVM activation, and filesystem access — within a single application on Windows, Linux, or macOS. It covers two distinct reconstruction paths depending on the state of the mdadm superblocks.

Step 1: S.M.A.R.T. assessment before any read operation

Connect all drives to the recovery machine and open the built-in S.M.A.R.T. monitor before scanning. The attributes that matter most in this scenario are Reallocated Sector Count (ID 05), Current Pending Sector Count (ID C5), and Uncorrectable Sector Count (ID C6). Non-zero values on any of these — especially on drives that appear to be the “healthy” survivors — indicate sectors that were already failing during the degraded period before the second drive died. A surviving drive with elevated pending sectors is a drive that may produce read errors partway through a multi-hour reconstruction scan.

Step 2: Drive imaging to protect originals from scan stress

For any drive with elevated S.M.A.R.T. values, use RS RAID Retrieve’s built-in imaging function to create a sector-level image before the reconstruction scan. The imager makes multiple passes over problematic sectors, logs unreadable areas with a sector map, and produces a complete image file that represents the best possible read of that drive. All subsequent steps — array reconstruction, LVM activation, filesystem scan — run against the static image file rather than the live drive. This prevents the drive from deteriorating further under the read load of a full array scan, which on a multi-terabyte SHR configuration can take several hours.

Step 3: Automatic array reconstruction when superblocks are intact

RS RAID Retrieve scans all connected drives and images for mdadm superblock signatures. When found, it reads the Array UUID, RAID level, member device roles, stripe block size, and event counters across all members. It then reconstructs the SHR volume topology — mdadm array → LVM Physical Volume → Volume Group → Logical Volume → Btrfs or ext4 filesystem — without requiring a valid quorum and without writing anything to the source drives. For a double failure where both drives are physically readable and superblocks are intact, this path typically requires no manual input.

The reconstruction operates in degraded mode: stripes where both failed drives contributed cannot be reconstructed from parity, and the corresponding files are flagged as inaccessible. Stripes where only one of the two failed drives contributed are reconstructed from the remaining member data and parity — those files are fully recoverable. The program marks inaccessible files in the directory tree before the copy step, so you know the recovery scope before committing to a destination.

RAID Constructor — when superblock metadata is damaged

If automatic detection produces no result — because superblocks are partially overwritten, corrupted by a firmware event, or missing due to a failed drive that was partially written during rebuild — switch to RAID Constructor. This mode allows manual specification of all array parameters, bypassing the superblock entirely.

First, identify the filesystem offset using the built-in HEX editor. Open each drive or image in the HEX editor and search for the ASCII marker LABELONE, the signature of the LVM physical volume label. In SHR and SHR-2 configurations, this string marks the beginning of the volume data area. The sector immediately preceding the LABELONE sector, which will be filled with zeros, is the offset sector. Note its sector number: this is the value to enter as the Offset parameter in RAID Constructor.
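
The same search can be done from a Linux shell on an image file. The LVM label string is written on disk as LABELONE. The sketch below assumes GNU grep and 512-byte sectors, finds the first occurrence, and prints the number of the sector just before it. The function name is hypothetical, and the result should still be checked in the HEX editor, since the string could in principle also occur inside file data.

```shell
#!/bin/sh
# find_lvm_label_sector IMAGE
# Prints the sector number immediately before the first sector-aligned
# "LABELONE" signature in IMAGE (512-byte sectors assumed).
find_lvm_label_sector() {
    image=$1
    # tr turns NUL bytes into newlines one-for-one, so byte offsets are preserved
    # and long zero-filled regions do not become one enormous "line" for grep.
    byte_offset=$(LC_ALL=C tr '\000' '\n' < "$image" |
        LC_ALL=C grep -a -b -o -m1 'LABELONE' | head -n1 | cut -d: -f1)
    # No hit at all: report failure.
    [ -n "$byte_offset" ] || return 1
    # A genuine label starts at the beginning of a sector.
    [ $(( byte_offset % 512 )) -eq 0 ] || return 1
    echo $(( byte_offset / 512 - 1 ))
}

# Example: find_lvm_label_sector /mnt/images/sdb.img
```

The function returns a non-zero status when no signature is found or when the first hit is not sector-aligned. On a multi-terabyte image the scan reads from the start until the first hit, so it can take a while.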

Enter the following parameters in RAID Constructor:

Parameter | SHR-1 value | SHR-2 value
RAID type | RAID 5 | RAID 6 / Left synchronous (P+Q)
Block size | 64 KB | 64 KB
Bytes per sector | 512 | 512
Drive order | Match original NAS bay sequence, bay 1 first | Match original NAS bay sequence, bay 1 first
Offset | Sector number from LABELONE search (per drive) | Sector number from LABELONE search (per drive)

If the original drive order is unknown, determine it by trial: add drives to the Selected Disks list in different sequences and observe the reconstructed filesystem tree after each change. A correct order produces a recognisable directory structure; an incorrect order produces garbage or an empty tree. The offset value must be set individually for each physical drive or image — it can differ between members if drives were of different capacities in the original SHR configuration.

If one of the failed drives cannot be connected at all — mechanically dead, not detected — use Add empty disk to insert a placeholder at the correct position in the drive list. RS RAID Retrieve treats the placeholder as a fully unreadable member: stripes where that drive contributed are reconstructed from parity using the remaining members (recovering data from those stripes), and stripes where the placeholder’s parity contribution is also needed are flagged as unrecoverable. This is the maximum possible recovery from a configuration with one physically inaccessible drive.

Step 4: Filesystem scan and selective file recovery

Once the array is reconstructed — automatically or via RAID Constructor — RS RAID Retrieve scans the Btrfs or ext4 filesystem on the Logical Volume. The scan traverses the filesystem tree, identifies intact and damaged regions, and builds a complete directory listing. Files and folders whose data blocks intersect with unrecoverable stripes are marked before the copy step begins — no trial-and-error copy required to discover what is accessible.

Select the files and folders to recover, specify a destination on a separate drive with sufficient free space, and start the copy. Source drives and images are accessed read-only throughout the entire operation. For guidance on the LVM layer between the mdadm array and the filesystem, see our article on LVM structure and operation.

Level 3: Physical recovery lab for mechanically damaged drives
Difficulty: Low

Software recovery — at any level — depends on the drive’s read/write heads being able to deliver sector data to the host controller. When the head stack is mechanically damaged, the voice coil actuator has seized, or the spindle bearing has failed, the drive cannot serve read requests regardless of the software configuration. A lab replaces the head stack in a cleanroom environment (ISO Class 5 or better), adjusts head alignment to the specific platter, and uses firmware-level tools to extract sector data from areas that the drive’s own electronics would refuse to read.

The output of a lab engagement is a set of sector-level image files, one per drive. These images contain the best possible read of each drive’s surface, including sectors recovered from physically damaged zones that produce I/O errors under normal conditions. Once received, these images are used directly in RS RAID Retrieve: the RAID Constructor reads the superblocks (or uses the LABELONE offset method if superblocks are damaged), reconstructs the SHR array topology, and proceeds with filesystem scan and recovery exactly as described in Level 2.

The double-failure parity limitation applies to lab images the same way it applies to physically connected drives — stripes involving both failed members remain mathematically unrecoverable regardless of how cleanly the sector data was extracted. What the lab adds is access to sectors that software tools running on a live degraded drive would have missed due to read errors, which can meaningfully increase the proportion of recoverable files.

For a broader overview of when professional recovery is warranted and what the process involves, see our article on recovering data from failed hard drives.

Go directly to Level 3 if you observe any of these

  • Clicking, grinding, or buzzing sound from any drive on power-up
  • Drive not detected in BIOS/UEFI or in lsblk output at all
  • Drive detected but mdadm --examine returns I/O errors rather than superblock data
  • Drive surface temperature rises to abnormal levels within minutes of connection
  • S.M.A.R.T. Reallocated Sector Count (ID 05) in the hundreds, or Spin Retry Count (ID 0A) increasing on each power cycle

Do not attempt software recovery on a mechanically failing drive. Each read pass adds wear to already-damaged head surfaces and platters. Power off and contact a lab before attempting any further access.

A double failure in SHR-1 sits at the boundary between software and physical recovery. Where exactly that boundary falls in your case depends on the state of the drives, not on the capabilities of the tools. Two physically intact drives with intact superblocks give software a real path to partial recovery. Two mechanically failed drives go straight to a lab. Most real-world double failures land somewhere between those extremes — which is why assessing drive state before choosing a recovery path matters more here than in any other scenario in this series.

Frequently Asked Questions

DSM says the data cannot be recovered. Is that accurate?
DSM's message is accurate in one specific sense: the array cannot be rebuilt to a fully consistent state through DSM's own storage manager. That is not the same as saying individual files are unrecoverable. DSM has no mechanism for partial RAID reconstruction or selective file extraction from a degraded volume; it treats recovery as all-or-nothing at the storage pool level. Software tools like RS RAID Retrieve operate at a lower level, working directly with the mdadm superblocks and LVM metadata to extract files from stripes that are not affected by both failures simultaneously. For a double failure where both drives are physically readable, meaningful partial recovery is often achievable even when DSM reports the situation as unrecoverable.

One of the drives has bad sectors. Should I scan it directly or image it first?
Image it first. A drive with bad sectors will produce read errors during the recovery scan, which can cause software to skip sectors, misalign stripe reconstruction, or stall entirely. ddrescue or RS RAID Retrieve's built-in imaging function will make multiple passes over problematic sectors, extracting as much data as possible, and produce a complete sector map showing which areas could not be read. The recovery scan then runs against the image, a static file, rather than the degrading physical drive, which protects against the drive deteriorating further during the hours-long scan operation. On a drive with bad sectors in an already-compromised RAID, this distinction matters.

After recovery, should I rebuild the array on the same drives?
Start fresh with new drives, and consider upgrading to SHR-2 if your NAS model supports it and you have four or more bays. The surviving drives in a double-failure scenario have been under sustained stress: running degraded with zero redundancy for some period, then subjected to the read load of recovery. Their S.M.A.R.T. data will tell you their actual condition, but even drives that pass S.M.A.R.T. checks have accumulated wear. Rebuilding a new array on the same hardware that just experienced a catastrophic failure is a risk that is difficult to justify when drives are the cheapest component in the system. For guidance on choosing the right RAID level for your use case, see our article on best RAID configurations for NAS.

Can RS RAID Retrieve work with images produced by a recovery lab?
Yes. This is the standard workflow when physical and logical recovery are combined. The lab produces sector-level images of each drive, delivering them as image files. RS RAID Retrieve can then work from these images exactly as it would from physically connected drives: it reads the mdadm superblocks from the images, reconstructs the SHR array configuration in RAID Constructor if needed, activates the LVM volume group, and presents the Btrfs or ext4 filesystem for recovery. The double-failure parity limitation still applies, so stripes involving both failed drives are still unrecoverable, but everything else is accessible. The advantage of lab images is that they often recover sectors that were unreadable due to surface damage, which can meaningfully increase the proportion of recoverable files compared to working directly with a damaged drive.
