Recovering a RAID 5 with two disks out of the array

I don't want to approach this by trial and error, as I know that's the best way to lose data.

I have a server with 4 × 2 TB disks in a RAID 5 on Ubuntu 14.04 (yes, I know, not wise).

Most of my data is in /home on the RAID 5; / is on a RAID 1.

I booted the server in rescue mode, but I can't tell:

  • whether the problem is software or hardware,
  • whether there is a way to remount the RAID to recover the data.

I carefully read Recovering a failed software RAID (raid.wiki.kernel.org), but I'm not confident in my diagnosis, so I would like a careful assessment of what is going on and what, if anything, I should do…

The only thing I have tried is mounting the md devices that weren't mounted. That worked for md2 (mount /dev/md2 /mnt/), but neither md0 nor md3 would mount; both failed with errors like /dev/md3: can't read superblock.

Here is what I have checked so far:

EDIT: parted -l

root@rescue:/mnt# parted -l
Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Error: /dev/sdc: unrecognised disk label
Model: ATA ST2000DM001-1CH1 (scsi)                                        
Disk /dev/sdc: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags: 

Model: ATA ST2000DM001-1CH1 (scsi)
Disk /dev/sdd: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      20.5kB  1049kB  1029kB                  primary  bios_grub
 2      2097kB  10.5GB  10.5GB  ext4            primary  raid
 3      10.5GB  2000GB  1989GB                  primary  raid
 4      2000GB  2000GB  536MB   linux-swap(v1)  primary


Model: Linux Software RAID Array (md)
Disk /dev/md2: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  10.5GB  10.5GB  ext4


Error: /dev/md127: unrecognised disk label
Model: Linux Software RAID Array (md)                                     
Disk /dev/md127: 10.5GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags: 

smartctl

root@rescue:~# smartctl -a -d ata /dev/sdc
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed: Input/output error

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.


root@rescue:~# smartctl -a -d ata /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.10.23-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1CH164
Serial Number:    W1E1KX59
LU WWN Device Id: 5 000c50 05c821593
Firmware Version: CC43
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Dec 30 16:04:49 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
          was never started.
          Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
          the read element of the test failed.
Total time to complete Offline 
data collection:        (  584) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          No Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
          General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    ( 230) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.
SCT capabilities:          (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       120551532
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       4008
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       4351310995
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18725
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       32
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   089   089   000    Old_age   Always       -       11
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   056   045    Old_age   Always       -       32 (Min/Max 26/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       46
194 Temperature_Celsius     0x0022   032   044   000    Old_age   Always       -       32 (0 16 0 0)
197 Current_Pending_Sector  0x0012   082   082   000    Old_age   Always       -       3056
198 Offline_Uncorrectable   0x0010   082   082   000    Old_age   Offline      -       3056
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18708h+109m+27.415s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       24242600022
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       112149279703

SMART Error Log Version: 1
ATA Error Count: 11 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      09:32:13.900  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      09:32:13.898  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      09:32:13.898  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      09:32:13.898  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      09:32:13.898  SET FEATURES [Set transfer mode]

Error 10 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      09:32:10.764  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:10.763  READ FPDMA QUEUED
  60 00 38 ff ff ff 4f 00      09:32:10.763  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00      09:32:10.763  SET FEATURES [Reserved for Serial ATA]
  27 00 00 00 00 00 e0 00      09:32:10.763  READ NATIVE MAX ADDRESS EXT

Error 9 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 38 ff ff ff 4f 00      09:32:09.084  READ FPDMA QUEUED
  61 00 08 00 88 38 41 00      09:32:07.445  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:07.416  READ FPDMA QUEUED

Error 8 occurred at disk power-on lifetime: 18520 hours (771 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 40 ff ff ff 4f 00      09:32:04.118  WRITE FPDMA QUEUED
  61 00 08 70 88 38 41 00      09:32:04.117  WRITE FPDMA QUEUED
  61 00 40 ff ff ff 4f 00      09:32:04.117  WRITE FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      09:32:04.117  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      09:32:04.117  READ FPDMA QUEUED

Error 7 occurred at disk power-on lifetime: 17319 hours (721 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 50 a9 59 02  Error: UNC at LBA = 0x0259a950 = 39430480

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 aa 59 42 00      01:31:02.054  READ FPDMA QUEUED
  60 00 00 00 a6 59 42 00      01:31:02.054  READ FPDMA QUEUED
  60 00 00 00 92 36 42 00      01:30:55.032  READ FPDMA QUEUED
  60 00 00 00 86 36 42 00      01:30:51.600  READ FPDMA QUEUED
  60 00 00 00 82 36 42 00      01:30:51.593  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     18706         3848494344
# 2  Short offline       Completed without error       00%      3481         -
# 3  Short offline       Completed without error       00%      3472         -
# 4  Short offline       Completed without error       00%      3472         -
# 5  Short offline       Completed without error       00%        13         -
# 6  Short offline       Completed without error       00%         5         -
# 7  Short offline       Completed without error       00%         5         -
# 8  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ls /dev

ls /dev
MAKEDEV      md0              ptya0  ptyc5  ptyea  ptyqf  ptyt4  ptyv9  ptyxe  ram11   sg1     tty34  ttyS1  ttyc2  ttye7  ttyqc  ttyt1  ttyv6  ttyxb  urandom
aer_inject   md127            ptya1  ptyc6  ptyeb  ptyr0  ptyt5  ptyva  ptyxf  ram12   sg2     tty35  ttyS2  ttyc3  ttye8  ttyqd  ttyt2  ttyv7  ttyxc  vcs
autofs       md2              ptya2  ptyc7  ptyec  ptyr1  ptyt6  ptyvb  ptyy0  ram13   sg3     tty36  ttyS3  ttyc4  ttye9  ttyqe  ttyt3  ttyv8  ttyxd  vcs1
block        md3              ptya3  ptyc8  ptyed  ptyr2  ptyt7  ptyvc  ptyy1  ram14   shm     tty37  ttya0  ttyc5  ttyea  ttyqf  ttyt4  ttyv9  ttyxe  vcs2
[…]

cat /proc/mdstat

cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md127 : active raid1 sdc2[2]
      10238912 blocks [4/1] [__U_]

md2 : active raid1 sdd2[3] sda2[0] sdb2[1]
      10238912 blocks [4/3] [UU_U]

mdadm --detail

root@rescue:~# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
     Array Size : 10238912 (9.76 GiB 10.48 GB)
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sat Dec 27 17:31:03 2014
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
         Events : 0.503145

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       0        0        2      removed
       3       8       50        3      active sync   /dev/sdd2

root@rescue:~# mdadm --detail /dev/md127
/dev/md127:
        Version : 0.90
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
     Array Size : 10238912 (9.76 GiB 10.48 GB)
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 1
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Sat Dec 27 17:31:16 2014
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       34        2      active sync   /dev/sdc2
       3       0        0        3      removed

And finally, mdadm --examine /dev/sd*

root@rescue:~# mdadm --examine /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sda1.
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7b8 - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sda3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 10:33:05 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 4d44c428 - correct
         Events : 109608

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     0       8        3        0      active sync   /dev/sda3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
mdadm: No md superblock detected on /dev/sda4.
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sdb1.
/dev/sdb2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7ca - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sdb3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 10:33:05 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 4d44c43a - correct
         Events : 109608

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     1       8       19        1      active sync   /dev/sdb3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
mdadm: No md superblock detected on /dev/sdb4.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdc1.
mdadm: No md superblock detected on /dev/sdc2.
mdadm: No md superblock detected on /dev/sdc3.
mdadm: No md superblock detected on /dev/sdc4.
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)
mdadm: No md superblock detected on /dev/sdd1.
/dev/sdd2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5a33c710:006f668d:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:34 2014
     Raid Level : raid1
  Used Dev Size : 10238912 (9.76 GiB 10.48 GB)
     Array Size : 10238912 (9.76 GiB 10.48 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 2

    Update Time : Sat Dec 27 18:20:56 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 78eed7ee - correct
         Events : 503147


      Number   Major   Minor   RaidDevice State
this     3       8       50        3      active sync   /dev/sdd2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       0        0        2      faulty removed
   3     3       8       50        3      active sync   /dev/sdd2
/dev/sdd3:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4a417350:7192f812:a4d2adc2:26fd5302 (local to Host rescue.ovh.net)
  Creation Time : Tue Sep  2 16:46:35 2014
     Raid Level : raid5
  Used Dev Size : 1942745600 (1852.75 GiB 1989.37 GB)
     Array Size : 5828236800 (5558.24 GiB 5968.11 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 3

    Update Time : Mon Dec 22 01:55:55 2014
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 4d429eeb - correct
         Events : 109599

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     3       8       51        3      active sync   /dev/sdd3

   0     0       8        3        0      active sync   /dev/sda3
   1     1       8       19        1      active sync   /dev/sdb3
   2     2       0        0        2      faulty removed
   3     3       8       51        3      active sync   /dev/sdd3
mdadm: No md superblock detected on /dev/sdd4.

EDIT

As a last attempt, I unmounted /dev/md2 and stopped it with mdadm.
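
Roughly, those two commands were (md2 was mounted on /mnt earlier):

umount /mnt
mdadm --stop /dev/md2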

I was then able to assemble /dev/md3:

mdadm --assemble --force /dev/md3 /dev/sd[abd]3
mdadm: forcing event count in /dev/sdd3(3) from 109599 upto 109608
mdadm: clearing FAULTY flag for device 2 in /dev/md3 for /dev/sdd3
mdadm: Marking array /dev/md3 as 'clean'
mdadm: /dev/md3 has been started with 3 drives (out of 4).

Syslog at the time:

md/raid:md3: device sda3 operational as raid disk 0
md/raid:md3: device sdd3 operational as raid disk 3
md/raid:md3: device sdb3 operational as raid disk 1
md/raid:md3: allocated 4338kB
md/raid:md3: raid level 5 active with 3 out of 4 devices, algorithm 2
RAID conf printout:
 --- level:5 rd:4 wd:3
 disk 0, o:1, dev:sda3
 disk 1, o:1, dev:sdb3
 disk 3, o:1, dev:sdd3
md3: detected capacity change from 0 to 5968114483200
 md3: unknown partition table

And the RAID looked fine:

root@rescue:/mnt# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md3 : active raid5 sda3[0] sdd3[3] sdb3[1]
      5828236800 blocks level 5, 512k chunk, algorithm 2 [4/3] [UU_U]
[…]

But I could not mount it:

root@rescue:/mnt# mount /dev/md3 /mnt/home
mount: wrong fs type, bad option, bad superblock on /dev/md3,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Meanwhile, many errors were showing up in syslog:

ata4.00: exception Emask 0x0 SAct 0xfe SErr 0x0 action 0x0
ata4.00: irq_stat 0x40000008
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/18:08:18:34:63/00:00:e5:00:00/40 tag 1 ncq 12288 in
         res 41/40:18:18:34:63/00:00:e5:00:00/00 Emask 0x409 (media error) <F>
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
sd 3:0:0:0: [sdd] Unhandled sense code
sd 3:0:0:0: [sdd]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 3:0:0:0: [sdd]  
Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        e5 63 34 18 
sd 3:0:0:0: [sdd]  
Add. Sense: Unrecovered read error - auto reallocate failed
sd 3:0:0:0: [sdd] CDB: 
Read(10): 28 00 e5 63 34 18 00 00 18 00
blk_update_request: 46 callbacks suppressed
end_request: I/O error, dev sdd, sector 3848483864
md/raid:md3: read error not correctable (sector 3828001816 on sdd3).
md/raid:md3: Disk failure on sdd3, disabling device.

I tried to fix them with hdparm, but there were too many, and plenty of new ones appeared every time.
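
For reference, the sector-by-sector hdparm fix looks like this (the sector number is the one from the syslog excerpt above; note that --write-sector zeroes the sector, destroying whatever was in it):

# confirm the sector really is unreadable
hdparm --read-sector 3848483864 /dev/sdd
# force the drive to remap it by overwriting it with zeros (destructive!)
hdparm --yes-i-know-what-i-am-doing --write-sector 3848483864 /dev/sdd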

And of course, while I was trying to mount md3, the state of the array changed to FAILED, as noted in syslog: md/raid:md3: Disk failure on sdd3, disabling device.

It seems I'm losing this battle…

Buzut

If I understand things correctly, your /dev/md3 is a RAID 5 that should be composed of /dev/sda3, /dev/sdb3, /dev/sdc3 (which no longer exists) and /dev/sdd3.

So, what do you get from mdadm --detail /dev/md3?

Why does it look like /dev/sdc still has a broken partition table?

Even with the partitions from /dev/sdc still missing, it may be possible to recover the data. I would try the following rescue operation:

1) Boot from a live CD, and do not mount any partitions or RAID disks.

2) Make raw image copies of all the disks. Yes, you will need more than 8 TB of free space somewhere, perhaps on a network drive. If the disks are physically sound, you can make the copies with dd. If some disks are physically damaged, you may need to use the ddrescue program instead.
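
For example (the destination paths are only placeholders):

# healthy disk: plain dd; conv=noerror,sync skips unreadable blocks, padding with zeros
dd if=/dev/sda of=/mnt/backup/orig/sda.img bs=1M conv=noerror,sync
# failing disk: ddrescue retries bad areas and keeps a map file so it can resume
ddrescue -d /dev/sdd /mnt/backup/orig/sdd.img /mnt/backup/orig/sdd.map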

3) Make working raw image copies of the original raw image copies. Yes, you will need another 8 TB of free space for that.
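
Something like (again, placeholder paths):

mkdir -p /mnt/backup/work
cp --sparse=always /mnt/backup/orig/sd?.img /mnt/backup/work/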

4) Use a virtual machine such as qemu or virtualbox. Start by booting the virtual machine from a good live CD suited to data rescue; Systemrescuecd might be a good choice.
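
With qemu, booting the rescue CD with the four working images attached could look like this (ISO and image paths are placeholders):

qemu-system-x86_64 -m 2048 -cdrom systemrescuecd.iso -boot d \
  -drive file=/mnt/backup/work/sda.img,format=raw \
  -drive file=/mnt/backup/work/sdb.img,format=raw \
  -drive file=/mnt/backup/work/sdc.img,format=raw \
  -drive file=/mnt/backup/work/sdd.img,format=raw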

5) From inside the virtual machine, try to fix the working raw disk image copies. One place to start would be to add a partition table to the working copy of /dev/sdc; its partition table should probably look just like the one on /dev/sdd.
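
Inside the VM, one way to do that, assuming the image copies show up under the same device names, is to replicate the GPT from the sdd image with sgdisk:

# copy sdd's partition table onto sdc, then give sdc its own GUIDs
sgdisk --replicate=/dev/sdc /dev/sdd
sgdisk --randomize-guids /dev/sdc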

6) When you think you have fixed the problems, boot the virtual machine from the working disk image copies.

7) Once the virtual machine has proven that the disk image files are fixed, copy the fixed images back onto the physical disks. If any physical disk is broken, it would be wise to replace it first.
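
Copying back is dd in the other direction; this is destructive, so triple-check the target device:

dd if=/mnt/backup/work/sdc.img of=/dev/sdc bs=1M conv=fsync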

If at any stage you find that your attempts to fix the broken RAID have only made things worse, just overwrite the working raw disk images with the original raw disk images and start over.
