Fixing bad blocks

After receiving this warning:

WARNING: Your hard drive is failing
Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors

I ran:

$ sudo smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.26.2.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingDian S200 60GB
Serial Number:    2017022100551
LU WWN Device Id: 0 000000 000000000
Firmware Version: P0707F1
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA >3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct  3 10:56:08 2017 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       3
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       4486
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       13
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       98
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       9724
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       9
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1500
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       9602
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       3
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       13
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       28
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       3994818
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       2414
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       98
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       36124
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       10259
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       9799

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4486         -

Selective Self-tests/Logging not supported

The more detailed smartctl output is as follows:

$ sudo smartctl -x /dev/sdb
smartctl 6.2 2017-02-27 r4394 [x86_64-linux-3.10.0-514.26.2.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingDian S200 60GB
Serial Number:    2017022100551
LU WWN Device Id: 0 000000 000000000
Firmware Version: P0707F1
User Capacity:    60,022,480,896 bytes [60.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA >3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct  3 15:49:27 2017 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                    entering power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   050    -    0
  5 Reallocated_Sector_Ct   -O--CK   100   100   050    -    3
  9 Power_On_Hours          -O--CK   100   100   050    -    4491
 12 Power_Cycle_Count       -O--CK   100   100   050    -    13
160 Unknown_Attribute       -O--CK   100   100   050    -    1
161 Unknown_Attribute       PO--CK   100   100   050    -    98
163 Unknown_Attribute       -O--CK   100   100   050    -    0
164 Unknown_Attribute       -O--CK   100   100   050    -    10068
165 Unknown_Attribute       -O--CK   100   100   050    -    9
166 Unknown_Attribute       -O--CK   100   100   050    -    1
167 Unknown_Attribute       -O--CK   100   100   050    -    5
168 Unknown_Attribute       -O--CK   100   100   050    -    1500
169 Unknown_Attribute       -O--CK   100   100   050    -    100
175 Program_Fail_Count_Chip -O--CK   100   100   050    -    0
176 Erase_Fail_Count_Chip   -O--CK   100   100   050    -    0
177 Wear_Leveling_Count     -O--CK   100   100   050    -    9687
178 Used_Rsvd_Blk_Cnt_Chip  -O--CK   100   100   050    -    3
181 Program_Fail_Cnt_Total  -O--CK   100   100   050    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   050    -    0
192 Power-Off_Retract_Count -O--CK   100   100   050    -    13
194 Temperature_Celsius     -O---K   100   100   050    -    28
195 Hardware_ECC_Recovered  -O--CK   100   100   050    -    4314392
196 Reallocated_Event_Count -O--CK   100   100   050    -    2667
197 Current_Pending_Sector  -O--CK   100   100   050    -    3
198 Offline_Uncorrectable   -O--CK   100   100   050    -    1
199 UDMA_CRC_Error_Count    -O--CK   100   100   050    -    0
232 Available_Reservd_Space -O--CK   100   100   050    -    98
241 Total_LBAs_Written      ----CK   100   100   050    -    36474
242 Total_LBAs_Read         ----CK   100   100   050    -    10529
245 Unknown_Attribute       -O--CK   100   100   050    -    10146
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xde       GPL     VS       8  Device vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4488         -
# 2  Extended offline    Completed without error       00%      4487         -
# 3  Extended offline    Completed without error       00%      4486         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page Offset Size         Value  Description
  1  =====  =                =  == General Statistics (rev 1) ==
  1  0x008  4               13  Lifetime Power-On Resets
  1  0x010  4             4491  Power-on Hours
  1  0x018  6       2390408669  Logical Sectors Written
  1  0x020  6         69617191  Number of Write Commands
  1  0x028  6        690041929  Logical Sectors Read
  1  0x030  6          6959725  Number of Read Commands
  7  =====  =                =  == Solid State Device Statistics (rev 1) ==
  7  0x008  1                0  Percentage Used Endurance Indicator

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            1  R_ERR response for non-data FIS
0x000a  4           17  Device-to-Host register FISes sent due to a COMRESET
Manolete

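I had this issue in the past. IIRC, "offline uncorrectable sector" means that the disk controller (the one inside the disk, not the SATA/SCSI controller in your PC) has repeatedly hit a read failure on one sector and has decided that the sector is definitely unusable.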

So, must I declare that sector as bad to the filesystem that uses it?

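No. Fortunately, today's disks automatically replace bad sectors with good ones taken from a pool of spare sectors, so you do not need to tell the filesystem to avoid them. Of course, that pool is limited in size (Available_Reservd_Space sectors, I guess). Once all spare sectors have been used, further bad sectors can no longer be replaced and must be declared as bad to the FS.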

So everything is OK and this is a harmless message?

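Not really. The drive has tried several times to read the bad sector and failed each time. The sector is therefore queued for replacement, but the drive cannot replace it on its own (it keeps hoping it will eventually manage to read it). The sector stays "uncorrectable" until it is overwritten with new data. Once it is overwritten, or if the drive somehow manages to read it, it is remapped and replaced with a spare sector (in your smartctl output, Offline_Uncorrectable will be decremented by 1 and Reallocated_Sector_Ct will be incremented by 1).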
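To see whether the pending sector has actually been remapped, it can help to pull only the relevant counters out of the attribute table and compare them over time. The following is a minimal sketch and not part of the original answer: the function name `show_realloc_counters` is made up for illustration, and the sample rows are copied from the `smartctl -a` output in the question so the snippet runs without a drive attached.

```shell
#!/bin/sh
# Print the raw values of the attributes involved in sector remapping
# (IDs 5, 197 and 198) from a smartctl attribute table read on stdin.
# show_realloc_counters is a hypothetical helper name, not a smartctl feature.
show_realloc_counters() {
    # $1 is the attribute ID, $2 its name, $NF the RAW_VALUE column.
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $NF }'
}

# Sample rows taken from the question's `smartctl -a /dev/sdb` output.
sample='  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0'

counters=$(printf '%s\n' "$sample" | show_realloc_counters)
printf '%s\n' "$counters"
```

On a real system the same function could presumably be fed from `sudo smartctl -A /dev/sdb | show_realloc_counters`, run once before and once after the sector is overwritten.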

What can I do?

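In such a case, I usually force a resync of my RAID 1 array (healthy disk -> failing disk) so that the new sector gets the correct content. In any case, run fsck, and if you have a backup of that partition (you should), compare the backup with the actual contents.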

xhienne

Run a long smartctl test. If it completes without detecting any errors, the hard drive is fine to use.

smartctl -t long /dev/sdb

Note: depending on the state of the drive, remember to back up its data before running the test. The stress of the test may damage the disk further.
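Once the long test has finished, its result shows up in the self-test log (`smartctl -l selftest /dev/sdb`). As a rough sketch, assuming the log layout shown in the question, the hypothetical helper `count_failed_selftests` below counts entries whose status is anything other than "Completed without error". The sample entries are copied from the question's log; note that a test still in progress or aborted would also be counted.

```shell
#!/bin/sh
# Count self-test log entries that did not finish cleanly.
# count_failed_selftests is a hypothetical helper; it reads the output of
# `smartctl -l selftest` on stdin and assumes the layout shown in the question.
count_failed_selftests() {
    # Log entries start with "#" followed by the entry number.
    awk '/^# *[0-9]+ / && !/Completed without error/ { n++ } END { print n + 0 }'
}

# Sample entries copied from the question's extended self-test log.
sample='Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4488         -
# 2  Extended offline    Completed without error       00%      4487         -
# 3  Extended offline    Completed without error       00%      4486         -'

failed=$(printf '%s\n' "$sample" | count_failed_selftests)
echo "self-tests with problems: $failed"
```

A non-zero count would be the cue to look at the LBA_of_first_error column of the affected entry.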

If the smartctl test reports errors, you can use Diskscan to try to fix the uncorrectable sectors.
