web-dev-qa-db-ja.com

CentOS 7 DL120 G9 with H240: monitoring for RAID problems

I configured a new server with a Smart HBA H240 card and installed hpssaducli. The controller is detected and I can generate reports.

My problem is how to detect a RAID failure and send an alert.

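The reports generated by hpssaducli contain a large amount of information that is hard to sift through. I don't currently have a failed array, so I don't know which information I would need to look for when a drive fails.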

Details

root@server [~]# lsmod | grep hp
hpwdt                  14242  0
hpilo                  17381  0
shpchp                 37032  0
hpsa                   94958  3

root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64

root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...

Diagnosable devices:
Smart HBA H240 in Slot 2

Output from hpssacli

root@server [~]# hpssacli ctrl all show config detail

Smart HBA H240 in Slot 2 (RAID Mode)
   Bus Interface: PCI
   Slot: 2
   Serial Number: XXXXXXXXX
   Cache Serial Number: XXXXXXXXX
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.34
   Rebuild Priority: High
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: False
   Drive Write Cache: Disabled
   Controller Memory Size: 256 MB
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 72
   Cache Module Temperature (C): 36
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.12
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID Mode
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ250305FS
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None


   Port Name: 2I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 500143803366B9C0
         Port Location: Internal
         Managed Cable Connected: False

   Port Name: 1I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 500143803366B9C4
         Port Location: Internal
         Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
      None attached


   Internal Drive Cage at Port 2I, Box 0, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 1.8 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable



      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: Ready for Rebuild
         Caching:  Disabled
         Unique Identifier: XXXXXXXXX
         Disk Name: /dev/sda
         Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
         OS Status: LOCKED
         Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
         Mirror Group 1:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
         Mirror Group 2:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
         Drive Type: Data
         LD Acceleration Method: HP SSD Smart Path

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
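To narrow a report like this down to the fields that matter for failure detection, a small parser can pull out only the health values. This is a minimal sketch, assuming that the values to watch are the `Controller Status:` line, the bare `Status:` lines under arrays, logical drives and physical drives, and the last field of each `physicaldrive ... (..., OK)` summary line, and that anything other than `OK` should be flagged. The output above contains no failed drive, so the exact strings hpssacli prints on failure are not confirmed here. The script name `hpssa_status.py` is hypothetical.

```python
import re
import sys
from typing import List

# Health fields that should read "OK". Only the bare "Status" key (arrays,
# logical drives, physical drives) and "Controller Status" are checked; other
# keys such as "Power Supply Status" or "Drive Authentication Status" are
# ignored so that values like "Not Redundant" do not trigger alerts.
STATUS_RE = re.compile(r"^\s*(Controller Status|Status):\s*(.+?)\s*$")

# One-line drive summaries, e.g.
# "physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)";
# the last comma-separated field inside the parentheses is taken as the status.
SUMMARY_RE = re.compile(r"^\s*(physicaldrive \S+) \((.*), ([^,]+)\)\s*$")

# Section headers that give context to the "Status:" lines below them.
HEADER_RE = re.compile(
    r"^\s*(Smart .+ in Slot \S+.*?|Array: \S+|Logical Drive: \S+|physicaldrive \S+)\s*$"
)


def find_problems(report: str) -> List[str]:
    """Return one message per checked status in an hpssacli report that is not "OK"."""
    problems = []
    context = "top level"
    for line in report.splitlines():
        m = STATUS_RE.match(line)
        if m:
            key, value = m.groups()
            if value != "OK":
                problems.append(f"[{context}] {key}: {value}")
            continue
        m = SUMMARY_RE.match(line)
        if m:
            drive, _details, value = m.groups()
            if value != "OK":
                problems.append(f"[{drive}] {value}")
            continue
        m = HEADER_RE.match(line)
        if m:
            context = m.group(1)
    # Drives are listed more than once (under "Physical Drives" and under each
    # mirror group), so drop repeated messages while keeping the first one.
    return list(dict.fromkeys(problems))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as fh:
            found = find_problems(fh.read())
        print("\n".join(found) or "all checked status fields are OK")
        sys.exit(1 if found else 0)
    print("usage: hpssa_status.py REPORT_FILE", file=sys.stderr)
```

For example: `hpssacli ctrl all show config detail > /tmp/hpssa.txt && python3 hpssa_status.py /tmp/hpssa.txt`. Run against the output above, it would report `[Logical Drive: 1] Status: Ready for Rebuild`, because that is the only checked field whose value is not `OK`.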
copyandpaster

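I don't want to close this as a duplicate, but you need to install the HP Management Agents to get server health information. For the ProLiant DL120 Gen9 and RHEL7, they are available via yum or as the individual packages listed on the support site.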

See HP ProLiant DL380e Gen8 Server - SPP usage for some ideas...

At a minimum, you can use the hpssacli tool to get the actual RAID controller information on demand.

But understand that if you include the other utilities, the server can also send e-mail and SNMP traps and log health events.
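If installing the agents is not an option, the on-demand hpssacli output can still drive a basic e-mail alert from cron. This is a minimal sketch, not a replacement for the agents' SNMP traps or health logging. It assumes Python 3.6 or later, `hpssacli` on the `PATH`, and an SMTP relay listening on localhost; the script name `hpssa_alert.py` and the `MAIL_FROM`/`SMTP_HOST` values are hypothetical. It treats any `Status:` or `Controller Status:` value other than `OK` as a reason to alert.

```python
import re
import smtplib
import socket
import subprocess
import sys
from email.message import EmailMessage
from typing import List, Optional

# Assumed settings: adjust the command, sender and SMTP relay as needed.
HPSSACLI_CMD = ["hpssacli", "ctrl", "all", "show", "config", "detail"]
MAIL_FROM = "root@localhost"
SMTP_HOST = "localhost"

# Lines that start (after indentation) with "Status:" or "Controller Status:"
# and whose value is anything other than "OK".
BAD_STATUS_RE = re.compile(
    r"^[ \t]*(?:Controller )?Status:(?![ \t]*OK[ \t]*$)[ \t]*(.+)$", re.MULTILINE
)


def bad_statuses(report: str) -> List[str]:
    """Return every non-OK status value found in an hpssacli report."""
    return [m.group(1).strip() for m in BAD_STATUS_RE.finditer(report)]


def build_alert(host: str, to_addr: str, report: str) -> Optional[EmailMessage]:
    """Build an alert e-mail if the report has a non-OK status, else return None."""
    bad = bad_statuses(report)
    if not bad:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[{host}] RAID status: {', '.join(bad)}"
    msg["From"] = MAIL_FROM
    msg["To"] = to_addr
    msg.set_content(report)  # include the full report for diagnosis
    return msg


def main(to_addr: str) -> int:
    # check=True raises if hpssacli itself fails, so a broken run fails
    # loudly instead of looking like a healthy array.
    report = subprocess.run(
        HPSSACLI_CMD, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    msg = build_alert(socket.gethostname(), to_addr, report)
    if msg is None:
        return 0
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)
    return 1


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    print("usage: hpssa_alert.py RECIPIENT", file=sys.stderr)
```

A cron entry such as `*/15 * * * * /usr/bin/python3 /usr/local/bin/hpssa_alert.py admin@example.com` (paths and address hypothetical) would run the check every 15 minutes. The subprocess and SMTP calls are kept in `main()` so the parsing and message-building logic can be exercised without a controller or mail server.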

ewwhite