smartd: send mail only when an attribute changes

Question

I have configured smartd to send email reports when a disk has problems. Unfortunately, I am getting spammed every day about the same attribute with the same (unchanged) value:

The following warning/error was logged by the smartd daemon:
Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors
Device info: ST32000542AS, S/N:XXXXX, WWN:5-XXXXXX-XXXXXXXX, FW:XXXX, 2.00 TB

I am fully aware of the dangers that come with a disk that has uncorrectable sectors. (This disk is used in a RAID10 setup.)

I just don't want to receive the same mail every day. I only want a mail when the value changes or increases.

This is the current configuration in my /etc/smartd.conf:

DEVICESCAN -d removable -n standby -t -m root -M exec /usr/share/smartmontools/smartd-runner

What do I need to change to get the desired behavior from smartd?

Stephen Kitt · Answer

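If state persistence is enabled (which I believe is the default on Debian), the default behaviour is to send an email daily for as long as a critical event keeps being logged (or rather: whenever a critical event is logged and the corresponding email has not been sent for at least a day, an email is sent). This behaviour can be changed with the -M option; adding

-M once

to smartd.conf will result in a single email per critical event, with no repeats.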

(The -M once option is given in addition to the -M exec option.)
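Applied to the line from the question, that might look like the sketch below. It only adds -M once and leaves everything else untouched. The position of -M once relative to -M exec is my own choice, and I have not tested this against that exact setup.

```
# /etc/smartd.conf (sketch: the question's line with -M once added)
DEVICESCAN -d removable -n standby -t -m root -M once -M exec /usr/share/smartmontools/smartd-runner
```

smartd has to be restarted (or told to reload its configuration) before the change takes effect.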

To check whether state persistence is actually enabled, look at the contents of /var/lib/smartmontools; you should see recently updated state files for all of your drives.
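A minimal sketch of that check as a shell function. The function name recent_state_files is made up for illustration, and the file name pattern smartd.*.state is an assumption based on how smartd names its state files with the usual prefix; adjust both the directory and the pattern if your system differs.

```shell
#!/bin/sh
# Sketch: list smartd state files that were modified within the last 24 hours.
# The directory defaults to the Debian location mentioned above.
# The 'smartd.*.state' pattern is an assumption about the file naming.
recent_state_files() {
    dir=${1:-/var/lib/smartmontools}
    # -mtime -1 matches files modified less than one day ago.
    find "$dir" -maxdepth 1 -type f -name 'smartd.*.state' -mtime -1
}
```

Calling recent_state_files with no argument checks the default directory. No output suggests that persistence is off, or that the state is kept somewhere else.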

dredkin · Answer

This is an old thread, but I'll post what I found in the smartd manual.

-U ID[+] [ATA only] Report if the number of offline uncorrectable sectors is non-zero. Here ID is the id number of the Attribute whose raw value is the Offline Uncorrectable Sector count. The allowed range of ID is 0 to 255 inclusive. To turn off this reporting, use ID = 0. If the -U ID option is not given, then it defaults to -U 198 (since Attribute 198 is generally used to monitor offline uncorrectable sectors). If the name of this Attribute is changed by a '-v 198,FORMAT,NAME' (except '-v 198,FORMAT,Offline_Scan_UNC_SectCt'), directive, the default is changed to -U 0. If '+' is specified, a report is only printed if the number of sectors has increased since the last check cycle. **Some disks do not reset this attribute when a bad sector is reallocated.** See also '-v 198,increasing' below.

So adding the option -U 198+ should give you the result you want.
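As a sketch, the line from the question with that directive added could look like this. Where -U 198+ sits on the line is my own choice, and it assumes the drive really reports the count in Attribute 198, as the manual excerpt describes for the default case.

```
# /etc/smartd.conf (sketch: report offline uncorrectable sectors only when the count increases)
DEVICESCAN -d removable -n standby -t -U 198+ -m root -M exec /usr/share/smartmontools/smartd-runner
```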

frostschutz · Answer

You are using -M exec together with smartd-runner, which is apparently a Debian specialty.

Package Maintainers and system administrators can put scripts to be run when smartd detects an error into /etc/smartmontools/run.d. These scripts will be run by smartd-runner using run-parts(8). The script will receive the filename of the file containing the errormessage as first parameter. See /etc/smartmontools/run.d/10mail for an example.

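The main purpose of this runner seems to be to make sending mail optional (depending on whether a mailer is installed in the first place). Apart from that, it seems to trigger pop-up notifications on the desktop (if a desktop notifier is installed).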

So I guess you could modify that 10mail script to filter out duplicate mails.
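A rough sketch of what such a filter could look like, as a shell function you might paste into a modified 10mail. The name is_repeat and the idea of a "seen" file are mine, not part of smartmontools, and the sketch assumes the warning is the first line of the message file that contains "Device: " (as in the mail quoted in the question).

```shell
#!/bin/sh
# Sketch: decide whether a smartd warning has already been seen.
# Usage: is_repeat MESSAGE_FILE SEEN_FILE
# Returns 0 if the first "Device: ..." line of MESSAGE_FILE is already
# recorded in SEEN_FILE; otherwise records it and returns 1.
is_repeat() {
    msg_file=$1
    seen_file=$2

    # Assumption: the warning looks like
    # "Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors".
    warning=$(grep 'Device: ' "$msg_file" | head -n 1) || true

    # Nothing recognisable in the message: treat it as new so it is not lost.
    [ -n "$warning" ] || return 1

    # -F fixed string, -x whole line, -q quiet.
    if [ -f "$seen_file" ] && grep -Fxq -- "$warning" "$seen_file"; then
        return 0
    fi

    printf '%s\n' "$warning" >> "$seen_file"
    return 1
}
```

In 10mail you would then skip the mail call when is_repeat "$1" SOME_SEEN_FILE succeeds, where SOME_SEEN_FILE is a path of your choosing that the script can write to. Because the sector count is part of the recorded line, a changed count produces a new line and therefore a new mail.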

Another approach is to run smartd with the --savestates option and use -M once in smartd.conf. This is described in detail in the smartd.conf man page.

 once - send only one warning email for each type of disk problem
 daily - send additional warning reminder emails, once per day,
 diminishing - send additional warning reminder emails, after a one-day interval, then a two-day interval, then a four-day
 test - send a single test email immediately upon smartd startup.
 exec PATH - run the executable PATH instead of the default mail

If that doesn't work out, you'll have to fall back on the run.d script approach, or write your own mail handler for -M exec that does the filtering after all.

I am fully aware of the dangers that come with a disk that has uncorrectable sectors. (This disk is used in a RAID10 setup.)

I'd still recommend replacing such a drive right away. What RAID promises in terms of redundancy only holds if every drive is working 100% properly.

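Once you have replaced the drive, you can run a full write test on the removed drive without risking any data. Depending on how the drive fares in this test, you can make a more informed decision about whether to keep using it (preferably outside the RAID) or not.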
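One possible way to do such a write test is badblocks from e2fsprogs in write mode. That tool is my pick, not something the setup above requires, and it destroys everything on the device. The sketch below therefore only prints the command, and refuses if the device still looks mounted or is still listed in /proc/mdstat. The function name print_write_test is made up for illustration.

```shell
#!/bin/sh
# Sketch: print (not run) a destructive write test for a removed drive.
# badblocks -w OVERWRITES the whole device; -s shows progress, -v is verbose.
print_write_test() {
    dev=$1
    name=${dev##*/}    # e.g. "sdb" from "/dev/sdb"

    # Refuse if the device (or one of its partitions) is mounted,
    # or if its name still shows up as an md array member.
    if grep -qs "^$dev" /proc/mounts || grep -qs "$name" /proc/mdstat; then
        echo "refusing: $dev still appears to be in use" >&2
        return 1
    fi

    echo "badblocks -wsv $dev"
}
```

Check the device name twice before running the printed command as root. Afterwards, smartctl -a on the same device shows whether the sector counts changed.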

Until you test the drive, you don't know how broken it really is. Also, your smartd.conf doesn't seem to run regular self-tests, so errors might go undetected for a long time. And that's how RAIDs die during rebuilds.
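Regular self-tests can be scheduled with the -s directive in smartd.conf. As a sketch, here is the line from the question with a schedule added. The regular expression is the pattern used as an example in the smartd.conf man page (a short self-test every day between 2 and 3 am, a long one on Saturdays between 3 and 4 am); the times are only a suggestion.

```
# /etc/smartd.conf (sketch: the question's line plus a self-test schedule)
DEVICESCAN -d removable -n standby -t -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
```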