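I just received a brand-new dual-CPU server, but it keeps crashing with a kernel panic shortly after boot. This happened even while the OS setup was sitting idle. I managed to install the OS and get mcelog running to see what is going on, but I don't know what to make of its output. From what I read online, I suspected a faulty DIMM on one of the sockets (socket 1), but I ran several passes of memtest and it found no errors. Could this be a software problem instead? I have already tried two OSes and the same thing happened on both, although it was much more frequent on Debian/Proxmox than on CentOS.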
Server specs:
Dual Intel 8-core Xeon E5-2620 v4
2 x 32GB DDR4 2400MHz registered ECC (RECC) DIMMs
Motherboard: SuperMicro X10DRL-i
It isn't CPU thermals: the temperature never went above 35°C during memtest or the OS installation. I was also able to run a few short CPU benchmarks before it crashed, and the temperatures were fine.
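How can I figure out what is happening here? I have access to the server for a few minutes before the crash occurs. I have already downloaded the vmcore dump, but I don't know what to do with it.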
Here is the mce log, from about 50 seconds after boot until the crash:
[ 56.367615] mce: [Hardware Error]: Machine check events logged
[ 70.420914] mce: [Hardware Error]: Machine check events logged
[ 71.886789] Disabling lock debugging due to kernel taint
[ 71.886894] mce: [Hardware Error]: CPU 24: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.887009] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.887122] mce: [Hardware Error]: TSC 206cc7cd362
[ 71.887184] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 11 microcode b00001d
[ 71.887289] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.889392] mce: [Hardware Error]: CPU 30: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.889489] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.889595] mce: [Hardware Error]: TSC 206cc7cd11d
[ 71.889657] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1d microcode b00001d
[ 71.889760] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.891804] mce: [Hardware Error]: CPU 14: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.891901] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.892007] mce: [Hardware Error]: TSC 206cc7cd10e
[ 71.892068] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1c microcode b00001d
[ 71.892171] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.894217] mce: [Hardware Error]: CPU 13: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.894314] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.894420] mce: [Hardware Error]: TSC 206cc7cd23c
[ 71.894480] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1a microcode b00001d
[ 71.894585] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.896634] mce: [Hardware Error]: CPU 29: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.896730] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.896835] mce: [Hardware Error]: TSC 206cc7cd194
[ 71.896896] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1b microcode b00001d
[ 71.897000] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.899053] mce: [Hardware Error]: CPU 28: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.899150] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.899256] mce: [Hardware Error]: TSC 206cc7cd719
[ 71.899335] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 19 microcode b00001d
[ 71.899438] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.901485] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.901582] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.901687] mce: [Hardware Error]: TSC 206cc7cd720
[ 71.901748] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 18 microcode b00001d
[ 71.901851] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.903934] mce: [Hardware Error]: CPU 10: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.904031] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.904136] mce: [Hardware Error]: TSC 206cc7cd851
[ 71.904197] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 14 microcode b00001d
[ 71.904300] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.906306] mce: [Hardware Error]: CPU 26: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.906403] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.906508] mce: [Hardware Error]: TSC 206cc7cd863
[ 71.906569] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 15 microcode b00001d
[ 71.909482] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.914367] mce: [Hardware Error]: CPU 11: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.917304] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.920287] mce: [Hardware Error]: TSC 206cc7cd515
[ 71.923159] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 16 microcode b00001d
[ 71.926031] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 71.930820] mce: [Hardware Error]: CPU 27: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.933685] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.936557] mce: [Hardware Error]: TSC 206cc7cd449
[ 71.939384] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 17 microcode b00001d
[ 71.944180] mce: [Hardware Error]: CPU 9: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.947059] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.949956] mce: [Hardware Error]: TSC 206cc7cd766
[ 71.952786] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 12 microcode b00001d
[ 71.957580] mce: [Hardware Error]: CPU 25: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.960480] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.963366] mce: [Hardware Error]: TSC 206cc7cd751
[ 71.966210] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 13 microcode b00001d
[ 71.971031] mce: [Hardware Error]: CPU 31: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.973919] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.976817] mce: [Hardware Error]: TSC 206cc7cd7f7
[ 71.979690] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1f microcode b00001d
[ 71.984474] mce: [Hardware Error]: CPU 15: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.987371] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 71.990290] mce: [Hardware Error]: TSC 206cc7cd803
[ 71.993151] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1e microcode b00001d
[ 71.997992] mce: [Hardware Error]: CPU 8: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 72.000918] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8138fb97> {intel_idle+0xd7/0x160}
[ 72.003828] mce: [Hardware Error]: TSC 206cc7cd374
[ 72.006692] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 10 microcode b00001d
[ 72.011533] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 72.014436] Kernel panic - not syncing: Fatal machine check
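One way to start reading a log like the one above is to group the records and split the status value into its fields. The Python sketch below is a hypothetical helper, not part of mcelog: it groups each `Machine Check Exception` record by socket, bank and status, then decodes the architectural flag bits of the 64-bit `IA32_MCi_STATUS` value as documented in the Intel SDM (Vol. 3B). It assumes the exact dmesg line format shown above and deliberately does not interpret bank numbers or the model-specific code (bits 31:16), since those depend on the CPU model; `mcelog --ascii`, as the kernel message suggests, remains the proper decoder.

```python
import re
import sys
from collections import defaultdict

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B).
# Bits 31:16 and the meaning of each bank number are model-specific
# and are intentionally not decoded here.
FLAGS = {
    63: "VAL",    # status register holds a valid error
    62: "OVER",   # overflow: an error occurred while a previous one was still logged
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC contains extra information
    58: "ADDRV",  # IA32_MCi_ADDR contains the error address
    57: "PCC",    # processor context corrupt
}

# Matches e.g. "CPU 24: Machine Check Exception: 5 Bank 20: fa00004000020e0f"
MCE_RE = re.compile(r"CPU (\d+): Machine Check Exception: \d+ Bank (\d+): ([0-9a-f]+)")
# Matches e.g. "PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 11 ..."
SOCKET_RE = re.compile(r"PROCESSOR \S+ TIME \d+ SOCKET (\d+)")


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16 (not interpreted)
    }


def summarize(log_text: str) -> dict:
    """Group MCE records by (socket, bank, status) -> list of logical CPUs.

    Assumes each 'Machine Check Exception' line is followed by its
    'PROCESSOR ... SOCKET n' line, as in the kernel log above.
    """
    groups = defaultdict(list)
    pending = None  # (cpu, bank, status) still waiting for its SOCKET line
    for line in log_text.splitlines():
        m = MCE_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)), int(m.group(3), 16))
            continue
        s = SOCKET_RE.search(line)
        if s and pending is not None:
            cpu, bank, status = pending
            groups[(int(s.group(1)), bank, status)].append(cpu)
            pending = None
    return groups


# Two records copied from the log above, used when no file is given.
SAMPLE = """\
[ 71.886894] mce: [Hardware Error]: CPU 24: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.887184] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 11 microcode b00001d
[ 71.889392] mce: [Hardware Error]: CPU 30: Machine Check Exception: 5 Bank 20: fa00004000020e0f
[ 71.889657] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1487438906 SOCKET 1 APIC 1d microcode b00001d
"""

if __name__ == "__main__":
    # Pass a saved dmesg/journal file as the first argument, or fall back to SAMPLE.
    text = open(sys.argv[1]).read() if len(sys.argv) > 1 else SAMPLE
    for (socket, bank, status), cpus in sorted(summarize(text).items()):
        info = decode_status(status)
        print(f"socket {socket}, bank {bank}, status {status:#018x}, CPUs {sorted(cpus)}")
        print(f"  flags: {' '.join(info['flags'])}; MCA code {info['mca_error_code']:#06x}")
```

On the built-in sample this prints `socket 1, bank 20, status 0xfa00004000020e0f, CPUs [24, 30]` followed by `flags: VAL OVER UC EN MISCV PCC; MCA code 0x0e0f`. Run over the full log, all 16 records fall into that single group (logical CPUs 8 to 15 and 24 to 31, all on socket 1), with UC and PCC set, which matches the kernel's final "Processor context corrupt" message. An identical uncorrected error reported by every CPU of one socket suggests a problem tied to that socket rather than to a single core or to the OS.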
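I know this reply is late, but I completely forgot about it. Most likely one of the CPUs was not seated properly, or came loose during shipping. At least, that is what the vendor told me, since they say they didn't replace anything.
After they sent it back, everything worked.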