Monitoring slow nginx / Unicorn requests

Question

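I'm currently using Nginx to proxy requests to a Unicorn server running a Sinatra application. The application only has a couple of routes defined, and those routes run fairly simple (inexpensive) queries against a PostgreSQL database and finally return the data as JSON. These services are monitored by God.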

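I'm currently seeing extremely slow response times from this application server. I have two other Unicorn servers proxied through Nginx, and they respond perfectly fine, so I think I can rule out any misbehavior on Nginx's part.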

Here is my God configuration:

    # God configuration

    APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

    God.watch do |w|
      w.name = "app_name"
      w.interval = 30.seconds # default
      w.start = "cd #{APP_ROOT} && unicorn -c #{APP_ROOT}/config/Unicorn.rb -D"
      # -QUIT = graceful shutdown, waits for workers to finish their current request before finishing
      w.stop = "kill -QUIT `cat #{APP_ROOT}/tmp/Unicorn.pid`"
      w.restart = "kill -USR2 `cat #{APP_ROOT}/tmp/Unicorn.pid`"
      w.start_grace = 10.seconds
      w.restart_grace = 10.seconds
      w.pid_file = "#{APP_ROOT}/tmp/Unicorn.pid"

      # User under which to run the process
      w.uid = 'web'
      w.gid = 'web'

      # Cleanup the pid file (this is needed for processes running as a daemon)
      w.behavior(:clean_pid_file)

      # Conditions under which to start the process
      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.interval = 5.seconds
          c.running = false
        end
      end

      # Conditions under which to restart the process
      w.restart_if do |restart|
        restart.condition(:memory_usage) do |c|
          c.above = 150.megabytes
          c.times = [3, 5] # 3 out of 5 intervals
        end

        restart.condition(:cpu_usage) do |c|
          c.above = 50.percent
          c.times = 5
        end
      end

      w.lifecycle do |on|
        on.condition(:flapping) do |c|
          c.to_state = [:start, :restart]
          c.times = 5
          c.within = 5.minute
          c.transition = :unmonitored
          c.retry_in = 10.minutes
          c.retry_times = 5
          c.retry_within = 2.hours
        end
      end
    end

Here is my Unicorn configuration:

    # Unicorn configuration file

    APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

    worker_processes 8
    preload_app true

    pid "#{APP_ROOT}/tmp/Unicorn.pid"
    listen 8001

    stderr_path "#{APP_ROOT}/log/Unicorn.stderr.log"
    stdout_path "#{APP_ROOT}/log/Unicorn.stdout.log"

    before_fork do |server, worker|
      # After a USR2 restart the old master leaves a .oldbin pid file behind;
      # ask that old master to quit once the new one starts forking workers.
      old_pid = "#{APP_ROOT}/tmp/Unicorn.pid.oldbin"
      if File.exist?(old_pid) && server.pid != old_pid
        begin
          Process.kill("QUIT", File.read(old_pid).to_i)
        rescue Errno::ENOENT, Errno::ESRCH
          # someone else did our job for us
        end
      end
    end

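I've checked the God status log, and CPU and memory usage never appear to go out of bounds. I also have something in place to kill high-memory workers, which can be found on the GitHub blog page here.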

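When I run tail -f on the Unicorn logs I see some requests, but they are few and far between, whereas I was at around 60-100 requests per second before this trouble seemed to arrive. The log also shows workers being reaped and started as expected.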

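So my question is: how would I go about debugging this? What are the next steps I should take? I'm very puzzled that the server sometimes responds quickly, but at other times is very slow for long periods (which may or may not coincide with peak traffic times).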

Any advice is much appreciated.

sciurus · Answer

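I would start by looking at the general health of the system with a tool like atop. Next, I would take a closer look at what postgres is doing. If that doesn't reveal the problem, I would use tcpdump (or the nicer tshark) to look at the communication between the browser, nginx, and Unicorn. Alongside that, I would try strace on nginx and Unicorn.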
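
A complementary way to narrow down where the time goes is to time each request inside the application itself. The following is only a minimal sketch and is not part of the original answer: `SlowRequestLogger` is a hypothetical Rack middleware, and its name, the 0.5 second default threshold, and the log format are all assumptions. If the times it reports stay small while clients still see slow responses, the delay is more likely in front of the app (nginx, or requests waiting on the Unicorn listen socket) than in the routes or the PostgreSQL queries.

```ruby
require 'logger'

# Hypothetical Rack middleware (an illustration, not from the original post):
# logs any request whose handling time inside the app exceeds a threshold.
class SlowRequestLogger
  def initialize(app, threshold: 0.5, logger: Logger.new($stderr))
    @app = app
    @threshold = threshold # seconds; an assumed default, tune as needed
    @logger = logger
  end

  def call(env)
    # Use a monotonic clock so wall-clock adjustments cannot skew the timing.
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    # Only the time spent producing the response triple is measured here;
    # time spent queued before a worker accepted the request is not included.
    if elapsed >= @threshold
      @logger.warn(format('slow request: %s %s status=%s %.3fs',
                          env['REQUEST_METHOD'], env['PATH_INFO'], status, elapsed))
    end

    [status, headers, body]
  end
end
```

In a Sinatra application this could be enabled with `use SlowRequestLogger`; by default the warnings go to stderr, which the Unicorn configuration above redirects to the file given by `stderr_path`.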