Deploying the Chanyouji (蝉游记) website: Nginx, Unicorn, Capistrano, OOB GC, Graceful Restart

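The Chanyouji website ( [url]http://chanyouji.com[/url] ) used to be deployed with Nginx + Passenger plus a homemade script. As the user base grew, API calls from the mobile apps increased, more servers were added, and we needed zero-downtime deploys and restarts, so we moved to Nginx + Unicorn + Capistrano. This post records the details and the things to watch out for.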

1. Nginx configuration


gzip  on;
# Enable gzip, including for the JSON responses of API requests
gzip_types application/json;

# Every machine runs nginx + unicorn; the local unicorn is reached over a Unix domain socket, which makes switching easy
upstream ruby_backend {
    server unix:/tmp/unicorn.sock fail_timeout=0;
    server 10.4.8.34:4096 fail_timeout=0;
    server 10.4.3.8:4096 fail_timeout=0;
}

# Serve static files via try_files and proxy dynamic Rails requests to the backend
server {
    listen       80;
    server_name  chanyouji.com;
    root         /www/youji_deploy/current/public;

    try_files $uri/index.html $uri.html $uri @httpapp;

    location @httpapp {
      proxy_redirect     off;
      proxy_set_header   Host $host;
      proxy_set_header   X-Forwarded-Host $host;
      proxy_set_header   X-Forwarded-Server $host;
      proxy_set_header   X-Real-IP        $remote_addr;
      proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
      proxy_buffering    on;
      proxy_pass         http://ruby_backend;
   }
}

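# Serve static assets from separate domains, which avoids sending the main domain's cookies with every request and makes a convenient CDN origin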
server {
    listen       80;
    server_name  cdn.chanyouji.cn cdnsource.chanyouji.cn;
    root         /www/youji_deploy/current/public;

    location ~ ^/(assets)/  {
      root /www/youji_deploy/current/public;
      gzip_static on; # to serve pre-gzipped version
      expires max;
      add_header Cache-Control public;
    }
}
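
To make the lookup order of the try_files line in the first server block concrete, here is a small Ruby simulation. It is only a sketch: resolve_try_files is a hypothetical helper of mine, not anything nginx provides, and it assumes each candidate is matched as a regular file.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper that mimics the lookup order of
#   try_files $uri/index.html $uri.html $uri @httpapp;
# It returns the path of the first candidate that exists as a regular file,
# or :httpapp when none does (nginx would then proxy to the upstream).
def resolve_try_files(public_root, uri)
  candidates = ["#{uri}/index.html", "#{uri}.html", uri]
  candidates.each do |candidate|
    path = File.join(public_root, candidate)
    return path if File.file?(path)
  end
  :httpapp
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "about"))
  File.write(File.join(root, "about", "index.html"), "cached page")
  File.write(File.join(root, "404.html"), "not found")

  puts resolve_try_files(root, "/about")   # matched by $uri/index.html
  puts resolve_try_files(root, "/404")     # matched by $uri.html
  puts resolve_try_files(root, "/trips/1") # no file, falls through to @httpapp
end
```

Anything Rails has written into public/ (cached pages, error pages) is served by nginx directly, and only the remaining requests reach Unicorn.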

2. unicorn.rb configuration


worker_processes 6

app_root = File.expand_path("../..", __FILE__)
working_directory app_root

# Listen on fs socket for better performance
listen "/tmp/unicorn.sock", :backlog => 64
listen 4096, :tcp_nopush => false

# Nuke workers after 30 seconds instead of 60 seconds (the default)
timeout 30

# App PID
pid "#{app_root}/tmp/pids/unicorn.pid"

# By default, the Unicorn logger will write to stderr.
# Additionally, some applications/frameworks log to stderr or stdout,
# so prevent them from going to /dev/null when daemonized here:
stderr_path "#{app_root}/log/unicorn.stderr.log"
stdout_path "#{app_root}/log/unicorn.stdout.log"

# To save some memory and improve performance
preload_app true
GC.respond_to?(:copy_on_write_friendly=) and
  GC.copy_on_write_friendly = true

# Force the bundler gemfile environment variable to
# reference the Capistrano "current" symlink
before_exec do |_|
  ENV["BUNDLE_GEMFILE"] = File.join(app_root, 'Gemfile')
end

before_fork do |server, worker|
  # See http://unicorn.bogomips.org/SIGNALS.html
  # Use the USR2 signal, then QUIT the old master once the new one is up, for a zero-downtime restart
  old_pid = app_root + '/tmp/pids/unicorn.pid.oldbin'
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end

  # the following is highly recommended for Rails + "preload_app true"
  # as there's no need for the master process to hold a connection
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Disable GC; combined with the OOB GC configured later, this reduces request processing time
  GC.disable
  # the following is *required* for Rails + "preload_app true",
  defined?(ActiveRecord::Base) and
    ActiveRecord::Base.establish_connection
end

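3. GC OOB
This New Relic article explains it clearly: http://blog.newrelic.com/2013/05/28/unicorn-rawk-kick-gc-out-of-the-band/
The idea is to defer GC until after the user's request has completed, which shortens response time. Paired with the existing gem unicorn-worker-killer, there is also no need to worry about memory blowing up.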
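
To illustrate the idea, here is a simplified sketch of a worker that writes the response first and only then decides whether to collect garbage. This is not the actual code of Unicorn::OobGC; the class OobGcWorkerSketch and its process method are hypothetical names, and a StringIO stands in for the client socket.

```ruby
require "stringio"

# Simplified illustration of out-of-band GC (an assumption-laden sketch,
# not Unicorn::OobGC itself). The response is written to the client first;
# garbage collection runs afterwards, once every `interval` requests.
class OobGcWorkerSketch
  attr_reader :gc_runs

  def initialize(app, interval: 10)
    @app = app
    @interval = interval
    @remaining = interval
    @gc_runs = 0
  end

  # Handle one request: send the response, then collect if it is due.
  def process(env, client_io)
    _status, _headers, body = @app.call(env)
    body.each { |chunk| client_io.write(chunk) }
    collect_if_due
  end

  private

  def collect_if_due
    @remaining -= 1
    return if @remaining > 0

    @remaining = @interval
    was_disabled = GC.enable # GC.enable returns true if GC had been disabled
    GC.start
    GC.disable if was_disabled
    @gc_runs += 1
  end
end

app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
worker = OobGcWorkerSketch.new(app, interval: 10)
client = StringIO.new
25.times { worker.process({}, client) }
puts worker.gc_runs # collections happened after requests 10 and 20
```

The client never waits on the collection, because by the time GC.start runs its response has already been written.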

Configure it in config.ru:


require 'unicorn/oob_gc'
require 'unicorn/worker_killer'
# Run GC only once every 10 requests
use Unicorn::OobGC, 10
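# Kill the worker after a maximum number of requests, to contain the memory growth caused by disabling GC (the limit is randomized between 3072 and 4096 so that multiple workers do not kill themselves at the same time; you can use either this or the setting below)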
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096
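# Kill the worker once it reaches a memory limit, to contain the memory growth caused by disabling GC (the limit is randomized between 192 and 256 MB so that multiple workers do not kill themselves at the same time)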
use Unicorn::WorkerKiller::Oom, (192*(1024**2)), (256*(1024**2))

require ::File.expand_path('../config/environment',  __FILE__)
run Youji::Application
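
The reason for passing a minimum and a maximum to the worker killer is that each worker ends up with its own limit, so the workers do not all restart at once. A minimal sketch of that idea follows; pick_limit is a hypothetical helper and the gem's internals may differ.

```ruby
# Hypothetical helper, not part of unicorn-worker-killer's API.
# Picks a per-worker limit uniformly from min..max (inclusive).
def pick_limit(min, max, rng: Random.new)
  min + rng.rand(max - min + 1)
end

# One limit per worker, assuming the 6 workers from unicorn.rb above.
limits = Array.new(6) { pick_limit(3072, 4096) }
puts limits.inspect
```

With limits spread over roughly a thousand requests, it is unlikely that several workers hit their limit in the same moment.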

4. Capistrano deploy script


set :unicorn_config, "#{current_path}/config/unicorn.rb"
set :unicorn_pid, "#{current_path}/tmp/pids/unicorn.pid"

namespace :deploy do
  task :start, :roles => :app, :except => { :no_release => true } do
    run "cd #{current_path} && RAILS_ENV=production bundle exec unicorn_rails -c #{unicorn_config} -D"
  end

  task :stop, :roles => :app, :except => { :no_release => true } do
    run "if [ -f #{unicorn_pid} ]; then kill -QUIT `cat #{unicorn_pid}`; fi"
  end

  task :restart, :roles => :app, :except => { :no_release => true } do
    # Use the USR2 signal for a zero-downtime restart on deploy
    run "if [ -f #{unicorn_pid} ]; then kill -s USR2 `cat #{unicorn_pid}`; fi"
  end
end
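
The restart task only sends USR2; the QUIT to the old master comes from the before_fork hook in unicorn.rb. If you want to confirm that the handoff finished, one option is to watch the pid files. The sketch below assumes the ".oldbin" naming used in unicorn.rb above; handoff_complete? and wait_for_handoff are hypothetical helpers, not part of Unicorn or Capistrano.

```ruby
require "tmpdir"

# Hypothetical helper. The handoff counts as complete when the ".oldbin"
# pid file is gone and the pid file holds a pid other than the old master's.
def handoff_complete?(pid_path, old_pid)
  return false if File.exist?("#{pid_path}.oldbin")

  new_pid = File.read(pid_path).to_i
  new_pid > 0 && new_pid != old_pid
rescue Errno::ENOENT
  false # the pid file has not been written yet
end

# Polls until the handoff completes or the timeout (in seconds) expires.
def wait_for_handoff(pid_path, old_pid, timeout: 60, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until handoff_complete?(pid_path, old_pid)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

# Simulated run with made-up pids: the old master (100) is replaced by 200.
Dir.mktmpdir do |dir|
  pid_path = File.join(dir, "unicorn.pid")
  File.write("#{pid_path}.oldbin", "100")
  File.write(pid_path, "200")
  puts handoff_complete?(pid_path, 100) # old master still around
  File.delete("#{pid_path}.oldbin")
  puts handoff_complete?(pid_path, 100) # old master has exited
end
```

Read the old master's pid before sending USR2 and pass it in as old_pid, so that the check cannot succeed before the new master has written its own pid file.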

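With these improvements in place, deploying a new version of Chanyouji only takes typing cap production deploy, and then you can go have some tea, without worrying that users will see a brief hang during the restart :)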

Two more charts:
New Relic monitoring chart: compared with before OOB was enabled, average response time dropped from about 100 ms to about 90 ms:

[img]http://dl2.iteye.com/upload/attachment/0086/2423/a40a8d88-098b-3f4a-b44f-f1c41b7cd81b.png[/img]

Server memory and CPU usage:

[img]http://dl2.iteye.com/upload/attachment/0086/2425/470136d8-df23-3caa-b678-6487701bfa13.png[/img]