Logging Upstream Response Data in the Nginx Log

In a previous post we looked at how to add the Lua module to Nginx. Now we can put it to work on a concrete goal, such as recording the upstream service's response data in the log, which makes later debugging and auditing much easier.

Preparing the upstream service

First, set up an example upstream service. Add the following to the http block; the log_format named bodylog defines the format used to record the upstream response:

log_format bodylog escape=json '$remote_addr - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    '<"$request_body"> "$resp_body"';

upstream geo {
    server mytest.com;
}

When the upstream response is JSON, Nginx escapes double quotes as \x22 by default, which makes later processing awkward, for example:

\x22status\x22:\x22success\x22

With escape=json, the quotes are escaped properly instead:

\"status\":\"success\"

The Nginx documentation describes this parameter:

The escape parameter (1.11.8) allows setting json or default characters escaping in variables, by default, default escaping is used. The none value (1.13.10) disables escaping.

escape=json also makes it possible to emit the log itself as JSON, for example:

log_format bodylog escape=json
    '{'
    '"remote_addr":"$remote_addr",'
    '"time_local":"$time_local",'
    '"request":"$request",'
    '"status": "$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"http_referrer":"$http_referer",'
    '"http_user_agent":"$http_user_agent"'
    '}';

The resulting log output looks like this:

{"remote_addr":"192.168.1.2","time_local":"07/Mar/2021:11:14:30 +0800","request":"GET /geo/1.1.1.1 HTTP/1.1","status": "200","body_bytes_sent":"325","http_referrer":"","http_user_agent":"curl/7.55.1""}

Then add the following to the server block:

location ^~/geo/ {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host mytest.com;
    rewrite ^/geo/(.*)$ /json/$1 break;
    proxy_pass http://geo;
    access_log /usr/local/src/nginx/logs/geo.log bodylog;
}

Recording the upstream response with Lua

Inline configuration

Add the following to the server block:

lua_need_request_body on;

set $resp_body "";
body_filter_by_lua '
    -- ngx.arg[1] is the current chunk of the response body
    local resp_body = ngx.arg[1]
    ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
    -- ngx.arg[2] is true on the last chunk of the response
    if ngx.arg[2] then
        ngx.var.resp_body = ngx.ctx.buffered
    end
';

This records the entire response body, but very large responses may hurt performance. The first line can be changed, for example, to keep only the beginning of each chunk:

local resp_body = string.sub(ngx.arg[1], 1, 1000)

Alternatively, trim or filter the upstream response data to suit your own needs; one way to cap the total buffered size is sketched below.
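
For instance, to limit the total amount buffered per request rather than each individual chunk, the filter body could look like this (a minimal sketch; the 4 KB limit is an arbitrary choice):

local max_len = 4096
local buffered = ngx.ctx.buffered or ""
if #buffered < max_len then
    -- append only up to the remaining budget
    buffered = buffered .. string.sub(ngx.arg[1], 1, max_len - #buffered)
    ngx.ctx.buffered = buffered
end
if ngx.arg[2] then
    ngx.var.resp_body = ngx.ctx.buffered
end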

After changing the configuration, reload Nginx and send a test request with curl. Checking /usr/local/src/nginx/logs/geo.log shows that the upstream response body has been recorded:

192.168.1.2 - [06/Mar/2021:23:34:21 +0800] "GET /geo/1.1.1.1 HTTP/1.1" 200 325 "" "curl/7.55.1" <""> "{\"status\":\"success\",\"country\":\"Australia\",\"countryCode\":\"AU\",\"region\":\"QLD\",\"regionName\":\"Queensland\",\"city\":\"South Brisbane\",\"zip\":\"4101\",\"lat\":-27.4766,\"lon\":153.0166,\"timezone\":\"Australia/Brisbane\",\"isp\":\"Cloudflare, Inc\",\"org\":\"APNIC and Cloudflare DNS Resolver project\",\"as\":\"AS13335 Cloudflare, Inc.\",\"query\":\"1.1.1.1\"}"

At this point the relevant parts of nginx.conf are:

http {
    log_format bodylog escape=json '$remote_addr - [$time_local] '
        '"$request" $status $body_bytes_sent '
        '"$http_referer" "$http_user_agent" '
        '<"$request_body"> "$resp_body"';

    upstream geo {
        server mytest.com;
    }

    server {
        listen 80;
        server_name 192.168.1.1;

        lua_need_request_body on;

        set $resp_body "";
        body_filter_by_lua '
            local resp_body = ngx.arg[1]
            ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
            if ngx.arg[2] then
                ngx.var.resp_body = ngx.ctx.buffered
            end
        ';

        location ^~/geo/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host mytest.com;
            rewrite ^/geo/(.*)$ /json/$1 break;
            proxy_pass http://geo;
            access_log /usr/local/src/nginx/logs/geo.log bodylog;
        }
    }
}

Splitting the configuration into files

The relevant pieces of nginx.conf can be moved into a configuration file and a Lua script under a custom path, which also makes later maintenance easier:
1. Create a mywork folder under the conf directory to hold your own files.
2. Write the bodylog log format into a file under mywork and pull it in with the include directive.
3. Write the Lua code into a Lua script (a sketch of both files is given after the configuration below).
4. Modify nginx.conf:

http {
    include mywork/log_upstream_resp.conf;

    upstream geo {
        server mytest.com;
    }

    server {
        listen 80;
        server_name 192.168.1.1;

        lua_need_request_body on;

        set $resp_body "";
        body_filter_by_lua_file conf/mywork/log_upstream_resp.lua;

        location ^~/geo/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host mytest.com;
            rewrite ^/geo/(.*)$ /json/$1 break;
            proxy_pass http://geo;
            access_log /usr/local/src/nginx/logs/geo.log bodylog;
        }
    }
}
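
The contents of the two included files are not shown in the original setup; based on the inline configuration above, they could look like this (a sketch). conf/mywork/log_upstream_resp.conf:

log_format bodylog escape=json '$remote_addr - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    '<"$request_body"> "$resp_body"';

And conf/mywork/log_upstream_resp.lua, which is simply the body of the earlier body_filter_by_lua block:

-- same logic as the inline body_filter_by_lua above
local resp_body = ngx.arg[1]
ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
if ngx.arg[2] then
    ngx.var.resp_body = ngx.ctx.buffered
end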

After another reload the behavior is the same as with the inline configuration, but nginx.conf is now cleaner and easier to maintain. If there are multiple server blocks, they can be split out in the same way and pulled in with the include directive.
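
For example, per-server files could be pulled in with a wildcard include (a sketch; the mywork/servers path is hypothetical):

include mywork/servers/*.conf;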

error log

Earlier, while researching a production problem, I noticed that logging via error_log does not behave the same way as the access_log approach above. I found the same issue reported on GitHub, so let's reproduce it from that issue. First, the configuration:

upstream mytest {
    server 192.168.0.113;
}

server {
    listen 8103;
    error_log /usr/local/src/nginx/logs/8103_error.log debug;
    location / {
        body_filter_by_lua_block {
            ngx.log(ngx.INFO,"ngx.arg[1]===========", ngx.arg[1])
            ngx.log(ngx.INFO,"ngx.arg[2]===========", ngx.arg[2])
        }
        proxy_pass http://mytest;
    }
}

The mytest upstream returns a large JSON payload; here I use https://raw.githubusercontent.com/grafana/grafana/main/package.json, which is 12.8 KB. When port 8103 is accessed, the expectation is that the response data ends up in the error log.

However, when port 8103 is actually accessed, body_filter_by_lua is invoked several times, and 8103_error.log shows that only part of the data in ngx.arg[1] is recorded:

$ cat 8103_error.log 
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:2: ngx.arg[1]==========={
"author": "Grafana Labs",
"license": "AGPL-3.0-only",
"private": true,
"name": "grafana",
"version": "8.2.0-pre",
"repository": "github:grafana/grafana",
"scripts": {
"api-tests": "jest --notify --watch --config=devenv/e2e-api-tests/jest.js",
"build": "node ./node_modules/webpack/bin/webpack.js --config scripts/webpack/webpack.prod.js",
"dev": "webpack --progress --color --config scripts/webpack/webpack.dev.js",
"e2e": "./e2e/start-and-run-suite",
"e2e:debug": "./e2e/start-and-run-suite debug",
"e2e:dev": "./e2e/start-and-run-suite dev",
"test": "jest --notify --watch",
"test:accessibility": "pa11y-ci --config .pa11yci.conf.js",
"lint": "yarn run lint:ts && yarn run lint:sass",
"lint:ts": "eslint . --ext .js,.tsx,.ts --cache",
"lint:sass": "yarn run sass-lint -c public/sass/.sass-lint.yml 'public/sass/**/*.scss, packages/**/*.scss' -v -i '**/node_modules/**/*.scss'",
"test:ci": "mkdir -p reports/junit && export JEST_JUNIT_OUTPUT_DIR=reports/junit && jest --ci --reporters=default --reporters=jest-junit -w ${TEST_MAX_WORKERS:-100%}",
"lint:fix": "yarn lint --fix",
"packages:build": "lerna run clean && lerna run build --ignore @grafana-plugins/input-datasource",
"packages:docsExtract": "rm -rf ./reports/docs && lerna run docsExtract",
"packages:docsToMarkdown": "api-documenter markdown --input-folder ./reports/docs/ --output-folder ./docs/sources/packages_api/ --hugo",
"packages:prepare": "lerna version --no-push --no-git-tag-version --force-publish --exact",
"packages:publish": "lerna publish from-package --contents dist",
"packages:publishCanary": "lerna publish from-package --contents dist --dist-tag canary --yes",
"packages:publishLatest": "lerna publish from-package --contents dist --yes",
"packages:publishNext": "lerna publish from-package --contents dist --dist-tag next --yes",
"packages:publishDev": "lerna publish fro
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:3: ngx.arg[2]===========false while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:2: ngx.arg[1]=========== while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:3: ngx.arg[2]===========true while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 client 192.168.0.118 closed keepalive connection

The cause is that Nginx error log messages have a hard-coded length limit: 2 KB in the stock Nginx core and 4 KB in OpenResty's Nginx core.

The macro definition in Nginx's src/core/ngx_log.h:

#define NGX_MAX_ERROR_STR   2048

And OpenResty's patches/nginx-1.19.9-larger_max_error_str.patch:

-#define NGX_MAX_ERROR_STR   2048
+#ifndef NGX_MAX_ERROR_STR
+#define NGX_MAX_ERROR_STR 4096
+#endif

Now the cause of the truncation is clear. Since the OpenResty patch wraps the macro in #ifndef, the limit can presumably also be raised further by defining NGX_MAX_ERROR_STR through compiler flags at build time.
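
If the whole body really does need to show up in the error log for ad-hoc debugging, one workaround (my own sketch, not taken from the referenced issue) is to split each chunk into pieces smaller than the limit before logging:

body_filter_by_lua_block {
    local chunk = ngx.arg[1]
    -- stay well below NGX_MAX_ERROR_STR, which must also hold the log prefix and request context
    local piece = 1500
    for i = 1, #chunk, piece do
        ngx.log(ngx.INFO, "body part: ", string.sub(chunk, i, i + piece - 1))
    end
}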

Full buffering

The body_filter_by_lua approach above can capture the entire response body, but agentzh (Yichun Zhang) advises against doing this:

All the point of an output filter in nginx is to do the data processing in a non-buffered fashion, which is crucial for processing HUGE response body data streams. If you do full buffering yourself in body_filter_by_lua*, then you may blow up the LuaJIT’s GC-managed memory upon HUGE responses, and defeat all the purposes of nginx’s output filter design.

If the business logic really does need the complete response before acting on it (for example full logging or data masking), then all of the chunks have to be buffered. agentzh gave an answer in an OpenResty discussion as well, which we can adapt to our own situation:

local chunk, eof = ngx.arg[1], ngx.arg[2]
local buffered = ngx.ctx.buffered
if not buffered then
    buffered = {}
    ngx.ctx.buffered = buffered
end

if chunk ~= "" then
    buffered[#buffered + 1] = chunk
    -- equivalently: table.insert(buffered, chunk)
    -- setting ngx.arg[1] to nil keeps this chunk from being sent downstream for now
    ngx.arg[1] = nil
end

if eof then
    local whole = table.concat(buffered)
    ngx.ctx.buffered = nil

    -- whole = todo, eg: replace
    -- whole = string.gsub(whole, "hello world", "HIT!")

    -- emit the (possibly modified) complete body as the final chunk
    ngx.arg[1] = whole
    ngx.var.resp_body = whole
end

This version is more careful than the earlier one: it initializes the buffer, collects chunks in a table and joins them once with table.concat (avoiding repeated string concatenation), and checks whether the stream has ended. Also, if you modify the response body in body_filter_by_lua* and change its actual size, remember to clear the Content-Length header; header_filter_by_lua* runs before Nginx sends the headers, so the client will not be misled:

header_filter_by_lua '
    ngx.header.content_length = nil
';

Or:

-- in a file loaded via header_filter_by_lua_file:
ngx.header.content_length = nil
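
Putting the pieces together, a location that buffers the upstream body, clears Content-Length, and logs $resp_body might look like this (a sketch; the file names under conf/mywork are hypothetical and reuse the layout from the split-out configuration above):

location ^~/geo/ {
    set $resp_body "";
    # clear Content-Length before headers are sent, since the body filter may change the size
    header_filter_by_lua_file conf/mywork/clear_content_length.lua;
    # the full-buffering script shown above, which also sets $resp_body
    body_filter_by_lua_file conf/mywork/buffer_resp_body.lua;
    proxy_pass http://geo;
    access_log /usr/local/src/nginx/logs/geo.log bodylog;
}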

References

1. https://serverfault.com/questions/361556/is-it-possible-to-log-the-response-data-in-nginx-access-log
2. http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
3. http://nginx.org/en/docs/ngx_core_module.html#include
4. https://stackoverflow.com/questions/25049667/how-to-generate-a-json-log-from-nginx
5. https://github.com/openresty/lua-nginx-module/issues/1092
6. https://forum.openresty.us/d/4462-7200de849c40c83a32394ae9cb27ceef/2
7. https://zhuanlan.zhihu.com/p/67904411
8. https://github.com/apache/apisix/blob/master/apisix/plugins/grpc-transcode/response.lua
9. https://jkzhao.github.io/2018/05/03/Lua-OpenResty%E4%BF%AE%E6%94%B9response-body/
