Logging Upstream Response Data in the Nginx Log

In a previous post we looked at how to add the Lua module to Nginx. Now we can put it to work on a concrete goal, such as recording the upstream service's response data in the log, which makes later debugging and auditing much easier.

Preparing the upstream service

First, set up an example upstream service. Add the following to the http block; the log_format named bodylog defines the format used to record the upstream response:

log_format bodylog escape=json '$remote_addr - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    '<"$request_body"> "$resp_body"';

upstream geo {
    server mytest.com;
}

When the upstream response is JSON, Nginx escapes double quotes as \x22 by default, which makes later processing awkward, for example:

\x22status\x22:\x22success\x22

With escape=json, the quotes are escaped properly instead:

\"status\":\"success\"

The Nginx documentation describes this parameter:

The escape parameter (1.11.8) allows setting json or default characters escaping in variables, by default, default escaping is used. The none value (1.13.10) disables escaping.

escape=json also makes it possible to emit the log itself as JSON, for example:

log_format bodylog escape=json
    '{'
    '"remote_addr":"$remote_addr",'
    '"time_local":"$time_local",'
    '"request":"$request",'
    '"status": "$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"http_referrer":"$http_referer",'
    '"http_user_agent":"$http_user_agent"'
    '}';

The resulting log output looks like this:

{"remote_addr":"192.168.1.2","time_local":"07/Mar/2021:11:14:30 +0800","request":"GET /geo/1.1.1.1 HTTP/1.1","status": "200","body_bytes_sent":"325","http_referrer":"","http_user_agent":"curl/7.55.1""}

Then add the following to the server block:

location ^~/geo/ {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host mytest.com;
    rewrite ^/geo/(.*)$ /json/$1 break;
    proxy_pass http://geo;
    access_log /usr/local/src/nginx/logs/geo.log bodylog;
}

Recording the upstream response with Lua

Inline configuration

Add the following to the server block:

lua_need_request_body on;

set $resp_body "";
body_filter_by_lua '
    -- ngx.arg[1] is the current chunk of the response body
    local resp_body = ngx.arg[1]
    ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
    -- ngx.arg[2] is true on the last chunk of the response
    if ngx.arg[2] then
        ngx.var.resp_body = ngx.ctx.buffered
    end
';

This records the entire response body, but very large responses may hurt performance. The first line can be changed, for example, to keep only the beginning of each chunk:

local resp_body = string.sub(ngx.arg[1], 1, 1000)

Alternatively, trim or filter the upstream response data to suit your own needs; one way to cap the total buffered size is sketched below.
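
For instance, to limit the total amount buffered per request rather than each individual chunk, the filter body could look like this (a minimal sketch; the 4 KB limit is an arbitrary choice):

local max_len = 4096
local buffered = ngx.ctx.buffered or ""
if #buffered < max_len then
    -- append only up to the remaining budget
    buffered = buffered .. string.sub(ngx.arg[1], 1, max_len - #buffered)
    ngx.ctx.buffered = buffered
end
if ngx.arg[2] then
    ngx.var.resp_body = ngx.ctx.buffered
end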

After changing the configuration, reload Nginx and send a test request with curl. Checking /usr/local/src/nginx/logs/geo.log shows that the upstream response body has been recorded:

192.168.1.2 - [06/Mar/2021:23:34:21 +0800] "GET /geo/1.1.1.1 HTTP/1.1" 200 325 "" "curl/7.55.1" <""> "{\"status\":\"success\",\"country\":\"Australia\",\"countryCode\":\"AU\",\"region\":\"QLD\",\"regionName\":\"Queensland\",\"city\":\"South Brisbane\",\"zip\":\"4101\",\"lat\":-27.4766,\"lon\":153.0166,\"timezone\":\"Australia/Brisbane\",\"isp\":\"Cloudflare, Inc\",\"org\":\"APNIC and Cloudflare DNS Resolver project\",\"as\":\"AS13335 Cloudflare, Inc.\",\"query\":\"1.1.1.1\"}"

At this point the relevant parts of nginx.conf are:

http {
    log_format bodylog escape=json '$remote_addr - [$time_local] '
        '"$request" $status $body_bytes_sent '
        '"$http_referer" "$http_user_agent" '
        '<"$request_body"> "$resp_body"';

    upstream geo {
        server mytest.com;
    }

    server {
        listen 80;
        server_name 192.168.1.1;

        lua_need_request_body on;

        set $resp_body "";
        body_filter_by_lua '
            local resp_body = ngx.arg[1]
            ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
            if ngx.arg[2] then
                ngx.var.resp_body = ngx.ctx.buffered
            end
        ';

        location ^~/geo/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host mytest.com;
            rewrite ^/geo/(.*)$ /json/$1 break;
            proxy_pass http://geo;
            access_log /usr/local/src/nginx/logs/geo.log bodylog;
        }
    }
}

Splitting the configuration into files

The relevant pieces of nginx.conf can be moved into a configuration file and a Lua script under a custom path, which also makes later maintenance easier:
1. Create a mywork folder under the conf directory to hold your own files.
2. Write the bodylog log format into a file under mywork and pull it in with the include directive.
3. Write the Lua code into a Lua script (a sketch of both files is given after the configuration below).
4. Modify nginx.conf:

http {
    include mywork/log_upstream_resp.conf;

    upstream geo {
        server mytest.com;
    }

    server {
        listen 80;
        server_name 192.168.1.1;

        lua_need_request_body on;

        set $resp_body "";
        body_filter_by_lua_file conf/mywork/log_upstream_resp.lua;

        location ^~/geo/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host mytest.com;
            rewrite ^/geo/(.*)$ /json/$1 break;
            proxy_pass http://geo;
            access_log /usr/local/src/nginx/logs/geo.log bodylog;
        }
    }
}
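
The contents of the two included files are not shown in the original setup; based on the inline configuration above, they could look like this (a sketch). conf/mywork/log_upstream_resp.conf:

log_format bodylog escape=json '$remote_addr - [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    '<"$request_body"> "$resp_body"';

And conf/mywork/log_upstream_resp.lua, which is simply the body of the earlier body_filter_by_lua block:

-- same logic as the inline body_filter_by_lua above
local resp_body = ngx.arg[1]
ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
if ngx.arg[2] then
    ngx.var.resp_body = ngx.ctx.buffered
end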

After another reload the behavior is the same as with the inline configuration, but nginx.conf is now cleaner and easier to maintain. If there are multiple server blocks, they can be split out in the same way and pulled in with the include directive.
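
For example, per-server files could be pulled in with a wildcard include (a sketch; the mywork/servers path is hypothetical):

include mywork/servers/*.conf;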

error log

Earlier, while researching a production problem, I noticed that logging via error_log does not behave the same way as the access_log approach above. I found the same issue reported on GitHub, so let's reproduce it from that issue. First, the configuration:

upstream mytest {
    server 192.168.0.113;
}

server {
    listen 8103;
    error_log /usr/local/src/nginx/logs/8103_error.log debug;
    location / {
        body_filter_by_lua_block {
            ngx.log(ngx.INFO,"ngx.arg[1]===========", ngx.arg[1])
            ngx.log(ngx.INFO,"ngx.arg[2]===========", ngx.arg[2])
        }
        proxy_pass http://mytest;
    }
}

The mytest upstream returns a large JSON payload; here I use https://raw.githubusercontent.com/grafana/grafana/main/package.json, which is 12.8 KB. When port 8103 is accessed, the expectation is that the response data ends up in the error log.

However, when port 8103 is actually accessed, body_filter_by_lua is invoked several times, and 8103_error.log shows that only part of the data in ngx.arg[1] is recorded:

$ cat 8103_error.log 
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:2: ngx.arg[1]==========={
"author": "Grafana Labs",
"license": "AGPL-3.0-only",
"private": true,
"name": "grafana",
"version": "8.2.0-pre",
"repository": "github:grafana/grafana",
"scripts": {
"api-tests": "jest --notify --watch --config=devenv/e2e-api-tests/jest.js",
"build": "node ./node_modules/webpack/bin/webpack.js --config scripts/webpack/webpack.prod.js",
"dev": "webpack --progress --color --config scripts/webpack/webpack.dev.js",
"e2e": "./e2e/start-and-run-suite",
"e2e:debug": "./e2e/start-and-run-suite debug",
"e2e:dev": "./e2e/start-and-run-suite dev",
"test": "jest --notify --watch",
"test:accessibility": "pa11y-ci --config .pa11yci.conf.js",
"lint": "yarn run lint:ts && yarn run lint:sass",
"lint:ts": "eslint . --ext .js,.tsx,.ts --cache",
"lint:sass": "yarn run sass-lint -c public/sass/.sass-lint.yml 'public/sass/**/*.scss, packages/**/*.scss' -v -i '**/node_modules/**/*.scss'",
"test:ci": "mkdir -p reports/junit && export JEST_JUNIT_OUTPUT_DIR=reports/junit && jest --ci --reporters=default --reporters=jest-junit -w ${TEST_MAX_WORKERS:-100%}",
"lint:fix": "yarn lint --fix",
"packages:build": "lerna run clean && lerna run build --ignore @grafana-plugins/input-datasource",
"packages:docsExtract": "rm -rf ./reports/docs && lerna run docsExtract",
"packages:docsToMarkdown": "api-documenter markdown --input-folder ./reports/docs/ --output-folder ./docs/sources/packages_api/ --hugo",
"packages:prepare": "lerna version --no-push --no-git-tag-version --force-publish --exact",
"packages:publish": "lerna publish from-package --contents dist",
"packages:publishCanary": "lerna publish from-package --contents dist --dist-tag canary --yes",
"packages:publishLatest": "lerna publish from-package --contents dist --yes",
"packages:publishNext": "lerna publish from-package --contents dist --dist-tag next --yes",
"packages:publishDev": "lerna publish fro
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:3: ngx.arg[2]===========false while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:2: ngx.arg[1]=========== while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 [lua] body_filter_by_lua:3: ngx.arg[2]===========true while sending to client, client: 192.168.0.118, server: , request: "GET /package.json HTTP/1.1", upstream: "http://192.168.0.113:80/package.json", host: "192.168.0.118:8103"
2021/09/05 19:21:43 [info] 22342#0: *32 client 192.168.0.118 closed keepalive connection

The cause is that Nginx error log messages have a hard-coded length limit: 2 KB in the stock Nginx core and 4 KB in OpenResty's Nginx core.

The macro definition in Nginx's src/core/ngx_log.h:

#define NGX_MAX_ERROR_STR   2048

And OpenResty's patches/nginx-1.19.9-larger_max_error_str.patch:

-#define NGX_MAX_ERROR_STR   2048
+#ifndef NGX_MAX_ERROR_STR
+#define NGX_MAX_ERROR_STR 4096
+#endif

Now the cause of the truncation is clear. Since the OpenResty patch wraps the macro in #ifndef, the limit can presumably also be raised further by defining NGX_MAX_ERROR_STR through compiler flags at build time.
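
If the whole body really does need to show up in the error log for ad-hoc debugging, one workaround (my own sketch, not taken from the referenced issue) is to split each chunk into pieces smaller than the limit before logging:

body_filter_by_lua_block {
    local chunk = ngx.arg[1]
    -- stay well below NGX_MAX_ERROR_STR, which must also hold the log prefix and request context
    local piece = 1500
    for i = 1, #chunk, piece do
        ngx.log(ngx.INFO, "body part: ", string.sub(chunk, i, i + piece - 1))
    end
}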

Full buffering

The body_filter_by_lua approach above can capture the entire response body, but agentzh (Yichun Zhang) advises against doing this:

All the point of an output filter in nginx is to do the data processing in a non-buffered fashion, which is crucial for processing HUGE response body data streams. If you do full buffering yourself in body_filter_by_lua*, then you may blow up the LuaJIT’s GC-managed memory upon HUGE responses, and defeat all the purposes of nginx’s output filter design.

If the business logic really does need the complete response before acting on it (for example full logging or data masking), then all of the chunks have to be buffered. agentzh gave an answer in an OpenResty discussion as well, which we can adapt to our own situation:

local chunk, eof = ngx.arg[1], ngx.arg[2]
local buffered = ngx.ctx.buffered
if not buffered then
    buffered = {}
    ngx.ctx.buffered = buffered
end

if chunk ~= "" then
    buffered[#buffered + 1] = chunk
    -- equivalently: table.insert(buffered, chunk)
    -- setting ngx.arg[1] to nil keeps this chunk from being sent downstream for now
    ngx.arg[1] = nil
end

if eof then
    local whole = table.concat(buffered)
    ngx.ctx.buffered = nil

    -- whole = todo, eg: replace
    -- whole = string.gsub(whole, "hello world", "HIT!")

    -- emit the (possibly modified) complete body as the final chunk
    ngx.arg[1] = whole
    ngx.var.resp_body = whole
end

This version is more careful than the earlier one: it initializes the buffer, collects chunks in a table and joins them once with table.concat (avoiding repeated string concatenation), and checks whether the stream has ended. Also, if you modify the response body in body_filter_by_lua* and change its actual size, remember to clear the Content-Length header; header_filter_by_lua* runs before Nginx sends the headers, so the client will not be misled:

header_filter_by_lua '
    ngx.header.content_length = nil
';

Or:

-- in a file loaded via header_filter_by_lua_file:
ngx.header.content_length = nil
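
Putting the pieces together, a location that buffers the upstream body, clears Content-Length, and logs $resp_body might look like this (a sketch; the file names under conf/mywork are hypothetical and reuse the layout from the split-out configuration above):

location ^~/geo/ {
    set $resp_body "";
    # clear Content-Length before headers are sent, since the body filter may change the size
    header_filter_by_lua_file conf/mywork/clear_content_length.lua;
    # the full-buffering script shown above, which also sets $resp_body
    body_filter_by_lua_file conf/mywork/buffer_resp_body.lua;
    proxy_pass http://geo;
    access_log /usr/local/src/nginx/logs/geo.log bodylog;
}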

References

1. https://serverfault.com/questions/361556/is-it-possible-to-log-the-response-data-in-nginx-access-log
2. http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
3. http://nginx.org/en/docs/ngx_core_module.html#include
4. https://stackoverflow.com/questions/25049667/how-to-generate-a-json-log-from-nginx
5. https://github.com/openresty/lua-nginx-module/issues/1092
6. https://forum.openresty.us/d/4462-7200de849c40c83a32394ae9cb27ceef/2
7. https://zhuanlan.zhihu.com/p/67904411
8. https://github.com/apache/apisix/blob/master/apisix/plugins/grpc-transcode/response.lua
9. https://jkzhao.github.io/2018/05/03/Lua-OpenResty%E4%BF%AE%E6%94%B9response-body/
