How NetworkManager Manages DNS Configuration

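I ran into the following scenario: after the operating system was installed, the network was configured through the GUI, the business platform was installed, and /etc/resolv.conf was edited directly to change the DNS servers. A few days later the configuration turned out to have changed, which affected the business platform, yet the investigation confirmed that nobody had touched /etc/resolv.conf. The cause was eventually traced to a NetworkManager restart: every time NetworkManager restarts, DNS reverts to the servers written into the initial network configuration at installation time. So what does NetworkManager actually do here? Let's follow the logs and the source code to find out ☺️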

NetworkManager logs

The environment used to reproduce the problem is Rocky 9.6, with the following initial network configuration:

[root@localhost misaka]# nmcli device show ens33
GENERAL.DEVICE: ens33
GENERAL.TYPE: ethernet
GENERAL.HWADDR:
GENERAL.MTU: 1500
GENERAL.STATE: 100 (connected)
GENERAL.CONNECTION: ens33
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/ActiveConnection/2
WIRED-PROPERTIES.CARRIER: on
IP4.ADDRESS[1]: 192.168.0.239/24
IP4.GATEWAY: 192.168.0.1
IP4.ROUTE[1]: dst = 192.168.0.0/24, nh = 0.0.0.0, mt = 100
IP4.ROUTE[2]: dst = 0.0.0.0/0, nh = 192.168.0.1, mt = 100
IP4.DNS[1]: 8.8.8.8
IP6.ADDRESS[1]: fe80::20c:29ff:feec:d07a/64
IP6.GATEWAY: --
IP6.ROUTE[1]: dst = fe80::/64, nh = ::, mt = 1024

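On RHEL-family distributions the log is usually at /var/log/messages. My first idea was to check whether the existing logs contain anything useful. Open two terminals and run the following commands, one in each: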

tail -f  /var/log/messages | grep NetworkManager

systemctl restart NetworkManager

This produces the following output:

Jul 12 23:33:45 localhost systemd[1]: NetworkManager-wait-online.service: Deactivated successfully.
Jul 12 23:33:45 localhost NetworkManager[171123]: <info> [1752334425.0882] caught SIGTERM, shutting down normally.
Jul 12 23:33:45 localhost NetworkManager[171123]: <info> [1752334425.0891] manager: NetworkManager state is now CONNECTED_SITE
Jul 12 23:33:45 localhost NetworkManager[171123]: <info> [1752334425.0935] exiting (success)
Jul 12 23:33:45 localhost systemd[1]: NetworkManager.service: Deactivated successfully.
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1471] NetworkManager (version 1.52.0-4.el9_6) is starting... (after a restart, boot:682b7f4f-9040-4361-9bbe-3dd582d2db4a)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1472] Read config: /etc/NetworkManager/NetworkManager.conf, /usr/lib/NetworkManager/conf.d/{00-server.conf,99-nvme-nbft-no-ignore-carrier.conf}, /run/NetworkManager/conf.d/15-carrier-timeout.conf
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1508] manager[0x55d4e0a3a050]: monitoring kernel firmware directory '/lib/firmware'.
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1553] hostname: hostname: using hostnamed
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1556] dns-mgr: init: dns=default,systemd-resolved rc-manager=symlink (auto)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1557] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1560] manager[0x55d4e0a3a050]: rfkill: Wi-Fi hardware radio set enabled
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1561] manager[0x55d4e0a3a050]: rfkill: WWAN hardware radio set enabled
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1576] Loaded device plugin: NMAtmManager (/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-device-plugin-adsl.so)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1578] Loaded device plugin: NMWifiFactory (/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-device-plugin-wifi.so)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1586] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-device-plugin-team.so)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1599] Loaded device plugin: NMBluezManager (/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-device-plugin-bluetooth.so)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1601] Loaded device plugin: NMWwanFactory (/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-device-plugin-wwan.so)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1603] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1604] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1605] manager: Networking is enabled by state file
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1607] settings: Loaded settings plugin: keyfile (internal)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1610] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.52.0-4.el9_6/libnm-settings-plugin-ifcfg-rh.so")
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1626] dhcp: init: Using DHCP client 'internal'
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1629] manager: (lo): new Loopback device (/org/freedesktop/NetworkManager/Devices/1)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1633] device (lo): state change: unmanaged -> unavailable (reason 'connection-assumed', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1640] device (lo): state change: unavailable -> disconnected (reason 'connection-assumed', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1647] device (lo): Activation: starting connection 'lo' (ba530e38-0cc8-44a7-9dea-c1126d22d767)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1651] device (ens33): carrier: link connected
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1655] manager: (ens33): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1659] manager: (ens33): assume: will attempt to assume matching connection 'ens33' (2752299c-a8b2-362e-af75-d3b722cce23b) (indicated)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1659] device (ens33): state change: unmanaged -> unavailable (reason 'connection-assumed', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1664] device (ens33): state change: unavailable -> disconnected (reason 'connection-assumed', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1669] device (ens33): Activation: starting connection 'ens33' (2752299c-a8b2-362e-af75-d3b722cce23b)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1677] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1687] device (lo): state change: disconnected -> prepare (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1690] device (lo): state change: prepare -> config (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1693] device (lo): state change: config -> ip-config (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1695] device (ens33): state change: disconnected -> prepare (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1697] device (ens33): state change: prepare -> config (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1771] modem-manager: ModemManager available
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1775] device (ens33): state change: config -> ip-config (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1779] device (lo): state change: ip-config -> ip-check (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1787] policy: set 'ens33' (ens33) as default for IPv4 routing and DNS
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1835] device (lo): state change: ip-check -> secondaries (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1836] device (lo): state change: secondaries -> activated (reason 'none', managed-type: 'external')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1839] device (lo): Activation: successful, device activated.
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1857] device (ens33): state change: ip-config -> ip-check (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1891] device (ens33): state change: ip-check -> secondaries (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1894] device (ens33): state change: secondaries -> activated (reason 'none', managed-type: 'assume')
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1898] manager: NetworkManager state is now CONNECTED_SITE
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1901] device (ens33): Activation: successful, device activated.
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1907] manager: NetworkManager state is now CONNECTED_GLOBAL
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1909] manager: startup complete
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.2826] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.6020] agent-manager: agent[f8948c5639bb9824,:1.25/org.gnome.Shell.NetworkAgent/42]: agent registered
Jul 12 23:33:55 localhost systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jul 12 23:34:15 localhost NetworkManager[178235]: <info> [1752334455.0274] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 12 23:34:25 localhost systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jul 12 23:34:45 localhost NetworkManager[178235]: <info> [1752334485.0224] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 12 23:34:55 localhost systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jul 12 23:38:45 localhost NetworkManager[178235]: <info> [1752334725.0229] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 12 23:38:55 localhost systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.

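I handed this log excerpt to Doubao for analysis. According to its summary, the log records NetworkManager going from "old process shuts down" to "new process starts and finishes initializing", with these key points: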

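  • Old process shuts down cleanly: after receiving SIGTERM, the old NetworkManager process (PID 171123) exits normally with status success.
  • New process starts: the new process (PID 178235) starts as version 1.52.0-4.el9_6, reads the configuration files, and initializes the core modules.
  • Devices are detected and activated: the loopback device (lo) and the Ethernet device (ens33) are recognized and activated, and the network state finally reaches CONNECTED_GLOBAL.
  • Service initialization completes: startup finishes without errors, and all core functions (DNS management, DHCP client, device plugins, and so on) load normally.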

The lines that show DNS-related activity are:

Jul 12 23:33:45 localhost NetworkManager[178235]: <info>  [1752334425.1556] dns-mgr: init: dns=default,systemd-resolved rc-manager=symlink (auto)
Jul 12 23:33:45 localhost NetworkManager[178235]: <info> [1752334425.1787] policy: set 'ens33' (ens33) as default for IPv4 routing and DNS

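When the systemd-resolved service is not present, dns=default makes NetworkManager manage DNS directly and generate the contents of /etc/resolv.conf itself. With rc-manager=symlink it writes /etc/resolv.conf directly if that path is a regular file or does not exist, and only refreshes the symlink when /etc/resolv.conf already points at NetworkManager's own internal copy.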
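Since the behavior depends on what kind of file /etc/resolv.conf is, it helps to check that first. The following is a minimal sketch of such a check; the function name describe_resolv_conf and the returned labels are my own and are not part of NetworkManager.

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Report whether `path` is a symlink, a regular file, or missing."""
    if os.path.islink(path):
        # readlink returns the raw link target without resolving it further
        return "symlink -> " + os.readlink(path)
    if os.path.isfile(path):
        return "regular file"
    if not os.path.exists(path):
        return "missing"
    return "other"


if __name__ == "__main__":
    print(describe_resolv_conf())
```

On the machine from this article the expected answer is a regular file, because the file was edited by hand rather than linked anywhere.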

NetworkManager source code

At this point the logs only show that NetworkManager is indeed involved in changing /etc/resolv.conf. How exactly it does so is still unclear, so let's change approach.

A /etc/resolv.conf managed by NetworkManager carries the following comment:

# Generated by NetworkManager
nameserver 8.8.8.8

Searching the code for the string Generated by NetworkManager gives us an entry point into the processing flow.

In the NetworkManager source, src/core/dns/nm-dns-manager.c contains the function create_resolv_conf, which includes that string and generates the nameserver entries.

static char *
create_resolv_conf(const char *const *searches,
                   const char *const *nameservers,
                   const char *const *options)
{
    GString *str;
    gsize i;

    str = g_string_new_len(NULL, 245);

    g_string_append(str, "# Generated by NetworkManager\n");

    if (searches && searches[0]) {
        gsize search_base_idx;

        g_string_append(str, "search");
        search_base_idx = str->len;

        for (i = 0; searches[i]; i++) {
            const char *s = searches[i];
            gsize l = strlen(s);

            if (l == 0 || NM_STRCHAR_ANY(s, ch, NM_IN_SET(ch, ' ', '\t', '\n'))) {
                /* there should be no such characters in the search entry. Also,
                 * because glibc parser would treat them as line/word separator.
                 *
                 * Skip the value silently. */
                continue;
            }

            if (search_base_idx > 0) {
                if (str->len - search_base_idx + 1 + l > 254) {
                    /* this entry crosses the 256 character boundary. Older glibc versions
                     * would truncate the entry at this point.
                     *
                     * Fill the line with spaces to cross the 256 char boundary and continue
                     * afterwards. This way, the truncation happens between two search entries. */
                    while (str->len - search_base_idx < 257)
                        g_string_append_c(str, ' ');
                    search_base_idx = 0;
                }
            }

            g_string_append_c(str, ' ');
            g_string_append_len(str, s, l);
        }
        g_string_append_c(str, '\n');
    }

    if (nameservers && nameservers[0]) {
        for (i = 0; nameservers[i]; i++) {
            if (i == 3) {
                g_string_append(
                    str,
                    "# NOTE: the libc resolver may not support more than 3 nameservers.\n");
                g_string_append(str, "# The nameservers listed below may not be recognized.\n");
            }
            g_string_append(str, "nameserver ");
            g_string_append(str, nameservers[i]);
            g_string_append_c(str, '\n');
        }
    }

    if (options && options[0]) {
        g_string_append(str, "options");
        for (i = 0; options[i]; i++) {
            g_string_append_c(str, ' ');
            g_string_append(str, options[i]);
        }
        g_string_append_c(str, '\n');
    }

    return g_string_free(str, FALSE);
}
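To make the output format easier to see, here is a simplified Python paraphrase of the function above. It is only a sketch of my reading of the C code, not NetworkManager's implementation: it keeps the header, the search/nameserver/options lines, and the note after the third nameserver, but deliberately leaves out the padding that works around the 256-character limit in older glibc.

```python
def create_resolv_conf(searches=(), nameservers=(), options=()):
    """Build resolv.conf text in the same layout as the C function (simplified)."""
    out = ["# Generated by NetworkManager"]

    if searches:
        # Entries that are empty or contain whitespace are skipped silently.
        valid = [s for s in searches
                 if s and not any(ch in s for ch in " \t\n")]
        out.append("search" + "".join(" " + s for s in valid))

    for i, ns in enumerate(nameservers):
        if i == 3:
            out.append("# NOTE: the libc resolver may not support more than 3 nameservers.")
            out.append("# The nameservers listed below may not be recognized.")
        out.append("nameserver " + ns)

    if options:
        out.append("options " + " ".join(options))

    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(create_resolv_conf(nameservers=["8.8.8.8"]), end="")
```

Called with the single nameserver 8.8.8.8, it produces exactly the two-line file shown earlier.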

Searching further in the IDE shows that this function is called from several places:

char *
nmtst_dns_create_resolv_conf(const char *const *searches,
                             const char *const *nameservers,
                             const char *const *options)
{
    return create_resolv_conf(searches, nameservers, options);
}

static gboolean
write_resolv_conf(FILE *f,
                  const char *const *searches,
                  const char *const *nameservers,
                  const char *const *options,
                  GError **error)
{
    gs_free char *content = NULL;

    content = create_resolv_conf(searches, nameservers, options);
    return write_resolv_conf_contents(f, content, error);
}

static void
update_resolv_conf_no_stub(NMDnsManager *self,
                           const char *const *searches,
                           const char *const *nameservers,
                           const char *const *options)
{
    gs_free char *content = NULL;
    GError *local = NULL;

    content = create_resolv_conf(searches, nameservers, options);

    if (!g_file_set_contents(NO_STUB_RESOLV_CONF, content, -1, &local)) {
        _LOGD("update-resolv-no-stub: failure to write file: %s", local->message);
        g_error_free(local);
        return;
    }

    _LOGT("update-resolv-no-stub: '%s' successfully written", NO_STUB_RESOLV_CONF);
}

static SpawnResult
update_resolv_conf(NMDnsManager *self,
                   const char *const *searches,
                   const char *const *nameservers,
                   const char *const *options,
                   GError **error,
                   NMDnsManagerResolvConfManager rc_manager)
{
    FILE *f;
    gboolean success;
    gs_free char *content = NULL;
    SpawnResult write_file_result = SR_SUCCESS;
    int errsv;
    gboolean resconf_link_cached = FALSE;
    gs_free char *resconf_link = NULL;

    content = create_resolv_conf(searches, nameservers, options);

    if (rc_manager == NM_DNS_MANAGER_RESOLV_CONF_MAN_FILE
        || (rc_manager == NM_DNS_MANAGER_RESOLV_CONF_MAN_SYMLINK
            && !_read_link_cached(_PATH_RESCONF, &resconf_link_cached, &resconf_link))) {
        gs_free char *rc_path_syml = NULL;
        nm_auto_free char *rc_path_real = NULL;
        const char *rc_path = _PATH_RESCONF;
        GError *local = NULL;

        if (rc_manager == NM_DNS_MANAGER_RESOLV_CONF_MAN_FILE) {
            rc_path_real = realpath(_PATH_RESCONF, NULL);
            if (rc_path_real)
                rc_path = rc_path_real;
            else {
                /* realpath did not resolve a path-name. That either means,
                 * _PATH_RESCONF:
                 * - does not exist
                 * - is a plain file
                 * - is a dangling symlink
                 *
                 * Handle the case, where it is a dangling symlink... */
                rc_path_syml = nm_utils_read_link_absolute(_PATH_RESCONF, NULL);
                if (rc_path_syml)
                    rc_path = rc_path_syml;
            }
        }

        /* we first write to /etc/resolv.conf directly. If that fails,
         * we still continue to write to runstatedir but remember the
         * error. */
        if (!g_file_set_contents(rc_path, content, -1, &local)) {
            _LOGT("update-resolv-conf: write to %s failed (rc-manager=%s, %s)",
                  rc_path,
                  _rc_manager_to_string(rc_manager),
                  local->message);
            g_propagate_error(error, local);
            /* clear @error, so that we don't try reset it. This is the error
             * we want to propagate to the caller. */
            error = NULL;
            write_file_result = SR_ERROR;
        } else {
            _LOGT("update-resolv-conf: write to %s succeeded (rc-manager=%s)",
                  rc_path,
                  _rc_manager_to_string(rc_manager));
        }
    }

    if ((f = fopen(MY_RESOLV_CONF_TMP, "we")) == NULL) {
        errsv = errno;
        g_set_error(error,
                    NM_MANAGER_ERROR,
                    NM_MANAGER_ERROR_FAILED,
                    "Could not open %s: %s",
                    MY_RESOLV_CONF_TMP,
                    nm_strerror_native(errsv));
        _LOGT("update-resolv-conf: open temporary file %s failed (%s)",
              MY_RESOLV_CONF_TMP,
              nm_strerror_native(errsv));
        return SR_ERROR;
    }

    success = write_resolv_conf_contents(f, content, error);
    if (!success) {
        errsv = errno;
        _LOGT("update-resolv-conf: write temporary file %s failed (%s)",
              MY_RESOLV_CONF_TMP,
              nm_strerror_native(errsv));
    }

    if (fclose(f) < 0) {
        if (success) {
            errsv = errno;
            /* only set an error here if write_resolv_conf() was successful,
             * since its error is more important.
             */
            g_set_error(error,
                        NM_MANAGER_ERROR,
                        NM_MANAGER_ERROR_FAILED,
                        "Could not close %s: %s",
                        MY_RESOLV_CONF_TMP,
                        nm_strerror_native(errsv));
            _LOGT("update-resolv-conf: close temporary file %s failed (%s)",
                  MY_RESOLV_CONF_TMP,
                  nm_strerror_native(errsv));
        }
        return SR_ERROR;
    } else if (!success)
        return SR_ERROR;

    if (rename(MY_RESOLV_CONF_TMP, MY_RESOLV_CONF) < 0) {
        errsv = errno;
        g_set_error(error,
                    NM_MANAGER_ERROR,
                    NM_MANAGER_ERROR_FAILED,
                    "Could not replace %s: %s",
                    MY_RESOLV_CONF,
                    nm_strerror_native(errsv));
        _LOGT("update-resolv-conf: failed to rename temporary file %s to %s (%s)",
              MY_RESOLV_CONF_TMP,
              MY_RESOLV_CONF,
              nm_strerror_native(errsv));
        return SR_ERROR;
    }

    if (rc_manager == NM_DNS_MANAGER_RESOLV_CONF_MAN_FILE) {
        _LOGT("update-resolv-conf: write internal file %s succeeded (rc-manager=%s)",
              MY_RESOLV_CONF,
              _rc_manager_to_string(rc_manager));
        return write_file_result;
    }

    if (rc_manager != NM_DNS_MANAGER_RESOLV_CONF_MAN_SYMLINK
        || !_read_link_cached(_PATH_RESCONF, &resconf_link_cached, &resconf_link)) {
        _LOGT("update-resolv-conf: write internal file %s succeeded", MY_RESOLV_CONF);
        return write_file_result;
    }

    if (!nm_streq0(_read_link_cached(_PATH_RESCONF, &resconf_link_cached, &resconf_link),
                   MY_RESOLV_CONF)) {
        _LOGT("update-resolv-conf: write internal file %s succeeded (don't touch symlink %s "
              "linking to %s)",
              MY_RESOLV_CONF,
              _PATH_RESCONF,
              _read_link_cached(_PATH_RESCONF, &resconf_link_cached, &resconf_link));
        return write_file_result;
    }

    /* By this point, /etc/resolv.conf exists and is a symlink to our internal
     * resolv.conf. We update the symlink so that applications get an inotify
     * notification.
     */
    if (unlink(RESOLV_CONF_TMP) != 0 && ((errsv = errno) != ENOENT)) {
        g_set_error(error,
                    NM_MANAGER_ERROR,
                    NM_MANAGER_ERROR_FAILED,
                    "Could not unlink %s: %s",
                    RESOLV_CONF_TMP,
                    nm_strerror_native(errsv));
        _LOGT("update-resolv-conf: write internal file %s succeeded "
              "but cannot delete temporary file %s: %s",
              MY_RESOLV_CONF,
              RESOLV_CONF_TMP,
              nm_strerror_native(errsv));
        return SR_ERROR;
    }

    if (symlink(MY_RESOLV_CONF, RESOLV_CONF_TMP) == -1) {
        errsv = errno;
        g_set_error(error,
                    NM_MANAGER_ERROR,
                    NM_MANAGER_ERROR_FAILED,
                    "Could not create symlink %s pointing to %s: %s",
                    RESOLV_CONF_TMP,
                    MY_RESOLV_CONF,
                    nm_strerror_native(errsv));
        _LOGT("update-resolv-conf: write internal file %s succeeded "
              "but failed to symlink %s: %s",
              MY_RESOLV_CONF,
              RESOLV_CONF_TMP,
              nm_strerror_native(errsv));
        return SR_ERROR;
    }

    if (rename(RESOLV_CONF_TMP, _PATH_RESCONF) == -1) {
        errsv = errno;
        g_set_error(error,
                    NM_MANAGER_ERROR,
                    NM_MANAGER_ERROR_FAILED,
                    "Could not rename %s to %s: %s",
                    RESOLV_CONF_TMP,
                    _PATH_RESCONF,
                    nm_strerror_native(errsv));
        _LOGT("update-resolv-conf: write internal file %s succeeded "
              "but failed to rename temporary symlink %s to %s: %s",
              MY_RESOLV_CONF,
              RESOLV_CONF_TMP,
              _PATH_RESCONF,
              nm_strerror_native(errsv));
        return SR_ERROR;
    }

    _LOGT("update-resolv-conf: write internal file %s succeeded and update symlink %s",
          MY_RESOLV_CONF,
          _PATH_RESCONF);
    return write_file_result;
}
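Boiled down, the branch in update_resolv_conf() that decides what happens to /etc/resolv.conf can be paraphrased roughly as follows. This is my own Python reading of the C above, not NetworkManager code: the function name plan_update and the three returned labels are made up for illustration, and /run/NetworkManager/resolv.conf is assumed to be the value of MY_RESOLV_CONF.

```python
import os

# Assumed value of MY_RESOLV_CONF (NetworkManager's internal copy).
INTERNAL = "/run/NetworkManager/resolv.conf"


def plan_update(rc_manager, path="/etc/resolv.conf", internal=INTERNAL):
    """Return what would happen to `path` for the given rc-manager mode."""
    is_link = os.path.islink(path)

    # rc-manager=file always writes; rc-manager=symlink writes when the
    # path is not a symlink (regular file or missing).
    if rc_manager == "file" or (rc_manager == "symlink" and not is_link):
        return "write-directly"

    # A symlink to the internal file is re-created so that applications
    # watching the path get an inotify notification.
    if rc_manager == "symlink" and os.readlink(path) == internal:
        return "refresh-symlink"

    # A symlink pointing anywhere else is left untouched.
    return "leave-alone"


if __name__ == "__main__":
    print(plan_update("symlink"))
```

A hand-edited /etc/resolv.conf is a regular file, so with rc-manager=symlink it falls into the first branch and is overwritten.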

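Because there are several callers, the exact call flow is still not clear. However, the source above contains calls to the _LOGT logging macro, so can we use the logging configuration to observe NetworkManager's detailed behavior?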

Following the IDE into nm-logging-fwd.h reveals the log-level definitions:

static inline int
nm_log_level_to_syslog(NMLogLevel nm_level)
{
    switch (nm_level) {
    case LOGL_ERR:
        return 3; /* LOG_ERR */
    case LOGL_WARN:
        return 4; /* LOG_WARN */
    case LOGL_INFO:
        return 5; /* LOG_NOTICE */
    case LOGL_DEBUG:
        return 6; /* LOG_INFO */
    case LOGL_TRACE:
        return 7; /* LOG_DEBUG */
    default:
        return 0; /* LOG_EMERG */
    }
}

#define _LOGL_TRACE LOGL_TRACE
#define _LOGL_DEBUG LOGL_DEBUG
#define _LOGL_INFO  LOGL_INFO
#define _LOGL_WARN  LOGL_WARN
#define _LOGL_ERR   LOGL_ERR

/* This is the default definition of _NMLOG_ENABLED(). Special implementations
 * might want to undef this and redefine it. */
#define _NMLOG_ENABLED(level) (nm_logging_enabled((level), (_NMLOG_DOMAIN)))

#define _LOGT(...) _NMLOG(_LOGL_TRACE, __VA_ARGS__)
#define _LOGD(...) _NMLOG(_LOGL_DEBUG, __VA_ARGS__)
#define _LOGI(...) _NMLOG(_LOGL_INFO, __VA_ARGS__)
#define _LOGW(...) _NMLOG(_LOGL_WARN, __VA_ARGS__)
#define _LOGE(...) _NMLOG(_LOGL_ERR, __VA_ARGS__)

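So the most verbose level is TRACE. To observe NetworkManager's detailed output, add the following to /etc/NetworkManager/conf.d/99-logging.conf. The file does not exist by default, so just create it; it is loaded when NetworkManager restarts.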

[logging]
# Levels, from least to most verbose: OFF, ERR, WARN, INFO, DEBUG, TRACE
level=TRACE
# Log all domains, or name specific ones (for example "DHCP4,DHCP6")
domains=ALL

Open two shell sessions and run the commands below, one in each. journalctl then shows the detailed log output; capture it and work through the steps:

journalctl -fu NetworkManager

systemctl restart NetworkManager

The TRACE log is very large, so I won't reproduce it in full. Searching it for /etc/resolv.conf turned up the following:

Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <debug> [1752340261.9116] dns-mgr: (device_l3cd_changed): queueing DNS updates (1)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <info> [1752340261.9116] policy: set 'ens33' (ens33) as default for IPv4 routing and DNS
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <debug> [1752340261.9116] manager: PrimaryConnection now ens33
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: set-hostname: updating hostname (ip conf)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: get-hostname: "localhost" (from dbus)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: device hostname info:
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: - prio: 100 ipv4 (def) dhcp dns dev:ens33
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: - prio: 100 ipv6 dhcp dns dev:ens33
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9117] policy: get-hostname: "localhost" (from dbus)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <info> [1752340261.9117] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <debug> [1752340261.9118] dns-mgr: (device_l3cd_changed): DNS configuration changed
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <debug> [1752340261.9118] dns-mgr: (device_l3cd_changed): committing DNS changes (0)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <debug> [1752340261.9118] dns-mgr: update-dns: updating resolv.conf
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9118] dns-mgr: config: 100 best v4 2 : 8.8.8.8
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9118] dns-mgr: config: 100 default v6 2 :
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9118] dns-mgr: plugin: add domain <auto-default> (i=2, p=100)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9118] dns-mgr: plugin: settings: ifindex=2, priority=100, default-route=1, search=, reverse=0.168.192.in-addr.arpa
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9119] dns-mgr: update-resolv-no-stub: '/run/NetworkManager/no-stub-resolv.conf' successfully written
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9148] dns-mgr: update-resolv-conf: write to /etc/resolv.conf succeeded (rc-manager=symlink)
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9150] dns-mgr: update-resolv-conf: write internal file /run/NetworkManager/resolv.conf succeeded
Jul 13 01:11:01 localhost.localdomain NetworkManager[275984]: <trace> [1752340261.9151] dns-mgr: current configuration: [{'nameservers': <['8.8.8.8']>, 'interface': <'ens33'>, 'priority': <100>, 'vpn': <false>}]
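With this much output, it is easier to filter the dns-mgr lines out of a saved journal dump than to read everything. Here is a small sketch of such a filter; the function name dns_mgr_events and the regular expression are my own, written against the line format shown above, and may need adjusting for other NetworkManager versions.

```python
import re

# Matches e.g. "<trace> [1752340261.9148] dns-mgr: update-resolv-conf: ..."
# The optional "[0x...]" covers the pointer suffix some versions print.
_DNS_MGR = re.compile(
    r"<(\w+)>\s+\[\d+\.\d+\]\s+dns-mgr(?:\[0x[0-9a-f]+\])?: (.*)$"
)


def dns_mgr_events(lines):
    """Return (level, message) pairs for every dns-mgr log line."""
    events = []
    for line in lines:
        match = _DNS_MGR.search(line.rstrip("\n"))
        if match:
            events.append((match.group(1), match.group(2)))
    return events


if __name__ == "__main__":
    import sys
    # Usage idea: journalctl -u NetworkManager | python3 this_script.py
    for level, message in dns_mgr_events(sys.stdin):
        print(level, message)
```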

So the write does come from update_resolv_conf. Combining the log with the source code, the IDE lets us trace the call chain:

nm_policy_class_init() -> constructed() -> device_added() -> devices_list_register() -> device_l3cd_changed() -> nm_dns_manager_set_ip_config() -> update_dns() -> update_resolv_conf_no_stub() and update_resolv_conf() -> create_resolv_conf() -> write_resolv_conf_contents()

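Here nm_policy_class_init() is the class initialization function in the GObject framework, and its invocation follows GObject's type registration mechanism. Specifically, the type system calls it automatically when the NMPolicy class is first initialized, rather than through an explicit function call. Following the IDE, nm-policy.c contains this line: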

G_DEFINE_TYPE(NMPolicy, nm_policy, G_TYPE_OBJECT)

This macro expands to the type registration code, which is what ultimately leads to nm_policy_class_init() being called.

NetworkManager is built with Meson. The top-level meson.build contains subdir('src'), and src/meson.build in turn contains subdir('core'). After running meson build, the build.ninja file generated in the build directory describes the build steps.

build src/core/libNetworkManager.a.p/nm-policy.c.o: c_COMPILER ../src/core/nm-policy.c || src/libnm-core-public/nm-core-enum-types.h
DEPFILE = src/core/libNetworkManager.a.p/nm-policy.c.o.d
DEPFILE_UNQUOTED = src/core/libNetworkManager.a.p/nm-policy.c.o.d
ARGS = -Isrc/core/libNetworkManager.a.p -Isrc/core -I../src/core -Isrc/libnm-core-public -I../src/libnm-core-public -Isrc -I../src -I. -I.. -I/usr/include/gio-unix-2.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/sysprof-4 -I/usr/include/libmount -I/usr/include/blkid -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -Wcast-align=strict -Wdeclaration-after-statement -Wfloat-equal -Wformat-nonliteral -Wformat-security -Wimplicit-function-declaration -Wimplicit-int -Winit-self -Wint-conversion -Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wold-style-definition -Wpointer-arith -Wshadow -Wshift-negative-value -Wstrict-prototypes -Wundef -Wvla -Wno-duplicate-decl-specifier -Wno-format-truncation -Wno-format-y2k -Wno-missing-field-initializers -Wno-pragmas -Wno-sign-compare -Wno-unknown-pragmas -Wno-unused-parameter -fno-strict-aliasing -Wimplicit-fallthrough -fPIC -pthread -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_42 -DGLIB_VERSION_MAX_ALLOWED=GLIB_VERSION_2_42

How NetworkManager manages DNS depends on its own configuration. There are three modes: default, systemd-resolved, and dnsmasq. From what I have seen, CentOS 7 and Rocky 9 usually use default, while Ubuntu 20.04 uses systemd-resolved.

# Log output on Ubuntu 20.04
Jul 13 02:51:24 misaka NetworkManager[815]: <info> [1752346284.4650] dns-mgr[0x55eb10a9f290]: init: dns=systemd-resolved rc-manager=symlink, plugin=systemd-resolved

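systemd-resolved is part of the systemd suite and handles DNS resolution and other network name resolution tasks. It provides DNS caching, resolution across multiple DNS servers, DNS-over-TLS, and more. On Ubuntu systems that use systemd-resolved, /etc/resolv.conf is usually a symlink to /run/systemd/resolve/stub-resolv.conf. All DNS queries are forwarded to systemd-resolved's local stub (127.0.0.53), and systemd-resolved then performs the actual resolution according to the configuration in /run/systemd/resolve/resolv.conf.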

That covers the basic flow NetworkManager goes through for DNS configuration each time it restarts.

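With that, the fix for the problem from the beginning of this article, DNS reverting to the initial network configuration on every NetworkManager restart, is simple. Either of the following works: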

  • Comment out the dns setting in /etc/NetworkManager/system-connections/ens33.nmconnection, then add the desired nameserver to /etc/resolv.conf.

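  • Change the dns setting in /etc/NetworkManager/system-connections/ens33.nmconnection to the desired servers. If there is more than one, remember to separate them with semicolons (for example: dns=8.8.8.8;114.114.114.114;).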
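To double-check the semicolon-separated value after editing, the file can be read with an INI parser. The sketch below assumes the dns key sits in the [ipv4] section; the sample text is a hypothetical, cut-down connection profile written for this example, not a copy of the real ens33.nmconnection, and the function name ipv4_dns_servers is my own.

```python
import configparser

# Hypothetical, minimal profile used only for illustration.
SAMPLE_KEYFILE = """\
[connection]
id=ens33
type=ethernet

[ipv4]
method=manual
dns=8.8.8.8;114.114.114.114;
"""


def ipv4_dns_servers(keyfile_text):
    """Return the DNS servers listed under [ipv4] dns= as a list."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(keyfile_text)
    raw = parser.get("ipv4", "dns", fallback="")
    # The trailing semicolon yields an empty last element; drop it.
    return [server for server in raw.split(";") if server]


if __name__ == "__main__":
    print(ipv4_dns_servers(SAMPLE_KEYFILE))
```

If the list comes back with a single long entry, the separators are probably commas or spaces instead of semicolons.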

Building NetworkManager

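Before starting the investigation I expected to need a debugger or to add some logging myself, but the project's TRACE logging turned out to be very detailed, which is something worth learning from. To let the IDE index the code smoothly, I also learned how to build NetworkManager. I tried building on Rocky 9 and Ubuntu 20.04; here are my notes.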

Configuring with Meson

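NetworkManager is built with Meson, an open-source build system designed to be extremely fast and, more importantly, as user-friendly as possible. Many projects use it today, such as GNOME and systemd.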

Meson website: https://mesonbuild.com/index.html
GitHub repository: https://github.com/mesonbuild/meson

Compiling with Ninja

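Meson and Ninja are usually used together: Meson works out the project's configuration and dependencies and generates the build files, and Ninja compiles the code. Ninja is a small build system that focuses on build speed. It uses .ninja files to define build rules; the syntax is terse, and the files are usually generated by tools such as Meson.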

Ninja website: https://ninja-build.org/
GitHub repository: https://github.com/ninja-build/ninja

Rocky 9.6

1. Clone the project

yum install -y vim tree curl wget git

git clone -b nm-1-52 https://gitlab.freedesktop.org/NetworkManager/NetworkManager.git

2. Open the Rocky devel repo file and change enabled to 1; otherwise some devel packages cannot be found.

vim /etc/yum.repos.d/rocky-devel.repo

[rocky-devel]
name=Rocky Linux $releasever - Devel
baseurl=https://mirrors.rockylinux.org/$contentdir/$releasever/devel/$basearch/os/
# Set to 1 to enable this repo
enabled=1
gpgcheck=1
gpgkey=/etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

yum makecache

3. Install Meson and Ninja

yum install python3-pip

pip3 config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

pip3 install ninja meson

ln -s /usr/local/bin/meson /usr/bin/meson

ln -s /usr/local/bin/ninja /usr/bin/ninja

4. Install the dependencies NetworkManager needs. How do you find out what they are? Search the project's meson.build files for dependency() calls.
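The search can also be scripted. The following sketch pulls the names out of dependency('...') calls with a regular expression; the function name list_dependencies is my own, and the sample text is made up for illustration rather than copied from NetworkManager's meson.build. It only catches names written as single-quoted string literals.

```python
import re

# Captures the first single-quoted argument of each dependency(...) call.
_DEPENDENCY = re.compile(r"dependency\(\s*'([^']+)'")


def list_dependencies(meson_text):
    """Return the sorted, de-duplicated names passed to dependency()."""
    return sorted(set(_DEPENDENCY.findall(meson_text)))


if __name__ == "__main__":
    import sys
    # Usage idea: python3 this_script.py meson.build
    with open(sys.argv[1], encoding="utf-8") as handle:
        for name in list_dependencies(handle.read()):
            print(name)
```

The printed names are pkg-config module names, so they still have to be mapped to the distribution's -devel packages by hand.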

yum install -y cmake uuid libuuid-devel libudev-devel dbus-devel glib2-devel libndp-devel gobject-introspection-devel libaudit-devel audit-libs-devel polkit-devel gnutls-devel nss-devel nspr-devel ppp-devel ModemManager-glib-devel mobile-broadband-provider-info-devel jansson-devel libpsl-devel libcurl-devel readline-devel libedit-devel newt-devel 

5. Run the Meson configuration and the Ninja build

cd NetworkManager && meson build

cd build && ninja

ninja shows the build progress and runs quickly:

[649/988] Compiling C object src/core/libNetworkManager.a.p/devices_nm-device-ip-tunnel.c.o

Ubuntu 20.04

1. Clone the project

apt install -y vim tree curl wget git

git clone -b nm-1-52 https://gitlab.freedesktop.org/NetworkManager/NetworkManager.git

2. Install Meson and Ninja

python3 -m pip install --upgrade pip

ln -s /usr/local/bin/pip3 /usr/bin/pip

pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple


pip3 install ninja meson

3. Install the dependencies NetworkManager needs. The default CMake version on 20.04 is somewhat old for the latest Meson, so first add the Kitware APT repository to get a newer CMake.

wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ focal main' | sudo tee /etc/apt/sources.list.d/kitware.list >/dev/null
apt-get update
apt-get install kitware-archive-keyring
rm /usr/share/keyrings/kitware-archive-keyring.gpg
apt-get install cmake

apt install -y build-essential libdbus-glib-1-dev libglib2.0-dev libnm-dev libssl-dev libxml2-dev libreadline-dev gettext autogen autoconf automake libtool libevdev-dev libsystemd-dev libjson-glib-dev libunistring-dev check valgrind swig libndp-dev libgirepository1.0-dev gobject-introspection gir1.2-glib-2.0 libaudit-dev libpolkit-gobject-1-dev libgnutls28-dev libnss3-dev libnspr4-dev gnutls-bin ppp-dev libmm-glib-dev dhcpcd5 libjansson-dev libpsl-dev libcurl4-openssl-dev libnewt-dev xsltproc uuid
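
To confirm that the Kitware repository actually replaced the stock CMake, the installed version can be compared against a minimum. This is a sketch that assumes GNU sort (for -V). version_ge is a hypothetical helper, and the 3.17 threshold is only a placeholder, so substitute whatever version the Meson error message asks for.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
version_ge() {
    # The smaller of the two sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=3.17   # placeholder threshold, not taken from Meson's documentation
if command -v cmake >/dev/null 2>&1; then
    # First line of the output looks like "cmake version X.Y.Z".
    installed=$(cmake --version | awk 'NR == 1 { print $3 }')
    if version_ge "$installed" "$required"; then
        echo "cmake $installed is new enough"
    else
        echo "cmake $installed is older than $required"
    fi
else
    echo "cmake is not installed"
fi
```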

4. Run the Meson setup and the Ninja build

cd NetworkManager && meson build

cd build && ninja

Build and compilation complete

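1. Once meson build above succeeds, it prints the following summary, which means you can move on to the ninja stage. If it reports errors, resolve them according to the messages.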

Message: 
System paths:
prefix: /usr/local
exec_prefix: /usr/local
systemdunitdir: /usr/local/lib/systemd/system
udev_dir: /usr/local/lib/udev
nmbinary: /usr/local/sbin/NetworkManager
nmconfdir: /usr/local/etc/NetworkManager
nmlibdir: /usr/local/lib/NetworkManager
nmdatadir: /usr/local/share/NetworkManager
nmstatedir: /var/local/lib/NetworkManager
nmrundir: /var/local/run/NetworkManager
nmvpndir: /usr/local/lib/x86_64-linux-gnu/NetworkManager
nmplugindir: /usr/local/lib/x86_64-linux-gnu/NetworkManager/1.52.1
system_ca_path: /etc/ssl/certs
dbus_conf_dir: /usr/local/share/dbus-1/system.d

Platform:
session tracking: systemd-logind,consolekit
suspend/resume: systemd
policykit: true (default: true) (restrictive modify.system)
polkit-agent-helper-1: /usr/lib/policykit-1/polkit-agent-helper-1
selinux: true
systemd-journald: true (default: logging.backend=journal)
hostname persist: default
libaudit: true (default: logging.audit=true)

Features:
wext: true
wifi: true
iwd: false
pppd: true /usr/sbin/pppd plugins:/usr/local/lib/x86_64-linux-gnu/pppd/2.4.9
jansson: yes (soname: libjansson.so.4)
iptables: "/usr/sbin/iptables"
ip6tables: "/usr/sbin/ip6tables"
nft: "/usr/sbin/nft"
modprobe: "/usr/sbin/modprobe"
modemmanager-1: true
mobile-broadband-provider-info-database: /usr/share/mobile-broadband-provider-info/serviceproviders.xml
ofono: false
concheck: true
libteamdctl: false
ovs: true
nmcli: true
nmtui: true
nm-cloud-setup: true

Configuration_plugins (main.plugins=)
ifcfg-rh: false (deprecated)
default value of main.migrate-ifcfg-rh: false
ifupdown: true

Handlers for /etc/resolv.conf:
resolvconf: true /usr/sbin/resolvconf
netconfig: false

config-dns-rc-manager-default: auto

DHCP clients (default internal):
dhcpcd: true /usr/sbin/dhcpcd
dhclient: false (deprecated)


Miscellaneous:
have introspection: true
build documentation and manpages: false
firewalld zone for shared mode: true
tests: yes
more-asserts: 0
more-logging: true
warning-level: 2
valgrind: false
code coverage: false
LTO: false
Linker garbage collection: true
crypto: nss (have-gnutls: true, have-nss: true)
sanitizers: none
Mozilla Public Suffix List: true
vapi: false
ebpf: false
readline: libreadline

Build targets in project: 369

Found ninja-1.11.1.git.kitware.jobserver-1 at /usr/local/bin/ninja
WARNING: Running the setup command as `meson [options]` instead of `meson setup [options]` is ambiguous and deprecated.
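
The warning on the last line refers to the bare meson build invocation. Below is a sketch of the explicit form, assuming it is run from the root of the NetworkManager checkout. configure_and_build is a hypothetical wrapper that refuses to run when meson or meson.build is missing.

```shell
# Hypothetical wrapper around the explicit Meson subcommands.
configure_and_build() {
    if ! command -v meson >/dev/null 2>&1; then
        echo "meson not found on PATH" >&2
        return 1
    fi
    if [ ! -f meson.build ]; then
        echo "no meson.build in $(pwd); run this from the source root" >&2
        return 1
    fi
    # "meson setup" is the unambiguous spelling of "meson build";
    # "meson compile -C build" drives the configured backend (ninja here).
    meson setup build && meson compile -C build
}
```

Call configure_and_build from inside the NetworkManager directory; the result is the same build tree as the two commands used in this article.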

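2. When ninja finishes, the progress line disappears; just go and look at the files directly. The build directory and its sizes after compiling NetworkManager: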

du -sh ./* | sort -h -r
1.2G ./src
18M ./introspection
8.9M ./po
2.9M ./meson-private
1.8M ./build.ninja
1.2M ./meson-info
1012K ./compile_commands.json
664K ./examples
200K ./data
124K ./meson-logs
8.0K ./meson-uninstalled
8.0K ./config.h
4.0K ./config-extra.h

3. The NetworkManager build artifacts are scattered across the subdirectories under src. If you need to install them, run ninja install.
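
Since the outputs are spread over many subdirectories, a quick way to see them is to list the executable files under src. This is a sketch; find_built_binaries is a hypothetical helper, and the demo builds a throwaway directory tree rather than assuming a finished build.

```shell
# Hypothetical helper: list regular files with the owner-execute bit set.
find_built_binaries() {
    find "$1" -type f -perm -100 | sort
}

# Throwaway layout for the demo; the file names are placeholders.
demo=$(mktemp -d)
mkdir -p "$demo/src/core" "$demo/src/nmcli"
touch "$demo/src/core/NetworkManager" "$demo/src/nmcli/nmcli" "$demo/src/core/notes.txt"
chmod +x "$demo/src/core/NetworkManager" "$demo/src/nmcli/nmcli"
find_built_binaries "$demo/src"
rm -rf "$demo"
```

In the real build directory, find_built_binaries src does the same. Shared libraries usually carry the execute bit too, so they show up in the list. For a staged install, setting DESTDIR before ninja install places the files under that directory instead of the live prefix.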

That's the end of this study session ☺️

References

1. https://apt.kitware.com/
2. https://people.freedesktop.org/~lkundrak/nm-docs/NetworkManager.conf.html
3. https://www.alibabacloud.com/help/zh/alinux/user-guide/networkmanager-configuration-files-and-common-configurations