[comment]: # ({2c9c6c2c-75dc9168})
# 9 Notes on selecting processes in proc.mem and proc.num items

[comment]: # ({/2c9c6c2c-75dc9168})

[comment]: # ({db08038a-93778d6f})
#### Processes modifying their commandline

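Some programs modify their command line as a way of displaying their
current activity. A user can see the activity by running the `ps` and
`top` commands. Examples of such programs include *PostgreSQL*,
*Sendmail* and *Zabbix*.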

Let's look at an example from Linux. Suppose we want to monitor the
number of Zabbix agent processes.

The `ps` command shows the processes of interest as follows:

    $ ps -fu zabbix
    UID        PID  PPID  C STIME TTY          TIME CMD
    ...
    zabbix    6318     1  0 12:01 ?        00:00:00 sbin/zabbix_agentd -c /home/zabbix/ZBXNEXT-1078/zabbix_agentd.conf
    zabbix    6319  6318  0 12:01 ?        00:00:01 sbin/zabbix_agentd: collector [idle 1 sec]                          
    zabbix    6320  6318  0 12:01 ?        00:00:00 sbin/zabbix_agentd: listener #1 [waiting for connection]            
    zabbix    6321  6318  0 12:01 ?        00:00:00 sbin/zabbix_agentd: listener #2 [waiting for connection]            
    zabbix    6322  6318  0 12:01 ?        00:00:00 sbin/zabbix_agentd: listener #3 [waiting for connection]            
    zabbix    6323  6318  0 12:01 ?        00:00:00 sbin/zabbix_agentd: active checks #1 [idle 1 sec]                   
    ...

Selecting processes by name and user does the job:

    $ zabbix_get -s localhost -k 'proc.num[zabbix_agentd,zabbix]'
    6

Now let's rename the `zabbix_agentd` executable to `zabbix_agentd_30` and
restart it.

`ps` now shows:

    $ ps -fu zabbix
    UID        PID  PPID  C STIME TTY          TIME CMD
    ...
    zabbix    6715     1  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30 -c /home/zabbix/ZBXNEXT-1078/zabbix_agentd.conf
    zabbix    6716  6715  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30: collector [idle 1 sec]                          
    zabbix    6717  6715  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30: listener #1 [waiting for connection]            
    zabbix    6718  6715  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30: listener #2 [waiting for connection]            
    zabbix    6719  6715  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30: listener #3 [waiting for connection]            
    zabbix    6720  6715  0 12:53 ?        00:00:00 sbin/zabbix_agentd_30: active checks #1 [idle 1 sec]                   
    ...

Now selecting processes by name and user produces an incorrect result:

    $ zabbix_get -s localhost -k 'proc.num[zabbix_agentd_30,zabbix]'
    1

Why does simply renaming the executable to a longer name lead to a
completely different result?

The Zabbix agent starts by checking the process name: it opens the
`/proc/<pid>/status` file and checks the `Name` line. In our case the
`Name` lines are:

    $ grep Name /proc/{6715,6716,6717,6718,6719,6720}/status
    /proc/6715/status:Name:   zabbix_agentd_3
    /proc/6716/status:Name:   zabbix_agentd_3
    /proc/6717/status:Name:   zabbix_agentd_3
    /proc/6718/status:Name:   zabbix_agentd_3
    /proc/6719/status:Name:   zabbix_agentd_3
    /proc/6720/status:Name:   zabbix_agentd_3

The process name in the `status` file is truncated to 15 characters.

The `ps` command shows a similar result:

    $ ps -u zabbix
      PID TTY          TIME CMD
    ...
     6715 ?        00:00:00 zabbix_agentd_3
     6716 ?        00:00:01 zabbix_agentd_3
     6717 ?        00:00:00 zabbix_agentd_3
     6718 ?        00:00:00 zabbix_agentd_3
     6719 ?        00:00:00 zabbix_agentd_3
     6720 ?        00:00:00 zabbix_agentd_3
     ...
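Because of this limit, a `name` value longer than 15 characters can never be equal to the `Name` line. The following POSIX shell sketch checks a value before it is put into an item. The function names `status_name` and `fits_status_name` are hypothetical. The sketch assumes the 15-character truncation shown above and counts bytes, which is the same as counting characters for ASCII names.

```shell
#!/bin/sh
# Hypothetical helper: print what the Name line of /proc/<pid>/status would
# show for an executable name, assuming truncation to 15 characters.
status_name() {
    printf '%.15s\n' "$1"
}

# Hypothetical helper: succeed only if a proc.num[]/proc.mem[] name value
# is short enough to survive the truncation unchanged.
fits_status_name() {
    [ "$(status_name "$1")" = "$1" ]
}

for exe in zabbix_agentd zabbix_agentd_30; do
    if fits_status_name "$exe"; then
        echo "$exe: fits, Name shows $(status_name "$exe")"
    else
        echo "$exe: too long, Name shows $(status_name "$exe")"
    fi
done
```

For the two executables in this example, the script prints `zabbix_agentd: fits, Name shows zabbix_agentd` and `zabbix_agentd_30: too long, Name shows zabbix_agentd_3`.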

Obviously, that is not equal to our `proc.num[]` `name` parameter value
`zabbix_agentd_30`. Having failed to match the process name from the
`status` file, the Zabbix agent turns to the `/proc/<pid>/cmdline` file.

How the agent sees the `cmdline` file can be illustrated by running the
following command:

    $ for i in 6715 6716 6717 6718 6719 6720; do cat /proc/$i/cmdline | awk '{gsub(/\x0/,"<NUL>"); print};'; done
    sbin/zabbix_agentd_30<NUL>-c<NUL>/home/zabbix/ZBXNEXT-1078/zabbix_agentd.conf<NUL>
    sbin/zabbix_agentd_30: collector [idle 1 sec]<NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL>...
    sbin/zabbix_agentd_30: listener #1 [waiting for connection]<NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL>...
    sbin/zabbix_agentd_30: listener #2 [waiting for connection]<NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL>...
    sbin/zabbix_agentd_30: listener #3 [waiting for connection]<NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL>...
    sbin/zabbix_agentd_30: active checks #1 [idle 1 sec]<NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL><NUL>...

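In our case the `/proc/<pid>/cmdline` files contain invisible,
non-printable null bytes, which are used to terminate strings in the *C*
language. In this example the null bytes are shown as `<NUL>`.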
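Splitting a `cmdline` file at its null bytes shows the individual strings it holds. In the output above, the main process still has its original arguments. Each process that rewrote its command line keeps the whole activity text in the first string, followed by empty padding. Below is a POSIX shell sketch; the function name `cmdline_args` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each NUL-terminated string of a cmdline file
# on its own line, in brackets, so that empty strings stay visible.
cmdline_args() {
    tr '\0' '\n' < "$1" | while IFS= read -r arg || [ -n "$arg" ]; do
        printf '[%s]\n' "$arg"
    done
}

# Live use, e.g.:  cmdline_args /proc/6716/cmdline
# Offline demo with a sample modelled on the collector process:
sample=$(mktemp)
printf 'sbin/zabbix_agentd_30: collector [idle 1 sec]\0\0\0' > "$sample"
cmdline_args "$sample"
rm -f "$sample"
```

For the sample, the script prints `[sbin/zabbix_agentd_30: collector [idle 1 sec]]` followed by two `[]` lines.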

The Zabbix agent checks `cmdline` for the main process and takes
`zabbix_agentd_30`, which matches our `name` parameter value
`zabbix_agentd_30`. So the main process is counted by the item
`proc.num[zabbix_agentd_30,zabbix]`.

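When checking the next process, the agent takes
`zabbix_agentd_30: collector [idle 1 sec]` from the `cmdline` file, which
does not match our `name` parameter `zabbix_agentd_30`. So only the main
process, which does not modify its command line, gets counted. The other
agent processes modify their command lines and are ignored.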
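The selection steps described above can be modelled in a few lines. This is a simplified sketch of the observed behaviour, not the agent's source code. It first compares a `name` value with the `Name` line of a `status` file. If that fails, it compares the value with the first null-terminated string of a `cmdline` file, with the directory part removed. The function name `name_matches` and the sample files are hypothetical, modelled on processes 6715 and 6716.

```shell
#!/bin/sh
# Simplified, hypothetical model of the name check described above.
# Usage: name_matches NAME STATUS_FILE CMDLINE_FILE
name_matches() {
    # 1. Name line of the status file (already truncated to 15 characters).
    name_line=$(sed -n 's/^Name:[[:space:]]*//p' "$2")
    [ "$name_line" = "$1" ] && return 0

    # 2. Fallback: first NUL-terminated string of cmdline, directory removed.
    argv0=$(tr '\0' '\n' < "$3" | head -n 1)
    [ "${argv0##*/}" = "$1" ]
}

dir=$(mktemp -d)
printf 'Name:\tzabbix_agentd_3\n' > "$dir/status"
printf 'sbin/zabbix_agentd_30\0-c\0/home/zabbix/ZBXNEXT-1078/zabbix_agentd.conf\0' > "$dir/main.cmdline"
printf 'sbin/zabbix_agentd_30: collector [idle 1 sec]\0\0\0' > "$dir/collector.cmdline"

for p in main collector; do
    if name_matches zabbix_agentd_30 "$dir/status" "$dir/$p.cmdline"; then
        echo "$p: counted"
    else
        echo "$p: ignored"
    fi
done
rm -r "$dir"
```

With the sample data, the script prints `main: counted` and `collector: ignored`. This mirrors the result of 1 returned by the item above.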

This example shows that in this case the `name` parameter cannot be used
in `proc.mem[]` and `proc.num[]` for selecting processes.

Using the `cmdline` parameter with a proper regular expression produces
a correct result:

    $ zabbix_get -s localhost -k 'proc.num[,zabbix,,zabbix_agentd_30[ :]]'
    6
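The same regular expression can be tried against `ps` output before the item is created. For the main process, `zabbix_agentd_30` is followed by a null byte rather than a space or colon. The count of 6 therefore suggests that the agent applies the expression to the arguments joined by spaces, which is also how `ps -o args=` prints them. Treat this as an assumption. The sketch below uses the hypothetical function name `count_cmdline_matches`. For reproducibility, it uses command lines copied from the `ps` output above.

```shell
#!/bin/sh
# Hypothetical helper: count the input lines (one command line per line)
# that match an extended regular expression. grep -c prints 0 when nothing
# matches but exits with status 1, so that status is ignored here.
count_cmdline_matches() {
    grep -cE -- "$1" || true
}

# Live check, e.g.:
#   ps -u zabbix -o args= | count_cmdline_matches 'zabbix_agentd_30[ :]'
# Offline demo with the command lines from the ps output above (prints 6):
printf '%s\n' \
    'sbin/zabbix_agentd_30 -c /home/zabbix/ZBXNEXT-1078/zabbix_agentd.conf' \
    'sbin/zabbix_agentd_30: collector [idle 1 sec]' \
    'sbin/zabbix_agentd_30: listener #1 [waiting for connection]' \
    'sbin/zabbix_agentd_30: listener #2 [waiting for connection]' \
    'sbin/zabbix_agentd_30: listener #3 [waiting for connection]' \
    'sbin/zabbix_agentd_30: active checks #1 [idle 1 sec]' \
    | count_cmdline_matches 'zabbix_agentd_30[ :]'
```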

Be careful when using `proc.mem[]` and `proc.num[]` items for monitoring
programs that modify their command lines.

Before putting `name` and `cmdline` parameters into `proc.mem[]` and
`proc.num[]` items, you may want to test the parameters using the
`proc.num[]` item and the `ps` command.

[comment]: # ({/db08038a-93778d6f})

[comment]: # ({603d0b05-0ff11097})
#### Linux kernel threads

[comment]: # ({/603d0b05-0ff11097})

[comment]: # ({10e64d00-3588d8df})
##### Threads cannot be selected with the `cmdline` parameter in `proc.mem[]` and `proc.num[]` items

Let's take one of the kernel threads as an example:

    $ ps -ef | grep kthreadd
    root         2     0  0 09:33 ?        00:00:00 [kthreadd]

It can be selected with the process `name` parameter:

    $ zabbix_get -s localhost -k 'proc.num[kthreadd,root]'
    1

But selection by the process `cmdline` parameter does not work:

    $ zabbix_get -s localhost -k 'proc.num[,root,,kthreadd]'
    0

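The reason is that the Zabbix agent takes the regular expression
specified in the `cmdline` parameter and applies it to the contents of
the process's `/proc/<pid>/cmdline` file. For kernel threads the
`/proc/<pid>/cmdline` file is empty, so the `cmdline` parameter never
matches.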
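Processes that a `cmdline` parameter can never select can be found by looking for empty `cmdline` files. Below is a POSIX shell sketch with the hypothetical function name `cmdline_is_empty`. An empty `cmdline` is only a heuristic for a kernel thread, since zombie processes also have one.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a cmdline file contains no bytes at all.
# /proc files report a size of 0, so "test -s" cannot be used; read one byte
# instead, mapping a null byte to "x" so that it is not lost in $(...).
cmdline_is_empty() {
    [ -z "$(head -c 1 -- "$1" 2>/dev/null | tr '\0' x)" ]
}

# List PIDs with an empty cmdline together with their Name line.
for dir in /proc/[0-9]*; do
    [ -r "$dir/cmdline" ] || continue
    cmdline_is_empty "$dir/cmdline" || continue
    name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
    echo "${dir#/proc/} $name"
done
```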

[comment]: # ({/10e64d00-3588d8df})

[comment]: # ({e585a6e5-8d1ef0a6})
##### Counting of threads in `proc.mem[]` and `proc.num[]` items

Linux kernel threads are counted by the `proc.num[]` item but do not
report memory in the `proc.mem[]` item. For example:

    $ ps -ef | grep kthreadd
    root         2     0  0 09:51 ?        00:00:00 [kthreadd]

    $ zabbix_get -s localhost -k 'proc.num[kthreadd]'
    1

    $ zabbix_get -s localhost -k 'proc.mem[kthreadd]'
    ZBX_NOTSUPPORTED: Cannot get amount of "VmSize" memory.
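The error message names `VmSize`, which suggests that the value is read from that line of `/proc/<pid>/status`. Kernel threads have no user-space memory, and their `status` files contain no `VmSize` line at all. Below is a POSIX shell sketch with the hypothetical function name `vmsize_kb`. The sample status files are made up for the demo.

```shell
#!/bin/sh
# Hypothetical helper: print the VmSize value (kB) from a status file, or
# fail without output if the file has no VmSize line (as for kernel threads).
vmsize_kb() {
    awk '/^VmSize:/ { print $2; found = 1 } END { exit (found ? 0 : 1) }' "$1"
}

# Live check, e.g. for kthreadd:  vmsize_kb /proc/2/status || echo "no VmSize"
# Offline demo with made-up status files:
dir=$(mktemp -d)
printf 'Name:\tkthreadd\nState:\tS (sleeping)\n' > "$dir/kernel"
printf 'Name:\tkthreadd\nVmSize:\t    4060 kB\n' > "$dir/user"
for f in kernel user; do
    vmsize_kb "$dir/$f" || echo "$f: no VmSize line"
done
rm -r "$dir"
```

The demo prints `kernel: no VmSize line` and then `4060`.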

But what happens if there is a user process with the same name as a
kernel thread? Then it could look like this:

    $ ps -ef | grep kthreadd
    root         2     0  0 09:51 ?        00:00:00 [kthreadd]
    zabbix    9611  6133  0 17:58 pts/1    00:00:00 ./kthreadd

    $ zabbix_get -s localhost -k 'proc.num[kthreadd]'
    2

    $ zabbix_get -s localhost -k 'proc.mem[kthreadd]'
    4157440

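`proc.num[]` counted both the kernel thread and the user process.
`proc.mem[]` reports memory for the user process only and counts the
kernel thread memory as if it were 0. This is different from the case
above, where ZBX\_NOTSUPPORTED was reported.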
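Both results fit a simple model of the default `sum` mode of `proc.mem[]`:
- add up `VmSize` over all matching processes;
- let a process without that line contribute 0;
- report an error only when no matching process has the line.

This is a sketch of the observed behaviour under that assumption, not the agent's code. The function name `sum_vmsize` and the sample values are hypothetical (4060 kB × 1024 = 4157440 bytes, the figure shown above).

```shell
#!/bin/sh
# Hypothetical model of proc.mem[] in its default "sum" mode: add up VmSize
# (kB, converted to bytes) over the given status files, treating a missing
# VmSize line as 0. "ZBX_NOTSUPPORTED" stands in for the agent's error and is
# printed only if none of the files has a VmSize line.
sum_vmsize() {
    awk '/^VmSize:/ { total += $2 * 1024; found = 1 }
         END {
             if (!found) { print "ZBX_NOTSUPPORTED"; exit 1 }
             printf "%d\n", total
         }' "$@"
}

dir=$(mktemp -d)
printf 'Name:\tkthreadd\n' > "$dir/kernel"                     # kernel thread
printf 'Name:\tkthreadd\nVmSize:\t    4060 kB\n' > "$dir/user"  # user process
sum_vmsize "$dir/kernel" || true        # kernel thread only
sum_vmsize "$dir/kernel" "$dir/user"    # kernel thread and user process
rm -r "$dir"
```

The script prints `ZBX_NOTSUPPORTED` for the kernel thread alone and `4157440` for the pair.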

Be careful when using `proc.mem[]` and `proc.num[]` items if the program
name happens to match one of the threads.

Before putting parameters into `proc.mem[]` and `proc.num[]` items, you
may want to test the parameters using the `proc.num[]` item and the `ps`
command.

[comment]: # ({/e585a6e5-8d1ef0a6})
