Fluentd Filter Plugins: A Detailed Guide to grep / 四六文摘

“filter_grep is a commonly used plugin for filtering log content.”

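If you are familiar with Linux, you probably know its three workhorses for text processing: grep, awk, and sed. Of these, grep is arguably the most frequently used command for searching text. Regular expressions, likewise, are a text-processing tool that every software developer inevitably runs into at work.

For this reason, Fluentd ships with a built-in grep filter plugin, which makes it easy to filter log events based on the values of particular fields.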

【Configuration Example】

Let's start with a configuration snippet:

<filter foo.bar>
  @type grep
  <regexp>
    key message
    pattern /cool/
  </regexp>
  <regexp>
    key hostname
    pattern /^web\d+\.example\.com$/
  </regexp>
  <exclude>
    key message
    pattern /uncool/
  </exclude>
</filter>

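This example keeps only the log events that satisfy all three of the following conditions:

The value of the message field contains the text cool
The value of the hostname field matches the form web<number>.example.com
The value of the message field does not contain the text uncool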

As a result, Fluentd keeps the following log records:

{"message":"It's cool outside today", "hostname":"web001.example.com"}
{"message":"That's not cool", "hostname":"web1337.example.com"}

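The following log records, on the other hand, are dropped:

{"message":"I am cool but you are uncool", "hostname":"db001.example.com"}
{"hostname":"web001.example.com"}
{"message":"It's cool outside today"}

Look closely at these three records. In the first, the hostname value does not match (and the message also contains uncool). The second and third each contain only one field, so one of the two <regexp> conditions has nothing to match against.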
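To make the keep/drop decision above easier to follow, here is a minimal Python sketch that models it on the five sample records. It is only an illustration, not Fluentd's implementation: Python's `re` module stands in for Ruby regular expressions, the helper names `field_text` and `keep` are made up for this example, and treating a missing field as an empty string is an assumption.

```python
import re

# Rules mirroring the configuration above (Python `re` stands in for Ruby regexps).
REGEXP_RULES = [
    ("message", re.compile(r"cool")),
    ("hostname", re.compile(r"^web\d+\.example\.com$")),
]
EXCLUDE_RULES = [
    ("message", re.compile(r"uncool")),
]


def field_text(record, key):
    # Assumption: a missing field is treated as an empty string.
    value = record.get(key)
    return "" if value is None else str(value)


def keep(record):
    """Return True if this simplified model would keep the record."""
    # Every <regexp> rule must match.
    if any(not p.search(field_text(record, k)) for k, p in REGEXP_RULES):
        return False
    # No <exclude> rule may match.
    if any(p.search(field_text(record, k)) for k, p in EXCLUDE_RULES):
        return False
    return True


records = [
    {"message": "It's cool outside today", "hostname": "web001.example.com"},
    {"message": "That's not cool", "hostname": "web1337.example.com"},
    {"message": "I am cool but you are uncool", "hostname": "db001.example.com"},
    {"hostname": "web001.example.com"},
    {"message": "It's cool outside today"},
]

kept = [r for r in records if keep(r)]
for r in kept:
    print(r)
```

In this model only the first two records survive, which agrees with the kept and dropped lists given above.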

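【Plugin Parameters】

<regexp> directive

Specifies a filtering rule. It takes two parameters: key and pattern.

key: required; the name of the field to filter on

pattern: required; the regular expression used for filtering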

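For example, the following configuration is meant to match log events whose price is a positive number. (The pattern is not anchored, so strictly speaking it matches any value that contains a nonzero digit.)

<regexp>
  key price
  pattern /[1-9]\d*/
</regexp>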

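When there are multiple <regexp> sections, the grep plugin keeps only the events that satisfy all of them.

So, given the following configuration

<regexp>
  key price
  pattern /[1-9]\d*/
</regexp>
<regexp>
  key item_name
  pattern /^book_/
</regexp>

only records whose item_name field starts with book_ and whose price field is a positive number are kept; all other records are dropped.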
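The "all <regexp> rules must match" behavior can be modeled in a few lines of Python. This is a sketch under assumptions, not Fluentd code: Python's `re` replaces Ruby regexps, field values are compared as strings, and `strict_price` is a hypothetical anchored variant added here only to show how the unanchored price pattern behaves on values such as -5.

```python
import re

price_rule = re.compile(r"[1-9]\d*")      # unanchored, as in the configuration above
strict_price = re.compile(r"^[1-9]\d*$")  # hypothetical anchored variant
name_rule = re.compile(r"^book_")


def keep(record, price_pattern):
    # Both rules must match (logical AND); values are compared as strings.
    price_ok = price_pattern.search(str(record.get("price", "")))
    name_ok = name_rule.search(str(record.get("item_name", "")))
    return bool(price_ok and name_ok)


records = [
    {"price": 25, "item_name": "book_fluentd"},
    {"price": 25, "item_name": "pen_blue"},
    {"price": 0, "item_name": "book_free"},
    {"price": -5, "item_name": "book_refund"},
]

loose = [keep(r, price_rule) for r in records]
strict = [keep(r, strict_price) for r in records]
print(loose)
print(strict)
```

With the unanchored pattern the record with price -5 is kept because the digit 5 matches; the anchored variant rejects it. Whether that matters depends on the data you expect in the price field.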

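You can use the | operator inside pattern to build a regular expression with "or" semantics.

For example:

<regexp>
  key item_name
  pattern /(^book_|^article)/
</regexp>

This snippet matches events whose item_name field starts with book_ or article; all other events are dropped.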

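If your pattern begins with a slash (for example, a file path), you need to escape that leading slash. Otherwise the slashes may be read as regexp delimiters and the pattern may not match what you expect.

Here is a simple example:

<regexp>
  key filepath
  pattern \/spool/
</regexp>

This snippet matches events whose filepath field contains /spool/.

You can also escape it like this:

<regexp>
  key filepath
  pattern /\/spool\//
</regexp>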
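
regexpN

Sets multiple matching rules, where N ranges from 1 to 20.

Each regexpN takes two whitespace-separated arguments: the key and the pattern.

For example, the earlier example can be rewritten with regexpN:

regexp1 price [1-9]\d*
regexp2 item_name ^book_

Note that this parameter is deprecated. Use the <regexp> directive instead.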
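
<exclude> directive

Specifies a filtering rule that drops the log events matching it.

It takes two parameters: key and pattern.

key: required; the name of the field to filter on

pattern: required; the regular expression used for filtering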

For example, the following configuration drops events whose status_code is 5xx:

<exclude>
  key status_code
  pattern /^5\d\d$/
</exclude>

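When there are multiple <exclude> sections, the grep plugin drops any event that satisfies at least one of them.

So, given the following configuration

<exclude>
  key status_code
  pattern /^5\d\d$/
</exclude>
<exclude>
  key url
  pattern /\.css$/
</exclude>

any event whose status_code field is 5xx or whose url field ends with .css is dropped.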
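The "any <exclude> rule is enough" behavior described above can be sketched the same way. The Python below is a simplified model, not the plugin's code: `re` stands in for Ruby regexps, the sample records and the helper `is_dropped` are invented for illustration, and values are compared as strings.

```python
import re

# Rules copied from the <exclude> sections above.
EXCLUDE_RULES = [
    ("status_code", re.compile(r"^5\d\d$")),
    ("url", re.compile(r"\.css$")),
]


def is_dropped(record):
    # A single matching <exclude> rule is enough to drop the event (logical OR).
    return any(p.search(str(record.get(k, ""))) for k, p in EXCLUDE_RULES)


records = [
    {"status_code": 200, "url": "/index.html"},
    {"status_code": 503, "url": "/index.html"},
    {"status_code": 200, "url": "/static/site.css"},
    {"status_code": 500, "url": "/static/site.css"},
]

kept = [r for r in records if not is_dropped(r)]
print(kept)
```

Only the first record passes: the second is dropped for its status code, the third for its URL, and the fourth for both.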
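
excludeN

Sets multiple matching rules for dropping log events, where N ranges from 1 to 20.

Each excludeN takes two whitespace-separated arguments: the key and the pattern.

For example, the earlier example can be rewritten with excludeN:

exclude1 status_code ^5\d\d$
exclude2 url \.css$

Note that this parameter is deprecated. Use the <exclude> directive instead.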
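
<and> directive

Combines filtering rules. It can contain <regexp> or <exclude> directives.

An event is kept (for <regexp>) or dropped (for <exclude>) only if it matches every pattern inside <and>.

This directive is supported in Fluentd v1.2.0 and later.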

<and>
  <regexp>
    key price
    pattern /[1-9]\d*/
  </regexp>
  <regexp>
    key item_name
    pattern /^book_/
  </regexp>
</and>

The configuration above is equivalent to the following:

<regexp>
  key price
  pattern /[1-9]\d*/
</regexp>
<regexp>
  key item_name
  pattern /^book_/
</regexp>

We can also use <exclude> directives inside <and>.

<and>
  <exclude>
    key container_name
    pattern /^app\d{2}/
  </exclude>
  <exclude>
    key log_level
    pattern /^(?:debug|trace)$/
  </exclude>
</and>
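Putting <exclude> inside <and> changes the drop condition from "any rule matches" to "every rule matches". The Python sketch below models that reading on three made-up records. It is an illustration only: `re` stands in for Ruby regexps, and the helper `is_dropped` and the sample data are assumptions, not part of Fluentd.

```python
import re

# Rules copied from the <and> block above.
AND_EXCLUDE = [
    ("container_name", re.compile(r"^app\d{2}")),
    ("log_level", re.compile(r"^(?:debug|trace)$")),
]


def is_dropped(record):
    # Inside <and>, the <exclude> group drops an event only when every rule matches.
    return all(p.search(str(record.get(k, ""))) for k, p in AND_EXCLUDE)


records = [
    {"container_name": "app01", "log_level": "debug"},  # both rules match
    {"container_name": "app01", "log_level": "error"},  # only container_name matches
    {"container_name": "db01", "log_level": "trace"},   # only log_level matches
]

dropped = [is_dropped(r) for r in records]
print(dropped)
```

In other words, this configuration discards debug and trace output from the app containers while leaving their other log levels, and all logs from other containers, untouched.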
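
<or> directive

Combines filtering rules. It can contain <regexp> or <exclude> directives.

An event is kept (for <regexp>) or dropped (for <exclude>) as soon as it matches any one pattern inside <or>.

This directive is supported in Fluentd v1.2.0 and later.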

<or>
  <exclude>
    key status_code
    pattern /^5\d\d$/
  </exclude>
  <exclude>
    key url
    pattern /\.css$/
  </exclude>
</or>

The configuration above is equivalent to the following:

<exclude>
  key status_code
  pattern /^5\d\d$/
</exclude>
<exclude>
  key url
  pattern /\.css$/
</exclude>

We can also use <regexp> directives inside <or>.

<or>
  <regexp>
    key container_name
    pattern /^db\d{2}/
  </regexp>
  <regexp>
    key log_level
    pattern /^(?:warn|error)$/
  </regexp>
</or>
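With <regexp> inside <or>, one matching rule is enough to keep an event. A small Python model of the last configuration above, again only a sketch: `re` stands in for Ruby regexps, and the helper `is_kept` and the sample records are hypothetical.

```python
import re

# Rules copied from the <or> block above.
OR_REGEXP = [
    ("container_name", re.compile(r"^db\d{2}")),
    ("log_level", re.compile(r"^(?:warn|error)$")),
]


def is_kept(record):
    # Inside <or>, the <regexp> group keeps an event when at least one rule matches.
    return any(p.search(str(record.get(k, ""))) for k, p in OR_REGEXP)


records = [
    {"container_name": "db01", "log_level": "info"},    # container_name matches
    {"container_name": "app01", "log_level": "error"},  # log_level matches
    {"container_name": "app01", "log_level": "info"},   # nothing matches
]

kept_flags = [is_kept(r) for r in records]
print(kept_flags)
```

So this configuration keeps everything from the db containers plus warn and error events from any container, and drops the rest.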

In a follow-up article we will run functional tests against filter_grep to verify the usage described here.

Stay tuned.
