The filter.yml file controls how the trace2receiver component
translates the Trace2 data stream from Git commands into OTEL data
structures. This filtering is content- and context-aware and is
independent of any statistical filtering performed by later stages in
the OTEL Collector pipeline.
The filter settings pathname is set in the
receivers.trace2receiver.filter parameter in the main config.yml file.
The trace2receiver does "smart filtering" rather than just
"percentage filtering". This allows it to control the verbosity of
the telemetry that it generates from the Trace2 data stream.
For example, you might want very verbose output for your monorepo
while doing a performance study and only minimal output otherwise. Or
you might want no telemetry at all for insignificant or personal
repos.
- Detail Levels: There are four builtin detail levels. These vary from no telemetry to very verbose telemetry. These form the foundation of the filtering system.
- Rulesets: Rulesets build upon detail levels. They let you define a meaningful name for a set of filtering patterns, such as dropping telemetry for "uninteresting" commands and requesting verbose telemetry for "interesting" ones. Rulesets can only refer to detail levels. They cannot refer to other rulesets.
- Repo Nicknames: Repo Nicknames are an aliasing technique built on top of rulesets. They serve two roles: (1) they select a detail level or ruleset for content filtering, and (2) they help with data aggregation from different repo instances across different machines.
As we'll see later, a Git command can refer to a detail level, a ruleset, or a repo nickname to override the default filtering and telemetry verbosity.
The following sections explain each of these concepts in more detail.
All detail level names begin with a dl:
prefix to distinguish
them from ruleset names and repo nicknames.
<detail-level> ::= "dl:drop"
| "dl:summary"
| "dl:process"
| "dl:verbose"
- dl:drop -- Drop or omit all telemetry for the command.
- dl:summary -- Generate basic telemetry for the command; the primary focus is the lifespan of the command, the arguments, and exit code.
- dl:process -- Adds process-level data events, process-level timer and counter values, and child process (and hook) events to the summary-level data.
- dl:verbose -- Adds thread-level and region-level details to the process-level data.
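One way to picture the four levels is as an ordered scale in which each level adds to the one before it. The sketch below is illustrative only: the `DetailLevel` enum and the helper names are hypothetical and are not taken from the receiver's source.

```python
from enum import IntEnum


class DetailLevel(IntEnum):
    """Hypothetical ordering of the four builtin detail levels."""
    DROP = 0     # dl:drop: no telemetry at all
    SUMMARY = 1  # dl:summary: lifespan, arguments, exit code
    PROCESS = 2  # dl:process: adds process-level events, timers, counters, child processes
    VERBOSE = 3  # dl:verbose: adds thread-level and region-level details


_BY_NAME = {
    "dl:drop": DetailLevel.DROP,
    "dl:summary": DetailLevel.SUMMARY,
    "dl:process": DetailLevel.PROCESS,
    "dl:verbose": DetailLevel.VERBOSE,
}


def parse_detail_level(name: str) -> DetailLevel:
    """Map a "dl:" name to its level, rejecting unknown names."""
    try:
        return _BY_NAME[name]
    except KeyError:
        raise ValueError(f"unknown detail level: {name!r}") from None


def includes(level: DetailLevel, needed: DetailLevel) -> bool:
    """True if `level` emits everything that `needed` emits."""
    return level >= needed
```

For example, `includes(parse_detail_level("dl:verbose"), DetailLevel.PROCESS)` is true because the verbose level builds on the process level, while the summary level does not include process-level data.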
All ruleset names have a rs:
prefix to distinguish them from detail
levels and repo nicknames.
<ruleset-name> ::= "rs:<string>"
The content of a ruleset is defined in a ruleset file.
A ruleset name is essentially an alias for the underlying ruleset file. Using a ruleset name means users do not need to know how and where the telemetry service is installed.
The filter.yml
file contains a dictionary to map ruleset names to
pathnames:
rulesets:
<ruleset-name-1>: <ruleset-pathname-1>
<ruleset-name-2>: <ruleset-pathname-2>
...
Ruleset files will be loaded when the receiver starts up.
Note
If you want to modify the list of rulesets or edit one of the ruleset files, you'll need to restart the telemetry service when you're finished.
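The restart requirement follows from loading the files once at startup. The sketch below illustrates that snapshot behavior; it is not the receiver's code. The `load_rulesets` helper is hypothetical, and because the ruleset file format is not described in this section, it simply keeps each file's raw text.

```python
from pathlib import Path


def load_rulesets(ruleset_paths: dict[str, str]) -> dict[str, str]:
    """Read every ruleset file once and keep its raw text in memory.

    Edits made to the files after this call are not seen until the
    function is called again (that is, until the service restarts).
    """
    return {
        name: Path(path).read_text(encoding="utf-8")
        for name, path in ruleset_paths.items()
    }
```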
A repo nickname is another level of aliasing on top of rulesets. Conceptually, this is a way to say that this repo is an instance of project "foo" and that telemetry data from it can be aggregated with Git command data from other instances of project "foo".
This avoids the need for the telemetry service or data store to try to
guess how to aggregate data by parsing the remote.origin.url
or
the basename of the repo root directory. Users can simply say that
this repo is an instance of repo "foo" and aggregate or partition
data as they want.
Nicknames also let us say that all instances of repo "foo" should use the ruleset "rs:bar".
A repo nickname is a simple string without a dl: or rs: prefix.
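Because the three kinds of names are distinguished purely by prefix, a name can be classified with a simple string test. This is a minimal sketch; `classify_name` is a hypothetical helper, not part of the receiver.

```python
def classify_name(name: str) -> str:
    """Classify a filter name by its prefix."""
    if name.startswith("dl:"):
        return "detail-level"
    if name.startswith("rs:"):
        return "ruleset"
    # Anything without a dl: or rs: prefix is treated as a repo nickname.
    return "nickname"
```

With this helper, `classify_name("dl:verbose")`, `classify_name("rs:bar")`, and `classify_name("foo")` fall into the three different categories.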
The filter.yml
file contains a dictionary to map nicknames to detail
levels or rulesets:
nicknames:
<nickname-1>: <ruleset-name> | <detail-level>
<nickname-2>: <ruleset-name> | <detail-level>
...
Git can be told to send additional Git config key/value pairs in the
Trace2 telemetry stream using the trace2.configparams config setting.
We can use that mechanism to have Git send extra metadata to help
trace2receiver decide how to generate or filter OTEL data.
In the examples here we have chosen to use the otel.trace2.*
namespace for all of these special config settings, but you can use
any prefix you want.
To tell Git to always send these config settings, we must add this
namespace to the trace2.configparams
config setting at the global
or system
level.
$ git config --system trace2.configparams "otel.trace2.*"
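To make the effect of the pattern concrete, the sketch below selects the config entries that a pattern such as "otel.trace2.*" would cover. It is only an approximation: Git does its own pattern matching, and Python's `fnmatchcase` is used here as a stand-in. The `keys_to_send` helper and the sample config values are hypothetical.

```python
from fnmatch import fnmatchcase


def keys_to_send(config: dict[str, str], patterns: list[str]) -> dict[str, str]:
    """Return the config entries whose key matches any of the patterns."""
    return {
        key: value
        for key, value in config.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }


# Hypothetical repo config: only the otel.trace2.* entries are selected.
sample = {
    "otel.trace2.nickname": "monorepo",
    "otel.trace2.ruleset": "rs:production",
    "user.name": "example",
}
selected = keys_to_send(sample, ["otel.trace2.*"])
```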
The filter.yml
contains a dictionary to define the spelling of
these keys:
keynames:
nickname_key: "otel.trace2.nickname"
ruleset_key: "otel.trace2.ruleset"
We can set repo nicknames on our repos using the Git config
setting named in the nickname_key
parameter. Thereafter, Git will
silently send the nickname on every Git command in those repos.
The nickname should be local to the individual repo.
$ cd /path/to/my/repo1
$ git config --local otel.trace2.nickname "monorepo"
$
$ cd /path/to/my/repo2
$ git config --local otel.trace2.nickname "monorepo"
$
$ cd /path/to/my/repo3
$ git config --local otel.trace2.nickname "personal"
Or you can set it for a single command:
$ cd /path/to/my/repo4
$ git -c otel.trace2.nickname=personal status
If no nickname is defined or the given repo nickname is not defined in
the filter.yml
file, the receiver will fall back to the default
filter settings.
In the above example, I've suggested "monorepo" and "personal" as
nicknames, but you might use the base name of the repo, such as
git.git or chromium.git or just chromium. Or you might use a
project codename (and further hide the origin URL).
You might use different nicknames for desktop users versus build servers on instances of the same repo to help partition the data in the data store by use case or machine class. For example, you might want to see the P80 fetch times for interactive users and not have to sift through fetches from build machines.
The repo nickname helps identify/classify the data and lets you set an expected ruleset. However, there are times when you might want to maintain the above classification, but use different verbosity for some commands or for some repo instances.
The ruleset_key
parameter lets you explicitly select a ruleset and
override the ruleset associated with the nickname.
$ cd /path/to/my/repo1
$ git config --local otel.trace2.ruleset "rs:production"
$
$ cd /path/to/my/repo2
$ git config --local otel.trace2.ruleset "rs:test"
$
$ cd /path/to/my/repo3
$ git config --local otel.trace2.ruleset "dl:drop"
Or set it for a single command:
$ cd /path/to/my/repo4
$ git -c otel.trace2.ruleset="dl:summary" status
If the named ruleset or detail level is not defined in the filter.yml
file, the receiver will fall back to the default filter settings.
If a Git command sends both a ruleset_key and nickname_key, the
ruleset_key wins. (Both key values will be included in the OTEL
telemetry, but the telemetry data will be filtered using the value of
the ruleset_key.)
Now that all of the concepts have been introduced, we can describe
the complete syntax of the filter.yml
file. All sections and rows
are optional.
keynames:
nickname_key: <git-config-key>
ruleset_key: <git-config-key>
nicknames:
<nickname-1>: <ruleset-name> | <detail-level>
<nickname-2>: <ruleset-name> | <detail-level>
...
rulesets:
<ruleset-name-1>: <ruleset-pathname-1>
<ruleset-name-2>: <ruleset-pathname-2>
...
defaults:
ruleset: <ruleset-name> | <detail-level>
The value of the defaults.ruleset
parameter will be used when a Git
command does not specify a repo nickname or ruleset.
If there is no default, the builtin default of dl:summary
will be
used.
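The grammar above can be checked mechanically once the file has been parsed into a dictionary. The following is a rough sketch of such a check, assuming the YAML has already been loaded into a plain `dict`. The `check_filter` and `is_target` helpers are hypothetical, and whether the receiver itself rejects or merely ignores malformed entries is not stated here.

```python
DETAIL_LEVELS = {"dl:drop", "dl:summary", "dl:process", "dl:verbose"}


def is_target(value: str) -> bool:
    """A target is a builtin detail level or an "rs:<string>" ruleset name."""
    return value in DETAIL_LEVELS or (value.startswith("rs:") and len(value) > 3)


def check_filter(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    # Every section is optional, so a missing section is treated as empty.
    for nickname, target in settings.get("nicknames", {}).items():
        if nickname.startswith(("dl:", "rs:")):
            problems.append(f"nickname {nickname!r} must not have a dl: or rs: prefix")
        if not is_target(target):
            problems.append(f"nickname {nickname!r} maps to invalid target {target!r}")
    for name in settings.get("rulesets", {}):
        if not name.startswith("rs:"):
            problems.append(f"ruleset name {name!r} must start with rs:")
    default = settings.get("defaults", {}).get("ruleset")
    if default is not None and not is_target(default):
        problems.append(f"defaults.ruleset has invalid target {default!r}")
    return problems
```

An empty dictionary passes, which matches the statement that all sections and rows are optional.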
In this filter:
keynames:
nickname_key: "otel.trace2.nickname"
ruleset_key: "otel.trace2.ruleset"
nicknames:
monorepo: "dl:verbose"
personal: "dl:drop"
rulesets:
"rs:status": "./rulesets/rs-status.yml"
defaults:
ruleset: "dl:summary"
The receiver will watch for the otel.trace2.nickname and
otel.trace2.ruleset Git config key/value pairs in the Trace2
telemetry stream to override the builtin filtering defaults.
Commands that send otel.trace2.ruleset = rs:status will use the command-level filtering described in the rs-status.yml ruleset file.
Commands that send otel.trace2.nickname = monorepo will use dl:verbose and emit very verbose telemetry.
Commands that send otel.trace2.nickname = personal will use dl:drop and not emit any telemetry.
All other commands will use the default dl:summary and emit command overview telemetry.
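The selection rules described in this document (the ruleset key wins over the nickname, undefined names fall back to the default, and the builtin default is dl:summary) can be summarized in a short sketch. This is an illustration under those stated rules, not the receiver's implementation. The `resolve` and `is_defined` helpers are hypothetical, and `FILTER` is the example filter above written as a Python dictionary.

```python
BUILTIN_DEFAULT = "dl:summary"
DETAIL_LEVELS = {"dl:drop", "dl:summary", "dl:process", "dl:verbose"}

# The example filter.yml above, expressed as a dictionary.
FILTER = {
    "keynames": {
        "nickname_key": "otel.trace2.nickname",
        "ruleset_key": "otel.trace2.ruleset",
    },
    "nicknames": {"monorepo": "dl:verbose", "personal": "dl:drop"},
    "rulesets": {"rs:status": "./rulesets/rs-status.yml"},
    "defaults": {"ruleset": "dl:summary"},
}


def is_defined(name: str, settings: dict) -> bool:
    """A name is usable if it is a builtin level or a listed ruleset."""
    return name in DETAIL_LEVELS or name in settings.get("rulesets", {})


def resolve(params: dict, settings: dict) -> str:
    """Pick the detail level or ruleset for one Git command.

    `params` holds the config key/value pairs the command sent.
    """
    keynames = settings.get("keynames", {})
    default = settings.get("defaults", {}).get("ruleset", BUILTIN_DEFAULT)

    # An explicit ruleset_key value wins over any nickname.
    explicit = params.get(keynames.get("ruleset_key"))
    if explicit is not None:
        return explicit if is_defined(explicit, settings) else default

    # Otherwise look up the nickname; unknown nicknames use the default.
    nickname = params.get(keynames.get("nickname_key"))
    target = settings.get("nicknames", {}).get(nickname)
    if target is not None and is_defined(target, settings):
        return target
    return default
```

For instance, `resolve({"otel.trace2.nickname": "monorepo"}, FILTER)` selects dl:verbose, while a command that sends no special keys gets the default dl:summary.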