Filtering of Data in Insights Parsers and Rules

In this tutorial we will investigate filters in insights-core, what they are, how they affect your components and how you can use them in your code. Documentation on filters can be found in the insights-core documentation.

The primary purposes of filters are:

  1. to prevent the collection of sensitive information while enabling the collection of necessary information for analysis; and

  2. to reduce the amount of information collected.

Filters are typically added in rule modules, since the purpose of a rule is to analyze particular information and identify a problem, potential problem, or fact about the system. A filter may also be added in a parser module if it is required to enable parsing of the data; we will discuss this further when we look at the example. Filters added by rules and parsers are applied when the data is collected from a system. Filters for the same source are combined, so if several rules and parsers add filters, each rule receives every line that matched any of those filters. An example will help demonstrate this.

Suppose you write some rules that need information from /var/log/messages. This file could be very large and contain potentially sensitive information, so it is not desirable to collect the entire file. Let’s say rule_a needs messages that indicate my_special_process has failed to start, and another rule, rule_b, needs messages that indicate that my_other_process had the errors MY_OTHER_PROCESS: process locked or MY_OTHER_PROCESS: memory exceeded. Then the two rules could add the following filters to ensure that just the information they need is collected:

rule_a:

add_filter(Specs.messages, 'my_special_process')

rule_b:

add_filter(Specs.messages, ['MY_OTHER_PROCESS: process locked',
                            'MY_OTHER_PROCESS: memory exceeded'])

The effect of this would be that when /var/log/messages is collected, the filters would be applied and only the lines containing the strings 'my_special_process', 'MY_OTHER_PROCESS: process locked', or 'MY_OTHER_PROCESS: memory exceeded' would be collected. This significantly reduces the size of the data and the chance that sensitive information in /var/log/messages might be collected.

While there are significant benefits to filtering, you must be aware when a datasource is being filtered, or your rules could fail to identify a condition that is present on a system. For instance, suppose a rule rule_c also needs information from /var/log/messages about process_xyz. If rule_c runs with other rules like rule_a or rule_b, it would never see lines containing "process_xyz" from /var/log/messages unless it adds its own filter. When any rule or parser adds a filter to a datasource, that data is filtered for all components, not just the component adding the filter. Because of this, it is important to know when a datasource is being filtered so that your rule can add its own filters if needed and function properly.
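
The combining behaviour described above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not the insights-core implementation: the helper apply_filters and the sample log lines are hypothetical, and it assumes a line is kept when it contains at least one of the registered strings.

```python
def apply_filters(lines, filters):
    """Keep only the lines that contain at least one filter string."""
    return [line for line in lines if any(f in line for f in filters)]


# Hypothetical excerpt of /var/log/messages.
messages = [
    "Jan 1 00:00:01 host my_special_process: failed to start",
    "Jan 1 00:00:02 host MY_OTHER_PROCESS: process locked",
    "Jan 1 00:00:03 host process_xyz: segfault",
    "Jan 1 00:00:04 host sshd: session opened for user root",
]

# Filters from rule_a and rule_b are merged into one set for the source.
rule_a_filters = {"my_special_process"}
rule_b_filters = {
    "MY_OTHER_PROCESS: process locked",
    "MY_OTHER_PROCESS: memory exceeded",
}
combined = rule_a_filters | rule_b_filters
collected = apply_filters(messages, combined)

# rule_c only sees process_xyz lines once it registers its own filter.
collected_with_rule_c = apply_filters(messages, combined | {"process_xyz"})

print(len(collected), len(collected_with_rule_c))
```

Every rule reading this source would receive the same collected lines, which is why rule_c finds nothing about process_xyz until its own string joins the set.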

Exploring Filters

Unfiltered Data

Suppose we want to write a rule that will evaluate the contents of the configuration file death_star.ini to determine if there are any vulnerabilities. Since this is a new data source that is not currently collected by insights-core we’ll need to add three elements to collect, parse and evaluate the information.

[1]:
""" Some imports used by all of the code in this tutorial """
from __future__ import print_function  # __future__ imports must come before other statements
import sys
sys.path.insert(0, "../..")
import os
from insights import run
from insights.specs import SpecSet
from insights.core import IniConfigFile
from insights.core.plugins import parser, rule, make_fail
from insights.core.spec_factory import simple_file

First we’ll need to add a specification to collect the configuration file. Note that for the purposes of this tutorial we are collecting from the directory where this notebook is located. Normally the file path would be an absolute path on your system or in an archive.

[2]:
class Specs(SpecSet):
    """
    Define a new spec to collect the file we need.
    """
    death_star_config = simple_file(os.path.join(os.getcwd(), 'death_star.ini'), filterable=True)

Next we’ll need to add a parser to parse the file being collected by the spec. Since this file is in INI format and insights-core provides the IniConfigFile parser, we can just use that to parse the file. See the parser documentation to find out what methods that parser provides.

[3]:
@parser(Specs.death_star_config)
class DeathStarCfg(IniConfigFile):
    """
    Define a new parser to parse the spec. Since the spec is a standard INI format we
    can use the existing IniConfigFile parser that is provided by insights-core.

    See documentation here:
    https://insights-core.readthedocs.io/en/latest/api_index.html#insights.core.IniConfigFile
    """
    pass

Finally we can write the rule that will examine the contents of the parsed configuration file to determine if there are any vulnerabilities. In this INI file we can find the vulnerabilities by searching the item keys for any that contain the string vulnerability. If any vulnerabilities are found, the rule should return a response that documents them and tags them with the key DS_IS_VULNERABLE. If no vulnerabilities are found, the rule should just drop out, effectively returning None.

[4]:
@rule(DeathStarCfg)
def ds_vulnerable(ds_cfg):
    """
    Define a new rule to look for vulnerable conditions that may be
    included in the INI file.  If found report them.
    """
    vulnerabilities = []
    for section in ds_cfg.sections():
        print("Section: {}".format(section))
        for item_key in ds_cfg.items(section):
            print("    {}={}".format(item_key, ds_cfg.get(section, item_key)))
            if 'vulnerability' in item_key:
                vulnerabilities.append((item_key, ds_cfg.get(section, item_key)))

    if vulnerabilities:
        return make_fail('DS_IS_VULNERABLE', vulnerabilities=vulnerabilities)

Before we run the rule, let’s look at the contents of the configuration file. It is in the format of a typical INI file and contains some interesting information. In particular, we see that it does contain a keyword that should match the string we are looking for in the rule, “major_vulnerability=ray-shielded particle exhaust vent”. So we expect the rule to return results.

[5]:
!cat death_star.ini
[global]
logging=debug
log=/var/logs/sample.log

# Keep this info secret
[secret_stuff]
username=dvader
password=luke_is_my_son

[facts]
major_vulnerability=ray-shielded particle exhaust vent

[settings]
music=The Imperial March
color=black

Let’s run our rule and find out. To run the rule we’ll use the insights.run() function and pass in our rule object as the argument (note this is not a string but the actual object). The value returned is an insights.core.dr.Broker object that contains all sorts of information about the execution of the rule. You can explore more details of the broker in the Insights Core Tutorial notebook.

The print statements in our rule provide output as it loops through the configuration file.

[6]:
results = run(ds_vulnerable)
Section: global
    logging=debug
    log=/var/logs/sample.log
Section: secret_stuff
    username=dvader
    password=luke_is_my_son
Section: facts
    major_vulnerability=ray-shielded particle exhaust vent
Section: settings
    color=black
    music=The Imperial March

Now we are ready to look at the results. The broker behaves like a dictionary keyed by component object, so the result of the rule is stored in results[ds_vulnerable]. The broker also holds the objects that your rule depended upon to execute, such as the parser DeathStarCfg and the spec Specs.death_star_config. You can see this by looking at those objects in results.

[7]:
type(results[Specs.death_star_config])
[7]:
insights.core.spec_factory.TextFileProvider
[8]:
type(results[DeathStarCfg])
[8]:
__main__.DeathStarCfg
[9]:
type(results[ds_vulnerable])
[9]:
insights.core.plugins.make_fail

Now let’s look at the rule results to see if they match what we expected.

[10]:
results[ds_vulnerable]
[10]:
{'error_key': 'DS_IS_VULNERABLE',
 'type': 'rule',
 'vulnerabilities': [(u'major_vulnerability',
   u'ray-shielded particle exhaust vent')]}

Success: it worked as we expected and found the vulnerability. Now let’s look at how filtering can affect the rule results.

Filtering Data

When we looked at the contents of the file you may have noticed some other interesting information such as this:

# Keep this info secret
[secret_stuff]
username=dvader
password=luke_is_my_son

As a parser writer, if you know that a file could contain sensitive information, you may choose to filter it in the parser module to avoid collecting it. Usernames, passwords, hostnames, security keys, and other sensitive information should not be collected. In this case the username and password are in the configuration file, so we should add a filter to this parser to prevent them from being collected.

How do we add a filter and avoid breaking the parser? Each parser is unique, so the parser writer must determine if a filter is necessary, and how to add a filter that will allow the parser to function with a minimal set of data. For instance, a YAML or XML parser might have a difficult time parsing a filtered YAML or XML file.

For our example, we are using an INI file parser. INI files are structured with sections which are identified as a section name in square brackets like [section name], followed by items like name or name=value. One possible way to filter an INI file is to add the filter "[" which will collect all lines with sections but no items. This can be successfully parsed by the INI parser, so that is how we’ll filter out this sensitive information in our configuration file. We’ll rewrite the parser adding the add_filter(Specs.death_star_config, '[') to filter all lines except those with a '[' string.
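
To see why a sections-only file is still parseable, here is a minimal sketch that uses the standard library configparser module as a stand-in for the INI parser. It is not the IniConfigFile code path, and the helper keep_matching is hypothetical. It assumes a filter keeps any line containing the filter string.

```python
import configparser

# Contents of death_star.ini as shown earlier in this tutorial.
INI_TEXT = """\
[global]
logging=debug
log=/var/logs/sample.log

# Keep this info secret
[secret_stuff]
username=dvader
password=luke_is_my_son

[facts]
major_vulnerability=ray-shielded particle exhaust vent

[settings]
music=The Imperial March
color=black
"""


def keep_matching(text, filters):
    """Keep only the lines that contain at least one filter string."""
    return "\n".join(
        line for line in text.splitlines()
        if any(f in line for f in filters)
    )


# Filtering on "[" leaves only the section headers, which still parse.
stand_in = configparser.ConfigParser()
stand_in.read_string(keep_matching(INI_TEXT, ["["]))
sections = stand_in.sections()
items = {name: dict(stand_in.items(name)) for name in sections}

# Filtering on an item alone drops the headers; configparser rejects that.
try:
    configparser.ConfigParser().read_string(
        keep_matching(INI_TEXT, ["vulnerability"])
    )
    headerless_parsed = True
except configparser.MissingSectionHeaderError:
    headerless_parsed = False

print(sections, items, headerless_parsed)
```

In this stand-in, the section headers are what keep the filtered text well-formed, which is the reason for choosing "[" as the parser filter.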

[11]:
from insights.core.filters import add_filter

add_filter(Specs.death_star_config, '[')

@parser(Specs.death_star_config)
class DeathStarCfg(IniConfigFile):
    """
    Define a new parser to parse the spec. Since the spec is a standard INI format we
    can use the existing IniConfigFile parser that is provided by insights-core.

    See documentation here:
    https://insights-core.readthedocs.io/en/latest/api_index.html#insights.core.IniConfigFile
    """
    pass

Now let’s run the rule again and see what happens. Do you expect the same results we got before?

[12]:
results = run(ds_vulnerable)
results.get(ds_vulnerable, "No results")        # Use .get method of dict so we can provide default other than None
Section: global
Section: secret_stuff
Section: facts
Section: settings
[12]:
'No results'

Is that what you expected? Notice the output from the print statements in the rule: only the section names are printed. That is the result of adding the filter; only lines containing '[' (the section headers) are collected and provided to the parser. This means that the lines we were looking for in the rule are no longer there, so our rule appears not to find any vulnerabilities. Next we’ll look at how to fix our rule to work with the filtered data.

Adding Filters to Rules

We can add filters to a rule just like we added a filter to the parser, using the add_filter() function. The add_filter function requires a spec and a string or list/set of strings. In this case our rule is looking for the string 'vulnerability', so we just need to add that to the filter.

Alternatively, filters can be added by specifying a parser or combiner in the add_filter() call instead of a spec. In that scenario, the dependency tree is traversed to locate underlying datasources that are filterable (their filterable parameter is equal to True), and the specified filters are added to those datasources. In our example, we can filter the underlying Specs.death_star_config datasource by adding the add_filter(DeathStarCfg, 'vulnerability') statement. This is especially useful when you are working with a combiner that consolidates data from multiple parsers, which in turn depend on multiple datasources. Adding a filter to a combiner allows for consistent filtering of data across all applicable datasources.
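
The traversal described above can be pictured with a toy model. Everything in this sketch is hypothetical: the graph, the names empire_combiner, TieFighterCfg and tie_fighter_config, and the helpers filterable_sources and add_toy_filter are illustrative only and do not reflect how insights-core stores its dependency information.

```python
# Toy dependency graph: each component maps to what it depends on.
DEPENDS_ON = {
    "empire_combiner": ["DeathStarCfg", "TieFighterCfg"],
    "DeathStarCfg": ["death_star_config"],
    "TieFighterCfg": ["tie_fighter_config"],
    "death_star_config": [],
    "tie_fighter_config": [],
}

# Datasources declared with filterable=True in this toy model.
FILTERABLE = {"death_star_config"}


def filterable_sources(component):
    """Walk the dependencies and return the filterable datasources."""
    found, seen, stack = set(), set(), [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in FILTERABLE:
            found.add(node)
        stack.extend(DEPENDS_ON.get(node, ()))
    return found


toy_filters = {}


def add_toy_filter(component, patterns):
    """Attach the patterns to every filterable datasource under component."""
    for source in filterable_sources(component):
        toy_filters.setdefault(source, set()).update(patterns)


add_toy_filter("DeathStarCfg", {"vulnerability"})
add_toy_filter("empire_combiner", {"["})

print(toy_filters)
```

In this model the non-filterable tie_fighter_config receives no filters, while death_star_config accumulates the strings added through both the parser and the combiner.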

[13]:
add_filter(Specs.death_star_config, 'vulnerability')

@rule(DeathStarCfg)
def ds_vulnerable(ds_cfg):
    """
    Define a new rule to look for vulnerable conditions that may be
    included in the INI file.  If found report them.
    """
    vulnerabilities = []
    for section in ds_cfg.sections():
        print("Section: {}".format(section))
        for item_key in ds_cfg.items(section):
            print("    {}={}".format(item_key, ds_cfg.get(section, item_key)))
            if 'vulnerability' in item_key:
                vulnerabilities.append((item_key, ds_cfg.get(section, item_key)))

    if vulnerabilities:
        return make_fail('DS_IS_VULNERABLE', vulnerabilities=vulnerabilities)

Now let’s run the rule again and see what happens.

[14]:
results = run(ds_vulnerable)
results.get(ds_vulnerable, "No results")        # Use .get method of dict so we can provide default other than None
Section: global
Section: secret_stuff
Section: facts
    major_vulnerability=ray-shielded particle exhaust vent
Section: settings
[14]:
{'error_key': 'DS_IS_VULNERABLE',
 'type': 'rule',
 'vulnerabilities': [(u'major_vulnerability',
   u'ray-shielded particle exhaust vent')]}

Now look at the output from the print statements in the rule: the item that was missing is now included. By adding the string required by our rule to the spec filters, we have successfully included the data needed by our rule to detect the problem. Also, by adding the filter to the parser, we have eliminated the sensitive information from the input.

Determining if a Spec is Filtered

When you are developing your rule, you may want to check whether the spec you are using is filtered. This can be accomplished by looking at the spec in insights/specs/__init__.py. Each spec is defined there as a RegistryPoint(). If the spec is filtered it will have the parameter filterable=True. For example, the following indicates that the messages log (/var/log/messages) will be filtered:

messages = RegistryPoint(filterable=True)

If you need to use a parser that relies on a filtered spec, you must add your own filter so that your rule receives the data it needs to evaluate its conditions. If you forget to add a filter to your rule and you include integration tests for the rule, pytest will raise an exception like the following, warning you that the add_filter is missing:

telemetry/rules/tests/integration.py:7:
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

component = <function report at 0x7fa843094e60>, input_data = <InputData {name:test4-00000}>, expected = None

    def run_test(component, input_data, expected=None):
        if filters.ENABLED:
            mod = component.__module__
            sup_mod = '.'.join(mod.split('.')[:-1])
            rps = _get_registry_points(component)
            filterable = set(d for d in rps if dr.get_delegate(d).filterable)
            missing_filters = filterable - ADDED_FILTERS.get(mod, set()) - ADDED_FILTERS.get(sup_mod, set())
            if missing_filters:
                names = [dr.get_name(m) for m in missing_filters]
                msg = "%s must add filters to %s"
>               raise Exception(msg % (mod, ", ".join(names)))
E               Exception: telemetry.rules.plugins.kernel.overcommit must add filters to insights.specs.Specs.messages

../../insights/insights-core/insights/tests/__init__.py:114: Exception

If you see this exception when you run tests, it means you need to add an add_filter call to your rule.
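
The check in the traceback above boils down to a set difference. The following sketch reproduces only that arithmetic. It uses plain strings in place of the real component objects, and the function name missing_filters and the example registry contents are assumptions made for illustration.

```python
def missing_filters(module, filterable, added_filters):
    """Return the filterable specs the module has not added filters for.

    Filters registered by the parent package also count, mirroring the
    sup_mod lookup in the traceback.
    """
    parent = ".".join(module.split(".")[:-1])
    return (
        filterable
        - added_filters.get(module, set())
        - added_filters.get(parent, set())
    )


module = "telemetry.rules.plugins.kernel.overcommit"
filterable = {"insights.specs.Specs.messages"}

# No filter registered yet: the spec is reported as missing.
before = missing_filters(module, filterable, {})

# After the rule module registers a filter, nothing is missing.
after = missing_filters(module, filterable, {module: set(filterable)})

print(sorted(before), sorted(after))
```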

Turning Off Filtering Globally

There are times when you will want or need to turn off filtering in order to perform testing, or to fully analyze some aspects of a system and diagnose problems. Also, if you are running locally on a system you might want to collect all data unfiltered. You can do this by setting the environment variable INSIGHTS_FILTERS_ENABLED=False prior to running insights-core. This won’t work inside this notebook unless you follow the directions below.
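
One way to make sure the variable is set before anything else runs is to launch the work in a child process with a prepared environment. The sketch below only demonstrates that the child sees the variable. The child command is a placeholder standing in for a script that imports insights-core, and whether the library reads the variable at import time is an assumption.

```python
import os
import subprocess
import sys

# Copy the current environment and disable filtering for the child only.
child_env = dict(os.environ, INSIGHTS_FILTERS_ENABLED="False")

# Placeholder child program; a real script would import insights here.
child_code = "import os; print(os.environ.get('INSIGHTS_FILTERS_ENABLED'))"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = completed.stdout.strip()
print(seen_by_child)
```

Because the environment is passed only to the child, the parent process keeps its own filter setting.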

[15]:
"""
This code will disable all filtering if it is run as the first cell when the notebook
is opened.  After the notebook has been started you will need to click on the Kernel
menu and then the restart item, and then run this cell first before all others.
You would need to restart the kernel and then not run this cell to prevent disabling
filters.
"""
import os
os.environ['INSIGHTS_FILTERS_ENABLED'] = 'False'
[16]:
results = run(ds_vulnerable)
results.get(ds_vulnerable, "No results")        # Use .get method of dict so we can provide default other than None
Section: global
Section: secret_stuff
Section: facts
    major_vulnerability=ray-shielded particle exhaust vent
Section: settings
[16]:
{'error_key': 'DS_IS_VULNERABLE',
 'type': 'rule',
 'vulnerabilities': [(u'major_vulnerability',
   u'ray-shielded particle exhaust vent')]}

Debugging Components

If you are writing component code, you may sometimes see no results even though you expected them, and no errors are displayed. That is because insights-core catches the exceptions and saves them. To see the exceptions, you can use the following function to display the results of a run and any errors that occurred.

[17]:
def show_results(results, component):
    """
    This function will show the results from run() where:

        results = run(component)

    run will catch all exceptions so if there are any this
    function will print them out with a stack trace, making
    it easier to develop component code.
    """
    if component in results:
        print(results[component])
    else:
        print("No results for: {}".format(component))

    if results.exceptions:
        for comp in results.exceptions:
            print("Component Exception: {}".format(comp))
            for exp in results.exceptions[comp]:
                print(results.tracebacks[exp])

Here’s an example of this function in use:

[18]:
@rule(DeathStarCfg)
def bad_rule(cfg):
    # Force an error here
    infinity = 1 / 0
[19]:
results = run(bad_rule)
No handlers could be found for logger "insights.core.dr"
[20]:
show_results(results, bad_rule)
No results for: <function bad_rule at 0x7f8c02e99d50>
Component Exception: <function bad_rule at 0x7f8c02e99d50>
Traceback (most recent call last):
  File "../../insights/core/dr.py", line 962, in run
    result = DELEGATES[component].process(broker)
  File "../../insights/core/plugins.py", line 303, in process
    r = self.invoke(broker)
  File "../../insights/core/plugins.py", line 64, in invoke
    return super(PluginType, self).invoke(broker)
  File "../../insights/core/dr.py", line 661, in invoke
    return self.component(*args)
  File "<ipython-input-18-0450035609f8>", line 4, in bad_rule
    infinity = 1 / 0
ZeroDivisionError: integer division or modulo by zero
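
The behaviour described in this section, where a run stores exceptions instead of raising them, can be imitated with a toy runner. This is not how insights-core is implemented. The function run_quietly and the two sample components are hypothetical, and only the shape of the exceptions and tracebacks mappings follows the show_results function above.

```python
import traceback


def run_quietly(components):
    """Call each component, storing failures instead of raising them."""
    results, exceptions, tracebacks = {}, {}, {}
    for component in components:
        try:
            results[component] = component()
        except Exception as ex:
            # Keep the exception and its formatted traceback for later.
            exceptions.setdefault(component, []).append(ex)
            tracebacks[ex] = traceback.format_exc()
    return results, exceptions, tracebacks


def working_component():
    return "ok"


def failing_component():
    return 1 / 0  # force an error


toy_results, toy_exceptions, toy_tracebacks = run_quietly(
    [working_component, failing_component]
)

# Nothing was raised, so the failure is only visible if we look for it.
for component, errors in toy_exceptions.items():
    for ex in errors:
        print(toy_tracebacks[ex])
```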
