Insights API
Input Data Formats
Before any data reaches the rules framework, it obviously has to be generated. There are currently several input data formats that can be processed by insights-core:
SOSReports
A SOSReport is a command-line tool for Red Hat Enterprise Linux (and other systems) to collect configuration and diagnostic information from the system.
Insights Archives
These archives have been designed from the ground up to fit the Insights use case -- that is, to be automatically uploaded on a daily basis. This means that the data contained in the archive is exactly what is required to effectively process all currently-developed rules -- and nothing more.
In addition, there are several features built in to the insights-client package (which is the tool that creates Insights archives) to better meet Red Hat customers’ security and privacy concerns.
- Blacklists
A list of files or commands to never upload.
- Filters
A set of simple strings used to filter files before adding them to the archive.
- Dynamic Uploader Configuration
The client will download a configuration file from Red Hat (by default) every time it’s executed that will list out every file to collect and command to run, including filters to apply to each file, specified by rule plugins. The configuration is signed, which should be verified by the client before using it to collect data.
These features allow these archives to be processed quickly and more securely in the Insights production environment. On the other hand, the reduced data set narrows the scope of uses to be Insights-specific.
OCP 4 Archives
OCP 4 can generate diagnostic archives with a component called the
insights-operator
. They are automatically uploaded to Red Hat for analysis.
The openshift-must-gather CLI tool produces more comprehensive archives than
the operator. insights-core
recognizes them as well.
Execution Model
To build rules effectively, one should have a general idea of how data is executed against each rule. At a high level:
Each unit of input data is mapped to a symbolic name
Each parser that “subscribes” to that symbolic name is executed with the given content as part of a
Context
object.The outputs of all parsers are sorted by host
For each host, every rule is invoked with the local context, populated by parsers from the same plugin, and the shared context, the combination of all shared parser outputs for this particular host.
The outputs of all rules is returned, along with other various bits of metadata, to the client, depending on what invoked the rules framework.
Contexts
The term Context
refers to the context of the information that is collected
and evaluated by Insights. Examples of context are Host Context (directly
collected from a host), Host Archive Context (uploaded Insights archive),
SOSReports (uploaded SOSReport archive), and Docker Image Context (directly
collected from Docker image). The context determines which data sources are
collected, and that in determines the hierarchy of parsers, collectors and rules
that are executed. Contexts enable different collection methods for data for
each unique context, and also provide a default set of data sources that are
common among one or more contexts. All available contexts are defined in the
module insights.core.context
.
Data Sources
Data Sources
define how data processed by Insights is collected. Each
data source is specific to a unique set of data. For example a data source
is defined for the contents of the file /etc/hosts
and for the output
of the command /sbin/fdisk -l
. The default data sources provide the
primary data collection specifications for all contexts and are located in
insights.specs.default.DefaultSpecs
.
Each specific Context
may
override a default data source to provide a different collection
specification. For instance when the Insights client collects the fdisk -l
information it will use the default datasource and execute the command on
the target machine. This is the insights.core.context.HostContext
.
The Insights client stores that information as a file in an archive.
When the client uploads that information to the Red Hat Insights
service it is processed in the insights.core.context.HostArchiveContext
.
Because the fdisk -l
data is now in a file in the archive the data sources
defined in insights.specs.insights_archive.InsightsArchiveSpecs
are
used instead. In this case Insights will collect the data from a file
named insights_commands/fdisk_-l
.
The command type datasources in default.py
(simple_command
and
foreach_execute
) only target HostContext
. File based datasources fire for any
context annotated with @fs_root
(HostContext
, InsightsArchiveContext
,
SosArchiveContext
, DockerImageContext
, and etc.).
That’s why we need a definition in the *_archive.py
files for every command but only
for the files that are different from default.py
.
Also, the order in which spec modules load matters. Say we have 2 classes containing
specs, A and B. If B loads after A and both A and B have entries for hostname
,
the one that fires depends on the context that each one targets. E.g. if A.hostname
targets HostContext
and B.hostname
targets InsightsArchiveContext
, then
they’ll each fire for whichever context is loaded. But if both A.hostname
and
B.hostname
target HostContext
, the datasource in the class that loads last
will win for that context.
While data sources are specific to the context, the purpose of the data source
hierarchy is to provide a consistent set of input to Parsers
. For this
reason Parsers
should generally depend upon insights.specs.Specs
data sources.
This hierarchy allows a developer to override a particular datasource. For instance,
if a developer found a bug in a sos_archive datasource,
she could create her own class inheriting from insights.core.spec_factory.SpecSet
,
create the datasource in it,
and have the datasource target SosArchiveContext
. So long as the module containing
her class loads after default.py
, insights_archive.py
, and sos_archive.py
,
her definition will win for that datasource when running under a SosArchiveContext
.
Specification Factories
Data sources may utilize various methods called spec factories for collection
of information.
Collection from a file (/etc/hosts
) and from a command (/sbin/fdisk -l
)
are two of the most common. These are implemented by the
insights.core.spec_factory.simple_file()
and insights.core.spec_factory.simple_command()
spec factories
respectively. All of the spec factories currently available for the creation
of data sources are listed below.
insights.core.spec_factory.simple_file()
simple_file collects the contents of files, for example:
auditd_conf = simple_file("/etc/audit/auditd.conf") audit_log = simple_file("/var/log/audit/audit.log")
insights.core.spec_factory.simple_command()
simple_command collects the output from a command, for example:
blkid = simple_command("/sbin/blkid -c /dev/null") brctl_show = simple_command("/usr/sbin/brctl show")
insights.core.spec_factory.glob_file()
glob_file collects the contents of each file matching the glob pattern(s). glob_file also can take a list of patterns as well as an ignore keyword arg that is a regular expression telling it which of the matching files to throw out, for example:
httpd_conf = glob_file(["/etc/httpd/conf/httpd.conf", "/etc/httpd/conf.d/*.conf"]) ifcfg = glob_file("/etc/sysconfig/network-scripts/ifcfg-*") rabbitmq_logs = glob_file("/var/log/rabbitmq/rabbit@*.log", ignore=".*rabbit@.*(?<!-sasl).log$")
insights.core.spec_factory.first_file()
first_file collects the contents of the first readable file from a list of files, for example:
meminfo = first_file(["/proc/meminfo", "/meminfo"]) postgresql_conf = first_file([ "/var/lib/pgsql/data/postgresql.conf", "/opt/rh/postgresql92/root/var/lib/pgsql/data/postgresql.conf", "database/postgresql.conf" ])
insights.core.spec_factory.listdir()
listdir collects a simple directory listing of all the files and directories in a path, for example:
block_devices = listdir("/sys/block") ethernet_interfaces = listdir("/sys/class/net", context=HostContext)
insights.core.spec_factory.foreach_execute()
foreach_execute executes a command for each element in provider. Provider is the output of a different datasource that returns a list of single elements or a list of tuples. This spec factory is typically utilized in combination with a simple_file, simple_command or listdir spec factory to generate the input elements, for example:
ceph_socket_files = listdir("/var/run/ceph/ceph-*.*.asok", context=HostContext) ceph_config_show = foreach_execute(ceph_socket_files, "/usr/bin/ceph daemon %s config show") ethernet_interfaces = listdir("/sys/class/net", context=HostContext) ethtool = foreach_execute(ethernet_interfaces, "/sbin/ethtool %s")
insights.core.spec_factory.foreach_collect()
foreach_collect substitutes each element in provider into path and collects the files at the resulting paths. This spec factory is typically utilized in combination with a simple_command or listdir spec factory to generate the input elements, for example:
httpd_pid = simple_command("/usr/bin/pgrep -o httpd") httpd_limits = foreach_collect(httpd_pid, "/proc/%s/limits") block_devices = listdir("/sys/block") scheduler = foreach_collect(block_devices, "/sys/block/%s/queue/scheduler")
insights.core.spec_factory.first_of()
first_of returns the first of a list of dependencies that exists. At least one must be present, or this component won’t fire. This spec factory is typically utilized in combination with other spec factories to generate the input list, for example:
postgresql_log = first_of([glob_file("/var/lib/pgsql/data/pg_log/postgresql-*.log"), glob_file("/opt/rh/postgresql92/root/var/lib/pgsql/data/pg_log/postgresql-*.log"), glob_file("/database/postgresql-*.log")]) systemid = first_of([simple_file("/etc/sysconfig/rhn/systemid"), simple_file("/conf/rhn/sysconfig/rhn/systemid")])
Custom Data Source
If greater control over data source content is required than provided by the
existing specification factories, it is possible to write a custom data source.
This is accomplished by decorating a function with the @datasource
decorator
and returning a list
type. Here’s an example:
1@datasource(HostContext)
2def block(broker):
3 remove = (".", "ram", "dm-", "loop")
4 tmp = "/dev/%s"
5 return [(tmp % f) for f in os.listdir("/sys/block") if not f.startswith(remove)]
Custom datasources also can return insights.core.spec_factory.CommandOutputProvider
,
insights.core.spec_factory.TextFileProvider
, or
insights.core.spec_factory.RawFileProvider
instances.
Parsers
A Parser
takes the raw content of a particular Data Source
such as
file contents or command output, parses it, and then provides a small API for
plugins to query. The parsed data and computed facts available via the API are
also serialized to be used in downstream processes.
Choosing a Module
Currently all shared parsers are defined in the package
insights.parsers
. From there, the parsers are separated into
modules based on the command or file that the parser consumes. Commands or
files that are logically grouped together can go in the same module, e.g. the
ethtool
based commands and ps
based commands.
Defining Parsers
There are a couple things that make a function a parser:
The function is decorated with the
@parser
decoratorThe function can take multiple parameters, the first is always expected to be of type
Context
. Any additional parameters will normally represent acomponent
with a sole purpose of determining if the parser will fire.
Registration and Symbolic Names
Parsers are registered with the framework by use of the @parser
decorator.
This decorator will add the function object to the list of parsers associated
with the given data source name. Without the decorator, the parser will
never be found by the framework.
Data source names represent all the possible file content types that can be
analyzed by parsers. The rules framework uses the data source name mapping
defined in insights.specs.Specs
to map a symbolic
name to a command, a single file or multiple files. More detail on this mapping
is provided in the section Specification Factories
The same mapping is used to create the
uploader.json
file consumed by Insights Client to collect data from
customer systems. The Client RPM is developed and
distributed with Red Hat Enterprise Linux as part of the base distribution.
Updates to the Client RPM occur less frequently than to the Insights Core application.
Additionally customers may not update the Client RPM on their systems.
So developers need to check both the Insights Core and the Client applications
to determine what information is available for processing in Insights.
- class insights.core.plugins.parser(*args, **kwargs)[source]
Decorates a component responsible for parsing the output of a
datasource
.@parser
should accept multiple arguments, the first will ALWAYS be the datasource the parser component should handle. Any subsequent argument will be acomponent
used to determine if the parser should fire.@parser
should only decorate subclasses ofinsights.core.Parser
.Warning
If a Parser component handles a datasource that returns a
list
, a Parser instance will be created for each element of the list. Combiners or rules that depend on the Parser will be passed the list of instances and not a single parser instance. By default, if any parser in the list succeeds, those parsers are passed on to dependents, even if others fail. If all parsers should succeed or fail together, passcontinue_on_error=False
.
Parser Contexts
Each parser may take multiple parameters.
The first is always expected to be of type Context
.
Order is also important and the parameter of type Context
must always be first.
All information available to a parser is found in the
insights.core.context.Context
object.
Please refer to the Context API documentation
insights.core.context
for more details.
Any additional parameters will not be of type Context
but will normally represent a component
with a sole
purpose of determining if the parser will fire.
Parser Outputs
Parsers can return any value, as long as it’s serializable.
Parser developers are encouraged to wrap output data in a Parser
class. This makes plugin developers able to query for higher-level facts about
a particular file, while also exporting the higher level facts for use outside
of Insights plugins.
- class insights.core.Parser(context)[source]
Base class designed to be subclassed by parsers.
The framework will construct your object with a Context that will provide at least the content as an interable of lines and the path that the content was retrieved from.
Facts should be exposed as instance members where applicable. For example:
self.fact = "123"
Examples
>>> class MyParser(Parser): ... def parse_content(self, content): ... self.facts = [] ... for line in content: ... if 'fact' in line: ... self.facts.append(line) >>> content = ''' ... # Comment line ... fact=fact 1 ... fact=fact 2 ... fact=fact 3 ... '''.strip() >>> my_parser = MyParser(context_wrap(content, path='/etc/path_to_content/content.conf')) >>> my_parser.facts ['fact=fact 1', 'fact=fact 2', 'fact=fact 3'] >>> my_parser.file_path '/etc/path_to_content/content.conf' >>> my_parser.file_name 'content.conf'
- file_name
Filename portion of the input file.
- Type:
str
- file_path
Full context path of the input file.
- Type:
str
- parse_content(content)[source]
This method must be implemented by classes based on this class.
Rule Plugins
The purpose of Rule plugins is to identify a particular problem in a given system based on certain facts about that system. Each Rule plugin consists of a module with:
One
@rule
-decorated functionAn
ERROR_KEY
member (recommended)- A docstring for the module that includes
A summary of the plugin
A longer description of what the plugin identifies
Links to Red Hat solutions
- class insights.core.plugins.rule(*args, **kwargs)[source]
Decorator for components that encapsulate some logic that depends on the data model of a system. Rules can depend on
datasource
instances,parser
instances,combiner
instances, or anything else.For example:
@rule(SshDConfig, InstalledRpms, [ChkConfig, UnitFiles], optional=[IPTables, IpAddr]) def report(sshd_config, installed_rpms, chk_config, unit_files, ip_tables, ip_addr): # ... # ... some complicated logic # ... bash = installed_rpms.newest("bash") return make_pass("BASH", bash=bash)
Notice that the arguments to
report
correspond to the dependencies in the@rule
decorator and are in the same order.Parameters to the decorator have these forms:
Criteria
Example Decorator Arguments
Description
Required
SshDConfig, InstalledRpms
Regular arguments
At Least One
[ChkConfig, UnitFiles]
An argument as a list
Optional
optional=[IPTables, IpAddr]
A list following optional=
If a parameter is required, the value provided for it is guaranteed not to be
None
. In the example above,sshd_config
andinstalled_rpms
will not beNone
.At least one of the arguments to parameters of an “at least one” list will not be
None
. In the example, either or both ofchk_config
andunit_files
will not beNone
.Any or all arguments for optional parameters may be
None
.The following keyword arguments may be passed to the decorator:
- Keyword Arguments:
requires (list) -- a list of components that all components decorated with this type will require. Instead of using
requires=[...]
, just pass dependencies as variable arguments to@rule
as in the example above.optional (list) -- a list of components that all components decorated with this type will implicitly depend on optionally. Additional components passed as
optional
to the decorator will be appended to this list.metadata (dict) -- an arbitrary dictionary of information to associate with the component you’re decorating. It can be retrieved with
get_metadata
.tags (list) -- a list of strings that categorize the component. Useful for formatting output or sifting through results for components you care about.
group --
GROUPS.single
orGROUPS.cluster
. Used to organize components into “groups” that run together withinsights.core.dr.run()
.cluster (bool) -- if
True
will put the component into theGROUPS.cluster
group. Defaults toFalse
. Overridesgroup
ifTrue
.content (string or dict) -- a jinja2 template or dictionary of jinja2 templates. The
Response
subclasses rules can return are dictionaries.make_pass
,make_fail
, andmake_response
all accept first a key and then a list of arbitrary keyword arguments. If content is a dictionary, the key is used to look up the template that the rest of the keyword argments will be interpolated into. If content is a string, then it is used for all return values of the rule. If content isn’t defined but aCONTENT
variable is declared in the module, it will be used for every rule in the module and also can be a string or list of dictionarieslinks (dict) -- a dictionary with strings as keys and lists of urls as values. The keys categorize the urls, e.g. “kcs” for kcs urls and “bugzilla” for bugzilla urls.
Rule Parameters
The parameters for each rule function mirror the parser or parsers identified
in the @rule
decorator. This is best demonstrated by an example:
1 @rule(InstalledRpms, Lsof, Netstat)
2 def heartburn(installed_rpms, lsof, netstat):
3 # Rule implementation
Line 1 of this example indicates that the rule depends on 3 parsers,
InstalledRpms
, Lsof
, and Netstat
. The signature for the
rule function on line 2 contains the parameters that correspond respectively
to the parsers specified in the decorator. All three parsers are required
so if any are not present in the input data, then the rule will not be called.
This also means that all three input parameters will have some value corresponding
to the parser objects. It is up to the rule to evaluate the object attributes
and methods to determine if the criteria is met to trigger the rule.
Rule Output
Rules can return multiple types of responses. If a rule is detecting some
problem and finds it, it should return make_fail
. If it is detecting a
problem and is sure the problem doesn’t exist, it should return make_pass
.
If it wants to return information not associated with a failure or success, it
should return make_info
.
To return a rule “hit”, return the result of make_fail
:
- class insights.core.plugins.make_fail(key, **kwargs)[source]
Returned by a rule to signal that its conditions have been met.
Example:
# completely made up package buggy = InstalledRpms.from_package("bash-3.4.23-1.el7") @rule(InstalledRpms) def report(installed_rpms): bash = installed_rpms.newest("bash") if bash == buggy: return make_fail("BASH_BUG_123", bash=bash) return make_pass("BASH", bash=bash)
To return a rule success, return the result of make_pass
:
- class insights.core.plugins.make_pass(key, **kwargs)[source]
Returned by a rule to signal that its conditions explicitly have not been met. In other words, the rule has all of the information it needs to determine that the system it’s analyzing is not in the state the rule was meant to catch.
An example rule might check whether a system is vulnerable to a well defined exploit or has a bug in a specific version of a package. If it can say for sure “the system does not have this exploit” or “the system does not have the buggy version of the package installed”, then it should return an instance of
make_pass
.Example:
# completely made up package buggy = InstalledRpms.from_package("bash-3.4.23-1.el7") @rule(InstalledRpms) def report(installed_rpms): bash = installed_rpms.newest("bash") if bash == buggy: return make_fail("BASH_BUG_123", bash=bash) return make_pass("BASH", bash=bash)
To return system info, return the result of make_info
:
- class insights.core.plugins.make_info(key, **kwargs)[source]
Returned by a rule to surface information about a system.
Example:
@rule(InstalledRpms) def report(rpms): bash = rpms.newest("bash") return make_info("BASH_VERSION", bash=bash.nvra)
Testing
Since the plugin itself is a fairly simple set of python functions,
individual functions can be easily unit tested. Unit tests are required
for all plugins and can be found in the rules/tests
directory of the
source. Unit tests are written using the usual x-unit based
unittests
module with some helpers from
pytest framework. pytest
is the used
test runner.
To run all unit tests with pytest:
py.test
Run a single single unit test one can:
py.test path/test_plugin_name.py::TestCaseClass::test_method
To get test results with coverage report:
py.test --cov=plugin_package
Feature Deprecation
Parsers and other parts of the framework go through periodic revisions and updates, and sometimes previously used features will be deprecated. This is a three step process:
An issue to deprecate the outdated feature is raised in GitHub. This allows discussion of the plans to deprecate this feature and the proposed replacement functionality.
The outdated function, method or class is marked as deprecated. Code using this feature now generates a warning when the tests are run, but otherwise works. At this stage anyone receiving a warning about pending deprecation SHOULD change over to using the new functionality or at least not using the deprecated version. The deprecation message MUST include information about how to replace the deprecated function.
Once sufficient time has elapsed, the outdated feature is removed. The py.test tests will fail with a fatal error, and any code checked in that uses deprecated features will not be able to be merged because of the tests failing. Anyone receiving a warning about deprecation MUST fix their code so that it no longer warns of deprecation.
The usual time between each step should be two minor versions of the Insights core.
To deprecate code, call the insights.util.deprecated()
function from
within the code that will be eventually removed, in the following manner:
Functions
from insights.util import deprecated
def old_feature(arguments):
deprecated(old_feature, "Use the new_feature() function instead", "3.1.25")
...
Class methods
from insights.util import deprecated
class ThingParser(Parser):
...
def old_method(self, *args, **kwargs):
deprecated(self.old_method, "Use the new_method() method instead", "3.1.25")
self.new_method(*args, **kwargs)
...
Class
from insights.util import deprecated
class ThingParser(Parser):
def __init__(self, *args, **kwargs):
deprecated(ThingParser, "Use the new_feature() function instead", "3.1.25")
super(ThingParser, self).__init__(*args, **kwargs)
...
The insights.util.deprecated()
function takes three arguments:
The
function
or method being deprecated. This is used to tell the user where the deprecated code is. Classes cannot be directly deprecated, and should instead emit a deprecation message in their__init__
method.The
solution
to using this deprecated. This is a descriptive string that should tell anyone using the deprecated function what to do in future. Examples might be:For a replaced parser: “Please use the
NewParser
parser in thenew_parser
module.”For a specific method being replaced by a general mechanism: “Please use the
search
method with the argumentsstate="LISTEN"
.”
The last
version
of insights-core that the functions will be available before it is removed. For example:For version 3.1.0 the last revision will be 3.1.25. If the deprecation message indicate that the last version is 3.1.25, the function will be removed in 3.2.0.
Insights-core release timeline
Version |
Expected release date |
---|---|
3.4.0 |
June 2024 |
3.5.0 |
December 2024 |
3.6.0 |
June 2025 |
3.7.0 |
December 2025 |
3.8.0 |
June 2026 |
Note
We bump the insights-core revision every week. Please refer the CHANGELOG.md file for more info.
The minor version will be bumped after every 25 revisions. For example, after 3.1.25, we would move to 3.2.0 except for 3.0.300 which marks the first planned release. After 3.0.300, we bump the minor version to 3.1.0.