############
Insights API
############

Input Data Formats
==================

Before any data reaches the rules framework, it obviously has to be generated.
There are currently several input data formats that can be processed by
insights-core:

SOSReports
----------

A SOSReport_ is a command-line tool for Red Hat Enterprise Linux (and other
systems) to collect configuration and diagnostic information from the system.

.. _SOSReport: https://github.com/sosreport/sos

Insights Archives
-----------------

These archives have been designed from the ground up to fit the Insights use
case -- that is, to be automatically uploaded on a daily basis. This means that
the data contained in the archive is exactly what is required to effectively
process all currently-developed rules -- and nothing more.

In addition, there are several features built in to the insights-client_
package (which is the tool that creates Insights archives) to better meet Red
Hat customers' security and privacy concerns.

.. _insights-client: https://github.com/redhataccess/insights-client

Blacklists
    A list of files or commands to never upload.

Filters
    A set of simple strings used to filter files before adding them to the
    archive.

Dynamic Uploader Configuration
    The client will download a configuration file from Red Hat (by default)
    every time it's executed that will list out every file to collect and
    command to run, including filters to apply to each file, specified by rule
    plugins. The configuration is signed, which should be verified by the
    client before using it to collect data.

These features allow these archives to be processed quickly and more securely
in the Insights production environment. On the other hand, the reduced data
set narrows the scope of uses to be Insights-specific.

Execution Model
===============

To build rules effectively, one should have a general idea of how data is
executed against each rule.
At a high level:

- Each unit of input data is mapped to a symbolic name.
- Each parser that "subscribes" to that symbolic name is executed with the
  given content as part of a ``Context`` object.
- The outputs of all parsers are sorted by host.
- For each host, every rule is invoked with the local context, populated by
  parsers from the same plugin, and the shared context, the combination of all
  shared parser outputs for this particular host.
- The outputs of all rules are returned, along with various other bits of
  metadata, to the client, depending on what invoked the rules framework.

.. _context-label:

Contexts
========

The term ``Context`` refers to the context of the information that is
collected and evaluated by Insights. Examples of context are Host Context
(directly collected from a host), Host Archive Context (uploaded Insights
archive), SOSReports (uploaded SOSReport archive), and Docker Image Context
(directly collected from a Docker image). The context determines which data
sources are collected, and that in turn determines the hierarchy of parsers,
collectors and rules that are executed. Contexts enable different collection
methods for data for each unique context, and also provide a default set of
data sources that are common among one or more contexts. All available
contexts are defined in the module :py:mod:`insights.core.context`.

.. _datasources-ref:

Data Sources
============

``Data Sources`` define how data processed by Insights is collected. Each data
source is specific to a unique set of data. For example, a data source is
defined for the contents of the file ``/etc/hosts`` and for the output of the
command ``/sbin/fdisk -l``. The default data sources provide the primary data
collection specifications for all contexts and are located in
:py:class:`insights.specs.default.DefaultSpecs`.

Each specific ``Context`` may override a default data source to provide a
different collection specification.
For instance, when the Insights client collects the ``fdisk -l`` information
it will use the default datasource and execute the command on the target
machine. This is the :py:class:`insights.core.context.HostContext`. The
Insights client stores that information as a file in an archive. When the
client uploads that information to the Red Hat Insights service it is
processed in the :py:class:`insights.core.context.HostArchiveContext`. Because
the ``fdisk -l`` data is now in a file in the archive, the data sources defined
in :py:class:`insights.specs.insights_archive.InsightsArchiveSpecs` are used
instead. In this case Insights will collect the data from a file named
``insights_commands/fdisk_-l``.

The command type datasources in ``default.py`` (``simple_command`` and
``foreach_execute``) only target ``HostContext``. File based datasources fire
for any context annotated with ``@fs_root`` (``HostContext``,
``InsightsArchiveContext``, ``SosArchiveContext``, ``DockerImageContext``,
etc.). That's why the ``*_archive.py`` files need a definition for every
command, but only for those files whose definitions differ from
``default.py``.

Also, the order in which spec modules load matters. Say we have two classes
containing specs, A and B. If B loads after A and both A and B have entries
for ``hostname``, the one that fires depends on the context that each one
targets. E.g. if ``A.hostname`` targets ``HostContext`` and ``B.hostname``
targets ``InsightsArchiveContext``, then they'll each fire for whichever
context is loaded. But if both ``A.hostname`` and ``B.hostname`` target
``HostContext``, the datasource in the class that loads last will win for that
context.

While data sources are specific to the context, the purpose of the data source
hierarchy is to provide a consistent set of input to ``Parsers``. For this
reason ``Parsers`` should generally depend upon
:py:class:`insights.specs.Specs` data sources.

This hierarchy allows a developer to override a particular datasource.
For instance, if a developer found a bug in a sos_archive datasource, she
could create her own class inheriting from
:py:class:`insights.core.spec_factory.SpecSet`, create the datasource in it,
and have the datasource target ``SosArchiveContext``. So long as the module
containing her class loads after ``default.py``, ``insights_archive.py``, and
``sos_archive.py``, her definition will win for that datasource when running
under a ``SosArchiveContext``.

.. _specification-factories:

Specification Factories
-----------------------

Data sources may utilize various methods called spec factories for collection
of information. Collection from a file (``/etc/hosts``) and from a command
(``/sbin/fdisk -l``) are two of the most common. These are implemented by the
:py:func:`insights.core.spec_factory.simple_file` and
:py:func:`insights.core.spec_factory.simple_command` spec factories
respectively. All of the spec factories currently available for the creation
of data sources are listed below.

:py:func:`insights.core.spec_factory.simple_file`
    simple_file collects the contents of files, for example::

        auditd_conf = simple_file("/etc/audit/auditd.conf")
        audit_log = simple_file("/var/log/audit/audit.log")

:py:func:`insights.core.spec_factory.simple_command`
    simple_command collects the output from a command, for example::

        blkid = simple_command("/sbin/blkid -c /dev/null")
        brctl_show = simple_command("/usr/sbin/brctl show")

:py:func:`insights.core.spec_factory.glob_file`
    glob_file collects the contents of each file matching the glob pattern(s).
    glob_file can also take a list of patterns, as well as an ``ignore``
    keyword argument: a regular expression telling it which of the matching
    files to throw out, for example::

        httpd_conf = glob_file(["/etc/httpd/conf/httpd.conf", "/etc/httpd/conf.d/*.conf"])
        ifcfg = glob_file("/etc/sysconfig/network-scripts/ifcfg-*")
        rabbitmq_logs = glob_file("/var/log/rabbitmq/rabbit@*.log", ignore=".*rabbit@.*(?

``pytest`` is the test runner used.
To run all unit tests with pytest::

    py.test

To run a single unit test::

    py.test path/test_plugin_name.py::TestCaseClass::test_method

To get test results with a coverage report::

    py.test --cov=plugin_package

Feature Deprecation
===================

Parsers and other parts of the framework go through periodic revisions and
updates, and sometimes previously used features will be deprecated. This is a
three step process:

1. An issue to deprecate the outdated feature is raised in GitHub. This
   allows discussion of the plans to deprecate this feature and the proposed
   replacement functionality.
2. The outdated function, method or class is marked as deprecated. Code using
   this feature now generates a warning when the tests are run, but otherwise
   works. At this stage anyone receiving a warning about pending deprecation
   SHOULD change over to using the new functionality, or at least stop using
   the deprecated version. The deprecation message MUST include information
   about how to replace the deprecated function.
3. Once sufficient time has elapsed, the outdated feature is removed. The
   py.test tests will fail with a fatal error, and any code checked in that
   uses deprecated features cannot be merged because of the failing tests.
   Anyone receiving a warning about deprecation MUST fix their code so that it
   no longer warns of deprecation.

The usual time between each step should be two minor versions of the Insights
core.

To deprecate code, call the :py:func:`insights.util.deprecated` function from
within the code that will be eventually removed, in the following manner:

Functions
---------

.. code-block:: python

    from insights.util import deprecated

    def old_feature(arguments):
        deprecated(old_feature, "Use the new_feature() function instead", "3.1.25")
        ...

Class methods
-------------

.. code-block:: python

    from insights.util import deprecated

    class ThingParser(Parser):
        ...

        def old_method(self, *args, **kwargs):
            deprecated(self.old_method, "Use the new_method() method instead", "3.1.25")
            self.new_method(*args, **kwargs)
        ...

Class
-----

.. code-block:: python

    from insights.util import deprecated

    class ThingParser(Parser):
        def __init__(self, *args, **kwargs):
            deprecated(ThingParser, "Use the new_feature() function instead", "3.1.25")
            super(ThingParser, self).__init__(*args, **kwargs)
        ...

The :py:func:`insights.util.deprecated` function takes three arguments:

- The ``function`` or method being deprecated. This is used to tell the user
  where the deprecated code is. Classes cannot be directly deprecated, and
  should instead emit a deprecation message in their ``__init__`` method.
- The ``solution`` to using this deprecated feature. This is a descriptive
  string that should tell anyone using the deprecated function what to do in
  future. Examples might be:

  - For a replaced parser: "Please use the ``NewParser`` parser in the
    ``new_parser`` module."
  - For a specific method being replaced by a general mechanism: "Please use
    the ``search`` method with the arguments ``state="LISTEN"``."

- The last ``version`` of insights-core in which the function will be
  available before it is removed. For example:

  - For version 3.1.0 the last revision will be 3.1.25. If the deprecation
    message indicates that the last version is 3.1.25, the function will be
    removed in 3.2.0.

Insights-core release timeline
------------------------------

.. table::
    :widths: auto

    =======  =====================
    Version  Expected release date
    =======  =====================
    3.4.0    June 2024
    3.5.0    December 2024
    3.6.0    June 2025
    3.7.0    December 2025
    3.8.0    June 2026
    =======  =====================

.. note::
    - We bump the insights-core revision every week. Please refer to the
      CHANGELOG.md file for more info.
    - The minor version will be bumped after every 25 revisions. For example,
      after 3.1.25, we would move to 3.2.0, except for 3.0.300, which marks
      the first planned release. After 3.0.300, we bump the minor version to
      3.1.0.
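
The version arithmetic described above (25 revisions per minor version, with
3.0.300 as the one exception) can be sketched in a few lines of Python. This is
only an illustration of the stated rule: the helper names ``last_revision`` and
``removal_version`` are hypothetical and are not part of insights-core.

.. code-block:: python

    # Illustrative helpers only; these names do not exist in insights-core.
    REVISIONS_PER_MINOR = 25            # "bumped after every 25 revisions"
    FIRST_RELEASE_LAST_REVISION = 300   # the 3.0.x series ran up to 3.0.300


    def parse_version(version):
        """Split a 'major.minor.revision' string into three integers."""
        major, minor, revision = (int(part) for part in version.split("."))
        return major, minor, revision


    def last_revision(version):
        """Return the final revision of the minor series containing `version`."""
        major, minor, _ = parse_version(version)
        if (major, minor) == (3, 0):
            last = FIRST_RELEASE_LAST_REVISION
        else:
            last = REVISIONS_PER_MINOR
        return "{}.{}.{}".format(major, minor, last)


    def removal_version(last_version):
        """Return the version in which a feature whose deprecation message
        names `last_version` is expected to be removed (the next minor)."""
        major, minor, _ = parse_version(last_version)
        return "{}.{}.0".format(major, minor + 1)


    print(last_revision("3.1.0"))       # 3.1.25
    print(removal_version("3.1.25"))    # 3.2.0
    print(removal_version("3.0.300"))   # 3.1.0

A value computed this way is what would be passed as the third argument to
:py:func:`insights.util.deprecated` when marking a feature as deprecated.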