Insights Core Datasource Registry

This notebook shows the recommended method of writing datasources.

It also shows how to register datasources with Insights Core to provide alternative methods of collection while still taking advantage of our parser and combiner catalog.

It assumes familiarity with datasources from the Standard Components section of the Insights Core Tutorial.

[1]:
import sys
sys.path.insert(0, "../..")
[2]:
from insights import run
from insights.core import dr
from insights.core.spec_factory import simple_file, simple_command

Fixing datasource names

The simplest way to define a datasource is with a helper class from insights.core.spec_factory.

However, if you use one of these to define a datasource at the module level, its name is just the name of the helper class, which says nothing about what the datasource collects.

[3]:
hosts = simple_file("/etc/hosts")
print(dr.get_name(hosts))
insights.core.spec_factory.simple_file

We can fix that by including it in a subclass of insights.core.spec_factory.SpecSet.

This is the recommended way of writing datasources.

[4]:
from insights.core.spec_factory import SpecSet

class MySpecs(SpecSet):
    hosts = simple_file("/etc/hosts")

print(dr.get_name(MySpecs.hosts))
__main__.MySpecs.hosts
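
The effect of placing a datasource inside a class can be illustrated without Insights Core. The sketch below is a toy model, not the library's implementation: `NamedSource` and `ToySpecs` are hypothetical names, and it assumes Python's `__set_name__` hook as the naming mechanism, which may differ from what SpecSet does internally.

```python
class NamedSource:
    """Hypothetical stand-in for a datasource; not part of Insights Core."""

    def __init__(self, path):
        self.path = path
        # Until a class claims this object, the only name available
        # is the generic name of its type.
        self.name = f"{type(self).__module__}.{type(self).__qualname__}"

    def __set_name__(self, owner, attr):
        # Python calls this hook when the object is assigned as a class
        # attribute, which lets it learn the owning class and attribute name.
        self.name = f"{owner.__module__}.{owner.__qualname__}.{attr}"


# Defined at module level: the name only identifies the helper type.
loose = NamedSource("/etc/hosts")


class ToySpecs:
    # Defined inside a class: the name now identifies this datasource.
    hosts = NamedSource("/etc/hosts")


print(loose.name)
print(ToySpecs.hosts.name)
```

The first name ends in `NamedSource` and the second in `ToySpecs.hosts`, which mirrors the difference between the two outputs shown above.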

Making datasources dynamic

What if you have datasources on which many downstream components depend, and you want to provide different ways of collecting the data they represent? Maybe you want to execute a command in one context but read from a file in another. Parsers depend on a single datasource, and jamming multiple collection methods into a single implementation isn’t attractive.

Instead, you can define a subclass of insights.core.spec_factory.SpecSet that has insights.core.spec_factory.RegistryPoint instances instead of regular datasources. Then you can provide implementations for the registry points in the form of datasources that are members of subclasses of your original class. This keeps the alternative implementations cleanly separated while allowing parsers to depend on a single component.

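Note that this doesn’t work like normal class inheritance, although it uses the class inheritance mechanism: a datasource in a subclass does not override the registry point of the same name. It is registered as one of that registry point's implementations instead.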

[5]:
from insights.core.spec_factory import RegistryPoint
from insights.core.context import ExecutionContext, HostContext

# We'll use HostContext and OtherContext as our alternatives.

class OtherContext(ExecutionContext):
    pass
[6]:
# Define the components that your downstream components should depend on.

class TheSpecs(SpecSet):
    hostname = RegistryPoint()
    fstab = simple_file("/etc/fstab", context=HostContext)


# Provide different implementations for hostname by subclassing TheSpecs and
# giving the datasources names that match their corresponding registry points.

class HostSpecs(TheSpecs):
    hostname = simple_command("/usr/bin/hostname", context=HostContext)

class OtherSpecs(TheSpecs):
    hostname = simple_file("/etc/hostname", context=OtherContext)

# Note that we don't and actually can't provide an alternative for TheSpecs.fstab
# since it's not a RegistryPoint.

Downstream components should depend on TheSpecs.hostname, and the implementation that actually runs and backs that component will depend on the context in which you run.

[7]:
results = run(TheSpecs.hostname, context=HostContext)
print(results[TheSpecs.hostname])
CommandOutputProvider("'/usr/bin/hostname'")
[8]:
results = run(TheSpecs.hostname, context=OtherContext)
print(results[TheSpecs.hostname])
TextFileProvider("'/etc/hostname'")
[9]:
results = run(TheSpecs.fstab, context=HostContext)
print(results[TheSpecs.fstab])
TextFileProvider("'/etc/fstab'")

RegistryPoint instances in SpecSet subclasses are converted to special datasources that simply check their dependencies and return the last one that succeeds. So, TheSpecs.hostname is just a datasource. When HostSpecs subclasses TheSpecs, the class machinery recognizes that HostSpecs.hostname is a datasource with the same name as a RegistryPoint in an immediate superclass. When that happens, the subclass's datasource is added as a dependency of the superclass's datasource.
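
The registration step described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the Insights Core implementation: every `Toy*` class, `HostCtx`, and `OtherCtx` is hypothetical, `__init_subclass__` stands in for the real class machinery, and a datasource "succeeds" here simply when its context matches the one being run.

```python
class ToyDatasource:
    """Hypothetical datasource that collects in exactly one context."""

    def __init__(self, context, collect):
        self.context = context
        self.collect = collect


class ToyRegistryPoint:
    """Hypothetical registry point: its implementations are its dependencies."""

    def __init__(self):
        self.implementations = []  # kept in registration order

    def resolve(self, context):
        # Check every dependency and keep the last one that succeeds.
        winner = None
        for impl in self.implementations:
            if impl.context is context:
                winner = impl
        return winner.collect() if winner is not None else None


class ToySpecSet:
    """Hypothetical base class that wires subclasses to registry points."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, value in vars(cls).items():
            if not isinstance(value, ToyDatasource):
                continue
            # Only immediate superclasses are inspected.
            for base in cls.__bases__:
                point = vars(base).get(name)
                if isinstance(point, ToyRegistryPoint):
                    point.implementations.append(value)


class HostCtx:
    pass


class OtherCtx:
    pass


class TheToySpecs(ToySpecSet):
    hostname = ToyRegistryPoint()
    fstab = ToyDatasource(HostCtx, lambda: "file:/etc/fstab")


class HostToySpecs(TheToySpecs):
    hostname = ToyDatasource(HostCtx, lambda: "command:/usr/bin/hostname")
    # Not registered anywhere: TheToySpecs.fstab is not a registry point.
    fstab = ToyDatasource(HostCtx, lambda: "local override")


class OtherToySpecs(TheToySpecs):
    hostname = ToyDatasource(OtherCtx, lambda: "file:/etc/hostname")


print(TheToySpecs.hostname.resolve(HostCtx))   # command:/usr/bin/hostname
print(TheToySpecs.hostname.resolve(OtherCtx))  # file:/etc/hostname
print(TheToySpecs.fstab.collect())             # file:/etc/fstab
```

In this model the subclass attribute and the registry point remain two separate objects: `HostToySpecs.hostname` is still an ordinary datasource, while `TheToySpecs.hostname` only gains a new dependency.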

If the datasources in each subclass depend on different contexts, only one of them will fire. That’s why when we ran with HostContext, the command was run, but when we ran with OtherContext, the file was collected.

Notice that TheSpecs.fstab can be run, too. Because it is a regular datasource rather than a RegistryPoint, a datasource of the same name in a subclass would not have been registered with the superclass. It would instead have stayed local to that subclass.

Note also that the datasources in the alternative implementation classes aren’t special in any other way. You can run them directly, and components can depend on them if you want, although if you’re providing them as an implementation to a registry point, components really should depend on that instead of a particular implementation.

What happens if you have multiple subclass implementations for a given registry point, and more than one of them depends on the same context? In that case, the last one to be registered for that context is the one that runs.
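
The tie-break rule can be pictured as a lookup over an ordered list of registrations. The snippet is only an illustration of "last registered wins": `registry`, `register`, and `pick` are hypothetical helpers, and contexts are represented by plain strings rather than real context classes.

```python
# Hypothetical registry: (context name, implementation label) pairs,
# stored in the order in which they were registered.
registry = []


def register(context, label):
    """Record an implementation for a context."""
    registry.append((context, label))


def pick(context):
    """Return the last implementation registered for the context, if any."""
    matches = [label for ctx, label in registry if ctx == context]
    return matches[-1] if matches else None


register("HostContext", "first_impl")
register("OtherContext", "other_impl")
register("HostContext", "second_impl")

print(pick("HostContext"))    # second_impl
print(pick("OtherContext"))   # other_impl
print(pick("NoSuchContext"))  # None
```

Under this model, the order in which implementations are registered decides which one runs, so two implementations for the same context never run together.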

Registering implementations for standard datasources

Providing alternative implementations for the standard Insights Core datasources is easy. The datasources on which the core parsers depend are all defined as RegistryPoints on the Specs class in insights.specs.

[10]:
from insights.specs import Specs

class UseThisInstead(Specs):
    hostname = simple_file("/etc/hostname", context=OtherContext)

results = run(Specs.hostname, context=OtherContext)
print(results[Specs.hostname])
print(results.get(Specs.hosts))
TextFileProvider("'/etc/hostname'")
None

Notice that Specs.hosts didn’t run! That’s because we haven’t loaded the module containing the default implementations, and we’ve only provided an implementation for Specs.hostname. Also, none of the defaults depend on OtherContext anyway.

What if you want to use the default datasources but only want to override a few of them, even for the same context?

Create a subclass that does exactly that:

[11]:
from pprint import pprint
from insights.specs import default  # load the default implementations

# Note that the default context is HostContext unless otherwise specified
# with a context= keyword argument.
class SpecialSpecs(Specs):
    hostname = simple_file("/etc/hostname")

results = run(Specs.hostname)

# show that the default didn't run
pprint(results[Specs.hostname])
pprint(results[SpecialSpecs.hostname])
pprint(results.get(default.DefaultSpecs.hostname, None))
print("")  # blank line between the two sets of results

results = run(Specs.hosts)

# show that the default ran
pprint(results[Specs.hosts])
pprint(results[default.DefaultSpecs.hosts])
TextFileProvider("'/etc/hostname'")
TextFileProvider("'/etc/hostname'")
None

TextFileProvider("'/etc/hosts'")
TextFileProvider("'/etc/hosts'")

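If multiple datasources provide implementations for the same registry point and depend on the same context, then the last implementation to load is the one that is executed under that context. Here, SpecialSpecs was defined after the default module was imported, so SpecialSpecs.hostname ran in place of DefaultSpecs.hostname.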