Configuration Trees

Most configurations can be modeled as trees where a node has a name, some optional attributes, and some optional children. This includes systems that use YAML, JSON, and INI, as well as systems like httpd, nginx, multipath, logrotate, and many others that have custom formats. Many also have a primary configuration file with supplementary files included by special directives in the main file.

We have developed parsers for common configuration file formats as well as the custom formats of many systems. These parsers all construct a tree of the same primitive building blocks, and their combiners properly handle include directives. The final configuration for a given system is a composite of the primary and supplementary configuration files.

Since the configurations are parsed to the same primitives to build their trees, we can navigate them all using the same API.
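
To make the shared model concrete, here is a minimal sketch of such a tree. The Node class and the names helper are hypothetical stand-ins written for this tutorial, not the classes the parsers actually build; they only illustrate how an httpd section and an ini section can share one shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a config tree node (not the insights class)."""
    name: str
    attrs: List[object] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def names(node):
    """Return the names of a node's direct children, in document order."""
    return [child.name for child in node.children]


# An httpd-style section: <Directory "/var/www"> ... </Directory>
httpd = Node("root", children=[
    Node("Directory", ["/var/www"], [
        Node("AllowOverride", ["None"]),
        Node("Require", ["all", "granted"]),
    ]),
])

# An ini-style section: [main] followed by key=value pairs
ini = Node("root", children=[
    Node("main", [], [
        Node("gpgcheck", [1]),
        Node("plugins", [1]),
    ]),
])

# The same helper walks both trees because they share one shape.
print(names(httpd.children[0]))  # ['AllowOverride', 'Require']
print(names(ini.children[0]))    # ['gpgcheck', 'plugins']
```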

This tutorial will focus on the common API for accessing config trees. It uses httpd configuration as an example, but the API is exactly the same for other systems.

[1]:
from __future__ import print_function
import sys
sys.path.insert(0, "../..")
[2]:
from insights.combiners.httpd_conf import get_tree
from insights.parsr.query import *

conf = get_tree()

conf now contains the consolidated httpd configuration tree from my machine. The API that follows is exactly the same for the nginx, multipath, logrotate, and ini parsers. The YAML and JSON parsers expose the same API through a .doc attribute; they can’t expose it directly for backward compatibility reasons.

Basic Navigation

The configuration can be treated in some sense like a dictionary:

[3]:
conf["Alias"]
[3]:
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
[4]:
conf["Directory"]
[4]:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted
[5]:
conf["Directory"]["Options"]
[5]:
Options: Indexes FollowSymLinks
Options: None
Options: Indexes MultiViews FollowSymlinks
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec

Notice that the first pair of brackets is a query against the first level of the configuration tree. conf["Alias"] returns all of the “Alias” nodes. conf["Directory"] returns all of the “Directory” nodes.

A set of brackets after another set means to chain the queries using previous query results as the starting point. So, conf["Directory"]["Options"] first finds all of the “Directory” nodes, and then those are queried for their “Options” directives.
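
One way to picture the chaining is a result object whose brackets always query the children of whatever it currently holds. The sketch below is an illustration under that assumption; the Node and Result classes here are hypothetical and much simpler than the library's own.

```python
class Node:
    """Hypothetical node: a name, a list of attributes, and child nodes."""
    def __init__(self, name, attrs=(), children=()):
        self.name = name
        self.attrs = list(attrs)
        self.children = list(children)


class Result:
    """Hypothetical result set: wraps the nodes matched so far."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __getitem__(self, name):
        # Query the children of every node matched so far, so each
        # bracket starts from the results of the previous one.
        return Result(c for n in self.nodes for c in n.children if c.name == name)

    def __len__(self):
        return len(self.nodes)


root = Node("root", children=[
    Node("Directory", ["/"], [Node("Require", ["all", "denied"])]),
    Node("Directory", ["/var/www/html"], [
        Node("Options", ["Indexes", "FollowSymLinks"]),
        Node("Require", ["all", "granted"]),
    ]),
    Node("Alias", ["/icons/", "/usr/share/httpd/icons/"]),
])
conf = Result([root])

directories = conf["Directory"]          # first level of the tree
options = conf["Directory"]["Options"]   # children of those matches
print(len(directories), len(options))    # 2 1
```

Because "Options" only appears inside a Directory in this toy tree, conf["Options"] on its own would come back empty.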

Complex Queries

In addition to simple queries that match node names, more complex queries are supported. For example, to get the “Directory” node for “/”, we can do the following:

[6]:
conf["Directory", "/"]
[6]:
[Directory /]
    AllowOverride: none
    Require: all denied

The comma constructs a tuple, so conf["Directory", "/"] and conf[("Directory", "/")] are equivalent. The first element of the tuple exactly matches the node name, and subsequent elements exactly match any of the node’s attributes. Notice that this is still a query, and the result behaves like a list:

[7]:
conf["Directory", "/", "/var/www"]
[7]:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

That’s asking for Directory nodes with any attribute exactly matching any of “/” or “/var/www”. These can be chained with more brackets just like the simpler queries shown earlier.
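
The matching rule for tuple queries can be sketched as a small function. This is an illustrative assumption about the behavior described above, not the library's code; node_matches is a hypothetical name, and treating a one-element tuple as a name-only query is a choice made for this sketch.

```python
def node_matches(name, attrs, query):
    """Hypothetical matcher for a bracket query.

    A plain string matches the node name only. A tuple matches the
    name with its first element and needs at least one attribute to
    equal one of the remaining elements.
    """
    if not isinstance(query, tuple):
        return name == query
    wanted_name, wanted_values = query[0], query[1:]
    if name != wanted_name:
        return False
    if not wanted_values:
        return True  # assumption: a bare name in a tuple is name-only
    return any(attr in wanted_values for attr in attrs)


nodes = [
    ("Directory", ["/"]),
    ("Directory", ["/var/www"]),
    ("Directory", ["/var/www/html"]),
    ("Alias", ["/icons/", "/usr/share/httpd/icons/"]),
]
query = ("Directory", "/", "/var/www")
hits = [attrs[0] for name, attrs in nodes if node_matches(name, attrs, query)]
print(hits)  # ['/', '/var/www']
```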

Predicates

In addition to exact matches, predicates can be used to better express what you want:

[8]:
conf["Directory", startswith("/var/www")]
[8]:
[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted
[9]:
conf[contains("Icon")]
[9]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*
AddIcon: /icons/binary.gif .bin .exe
AddIcon: /icons/binhex.gif .hqx
AddIcon: /icons/tar.gif .tar
AddIcon: /icons/world2.gif .wrl .wrl.gz .vrml .vrm .iv
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip
AddIcon: /icons/a.gif .ps .ai .eps
AddIcon: /icons/layout.gif .html .shtml .htm .pdf
AddIcon: /icons/text.gif .txt
AddIcon: /icons/c.gif .c
AddIcon: /icons/p.gif .pl .py
AddIcon: /icons/f.gif .for
AddIcon: /icons/dvi.gif .dvi
AddIcon: /icons/uuencoded.gif .uu
AddIcon: /icons/script.gif .conf .sh .shar .csh .ksh .tcl
AddIcon: /icons/tex.gif .tex
AddIcon: /icons/bomb.gif core.
AddIcon: /icons/back.gif ..
AddIcon: /icons/hand.right.gif README
AddIcon: /icons/folder.gif ^^DIRECTORY^^
AddIcon: /icons/blank.gif ^^BLANKICON^^
DefaultIcon: /icons/unknown.gif
[10]:
conf[contains("Icon"), contains("zip")]
[10]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip

Predicates can be combined with boolean logic. Here are all the top level nodes with “Icon” in the name whose attributes contain both “CMP” and “zip”. Note the helper any_ (there’s also an all_), which succeeds when any attribute satisfies the wrapped predicate.

[11]:
conf[contains("Icon"), any_(contains("CMP")) & any_(contains("zip"))]
[11]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip

Here are the entries whose attributes all do not start with “/”:

[12]:
conf[contains("Icon"), all_(~startswith("/"))]
[12]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*

Several predicates are provided: startswith, endswith, contains, matches, lt, le, gt, ge, and eq. They can all be negated with ~ (not) and combined with & (boolean and) and | (boolean or).
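
For readers curious how ~, &, and | can work on predicates, here is a minimal sketch using Python operator overloading. The Predicate class is hypothetical, and startswith, contains, any_, and all_ are re-implemented here under the same names only for readability; the library's versions are more capable.

```python
class Predicate:
    """Hypothetical predicate wrapper supporting ~, & and |."""
    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return bool(self.func(value))

    def __invert__(self):
        return Predicate(lambda v: not self(v))

    def __and__(self, other):
        return Predicate(lambda v: self(v) and other(v))

    def __or__(self, other):
        return Predicate(lambda v: self(v) or other(v))


def startswith(prefix):
    return Predicate(lambda v: v.startswith(prefix))


def contains(text):
    return Predicate(lambda v: text in v)


def any_(pred):
    # Succeeds when at least one attribute in the list passes.
    return Predicate(lambda attrs: any(pred(a) for a in attrs))


def all_(pred):
    # Succeeds only when every attribute in the list passes.
    return Predicate(lambda attrs: all(pred(a) for a in attrs))


attrs = ["(CMP,/icons/compressed.gif)", "x-compress", "x-gzip"]
both = any_(contains("CMP")) & any_(contains("zip"))
print(both(attrs))                    # True
print(all_(~startswith("/"))(attrs))  # True
print(any_(startswith("/"))(attrs))   # False
```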

It’s also possible to filter results based on whether they’re a Section or a Directive.

[13]:
conf.find(startswith("Directory"))
[13]:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

DirectoryIndex: index.html

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted
[14]:
query = startswith("Directory")
print("Directives:")
print(conf.find(query).directives)
print()
print("Sections:")
print(conf.find(query).sections)
print()
print("Chained filtering:")
print(conf.find(query).sections["Options"])
Directives:
DirectoryIndex: index.html

Sections:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted


Chained filtering:
Options: Indexes FollowSymLinks
Options: None
Options: Indexes MultiViews FollowSymlinks
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec

Notice that conf[startswith("Dir")].sections is not the same as conf.sections[startswith("Dir")]. The first finds all the top level nodes that start with “Dir” and then filters those to just the sections. The second gets all of the top level sections and then searches their children for nodes starting with “Dir”.

[15]:
print("Top level Sections starting with 'Dir':")
print(conf[startswith("Dir")].sections)
print()
print("Children starting with 'Dir' of any top level Section:")
print(conf.sections[startswith("Dir")])
Top level Sections starting with 'Dir':
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted


Children starting with 'Dir' of any top level Section:
DirectoryIndex: index.html

Ignoring Case

All of the predicates parsr.query defines take an ignore_case keyword parameter. They also have versions with an i prefix that pass ignore_case=True for you. So startswith("abc", ignore_case=True) is the same as istartswith("abc"), etc.

It’s not possible to ignore case with simple dictionary-like access unless you use a predicate: conf[ieq("ifmodule")] gets all top level elements with a name equal to any case variant of “ifmodule”, whereas conf["ifmodule"] is a strict case match.
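
The relationship between the ignore_case parameter and the i-prefixed variants can be sketched as follows. These two functions are simplified, hypothetical re-implementations that return plain callables; they are not the library's predicates.

```python
def startswith(prefix, ignore_case=False):
    """Hypothetical predicate factory with an ignore_case switch."""
    if ignore_case:
        lowered = prefix.lower()
        return lambda value: value.lower().startswith(lowered)
    return lambda value: value.startswith(prefix)


def istartswith(prefix):
    """i-prefixed variant: the same predicate with ignore_case=True."""
    return startswith(prefix, ignore_case=True)


print(startswith("ifmod")("IfModule"))                    # False
print(startswith("ifmod", ignore_case=True)("IfModule"))  # True
print(istartswith("IFMOD")("IfModule"))                   # True
```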

Attribute Access

If you don’t have any predicates for a node, and its name doesn’t conflict with an attribute of the underlying object (which should be rare), you can use attribute access to query for it.

[16]:
conf.doc.Directory.Require
[16]:
Require: all denied
Require: all granted
Require: all granted
Require: all granted
Require: all granted
Require: method GET POST OPTIONS
Require: all granted

Query by Children

If you want to query for nodes based on values of their children, you can use a where clause. It has a few different modes of use.

The first is the same name and value queries as before:

[17]:
conf.doc.Directory.where("Require", "denied")
[17]:
[Directory /]
    AllowOverride: none
    Require: all denied

The second is by using the make_child_query helper that lets you combine multiple “top level” queries that include name and value queries.

[18]:
from insights.parsr.query import make_child_query as q

conf.doc.Directory.where(q("Require", "denied") | q("AllowOverride", "FileInfo"))
[18]:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

Note you can continue the traversal after a where:

[19]:
res = conf.doc.Directory.where(q("Require", "denied") | q("AllowOverride", "FileInfo"))
res.Options
[19]:
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec

The name and value queries inside of q can contain all of the predicates we’ve seen before, and q instances can be combined with & and | and negated with ~.

If you need to compare multiple attributes with each other or with other parts of the config structure, you can pass a function or lambda to where, and it will be used to test each entry at the current level.

[20]:
conf.doc.Directory.where(lambda d: "denied" in d.Require.value or "FileInfo" in d.AllowOverride.value)
[20]:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

Truth and Iteration

Nodes are “truthy” depending on whether they have children. They’re also iterable and indexable.

[21]:
res = conf["Blah"]
print("Boolean:", bool(res))
print("Length:", len(res))
print()
print("Iteration:")
for c in conf["Directory"]:
    print(c.value)
print()
print("Indexing:")
print(conf["Directory"][0].value)
print(conf["Directory"][first].value)
print(conf["Directory"][-1].value)
print(conf["Directory"][last].value)

Boolean: False
Length: 0

Iteration:
/
/var/www
/var/www/html
/var/www/cgi-bin
/usr/share/httpd/icons
/home/*/public_html
/usr/share/httpd/noindex

Indexing:
/
/
/usr/share/httpd/noindex
/usr/share/httpd/noindex

This is also true of conf itself:

[22]:
sorted(set(c.name for c in conf))
[22]:
['AddDefaultCharset',
 'AddIcon',
 'AddIconByEncoding',
 'AddIconByType',
 'Alias',
 'DNSSDEnable',
 'DefaultIcon',
 'Directory',
 'DocumentRoot',
 'EnableSendfile',
 'ErrorLog',
 'Files',
 'Group',
 'HeaderName',
 'IfModule',
 'IndexIgnore',
 'IndexOptions',
 'Listen',
 'LoadModule',
 'LocationMatch',
 'LogLevel',
 'ReadmeName',
 'ServerAdmin',
 'ServerRoot',
 'User']

Attributes

The individual results in a result set have a name, value, attributes, children, an immediate parent, a root, and context for their enclosing file that includes its path and their line within it. The code below shows different attributes available on individual entries and results.

[23]:
root = conf.find("ServerRoot")[0]
print("Node name:", root.name)
print("Value:", root.value) # gets the value if the entry has only one. Raises an exception if it has more than one.
print("Values:", conf.find("Options").values) # same as above except collects values of current results.
print()
print("Unique Values:", conf.find("Options").unique_values) # same as above except values are unique.
print()
print("Attributes:", root.attrs) # an entry may have multiple values
print("Children:", len(root.children)) # number of direct children of the entry.
print("Parent:", conf.find("Options")[0].parent.name)
print("Parents:", conf.find("Options").parents.values) # go up one level from the results.
print("Root:", "conf.find('LogFormat')[0].root # Omitted due to size") # root of current entry.
print("Roots:", "conf.find('LogFormat').roots # Omitted due to size")  # all roots of current results.
print("File: ", root.file_path) # path of the backing file. Not always available.
print("Original Line:", root.line) # raw line from the original source. Not always available.
print("Line Number:", root.lineno) # line number in source of the element. Not always available.
Node name: ServerRoot
Value: /etc/httpd
Values: ['Indexes FollowSymLinks', 'None', 'Indexes MultiViews FollowSymlinks', 'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec', '-Indexes']

Unique Values: ['-Indexes', 'Indexes FollowSymLinks', 'Indexes MultiViews FollowSymlinks', 'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec', 'None']

Attributes: ['/etc/httpd']
Children: 0
Parent: Directory
Parents: ['/var/www/html', '/var/www/cgi-bin', '/usr/share/httpd/icons', '/home/*/public_html', '^/+$']
Root: conf.find('LogFormat')[0].root # Omitted due to size
Roots: conf.find('LogFormat').roots # Omitted due to size
File:  /etc/httpd/conf/httpd.conf
Original Line: ServerRoot "/etc/httpd"
Line Number: 31
[24]:
port = conf.find("Listen").value
print(port)
print(type(port))
80
<class 'int'>

There’s also a .values property that will accumulate all of the attributes of multiple children that match a query. Multiple attributes from a single child are converted to a single string.

[25]:
conf["Directory"]["Options"].values
[25]:
['Indexes FollowSymLinks',
 'None',
 'Indexes MultiViews FollowSymlinks',
 'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec']

Useful functions

In addition to brackets, config trees support other functions for querying and navigating.

find

find searches the entire tree for the query you provide and returns a Result of all elements that match.

[26]:
conf.find("ServerRoot")
[26]:
ServerRoot: /etc/httpd
[27]:
conf.find("Alias")
[27]:
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
[28]:
conf.find("LogFormat")
[28]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

If you want the first or last match, access them with brackets as you would a list:

[29]:
print(conf.find("Alias")[0])
print(conf.find("Alias")[-1])
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
[30]:
r = conf.find("Boom")
print(type(r))
print(r)
<class 'insights.parsr.query.Result'>

find takes an additional parameter, roots, which defaults to False. If it is False, the matching entries are returned. If set to True, the unique set of ancestors of all matching results is returned.

[31]:
print('conf.find("LogFormat"):')
print(conf.find("LogFormat"))
print()
print('conf.find("LogFormat").parents:')
print(conf.find("LogFormat").parents)
conf.find("LogFormat"):
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

conf.find("LogFormat").parents:
[IfModule log_config_module]
    LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
    LogFormat: %h %l %u %t "%r" %>s %b common

    [IfModule logio_module]
        LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

    CustomLog: logs/ssl_request_log %t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x  %r  %b
    CustomLog: logs/access_log combined

[IfModule logio_module]
    LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

[32]:
conf.find(("IfModule", "logio_module"), "LogFormat")
[32]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
[33]:
conf.find("IfModule", ("LogFormat", "combinedio"))
[33]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

select

select is the primitive query function on which everything else is built. Its parameters operate just like find, and by default it queries like a find that only searches from the top of the configuration tree instead of walking subtrees.

To support the other cases, it takes two keyword arguments. deep=True causes it to search subtrees (default is deep=False). roots=True causes it to return the unique, top level nodes containing a match. This is true even when deep=True. If roots=False, it returns matching leaves instead of top level roots.

  • conf.find(*queries) = conf.select(*queries, deep=True, roots=False)

  • conf[query] = conf.select(query, deep=False, roots=False)

[34]:
print(conf.select("Alias"))
print()
print(conf.select("LogFormat") or "Nothing")
print(conf.select("LogFormat", deep=True))
print(conf.select("LogFormat", deep=True, roots=False))
print()
print(conf.select("LogFormat", deep=True, roots=False)[0])
print(conf.select("LogFormat", deep=True, roots=False)[-1])
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html

Nothing
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
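
The effect of the deep and roots flags described in this section can be sketched on a toy tree. The Node class, walk, and this name-only select are hypothetical simplifications; the real select accepts the full query syntax shown earlier.

```python
class Node:
    """Hypothetical node that records its parent when attached."""
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def walk(node):
    """Yield every descendant of node, depth first."""
    for child in node.children:
        yield child
        yield from walk(child)


def select(root, name, deep=False, roots=False):
    """Name-only sketch of select with the two keyword flags."""
    candidates = walk(root) if deep else root.children
    hits = [n for n in candidates if n.name == name]
    if not roots:
        return hits
    tops = []
    for hit in hits:
        while hit.parent is not root:   # climb to the top-level ancestor
            hit = hit.parent
        if hit not in tops:             # keep each ancestor once
            tops.append(hit)
    return tops


root = Node("root", [
    Node("Alias"),
    Node("IfModule", [
        Node("LogFormat"),
        Node("LogFormat"),
        Node("IfModule", [Node("LogFormat")]),
    ]),
])

print(len(select(root, "LogFormat")))                         # 0
print(len(select(root, "LogFormat", deep=True)))              # 3
print(len(select(root, "LogFormat", deep=True, roots=True)))  # 1
```

All three LogFormat hits share the same top-level IfModule in this toy tree, which is why roots=True collapses them to a single node.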

upto

Sometimes you’ve navigated down to a piece of data and want to work back up the tree from what you’ve found. You can do that by chaining .parent attributes or going all the way to the top with .roots. But what if your target isn’t at the top and is several ancestors away? You can pass a query to .upto that will work its way up the parents of the current results and stop when it finds those that match.

[35]:
conf.find(("Options", "Indexes")).upto("Directory")
[35]:
[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS
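
The upward walk can be pictured as a loop over .parent links that stops at the first match. The sketch below is a hypothetical, name-only version of that idea; its Node class and upto function are stand-ins and not the library's implementation.

```python
class Node:
    """Hypothetical node that records its parent when attached."""
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def upto(nodes, name):
    """For each node, climb its parents until one is called `name`."""
    found = []
    for node in nodes:
        parent = node.parent
        while parent is not None and parent.name != name:
            parent = parent.parent
        if parent is not None and parent not in found:
            found.append(parent)  # keep each matching ancestor once
    return found


options = Node("Options")
directory = Node("Directory", [Node("IfModule", [options])])
root = Node("root", [Node("VirtualHost", [directory])])

print([n.name for n in upto([options], "Directory")])  # ['Directory']
print(upto([options], "Location"))                     # []
```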

get_crumbs

What if you’ve issued a search and found something interesting but want to refine your query so you can see where the hits are within the structure? get_crumbs will show you the unique paths down the tree to your current results.

[36]:
print(conf.find("LogFormat"))
print()
print(conf.find("LogFormat").get_crumbs())
print()
print(conf.doc.IfModule.IfModule.LogFormat)
print()
print(conf.doc.IfModule.LogFormat)
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

['IfModule.IfModule.LogFormat', 'IfModule.LogFormat']

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common

Custom Predicates

It’s easy to create your own predicates to use with config trees. They come in parameterized and unparameterized types and can be used against names or attributes. If used in a name position, they’re passed the node’s name. If used in an attribute position, they’re passed the node’s attributes one at a time. If the predicate raises an exception because an attribute is of the wrong type, it’s considered False for that attribute. Note that other attributes of the node can still cause a True result.

[37]:
from insights.parsr.query.boolean import pred, pred2

is_ifmod = pred(lambda x: x == "IfModule")
is_user_mod = pred(lambda x: "user" in x)
divisible_by = pred2(lambda in_val, divisor: (in_val % divisor) == 0)
[38]:
print("Num IfModules:", len(conf[is_ifmod]))
print("User mod checks:", len(conf.find(("IfModule", is_user_mod))))
print("Div by 10?", conf["Listen", divisible_by(10)] or "No matches")
print("Div by 3?", conf["Listen", divisible_by(3)] or "No matches")
Num IfModules: 8
User mod checks: 1
Div by 10? Listen: 80
Div by 3? No matches
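
The rule that an exception counts as False for one attribute, while other attributes can still match, can be sketched like this. The pred and pred2 functions below are hypothetical re-implementations that return plain callables, and any_attr is a helper invented for the illustration; none of them are the library's versions.

```python
def pred(func):
    """Hypothetical wrapper: a raised exception counts as 'no match'."""
    def safe(value):
        try:
            return bool(func(value))
        except Exception:
            return False
    return safe


def pred2(func):
    """Parameterized form: pred2(f)(arg) builds a one-argument predicate."""
    def factory(arg):
        return pred(lambda value: func(value, arg))
    return factory


def any_attr(predicate, attrs):
    """True if the predicate passes for at least one attribute."""
    return any(predicate(a) for a in attrs)


divisible_by = pred2(lambda in_val, divisor: (in_val % divisor) == 0)

print(any_attr(divisible_by(10), [80]))        # True
print(any_attr(divisible_by(3), [80]))         # False
# None % 10 raises TypeError, which is treated as False for that
# attribute; the second attribute still produces a match.
print(any_attr(divisible_by(10), [None, 80]))  # True
```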