Configuration Trees
Most configurations can be modeled as trees where a node has a name, some optional attributes, and some optional children. This includes systems that use YAML, JSON, and INI as well as systems like httpd, nginx, multipath, logrotate and many others that have custom formats. Many also have a primary configuration file with supplementary files included by special directives in the main file.
We have developed parsers for common configuration file formats as well as the custom formats of many systems. These parsers all construct a tree of the same primitive building blocks, and their combiners properly handle include directives. The final configuration for a given system is a composite of the primary and supplementary configuration files.
Since the configurations are parsed to the same primitives to build their trees, we can navigate them all using the same API.
This tutorial will focus on the common API for accessing config trees. It uses httpd configuration as an example, but the API is exactly the same for other systems.
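To make the tree model described above concrete, here is a minimal sketch in plain Python. The Node class and its fields are hypothetical stand-ins for illustration only; they are not the classes the parsers actually build, and the real API is shown in the cells that follow.

```python
class Node:
    """Hypothetical stand-in for a config tree entry (not the insights class)."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = list(attrs or [])
        self.children = list(children or [])
        self.parent = None
        for child in self.children:
            child.parent = self

    def __getitem__(self, name):
        # Return the direct children whose name matches exactly.
        return [c for c in self.children if c.name == name]


# A directive is a node without children; a section is a node with children.
httpd = Node("root", children=[
    Node("ServerRoot", attrs=["/etc/httpd"]),
    Node("Directory", attrs=["/"], children=[
        Node("AllowOverride", attrs=["none"]),
        Node("Require", attrs=["all", "denied"]),
    ]),
])

directory = httpd["Directory"][0]
print(directory.attrs)                       # ['/']
print([c.name for c in directory.children])  # ['AllowOverride', 'Require']
```

Because every format is reduced to the same name, attributes, and children shape, one navigation API can serve them all.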
[1]:
from __future__ import print_function
import sys
sys.path.insert(0, "../..")
[2]:
from insights.combiners.httpd_conf import get_tree
from insights.parsr.query import *
conf = get_tree()
conf now contains the consolidated httpd configuration tree from my machine. The API that follows is exactly the same for nginx, multipath, logrotate, and ini parsers. Yaml and Json parsers have a .doc attribute that exposes the same API. They couldn’t do so directly for backward compatibility reasons.
Complex Queries
In addition to simple queries that match node names, more complex queries are supported. For example, to get the “Directory” node for “/”, we can do the following:
[6]:
conf["Directory", "/"]
[6]:
[Directory /]
AllowOverride: none
Require: all denied
The comma constructs a tuple, so conf["Directory", "/"] and conf[("Directory", "/")] are equivalent. The first element of the tuple exactly matches the node name, and subsequent elements exactly match any of the node’s attributes. Notice that this is still a query, and the result behaves like a list:
[7]:
conf["Directory", "/", "/var/www"]
[7]:
[Directory /]
AllowOverride: none
Require: all denied
[Directory /var/www]
AllowOverride: None
Require: all granted
That’s asking for Directory nodes with any attribute exactly matching any of “/” or “/var/www”. These can be chained with more brackets just like the simpler queries shown earlier.
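The matching rule for tuple queries can be sketched in a few lines of plain Python. The function tuple_query_matches and the sample data are hypothetical and only mirror the behavior described above (exact name match, then any attribute equal to any listed value); the library's own matching code may differ.

```python
def tuple_query_matches(name, attrs, query):
    """Hypothetical helper: does a (name, *values) query match a node?"""
    if not isinstance(query, tuple):
        query = (query,)
    wanted_name, wanted_values = query[0], query[1:]
    if name != wanted_name:
        return False
    # With no values given, the name match alone is enough.
    if not wanted_values:
        return True
    # Any attribute equal to any of the requested values counts as a hit.
    return any(attr in wanted_values for attr in attrs)


nodes = [
    ("Directory", ["/"]),
    ("Directory", ["/var/www"]),
    ("Directory", ["/var/www/html"]),
    ("Alias", ["/icons/", "/usr/share/httpd/icons/"]),
]
query = ("Directory", "/", "/var/www")
hits = [attrs[0] for name, attrs in nodes if tuple_query_matches(name, attrs, query)]
print(hits)  # ['/', '/var/www']
```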
Predicates
In addition to exact matches, predicates can be used to better express what you want:
[8]:
conf["Directory", startswith("/var/www")]
[8]:
[Directory /var/www]
AllowOverride: None
Require: all granted
[Directory /var/www/html]
Options: Indexes FollowSymLinks
AllowOverride: None
Require: all granted
[Directory /var/www/cgi-bin]
AllowOverride: None
Options: None
Require: all granted
[9]:
conf[contains("Icon")]
[9]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*
AddIcon: /icons/binary.gif .bin .exe
AddIcon: /icons/binhex.gif .hqx
AddIcon: /icons/tar.gif .tar
AddIcon: /icons/world2.gif .wrl .wrl.gz .vrml .vrm .iv
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip
AddIcon: /icons/a.gif .ps .ai .eps
AddIcon: /icons/layout.gif .html .shtml .htm .pdf
AddIcon: /icons/text.gif .txt
AddIcon: /icons/c.gif .c
AddIcon: /icons/p.gif .pl .py
AddIcon: /icons/f.gif .for
AddIcon: /icons/dvi.gif .dvi
AddIcon: /icons/uuencoded.gif .uu
AddIcon: /icons/script.gif .conf .sh .shar .csh .ksh .tcl
AddIcon: /icons/tex.gif .tex
AddIcon: /icons/bomb.gif core.
AddIcon: /icons/back.gif ..
AddIcon: /icons/hand.right.gif README
AddIcon: /icons/folder.gif ^^DIRECTORY^^
AddIcon: /icons/blank.gif ^^BLANKICON^^
DefaultIcon: /icons/unknown.gif
[10]:
conf[contains("Icon"), contains("zip")]
[10]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip
Predicates can be combined with boolean logic. Here are all the top level nodes with “Icon” in the name and attributes that contain “CMP” and “zip”. Note the helper any_, which succeeds if any attribute matches (there’s also an all_, which requires every attribute to match).
[11]:
conf[contains("Icon"), any_(contains("CMP")) & any_(contains("zip"))]
[11]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
Here are the entries with all attributes not starting with “/”:
[12]:
conf[contains("Icon"), all_(~startswith("/"))]
[12]:
AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*
Several predicates are provided: startswith, endswith, contains, matches, lt, le, gt, ge, and eq. They can all be negated with ~ (not) and combined with & (boolean and) and | (boolean or).
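One common way to get predicates that compose with ~, & and | is operator overloading. The sketch below is a toy illustration of that idea, not the library's implementation: SketchPred and the sk_-prefixed helpers are hypothetical names, and the attribute lists assume the directive values are split on whitespace.

```python
class SketchPred:
    """Toy predicate supporting &, | and ~ (an illustration, not the library class)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return bool(self.func(value))

    def __and__(self, other):
        return SketchPred(lambda v: self(v) and other(v))

    def __or__(self, other):
        return SketchPred(lambda v: self(v) or other(v))

    def __invert__(self):
        return SketchPred(lambda v: not self(v))


def sk_startswith(prefix):
    return SketchPred(lambda v: v.startswith(prefix))


def sk_contains(text):
    return SketchPred(lambda v: text in v)


def sk_any(pred):
    # Succeeds if at least one attribute in the list satisfies pred.
    return SketchPred(lambda attrs: any(pred(a) for a in attrs))


def sk_all(pred):
    # Succeeds only if every attribute in the list satisfies pred.
    return SketchPred(lambda attrs: all(pred(a) for a in attrs))


encoding_attrs = ["(CMP,/icons/compressed.gif)", "x-compress", "x-gzip"]
icon_attrs = ["/icons/compressed.gif", ".Z", ".z", ".tgz", ".gz", ".zip"]

cmp_and_zip = sk_any(sk_contains("CMP")) & sk_any(sk_contains("zip"))
no_leading_slash = sk_all(~sk_startswith("/"))

print(cmp_and_zip(encoding_attrs), cmp_and_zip(icon_attrs))            # True False
print(no_leading_slash(encoding_attrs), no_leading_slash(icon_attrs))  # True False
```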
It’s also possible to filter results based on whether they’re a Section or a Directive.
[13]:
conf.find(startswith("Directory"))
[13]:
[Directory /]
AllowOverride: none
Require: all denied
[Directory /var/www]
AllowOverride: None
Require: all granted
[Directory /var/www/html]
Options: Indexes FollowSymLinks
AllowOverride: None
Require: all granted
DirectoryIndex: index.html
[Directory /var/www/cgi-bin]
AllowOverride: None
Options: None
Require: all granted
[Directory /usr/share/httpd/icons]
Options: Indexes MultiViews FollowSymlinks
AllowOverride: None
Require: all granted
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
[Directory /usr/share/httpd/noindex]
AllowOverride: None
Require: all granted
[14]:
query = startswith("Directory")
print("Directives:")
print(conf.find(query).directives)
print()
print("Sections:")
print(conf.find(query).sections)
print()
print("Chained filtering:")
print(conf.find(query).sections["Options"])
Directives:
DirectoryIndex: index.html
Sections:
[Directory /]
AllowOverride: none
Require: all denied
[Directory /var/www]
AllowOverride: None
Require: all granted
[Directory /var/www/html]
Options: Indexes FollowSymLinks
AllowOverride: None
Require: all granted
[Directory /var/www/cgi-bin]
AllowOverride: None
Options: None
Require: all granted
[Directory /usr/share/httpd/icons]
Options: Indexes MultiViews FollowSymlinks
AllowOverride: None
Require: all granted
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
[Directory /usr/share/httpd/noindex]
AllowOverride: None
Require: all granted
Chained filtering:
Options: Indexes FollowSymLinks
Options: None
Options: Indexes MultiViews FollowSymlinks
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Notice that conf[startswith("Dir")].sections is not the same as conf.sections[startswith("Dir")]. The first finds all the top level nodes that start with “Dir” and then filters those to just the sections. The second gets all of the top level sections and then searches their children for nodes starting with “Dir”.
[15]:
print("Top level Sections starting with 'Dir':")
print(conf[startswith("Dir")].sections)
print()
print("Children starting with 'Dir' of any top level Section:")
print(conf.sections[startswith("Dir")])
Top level Sections starting with 'Dir':
[Directory /]
AllowOverride: none
Require: all denied
[Directory /var/www]
AllowOverride: None
Require: all granted
[Directory /var/www/html]
Options: Indexes FollowSymLinks
AllowOverride: None
Require: all granted
[Directory /var/www/cgi-bin]
AllowOverride: None
Options: None
Require: all granted
[Directory /usr/share/httpd/icons]
Options: Indexes MultiViews FollowSymlinks
AllowOverride: None
Require: all granted
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
[Directory /usr/share/httpd/noindex]
AllowOverride: None
Require: all granted
Children starting with 'Dir' of any top level Section:
DirectoryIndex: index.html
Ignoring Case
All of the predicates parsr.query defines take an ignore_case keyword parameter. They also have versions with an i prefix that pass ignore_case=True for you. So startswith("abc", ignore_case=True) is the same as istartswith("abc"), etc.
It’s not possible to ignore case with simple dictionary-like access unless you use a predicate: conf[ieq("ifmodule")] gets all top level elements with a name equal to any case variant of “ifmodule”, whereas conf["ifmodule"] is a strict case match.
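The relationship between an ignore_case keyword and an i-prefixed shortcut can be sketched with functools.partial. The names sk_startswith and sk_istartswith are hypothetical toy versions written for this example; how the library itself defines its i-prefixed predicates is not shown here.

```python
from functools import partial


def sk_startswith(prefix, ignore_case=False):
    """Toy predicate factory with an ignore_case keyword."""
    def test(value):
        if ignore_case:
            return value.lower().startswith(prefix.lower())
        return value.startswith(prefix)
    return test


# The i-prefixed variant simply pre-fills ignore_case=True.
sk_istartswith = partial(sk_startswith, ignore_case=True)

print(sk_startswith("ifmod")("IfModule"))                    # False
print(sk_startswith("ifmod", ignore_case=True)("IfModule"))  # True
print(sk_istartswith("ifmod")("IfModule"))                   # True
```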
Attribute Access
If you don’t have any predicates for a node, and its name doesn’t conflict with an attribute of the underlying object (which should be rare), you can use attribute access to query for it.
[16]:
conf.doc.Directory.Require
[16]:
Require: all denied
Require: all granted
Require: all granted
Require: all granted
Require: all granted
Require: method GET POST OPTIONS
Require: all granted
Query by Children
If you want to query for nodes based on values of their children, you can use a where clause. It has a few different modes of use.
The first is the same name and value queries as before:
[17]:
conf.doc.Directory.where("Require", "denied")
[17]:
[Directory /]
AllowOverride: none
Require: all denied
The second is by using the make_child_query helper that lets you combine multiple “top level” queries that include name and value queries.
[18]:
from insights.parsr.query import make_child_query as q
conf.doc.Directory.where(q("Require", "denied") | q("AllowOverride", "FileInfo"))
[18]:
[Directory /]
AllowOverride: none
Require: all denied
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
Note you can continue the traversal after a where:
[19]:
res = conf.doc.Directory.where(q("Require", "denied") | q("AllowOverride", "FileInfo"))
res.Options
[19]:
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
The name and value queries inside of q can contain all of the predicates we’ve seen before, and q instances can be combined with & and | and negated with ~.
If you need to compare multiple attributes with each other or with other parts of the config structure, you can pass a function or lambda to where, and it will be used to test each entry at the current level.
[20]:
conf.doc.Directory.where(lambda d: "denied" in d.Require.value or "FileInfo" in d.AllowOverride.value)
[20]:
[Directory /]
AllowOverride: none
Require: all denied
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
Truth and Iteration
Nodes are “truthy” depending on whether they have children. They’re also iterable and indexable.
[21]:
res = conf["Blah"]
print("Boolean:", bool(res))
print("Length:", len(res))
print()
print("Iteration:")
for c in conf["Directory"]:
    print(c.value)
print()
print("Indexing:")
print(conf["Directory"][0].value)
print(conf["Directory"][first].value)
print(conf["Directory"][-1].value)
print(conf["Directory"][last].value)
Boolean: False
Length: 0
Iteration:
/
/var/www
/var/www/html
/var/www/cgi-bin
/usr/share/httpd/icons
/home/*/public_html
/usr/share/httpd/noindex
Indexing:
/
/
/usr/share/httpd/noindex
/usr/share/httpd/noindex
This is also true of conf itself:
[22]:
sorted(set(c.name for c in conf))
[22]:
['AddDefaultCharset',
'AddIcon',
'AddIconByEncoding',
'AddIconByType',
'Alias',
'DNSSDEnable',
'DefaultIcon',
'Directory',
'DocumentRoot',
'EnableSendfile',
'ErrorLog',
'Files',
'Group',
'HeaderName',
'IfModule',
'IndexIgnore',
'IndexOptions',
'Listen',
'LoadModule',
'LocationMatch',
'LogLevel',
'ReadmeName',
'ServerAdmin',
'ServerRoot',
'User']
Attributes
The individual results in a result set have a name, value, attributes, children, an immediate parent, a root, and context for their enclosing file that includes its path and their line within it. The code below shows different attributes available on individual entries and results.
[23]:
root = conf.find("ServerRoot")[0]
print("Node name:", root.name)
print("Value:", root.value) # gets the value if the entry has only one. raises an exception if it has more than 1
print("Values:", conf.find("Options").values) # same as above except collects values of current results.
print()
print("Unique Values:", conf.find("Options").unique_values) # same as above except values are unique.
print()
print("Attributes:", root.attrs) # an entry may have multiple values
print("Children:", len(root.children)) #
print("Parent:", conf.find("Options")[0].parent.name)
print("Parents:", conf.find("Options").parents.values) # go up one level from the results.
print("Root:", "conf.find('LogFormat')[0].root # Omitted due to size") # root of current entry.
print("Roots:", "conf.find('LogFormat').roots # Omitted due to size") # all roots of current results.
print("File: ", root.file_path) # path of the backing file. Not always available.
print("Original Line:", root.line) # raw line from the original source. Not always available.
print("Line Number:", root.lineno) # line number in source of the element. Not always available.
Node name: ServerRoot
Value: /etc/httpd
Values: ['Indexes FollowSymLinks', 'None', 'Indexes MultiViews FollowSymlinks', 'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec', '-Indexes']
Unique Values: ['-Indexes', 'Indexes FollowSymLinks', 'Indexes MultiViews FollowSymlinks', 'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec', 'None']
Attributes: ['/etc/httpd']
Children: 0
Parent: Directory
Parents: ['/var/www/html', '/var/www/cgi-bin', '/usr/share/httpd/icons', '/home/*/public_html', '^/+$']
Root: conf.find('LogFormat')[0].root # Omitted due to size
Roots: conf.find('LogFormat').roots # Omitted due to size
File: /etc/httpd/conf/httpd.conf
Original Line: ServerRoot "/etc/httpd"
Line Number: 31
[24]:
port = conf.find("Listen").value
print(port)
print(type(port))
80
<class 'int'>
There’s also a .values property that will accumulate all of the attributes of multiple children that match a query. Multiple attributes from a single child are converted to a single string.
[25]:
conf["Directory"]["Options"].values
[25]:
['Indexes FollowSymLinks',
'None',
'Indexes MultiViews FollowSymlinks',
'MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec']
Useful functions
In addition to brackets, config trees support other functions for querying and navigating.
find
find searches the entire tree for the query you provide and returns a Result of all elements that match.
[26]:
conf.find("ServerRoot")
[26]:
ServerRoot: /etc/httpd
[27]:
conf.find("Alias")
[27]:
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
[28]:
conf.find("LogFormat")
[28]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
If you want the first or last match, access them with brackets as you would a list:
[29]:
print(conf.find("Alias")[0])
print(conf.find("Alias")[-1])
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
[30]:
r = conf.find("Boom")
print(type(r))
print(r)
<class 'insights.parsr.query.Result'>
Find takes an additional parameter, roots, which defaults to False. If it is False, the matching entries are returned. If set to True, the unique set of ancestors of all matching results is returned.
[31]:
print('conf.find("LogFormat"):')
print(conf.find("LogFormat"))
print()
print('conf.find("LogFormat").parents:')
print(conf.find("LogFormat").parents)
conf.find("LogFormat"):
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
conf.find("LogFormat").parents:
[IfModule log_config_module]
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
[IfModule logio_module]
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
CustomLog: logs/ssl_request_log %t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x %r %b
CustomLog: logs/access_log combined
[IfModule logio_module]
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
[32]:
conf.find(("IfModule", "logio_module"), "LogFormat")
[32]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
[33]:
conf.find("IfModule", ("LogFormat", "combinedio"))
[33]:
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
select
select is the primitive query function on which everything else is built. Its parameters operate just like find, and by default it queries like a find that only searches from the top of the configuration tree instead of walking subtrees.
To support the other cases, it takes two keyword arguments. deep=True causes it to search subtrees (default is deep=False). roots=True causes it to return the unique, top level nodes containing a match. This is true even when deep=True. If roots=False, it returns matching leaves instead of top level roots.
conf.find(*queries) = conf.select(*queries, deep=True, roots=False)
conf[query] = conf.select(query, deep=False, roots=False)
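The interaction of deep and roots described above can be sketched on a toy tree. Everything here (Node, walk, sketch_select) is hypothetical and matches on names only; it is meant to show the semantics of the two flags, not how the library implements select.

```python
class Node:
    """Hypothetical minimal node: a name plus child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def walk(node):
    # Pre-order traversal: the node itself, then each subtree.
    yield node
    for child in node.children:
        yield from walk(child)


def sketch_select(top_level, name, deep=False, roots=False):
    results = []
    for top in top_level:
        candidates = walk(top) if deep else [top]
        hits = [n for n in candidates if n.name == name]
        if not hits:
            continue
        if roots:
            results.append(top)   # one entry per top level node with a match
        else:
            results.extend(hits)  # the matching nodes themselves
    return results


top_level = [
    Node("Alias"),
    Node("IfModule", [Node("LogFormat"), Node("IfModule", [Node("LogFormat")])]),
]

shallow = sketch_select(top_level, "LogFormat")
deep_hits = sketch_select(top_level, "LogFormat", deep=True)
deep_roots = sketch_select(top_level, "LogFormat", deep=True, roots=True)

print(len(shallow))                   # 0
print(len(deep_hits))                 # 2
print([n.name for n in deep_roots])   # ['IfModule']
```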
[34]:
print(conf.select("Alias"))
print()
print(conf.select("LogFormat") or "Nothing")
print(conf.select("LogFormat", deep=True))
print(conf.select("LogFormat", deep=True, roots=False))
print()
print(conf.select("LogFormat", deep=True, roots=False)[0])
print(conf.select("LogFormat", deep=True, roots=False)[-1])
Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html
Nothing
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
upto
Sometimes you’ve navigated down to a piece of data and want to work back up the tree from what you’ve found. You can do that by chaining .parent attributes or going all the way to the top with .roots. But what if your target isn’t at the top but is several ancestors away? You can pass a query to .upto that will work its way up the parents of the current results and stop when it finds those that match.
[35]:
conf.find(("Options", "Indexes")).upto("Directory")
[35]:
[Directory /var/www/html]
Options: Indexes FollowSymLinks
AllowOverride: None
Require: all granted
[Directory /usr/share/httpd/icons]
Options: Indexes MultiViews FollowSymlinks
AllowOverride: None
Require: all granted
[Directory /home/*/public_html]
AllowOverride: FileInfo AuthConfig Limit Indexes
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
Require: method GET POST OPTIONS
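The upward walk behind upto can be sketched as a loop over parent links. Node and sketch_upto below are hypothetical and handle a single node with a name-only query; the real method works on a whole result set and accepts the full query syntax.

```python
class Node:
    """Hypothetical node that records its parent when attached to a tree."""

    def __init__(self, name, value=None, children=()):
        self.name = name
        self.value = value
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def sketch_upto(node, name):
    # Climb parent links until a node with the wanted name is found.
    current = node.parent
    while current is not None and current.name != name:
        current = current.parent
    return current  # None if no ancestor matched


options = Node("Options", "Indexes")
tree = Node("Directory", "/var/www/html", [Node("IfModule", "mod_x", [options])])

found = sketch_upto(options, "Directory")
print(found.value)                          # /var/www/html
print(sketch_upto(options, "VirtualHost"))  # None
```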
get_crumbs
What if you’ve issued a search and found something interesting but want to refine your query so you can see where the hits are within the structure? get_crumbs will show you the unique paths down the tree to your current results.
[36]:
print(conf.find("LogFormat"))
print()
print(conf.find("LogFormat").get_crumbs())
print()
print(conf.doc.IfModule.IfModule.LogFormat)
print()
print(conf.doc.IfModule.LogFormat)
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
['IfModule.IfModule.LogFormat', 'IfModule.LogFormat']
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
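The idea of unique paths can be sketched by joining each result's ancestor names and de-duplicating. Node, crumb, and sketch_get_crumbs are hypothetical names for this illustration, which assumes top level nodes have no parent so the root does not appear in the path.

```python
class Node:
    """Hypothetical node that records its parent when attached to a tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def crumb(node):
    # Collect names from the node up to the top, then reverse into a dotted path.
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return ".".join(reversed(parts))


def sketch_get_crumbs(nodes):
    # A set removes duplicate paths; sorting gives a stable order.
    return sorted({crumb(n) for n in nodes})


a, b, c = Node("LogFormat"), Node("LogFormat"), Node("LogFormat")
outer = Node("IfModule", [a, b])
nested = Node("IfModule", [Node("IfModule", [c])])

crumbs = sketch_get_crumbs([a, b, c])
print(crumbs)  # ['IfModule.IfModule.LogFormat', 'IfModule.LogFormat']
```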
Custom Predicates
It’s easy to create your own predicates to use with config trees. They come in parameterized and unparameterized types and can be used against names or attributes. If used in a name position, they’re passed the node’s name. If used in an attribute position, they’re passed the node’s attributes one at a time. If the predicate raises an exception because an attribute is of the wrong type, it’s considered False for that attribute. Note that other attributes of the node can still cause a True result.
[37]:
from insights.parsr.query.boolean import pred, pred2
is_ifmod = pred(lambda x: x == "IfModule")
is_user_mod = pred(lambda x: "user" in x)
divisible_by = pred2(lambda in_val, divisor: (in_val % divisor) == 0)
[38]:
print("Num IfModules:", len(conf[is_ifmod]))
print("User mod checks:", len(conf.find(("IfModule", is_user_mod))))
print("Div by 10?", conf["Listen", divisible_by(10)] or "No matches")
print("Div by 3?", conf["Listen", divisible_by(3)] or "No matches")
Num IfModules: 8
User mod checks: 1
Div by 10? Listen: 80
Div by 3? No matches
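The rule that an exception counts as False for one attribute while other attributes can still match can be sketched as follows. sketch_pred2 and any_attr are hypothetical helpers written for this example; they imitate the behavior described above and are not the library's pred2.

```python
def sketch_pred2(func):
    """Toy parameterized predicate factory: func(value, param) -> bool."""
    def factory(param):
        def test(value):
            try:
                return bool(func(value, param))
            except Exception:
                # A wrong-typed attribute simply fails the test.
                return False
        return test
    return factory


def any_attr(test, attrs):
    # Attributes are tested one at a time; a single success is enough.
    return any(test(a) for a in attrs)


divisible_by = sketch_pred2(lambda in_val, divisor: (in_val % divisor) == 0)

# "*" % 10 raises TypeError, so that attribute counts as False,
# but the integer 80 still satisfies the predicate.
print(any_attr(divisible_by(10), ["*", 80]))  # True
print(any_attr(divisible_by(3), ["*", 80]))   # False
```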