Part 3 of Global Directory
Some LDAP clients are full-blown implementations of the protocol; others are very simple. Some tools are technical and focus on a particular task; others let users browse information such as contact details. This variety of angles on LDAP makes it useful to set up a somewhat clever service that makes the global directory much more usable.
This article is part of a series of articles about the global directory.
Layering for the Perfect Outgoing Hub
Assuming that we want a perfect OpenLDAP setup that serves whatever LDAP client comes our way and yields as much connectivity to the global directory as possible, we need quite a complex setup. We will base our setup on layers of OpenLDAP, known as overlays, that gradually mold and modify LDAP queries to improve their behaviour.
Rock bottom: DNSSRV referrer
LDAP servers in the global directory publish an SRV record named _ldap._tcp under their domain name, pointing to the LDAP server that can handle queries under the dc=,dc= suffix matching the domain name. For searches under baseDNs that end in dc=,dc= the DNS SRV module is used to locate this remote server.
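The mapping from a baseDN to the SRV name to look up can be sketched in a few lines of Python. This is only an illustration, not code from OpenLDAP: the function name srv_name_for_base_dn is made up here, and the DN is split naively on commas, so escaped commas inside RDN values are assumed not to occur.

```python
def srv_name_for_base_dn(base_dn: str) -> str:
    """Return the SRV owner name for the trailing dc= RDNs of a DN.

    Hypothetical helper: assumes RDN values contain no escaped commas.
    """
    labels = []
    # Walk the RDNs from the right and stop at the first non-dc= RDN.
    for rdn in reversed([part.strip() for part in base_dn.split(",")]):
        attr, _, value = rdn.partition("=")
        if attr.strip().lower() != "dc" or not value.strip():
            break
        labels.insert(0, value.strip())
    if not labels:
        raise ValueError("DN does not end in dc= components")
    return "_ldap._tcp." + ".".join(labels)


print(srv_name_for_base_dn("uid=bakker,dc=orvelte,dc=nep"))
```

The resulting name is what a DNS resolver would be asked for with query type SRV; the lookup itself is left out to keep the sketch free of dependencies.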
Note, however, that the DNS SRV backend is unfit as a basis for the layering architecture that follows: it does not properly handle overlays. A plain hdb/bdb backend is used instead, and the DNS SRV service has to run as a separate LDAP service, queried by an LDAP endpoint in the stack for the Outgoing Hub.
It is probably a good idea to avoid making DNS queries for anything but proper TLDs, so as to not overload DNS and LDAP with wild queries of half-typed names. To this end, it is possible to fill the underlying hdb/bdb database with entries for all the TLDs (and/or for SLDs in countries that use them, like co.jp) referring to the DNS SRV resolver that locates the LDAP server for the dc=,dc= requested.
Hiding referrals: CHAIN overlay
When queries are answered with "go look there" referrals, there is no reason to pump those up and down our stack. It is better to handle them in the Outgoing Hub anyway, because not all clients are capable of chasing such referrals. The chaining overlay will pick up referrals and chase them on behalf of the client. With chaining overlaid, the directory really starts to look like a global directory.
Since SRV queries are sent to other hosts, there is some fragility involved, and potential delay. For that reason, the chaining overlay can cache referrals and the way they are resolved.
Speedup: Query caching
The next layer is intended to make it possible to reuse query results. The PCACHE or PROXYCACHE overlay implements just that; it even understands that all answers to a query for (x>1) are contained in a cached query for (x>0) so it is pretty clever.
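The containment idea can be illustrated with a toy check in Python. This is a sketch of the principle only and says nothing about how the overlay is implemented internally; the tuple representation and the name is_contained are assumptions made for this example. LDAP filters only know >= and <= for ordering, so the (x>1) shorthand above is written with >= here.

```python
def is_contained(query, cached):
    """Tell whether every answer to `query` is also an answer to `cached`.

    Both arguments are (attribute, operator, value) tuples, a made-up
    representation covering only single >= or <= comparisons.
    """
    q_attr, q_op, q_val = query
    c_attr, c_op, c_val = cached
    if q_attr != c_attr or q_op != c_op:
        return False
    if q_op == ">=":
        return q_val >= c_val  # a higher lower bound selects a subset
    if q_op == "<=":
        return q_val <= c_val  # a lower upper bound selects a subset
    return False


print(is_contained(("x", ">=", 1), ("x", ">=", 0)))
print(is_contained(("x", ">=", 0), ("x", ">=", 1)))
```

When the check succeeds, the cached result set can be filtered locally instead of sending the query out again.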
Note that all DNs at this point are the DNs of the global cache. We will diversify DNs later, but at the level of the query cache they will have come together as the one original object that sits out there, somewhere in the global directory.
Trampoline: Frogger supports the Big Jump
This layer introduces a few things that we were missing to get the best possible Outgoing Hub that we could imagine. It is introduced in detail below. It edits DNs, searches and search results, all from the mindset that all LDAP clients should be in the best possible position to cross the Internet highway successfully.
Rewriting: Removed or Deferred
TODO: This is the client side. There may be some utility in rewriting search strings to better suit the requirements of a server, but the references below are invalid because they speak of server-side filtering. Clearly, the RWM overlay makes sense there, but it remains a question if the client-side really benefits.
The original intention was to use the RWM overlay for rewriting the things that Frogger is now doing. Attempts to make this work quickly ran into limitations of the RWM module, and also demonstrated that very accurate regular expression knowledge was required. I therefore decided to simplify the operator's life and put it into the "frogger" overlay.
There are a few uses for simple rewriting though; they could help to trim down search filters to a form that is acceptable to privacy concerns. The OpenPGP directory article provides an example of this.
Editing: Translucent objects
The final layer supports editing of objects, based on ACL settings in the underlying database. This is achieved with the TRANSLUCENT overlay provided by OpenLDAP.
As there can be a different baseDN for every individual directory configured in an LDAP client, it is possible to have more than one mirror of the global directory; at the level of the translucent overlay, these mirrors have distinct DNs, meaning that the changes to objects in these mirrors are stored separately. In other words, such client versions of the global directory can be modified independently. Only when the ACL permits sharing a view on the same mirror will it be possible for one client to observe another client's edits.
Introducing "frogger", an Outgoing Hub Overlay for OpenLDAP
Various LDAP processing requirements of an outgoing hub can be handled with OpenLDAP, but I found that not all I wanted was possible yet. I decided to create my own overlay, which is OpenLDAP lingo for a modifier of a query being processed, to solve the issues I did not find addressed.
The name "frogger" stems from the old arcade-style game named Frogger, where a poor little frog has to cross a highway to fulfil an unspecified goal, presumably of a species-propagating nature. Like the frog, LDAP needs a little help to steer its way across the Internet highway and to make it to the other side.
Mirroring the World in a Subtree
One of the requirements I had is that I wanted to mirror the global directory under my own domain or, more accurately, in a location where I can edit its objects. OpenLDAP provides an overlay named "translucent" that will let queries shine through, but that will capture changes and store them locally. Any future references to such edited objects are modified accordingly, but the original object can still change and those changes will shine through the overlay.
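The effect of such a translucent view can be sketched in Python with plain dictionaries. This is a conceptual illustration, not OpenLDAP code: the function name translucent_view and the dictionary representation of entries are assumptions for this example.

```python
def translucent_view(remote_entry, local_overrides):
    """Overlay locally stored attributes on top of the remote entry."""
    view = {attr: list(values) for attr, values in remote_entry.items()}
    for attr, values in local_overrides.items():
        view[attr] = list(values)  # a local attribute replaces the remote one
    return view


remote = {"cn": ["Ellert"], "telephoneNumber": ["+12345"]}
local = {"description": ["met at the conference"]}
print(translucent_view(remote, local))

# A later change at the remote side still shines through the local edits.
remote["telephoneNumber"] = ["+67890"]
print(translucent_view(remote, local)["telephoneNumber"])
```

The local store only ever holds the edited attributes, which is why remote updates to everything else remain visible.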
Local edits on remotely maintained contact information are outright useful. They enable us to annotate or improve an object's data in any way we see fit, without anyone else seeing it. This is a powerful facility for all sorts of contact management applications. It also supports provision of accounts or other technical infrastructure to an object anywhere in the global directory, as if they were local users. This would be especially useful if those objects had public key material included in their directory entries.
Regardless of the purpose of local editing, it is useful to continue to be able to see the original object as well. To this end, it is useful to present the object in another location, especially in a directory area that you may edit. This comes down to mirroring the global directory under your account. You would set up a baseDN that reflects this requirement.
Technically, what this means is that you set up a baseDN of the form sth=,sth=,dc=,dc= where sth=,sth= stands for one or more RDNs in immediate sequence that are not of the dc= form; the dc=,dc= stands for one or more dc= RDNs in immediate sequence. Imagine having an object from the global directory underneath; that would look like sth=,sth=,dc=,dc=,sth=,sth=,dc=,dc= and this interrupted sequence of dc= RDNs is meaningless, except when interpreted as a mirror of the global directory under an object.
The "frogger" overlay can cut the tree in half, and propagate an LDAP request in downward direction with only the embedded DN shown to the global directory. When entries are returned, it will postfix the last part again, and thus return these entries as though they were found under your own node. In fact, the baseDN fulfils no real purpose for a global directory, so it is always considered as an attempt to embed the global directory underneath. You can of course specify an empty baseDN if you do not want to mirror the global directory.
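Cutting the tree in half and gluing it together again can be sketched as follows. Since the overlay is unfinished, this Python fragment is my reading of the description above, not its implementation; the names split_mirrored_dn and remount are hypothetical, and DNs are split naively on commas (no escaped commas).

```python
def _is_dc(rdn):
    return rdn.lower().startswith("dc=")


def split_mirrored_dn(dn):
    """Split a DN into (embedded global DN, mirror baseDN).

    The mirror baseDN is the rightmost run of non-dc= RDNs followed by
    dc= RDNs; everything to its left is the embedded global DN.  The
    embedded part is empty when the DN holds no mirrored object.
    """
    rdns = [r.strip() for r in dn.split(",") if r.strip()]
    i = len(rdns)
    while i > 0 and _is_dc(rdns[i - 1]):      # trailing dc= run of the base
        i -= 1
    while i > 0 and not _is_dc(rdns[i - 1]):  # non-dc= run of the base
        i -= 1
    return ",".join(rdns[:i]), ",".join(rdns[i:])


def remount(global_dn, mirror_base):
    """Postfix the mirror baseDN to a DN returned by the global directory."""
    return global_dn + "," + mirror_base if mirror_base else global_dn


dn = "uid=brammert,o=Cave Enterprises,dc=schoonoord,dc=nep,uid=bakker,dc=orvelte,dc=nep"
embedded, base = split_mirrored_dn(dn)
print(embedded)
print(base)
print(remount(embedded, base) == dn)
```

Only the embedded part would be sent downward; remount is applied to every DN in the entries that come back.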
# TODO: Potential configuration options:
olcFroggerMirrorBasePattern: .*
Expanding Mail Addresses, URIs and Phone Numbers
Given that the global directory follows a strict format --every DN must end in dc=,dc= form to have a DNS reference-- it is not possible to search for all the things that we would use if the directory were local. This is inherent in the idea of a global directory, which needs some way of locating remote nodes.
It is possible to expand commonly used identifiers for online use and turn them into the dc=,dc= form -- an email address like ellert@schoonoord.nep could be turned into a (uid=ellert) search under dc=schoonoord,dc=nep and a web location http://brammert.schoonoord.nep/ could be represented as dc=brammert,dc=schoonoord,dc=nep. Since the URI format is very general, it is possible to retrieve uid and dc components from it, and expand the queries containing them to a form that can be used on the global directory. The result is that you can look up email addresses and other URIs in the global directory. A few attributes that carry such values in LDAP are mail, labeledURI and telephoneNumber.
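A rough Python sketch of this expansion, assuming an address such as ellert@schoonoord.nep, could look like this. The function names are invented for the example, and only URIs with a "//" authority part are handled; other schemes would need their own parsing.

```python
from urllib.parse import urlsplit


def escape_filter_value(value):
    """Escape the characters that are special in LDAP search filters."""
    return "".join("\\%02x" % ord(c) if c in "\\*()\0" else c for c in value)


def domain_to_dn(domain):
    """Turn schoonoord.nep into dc=schoonoord,dc=nep."""
    return ",".join("dc=" + label for label in domain.strip(".").split(".") if label)


def expand_mail(address):
    """Return (filter, baseDN) for an email address."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("not an email address: %r" % address)
    return "(uid=%s)" % escape_filter_value(local), domain_to_dn(domain)


def expand_uri(uri):
    """Return (filter or None, baseDN) for a URI with a //host part."""
    parts = urlsplit(uri)
    if not parts.hostname:
        raise ValueError("URI without host: %r" % uri)
    base = domain_to_dn(parts.hostname)
    if parts.username:
        return "(uid=%s)" % escape_filter_value(parts.username), base
    return None, base


print(expand_mail("ellert@schoonoord.nep"))
print(expand_uri("http://brammert.schoonoord.nep/"))
```

The filter part would be OR-ed into the original search, while the baseDN part decides which remote server receives it.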
The expansion for a telephoneNumber can be achieved by treating it as an ENUM domain; a number like +12345 is then found as uid=12345,dc=5,dc=4,dc=3,dc=2,dc=1,dc=e164,dc=tree where the last two represent the e164.tree root for the ENUM tree.
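The digit reversal is easy to get wrong by hand, so here is a small Python sketch of it. The function name telephone_to_dn is hypothetical, and the enum_root parameter simply defaults to the e164.tree root used in the text.

```python
def telephone_to_dn(number, enum_root="e164.tree"):
    """Map an international number like +12345 to its ENUM-style DN."""
    number = number.strip()
    digits = [c for c in number if c in "0123456789"]
    if not number.startswith("+") or not digits:
        raise ValueError("expected an international number starting with +")
    # ENUM lists the digits in reverse order, one label (here: one dc=) each.
    dcs = ["dc=" + d for d in reversed(digits)]
    dcs += ["dc=" + label for label in enum_root.split(".")]
    return "uid=" + "".join(digits) + "," + ",".join(dcs)


print(telephone_to_dn("+12345"))
```

Separators such as spaces or dashes in the number are dropped before the digits are reversed.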
While expanding attributes, multiple dc=,dc= forms may pop up, also from the original search request. In such cases, "frogger" will group requests based on the dc=,dc= assumed, and expand only those attributes before querying. It passes over all dc=,dc= values that are requested by the expanded query. To avoid duplicate answers, the entries will be filtered before being passed on: entries are removed if they fall under another, more specific dc=,dc= that is also queried as part of this search.
Consider searching for (mail=brammert@schoonoord.nep) under uid=bakker,dc=orvelte,dc=nep -- there are two dc=,dc= to take into account, namely the baseDN of the original request and the domain name in the mail address. Since they differ, they are queried separately.
- One query is for (mail=brammert@schoonoord.nep) under uid=bakker,dc=orvelte,dc=nep at the LDAP server located under the orvelte.nep domain -- nothing will be appended to search results;
- One query is for (|(mail=brammert@schoonoord.nep)(uid=brammert)) under dc=schoonoord,dc=nep -- the original baseDN is removed from the query but uid=bakker,dc=orvelte,dc=nep will be appended to any search results; a found entry uid=brammert,o=Cave Enterprises,dc=schoonoord,dc=nep would be turned into uid=brammert,o=Cave Enterprises,dc=schoonoord,dc=nep,uid=bakker,dc=orvelte,dc=nep and right there it can be edited.
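The duplicate filtering described before this example can be sketched in Python as follows. The data layout (a dictionary from queried baseDN to the DNs found there) and the names is_under and drop_covered are assumptions for illustration; DN comparison is done naively on lowercased, comma-split RDNs.

```python
def _rdns(dn):
    return [r.strip().lower() for r in dn.split(",") if r.strip()]


def is_under(dn, base):
    """True if dn equals base or lies somewhere beneath it."""
    d, b = _rdns(dn), _rdns(base)
    return len(b) <= len(d) and d[len(d) - len(b):] == b


def drop_covered(results):
    """Remove entries that a more specific query in the same search covers.

    `results` maps each queried baseDN to the list of entry DNs found there.
    """
    kept = {}
    for base, entries in results.items():
        more_specific = [b for b in results if b != base and is_under(b, base)]
        kept[base] = [e for e in entries
                      if not any(is_under(e, b) for b in more_specific)]
    return kept


found = {
    "dc=schoonoord,dc=nep": [
        "uid=ellert,dc=schoonoord,dc=nep",
        "uid=x,dc=brammert,dc=schoonoord,dc=nep",
    ],
    "dc=brammert,dc=schoonoord,dc=nep": [
        "uid=x,dc=brammert,dc=schoonoord,dc=nep",
    ],
}
print(drop_covered(found))
```

After filtering, each entry is reported once, by the most specific query that covers it.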
This logic cannot be built with the existing rewriting facilities of OpenLDAP, such as in the RWM overlay. The existing facilities are based on regular expressions, and theoretically classify as a state machine. To treat the nested structures of queries, one would need a stack machine, which is a theoretic classification of a higher functional level. Practically, it is also very difficult and error-prone to even approach this kind of facility with regular expressions; there is too much to take care of to even get close.
# TODO: Potential configuration options:
olcFroggerExpandMailAttribute: mail
olcFroggerExpandLabeledURIAttribute: labeledURI
olcFroggerExpandTelephoneNumberAttribute: telephoneNumber
Interpretation of commonName
Not all tools will permit entry of these particular attributes; some are aimed at searching for names in a local directory and will serve you with things like cn. This attribute is widely used in LDAP, so that makes some sense, but not if you need to distill locations to turn to.
The "frogger" module therefore makes a bold attempt to interpret URIs, email addresses and domain names in the cn attribute. It considers the cn as a space-separated list of opportunities for such interpretations. If it finds a possibility, it will append it as an option, using the OR compositional operator as used in search filters.
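How such an interpretation might work is sketched below in Python. The overlay itself is unfinished, so this is a guess at the behaviour: the classification rules, the regular expressions and the names classify and widen_cn_filter are all assumptions, and the input is assumed to contain no parentheses or backslashes that would need filter escaping.

```python
import re

SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.I)
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$", re.I)


def classify(word):
    """Guess whether a word is a URI, an email address or a domain name."""
    if SCHEME_RE.match(word):
        return "uri"
    local, at, domain = word.rpartition("@")
    if at and local and DOMAIN_RE.match(domain):
        return "mail"
    if DOMAIN_RE.match(word):
        return "domain"
    return None


def widen_cn_filter(cn_value):
    """OR the original cn filter with an option for each interpreted word."""
    options = ["(cn=%s)" % cn_value]
    for word in cn_value.split():
        kind = classify(word)
        if kind == "mail":
            options.append("(mail=%s)" % word)
        elif kind == "uri":
            options.append("(labeledURI=%s)" % word)
    return options[0] if len(options) == 1 else "(|" + "".join(options) + ")"


print(widen_cn_filter("Ellert ellert@schoonoord.nep"))
```

Words classified as bare domain names add no filter option in this sketch; they would rather influence where the query is sent.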
# TODO: Potential configuration options:
olcFroggerInterpretTextAttribute: cn
An important issue involves the use of wildcards in interactive LDAP tools. While typing the cn field, these tools tend to query for the half-done information, with a "*" wildcard appended so the directory returns possible completions, which the tool can show to service the human user. Automated tools never do this of course, but user-operated tools are as important and useful to take into account.
If this behaviour were permitted on the global directory, then directory services all over the world would be bothered with your half-typed queries. This is not only unkind to unleash upon them, it also involves potential security hazards. Imagine typing the URI of your favourite .com websites -- they would pass through Colombia, whose country-code domain is .co, where it could match (cn=somesite.co*) if it were tried while you typed that much.
To avoid this dangerous behaviour, wildcards in the beginning of a search pattern will lead to them being excluded from interpretation; and wildcards at the end exclude the word that this wildcard terminates from interpretation. So, while looking for (cn=sip:firstname.lastname@example.org ellert.co*) the SIP address would be interpreted, but the second word would not.
In terms of end user behaviour, this means that typing a URI in an autocompleting commonName search field is insufficient to get it interpreted because in something like (cn=sip:email@example.com*) the entire SIP URI is excluded from interpretation; you should append a space to turn it into (cn=sip:email@example.com *) where the SIP address is no longer terminated with the wildcard, so it can be interpreted safely. Other tools might also accept pressing Enter to indicate that completion is not needed anymore. It does not seem probable that tools will prefix with a wildcard.
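The wildcard rule boils down to a very small filter over the words of the cn pattern. The Python sketch below reads the rule as "any word that carries a wildcard is left alone", which is an assumption on my part; the function name interpretable_words and the SIP address used are made up for the example.

```python
def interpretable_words(cn_pattern):
    """Return the words of a cn pattern that are safe to interpret.

    Words that start or end with a wildcard (or contain one) are skipped,
    because they may be half-typed input from an autocompleting client.
    """
    return [word for word in cn_pattern.split() if "*" not in word]


print(interpretable_words("sip:ellert@schoonoord.nep*"))
print(interpretable_words("sip:ellert@schoonoord.nep *"))
print(interpretable_words("sip:ellert@schoonoord.nep ellert.co*"))
```

The first call yields nothing to interpret, while the appended space in the second call frees the SIP address for interpretation.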
Getting the "frogger" overlay
TODO -- the overlay has not yet been finished.
TODO -- waiting for a frog while sitting in boiling water...