Part 2 of Global Directory

The first step towards playing a part in the global directory is publishing your own information. This requires you to set up an LDAP server and edit its contents. Not an easy task, but once set up it is one of those things that churns away in the background without much need for attention.

This article is part of a series of articles about the global directory.

How LDAP Stores and Retrieves Information

LDAP stores its most elementary pieces of information in attributes. These can contain things like a telephone number, a website URL or a photo.

Attributes have a name and a value; here are a few examples:

telephoneNumber: +12345

mail: ellert@schoonoord.nep

labeledURI: http://www.ellertenbrammert.nl/ Our fanclub

labeledURI: sip:reuzengrot@schoonoord.nep

Each line represents an attribute, consisting of a name, a colon and a value. The attributes are formally defined in LDAP, often in an RFC but there are also ways of defining your own, or reusing definitions made by other people. This is indeed common practice.

As part of the attribute specification, the syntax of the value field is declared, and whether there can be multiple attribute values, as is shown for the labeledURI.

Attributes are grouped into objects, also known as entries. Objects have a special attribute called objectClass with the object's class:

objectClass: inetOrgPerson
telephoneNumber: +12345
mail: ellert@schoonoord.nep
labeledURI: http://www.ellertenbrammert.nl/ Our website
labeledURI: sip:reuzengrot@schoonoord.nep
uid: ellert

In this case, the object is of class inetOrgPerson. An object can have more than one class. Each objectClass is either STRUCTURAL or AUXILIARY, and the rule is that each object must have exactly one STRUCTURAL class.

Every objectClass specifies zero or more attributes that MAY be present in an object, and zero or more that MUST be present. An AUXILIARY class is generally used to attach extra attributes. For example, if labeledURI had not already been supported by inetOrgPerson, it could have been allowed by adding a suitable AUXILIARY class; adding that class is harmless in this case too:

objectClass: inetOrgPerson
objectClass: labeledURIObject
telephoneNumber: +12345
mail: ellert@schoonoord.nep
labeledURI: http://www.ellertenbrammert.nl/ Our website
labeledURI: sip:reuzengrot@schoonoord.nep
uid: ellert
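The MUST/MAY rule and the one-STRUCTURAL-class rule can be sketched in a few lines of Python. The SCHEMA table below is a hand-written, simplified stand-in for a real schema: inheritance is flattened, so the cn and sn that inetOrgPerson inherits as MUST attributes from its parent classes are listed directly, and only the attributes used in this article are listed as MAY. The helper name check_entry is hypothetical and not part of any LDAP library.

```python
# Toy schema: class name -> (kind, MUST attributes, MAY attributes).
# Simplified assumption; a real server reads this from its schema definitions.
SCHEMA = {
    "inetOrgPerson": ("STRUCTURAL", {"cn", "sn"},
                      {"telephoneNumber", "mail", "labeledURI", "uid"}),
    "labeledURIObject": ("AUXILIARY", set(), {"labeledURI"}),
}

def check_entry(entry):
    """Return a list of problems for entry, a dict of attribute -> list of values."""
    problems = []
    classes = entry.get("objectClass", [])
    for name in classes:
        if name not in SCHEMA:
            problems.append("unknown objectClass: " + name)
    known = [name for name in classes if name in SCHEMA]
    structural = [name for name in known if SCHEMA[name][0] == "STRUCTURAL"]
    if len(structural) != 1:
        problems.append("expected exactly one STRUCTURAL class")
    # Collect what the combined classes require and allow.
    must = set().union(*(SCHEMA[name][1] for name in known))
    may = set().union(*(SCHEMA[name][2] for name in known))
    for attr in sorted(must):
        if attr not in entry:
            problems.append("missing MUST attribute: " + attr)
    for attr in sorted(entry):
        if attr != "objectClass" and attr not in must | may:
            problems.append("attribute not allowed: " + attr)
    return problems
```

An entry that carries both classes plus cn, sn, uid and labeledURI yields an empty list; removing sn yields a single "missing MUST attribute" message.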

Objects are considered to be atomic units; updates to them are all-or-nothing and they can be retrieved as a unit, in part or in whole, from their given location. That location is defined as a sequence of attribute values:

uid=ellert,dc=schoonoord,dc=nep

The last part, dc=nep, is considered the highest part of the directory. The next part, dc=schoonoord, zooms in, and uid=ellert goes even further. Perhaps there are more objects underneath, or at the same level as this object, like uid=brammert,dc=schoonoord,dc=nep. The entire sequence is known as the distinguishedName or DN of an object. Each of its three parts is called a relative DN or RDN. The lowest/first RDN is uid=ellert, and this attribute/value combination must also occur in the object that it locates. Our object satisfies that property, so it could be located at the presented DN:

dn: uid=ellert,dc=schoonoord,dc=nep
objectClass: inetOrgPerson
objectClass: labeledURIObject
telephoneNumber: +12345
mail: ellert@schoonoord.nep
labeledURI: http://www.ellertenbrammert.nl/ Our website
labeledURI: sip:reuzengrot@schoonoord.nep
uid: ellert

This format, with a dn: line and attributes and finally an empty line, is known as the LDAP Data Interchange Format, or LDIF. It is used for rather user-unfriendly interactions with a directory. There is a good reason why I advised looking for tools on your platform -- a GUI really helps to work with LDAP.
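To make the structure of such an entry concrete, here is a minimal Python sketch that reads one simple entry in this format and checks the rule that the first RDN must also occur as an attribute value in the object. Both function names are made up for this illustration. The parser is deliberately naive: it assumes plain "name: value" lines and ignores line folding, base64 values, comments, escaped commas in DNs and the case-insensitivity of attribute names.

```python
def parse_ldif_entry(text):
    """Parse one simple entry into (dn, attributes); attributes map to value lists."""
    dn = None
    attrs = {}
    for line in text.strip().splitlines():
        # Split at the first colon only, so values such as URLs stay intact.
        name, _, value = line.partition(":")
        value = value.strip()
        if name == "dn":
            dn = value
        else:
            attrs.setdefault(name, []).append(value)
    return dn, attrs

def rdn_occurs_in_entry(dn, attrs):
    """Check that the first RDN of dn is also an attribute value of the entry."""
    first_rdn = dn.split(",", 1)[0]  # naive: does not handle escaped commas
    name, _, value = first_rdn.partition("=")
    return value in attrs.get(name, [])

ENTRY = """
dn: uid=ellert,dc=schoonoord,dc=nep
objectClass: inetOrgPerson
objectClass: labeledURIObject
telephoneNumber: +12345
mail: ellert@schoonoord.nep
labeledURI: http://www.ellertenbrammert.nl/ Our website
labeledURI: sip:reuzengrot@schoonoord.nep
uid: ellert
"""

dn, attrs = parse_ldif_entry(ENTRY)
print(dn, rdn_occurs_in_entry(dn, attrs))
```

Multi-valued attributes such as labeledURI simply end up as lists with more than one element.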

When going up or down in the LDAP tree, control may shift from one operator to another. This is reflected with referrals: when asking one LDAP server for information it might refer the request to another LDAP server, which it will mention. This is very much like the 3xx redirection responses in HTTP.

To search in a directory, you mention a starting point known as the baseDN, and you express conditions on attributes, all wrapped in brackets in a rather trivial notation:

(|(uid=ellert)(uid=brammert)) under baseDN dc=schoonoord,dc=nep

This is likely to return two objects, namely the one above and his brother's.
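How such a filter is evaluated can be illustrated with a small Python sketch. It is not how any LDAP server is implemented; it merely handles the prefix notation with &, | and ! plus plain equality, against an in-memory dictionary that stands in for the directory. The names parse_filter, matches and search are invented for this example, and values are compared case-sensitively, whereas a real server applies the matching rule of each attribute.

```python
def parse_filter(text, pos=0):
    """Parse the filter starting at text[pos]; return (tree, next position)."""
    if text[pos] != "(":
        raise ValueError("filter must start with '('")
    pos += 1
    if text[pos] in "&|!":
        operator = text[pos]
        pos += 1
        children = []
        while text[pos] == "(":
            child, pos = parse_filter(text, pos)
            children.append(child)
        if text[pos] != ")":
            raise ValueError("missing ')'")
        return (operator, children), pos + 1
    # A simple comparison such as uid=ellert (equality only in this sketch).
    end = text.index(")", pos)
    name, _, value = text[pos:end].partition("=")
    return ("=", name, value), end + 1

def matches(tree, entry):
    """Evaluate a parsed filter against one entry (attribute -> list of values)."""
    if tree[0] == "=":
        _, name, value = tree
        return value in entry.get(name, [])
    results = [matches(child, entry) for child in tree[1]]
    if tree[0] == "&":
        return all(results)
    if tree[0] == "|":
        return any(results)
    return not results[0]  # "!" negates its single child

def search(directory, base_dn, filter_text):
    """Return the DNs at or below base_dn whose entries match the filter."""
    tree, _ = parse_filter(filter_text)
    return [dn for dn, entry in directory.items()
            if (dn == base_dn or dn.endswith("," + base_dn))
            and matches(tree, entry)]
```

Note that each entry is tested on its own, which is exactly the one-object-at-a-time model that LDAP search filters follow.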

Interestingly, LDAP objects can contain references to other objects. For example, an object can refer to a secretary, which in this case would be the farmer's daughter caught by these mischievous giants:

dn: uid=ellert,dc=schoonoord,dc=nep
objectClass: inetOrgPerson
objectClass: labeledURIObject
telephoneNumber: +12345
mail: ellert@schoonoord.nep
labeledURI: http://www.ellertenbrammert.nl/ Our website
labeledURI: sip:reuzengrot@schoonoord.nep
uid: ellert
secretary: uid=jaantien,ou=Reuzengrot,dc=schoonoord,dc=nep

Likewise, Jaantien might have a manager attribute pointing to one of the giants.

The objects may include credentials such as (encrypted) passwords or X.509 certificates, which may be used to authenticate as (the real-world analogue of) an object. This is usually required before changes to a directory are possible.

Expressive power: This last example helps to clarify what the expressive power of LDAP is. Unlike SQL, where arbitrary expressions can be used to link together pieces of data in the database, the search filters of LDAP apply to one object at a time. It supports all logical compositions of comparisons between an attribute name and a value (possibly with * wildcards), but you cannot compare two attributes in the same or different objects. You could make a query for all objects that have a secretary, and you could make a query for Ellert's secretary, but you cannot in one query retrieve a list of people paired with their secretary. This model is very close to an object database model and so it satisfies the requirements of many computer programs, but it is useful to understand the limitation. For a global directory, LDAP has just the right size.
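The practical consequence is that a client has to do the pairing itself. The following Python sketch shows that two-step approach against an in-memory dictionary that stands in for the directory; in a real client the outer loop would be one search for objects that have a secretary, and each lookup would be a separate read of the referenced DN. The function name is hypothetical.

```python
def people_with_secretaries(directory):
    """Pair each uid with the uid of its secretary, joining on the client side."""
    pairs = []
    # Step 1: find every object that has a secretary attribute.
    for dn, entry in directory.items():
        for secretary_dn in entry.get("secretary", []):
            # Step 2: read the object that the DN-valued attribute points to.
            secretary = directory.get(secretary_dn)
            if secretary is not None:
                pairs.append((entry["uid"][0], secretary["uid"][0]))
    return pairs
```

The number of reads grows with the number of people found, which is the price of a filter language that looks at one object at a time.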

Introducing "razor", a Privacy Guard for LDAP

It may be problematic to publish personally identifying (contact) information, for reasons of privacy. One might be more tempted to respond to "give me all contact details for user smid@orvelte.nep" than "give me all contact details for users @orvelte.nep". The former query would normally be made by a party who ran into a user's account information, while the latter is a sign of someone trying to harvest more than you might be willing to share. The "razor" overlay distinguishes between such cases by revealing certain attributes or entire objects only when they match exactly with a query.

As an example, imagine a pseudonym uid=99877 being created, with an email address and other contact details, pointing to the node that describes the person itself:

dn: uid=99877,dc=orvelte,dc=nep
objectClass: posixAccount
uid: 99877
mail: 99877@orvelte.nep
labeledURI: sip:99877@orvelte.nep Call me during business hours
labeledURI: xmpp:99877@orvelte.nep Chat with me
seeAlso: cn=Harm Harmszoon,ou=Smidse,dc=orvelte,dc=nep

This could be one of multiple such records; they could be used as pseudonyms. It is useful from a privacy perspective to conceal such pseudonyms from generic inquiries, and to show them only to the limited group of remote peers that communicate with Harm Harmszoon using this online identity. Those peers can retrieve all contact details if they know only one of them, by querying either for that detail or for uid=99877,dc=orvelte,dc=nep. To this end, one would specify that one of the identifying attributes must be present for the object to pass:

razorObjectFilter: uid mail labeledURI

Note that this specifies that any one of these attributes must be present; this is different from specifying that they must all be present:

razorObjectFilter: uid
razorObjectFilter: mail
razorObjectFilter: labeledURI

The forms can be combined, and an attribute can even be present in multiple lines; for example, if retrieval should be permitted either by the uid alone or by the combination of mail and labeledURI, one would specify:

razorObjectFilter: uid mail
razorObjectFilter: uid labeledURI*

Normally, the query may not specify * wildcards at all; where wildcards at the beginning or end are permitted, they should be removed from the query. The * asterisk after the attribute name means that not just equality matches are permitted against the labeledURI attribute, but substring matches as well. That the * asterisk is postfixed implies no constraint on where in the search filter these * wildcards may occur; more control could however be exercised by modifying or removing certain search filters with the RWM module.
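Since the overlay does not exist yet, the following Python sketch is only one possible reading of these rules, not an implementation: attributes on one razorObjectFilter line are treated as alternatives, separate lines must all be satisfied, and a trailing * on an attribute name permits substring patterns for that attribute. The query is represented as a list of (attribute, pattern) pairs that are assumed to have been extracted from the search filter; all function names are hypothetical.

```python
import re

def parse_rules(lines):
    """Turn 'razorObjectFilter: uid labeledURI*' lines into (name, allow_substring) lists."""
    rules = []
    for line in lines:
        _, _, spec = line.partition(":")
        rules.append([(name.rstrip("*"), name.endswith("*")) for name in spec.split()])
    return rules

def value_matches(value, pattern):
    """Match value against pattern, where * stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None

def object_passes(rules, entry, assertions):
    """Decide whether entry may be returned for a query with these assertions."""
    def satisfied(attr, allow_substring):
        for asked_attr, pattern in assertions:
            if asked_attr != attr:
                continue
            if "*" in pattern and not allow_substring:
                continue  # wildcards are not permitted for this attribute
            if any(value_matches(value, pattern) for value in entry.get(attr, [])):
                return True
        return False
    # Every line must be satisfied by at least one of its attributes.
    return all(any(satisfied(attr, sub) for attr, sub in line) for line in rules)
```

With the single-line rule "uid mail labeledURI", a query for the exact mail address lets the pseudonym object through, while a query for *@orvelte.nep does not; with the three-line variant, the exact mail address alone is no longer enough.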

There is a second configuration option that makes the decisions per attribute, principally leaving the other attributes alone:

razorAttributeFilter: uid mail labeledURI*

Imagine an object with multiple uid values. Filtering on uid at the object level would keep the entire object if one of its uid values is requested; additionally filtering single attributes could be used to remove the uid values that were not requested.

Exempting Authenticated Users

The "razor" overlay only impacts anonymous queries. The assumption is that authenticated users are local and trusted, and may access data without restraint other than what is configured in the ACL.

Dropping Attributes regardless of Matching

TODO: Do we want/need this? An option that always removes particular attributes could be configured in an ACL?

Dropping Unmatched Attributes

Attributes can be listed to require an exact match in a search result, or else they will not be returned in the search result. This is done with a configuration option:

razorAttributeFilter: uid mail

Referential Integrity: Any uid or mail attribute in a prospective result is removed unless its value is exactly matched in a (uid=) or (mail=) filter without any * wildcards. When this drops the last value of an attribute that is declared as a MUST in one of the object classes, the entire object is removed from the search result.
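As a rough illustration of this rule, and not a description of the eventual overlay, the Python sketch below filters the values of one candidate entry. The query is again assumed to be available as (attribute, value) pairs taken from the search filter, and the set of MUST attributes is passed in by the caller; the function name and parameters are invented for this example.

```python
def filter_attributes(entry, filtered, assertions, must_attrs):
    """Return a copy of entry without unmatched values, or None to drop the object.

    entry:      dict of attribute -> list of values
    filtered:   attribute names listed in razorAttributeFilter
    assertions: (attribute, value) pairs found in the search filter
    must_attrs: attributes that the object's classes declare as MUST
    """
    result = {}
    for attr, values in entry.items():
        if attr in filtered:
            # Only values asked for literally, without wildcards, survive.
            asked = {value for name, value in assertions
                     if name == attr and "*" not in value}
            values = [value for value in values if value in asked]
        if values:
            result[attr] = values
        elif attr in must_attrs:
            return None  # a required attribute lost its last value
    return result
```

For an object with two uid values and a query for only one of them, the other uid value and the unrequested mail value disappear while unfiltered attributes such as telephoneNumber remain.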

Dropping Objects with Unmatched Attributes

As an alternative to removing individual attributes, it is also possible to remove entire objects. This is configured with another option:

razorObjectFilter: sn telephoneNumber

When an object that may be returned from a search contains one of the configured attributes, but its value is not matched in a search filter (sn=) or (telephoneNumber=) without any * wildcards, then the entire object is removed from the search result.

Dropping Objects with Unmatched RDNs

Another place where an attribute could be leaked is in an RDN that is not part of the baseDN of a search operation. To this end, those RDNs are investigated for the attributes that are specified in the razorAttributeFilter and razorObjectFilter configuration parameters. Objects whose RDNs hold such values that are not matched exactly in the search filter, without any * wildcards, are entirely removed from the search result.

Clever handling of labeledURI wildcards

It may be desirable to avoid too generous searches of labeledURI values, especially when these are used to provide things like SIP and XMPP addresses. On the other hand, labeledURI fields may contain an optional comment that would not generally be considered a problem if matched liberally. So, the choice between labeledURI and labeledURI* is difficult. The solution is to permit substring matches with the latter form, but to constrain the permissible search patterns in an RWM overlay:

rwm-rewriteContext searchFilter
...
# (labeledURI=xx*x) -> False
# (labeledURI=xx*x xxx) -> False
rwm-rewriteRule "(.*)\\((labeledURI=[^*() ]*)\\*[^()]*\\)(.*)" "$1(|)$3" ""

This rule expresses that a match with a * wildcard in the URI part of a labeledURI is never acceptable. Note however, that there should be room for a trailing * wildcard --which is to be expected from clients that wish to express that a labeledURI may be followed by a space and a comment-- so the first thing to do is to permit that form by inserting a space before the * wildcard:

rwm-rewriteContext searchFilter
# (labeledURI=xxx*) -> (|(labeledURI=xxx)(labeledURI=xxx *))
rwm-rewriteRule "(.*)\\((labeledURI=[^*() ]+)\\*\\)(.*)" "$1(|($2)($2 *))$3" ""
# (labeledURI=xx*x) -> False
# (labeledURI=xx*x xxx) -> False
rwm-rewriteRule "(.*)\\((labeledURI=[^*() ]*)\\*[^()]*\\)(.*)" "$1(|)$3" ""

After this, it is possible to use labeledURI* because there will be no * wildcards in the URI part.
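The effect of the two rules can be tried out without an LDAP server. The Python sketch below mimics them with the re module; it is an approximation for experimenting with the patterns, assuming that each rule is applied once, in order, to the whole filter string. The escaping needed in the configuration file is replaced by raw strings, and rewrite_filter is a name invented for this example.

```python
import re

# Python equivalents of the two rewrite rules, applied in this order.
RULES = [
    # (labeledURI=xxx*) -> (|(labeledURI=xxx)(labeledURI=xxx *))
    (r"(.*)\((labeledURI=[^*() ]+)\*\)(.*)",
     r"\g<1>(|(\g<2>)(\g<2> *))\g<3>"),
    # Any wildcard left in the URI part -> (|), a filter that never matches.
    (r"(.*)\((labeledURI=[^*() ]*)\*[^()]*\)(.*)",
     r"\g<1>(|)\g<3>"),
]

def rewrite_filter(search_filter):
    """Apply each rule once, in order, to the complete filter string."""
    for pattern, template in RULES:
        match = re.fullmatch(pattern, search_filter)
        if match:
            search_filter = match.expand(template)
    return search_filter

for example in ("(labeledURI=sip:a@b*)", "(labeledURI=sip:*@b)", "(uid=ellert)"):
    print(example, "->", rewrite_filter(example))
```

The first example keeps its trailing wildcard because the inserted space stops the second rule from matching; the second example is reduced to (|); the third is left untouched.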

Installing the "razor" Overlay

This section only applies if you are interested in the facilities of the "razor" overlay as described above. Otherwise, you can proceed to the next section with a standard OpenLDAP installation.

TODO -- the "razor" overlay has not been implemented yet.

Setting up your Node on the Global Directory

TODO -- setup OpenLDAP and configuration file.

TODO -- enter initial objects.

TODO -- enter user objects.

The last step is to actually publish your server. You probably need to forward port 389 on your NAT box and open up a hole in your IPv6 firewall. After that, you can announce your service in DNS. This is generally done with a format like:

_ldap._tcp.orvelte.nep.        IN SRV 10 10 389 ldap.orvelte.nep.
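The four values after SRV are, in order, the priority, the weight, the port and the target host. As a small illustration of how a client might turn such a record into a server address, the Python sketch below parses the textual form shown above; it performs no DNS lookup, and both SrvRecord and the helper functions are names made up for this example.

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    name: str
    priority: int
    weight: int
    port: int
    target: str

def parse_srv(line):
    """Parse 'owner IN SRV priority weight port target' into an SrvRecord."""
    fields = line.split()
    if len(fields) != 7 or fields[1:3] != ["IN", "SRV"]:
        raise ValueError("not an SRV record in the expected layout")
    return SrvRecord(fields[0], int(fields[3]), int(fields[4]),
                     int(fields[5]), fields[6])

def ldap_url(record):
    """Build an ldap:// URL from the target host and port of the record."""
    return "ldap://{}:{}".format(record.target.rstrip("."), record.port)

record = parse_srv("_ldap._tcp.orvelte.nep.        IN SRV 10 10 389 ldap.orvelte.nep.")
print(ldap_url(record))
```

When several records are published, clients are expected to prefer the lowest priority value and to use the weight to spread load among records of equal priority.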

Working with a Directory

You will want to look for a good tool to access LDAP. This might be a web-wrapper or a desktop utility addressing the directory over a network connection, of course using LDAP itself. You would need to authenticate, so an account must be set up prior to doing this. Some tools enable the import and export of vCards, which may be easier to work with. In general, it is useful to learn about commonly used attributes in LDAP: cn or commonName, o or organizationName, c or countryName, ou or organizationalUnitName, uid or userid and dc or domainComponent. LDAP can be pesky when it enforces data integrity while you are trying to enter data, but that usually comes down to a need to learn what LDAP thinks about what you are trying to do.

Replicated Service

Depending on how important you think your directory information is, you might consider replicating it, rather than just making local backups. Replication is best done with SyncRepl, and would normally lead to quick updates on other nodes. There is a lot of variety here; please visit OpenLDAP.org for details.

It is probably worth mentioning that LDAP clients can also request SyncRepl service to stay up to date with changes.

Next: Configuring an Outgoing Hub with OpenLDAP