School of Population Health


Appendix 1. Lock and key algorithm

The problem of generating all partitions of size p1,...,pq from an attribute set, Tm, of melements is equivalent to the problem of selecting q groups of objects, such that there are piobjects in group i, from a pool of m objects. The "lock and key algorithm" (reference 9), devised by the author, provides the solution implemented in SPAN.

Suppose the m objects are labelled 1,2,...,m. Let Ii denote the set of objects in group i. If, for the moment, the requirement that all objects in one group are not contained in another is ignored, that is, embedding Ij ⊆ Ii is allowed, there are mCpi possibilities for each Ii. We identify each possibility by a key Ki, where Ki = 1,..., mCpi and the objects of the q groups are determined by a set keys K1,...,Kq. By generating a set of keys, the actual objects in each group can be obtained by ünlocking" the combination of each key to obtain Ii.

We discuss generating the keys in a moment. First, consider how to unlock a key K to obtain the combination I = (i1,...,ip) of a group of p objects from m. There are mCp combinations and suppose the keys correspond to an ordering in which ip is incremented the fastest and i1 the slowest. Thus I = (1,2,...,p) corresponds to K = 1, and I = (m-p+1,m-p+2,...,m) corresponds to K = mCp and i1 < i2 < ... < ip. Given i1 there are Li1 = m-i1Cp-1 possibilities for the remainingp-1 objects. Thus, if we define

lockandkey1

with S0 = 0 and find an i such that Si-1 < K ≤ Si then the first digit is i1 = i. The remaining combination, i2,...,ip is now a permutation of the digits i+1,...,m and K-Si-1 is the key to unlock the remaining combination. Thus with K , m and p reset to K-Si-1m-i, and p-1 respectively we proceed in the same way to find the second digit and so on.

To generate the key combinations K1,..., Kq consider first the case when p1 = ,..., = pq = p. There are c = mCp possible values for each Ki and, to avoid embedding, these have to be distinct. Therefore the problem is that of enumerating the cCq ways to select q objects from c, each enumeration satisfying K1 < K2 < ... < Kq. When the pi's are not all equal, the keys do not necessarily have to satisfy the full inequality K1 < K2 < ... < Kq unless pi-1 = pi for some iand it is then necessary to ensure that Ki-1 < Ki. Subject to these constraints the sequence of keys can be generated by the simulated nested loop device of Gentleman (reference 3). Except in the special case p1 = ,..., = pq = p the algorithm will generate some Ij that are embedded, and in this case it is necessary to check for embedding.

[Back to table of contents]