Interval algebra is used to describe relationships between time intervals over a time axis. Constraints are placed between each time interval so that when boolean expressions are constructed, they can be validated with the help of a SAT solver.
There are three main input data structures and some algorithms that allow for the generation of the boolean clauses that describe the constraints between all the time intervals, and they'll be described in the two following sections.
All the classes and methods can be found inside the allen/input
module, so
every reference to Python files will be relative to that.
Since the file format of the input files is briefly described with EBNF, I'll introduce some basic BNF that can be used to understand the rest of the specifications (I'm using the format as specified in here):
ws := ? any ws character ?;
nl := ? newline character ?;
relationship :=
"<" | ">" | "m" | "mi" | "o" | "oi" | "s" |
"si" | "d" | "di" | "f" | "fi" | "=";
relationships_list := "(", ws, {relationship, ws}, ")";
The first data structure is the inverse relationships table, in particular
found in this repository under data/inverse_relationships_table.txt
. It's
quite small and its only purpose is to describe, given a specific relationship
between two time intervals t1 and t2, what would the relationship be if the
time intervals were actually the opposite, that is, it specifies the
relationship that would be present between t2 and t1. The order in which we
define the relationships between two or more time intervals matters, since the
time axis is where the intervals reside. The table has a simple structure and is
defined by a number of rows (in particular 13 rows since all the possible
relationship types are actually 13) where each row contains first the normal
relationship (that we defined before as being from t1 to t2), a double
colon and the inverse relationship (that we defined before as being, this time,
from t2 to t1).
The function read_inverse_relationships_table()
inside inverse_relationships_table.py
module provides a quick way to load all the data inside a practical data
structure defined by the InverseRelationshipsTable
class. This class exactly
mirrors the contents of the file with a list-based data structure where each
item is a tuple.
Each line can briefly be described using BNF as:
line := relationship, ws, "::", ws, relationship, nl;
The second data structure is the ternary constraints table. Let's say that we take into account the existence of three time intervals, called t1, t2 and t3. We also specify that the relationship between t1 and t2 is r_t1t2 and that the relationship between t2 and t3 is r_t2t3, then, because of Allen's interval algebra, only a few select relationships between t1 and t3 will actually make sense, from the whole set. This table thus defines, for a given pair of relationships r_t1t2 and r_t2t3, what's the set of possible relationships that can be found between t1 and t3 and we could call that r_t1t3.
As with the previous structure, the function read_ternary_constraints_table()
inside ternary_constraints_table.py
quickly reads the table from the given
file path and, as before, returns a data structure that exactly mirrors the
contents of the file. In particular, the TernaryConstraintsTable
is the class
instance that is returned by the function. The table contains a series of lines
where each line is composed by the first two relationships r_t1t2 and r_t2t3
separated by a single colon and then a double colon precedes a list enclosed by
round brackets of all the possible r_t1t3 relationships. This table can be
found under this repository in the data/ternary_constraints_table.txt
file.
Each line can briefly be described using BNF as:
line := relationship, ws, ":", ws, relationship, ws, "::", ws, relationships_list, nl;
The third and last data structure is the list of time intervals groups. For each
input file in question, many groups can be contained inside it. A single group
is defined as being a list of relationships between a well known number of time
intervals, where each pair of time intervals might be related by one or multiple
relationships. An example of time intervals groups can be found inside the
data/time_intervals
folder, with various kinds of data sizes.
The function read_time_intervals_table()
reads all the time intervals groups
inside the input file and returns a list of TimeIntervalsGroup
instances
where each one of them represents the exact data provided by the group. As
usual, the file is line-based but the groups have a header that indicates the
number of time intervals, followed by some lines that define the relationships
between two time intervals (each time interval is referenced by a number from zero
up to one less than the total number of time intervals written in the header)
and a final line containing a dot that marks the end of the current group.
Each group can briefly be described with BNF as:
intervals_count := ? any positive integer number ?;
time_interval := ? any non-negative integer number less than the intervals count ?;
header := intervals_count, nl;
line := time_interval, ws, time_interval, ws, "::", relationships_list, nl;
end := ".", nl;
group := header, {line}, end;
Given the input data, some algorithms are used to generate boolean clauses whose satisfiability will be later checked by a SAT solver. In particular, we're always using:
- at least one,
- at most one,
- inverse implication
and the user gets to choose between:
- ternary implication, or
- expression reference.
All the above algorithms will now be explained in the following sections. Before
that, a brief explanation on what a boolean clause is. Generally, in the
context of boolean SAT solvers, the input data is a list of clauses. Each clause
is essentially a boolean expression composed by subsequent OR operators on individual
literals, where each literal can be plain or negated. Thus, a single boolean
clause can be for example x1 x2 -x3
whose actual meaning is x1 OR x2 OR (NOT x3)
.
In our case, each literal can either represent a relationship between two time
intervals (thus described by a triple: t1, t2, and r_t1t2) or it can be
referring to an expression
I will also refer to the configuration, indicating the current time intervals group taken into account and whose algorithms are being applied on.
Considering the fact that most of the time, in the time intervals group, for each set of relationships that can be found between a generic t1 and t2, the size of said set is bigger than one (in other words, it specifies that there can be more than one relationship between t1 and t2), we want to generate a clause that expresses the fact that at least one of those relationships has to be found in the given configuration. We express that constraint by ORing together all the literals into a single clause, where for each clause we work with time intervals t1 and t2 and we compose literals by using, in turn, every relationship that's possible between t1 and t2.
To give an example, let's say that between time intervals 3 and 4 there can only
be relationships before (<
), equal (=
) or overlapping (o
).
We then generate the at least one clause by using three literals, one for
each relationship and all of them using the same pair of time intervals. In the
following code block, the resulting clause is explained:
Line describing the relationships in the time intervals group:
3 4 :: ( < = o )
Literals as defined by what's required by the algorithm:
l1: relationship < between time intervals 3 and 4
l2: relationship = between time intervals 3 and 4
l3: relationship o between time intervals 3 and 4
Resulting clauses:
l1 OR l2 OR l3
It's pretty clear that for the clause to be true, it's enough to have one of those literals to be true, that is, to have one of the three possible relationships to actually be the one specified in the given configuration.
This algorithm is pretty similar to the at least one counterpart, but instead specifies the exact opposite: between two time intervals that can be related by more than one possible relationship, generate a clause that makes sure at most one of those relationships is the one given in the configuration and not more.
To do this, we construct not one but a series of clauses each indicating that
it's not possible for two relationships to be specified in the configuration at
the same time. Let's say that we have two time intervals 3 and 4, as before and
that the relationships between them are before (<
), equal (=
) or overlapping
(o
). We know that it's not possible for the configuration to have both
3 and 4 related by relationships before and equal, or by before and overlapping or again by
equal and overlapping. We construct a boolean expression that's an AND of two
negated literals, each one indicating that between 3 and 4 there's a different
relationship. We do that with all the possible combinations of relationships.
Since the resulting clauses aren't in the format required by the SAT solver that needs all literals ORed together to form a single clause, we need to take advantage of the properties of boolean algebra and use De Morgan's theorem to swap the AND of the two negated literals with the same two non-negated literals ORed together. The code block provides an example:
Generic boolean expression:
!x_{t1, t2, r1} & !x_{t1, t2, r2}
Line describing the relationships in the time intervals group:
3 4 :: ( < = o )
Literals as defined by what's required by the algorithm:
l1: relationship < between time intervals 3 and 4
l2: relationship = between time intervals 3 and 4
l3: relationship o between time intervals 3 and 4
Resulting clauses:
-l1 OR -l2
-l1 OR -l3
-l2 OR -l3
This is quite simpler than the rest of the algorithms as it expresses the fact that, given time intervals t1 and t2, if the relationship between them is something, then it should also be true that if we swap the order of the time intervals, the relationship between them should be the exact inverse of the previous relationship. Here too we take advantage of the propositional logic's rules to change an implication into a valid boolean expression that can be used in a SAT solver.
Generic boolean expression:
x_{t1, t2, r} => x_{t2, t1, inverse_of(r)} which is
!x_{t1, t2, r} | x_{t2, t1, inverse_of(r)}
Line describing the relationships in the time intervals group:
3 4 :: ( < = o )
Literals as defined by what's required by the algorithm:
l1: relationship < between time intervals 3 and 4
l2: relationship > between time intervals 4 and 3 (the inverse of 3 and 4)
Resulting clauses:
-l1 OR l2
This is one of the two algorithms that can be used to complete the list of clauses to be passed to the SAT solver. It puts into practice (actually, boolean expressions) what has been described in the section that describes the data structure. The below code block represents an example:
Generic boolean expression:
x_{t1, t2, r1} & x_{t2, t3, r2} => x_{t1, t3, r3} which is
!(x_{t1, t2, r1} & x_{t2, t3, r2}) | x_{t1, t3, r3} which is
!x_{t1, t2, r1} | !x_{t2, t3, r2} | x_{t1, t3, r3}
Lines describing the relationships in the time intervals group:
3 4 :: ( > )
4 5 :: ( s fi )
Lines needed in the ternary constraints table:
> : s :: ( > d f mi oi )
> : fi :: ( > )
Literals as defined by what's required by the algorithm:
l1: relationship > between time intervals 3 and 4
l2: relationship s between time intervals 4 and 5
l3: relationship fi between time intervals 4 and 5
l4: relationship > between time intervals 3 and 5
l5: relationship d between time intervals 3 and 5
l6: relationship f between time intervals 3 and 5
l7: relationship mi between time intervals 3 and 5
l8: relationship oi between time intervals 3 and 5
l9: relationship > between time intervals 3 and 5
Resulting clauses:
-l1 OR -l2 OR l4 OR l5 OR l6 OR l7 OR l8
-l1 OR -l3 OR l9
To be done.
Once all the required clauses have been generated, the allen/output
module
will take care of everything that's required to generate proper SAT input.
Essentially, each literal for every clause will be identified by a single
positive integer number thus the number_dict.py
provides facilities to uniquely
identify literals across clauses.
When two literals are found in two different clauses, the same integer number is used. Each clause is written by separating each contained literal with a space, eventually negating the integer number that represents the literal if it's negated and adding a zero at the end of the line to mark the end of the clause. Subsequent clauses are written in a separate line.