Generative theory of tonal music

The generative theory of tonal music (GTTM) is a system of music analysis developed by music theorist Fred Lerdahl and linguist Ray Jackendoff.^[1] First presented in their 1983 book of the same title, it constitutes a "formal description of the musical intuitions of a listener who is experienced in a musical idiom"^[1] with the aim of illuminating the unique human capacity for musical understanding.^[2]

The musical collaboration between Lerdahl and Jackendoff was inspired by Leonard Bernstein's 1973 Charles Eliot Norton Lectures at Harvard University, wherein he called for researchers to uncover a musical grammar that could explain the human musical mind in a scientific manner comparable to Noam Chomsky's revolutionary transformational or generative grammar.^[3]

Unlike the major methodologies of music analysis that preceded it, GTTM construes the mental procedures by which the listener constructs an unconscious understanding of music, and uses these procedures as tools to illuminate the structure of individual compositions. The theory has been influential, spurring further work by its authors and other researchers in the fields of music theory, music cognition and cognitive musicology.^[4]

Theory

GTTM focuses on four hierarchical systems that shape our musical intuitions. Each of these systems is expressed in a strict hierarchical structure where dominant regions contain smaller subordinate elements and equal elements exist contiguously within a particular and explicit hierarchical level. In GTTM any level can be small-scale or large-scale depending on the size of its elements.

Structures

I. Grouping structure

GTTM considers grouping analysis to be the most basic component of musical understanding. It expresses a hierarchical segmentation of a piece into motives, phrases, periods, and still larger sections.

II. Metrical structure

Metrical structure expresses the intuition that the events of a piece are related to a regular alternation of strong and weak beats at a number of hierarchical levels. It is a crucial basis for all the structures and reductions of GTTM.

III. Time-span reduction

Time-span reductions (TSRs) are based on information gleaned from metrical and grouping structures. They establish tree structure-style hierarchical organizations uniting time-spans at all temporal levels of a work.^[5] The TSR analysis begins at the smallest levels, where metrical structure marks off the music into beats of equal length (or more precisely into attack points separated by uniform time-spans^[6]) and moves through all larger levels where grouping structure divides the music into motives, phrases, periods, theme groups, and still greater divisions. It further specifies a “head” (or most structurally important event) for each time-span at all hierarchical levels of the analysis. A completed TSR analysis is often called a time-span tree.
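As an illustration, a time-span tree can be sketched as a nested data structure in which every time-span records which of its sub-spans supplies its head. The class name `TimeSpan`, the field `head_index` and the event labels below are hypothetical and are not part of GTTM; the head choices are supplied by hand rather than derived from the theory's preference rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TimeSpan:
    """One node of a time-span tree (hypothetical representation)."""
    label: str
    event: Optional[str] = None                  # set only on the smallest time-spans
    children: List["TimeSpan"] = field(default_factory=list)
    head_index: int = 0                          # which child supplies the head (analyst's choice)

    def head(self) -> Optional[str]:
        """Return the structurally most important event of this time-span."""
        if not self.children:
            return self.event
        return self.children[self.head_index].head()


def reduce_to_level(span: TimeSpan, depth: int) -> List[Optional[str]]:
    """List the heads of the time-spans found `depth` levels below `span`."""
    if depth == 0 or not span.children:
        return [span.head()]
    heads: List[Optional[str]] = []
    for child in span.children:
        heads.extend(reduce_to_level(child, depth - 1))
    return heads


# A four-event phrase split into two groups; the head choices are illustrative only.
phrase = TimeSpan("phrase", head_index=1, children=[
    TimeSpan("group 1", head_index=0,
             children=[TimeSpan("t1", "e1"), TimeSpan("t2", "e2")]),
    TimeSpan("group 2", head_index=1,
             children=[TimeSpan("t3", "e3"), TimeSpan("t4", "e4")]),
])

print(reduce_to_level(phrase, 2))  # musical surface: all four events
print(reduce_to_level(phrase, 1))  # one head per group
print(reduce_to_level(phrase, 0))  # head of the whole phrase
```

Each call to `reduce_to_level` with a smaller depth corresponds to one step of reduction: the subordinate events drop out and only the heads of the larger time-spans remain.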

IV. Prolongational reduction

Prolongational reduction (PR) expresses in precise structural terms our "psychological" awareness of tensing and relaxing patterns in a given piece. In time-span reduction, the hierarchy of less and more important events is established according to rhythmic stability. In prolongational reduction, hierarchy is concerned with relative stability expressed in terms of continuity and progression, the movement toward tension or relaxation, and the degree of closure or non-closure. A PR analysis also produces a tree-structure style hierarchical analysis, but this information is often conveyed in a visually condensed modified "slur" notation.

The need for prolongational reduction mainly arises from two limitations of time-span reductions. The first is that time-span reduction fails to express the sense of continuity produced by harmonic rhythm.^[7] The second is that time-span reduction—even though it establishes that particular pitch-events are heard in relation to a particular beat, within a particular group—fails to say anything about how music flows across these segments.^[8]

More on TSR vs PR

It is helpful to note some basic differences between a time-span tree produced by TSR and a prolongational tree produced by PR. First, though the basic branching divisions produced by the two trees are often the same or similar at high structural levels, branching variations between the two trees often occur as one travels further down towards the musical surface.

A second and equally important differentiation is that a prolongational tree carries three types of branching: strong prolongation (represented by an open node at the branching point), weak prolongation (a filled node at the branching point) and progression (simple branching, with no node). Time-span trees do not make this distinction. All time-span tree branches are simple branches without nodes (though time-span tree branches are often annotated with other helpful comments).

Rules

Each of the four major hierarchical organizations (grouping structure, metrical structure, time-span reduction and prolongational reduction) is established through rules, which are in three categories:

The well-formedness rules, which specify possible structural descriptions.
The preference rules, which select from the possible structural descriptions those that correspond to experienced listeners’ hearings of any particular piece.
The transformational rules, which provide a means of associating distorted structures with well-formed descriptions.
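The division of labour between the first two rule types can be pictured as a generate-and-rank procedure: well-formedness rules enumerate the admissible descriptions, and preference rules order them. The following is only a toy sketch of that idea for splitting a passage into two groups. GTTM itself does not assign numerical weights to its rules, so the scoring function and its penalties are assumptions made purely for illustration.

```python
from typing import List, Tuple

Segmentation = Tuple[int, int]  # sizes of two consecutive groups, e.g. (4, 4)


def candidates(n_events: int) -> List[Segmentation]:
    """Well-formedness stage: every split of the events into two contiguous groups."""
    return [(k, n_events - k) for k in range(1, n_events)]


def preference(seg: Segmentation) -> float:
    """Preference stage: a toy score, higher is better (penalties are hypothetical)."""
    asymmetry = abs(seg[0] - seg[1])             # stands in for a preference for equal halves
    smallness = sum(1 / size for size in seg)    # stands in for avoiding very small groups
    return -(asymmetry + smallness)


best = max(candidates(8), key=preference)
print(best)
```

Transformational rules are not modelled here; in this picture they would map a well-formed underlying description onto a distorted surface form.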

I. Grouping structure rules

Grouping well-formedness rules (G~WFRs)

"Any contiguous sequence of pitch-events, drum beats, or the like can constitute a group, and only contiguous sequences can constitute a group."
"A piece constitutes a group."
"A group may contain smaller groups."
"If a group G₁ contains part of a group G₂, it must contain all of G₂."
"If a group G₁ contains a smaller group G₂, then G₁ must be exhaustively partitioned into smaller groups."
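The well-formedness rules above can be read as constraints on a set of spans. The sketch below assumes, for illustration only, that each group is encoded as a half-open span of event indices; the function name `is_well_formed` and this encoding are not taken from GTTM. Contiguity holds automatically under this encoding, and the rule that a group may contain smaller groups is permissive, so nothing needs checking for it.

```python
from typing import Set, Tuple

Group = Tuple[int, int]  # half-open span [start, end) of event indices


def _contains(outer: Group, inner: Group) -> bool:
    """True if `inner` lies within `outer` and is a different group."""
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]


def is_well_formed(groups: Set[Group], n_events: int) -> bool:
    # Every span must be non-empty and lie within the piece.
    if any(not 0 <= s < e <= n_events for s, e in groups):
        return False
    # The piece as a whole constitutes a group.
    if (0, n_events) not in groups:
        return False
    ordered = sorted(groups, key=lambda g: (g[0], -g[1]))
    # A group containing part of another group must contain all of it:
    # two groups are either disjoint or nested, never partially overlapping.
    for i, (_, e1) in enumerate(ordered):
        for s2, e2 in ordered[i + 1:]:
            if s2 < e1 < e2:
                return False
    # A group that contains smaller groups must be exhaustively partitioned by them.
    for group in ordered:
        inside = [g for g in ordered if _contains(group, g)]
        children = [g for g in inside if not any(_contains(h, g) for h in inside)]
        position = group[0]
        for start, end in children:      # children are already sorted by start
            if start != position:
                return False
            position = end
        if children and position != group[1]:
            return False
    return True


print(is_well_formed({(0, 8), (0, 4), (4, 8), (0, 2), (2, 4)}, 8))
print(is_well_formed({(0, 8), (0, 5), (3, 8)}, 8))
```

The first call describes a piece divided into two halves, the first of which is divided again; the second contains two partially overlapping groups.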

Grouping preference rules (G~PRs)

alternative form: "Avoid analyses with very small groups – the smaller, the less preferable."
(Proximity) Consider a sequence of four notes, n₁–n₄. The transition n₂–n₃ may be heard as a group boundary if:

(slur/rest) the interval of time from the end of n₂ to the beginning of n₃ is greater than that from the end of n₁ to the beginning of n₂ and that from the end of n₃ to the beginning of n₄, or if
(attack-point) the interval of time between the attack points of n₂ and n₃ is greater than between those of n₁ and n₂ and between those of n₃ and n₄.

(Change) Consider a sequence of four notes, n₁–n₄. The transition n₂–n₃ may be heard as a group boundary if:

(Register) the transition n₂-n₃ involves a greater intervallic distance than both n₁-n₂ and n₃-n₄, or if
(Dynamics) the transition n₂-n₃ involves a change in dynamics and n₁-n₂ and n₃-n₄ do not, or if
(Articulation) the transition n₂-n₃ involves a change in articulation and n₁-n₂ and n₃-n₄ do not, or if
(Length) n₂ and n₃ are of different length and both pairs n₁,n₂ and n₃,n₄ do not differ in length.

(Intensification) A larger-level group may be placed where the effects picked out by GPRs 2 and 3 are more pronounced.
(Symmetry) "Prefer grouping analyses that most closely approach the ideal subdivision of groups into two parts of equal length."
(Parallelism) "Where two or more segments of music can be construed as parallel, they preferably form parallel parts of groups."
(Time-span and prolongational stability) "Prefer a grouping structure that results in more stable time-span and/or prolongational reductions."
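The Proximity and Change rules above are local tests on a window of four notes, which makes them straightforward to sketch in code. The example below is a minimal illustration under assumed conventions: the `Note` fields, the use of MIDI note numbers for pitch, and the cue labels are hypothetical, and the remaining preference rules (Intensification, Symmetry, Parallelism and the stability rule) are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    onset: float                  # attack point, in beats
    offset: float                 # release point, in beats
    pitch: int                    # MIDI note number (assumed encoding)
    dynamic: str = "mf"
    articulation: str = "legato"

    @property
    def length(self) -> float:
        return self.offset - self.onset


def _gap(a: Note, b: Note) -> float:
    return b.onset - a.offset     # silence between the end of a and the start of b


def _ioi(a: Note, b: Note) -> float:
    return b.onset - a.onset      # distance between attack points


def _leap(a: Note, b: Note) -> int:
    return abs(b.pitch - a.pitch)


def boundary_cues(n1: Note, n2: Note, n3: Note, n4: Note) -> List[str]:
    """Name the Proximity and Change cases that mark n2-n3 as a possible boundary."""
    cues = []
    if _gap(n2, n3) > max(_gap(n1, n2), _gap(n3, n4)):
        cues.append("slur/rest")
    if _ioi(n2, n3) > max(_ioi(n1, n2), _ioi(n3, n4)):
        cues.append("attack-point")
    if _leap(n2, n3) > max(_leap(n1, n2), _leap(n3, n4)):
        cues.append("register")
    if n2.dynamic != n3.dynamic and n1.dynamic == n2.dynamic and n3.dynamic == n4.dynamic:
        cues.append("dynamics")
    if (n2.articulation != n3.articulation and n1.articulation == n2.articulation
            and n3.articulation == n4.articulation):
        cues.append("articulation")
    # Exact comparison of lengths; real data would need a tolerance.
    if n2.length != n3.length and n1.length == n2.length and n3.length == n4.length:
        cues.append("length")
    return cues


def scan(notes: List[Note]) -> List[Tuple[int, List[str]]]:
    """Return (i, cues) for each i where a boundary may fall between notes[i] and notes[i+1]."""
    hits = []
    for i in range(1, len(notes) - 2):
        cues = boundary_cues(notes[i - 1], notes[i], notes[i + 1], notes[i + 2])
        if cues:
            hits.append((i, cues))
    return hits


melody = [Note(0, 1, 60), Note(1, 2, 62), Note(2, 3, 64),
          Note(4, 5, 72), Note(5, 6, 71), Note(6, 7, 69)]
print(scan(melody))
```

In this melody a rest and an upward leap coincide after the third note, so several cues fire at the same transition, which is the kind of reinforcement that the Intensification rule refers to.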

Transformational grouping rules

Grouping overlap (p. 60)

Given a well-formed underlying grouping structure G containing two adjacent groups g₁ and g₂ such that

g₁ ends with event e₁,
g₂ begins with event e₂, and
e₁ = e₂

a well-formed surface grouping structure G' may be formed that is identical to G except that

it contains one event e' where G had the sequence e₁e₂,
e'=e₁=e₂
all groups ending with e₁ in G end with e' in G', and
all groups beginning with e₂ in G begin with e' in G'.

Grouping elision (p. 61).

Given a well-formed underlying grouping structure G containing two adjacent groups g₁ and g₂ such that

g₁ ends with event e₁,
g₂ begins with event e₂, and

(for left elision) e₁ is harmonically identical to e₂ and less than e₂ in dynamics and pitch range or
(for right elision) e₂ is harmonically identical to e₁ and less than e₁ in dynamics and pitch range,

a well-formed surface grouping structure G' may be formed that is identical to G except that

it contains one event e' where G had the sequence e₁e₂,

(for left elision) e'=e₂,
(for right elision) e'=e₁,

all groups ending with e₁ in G end with e' in G', and
all groups beginning with e₂ in G begin with e' in G'.
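Both grouping transformations replace two adjacent underlying events with a single surface event that is shared by the group ending there and the group beginning there. A minimal sketch follows, assuming events are plain labels and groups are half-open index spans; the function name `fuse` and its `kind` argument are hypothetical. The harmonic, dynamic and pitch-range conditions for elision cannot be checked on plain labels, so the caller is assumed to have verified them.

```python
from typing import List, Tuple

Group = Tuple[int, int]  # half-open span [start, end) of event indices


def fuse(events: List[str], groups: List[Group], k: int,
         kind: str = "overlap") -> Tuple[List[str], List[Group]]:
    """Fuse events[k] (e1) and events[k + 1] (e2) into one surface event e'."""
    e1, e2 = events[k], events[k + 1]
    ends_here = any(end == k + 1 for _, end in groups)
    starts_here = any(start == k + 1 for start, _ in groups)
    if not (ends_here and starts_here):
        raise ValueError("no group boundary between events k and k + 1")
    if kind == "overlap":
        if e1 != e2:
            raise ValueError("overlap requires e1 = e2")
        merged = e1
    elif kind == "left elision":
        merged = e2               # e' = e2; conditions on e1 assumed checked by the caller
    elif kind == "right elision":
        merged = e1               # e' = e1; conditions on e2 assumed checked by the caller
    else:
        raise ValueError(f"unknown transformation: {kind}")
    surface_events = events[:k] + [merged] + events[k + 2:]
    # Groups that ended with e1 now end with e'; groups that began with e2 now begin with e'.
    surface_groups = [(s if s <= k else s - 1, e if e <= k + 1 else e - 1)
                      for s, e in groups]
    return surface_events, surface_groups


underlying_events = ["a", "b", "c", "c", "d", "e"]
underlying_groups = [(0, 6), (0, 3), (3, 6)]
print(fuse(underlying_events, underlying_groups, 2))
```

In the result the two lower-level groups share one event, so the surface structure is no longer a strict partition; the transformation is what relates it back to the well-formed underlying structure.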

II. Metrical structure rules

Metrical well-formedness rules (M~WFRs)

"Every attack point must be associated with a beat at the smallest metrical level present at that point in the piece."
"Every beat at a given level must also be a beat at all smaller levels present at that point in that piece."
"At each metrical level, strong beats are spaced either two or three beats apart."
"The tactus and immediately larger metrical levels must consist of beats equally spaced throughout the piece. At subtactus metrical levels, weak beats must be equally spaced between the surrounding strong beats."

Metrical preference rules (M~PRs)

(Parallelism) "Where two or more groups or parts of groups can be construed as parallel, they preferably receive parallel metrical structure."
(Strong beat early) "Weakly prefer a metrical structure in which the strongest beat in a group appears relatively early in the group."
(Event) "Prefer a metrical structure in which beats of level L_i that coincide with the inception of pitch-events are strong beats of L_i."
(Stress) "Prefer a metrical structure in which beats of level L_i that are stressed are strong beats of L_i."
(Length) Prefer a metrical structure in which a relatively strong beat occurs at the inception of either

a relatively long pitch-event;
a relatively long duration of a dynamic;
a relatively long slur;
a relatively long pattern of articulation;
a relatively long duration of a pitch in the relevant levels of the time-span reduction;
a relatively long duration of a harmony in the relevant levels of the time-span reduction (harmonic rhythm).

(Bass) "Prefer a metrically stable bass."
(Cadence) "Strongly prefer a metrical structure in which cadences are metrically stable; that is, strongly avoid violations of local preference rules within cadences."
(Suspension) "Strongly prefer a metrical structure in which a suspension is on a stronger beat than its resolution."
(Time-span interaction) "Prefer a metrical analysis that minimizes conflict in the time-span reduction."
(Binary regularity) "Prefer metrical structures in which at each level every other beat is strong."
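The metrical well-formedness rules and the Event preference rule above can be sketched together. The example assumes that a metrical structure is given as a list of levels, smallest first, each a sorted list of integer beat positions, and that every level runs through the whole piece. The function names are hypothetical, the clause about weak beats at subtactus levels is not checked, and counting coincidences is only one possible reading of "prefer", since GTTM does not quantify its preference rules.

```python
from typing import Dict, List, Sequence


def check_metrical_structure(levels: List[List[int]], attacks: Sequence[int],
                             tactus: int = 0) -> List[str]:
    """Return the names of the well-formedness rules (MWFRs) that are violated."""
    problems = set()
    smallest = set(levels[0])
    # MWFR 1: every attack point falls on a beat of the smallest level.
    if any(a not in smallest for a in attacks):
        problems.add("MWFR 1")
    for lower, upper in zip(levels, levels[1:]):
        # MWFR 2: a beat at one level is also a beat at all smaller levels.
        if not set(upper) <= set(lower):
            problems.add("MWFR 2")
            continue
        # MWFR 3: strong beats are two or three beats apart at the lower level.
        idx = [lower.index(b) for b in upper]
        if any(j - i not in (2, 3) for i, j in zip(idx, idx[1:])):
            problems.add("MWFR 3")
    # MWFR 4 (first clause only): the tactus and larger levels are evenly spaced.
    for level in levels[tactus:]:
        if len({b - a for a, b in zip(level, level[1:])}) > 1:
            problems.add("MWFR 4")
    return sorted(problems)


def event_rule_scores(onsets: Sequence[int], n_beats: int, period: int) -> Dict[int, int]:
    """For each phase of the next larger level, count strong beats that carry an onset."""
    onset_set = set(onsets)
    return {phase: sum(1 for b in range(phase, n_beats, period) if b in onset_set)
            for phase in range(period)}


levels = [list(range(16)), list(range(0, 16, 2)), list(range(0, 16, 4))]
print(check_metrical_structure(levels, attacks=[0, 3, 4, 8], tactus=1))

scores = event_rule_scores([0, 2, 3, 4, 6, 8, 10, 11, 12], n_beats=16, period=2)
print(scores, max(scores, key=scores.get))
```

The well-formedness check only filters candidate structures; choosing among the survivors is the job of the preference rules, of which the Event rule is just one.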

Transformational metrical rule

Metrical deletion (p. 101).

Given a well-formed metrical structure M in which

B₁, B₂ and B₃ are adjacent beats of M at level L_i, and B₂ is also a beat at level L_{i+1},
T₁ is the time-span from B₁ to B₂ and T₂ is the time-span from B₂ to B₃, and
M is associated with an underlying grouping structure G in such a way that both T₁ and T₂ are related to a surface time-span T' by the grouping transformation performed on G of

left elision or
overlap,

then a well-formed metrical structure M' can be formed from M and associated with the surface grouping structure by

deleting B₁ and all beats at all levels between B₁ and B₂ and associating B₂ with the onset of T', or
deleting B₂ and all beats at all levels between B₂ and B₃ and associating B₁ with the onset of T'.

III. Time-span reduction rules

Time-span reduction rules begin with two segmentation rules and proceed to the standard WFRs, PRs and TRs.

Time-span segmentation rules

"Every group in a piece is a time-span in the time-span segmentation of the piece."
"In underlying grouping structure: a. each beat B of the smallest metrical level determines a time-span T_B extending from B up to but not including the next beat of the smallest level; b. each beat B of metrical level L_i determines a regular time-span of all beats of level L_{i-1} from B up to but not including (i) the next beat B’ of level L_i or (ii) a group boundary, whichever comes sooner; and c. if a group boundary G intervenes between B and the preceding beat of the same level, B determines an augmented time-span T’_B, which is the interval from G to the end of the regular time-span T_B."
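The second segmentation rule can be sketched for a single metrical level. The example below assumes that beats and group boundaries are given as positions on a common time axis; the function name `time_spans` and the tuple layout are hypothetical. For the first beat of a level, a group boundary lying anywhere before it is treated as intervening, which is an assumption made here to cover an opening upbeat.

```python
from bisect import bisect_right
from typing import List, Tuple

Span = Tuple[str, float, float]  # (kind, start, end), with end excluded


def time_spans(beats: List[float], boundaries: List[float], piece_end: float) -> List[Span]:
    """Regular and augmented time-spans for one metrical level.

    `beats` and `boundaries` must be sorted in ascending order.
    """
    spans: List[Span] = []
    for i, b in enumerate(beats):
        next_beat = beats[i + 1] if i + 1 < len(beats) else piece_end
        k = bisect_right(boundaries, b)                  # first boundary after b
        next_boundary = boundaries[k] if k < len(boundaries) else piece_end
        end = min(next_beat, next_boundary)              # whichever comes sooner
        spans.append(("regular", b, end))
        previous = beats[i - 1] if i > 0 else float("-inf")
        intervening = [g for g in boundaries if previous < g < b]
        if intervening:
            # An upbeat: the span is extended back to the group boundary.
            spans.append(("augmented", intervening[-1], end))
    return spans


# Bar-level beats at 0, 4 and 8; a new group begins at 7 with an upbeat to beat 8.
print(time_spans([0, 4, 8], boundaries=[0, 7, 12], piece_end=12))
```

Here the time-span of the second beat is cut short by the group boundary at 7, and the third beat acquires an augmented time-span that reaches back to that boundary.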

Time-span reduction well-formedness rules (TSR~WFRs)

"For every time-span T there is an event e (or a sequence of events e₁ – e₂) that is the head of T."
"If T does not contain any other time-span (that is, if T is the smallest level of time-spans), then e is whatever event occurs in T."
If T contains other time-spans, let T₁,...,T_n be the (regular or augmented) time-spans immediately contained in T and let e₁,...,e_n be their respective heads. Then the head is defined depending on: a. ordinary reduction; b. fusion; c. transformation; d. cadential retention (p. 159).
"If a two-element cadence is directly subordinate to the head e of a time-span T, the final is directly subordinate to e and the penult is directly subordinate to the final."

Time-span reduction preference rules (TSR~PRs)

(Metrical position) "Of the possible choices for head of time-span T, prefer a choice that is in a relatively strong metrical position."
(Local harmony) "Of the possible choices for head of time-span T, prefer a choice that is: a. relatively intrinsically consonant, b. relatively closely related to the local tonic."
(Registral extremes) "Of the possible choices for head of time-span T, weakly prefer a choice that has: a. a higher melodic pitch; b. a lower bass pitch."
(Parallelism) "If two or more time-spans can be construed as motivically and/or rhythmically parallel, preferably assign them parallel heads."
(Metrical stability) "In choosing the head of a time-span T, prefer a choice that results in more stable choice of metrical structure."
(Prolongational stability) "In choosing the head of a time-span T, prefer a choice that results in more stable choice of prolongational structure."
(Cadential retention) (p. 170).
(Structural beginning) "If for a time-span T there is a larger group G containing T for which the head of T can function as the structural beginning, then prefer as head of T an event relatively close to the beginning of T (and hence to the beginning of G as well)."
"In choosing the head of a piece, prefer the structural ending to the structural beginning."

IV. Prolongational reduction rules

Prolongational reduction well-formedness rules (PR~WFRs)

"There is a single event in the underlying grouping structure of every piece that functions as prolongational head."
"An event e_i can be a direct elaboration of another event e_j in any of the following ways: a. e_i is a strong prolongation of e_j if the roots, bass notes, and melodic notes of the two events are identical; b. e_i is a weak prolongation of e_j if the roots of the two events are identical but the bass and/or melodic notes differ; c. e_i is a progression to or from e_j if the harmonic roots of the two events are different."
"Every event in the underlying grouping structure is either the prolongational head or a recursive elaboration of the prolongational head."
(No crossing branches) "If an event e_i is a direct elaboration of an event e_j, every event between e_i and e_j must be a direct elaboration of either e_i, e_j, or some event between them."
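Two of the well-formedness rules above lend themselves to a direct sketch: the three-way classification of direct elaborations and the ban on crossing branches. The example assumes a simplified event made of three pitch-class labels and a prolongational tree stored as a list of parent indices; the names `Event`, `elaboration_type` and `has_crossing` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    root: str      # harmonic root
    bass: str      # bass note
    melody: str    # melodic note


def elaboration_type(e_i: Event, e_j: Event) -> str:
    """Classify how e_i directly elaborates e_j."""
    if e_i.root != e_j.root:
        return "progression"
    if e_i.bass == e_j.bass and e_i.melody == e_j.melody:
        return "strong prolongation"
    return "weak prolongation"


def has_crossing(parent: List[Optional[int]]) -> bool:
    """parent[i] is the index of the event that event i directly elaborates (None for the head)."""
    for i, j in enumerate(parent):
        if j is None:
            continue
        lo, hi = sorted((i, j))
        for k in range(lo + 1, hi):
            # Every event between e_i and e_j must attach to e_i, e_j or something between them.
            if parent[k] is None or not lo <= parent[k] <= hi:
                return True
    return False


tonic = Event("C", "C", "E")
print(elaboration_type(Event("C", "C", "E"), tonic))
print(elaboration_type(Event("C", "E", "G"), tonic))
print(elaboration_type(Event("G", "G", "D"), tonic))

print(has_crossing([None, 0, 1, 0]))
print(has_crossing([None, 3, 0, 0]))
```

In the last call the second event attaches to the fourth while the third attaches to the first, so the two branches would cross.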

Prolongational reduction preference rules (PR~PRs)

(Time-span importance) "In choosing the prolongationally most important event e_k of a prolongational region (e_i – e_j), strongly prefer a choice in which e_k is relatively time-span important."
(Time-span segmentation) "Let e_k be the prolongationally most important event in a prolongational region (e_i – e_j). If there is a time-span that contains e_i and e_k but not e_j, prefer a prolongational reduction in which e_k is an elaboration of e_i; similarly with the roles of e_i and e_j reversed."
(Prolongational connection) "In choosing the prolongationally most important event e_k in a prolongational region (e_i – e_j), prefer an e_k that attaches so as to form a maximally stable prolongational connection with one of the endpoints of the region."
(Prolongational importance) "Let e_k be the prolongationally most important event in a prolongational region (e_i – e_j). Prefer a prolongational reduction in which e_k is an elaboration of the prolongationally more important of the endpoints."
(Parallelism) "Prefer a prolongational reduction in which parallel passages receive parallel analyses."
(Normative prolongational structure) "A cadenced group preferably contains four (five) elements in its prolongational structure: a. a prolongational beginning; b. a prolongational ending consisting of one element of the cadence; (c. a right-branching prolongation as the most important direct elaboration of the prolongational beginning); d. a right-branching progression as the (next) most important direct elaboration of the prolongational beginning; e. a left-branching ‘subdominant’ progression as the most important elaboration of the first element of the cadence."

Prolongational reduction transformational rules

Stability conditions for prolongational connection (p. 224): a. Branching condition; b. Pitch-collection condition; c. Melodic condition; d. Harmonic condition.
Interaction principle: "to make a sufficiently stable prolongational connection e_k must be chosen from the events in the two most important levels of time-span reduction represented in (e_i – e_j)."

References

[1] Lerdahl & Jackendoff 1983, p. 1.
[2] Lerdahl & Jackendoff 1983.
[3] Chomsky, Noam (1957). Syntactic Structures. The Hague: Mouton; Chomsky, Noam (1965). Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press; Chomsky, Noam (1966). Topics in the Theory of Generative Grammar. The Hague: Mouton.
[4] Jackendoff, Ray (1987). Consciousness and the Computational Mind. Cambridge, Massachusetts: MIT Press; Temperley, David (2001). The Cognition of Basic Musical Structures. Cambridge, Massachusetts: MIT Press; Lerdahl, Fred (2001). Tonal Pitch Space. New York: Oxford University Press; Lerdahl, F., & R. Jackendoff (2006). "The Capacity for Music: What Is It, and What's Special About It?" Cognition, 100.1, 33–72.
[5] They have two functions: to establish tree-structure relations (time-span trees), and to provide rhythmic criteria to supplement pitch criteria that determine the structural importance of events (p. 119).
[6] A time-span is a length of time spanning from one metrical event up to, but not including, the next event. (This is the minimal condition on time-spans.)
[7] Harmonic rhythm is the pattern of durations produced by changes in the harmony at the musical surface.
[8] Lerdahl & Jackendoff 1983, p. 122.

Sources

Lerdahl, Fred; Jackendoff, Ray (1983). A Generative Theory of Tonal Music. Cambridge, Massachusetts: MIT Press.

Bibliography on automation of GTTM

Keiji Hirata, Satoshi Tojo, Masatoshi Hamanaka. An Automatic Music Analyzing System based on GTTM.
Masatoshi Hamanaka, Satoshi Tojo: "Interactive GTTM Analyzer", Proceedings of the 10th International Conference on Music Information Retrieval Conference (ISMIR2009), pp. 291–296, October 2009.
Keiji Hirata, Satoshi Tojo, Masatoshi Hamanaka: Techniques for Implementing the Generative Theory of Tonal Music, ISMIR 2007 (7th International Conference on Music Information Retrieval) Tutorial, September 2007.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "Implementing 'A Generative Theory of Tonal Music'". Journal of New Music Research, vol. 35, no. 4, pp. 249–277, 2006.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "FATTA: Full Automatic Time-span Tree Analyzer", Proceedings of the 2007 International Computer Music conference (ICMC2007), vol. 1, pp. 153–156, August 2007.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "Grouping Structure Generator Based on Music Theory GTTM", Transactions of Information Processing Society of Japan, vol. 48, no. 1, pp. 284–299, January 2007 (in Japanese).
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "ATTA: Automatic Time-span Tree Analyzer based on Extended GTTM", Proceedings of the 6th International Conference on Music Information Retrieval Conference (ISMIR2005), pp. 358–365, September 2005.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "Automatic Generation of Metrical Structure based on GTTM", Proceedings of the 2005 International Computer Music conference (ICMC2005), pp. 53–56, September 2005.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "Automatic Generation of Grouping Structure based on the GTTM", Proceedings of the 2004 International Computer Music conference (ICMC2004), pp. 141–144, November 2004.
Masatoshi Hamanaka, Keiji Hirata, Satoshi Tojo: "An Implementation of Grouping Rules of the GTTM: Introducing of Parameters for Controlling Rules". Information Processing Society of Japan SIG Technical Report, vol. 2004, no. 41, pp. 1–8, May 2004 (in Japanese).
Lerdahl, F., & C. L. Krumhansl (2007). "Modeling Tonal Tension". Music Perception 24.4, pp. 329–366.
Lerdahl, F. (2009). "Genesis and Architecture of the GTTM Project". Music Perception 26, pp. 187–194.
