Linguistics and Music

Musical Applications of Linguistic Analysis
A paper submitted for completion of the Hampshire College Division I Exam, comparing the theories of Leonard Bernstein with those of Fred Lerdahl and Ray Jackendoff.

Written in 1983-84; theories of generative grammar and their relationship to musical structure have been revised somewhat since that time. Lerdahl told me that he thinks musical structure may be more closely related to phonology than to transformational grammar.

Language and Communications Division I Exam
Arthur Kegerreis
Fall 1983

Musical Applications of Linguistic Analysis

  • I. Introduction
  • II. Symbology
    • A. Langer
  • III. Language: Spoken and Written
    • A. Linguistics and innateness: Noam Chomsky
    • B. Implications for musical innateness, linguistic applications
  • IV. Musical Linguistic Analysis
    • A. Interrelatedness and methodology
    • B. Bernstein
      • 1. Questions of relevance
        • issues
        • speculations
        • methodology
    • C. Jackendoff/Lerdahl
      • 1. Origin of methodology and approach
      • 2. Response and elaboration
      • 3. Analytical Methodology
      • 4. Conclusions/Implications
    • D. Developmental Comparative Conclusions and Implications
      • 1. Bernstein/Jackendoff & Lerdahl
  • V. Conclusion
    • A. Summary
      • 1. Symbology
      • 2. Spoken and Written Language: Linguistic Goals
      • 3. Musical Linguistic Analysis:
        • issues
        • approaches
        • methodology
        • conclusions; linguistic parallels?
        • implications
    • B. Implications of Conclusions Drawn from Musical Analysis Using Linguistic Methodology
  • VI. Bibliography

Division I Exam
School of Language and Communication 
Hampshire College 
Amherst, MA


Music is often described as a language, but is it a language that can be analyzed in the same way as written and spoken English?

In this exam I intend first to compare briefly the symbolic systems used in written language and music. I will then discuss some of the goals of linguistic analysis as set forth by Noam Chomsky, and finally some attempts to find parallels between the structure of written and spoken language and that of written music.

These efforts, led by Leonard Bernstein, Ray Jackendoff, and Fred Lerdahl, strive to discover the extent to which linguistic analytical techniques can be applied to music.

The results of these investigations are suggestive, not conclusive; therefore, I shall attempt to determine what implications their theories may carry.

Hopefully, a comparison of these differing yet interrelated approaches will lend some insight into the structural similarities and differences between music and the English language.

Speech and music are, at a purely physical level, different patternings of the same medium: sound. They are received by the same human sensory organs, the ears. Yet some aspect of cognitive function serves to distinguish one from the other, to the extent that each ear becomes more adept at discerning the elements of one or the other. (1)

In addition to speech and music, people learn to associate experience with sound patternings that belong to neither category. Machines produce characteristic rumblings, birds chirp, dogs bark. Such environmental noises become signs indicating the existence of a thing, event, or condition; they are symptomatic. Patterns of behavior come to be associated with these signs (sounds), and meaning is thus attributed to them. Speech sounds can be signs, but they operate primarily as symbols. Their nature is symbolic: they are “vehicles for the conception of objects.” (2) Because these symbols stand for concepts, speech has the property of displacement. Some cognitive ability enables humans to isolate the traits and qualities of this symbolic sound patterning and to distinguish it from other sound patterns. People tend to perceive spoken language according to the knowledge they have of the language. (3) It would then follow that the ability to distinguish sound patterns as symbols or as signs develops in proportion to associative exposure. As we will see, this is not necessarily the case. Nonetheless, sound patterns come to be established as closed pattern systems with learned relational and associative behaviors. 

In music, sounds are subject to patterned tendencies. These are dictated by what is perceived as the stability of combined pitches. The interaction of these tendencies and their incorporation into a time structure (rhythm) determines the form of the pattern. It is the uniqueness of the pattern that distinguishes it from other sound patterns. In speech the uniqueness of a concept is derived from a sound pattern and its symbolic semantic context. Identical sound patterns can symbolically represent different concepts, depending on their context. 

Similar environmental sound patterns can be signs of different things, events, or conditions since similar sounds can be produced by various different sources. In these cases other senses are employed to make distinctions. 

Musical sound, however, is neither purely symptomatic nor symbolic in the same sense as speech. It is not specifically representative, and its sound patterns do not have fixed connotations. Its symbolic meaning is relative. Its pure meaning is in its structure. The relationship between the structural elements determines the essence of the structural pattern. This pattern, when viewed contextually, functions to determine meaning through uniqueness and relative influence on surrounding patterns. 

“Meaning is not a quality, but a pattern viewed with reference to one special term round which it centers; the pattern emerges when we look at the given term in its total relation to the terms about it. A term functions to give meaning.”

“Words have no value except as symbols (or signs); in themselves they are completely trivial. A symbol which interests us also as an object is distracting. It does not convey its meaning without obstruction.” (4) Musical sounds, by contrast, are not distracting in this way; they are themselves the objects of interest. The elements of music are its essence.

Langer has suggested that the strength of music’s form is that it can serve as a vehicle to express nonspecific human experience, albeit imprecisely.

As a nonspecific representational medium, music can attempt to express feelings or experience that could not be expressed through the specifically restricted meanings of words. The interpretation of musical sounds is then similarly subjective, and “meanings” vary from listener to listener. In these cases, meanings are more likely to be associative; furthermore, they are not fixed, so although a composition may have very real meanings for both the composer and the listener, the meanings will not necessarily (or even probably) be the same. Thus, a musical organization of sounds may express nonspecific human experience through simultaneous and differing (therefore imprecise) meanings. The non-cognitive effects of music are similarly discrete; however, they exhibit more consistency. Music is an affective form and is known to affect pulse rate, respiration, and concentration. It can also induce excitation or relaxation. However, the extent of response varies with attention, interest, physiological reactivity, and other conditional factors. (5) 

The same structural elements basic to musical form are also present in language, though different cultures employ them to differing extents. In English, pitch variation plays a minor role in determining meaning. Its significance lies at the structural level of the sentence, a characteristic of intonation languages. Pitch is used for emphasis or to suggest grammatical functions, such as the inquiring nature of a question; thus, pitch is used for inflection. The limited importance of pitch here further separates speech sound patterns from musical sound patterns. 

The languages of other cultures give pitch a more significant role in determining meaning. In tone languages, such as Vietnamese or the Akan and Ewe languages of Ghana, the pitch of a syllable can completely change a word’s meaning. Thus pitch significance exists at the syllabic rather than the sentence level. The distinction between musical and speech sound patterning in these cases is not as well defined. 

The music of many cultures that speak tone languages is quite rhythmically complex. Polyrhythms, rhythms consisting of superimposed, differing metrical structures, are commonly more fully developed in the music of these cultures. The closer relationship between pitch and semantic meaning in tone languages may have led these cultures to develop the rhythmic components of their musical forms further. This might account for the highly developed rhythmic components of Ghanaian music. 
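The phrase “superimposed differing metrical structures” can be illustrated with a minimal sketch: two evenly spaced pulse streams, three against two, laid over one shared cycle. This is a generic illustration, not a transcription of any Ghanaian rhythm, and `cross_rhythm` is a hypothetical helper name; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm  # math.lcm requires Python 3.9+


def cross_rhythm(a: int, b: int) -> tuple[str, str]:
    """Return two onset rows for a-against-b pulses over one shared cycle.

    The cycle is divided into lcm(a, b) equal slots; 'x' marks a slot where
    that pulse stream strikes and '.' marks a silent slot.
    """
    slots = lcm(a, b)
    row_a = "".join("x" if i % (slots // a) == 0 else "." for i in range(slots))
    row_b = "".join("x" if i % (slots // b) == 0 else "." for i in range(slots))
    return row_a, row_b


if __name__ == "__main__":
    for row in cross_rhythm(3, 2):
        print(row)
```

For pulse counts with no common factor, such as 3 and 2, the two streams coincide only on the first slot of each cycle; everywhere else the two meters pull against each other.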

The musical structure of our culture does not depend on conceptual meaning. Whereas in speech the uniqueness of a concept is determined by the structure of the expressive form, in music the expressive form becomes the unique concept. Thus the musical form itself becomes the object of interest, rather than the expressive determinant for the specific transference of information. 

The study of language has been altered considerably through the work of Noam Chomsky. His theory of linguistic organization is rooted in the philosophic postulates of Descartes and is supported by the biological hypotheses proposed by Konrad Lorenz. Cartesian philosophy suggests that scientific theories gain acceptance because of the logical structure common to all human beings. This innate structure causes people to reason, act, and react similarly. Lorenz maintained that biological structure causes members of non-human species to adapt in similar ways; hence the mind must also structure and acquire knowledge in similar ways. 

The fact that human communication contains traits uncommon to any form of animal communication suggests that an innate biological potential might have significant conceptual effects on the structure of our grammar. Language is a symbolic representational system and is open-ended rather than closed; it combines a finite number of sounds in an infinite number of ways, each with a specific meaning. Closed systems also contain a finite number of sounds; however, they are confined to a finite and specific number of meaningful combinations. Thus the expressive capacity of a closed system is sharply limited. Human language also has the potential for displacement. Displacement itself suggests an understanding of the ordering of events, and thus of the concept of time. Together these factors exemplify the human capacity for developing behavioral expectations and attributing them to external objects. The referential and relative qualities of these traits imply that subjective and objective perspectives play a significant role in meaning, understanding, and the use of cognitive faculties. 

Chomsky’s theories have attempted to determine how humans form and implement a vocabulary, conveying meanings that possess these characteristics. Many previous theories have failed because they couldn’t explain the development of language while maintaining these traits. Chomsky’s theories don’t explain all aspects of human grammar, but they do seem to provide a sounder basis for an understanding of it than any previously proposed. 

Chomsky suggests that the best way to determine the manner in which the mind acquires knowledge is to study the similarities in the development and structure of languages. He suggests that although the differences between languages are initially more apparent, they are actually unimportant to the structuring of meaning. He maintains that the syntactic structures that order grammar are common to all languages. These elements, deep structures, determine the essential relationships between sentence elements. (6)

Surface structures are derived from these deep structures. Transformational processes develop the deep structures to clarify the active or passive nature of the sentences as well as the ordering of events. These structural elements together form the foundation of what Chomsky terms a “universal grammar.” This is an attempt on Chomsky’s part to account for the ordering principles governing the generation of any and all sentences within a language. The formal devices he uses to state these principles are phrase structure rules and transformational rules. These rules describe the hierarchic structural levels of a sentence: the deep and surface levels. 

The subordinate discrete components of a sentence compose the surface structure. Phrase structure and transformational rules describe the processes through which these surface structures have been manipulated and transformed from their deeper and dominating structural levels. The hierarchical basis of these structures is most clearly displayed through the use of tree-like structures. These tree diagrams expose the deep and surface level sentence structures through branching. Items closer to the “trunk” of the tree are deeper level, dominant structures. Items at the end of the branches are surface level structures. Branches are used to indicate: 1.) sentences that are recombined to form new sentences; 2.) noun, verb, and prepositional phrases that compose sentences; 3.) the verbs and noun phrases that compose verb phrases; 4.) the prepositions and noun phrases that compose prepositional phrases; 5.) the articles, adjectives, and nouns that compose noun phrases; and 6.) the specific surface elements that correspond to the individual phrase constituents. The following example shows a tree for the sentence, “The very old man lives in a tree and eats oysters.” (7)
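The tree in (7) is not reproduced here, but the branching it describes can be sketched as a nested data structure. The shape below is an assumption: it follows the conjoined deep structure (two full sentences joined by a conjunction) used in the derivation of this sentence, before the repeated noun phrase is deleted. Printing it sideways puts the trunk at the left and the surface words at the tips of the branches; `show` and `leaves` are illustrative names.

```python
# Each node is a tuple (label, child, child, ...); a surface word is a plain string.
def the_very_old_man():
    # NP -> Art + N, with N -> Adj + N applied twice
    return ("NP", ("Art", "the"),
            ("N", ("Adj", "very"), ("N", ("Adj", "old"), ("N", "man"))))


TREE = ("S",
        ("S", the_very_old_man(),
              ("VP", ("V", "lives"),
                     ("PP", ("Prep", "in"), ("NP", ("Art", "a"), ("N", "tree"))))),
        ("Conj", "and"),
        ("S", the_very_old_man(),
              ("VP", ("V", "eats"), ("NP", ("N", "oysters")))))


def show(node, depth=0):
    """Print the tree sideways: deeper (dominant) nodes at the left, words at the tips."""
    if isinstance(node, str):
        print("    " * depth + node)
        return
    label, *children = node
    print("    " * depth + label)
    for child in children:
        show(child, depth + 1)


def leaves(node):
    """Collect the surface words from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


if __name__ == "__main__":
    show(TREE)
    print(" ".join(leaves(TREE)))
```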

The descriptive rules that break sentences into phrases are phrase structure rules. Transformational rules enable the sentences produced by the phrase structure rules to be combined and/or transformed from active to passive voice. Thus, the active components of a sentence such as “The very old man lives in a tree house and eats oysters.” can be transformed into passive sentences such as “The tree house is lived in by the very old man” and “The oysters are eaten by the very old man.” Transformational rules employ “context sensitive” phrase structure rules. The foundation for these context sensitive rules, however, is a set of context independent phrase structure rules. 
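The active-to-passive transformation can be sketched as a reordering of three constituents plus the insertion of a form of “be” and of “by.” This is a toy sketch under strong assumptions: the clause arrives already split into subject, verb, and object, the participle table is hypothetical and covers only the two example verbs, and number agreement is guessed from a final “s.”

```python
# Hypothetical mini-lexicon: past participle of each active verb (or verb + particle).
PARTICIPLES = {"eats": "eaten", "lives in": "lived in"}


def passivize(subject: str, verb: str, obj: str) -> str:
    """Rewrite an active 'subject verb object' clause as 'object be participle by subject'."""
    # Crude number agreement: treat an object ending in 's' as plural.
    be = "are" if obj.endswith("s") else "is"
    clause = f"{obj} {be} {PARTICIPLES[verb]} by {subject}"
    return clause[0].upper() + clause[1:] + "."


if __name__ == "__main__":
    print(passivize("the very old man", "lives in", "the tree house"))
    print(passivize("the very old man", "eats", "the oysters"))
```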

Context independent phrase structure rules (8,9):
(the arrow “–>” reads “is composed of”)
1.) Sentence –> NP + VP
2.) NP –> Art + N
3.) VP –> V + NP
3a.) VP –> V + PP
4.) PP –> Prep + NP
5.) Art –> {the, a, an}
6.) N –> Adj + N
7.) N –> {man, house,…}
8.) Adj –> {very, old, fat,…}
9.) V –> {lives, eats,…}
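These rules can be run as a small top-down generator. The sketch reads rule 2 as NP –> Art + N and rule 4 as PP –> Prep + NP (the forms used in the context sensitive rule set), and its Prep entry is hypothetical, since the list gives no Prep rule. The recursive rule N –> Adj + N is what makes the set open-ended, so the generator caps recursion depth; `generate` and `max_depth` are illustrative names.

```python
import random

# Context-free rules: each symbol maps to its alternative right-hand sides.
RULES = {
    "S":    [["NP", "VP"]],
    "NP":   [["Art", "N"]],
    "VP":   [["V", "NP"], ["V", "PP"]],
    "PP":   [["Prep", "NP"]],
    "Art":  [["the"], ["a"], ["an"]],
    "N":    [["Adj", "N"], ["man"], ["house"]],
    "Adj":  [["very"], ["old"], ["fat"]],
    "V":    [["lives"], ["eats"]],
    "Prep": [["in"]],  # hypothetical: no Prep rule is given in the list
}


def generate(symbol="S", rng=random, depth=0, max_depth=6):
    """Expand a symbol top-down into a list of words."""
    if symbol not in RULES:  # a terminal word
        return [symbol]
    options = RULES[symbol]
    if depth >= max_depth:  # past the limit, avoid recursive expansions such as N -> Adj + N
        options = [rhs for rhs in options if symbol not in rhs]
    rhs = rng.choice(options)
    return [word for part in rhs for word in generate(part, rng, depth + 1, max_depth)]


if __name__ == "__main__":
    rng = random.Random(1983)
    for _ in range(3):
        print(" ".join(generate(rng=rng)))
```

The output will include strings such as “an man”: nothing in context independent rules like these enforces agreement, which is one reason the rule set is elaborated into context sensitive form.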

With this sort of terminology, mathematical conventions such as bracketing can help represent the results of these rules: (NP + VP (V + NP)). The phrase structure rules become more elaborate when they are made context sensitive and incorporated into transformational grammar, because context sensitive rules must accommodate a variety of tenses. Phrases are developed into “underlying strings,” combined correspondingly, and permuted and deleted as necessary; hence they are “transformed” into a final sentence. 
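Bracketing and tree diagrams carry the same information: a phrase-structure tree can be written out as labeled brackets, and dropping the words leaves a category skeleton like the one given above. The sketch assumes a tree encoded as nested (label, children...) tuples; the sentence “the man eats oysters” is chosen only for brevity, and `bracket` and `skeleton` are illustrative names.

```python
# A tree is (label, child, ...); a word is a plain string under a one-word category.
SENTENCE = ("S", ("NP", ("Art", "the"), ("N", "man")),
                 ("VP", ("V", "eats"), ("NP", ("N", "oysters"))))


def bracket(node):
    """Labeled bracketing with the words kept: [S [NP ...] [VP ...]]."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


def skeleton(node):
    """Bracketing with the words dropped, leaving only categories."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return label  # a category directly above a word
    return "[" + label + " " + " ".join(skeleton(c) for c in children) + "]"


if __name__ == "__main__":
    print(bracket(SENTENCE))
    print(skeleton(SENTENCE))
```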

Context sensitive phrase structure rules (10,11):

1a.) S –> NP + VP
1b.) S –> S + Conj + S
2a.) VP –> V + NP
2b.) VP –> V + PP
3.) PP –> Prep + NP
4.) NP –> {NP singular, NP plural}
5.) NP singular –> Art + N
6.) NP plural –> Art + N + “s”
7.) Art –> {the, a, an,…}
8.) N –> Adj + N
9.) N –> {man, oyster, tree, dog, book,…}
10.) V –> Aux + V
11.) V –> {hit, eat, live,…}
12.) Aux –> tense (+ modifier) (+ have, + en) (+ be, + ing)
13.) Adj –> {very, old,…}
14.) Modifier –> {will, can, may, shall, must}
15.) Conj –> {and, but, because, or,…}
16.) Prep –> {in, of, by,…}
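Rule 12 is a version of Chomsky’s auxiliary rule. In his analysis, each affix the rule introduces is attached to the verbal element immediately following it (the transformation usually called “affix hopping”), which is how the (+s) of the derivation below ends up on the verb. The sketch assumes a tiny hypothetical lexicon and spelling rules covering only the verbs listed here; “s” stands for present tense, third person singular, and `verb_group`, `affix_hop`, and `inflect` are illustrative names.

```python
MODALS = {"will", "can", "may", "shall", "must"}
AFFIXES = {"s", "past", "en", "ing"}  # "s" = present tense, third person singular

# Hypothetical irregular-form table, covering only the stems used here.
IRREGULAR = {
    ("have", "s"): "has", ("have", "past"): "had", ("have", "en"): "had",
    ("be", "s"): "is", ("be", "past"): "was", ("be", "en"): "been", ("be", "ing"): "being",
    ("eat", "past"): "ate", ("eat", "en"): "eaten",
    ("will", "past"): "would", ("can", "past"): "could",
    ("may", "past"): "might", ("shall", "past"): "should",
}


def inflect(stem, affix):
    """Attach one affix to one verbal stem (regular spelling plus the table above)."""
    if affix is None:
        return stem
    if (stem, affix) in IRREGULAR:
        return IRREGULAR[(stem, affix)]
    if stem in MODALS:  # modals take no -s ("can", not "cans")
        return stem
    if affix == "s":
        return stem + "s"
    if affix in ("past", "en"):
        return stem + ("d" if stem.endswith("e") else "ed")
    return (stem[:-1] if stem.endswith("e") else stem) + "ing"  # affix == "ing"


def verb_group(tense, verb, modal=None, perfect=False, progressive=False):
    """Rules 10 and 12: tense (+ modifier) (+ have + en) (+ be + ing) + V."""
    seq = [tense]
    if modal:
        seq.append(modal)
    if perfect:
        seq += ["have", "en"]
    if progressive:
        seq += ["be", "ing"]
    seq.append(verb)
    return seq


def affix_hop(seq):
    """Attach each affix to the verbal element that follows it, then spell out the words."""
    words, pending = [], None
    for item in seq:
        if item in AFFIXES:
            pending = item
        else:
            words.append(inflect(item, pending))
            pending = None
    return " ".join(words)


if __name__ == "__main__":
    print(affix_hop(verb_group("s", "eat", perfect=True, progressive=True)))
```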

Given these rules we can examine the more complex deep structure of our sentence and see how it was transformed to its present state.

1b.) S –> S1 + Conj + S2
1a.) S1 –> The very old man lives in the tree house.
15.) Conj –> and
1a.) S2 –> The very old man eats oysters.
1a.) S1 –> NP1 (The very old man) + VP1 (lives in the tree house)
5.) NP1 sing –> Art (the) + N (very old man)
8.) N –> Adj (very) + N (old man)
8.) N –> Adj (old) + N (man)
2b.) VP1 –> V (lives) + PP (in the tree house)
10.) V –> Aux (+s) + V (live)
11.) V –> live
3.) PP –> Prep (in) + NP (the tree house)
5.) NP sing –> Art (the) + N (tree house)
8.) N –> Adj (tree) + N (house) 
1a.) S2 –> NP2 (the very old man) + VP2 (eats oysters)
5.) NP2 sing –> Art (the) + N (very old man)
8.) N –> Adj (very) + N (old man)
8.) N –> Adj (old) + N (man)
2a.) VP2 –> V (eats) + NP (oysters)
10.) V –> Aux (+s) + V (eat)
11.) V –> eat
6.) NP plural –> Art ( ) + N (oyster) + “s”

In the initial version of this sentence, a conjunction joins the two sentences; then a deletion rule is employed to avoid the redundancy of noun phrase 2 and to eliminate the “house” in verb phrase 1. Context sensitive transformations bring the tenses of the two sentences into agreement, as well as their active or passive voice. Thus, deeper level structures help to determine specific relational meaning.
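The deletion step can be sketched as a simplified form of what is often called conjunction reduction: when two conjoined sentences share an identical subject noun phrase, the second copy is deleted. The sketch assumes each sentence is already split into a (subject, predicate) pair and compares subjects case-insensitively; it does not model the tense or voice adjustments, and `conjoin` is an illustrative name.

```python
def conjoin(s1, s2, conj="and"):
    """Join two (subject, predicate) sentences, deleting a repeated subject."""
    (np1, vp1), (np2, vp2) = s1, s2
    if np1.lower() == np2.lower():  # identical subjects: delete the second copy
        return f"{np1} {vp1} {conj} {vp2}."
    return f"{np1} {vp1} {conj} {np2} {vp2}."


if __name__ == "__main__":
    print(conjoin(("The very old man", "lives in a tree house"),
                  ("the very old man", "eats oysters")))
```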

If, as Langer suggests, meaning is a relational function of terms, then by modifying the relationships between sentence elements, these transformational processes play a significant role in language. They function to establish a hierarchic referential relationship among the elements of a sentence. Because these structures determine the understanding of relations among symbols, they are the core determinants of meaning in language. It is quite likely that the perception of inflectional quality provides significant clues toward understanding and comprehension as well.

The innate interpretive structuring of a language in a person’s mind is what Chomsky describes as that person’s “competence.” This competence refers to an innate “generative grammar”: structural limitations applied to a language that simultaneously maintain logical relationships of meaning and offer extensive and diversified patterns of referential relationships. This competence serves to edit and preprocess the extraneous utterances of a speaker: the false starts and unintended words that are grammatically incorrect, yet so characteristic of a speaker’s performance. This competence, then, is also a prerequisite for “performance,” the actual use of a language by an individual to convey an intended meaning. (12)

A relevant consideration here is memory capacity, an elemental factor in relational understanding. Chomsky suggests that transformational structures facilitate relational and referential understanding. (13) We might conclude, then, that transformational structure serves to clarify subject, object, action, and modification by keeping contextual elements within the capacity of short-term memory. Within longer structures, redundancy may be incorporated to clarify and facilitate understanding. 

Chomsky assumes that an initial cognitive stage in language interpretation is the preliminary identification of sounds as language units. (14) He goes on to acknowledge other recognition processes that he suggests may also be innate. Here he includes face recognition, the prediction of personality and reactions, mathematical principles that build on numerical and spatial intuition, and, of particular interest here, recognition of melody under transposition and other modifications. Chomsky characterizes these abilities as similar in that they all rely on minimal “degenerate” data to draw conclusions about the elements in question. This is also a property of language acquisition. 

Cognitive processes built on degenerate data are those that, like the abilities just mentioned, arrive at definitive conclusions from a minimum of evidence. A child learns to build an infinite variety of sentences after exposure to an extremely limited vocabulary. This trait seems to suggest that the linguistic “extemporization” process is an innate cognitive ability. In music these abilities are also quite remarkable. Lacking any representational associations, a sequence of sounds is recognized as having the same dimensions of pitch and time distribution, regardless of key. This ability is evident upon the first repetition. This phenomenon enables an improvised performance or an orchestral score to be recognized as an elaboration of a melody. It also enables a listener to relate the different movements of a composition. 
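Recognition under transposition can be made concrete by describing a melody through its intervals and durations rather than its absolute pitches. This sketch assumes pitches given as MIDI note numbers (60 = middle C) and durations in beats; two note sequences count as the same melody when they share interval and duration sequences, whatever the key. The tune fragment is only an example, and the function names are illustrative.

```python
def intervals(pitches):
    """Successive pitch differences in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def same_melody(m1, m2):
    """True if two (pitches, durations) melodies differ at most by transposition."""
    (p1, d1), (p2, d2) = m1, m2
    return len(p1) == len(p2) and intervals(p1) == intervals(p2) and d1 == d2


# Opening of "Frère Jacques" in C major, and the same notes transposed up a fifth to G major.
in_c = ([60, 62, 64, 60], [1, 1, 1, 1])
in_g = ([67, 69, 71, 67], [1, 1, 1, 1])

if __name__ == "__main__":
    print(intervals(in_c[0]), same_melody(in_c, in_g))
```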

If language is indeed an innate structure, one would expect and hope that medical science could provide some supporting evidence of this hypothesis. Unfortunately, medical science has not yet provided specific and conclusive evidence of innate linguistic developmental structures.

Neurological research has suggested, however, that structural sound patterning is a cognitive prerequisite to meaning association, relational understanding, and therefore linguistic competence. The same is true of visual pattern recognition. Damage to two areas of the cortex, Broca’s area and Wernicke’s area, can result in difficulty producing comprehensible sentences. However, the ability to recognize melody and to produce words may remain unimpaired. (15) 

This research chiefly concerns sequential ordering in spoken and written language. It suggests that there is a cognitive structure that forms rules for interpreting the relationships of tones and rhythms. Perhaps there is an innate human faculty that accommodates structural sound pattern rules prior to imitation or symbolic processing. It could then follow that tonal and non-tonal structuring would be unaffected by semantic representational meaning. The subjective relational perspective of a child learning language would be an essential consideration in the development of meaning. It is the relational nature of meaning that patterns expose. Therefore subjective contextual perspective might serve as a central focus for the embedding process and for further extensions of relative structural patterning. 

Here we have encountered the issue of contextual recognition, identification, and comprehension. Neuropsychological research has indicated that the two hemispheres of the brain have localized cognitive functions, one being primarily concerned with conceptual and emotive processing (including music), the other with intellectual processing, including language. (16) In most people, including most left-handers, the language hemisphere is the left one. Wernicke’s and Broca’s areas are located in this hemisphere. However, the two hemispheres are interactive, and it seems that the conceptual and emotive hemisphere helps with the contextual preprocessing and comprehension of ordered sounds. 

This would suggest that the creative cognitive structures utilized in producing speech and music may not only be innate, but may be shared as well. 


We have seen that the ordering principles of language can be clarified and exposed through structural analysis. Music can also be analyzed and studied in a similar fashion. What remains to be seen is whether there are innate cognitive structures common to both. If there are, perhaps the analytic techniques of linguistics can be applied to music, exposing these parallels.

The plausibility of this approach could be supported by the fact that the elemental components of music, pitch and rhythm, are to a large extent the elemental components of speech, where they are used in a less rigorous and specific order. This is merely a consequence of the contextual, representational, and non-representational symbolic qualities discussed earlier. Because the sounds in music are not confined by structures that determine meaning, they may be ordered in more diverse ways. Thus we find that musical grammatical rules arise from acoustic principles rather than from structures that determine symbolic representational relationships. However, acoustic principles still define referential qualities, and in Langer’s terms, we still have a certain quality of meaning.

Through study of these acoustic relationships, we can discover an entire hierarchical structure of pitches and their relationships. By applying the human cognitive ability of displacement, we can add another dimension to these acoustic relationships. The interaction of time (rhythm) and acoustic phenomena (pitch relationships) thus determines the dimensions of music. It is a fairly straightforward task to analyze these dimensions syntactically, giving them a hierarchic structure. Yet this is only a preliminary step if we are to undertake goals similar to those of linguistics. These structural analyses must be varied enough to support suggestions about universals (or the lack thereof) in musical structuring. If such universals seem to exist, one might attempt to draw conclusions about the possible innate structurings of music. These would have to be supported by evidence of rapid development of musical competence from limited exposure to degenerate data. Because of the kinesthetic variables involved in instrumental performance, vocal performance or simply musical recognition might be considered first. Proceeding in this fashion, a Chomskian concept of competence could be developed for musical understanding. Kinesthetic variables could then be discarded or pro-rated in the consideration of compositional process and performance (in its usual musical sense). Insights into (innate?) creative compositional and improvisational processes could then be extended considerably.
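As one small example of giving the time dimension a hierarchic structure, a metrical grid (a stack of beat levels) can be sketched by counting how many levels of pulse fall on each position in a bar. The sketch assumes a strictly duple bar of eight eighth-note positions with pulse levels every 1, 2, 4, and 8 positions; real meters, and their analysis, are considerably richer. The function names are illustrative.

```python
def metrical_grid(positions=8, periods=(1, 2, 4, 8)):
    """Strength of each position = number of pulse levels that place a beat there."""
    return [sum(1 for p in periods if i % p == 0) for i in range(positions)]


def draw(grid):
    """Print the grid as columns of dots, strongest level on top."""
    for level in range(max(grid), 0, -1):
        print(" ".join("." if strength >= level else " " for strength in grid))


if __name__ == "__main__":
    draw(metrical_grid())
```

The downbeat carries the most levels, the third beat the next most, and the off-beat eighths only one, which is the hierarchy a listener hears as strong and weak beats.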

However, at present, musical analysis through linguistic procedures has not progressed beyond the syntactic structural stage. Earlier efforts have led to the current “rules” of harmony, melody, counterpoint, and general music theory. The principles behind these analytic techniques have been developing for hundreds of years, but their goal has been only to define musical structures, not to draw conclusions about innate cognitive musical structures. In addition, much of the cognitive investigation has been largely speculative. However, a more specific methodology for this analysis, rooted in linguistic procedure, has been developed. It is discussed below.

Somewhat regrettably, much of the work done in applying linguistic principles to music has been undertaken either by specialists in linguistics or by musical theoreticians with somewhat insecure linguistic backgrounds. Fortunately, the work being done in each of these fields is becoming shared interdisciplinary knowledge. Linguists with some musical background are helping to develop the speculations of musical theoreticians. An example is Ray Jackendoff, who, together with Fred Lerdahl, has developed a fairly specific system for musical analysis. Their work responds to, elaborates, and improves upon a series of lectures by conductor and musical theoretician Leonard Bernstein. (17) However speculative, the transcriptions of these lectures suggest interesting implications about linguistic structures in music and other art forms, such as poetry. Jackendoff has applied more rigorous analytic procedures to Bernstein’s speculative observations, resulting in this system of analysis. Bernstein offers a myriad of stimulating insights into the structures of compositions. His linguistic analogies, however, rarely develop beyond ponderous speculation, and they frequently fall apart. This is one of Jackendoff’s major criticisms of Bernstein’s lecture series. Bernstein’s difficulties may stem partly from his limited linguistic knowledge.

One of Bernstein’s most prominent difficulties is a common one: terminology. In studying and analyzing language and music, one is apt to find considerable shared descriptive terminology. Phrase, meter, articulation, period, style, transformation: all these terms and many others have different meanings when applied to language or to music. The occurrence of these shared descriptors leads many people to interpret the musical and linguistic terms in the same way. However, although there are definite parallels in the syntactic structuring of musical and linguistic elements, the common descriptive terms usually serve distinct purposes.

The similarities in terminology frequently arise because the structural composition of the elements of both is indeed similar. This similarity enables us to discuss each system in terms of the other. We should, however, be careful to distinguish as clearly as possible the relationships that these terms represent. On a syntactic level many structural elements are actually shared and similarly described; however, when we examine the semantic relational functions of these shared terms, we encounter difficulties. This is largely due to the differences in the symbolic characteristics of each.

Despite these symbolic differences, Bernstein attempts to describe music in the same semantic context as language. His resulting observations prove to be extremely subjective, pointedly personal interpretations of the music he discusses.

For example, he attempts to equate as semantic parallels these elements of music and language:

The problems here are obvious. The relational functions served by motives, phrases, and notes arise out of the order of units, with rhythmic properties helping to develop musical units and ideas. Phrases help to make up a section or a movement because they sound similar; thus, throughout the musical development of a phrase, we can hear a similarity to an opening motive. Failing that, we can perceive some sort of continuity throughout a section, a continuity frequently established through some kind of similarity in the composition. Linguistic continuity, by contrast, results neither from repetition (though repetition may be employed) nor from familiar sound patterns showing audible relationships to other sound patterns, but from the interpretation of symbolic representations.

Furthermore, Bernstein tries to describe motives as noun phrases, limits the parallel semantic qualities of chords to those of modifiers, and treats rhythmic elements as mere parallels of verbs. Yet the roles of all these musical elements are clearly intertwined. Motives can establish chords, just as chords can help define motives. Rhythm is as important a modifier as chords are. Chordal qualities can be completely redefined in different rhythmic contexts. Therefore Bernstein’s attempted equivalences prove shortsighted and limited.

Bernstein proceeds to develop further analogies. These comparisons concern transformational grammar, and his conclusions look something like this:

Here Bernstein fails to expose parallel structures of meaning; what he is actually describing is the different symbolic natures of music and language. He then suggests that language may be transformed beyond its literal common usage into a metaphorical, non-specific level where its structure adds an element of artistic meaning: poetry. This is the super-surface structure represented in the above chart. Music is transformed directly to this level from its underlying elements, as it has no literal level of representational meaning. At this point Bernstein admits the shortcomings of his earlier analogies and proposes that music is a “language of metaphor,” combining, transforming, and developing metaphorical (suggestive, non-specific) components into larger, higher-order metaphorical forms. Although this is a plausible suggestion, the transformations of which he speaks are not equivalent to linguistic transformations. Bernstein’s transformations concern artistic organizational forms, while linguistic transformations structurally determine meanings. As Jackendoff and Lerdahl point out, musical trees represent elaborations of rhythmic and harmonic elements rather than is-a relations among grammatical categories. Elaborational musical analysis plays a significant part in Jackendoff and Lerdahl’s analytic procedure, as we will see later.

Through his discussion of musico-linguistic parallels in “meaning,” Bernstein has brought us no closer to establishing the existence of musical universals. In the area of syntactic consideration, however, Bernstein brings up some cogent argumentation concerning the physical acoustic basis of tonality. Jackendoff points out some inconsistencies among these points, yet his own theory also incorporates some of the principles of concern here. Specifically, relative distance from a fundamental harmonic element around the circle of fifths determines another element’s harmonic strength and importance. It may well be that the circle of fifths has an acoustic basis, drawn from the concepts Bernstein suggests.

Primary to the discussion of acoustics here is the harmonic overtone series. Overtones are the simultaneous higher-frequency vibrations that characterize a sound event. A vibrating body rarely produces a pure tone; its fundamental is almost always accompanied by higher vibrations that combine with it to give the sound we hear its character. These vibrations are the overtones, or harmonics. The comparative strength of pitches is perceived in relation to a primary tone and its successive harmonics. This relationship is traditionally credited to Pythagoras, who, according to a legend recorded by later writers, listened to hammers of different weights striking anvils. He is said to have found that pitch intervals correspond to simple whole-number ratios: 2:1 for the octave, 3:2 for the fifth, and 4:3 for the fourth. The physical dimensions of the parts of instruments used to differentiate pitch, such as the lengths of strings and pipes, are governed by these ratios.

Because harmonic overtones help to establish the dominance of their fundamental tone, Bernstein suggests that they determine not only a key center (the fundamental note on which a scale begins) but also the pitches of the primary pentatonic scales. Bernstein demonstrates the series of successive overtones produced by a fundamental and suggests that the pentatonic scale, one of the most widely used scales in the world, is derived from the first six harmonics.

The harmonic overtones may well help to define the pentatonic scale; however, Jackendoff is quick to point out that this derivation does not account for a pentatonic scale containing a minor second, an interval that occurs only between the fourteenth and fifteenth overtones. If the lowest overtones are taken to be the most fundamental pitch components of a scale, Jackendoff asks, how are the intermediate overtones to be placed?
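To make the overtone discussion concrete, the sketch below lists the first sixteen harmonics of a fundamental, the size of each above the fundamental in cents (1200 cents to the octave), and the nearest equal-tempered note name. The helper names are hypothetical, not taken from Bernstein or Jackendoff, and the naming assumes twelve-tone equal temperament as a reference; the 7th, 11th, and 13th harmonics lie far from any tempered pitch, so their names are only approximations.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents; 1200 cents make one octave."""
    return 1200 * math.log2(ratio)


def harmonic_series(count: int, fundamental: str = "C"):
    """Return (harmonic number, nearest tempered note, deviation in cents)."""
    base = NOTE_NAMES.index(fundamental)
    rows = []
    for n in range(1, count + 1):
        size = cents(n)  # harmonic n vibrates n times as fast as the fundamental
        semitones = round(size / 100)  # nearest equal-tempered semitone
        name = NOTE_NAMES[(base + semitones) % 12]
        rows.append((n, name, round(size - 100 * semitones, 1)))
    return rows


def first_appearances(rows):
    """Distinct note names in the order they first occur in the series."""
    seen = []
    for _, name, _ in rows:
        if name not in seen:
            seen.append(name)
    return seen


for n, name, deviation in harmonic_series(16):
    print(f"harmonic {n:2d}: {name:<2} ({deviation:+.1f} cents)")

print(first_appearances(harmonic_series(6)))  # ['C', 'G', 'E']
print(round(cents(16 / 15), 1))               # 111.7, harmonics 15 to 16
```

Run as written, the listing shows that the first six harmonics of C contain only C, G, and E, that B♭ and D enter at the 7th and 9th harmonics, and that adjacent harmonics approach the size of a semitone only high in the series (16:15 is about 112 cents).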

It is conceivable, however, that scale pitches and harmonic relationships are intuitively derived from fundamental acoustic overtones. Apart from the octave, the most common interval produced in vocal music is the fifth, the first harmonic overtone after the octave. The octave is the most frequently produced simultaneous interval in improvised vocal music, a result of the singers’ differing physical sizes, and it is perceived essentially as a unison. The interval of the fifth is perceived as a fundamental element of stability in music. Applied successively, it can generate all twelve tones used in western tonal music.

This arrangement is called the circle of fifths when traversed in ascending order, or the circle of fourths when traversed in descending order. Although this property is not unique to the fifth, the acoustic proximity and strength of this overtone suggest that it plays a primary role in determining the pitch elements of the chromatic scale. Because the fifth establishes an acoustic “gravity,” it can determine a hierarchical relationship between pitches. This “gravitational” quality is a fundamental component of harmonic analysis, which describes root movement down a fifth, from the “dominant” chord to the “tonic” or “fundamental” chord, as “resolution.” Root movement by ascending fifths is said to “defy resolution”; movement by descending fifths (equivalently, ascending fourths) is said to “resolve.”
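The claim that successive fifths generate all twelve tones, and that this property is not unique to the fifth, can be checked directly. The sketch below assumes twelve-tone equal temperament with octave equivalence, so a fifth is 7 semitones and pitch classes are counted modulo 12; the function name is hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def interval_cycle(step: int, start: int = 0):
    """Stack an interval (in semitones) until the starting pitch class recurs."""
    cycle = [start]
    pc = (start + step) % 12
    while pc != start:
        cycle.append(pc)
        pc = (pc + step) % 12
    return cycle


fifths = interval_cycle(7)   # ascending perfect fifths: the circle of fifths
fourths = interval_cycle(5)  # ascending perfect fourths: the same circle reversed
print(" ".join(NOTE_NAMES[pc] for pc in fifths))

# Which interval sizes generate all twelve pitch classes?
generators = [step for step in range(1, 12) if len(interval_cycle(step)) == 12]
print(generators)  # [1, 5, 7, 11]
```

Only the minor second, the fourth, the fifth, and the major seventh (and their octave complements) complete the full cycle; a major third, for example, returns to its starting point after three steps. The equal-tempered fifth only approximates the acoustic 3:2 ratio: twelve pure fifths overshoot seven octaves by the Pythagorean comma, about 23.5 cents, which is why the “circle” closes exactly only in a tempered system.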

If we consider this primary acoustic overtone a strong hierarchical indicator, we can arrive at a structural breakdown of a musical composition based upon the elaboration and incorporation of this principle. A fundamental resolution, or cadence, may be considered the “deep structure” of a musical composition; further elaborations and developments of this musical event characterize surface-level structures and transformations. This is not unlike the system of musical analysis developed by Heinrich Schenker (18), which describes a composition as a “Satz,” the German word both for a sentence and for a musical composition. As this deep structure is elaborated, one progresses toward the surface structure of the composition. We will soon see that Jackendoff and Lerdahl employ a similar analytical procedure in what they call the “prolongational elaboration” of a composition.

Although the harmonic series may suggest how the chromatic scale is derived, it unfortunately does not clearly explain the development of the major and minor scale systems. These scales limit the chromatic components in such a way that pitch hierarchies can be more clearly delimited, helping to imply directional movement and define the tonal center. The chromatic scale is a weak directional indicator: although its direction of movement is quite audible, it cannot expose the tonal center of a composition as powerfully as major and minor scales do. Bernstein describes its effect on the definition of tonal centers as ambiguous. He then cites quite relevant examples of ambiguity in music, demonstrating various composers’ skill in integrating ambiguity into acceptable compositions. This ambiguity developed at a rapid rate and eventually overcame itself, discarding the harmonic principles of resolution and leaving rhythmic qualities alone responsible for defining structural hierarchy. Thus serial music was born. Dissonance had overcome its need for resolution, as Stravinsky had suggested. (19) It had not simply become acceptable in itself, but had begun to serve a less distinct purpose.

Stravinsky was a master of ambiguity. One of his musical innovations was the development of bitonality and polytonality: compositions developing in two or more keys at once. The tension this creates is quite effective, as the riotous audience reaction at the first performance of “The Rite of Spring” demonstrated.

Bernstein’s theoretical analyses of compositions lead to interesting observations; however, to analyze music in linguistic terms, a more thorough and rigorous methodology must be developed. Traditional musical analysis already provides many of the necessary tools, yet more specific procedures are required for our purposes. To draw conclusions about musical perception and innate similarities among musical compositions, we need an analytical structure that exposes patterns and relationships in a clear, thorough, and straightforward manner. This is what Ray Jackendoff and Fred Lerdahl have attempted to provide. Their introductory work, “Toward a Formal Theory of Tonal Music,” now elaborated in book form, was published in the Journal of Music Theory in the spring of 1977. The work itself is not a theory but a system of analysis.

Their stated goal is to determine and define the way a piece of music is intuitively heard. Their subject is a listener familiar with the idiom in question, western classical tonal music.

To describe musical compositions, they employ a variety of structural descriptive units drawn from earlier musical and linguistic analytic systems, and their notations resemble those earlier systems, as they were doubtless intended to. The concepts represented are somewhat different, although they too draw on previous methodologies. Because the authors describe musical development as elaboration rather than as is-a representational relationships, their analysis strives to define the qualities of the elaboration. Their system employs two levels of analysis (metrical and grouping) that lead to an initial phrasing elaboration structure. These elements help to determine the harmonic elaboration: the overall harmonic structure of the piece. This level of elaborational analysis is somewhat similar to the analytical methodology of Heinrich Schenker (20) described earlier; however, it visualizes hierarchical harmonic relationships more clearly than Schenker’s system does, because it applies a tree-like structure similar to those used in linguistic analysis.

Jackendoff and Lerdahl’s phrasing analysis is termed the “time span elaboration” (or reduction). It arises out of rhythmic “metrical analysis” and “grouping analysis.” The analysis of harmonic development is the “prolongational elaboration” (or reduction). In their words, “The time span elaborations in our theory describe the sense of a piece in terms of different principles of recursive elaboration; their interplay has a great deal to do with the sensations of tension and relaxation in one’s hearing of a piece.”

To establish a methodological procedure for combining these domains of analysis, Jackendoff and Lerdahl define structural “well-formedness conditions,” which specify possible structural descriptions, and “preference rules,” which designate a listener’s preferred way of hearing a piece. The result aims toward hierarchic patterning, exposing recursive patterns and other parallel structures. One procedural condition is that on any given hierarchic level, structures may not overlap but must be discretely defined. Thus a single note can belong to only one phrase, a particular phrase to only one phrase grouping, and so on. This does not imply that these elements cannot be repeated; the analysis is of music that is already complete, and it is the structural description that cannot share elements. A hierarchic designation can assign only one analytic grouping to any single element of a composition.

A problem arises in that musical compositions are often not strictly hierarchic in nature, and may not submit to these well-formedness conditions and preferential descriptions. Elements often serve dual purposes; for example, a note may end one phrase and begin another. This is an example of the transformational difficulties the authors describe, although these transformations are more strictly linguistic than musical. Here the transformational element arises out of elements shared by separate musical phrases, somewhat as a word may be used only once in a sentence, serving as a transformational element that links separate sentence structures and deletes a repeated phrase. Even so, the sense implied is not of the strictest linguistic kind, another consequence of the differing symbolic natures of the two subjects. The transformations of concern here address transitions from one hierarchic grouping to another on the same hierarchic level. They do not in any way change the active or passive nature of the music, since music has no passive voice; if anything, it is purely active. Thus these transformations are comparable neither to grammatical transformations nor to musical transformations in the traditional sense, such as transposition. The authors readily admit that their system cannot adequately handle these transformational cases.
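The non-overlap condition can be stated compactly. In the sketch below, which is a simplification of my own rather than the authors’ formal rule set, a group is a pair of note indices (first, last), and a grouping structure counts as well formed only if every pair of groups is either disjoint or nested. A shared note, such as one that ends one phrase and begins the next, makes the check fail, which is exactly the kind of case the authors must treat as a transformation. The names `Group`, `conflicts`, and `is_well_formed` are hypothetical.

```python
from itertools import combinations

# A group spans notes first..last inclusive (hypothetical representation).
Group = tuple[int, int]


def nested_or_disjoint(a: Group, b: Group) -> bool:
    (a_first, a_last), (b_first, b_last) = a, b
    disjoint = a_last < b_first or b_last < a_first
    a_inside_b = b_first <= a_first and a_last <= b_last
    b_inside_a = a_first <= b_first and b_last <= a_last
    return disjoint or a_inside_b or b_inside_a


def conflicts(groups: list[Group]) -> list[tuple[Group, Group]]:
    """Pairs of groups that partially overlap (share notes without nesting)."""
    return [(a, b) for a, b in combinations(groups, 2) if not nested_or_disjoint(a, b)]


def is_well_formed(groups: list[Group]) -> bool:
    spans_valid = all(first <= last for first, last in groups)
    return spans_valid and not conflicts(groups)


# Two four-note phrases inside an eight-note section: well formed.
print(is_well_formed([(0, 3), (4, 7), (0, 7)]))  # True
# Note 4 both ends the first phrase and begins the second: an overlap.
print(conflicts([(0, 4), (4, 7), (0, 7)]))       # [((0, 4), (4, 7))]
```

Reporting the offending pairs, rather than only a yes/no answer, mirrors the analytic situation: the overlap is located precisely, and it is up to the analyst to decide how the shared note should be treated.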

Grouping analysis is indicated by successive slurs (arcs) beneath the music. Dominating, structurally stronger hierarchic levels are shown by longer slurs beneath the shorter (weaker) slurs. These slurs delineate the boundaries of sound groupings such as motives, themes, phrases, sections, and the entire piece. In addition, the grouping analysis helps to expose symmetry and parallelism in structure.

Metrical structure helps define the grouping structure in an interactive fashion. The metrical analysis also appears below the music, immediately above the grouping slurs. The hierarchic indicators are rows of dots; the more dots beneath a beat, the stronger its metrical weight. For the most part, these dots simply show the metrical tendencies of the chosen time signature, since the authors choose not to mark elements that carry heavy tonal weight but fall in weaker rhythmic positions. They claim that such elements are exposed more clearly if they are marked as falling on weak beats, leaving the grouping and time span reductions to reveal their structural weight. The metrical analysis does, however, reinforce the metrical weight applied at grouping boundaries. The criteria for metrical hierarchic preference are cues for strong beats, parallel relationships to grouping structures, and the regularity of the patterns.
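Since the dots largely reflect the time signature, a grid of this kind can be generated mechanically for a strictly regular meter. The sketch below assumes 4/4 time with the eighth note as the smallest level, so beats recur every 1, 2, 4, and 8 eighth-note positions (these period values are my assumption, and the function names are hypothetical); each position receives one dot for every level on which it is a beat. A real analysis also weighs cues that such a regular grid ignores.

```python
def metrical_weights(positions: int, periods=(1, 2, 4, 8)) -> list[int]:
    """Number of metrical levels on which each position is a beat.

    Periods are measured in the smallest unit (here eighth notes): in 4/4,
    eighths recur every 1, quarters every 2, half notes every 4, bars every 8.
    """
    return [sum(1 for p in periods if pos % p == 0) for pos in range(positions)]


def render_grid(weights: list[int]) -> str:
    """Rows of dots, from the fastest level (top) to the slowest (bottom)."""
    rows = []
    for level in range(1, max(weights) + 1):
        rows.append(" ".join("." if w >= level else " " for w in weights).rstrip())
    return "\n".join(rows)


weights = metrical_weights(16)  # two bars of 4/4 counted in eighth notes
print(weights)
print(render_grid(weights))
```

The downbeat of each bar receives four dots, the third beat three, the second and fourth beats two, and the offbeat eighths one, which is the “metrical tendency” of 4/4 in its plainest form.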

Time span reduction and its converse, time span elaboration, take the metrical and grouping analyses and combine them with other criteria for hierarchic organization. The result is a tree-like structure in which the smallest offshoots indicate subordination to their host branches. The dominant structural elements are selected by their coherence within the structure, that is, by how they relate to the rest of the piece. The most stable structure is considered the tonic, and consonances are more stable than dissonances. Melodic notes that coincide with chordal roots add stability to a chord, and the relative closeness of two chords on the circle of fifths determines their relative stability. Thus the harmonic principles discussed earlier are employed at this level. A cadence is a principal element of structural stability. Because the relationships defined must be discrete and specific, branches may not cross. Larger branches indicate the boundaries of the principal dominating groupings. The tree also indicates the duration of an elaborational grouping, and it is perhaps the most useful element here for exposing parallel and embedded structures. Structural groupings are further clarified with a “b” indicating a structural beginning and a “c” indicating a structural ending, or cadence.
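The selection of a structurally dominant event can be illustrated with a toy program. The ranking below is a hypothetical simplification of the stability criteria listed in the paragraph above (tonic, consonance, circle-of-fifths closeness, melody on the chord root); it is not the authors’ actual preference-rule system, and it assumes a strictly binary grouping so that each group holds two events. When two events tie, the first is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str            # Roman-numeral name, e.g. "I" or "V7"
    root: int             # pitch class of the chord root, 0 = C
    consonant: bool       # False for dissonant chords such as V7
    melody_on_root: bool  # True if the melody note doubles the chord root


def fifths_distance(a: int, b: int) -> int:
    """Steps between two roots around the circle of fifths (0 to 6)."""
    steps = ((b - a) * 7) % 12  # multiplying by 7 maps semitones onto fifths
    return min(steps, 12 - steps)


def stability(event: Event, tonic: int) -> tuple:
    # Hypothetical ranking, compared left to right: tonic triad first, then
    # consonance, then closeness to the tonic, then melody on the root.
    return (event.root == tonic and event.consonant,
            event.consonant,
            -fifths_distance(event.root, tonic),
            event.melody_on_root)


def time_span_levels(events: list[Event], tonic: int) -> list[list[str]]:
    """Keep the most stable event of each adjacent pair, level by level."""
    levels = [events]
    while len(levels[-1]) > 1:
        current = levels[-1]
        heads = [max(current[i:i + 2], key=lambda e: stability(e, tonic))
                 for i in range(0, len(current), 2)]
        levels.append(heads)
    return [[e.label for e in level] for level in levels]


phrase = [Event("I", 0, True, True), Event("IV", 5, True, False),
          Event("V7", 7, False, False), Event("I", 0, True, True)]
for level in time_span_levels(phrase, tonic=0):
    print(level)
```

Each printed level is one step closer to the “trunk” of the tree: the four-chord phrase reduces to its two tonic chords and finally to a single tonic. A faithful implementation would follow the actual grouping structure and retain the structural beginning and cadence separately, which this sketch does not attempt.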

The harmonic elaboration is exposed through the prolongational reduction, another tree-like structure with corresponding musical examples beneath it. This tree signifies hierarchical harmonic relationships without the primary metrical concerns of the time span reduction. Of primary significance here is the concept of progression in a piece: branches to the right indicate progression, while branches to the left show that a harmonic event falls within the domain of a higher harmonic structure. Unlike branches in the time span reduction, these branches are joined at nodes, which indicates the shared grouping of elements rather than mere hierarchical relationships. The hierarchic levels are indicated by letters.

“A” is the lowest and most dominant level. In Schenker’s terms it is the “Satz.” (21) It is the fundamental harmonic event of which all successive levels are elaborations. In a sense it is the deep structure, exposing the primary dominant-to-tonic resolution that governs the entire composition. As before, the similarity to linguistic terminology is apparent, but symbolically the hierarchic levels are different. Each subordinate level is merely a constituent elaboration of a dominant harmonic event, one perceived by the listener as more fundamental to the structure of the piece. The musical structures here progress through the tree in a way that most closely approximates context-free structures on linguistic trees.

Thus the prolongational reduction gives us information that is very different from, but just as useful as, the information gained from the time span reduction. As mentioned earlier, a comparison of the two indicates how the structures interact to shape the final effect and perception of the music. Because of the thorough nature of this syntactic investigation, it may well give us more powerful tools for examining the cognitive processes of creating and perceiving music.

Examples of the system were given on three attached pages. Example 1 is labeled and illustrates the locations of the analytical and elaborational components of the system. Examples 2 and 3 give a more substantial application of the system to a section of Schumann’s “Liederkreis.” Example 2 contains grouping, metrical, and time span analyses; the hierarchic levels of the time span elaboration are given in skeletal musical form below the grouping analysis, and letters to the left of the staff correspond to hierarchic branches on the tree. Example 3 is laid out similarly. It illustrates the prolongational reduction, and level “e,” the deepest and most dominant level, is shown in musical notation at the bottom, corresponding to the trunk of the tree at the top. Example 4 demonstrates crossing branches, or musical transformations.

Although the analytical systems examined so far can give us significant information about how a musical composition is perceived and understood, the conclusions they allow about musical competence are limited. Important distinctions must be made between the processes of creating and perceiving music. Except for vocal music, which to some extent employs the same kinesthetic elements as speech, musical creation evolves from learned kinesthetic conditioning, which is vastly different from innate perceptual observation and understanding. Athletes, dancers, and musicians alike will agree that patterns of movement and muscle usage become physiologically established and ingrained with age; once learned, they become harder to change. Patterns of muscle movement are easiest to develop at an early age, before usage patterns have become established. (22) A well-learned muscular technique appears to have a certain grace and ease simply because it lacks any movements unrelated to the task it is intended to perform. This is why virtuoso musicians appear to perform with startling ease: they are performing with startling ease. (23)

Failure to distinguish between the effects of kinesthetic conditioning and perceptual processes sometimes leads to incorrect conclusions. The Japanese educational philosopher and violinist Dr. Shinichi Suzuki has stressed the importance of early musical exposure for the development of musical competence. (24) However, he claims that an “ear for music” is not an innate ability but a cultivated skill: early exposure to “good music” will help develop a good musical sense, and “bad” music will develop a “bad ear.” The success of his methods is often taken to support this claim. Yet the evidence he provides implies that musical perception is characterized by rapid learning with limited exposure to possibly “degenerate” data, which would indicate that tonality is an innate cognitive structure. An important prerequisite for properly executing a physical action is an accurate “goal conception,” a thorough mental and physical understanding of the action to be performed. We might then conclude that it is not necessarily the ear that is “bad,” but rather an inaccurate goal conception and insufficient kinesthetic conditioning to support the task of achieving the goal. Suzuki’s claims concern the task of conditioning motor abilities to achieve a goal conception (a musical pattern) that may be unclear because of limited exposure to the pattern, rather than because of a discordant conception of tonality. Nevertheless, there can be no doubt that early kinesthetic conditioning, coupled with musical exposure, can greatly facilitate the development of musical performance skills even with limited exposure to degenerate data.

We have thus seen that music consists of a non-representational symbolic structure. Although non-representational, it has relational qualities on a syntactic level, and these determine whatever “meaning” a piece of music may have. The distributional and elaborational patterning of these symbolic elements often resembles the patterning of referential symbols. This, together with the occurrence of common musical patterns proposed as universals (e.g., the pentatonic scale), suggests innate structurings of language and music. However, to draw any conclusions about innateness in music, a broader examination of diverse cultural musical forms must be undertaken. Clearly the musical elements of different cultures are not uniform. Western tonality is not evident in the music of many primitive cultures, and cultures that employ tone languages incorporate rhythmic elaboration to a greater extent in their music. Perhaps, then, musical form develops within a culture as a means of expressing what language cannot: an innate parallel development responsive to speech. Although its form differs, its purpose and “meaning” may be universal. If this is so, the importance of examining language and music together cannot be overemphasized.

The analytical systems discussed here have principally concerned themselves with western tonal music and western languages. To attempt to define musico-linguistic universals, the characteristics of non-western music and languages must be examined with similar rigor and methodology. We have seen that musical structures can be broken down into constituent components organized somewhat like linguistic elements, although the expressive purposes of the two forms differ. Only if similar structural parallels can be observed in the music and languages of diverse cultures can we begin to draw conclusions regarding the innate qualities of music and language.

This relationship may indeed be supported by physiological research. Studies of neurological disorders have suggested that cognitive processing tasks common to both music and language may use the same area of the brain, which would seem to indicate shared innate structures.

Within western tonal music we have found that the harmonic overtone series corresponds, to some extent, to tonal harmonic theory. Overtones support “tonal gravity” in the psycho-acoustic sense and help to delineate a concept of “stability.” As a result, referential relationships are established. Harmonic qualities imply and demand resolution, and composers may deny the resolution outright, gradually work toward it, or surrender to it.

We have seen that western tonal music can be broken down into hierarchical levels that resemble, to a certain extent, linguistic hierarchies. Fundamental harmonic events govern elaborations much as the deepest structural constituents of a linguistic phrase dominate the rest. Jackendoff and Lerdahl’s analytical system provides a method better suited to a thorough analysis of music’s determinant dimensions (tonal, harmonic, and rhythmic) than any previously developed. The interactive relationships of these dimensions certainly warrant further investigation. The non-representational symbolic dualities of music suggest that computers could be a powerful tool for exposing these relationships; however, the initial determination of these hierarchic classifications is somewhat subjective and necessarily human.

We have examined Langer’s and Bernstein’s philosophical ideas concerning meaning in the non-representational realm of music. If we accept the ideas of music as a language that expresses what speech cannot and of music as a metaphorical language, we can proceed to examine musico-linguistic relationships with some cogent presuppositions:

The meaning of music is non-representational; the relationships exposed within the form will be of primary interest.

The metaphorical components of a composition can provide clues about its relational meaning. The organization and transformation of these metaphors can provide clues about the nature of the compositional process.

If we are seeking insight into the nature of the compositional process and hope to discover linguistic structural parallels, we might expect these parallels to arise within the organizational structures. What we are examining, then, is the way in which humans organize symbols. The chief concern is not meaning but the structuring of communicative processes. Evidence of innate structures common to both would be provided if we found that similar deterministic models could account for hierarchic dominant/subordinate and elaborational orderings in both areas. We should keep in mind that the composers and performers we are discussing are but a small subsection of the cultures being discussed. Kinesthetic conditioning distinguishes them from non-musicians, so if we begin to discover cross-cultural similarities, these need not represent universal qualities; they could be innate qualities of only a minute subsection of these diverse cultures. The systems we have examined here have revealed some interesting parallels between the organization of western tonal music and language, yet in a sense they are just a beginning. Through further application of their procedures, they may be revised and adapted to a diverse range of cultural conditions. This could provide a fascinating array of discoveries and insights into the process of human thought. As yet, however, we have seen only the tip of the iceberg.

1. Roman Jakobson, “Verbal Communication,” Scientific American, Sept. 1972.
2. Susanne Langer, “Philosophy in a New Key” (NYC: Mentor, 1951).
3. Ibid.
4. Ibid.
5. MacDonald Critchley et al., “Music and the Brain: Studies in the Neurology of Music” (London: William Heinemann Medical Books, 1977).
6. Noam Chomsky, “Language and Mind” (NY: Harcourt Brace Jovanovich, 1968).
7. Peter Lindsay & Donald Norman, “Human Information Processing” (NY: Academic Press, 1977).
8. John Lyons, “Noam Chomsky” (NYC: Viking Press, 1970).
9. Lindsay & Norman, op. cit.
10. Lyons, op. cit.
11. Lindsay & Norman, op. cit.
12. Noam Chomsky, “Reflections on Language” (NYC: Pantheon, 1975).
13. Ibid.
14. Ibid., p. 12.
15. Norman Geschwind, “Specializations of the Human Brain,” Scientific American, Sept. 1979.
16. Robert Ornstein, “The Psychology of Consciousness” (NJ: Penguin, 1972).
17. Leonard Bernstein, “The Unanswered Question” (Cambridge: Harvard University Press, 1976).
18. Felix Salzer, “Structural Hearing: Tonal Coherence in Music,” Vols. I & II (NY: Charles Boni, 1952).
19. Igor Stravinsky, “The Poetics of Music” (NYC: Vintage, 1959).
20. Salzer, op. cit.
21. Ibid.
22. John Drowatzky, “Motor Learning: Principles and Practices” (Minneapolis: Burgess Publishing, 1975).
23. Gerhard Mantel, “Cello Technique: Principles of Form and Movement” (Bloomington: Indiana University Press, 1972).
24. Shinichi Suzuki, “The Suzuki Concept” (Berkeley: Diablo Press, 1973).

1.) Bernstein, Leonard. “The Unanswered Question.” Cambridge: Harvard University Press, 1976.
2.) Chomsky, Noam. “Language and Mind.” NY: Harcourt Brace Jovanovich, 1972.
3.) Chomsky, Noam. “Reflections on Language.” NYC: Pantheon, 1975.
4.) Critchley, MacDonald, et al. “Music and the Brain: Studies in the Neurology of Music.” London: William Heinemann Medical Books, 1977.
5.) Drowatzky, John. “Motor Learning: Principles and Practices.” Minneapolis: Burgess Publishing, 1975.
6.) Esau, Helmut. “Language and Communication.” Columbia, SC: Hornbeam Press, 1980.
7.) Geschwind, Norman. “Specializations of the Human Brain.” Scientific American, Sept. 1979.
8.) Hindemith, Paul. “A Composer’s World.” Garden City, NY: Anchor Books, 1961.
9.) Jackendoff, Ray. “The Unanswered Question: Review Article.” Language, Vol. 53, no. 4, 1977.
10.) Jackendoff, Ray, and Fred Lerdahl. “Toward a Formal Theory of Tonal Music.” Journal of Music Theory, 1977.
11.) Jakobson, Roman. “Verbal Communication.” Scientific American, Sept. 1972.
12.) Kendall, John. “The Suzuki Violin Method in American Music Education.” Reston, VA: Music Educators National Conference, 1966.
13.) Langer, Susanne. “Philosophy in a New Key.” NYC: Mentor, 1951.
14.) Lindsay, Peter, and Donald Norman. “Human Information Processing.” NY: Academic Press, 1977.
15.) Lyons, John. “Noam Chomsky.” NYC: Viking Press, 1970.
16.) Mantel, Gerhard. “Cello Technique: Principles of Form and Movement.” Bloomington: Indiana University Press, 1972.
17.) McLaughlin, Terence. “Music and Communication.” 1972.
18.) Messiaen, Olivier. “The Technique of My Musical Language.” Paris: Alphonse Leduc, 1942.
19.) Narmour, Eugene. “Beyond Schenkerism.” Chicago: University of Chicago Press, 1977.
20.) Nketia, J.H. “African Music Culture.” Harvard University lecture and tape.
21.) Ornstein, Robert. “The Psychology of Consciousness.” NJ: Penguin, 1972.
22.) Salzer, Felix. “Structural Hearing: Tonal Coherence in Music,” Vols. I & II. NY: Charles Boni, 1952.
23.) Stravinsky, Igor. “The Poetics of Music.” NYC: Vintage, 1959.
24.) Suzuki, Shinichi. “The Suzuki Concept.” Berkeley: Diablo Press, 1973.

A Generative Theory of Tonal Music (The MIT Press) 

The entire Leonard Bernstein lecture series I refer to can be viewed on YouTube. There are six lectures, with the following running times (hours:minutes):
1) 1:08
2) 1:36
3) 2:23
4) 2:23
5) 2:14
6) 2:58

The Leonard Bernstein Norton Lecture Series, “The Unanswered Question” on YouTube