In this paper, I make the following claims: (1) Subjective experience is tremendously useful in guiding productive research. (2) Studies of auditory scene analysis (ASA) in adults, newborn infants, and non-human animals (e.g., goldfish or pigeons) establish the generality of ASA and suggest that it has an innate foundation. (3) ASA theory does not favor one musical style over another. (4) The principles used in the composition of polyphony (slightly modified) apply not only to one particular musical style or culture but to any form of layered music. (5) Neural explanations of ASA do not supersede explanations in terms of capacities; the two are complementary. (6) In computational auditory scene analysis (CASA) – ASA by computer systems – or any adequate theory of ASA, the most difficult challenge will be to discover how the contributions of a very large number of types of acoustical evidence and top-down schemas (acquired knowledge about the sound sources in our environments) can be coordinated without producing conflict that disables the system. (7) Finally, I argue that the movement of a listener within the auditory scene provides him/her/it with rich information that should not be ignored by ASA theorists and researchers.
Three experiments explored the relationship between chroma-salience profiles of individual chords and tone profiles obtained after short chord progressions. Musicians' tone profiles for diatonic progressions of one, two, and three chords were compared with predictions of three models: a bottom-up stimulus model (number of times each chroma occurs in the progression), a top-down or schema-driven key model (best-fitting key profile of C. L. Krumhansl & E. J. Kessler, 1982), and an intermediate pitch model that includes both top-down and bottom-up components (cumulative pitch salience; R. Parncutt, 1989, 1993). For single chords, all predictors significantly matched tone profiles, except the key model applied to the diminished triad. For pairs of chords, the pitch and key models consistently outperformed the stimulus model, consistent with the assumption that a (top-down) key had been established; in the pitch model, the second chord influenced the tone profile more than the first (recency effect). Progressions of three chords comprised forward (e.g., F-G-C) and backward (C-G-F) cadences in major and minor keys. The pitch and key models were successful for all progressions, but the key model predicted the tonic of backward cadences in C major and minor to be F. Predictions of the stimulus model were clearly worse than those of the other models, especially for backward cadences. Both primacy and recency effects were observed. In summary, the pitch model was the most consistently successful model over all experiments. To successfully predict tone profiles following chord progressions, it was necessary to account not only for recency (and primacy) but also for variations in pitch salience within chords. Results are consistent with a model of tonality induction in which bottom-up processes interact in real time with top-down processes of two kinds: recognition of harmonic pitch patterns and recognition of key profiles.
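As a rough illustration of the bottom-up stimulus model described above (the number of times each chroma occurs in the progression), the following minimal Python sketch counts chroma occurrences for the F-G-C and C-G-F cadences. It is not the study's implementation. The chord spellings in `TRIADS` (plain root-position triads with no doubled tones), the function names `stimulus_profile` and `pearson`, and the use of a Pearson correlation to compare a prediction with a tone profile are all assumptions made for illustration; the actual chord voicings and statistics used in the experiments are not given here.

```python
from math import sqrt

# Assumed chord spellings: pitch classes (chroma) numbered 0-11 with C = 0.
# Plain triads, one instance of each chord tone (an illustrative assumption).
TRIADS = {
    "C": (0, 4, 7),
    "F": (5, 9, 0),
    "G": (7, 11, 2),
}


def stimulus_profile(progression):
    """Count how often each of the 12 chroma occurs in a chord progression."""
    counts = [0] * 12
    for chord in progression:
        for chroma in TRIADS[chord]:
            counts[chroma] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


forward = stimulus_profile(["F", "G", "C"])
backward = stimulus_profile(["C", "G", "F"])
print(forward)   # chroma counts, index 0 = C
print(backward)  # same counts: this simple tally ignores chord order
```

In this sketch the tally is identical for the forward and backward cadences, because a plain count carries no information about chord order. A measured tone profile (a list of 12 ratings, not provided here) could be compared with the prediction via `pearson(forward, measured_profile)`.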