Post –46 by Gautam Shah (Lecture series: Space Perception -Article-I of 15)
Sound is caused by a disturbance or vibration in an elastic medium. The energy of that vibration is perceived as a subjective experience by the human ear. The ears capture, transmit and transduce sounds, discriminating between different frequencies and perceiving each in a different manner.
The human ear can perceive sound between the infrasound limit of about 20 Hz and the ultrasound limit of about 20000 Hz, but more importantly, human ears can discern information within sound and noise. The range of the human voice is from 60 Hz to 10000 Hz, but 90 % of intelligibility lies between 200 Hz and 4000 Hz.
A good quality modern PA system should be capable of reproducing 100 Hz to 6000 Hz, and preferably up to 10000 Hz. For music the PA system should cover 80 Hz to 10000 Hz, and up to 15000 Hz for a high quality theatre type of installation. Telephone voices have a peculiar ‘unnatural’ feel because voice frequencies below 400 Hz and above 3400 Hz are eliminated.
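The frequency ranges quoted above can be compared with a few lines of Python. This is only a minimal sketch: the band limits are copied from the text, while the helper names (`in_band`, `octaves`, `overlap`) are hypothetical and chosen purely for illustration. Widths are measured in octaves because pitch is heard in doublings of frequency rather than in equal steps of hertz.

```python
import math

# Frequency limits in Hz, copied from the figures quoted in the text.
HEARING = (20.0, 20000.0)
VOICE = (60.0, 10000.0)
INTELLIGIBILITY = (200.0, 4000.0)
TELEPHONE = (400.0, 3400.0)


def in_band(freq_hz, band):
    """Return True if freq_hz lies inside the (low, high) band."""
    low, high = band
    return low <= freq_hz <= high


def octaves(band):
    """Width of a (low, high) band in octaves (doublings of frequency)."""
    low, high = band
    return math.log2(high / low)


def overlap(band_a, band_b):
    """Return the shared (low, high) range of two bands, or None if disjoint."""
    low = max(band_a[0], band_b[0])
    high = min(band_a[1], band_b[1])
    return (low, high) if low < high else None


if __name__ == "__main__":
    for f in (100.0, 250.0, 1200.0, 5000.0):
        print(f, "Hz passes the telephone band:", in_band(f, TELEPHONE))
    print("intelligibility band: %.2f octaves" % octaves(INTELLIGIBILITY))
    print("telephone band: %.2f octaves" % octaves(TELEPHONE))
```

With the figures from the text, the telephone band sits entirely inside the intelligibility band but is more than one octave narrower, which is consistent with the ‘unnatural’ feel described above.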
When machines are taught to speak like humans, the process of learning (AI) is to break human speech into phonemes (in slots of about 30 milliseconds each). These form the basic set used for enunciation, so when robots speak, they sound monotonic, stilted and mechanical.
Human speech consists of two parts: vowels and consonants. In general, vowels are easily recognized because they are distinctive, and the ‘deeper’ or longer vowels in particular occupy more time than any other speech component. They also consist mostly of the lower speech frequencies. The formant characteristics of the mouth, based on cavity resonance, are responsible for vowel sounds, and are the main vehicle for the intelligibility of speech.
Consonants are less easy to recognize because they occupy a very short time and so seem transient. They lie mostly in the higher speech frequencies (around 1200 Hz). There are many more of them than vowels, and so they contribute to speech audibility and perception. Consonants provide the rich sound variants that make different speeches different.
In addition to the formants, sibilants (consonants with a characteristic hissing sound, such as sh, s, z, and zh) and stops of various types (consonant sounds such as b, p, d, t, g, k, characterized by the momentary blocking of some part of the oral cavity) help in achieving high intelligibility.
The sound perception and cognition system has the ability to compensate and fill in the required information, in terms of vowels, consonants, and even words, to complete speech or sentences. The time required to fill in this information is provided by the quality of acoustics of the space. ‘A longer reverberation seems to elongate the spoken sound in time scale, but an excess of reverberation may mask the following sounds. A fast orator in a reverberating hall fails to impress the audience, whereas a slow speaker in a well-absorbent and non-reverberating space may seem discontinuous.’
Speech intelligibility depends on the quality of space. The space, its size, shape and materials, and the PA system (if any) define how the speech will be perceived. Seasoned speakers or stage performers (actors, singers) have an innate sense of how to improvise the tonal quality of delivery. They overcome the masking effect of background noise by raising the voice and changing the range of frequency. In spaces with longer reverberation the pauses are widened. Speakers face the section of the crowd they want to address, to direct the original sound towards it and to allow that section of the crowd to read the lips and body gestures.
Speakers (orators, actors and singers) and listeners all hear the original sound as well as reflected sounds, but in completely different spatial contexts (space, size, shape, materials and the PA system). For listeners the most important matter is the identification (real or mythical) of the source of sound, in spite of the ‘presence of many reflected sounds’. This helps in personalization, or in being part of the event. However, if the time gap between hearing the original sound and the reflected sound is more than two seconds, the localization begins to be difficult. In a long or deep hall the P.A. system sound arrives stronger, and even before the direct sound, creating confusion, including a loss of visual and aural synchronization between the lips and other body posture-gesture language and the spoken sound.
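The time gaps discussed above follow from simple path-length arithmetic. The sketch below is illustrative only: it assumes a speed of sound of 343 m/s (dry air at roughly 20 °C), treats the electrical signal to the loudspeaker as instantaneous, and uses hypothetical function names and distances that do not come from the article.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at about 20 degrees C


def travel_time_s(path_m, speed=SPEED_OF_SOUND_M_S):
    """Time in seconds for sound to cover a path of path_m metres."""
    return path_m / speed


def reflection_gap_s(direct_m, reflected_m, speed=SPEED_OF_SOUND_M_S):
    """Delay of a reflected sound relative to the direct sound."""
    if reflected_m < direct_m:
        raise ValueError("a reflected path cannot be shorter than the direct path")
    return (reflected_m - direct_m) / speed


def pa_lead_s(stage_distance_m, loudspeaker_distance_m, speed=SPEED_OF_SOUND_M_S):
    """How much earlier the loudspeaker sound reaches a listener than the
    direct sound from the stage. Positive means the PA sound arrives first.
    Assumes the signal reaches the loudspeaker with no electrical delay."""
    return (stage_distance_m - loudspeaker_distance_m) / speed


if __name__ == "__main__":
    # Hypothetical listener: 34.3 m from the stage, 3.43 m from a loudspeaker.
    print("direct sound takes %.3f s" % travel_time_s(34.3))
    print("PA sound leads by %.3f s" % pa_lead_s(34.3, 3.43))
    # Hypothetical reflection: 10 m direct path, 27.15 m via a side wall.
    print("reflection lags by %.3f s" % reflection_gap_s(10.0, 27.15))
```

In practice a deep hall with loudspeakers near the rear seats gives a positive lead of this kind, which is the situation in which the PA sound arrives before the direct sound.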
In the case of speech, a short reverberation time implies high absorption, which makes speech difficult to register in the rear seats. On the other hand, a long reverberation time means that the sound of each syllable is heard against the reverberant sound of previous syllables.
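The link between absorption and reverberation time can be illustrated with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption (each surface area multiplied by its absorption coefficient, summed). The article does not name this formula; it is brought in here as a common first approximation, and the room dimensions and coefficients in the sketch are hypothetical.

```python
SABINE_CONSTANT = 0.161  # seconds per metre, metric form of Sabine's formula


def total_absorption(surfaces):
    """Sum of area (m^2) times absorption coefficient for each surface."""
    return sum(area * alpha for area, alpha in surfaces)


def sabine_rt60(volume_m3, surfaces):
    """Estimated reverberation time in seconds from Sabine's formula."""
    absorption = total_absorption(surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return SABINE_CONSTANT * volume_m3 / absorption


if __name__ == "__main__":
    volume = 1000.0  # hypothetical hall volume in m^3
    # Hypothetical surfaces as (area in m^2, absorption coefficient).
    hard_room = [(500.0, 0.05), (200.0, 0.10)]
    treated_room = [(500.0, 0.30), (200.0, 0.60)]
    print("hard finishes: %.2f s" % sabine_rt60(volume, hard_room))
    print("absorbent finishes: %.2f s" % sabine_rt60(volume, treated_room))
```

Raising the absorption coefficients shortens the estimated reverberation time, which matches the trade-off described above: more absorption gives cleaner syllables but weaker sound at the rear seats.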
This is the 1st article of the series on SPACE PERCEPTION.