EchoMatic Project Description
Bats, dolphins and blind people all use echolocation to navigate and function effectively in their surroundings. Bats emit chirps at frequencies of 32 kHz, which allows them to resolve the position, velocity and acceleration of objects the size of mosquitoes. Dolphins use clicks to locate fish in murky water. Blind people use the tap of a cane not only to sense physical obstructions but also to obtain phase information from the acoustic return of the tap. Some blind people use tongue-click location, but regrettably this is often discouraged by sighted teachers for social reasons. Blind freestyle bicyclists have defied this convention to navigate in urban environments with some success.
The advent of digital signal processing (DSP) opens up the possibility of phase-sensitive processing, which enables the creation of a sound picture of the environment that is more vivid than what is possible with tongue-click location. The purpose of the EchoMatic prototype is to take a sound generated by the user in the range of speech (150 to 2500 Hz) and multiply its frequency by a user-controllable factor. The resulting utterance can be transmitted as a tone burst (9600 to 200000 Hz). The current project goal is to shift vocalizations that the user has spoken or whistled to six octaves above the original sound, a 64-fold frequency multiplication. At these frequencies the sound becomes inaudible.
These high-pitched, inaudible sounds have the potential to give the user higher spatial resolution and more phase information than lower frequencies provide, as follows from bandwidth considerations. Ultrasonic sounds reflect and refract from objects in the environment and return to the user in a manner analogous to sonar. The returning sounds are then further processed, down-converted, and provided to the user as feedback. By maintaining a nearly continuous feedback loop, the user obtains a pseudo-visual picture of their surroundings.
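
The relationship between octaves and the multiplication factor can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the prototype's DSP code; the function names `multiplication_factor` and `upshift_hz` are made up for this example. Each octave doubles the frequency, so six octaves give a factor of 64.

```python
def multiplication_factor(octaves):
    """Each octave doubles the frequency, so n octaves multiply it by 2**n."""
    return 2 ** octaves


def upshift_hz(freq_hz, octaves=6):
    """Frequency after shifting up by the given number of octaves (hypothetical helper)."""
    return freq_hz * multiplication_factor(octaves)


# Speech range quoted in the text, in Hz.
SPEECH_BAND_HZ = (150, 2500)

if __name__ == "__main__":
    low, high = (upshift_hz(f) for f in SPEECH_BAND_HZ)
    print(f"factor for six octaves: {multiplication_factor(6)}")
    print(f"speech band after shifting: {low} Hz to {high} Hz")
```

Because the factor is user controllable, passing a different `octaves` value shows how the transmitted band moves with that setting.
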
Normal human eyes provide excellent spatial resolution (11,000 pixels squared) but relatively poor temporal frequency response (18 Hz). The human auditory system provides poor spatial resolution but excellent temporal frequency response (14 - 18 kHz). The essential feature of our system is therefore to map the spatial detail that would have been perceived and processed by the user's visual system into the auditory system, exploiting its strong frequency response.
There are two options when a user is interacting with a new environment. The user can create an utterance that is best for "sounding out" the new surroundings. Alternatively, when this is too fatiguing, a chirp tone burst can be broadcast to the surroundings at high ultrasonic volume. 80 kHz signals have a range of about 6 feet, while 40 kHz signals have a range of about 40 feet. If no new chirp is put in the buffer, the user can elect to retransmit the previous chirp by clicking a thumb switch as often as desired while turning their head to detect phase differences in the acoustic return.
The chirp, constructed with the user's vocal apparatus, may be as long as 4 seconds, but will typically be a short whistle perhaps a quarter of a second long. The aboriginal tongue click, a wooden sound used by the blind freestyle bicyclists, has a duration of 15 milliseconds, an average frequency of 1 kHz and a peak frequency of about 2.5 kHz. After being up-multiplied, the vocalized signal will be 64 times shorter in duration than the original. The transmitted chirp will therefore typically last about 4 milliseconds, and the up-multiplied tongue click will have a duration of 0.23 milliseconds.
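
The duration figures above follow from dividing the spoken duration by the multiplication factor. A minimal sketch of that arithmetic, assuming the fixed 64-fold factor from the text (the helper name `transmitted_duration_ms` is invented for this example):

```python
FACTOR = 64  # six octaves, as stated in the text


def transmitted_duration_ms(spoken_duration_s, factor=FACTOR):
    """Duration of the up-multiplied chirp, in milliseconds."""
    return spoken_duration_s * 1000.0 / factor


if __name__ == "__main__":
    # Spoken durations taken from the text, in seconds.
    examples = {
        "longest utterance (4 s)": 4.0,
        "short whistle (0.25 s)": 0.25,
        "tongue click (15 ms)": 0.015,
    }
    for label, seconds in examples.items():
        print(f"{label}: {transmitted_duration_ms(seconds):.3f} ms")
```
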
Details, Details
Echoes from objects close to the user return more rapidly, so closer objects require a shorter chirp. Sound travels approximately 1000 feet per second in dry air. Therefore a 63 ms chirp that required 4 seconds to enunciate will travel about 63 feet in the time it takes to transmit it, neglecting processing time. Since the sound must make a round trip, it covers twice the distance to the object, so in this first case the object could be as close as 31 feet. A short chirp will travel 4 feet in the time it takes to transmit it, sufficient for an object 2 feet away, so as the user gets closer to reflective objects, shorter chirps will be required to resolve them. The tongue click can sound objects as close as 1.4 inches.
For the chirp to be frequency multiplied, it must be sampled at a rate at least twice its up-multiplied frequency, according to the Nyquist criterion. The whistle is the worst case, with an outgoing frequency of 196 kHz. For the prototype system a 200 kHz target sampling frequency will be used.
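
The range figures above can be reproduced with a short calculation. This sketch assumes the round 1000 feet per second figure from the text and neglects processing time, as the text does; `min_range_ft` is a name chosen for this example. The closest resolvable object is taken to be the one whose echo begins to arrive just as the chirp finishes transmitting, which is half the distance sound travels during the chirp.

```python
SPEED_OF_SOUND_FT_PER_S = 1000.0  # round figure used in the text
FACTOR = 64                       # six-octave multiplication from the text


def min_range_ft(chirp_duration_s, speed=SPEED_OF_SOUND_FT_PER_S):
    """Closest object whose echo starts arriving after transmission ends.

    The sound makes a round trip, so the one-way distance is half of
    what the sound covers during the chirp.
    """
    return speed * chirp_duration_s / 2.0


if __name__ == "__main__":
    # Spoken durations from the text, compressed by the 64-fold factor.
    for label, spoken_s in [("4 s utterance", 4.0),
                            ("0.25 s whistle", 0.25),
                            ("15 ms tongue click", 0.015)]:
        rng = min_range_ft(spoken_s / FACTOR)
        print(f"{label}: {rng:.3f} ft ({rng * 12:.1f} in)")
```
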
lvw 10/8/2000 11:37 PM revised 3/29/2002 11:31 AM