A New Speech Coding Strategy for Cochlear Implant
Journal of Medical and Biological Engineering, 30(5): 335-342
Technical Note

Wei-Dong Wang1,†,*, Hong-Yun Liu1,†, Hu Yuan2, Qing Ang1
1 Department of Biomedical Engineering, General Hospital of Chinese PLA, Beijing 100853, China
2 Department of Otolaryngology, General Hospital of Chinese PLA, Beijing 100853, China
† These authors contributed equally to this work.
* Corresponding author: Wei-Dong Wang. Tel: +86-010-66936921; Fax: +86-010-66936921; E-mail: wangwd301@126.com
Received 5 Aug 2009; Accepted 1 Feb 2010; doi: 10.5405/jmbe.30.5.10

Abstract

Cochlear implants are widely accepted as the most effective way for individuals with severe to profound hearing loss to restore some degree of hearing. Speech coding strategies play a critical role in realizing the communicative potential of cochlear implant users. Various speech coding strategies have been developed over the past fifty years to simulate the peripheral auditory system as naturally as possible. Most of these strategies mimic the spatial (place) encoding pattern of the human cochlea, stimulating auditory fibers tuned to given frequencies according to the spectral characteristics of speech. However, they do not simulate the cochlea's temporal encoding pattern well, and current spreading and channel interactions remain major problems. This paper presents a new strategy, called wavelet zero-crossings stimulation (WZCS), which generates stimulating pulsatile series at the zero-crossings of the signal in the wavelet transform domain. By encoding both amplitude modulation and phase information (zero-crossings), WZCS aims to improve the recognition of tonal-language speech and of speech in multi-talker backgrounds. WZCS, frequency amplitude modulation encoding (FAME), and continuous interleaved sampling (CIS) were compared by computer simulation and a hearing test. The results showed that speech synthesized through WZCS was recognized better than speech synthesized through CIS, in both quiet and noisy environments. WZCS also yielded significantly better tone recognition than FAME, in both quiet and noisy conditions. Further analysis showed that WZCS preserves the temporal cues (phase information) and some of the fine structure of speech in the stimulating pulsatile series. Most importantly, the correlation with the original sounds was noticeably higher for WZCS than for signals reconstructed through the CIS and FAME strategies. Applying the WZCS strategy to cochlear implants may therefore be a significant improvement.

Keywords: Stimulating pulsatile series, One-octave wavelet transform, Zero-crossings

1. Introduction

Cochlear implants are widely accepted as the most effective way for individuals with severe to profound hearing loss to restore some degree of hearing. A typical device consists of a microphone, a speech processor, a transmitter, a receiver, and an electrode array located inside the cochlea. The speech processor decomposes the input audio signal into different frequency bands (channels) and delivers the most appropriate stimulation pattern to the electrodes [1,2]. Speech coding strategies play a critical role in realizing the communicative potential of cochlear implant users, and new and better strategies have driven great strides in the performance and widespread application of cochlear implants over the past 30 years. In short, remarkable progress has been made in the development of speech coding strategies and cochlear implants, but much room for improvement remains, especially for patients with poor speech recognition in noise and with respect to current spreading and channel interactions [1-5].
So far, there are two types of methods for synthesizing the stimulating pulsatile sequences of cochlear implants. The first, based on a vocal (speech production) model, extracts the fundamental frequency (F0), the formants (F1, F2, F3), and other parameters of the speech signal and synthesizes the corresponding stimulating pulsatile sequences; this approach is called feature extraction [6-8]. The F0/F1, F0/F1/F2, F0/F1/F2/F3, and Multi-peak schemes all belong to this type. The second, called the filter bank strategy, is based on a hearing model: the speech signal is passed through a digital band-pass filter bank, and each band output is processed separately to generate the stimulating pulsatile sequences. Filter bank strategies include the spectral maxima sound processor (SMSP), spectral peak (SPEAK), compressed analog (CA), continuous interleaved sampling (CIS), asynchronous interleaved sampling (AIS), frequency amplitude modulation encoding (FAME), and others [8-15]. In cochlear implants, either feature extraction or a filter bank strategy is applied; the former provides too little information, while the latter provides too much indiscriminate information. Although cochlear implant systems based on these strategies can already restore partial hearing to deaf people, speech-communication ability still varies greatly among individuals. The problems of current spreading, channel interaction, 'fixed stimulating frequency', and other issues have not been solved well. This paper presents a new solution, called wavelet zero-crossings stimulation (WZCS), which generates stimulating pulsatile series at the zero-crossings of the signal in the wavelet transform domain. This solution can mitigate the interactions among channels, preserves the temporal cues and some fine structure of speech in the stimulating pulsatile series, and uses a flexible stimulating rate determined by the original acoustic signal itself, which some other strategies do not provide [16,17].

Figure 1. Block diagram of the WZCS strategy. (Components shown include: microphone (MIC), pre-emphasis, one-octave wavelet transform Φ1(t), Φ2(t), ..., ΦN(t), absolute value, LPF, sampling holder, thresholds (Threshold1 ... ThresholdN), sign detector, 1/Z delay, summation nodes, 0.5 gain, amplifier, modulation, asynchronous algorithm, differentiator, and electrodes E1, E2, ..., EN.)

2. Materials and methods

Because cochlear implants only simulate the physiological mechanism of normal hearing, the hearing induced by electrical stimulation differs from that induced by acoustic stimulation. The underlying idea is that if audio information is filtered into multiple frequency bands and different locations along the basilar membrane are selectively stimulated through electrodes implanted in the cochlea, the sound information can be recognized by the brain [16-21]. The auditory system has traditionally been viewed as a frequency analyzer that provides a faithful spectral representation of the acoustic waveform for higher-level processing. Studies also indicate that the human cochlea behaves like a set of band-pass filters with equal relative bandwidth and a regular center-frequency distribution; in effect, the cochlea wavelet-transforms the speech signal, and the output of each band-pass filter activates the corresponding auditory nerve fibers [21]. However, the insulation between implanted electrodes in cochlear implant systems is not as good as that between hair cells, so interactions between electrodes cannot be eliminated.
To mitigate the interactions between electrodes, conventional strategies use interleaved pulse sampling for stimulation, which consequently disrupts the temporal cues and some of the fine structure of the acoustic signal. The CA approach can keep the temporal cues and fine structure of speech, but perception is impaired by the interaction among electrodes. Other strategies that use fixed-rate biphasic pulse modulation also destroy the temporal cues and fine structure of the audio signal [20,22]. By appropriately combining amplitude modulation and frequency modulation, this paper proposes a strategy based on the one-octave wavelet transform and zero-crossings. Because it preserves the temporal cues and some fine structure of the original speech in the stimulus signals, the strategy is expected to enhance speech perception in noise as well as tonal-language recognition.

Figure 1 is a flow diagram of an acoustic simulation of the WZCS strategy. The input audio signal captured by a microphone is first pre-emphasized to boost its high-frequency components. The pre-emphasized signal is then passed through a set of wavelet functions with center frequencies arranged from low to high, implementing a one-octave wavelet transform. The output of each of the N channels is processed through two independent parallel pathways, which extract the amplitude envelope and the zero-crossings (carrying the phase information) of that band. The wavelet function Φ(t) is chosen to implement a one-octave continuous wavelet transform, a linear operation that decomposes the audio signal into components at different scales. At each scale, the amplitude envelope of the wavelet-transformed acoustic signal is extracted by full-wave rectification and low-pass filtering. The cut-off frequency of the low-pass filter determines how much of the slowly varying (rate) information is preserved in the envelope.
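The envelope pathway just described (full-wave rectification followed by low-pass filtering) can be sketched per band as below. This is an illustration under stated assumptions, not the authors' implementation: the paper gives neither the low-pass filter type nor its cut-off, so a first-order IIR smoother with a caller-chosen `cutoff_hz` is assumed, the function name `envelope` is hypothetical, and the 50 Hz cut-off in the usage lines is an arbitrary example value.

```python
import math


def envelope(band, fs, cutoff_hz):
    """Full-wave rectify one wavelet band, then low-pass filter it.

    Assumption: a first-order IIR low-pass y[n] = y[n-1] + a*(|x[n]| - y[n-1]),
    whose cut-off is roughly `cutoff_hz` when cutoff_hz << fs. The paper does
    not state the filter type or cut-off.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)  # smoothing coefficient
    out, y = [], 0.0
    for x in band:
        y += a * (abs(x) - y)  # rectification (abs) followed by smoothing
        out.append(y)
    return out


# Usage: a 1 kHz tone at the paper's 16 kHz sampling rate, with a hypothetical
# 50 Hz cut-off; the last 1000 samples are well past the start-up transient.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
env = envelope(tone, fs, cutoff_hz=50.0)
settled = env[-1000:]
```

Rectification plus smoothing tracks the mean of |x| rather than the peak: for a densely sampled sinusoid of amplitude A that mean is 2A/π, so a constant gain would be needed if peak-referenced envelopes were wanted. A lower cut-off leaves less ripple but follows fast amplitude changes more slowly, which is the trade-off the text attributes to the cut-off frequency.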
In addition, the sampling holder keeps envelope detection and zero-crossing pulse extraction synchronous. Simultaneously, in the other pathway, a threshold determined by the noise characteristics at each scale is subtracted from the one-octave wavelet-transform output of each band to reduce the effect of noise [23]. The sign detector generates a positive pulse when the processed signal crosses the baseline from positive to negative, and a negative pulse when it crosses from negative to positive. The pulse series generated by the sign detector in each band is delayed by one unit of time, and the zero-crossing pulsatile series (an FM representation of the phase information) is obtained by subtracting the undelayed series from the delayed one. The amplifier then sets the amplitude of the zero-crossing pulses to 1. The stimuli are obtained by amplitude-modulating each band's zero-crossings (frequency and phase information) with the band envelope. The stimuli of all bands are then processed by an asynchronous algorithm, a software routine that monitors the pulses of the 8 channels at every instant; if pulses appear in two or more channels at the same time, the algorithm delivers them in sequence so that only one electrode is stimulated at a time. Finally, the differentiators make the pulses biphasic to maintain charge and current balance, and the synthesized speech signal is obtained by summing the stimuli of all sub-bands. Note that the algorithm for generating stimulating pulsatile sequences in WZCS is fundamentally different from that of other strategies.
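The zero-crossing pathway and the later stages described above can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not the authors' code: the threshold is taken to be a constant subtracted from the band (the paper derives it from noise statistics at each scale); the "proper order" of the asynchronous algorithm is assumed to be lowest channel first, with colliding pulses slipped to the next free sample; and the differentiator is a first difference. All function names are hypothetical.

```python
import math


def sign_sequence(band, threshold=0.0):
    """Sign detector: +1 where band - threshold >= 0, else -1.

    Assumption: the noise-dependent threshold is a constant subtracted from the band.
    """
    return [1 if x - threshold >= 0 else -1 for x in band]


def zero_crossing_pulses(band, threshold=0.0):
    """Delayed sign minus undelayed sign, halved (hypothetical helper).

    +1 at positive-to-negative crossings, -1 at negative-to-positive crossings,
    0 elsewhere, i.e. unit amplitude as after the amplifier stage.
    """
    s = sign_sequence(band, threshold)
    delayed = [s[0]] + s[:-1]  # one-sample delay (1/Z); no pulse at n = 0
    return [0.5 * (d - u) for d, u in zip(delayed, s)]


def modulate(pulses, envelope):
    """Amplitude-modulate the unit zero-crossing pulses with the band envelope."""
    return [e * abs(p) for p, e in zip(pulses, envelope)]


def biphasic(stim):
    """First difference as the differentiator: an isolated pulse becomes +A then -A."""
    return [x - (stim[i - 1] if i > 0 else 0.0) for i, x in enumerate(stim)]


def interleave(events):
    """Asynchronous ordering: at most one electrode fires per sample (assumed rule).

    events: (sample_index, channel, amplitude) tuples. Colliding pulses are
    taken lowest channel first; later ones slip to the next free sample.
    """
    occupied, ordered = set(), []
    for t, channel, amp in sorted(events):
        while t in occupied:
            t += 1
        occupied.add(t)
        ordered.append((t, channel, amp))
    return ordered


# Usage: a 100 Hz tone standing in for one wavelet band, with a flat envelope.
fs = 16000
band = [math.sin(2 * math.pi * 100 * n / fs + 0.1) for n in range(fs + 1)]
pulses = zero_crossing_pulses(band)
stim = modulate(pulses, [1.0] * len(band))
biphasic_train = biphasic(stim)
```

For the one-second 100 Hz tone, `zero_crossing_pulses` yields one unit pulse per zero-crossing, alternating +1 and -1, so two pulses per cycle. Because the first difference telescopes, the biphasic train sums to zero whenever the train ends on a non-pulse sample, which is the charge-balance property the differentiator is meant to provide.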
Although the amplitudes of the activating pulses are determined by the envelope in WZCS, CIS, and SMSP alike, the timing order (stimulating rate) of the pulse sequences in AIS, CIS, and SMSP is artificial and fixed, whereas in WZCS it depends on the audio signal itself. Because the zero-crossings of the acoustic signal carry some phase information, the stimulus obtained from WZCS can preserve some fine structure of the acoustic signal.

Many analyzing wavelets can be used for the one-octave wavelet transform, such as the Mexican hat function, the Meyer wavelet, and Gaussian functions. Taking the Meyer wavelet as an example, its temporal waveform is shown in Figure 2(a), and the amplitude-frequency (band-pass) characteristic of its Fourier transform is shown in Figure 2(b). More crucially, the Meyer wavelet is biorthogonal; consequently, it can be configured as a one-octave function according to its band-pass characteristic. This distinguishes WZCS from conventional filter-bank-based audio processing: the output of each channel can be reconstructed completely from the zero-crossings of that output. This is Logan's well-known theorem, which we describe in some detail because it clarifies the mathematical issues involved [24,25].

Let $f(x) \in L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ denotes the Hilbert space of measurable, square-integrable one-dimensional functions, and suppose that the Fourier transform $\hat{f}(\omega)$ of $f(x)$ has its support included in a one-octave interval, i.e., $\hat{f}(\omega) = 0$ unless $\omega_0 \le |\omega| \le 2\omega_0$ for some $\omega_0 > 0$. Logan's theorem states that if $f(x)$ has no zero-crossings in common with its Hilbert transform, then $f(x)$ is uniquely characterized by its zero-crossings, up to a multiplicative constant.

Figure 2. Meyer wavelet and its spectrum. (a) Temporal characteristic of the Meyer wavelet. (b) Amplitude-frequency characteristic of the Meyer wavelet.
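To make concrete the point at the start of this passage, that the WZCS stimulating rate is set by the signal rather than by a fixed clock (the CIS simulation in the Results uses a fixed 900 pps), the sketch below counts zero-crossings per second. That count is the pulse rate a pure zero-crossing coder would produce before any thresholding. The helper names are hypothetical, and pure tones stand in for band outputs.

```python
import math


def zero_crossing_rate(signal, fs):
    """Sign changes per second between consecutive samples (hypothetical helper)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return crossings / ((len(signal) - 1) / fs)


def make_tone(freq_hz, fs=16000, seconds=1, phase=0.1):
    """Sinusoid spanning exactly `seconds`; the phase offset keeps samples off exact zeros."""
    return [math.sin(2 * math.pi * freq_hz * n / fs + phase) for n in range(fs * seconds + 1)]


for f in (100, 1000):
    print(f"{f} Hz tone: {zero_crossing_rate(make_tone(f), 16000):.0f} crossings/s")
```

A sinusoid of frequency f crosses zero twice per cycle, so its crossing rate is 2f per second and rises with the frequency content of the band, whereas a fixed-rate strategy pulses at the same rate regardless of the signal.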
WZCS is a strategy that incorporates both spatial (place) coding and temporal coding from the standpoint of auditory physiology, while also taking mathematics and signal processing into account; it is also a multi-resolution approach in both the time and frequency domains. By comparison, other strategies, such as CIS, can destroy the temporal fine structure of the acoustic signal, leaving only the variation in intensity. The frequency resolution of the cochlea is calculated to be about 30 Hz at 1000 Hz, but after processing by the central auditory nervous system it can reach about 3 Hz. This indicates that the cochlea not only processes the audio signal but also provides excitation pulsatile series that the central auditory nervous system can perceive effectively [26]. Because WZCS includes temporal coding, the audio signal processed by the one-octave wavelet transform can keep its temporal characteristics on the basis of its zero-crossings.

To demonstrate the advantages of WZCS over the CIS and FAME strategies, the characteristics of the stimuli generated by the three strategies were compared. In the computer simulation, the filter bandwidths used in CIS and FAME were the same as those in WZCS: 30-60 Hz, 60-120 Hz, 120-240 Hz, 240-480 Hz, 480-960 Hz, 960-1920 Hz, 1920-3840 Hz, and 3840-7680 Hz. Figure 3 shows a segment of Chinese speech, "da jia hao".

Figure 3. Original speech signal.

We also conducted a hearing test to determine the differences in speech perception among the CIS, FAME, and WZCS strategies in quiet and in noise; in the noisy condition, the SNR of the processed test materials was fixed at 5 dB. Fifteen well-educated subjects with normal hearing were recruited and listened to the test materials through headphones.
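The noisy condition fixes the SNR at 5 dB. The paper does not say how the masker was scaled or what noise was used; a common approach, assumed here, is to scale the noise by its power relative to the speech before adding it. The helper names, the white-noise masker, and the 440 Hz stand-in signal are all hypothetical.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [scale * v for v in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


rng = random.Random(0)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # stand-in utterance
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]  # hypothetical white-noise masker
mixed, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
```

Computing the scale from average power over the whole item, rather than per frame, keeps the SNR fixed per test item as described, although quiet stretches of real speech will locally see a worse SNR.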
Thirty Chinese sentences, 40 Chinese words, and 50 Chinese characters spanning the four tones (level, rising, falling-rising, and falling) were processed by CIS, FAME, and WZCS, respectively, and presented to the subjects to test speech recognition [25,26]. All speech test materials were digitized at a sampling rate of 16 kHz and stored in 16-bit format. All subjects performed the experiment in a sound-attenuated laboratory.

In the tone recognition experiment, a custom graphical user interface created in MATLAB presented the 50 Chinese characters with different tones and collected the responses. Each processed Chinese character was presented in random order, and the subject chose the answer they thought was correct by clicking the button corresponding to the presented tone. When a test condition was finished, the percent-correct score was calculated for further statistical analysis. The synthesized stimuli were presented via headphones (HYUNDAI CJC-860A), and the order of all experimental conditions was randomized for each subject.

In the word and sentence recognition experiments, subjects were presented with words and sentences, respectively, and were instructed to type as many words as possible from each word or sentence on a computer keyboard. The number of correctly recognized words was used to compute the final recognition rate. All words and sentences were likewise presented in random order through the headphones. Before the experiment, subjects practiced with some synthesized speech materials via the headphones for about 5 minutes. In the noisy condition, the SNR of the synthesized speech materials was 5 dB, and all other conditions were the same as in quiet. Guessing was encouraged during testing, and no feedback was given in the tone, word, or sentence recognition tests.

3. Results

The results of the computer simulation for CIS, FAME, and WZCS are shown in Figure 4. Figure 4(a) shows the stimuli of channels 1 to 8 obtained with the CIS, FAME, and WZCS strategies. The envelopes of corresponding channels were almost the same for all three strategies, but the stimulating rates of corresponding channels were very different. Figure 4(b) shows zoomed-in details of Figure 4(a). As shown in Figure 4(b), for the CIS strategy the stimulating rates of all 8 channels were the same, fixed at 900 pps. For FAME, the zero-crossings in each band were frequency-modulated by the corresponding band's center frequency and then band-limited using the slowly varying FM component to generate pulses; thus, the stimulating rate of FAME in each band was limited to about 400 pps [15]. For the WZCS strategy, the stimulating rates ranged from about 50 pps to more than 4000 pps across the 8 channels, as determined by the zero-crossings of the wavelet-transformed speech signal in each sub-band. Because the zero-crossings carry the frequency and phase information of the original speech signal, the stimulus generated by WZCS contained part of the fine structure of the original speech. Figure 4(c) shows the speech signal "da jia hao" synthesized through the CIS, FAME, and WZCS strategies, each obtained by summing the sub-band outputs of that strategy.

Figure 5 shows the spectra of the original signal and of the signals synthesized with the CIS, FAME, and WZCS strategies. The spectra were computed as the power spectral density of the original and synthesized speech signals with an FFT length of 1024. The stimulus synthesized with WZCS is clearly more natural and closer to the original speech signal than that of CIS. The main difference among the three synthesized stimuli lies in their frequency components, as shown in the figure.
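The spectra in Figure 5 were computed with an FFT length of 1024, and the abstract reports that the WZCS output correlates more closely with the original sound. The stdlib-only sketch below shows one way to put a number on spectral closeness. The Hann window, the averaging of non-overlapping segments, and the Pearson correlation of dB spectra are assumptions, not the authors' documented procedure, and the helper names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]


def psd(x, fs, nfft=1024):
    """One-sided PSD averaged over non-overlapping Hann-windowed segments."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / nfft) for n in range(nfft)]
    norm = fs * sum(v * v for v in w)
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, nfft)]
    acc = [0.0] * (nfft // 2 + 1)
    for seg in segs:
        spec = fft([a * b for a, b in zip(seg, w)])
        for k in range(nfft // 2 + 1):
            acc[k] += abs(spec[k]) ** 2
    p = [v / (len(segs) * norm) for v in acc]
    for k in range(1, nfft // 2):  # fold negative frequencies into the one-sided estimate
        p[k] *= 2
    return p


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def to_db(p):
    """Power to decibels; the tiny floor avoids log10(0)."""
    return [10 * math.log10(v + 1e-20) for v in p]


def spectral_similarity(original, synthesized, fs, nfft=1024):
    """Correlation between the dB spectra of two signals (1.0 = identical shape)."""
    return pearson(to_db(psd(original, fs, nfft)), to_db(psd(synthesized, fs, nfft)))
```

Correlating dB spectra weights spectral shape over absolute level, so a synthesized signal that reproduces the main frequency components but at a different gain still scores high; correlating waveforms instead would also reward matching phase.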
The dash-dotted curve in Figure 5 is the spectrum of the stimulus synthesized with WZCS, which closely matches the main frequency components of the original signal's spectrum (solid curve). By contrast, the frequency components of the stimuli for CIS (dashed curve) and FAME (dotted curve) differ widely from those of the original signal.

The experimental results were analyzed using SPSS. Paired t-tests between WZCS and CIS were carried out for the 3 kinds of test materials under each condition, and the same analysis was performed between WZCS and FAME. Table 1(a) shows that WZCS produced significantly better recognition of sentences, words, and tones than the CIS strategy, both in quiet and in noise (p < 0.01). As shown in Figure 6(a), the largest improvements in quiet were about 11 percentage points for sentence and word recognition and about 31 percentage points for tone recognition. Figure 6(b) shows that in noise, normal-hearing subjects scored at least 35 percentage points higher with WZCS than with CIS on every test material. Table 1(b), Figure 6(a), and Figure 6(b) show that normal-hearing subjects achieved high recognition rates with both WZCS and FAME. They performed significantly better with WZCS than with FAME on tone recognition, both in quiet and in noise (p = 0.019 in quiet and p = 0.004 in noise), whereas sentence and word recognition were similar with WZCS and FAME in both conditions (p > 0.05). These results indicate that while current speech coding strategies can help cochlear implant users recognize what is said in quiet, users may have trouble perceiving tones and understanding speech in noise.

4. Discussion

The present study offers strong evidence for the respective contributions of temporal envelope and phase information. Until now, speech coding strategies have presented temporal envelope information to cochlear implant users, whereas most or all phase information or frequency modulation has been discarded. The analog coding strategies (CA or SAS), which deliver the compressed analog outputs of the sub-bands directly, allow some degree of phase or frequency information to be represented, but their effect is limited by interactions among frequency bands and electrodes. The CIS strategy uses a high stimulation rate to pass higher-frequency components of speech through the envelope detector and in this way does present some phase or frequency information, but the critical question is how much of this information cochlear implant users can perceive.

Figure 4. (a) Stimuli of the 8 channels for CIS, FAME, and WZCS; (b) zoomed-in details of the stimuli processed by CIS, FAME, and WZCS; (c) original speech signal and speech signals synthesized through CIS, FAME, and WZCS.

Figure 5. Spectrum of the original speech signal and of the synthesized speech signals.

Figure 6. (a) Recognition of different test materials with different strategies in quiet. (b) Recognition of different test materials with different strategies in noise. (Bar charts of recognition rate (%) for sentences, tones, and words under CIS, FAME, and WZCS.)

Table 1. Results of paired t-tests between WZCS and CIS and between WZCS and FAME, in quiet and in noise.

(a) Paired t-test between WZCS and CIS (p value)
Condition     Sentence   Words   Tone
Quiet         <0.01      <0.01   <0.01
SNR 5 dB      <0.01      <0.01   <0.01

(b) Paired t-test between WZCS and FAME (p value)
Condition     Sentence   Words   Tone
Quiet         0.06       0.093   0.019
SNR 5 dB      0.447      0.109   0.004
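Table 1 lists p values from paired t-tests computed in SPSS. For readers reproducing the analysis from per-subject scores, the paired t statistic itself takes only a few lines; the sketch below (hypothetical helper `paired_t`) returns t and its degrees of freedom. Converting t into a two-sided p value needs the t distribution; in Python, `scipy.stats.ttest_rel(a, b)` performs the whole test.

```python
import math


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-subject scores a[i] vs b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]  # per-subject differences, e.g. WZCS minus CIS
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))  # sample SD of differences
    if sd == 0:
        raise ValueError("all paired differences are equal; t is undefined")
    return mean / (sd / math.sqrt(n)), n - 1
```

Pairing by subject removes between-listener variability from the comparison, which is why a paired rather than an independent-samples test suits a design in which all fifteen subjects heard every strategy.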
While the temporal envelope cues from several frequency bands are sufficient to support speech recognition in quiet, the phase cue is critical for speech recognition in noise, which better reflects realistic listening situations. In the FAME strategy, the carrier likely contains different FM cues for the target and the masker, allowing the target envelope to form a separate stream for better sound segregation [5]. As in the FAME strategy, in the WZCS strategy the temporal envelope amplitude-modulates the phase information instead of a common, fixed carrier, preserving the listener's ability to separate the target speech signal from noise. The proposed algorithm is based mainly on Logan's pioneering work on the information in the zero-crossings of band-pass signals [24]. Some recent studies on fine structure and frequency modulation representation were also taken into account [13,14]. FAME and AIS, two recent and popular algorithms, were developed to obtain the temporal envelope and the frequency modulation or fine structure components of the original speech signal using the Hilbert transform [9,15]. These strategies therefore require many more frequency bands to ensure that each sub-band is narrow; otherwise, in theory, the temporal envelope derived from the Hilbert-transformed signal does not equal that of the corresponding sub-band. The WZCS strategy uses eight one-octave wavelet filters as its band-pass filters, which simplifies implementation. The experiments presented here provide the first results for the WZCS speech coding strategy. Given the significant improvements observed in the hearing tests, this phase modulation strategy may offer a new quality of hearing with cochlear implants.
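The narrow-band requirement for Hilbert-based strategies noted above can be checked numerically. The sketch builds the analytic signal by zeroing negative-frequency DFT bins and compares its magnitude with the true modulating envelope of an AM tone. When the carrier lies well above the modulation frequency the two agree (the condition of Bedrosian's theorem); when the modulation is faster than the carrier they do not. The O(N²) DFT keeps the sketch dependency-free (`numpy.fft` or `scipy.signal.hilbert` would be used in practice), and all names and bin choices are hypothetical.

```python
import cmath
import math


def dft(x, sign=-1):
    """Plain O(N^2) DFT (sign=-1), or the unnormalized inverse kernel (sign=+1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative-frequency bins (len(x) even)."""
    n = len(x)
    spec = dft(x)
    h = [0.0] * n
    h[0] = h[n // 2] = 1.0  # DC and Nyquist bins kept once
    for k in range(1, n // 2):
        h[k] = 2.0  # positive frequencies doubled
    analytic = [v / n for v in dft([s * g for s, g in zip(spec, h)], sign=+1)]
    return [abs(v) for v in analytic]


def am_tone(n_samples, carrier_bin, mod_bin, depth=0.5):
    """(1 + depth*cos(mod)) * cos(carrier) on an integer-bin grid, plus its true envelope."""
    t = [2 * math.pi * i / n_samples for i in range(n_samples)]
    env = [1 + depth * math.cos(mod_bin * ti) for ti in t]
    return [e * math.cos(carrier_bin * ti) for e, ti in zip(env, t)], env


def max_envelope_error(n_samples, carrier_bin, mod_bin):
    """Largest gap between the Hilbert envelope and the true modulating envelope."""
    x, true_env = am_tone(n_samples, carrier_bin, mod_bin)
    return max(abs(a - b) for a, b in zip(hilbert_envelope(x), true_env))


narrow_err = max_envelope_error(256, carrier_bin=32, mod_bin=2)  # carrier well above modulation
wide_err = max_envelope_error(256, carrier_bin=2, mod_bin=6)  # modulation faster than carrier
```

In the narrow case the error is at rounding level; in the wide case the Hilbert envelope departs from the true envelope by several tenths, which is the failure mode that forces Hilbert-based strategies toward many narrow sub-bands.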
Prior studies have shown that continuously varying the presentation frequency, as in the FAME strategy, improves speech recognition over constant-frequency strategies [13,15]. We hypothesized that a similar improvement could be achieved with a discrete presentation of temporal envelope and phase information. Our hearing test results support this hypothesis and indicate that the strategy is feasible. However, WZCS is difficult to apply directly to cochlear implants, because the derived phase information generally varies too widely in range and too rapidly in rate, which may limit how well cochlear implant users can perceive it. While this strategy is very encouraging, there is still a great deal to be learned about electrical stimulation of the auditory nerve, and many questions remain to be answered. Further study is needed before the strategy can be applied to cochlear implants.

5. Conclusions

Outcomes vary widely among cochlear implant users for many reasons, such as the history of their deafness, the implant surgery, and the speech coding strategy. Much of the success of cochlear implants is owed to the improvement of speech coding strategies over the past decades. In this study, we proposed the one-octave wavelet zero-crossings stimulation strategy and discussed this approach to synthesizing stimulating pulsatile series, with particular regard to computer simulation and a hearing test experiment. Under the conditions of Logan's theorem, the temporal dynamic characteristics of an audio signal can be completely reconstructed from its zero-crossings. Through computer simulation and the hearing test, we found that although present strategies with amplitude modulation may be sufficient for speech perception in quiet, they may not work well in noise; the WZCS strategy with zero-crossing modulation (phase information) clearly enhanced speech recognition in noise and tone perception.
Although existing strategies based on amplitude modulation and filter banks provide good speech perception in quiet, previous and recent studies reveal that the usefulness of these amplitude and spectral cues is largely limited to ideal listening conditions [13-15]. The WZCS strategy, which generates stimulating pulsatile series at the zero-crossings in the one-octave wavelet transform domain, could address the problems of channel interaction and noise. With stimulating rates determined by the zero-crossings (phase information) of the audio signal, WZCS can preserve phase information and some fine structure in the stimulus, a characteristic not provided by some conventional strategies such as CIS and SPEAK. In conclusion, the WZCS strategy encodes both zero-crossing modulation and amplitude modulation; this analysis highlights the limitations of current speech coding strategies for cochlear implants and the importance of encoding phase information (zero-crossings) to improve speech recognition in noise and tonal-language speech perception in realistic listening environments.

Acknowledgements

This work was supported financially by the National Natural Science Funds. We thank all the people who participated in the research work. We also appreciate the helpful comments made by several anonymous reviewers on a previous version of this manuscript.

References

[1] F. G. Zeng, “Cochlear implants in China,” Audiology, 34: 61-75, 1995.
[2] N. Waldo, B. Andreas, L. Thomas and E. Bernd, “A psychoacoustic ‘N of M’-type speech coding strategy for cochlear implants,” EURASIP J. Appl. Signal Processing, 18: 3044-3059, 2005.
[3] B. S. Wilson, D. T. Lawson, M. Zerbi, C. C. Finley and R. D. Wolford, “New processing strategies in cochlear implantation,” Am. J. Otol., 16: 669-675, 1995.
[4] B. S. Wilson, C. C. Finley, D. T. Lawson, R. D. Wolford, D. K. Eddington and M. R. William, “Better speech recognition with cochlear implants,” Nature, 352: 236-238, 1991.
[5] K. B. Nie, S. Ginger and F. G. Zeng, “Encoding frequency modulation to improve cochlear implant performance in noise,” IEEE Trans. Biomed. Eng., 52: 64-73, 2005.
[6] B. S. Wilson, C. C. Finley, D. T. Lawson, R. D. Wolford and M. Zerbi, “Design and evaluation of continuous interleaved sampling (CIS) processing strategy for multi-channel cochlear implants,” J. Rehabil. Res. Dev., 30: 110-116, 1993.
[7] F. G. Zeng, “Temporal pitch in electric hearing,” Hear. Res., 174: 101-106, 2002.
[8] K. H. Kim, S. J. Choi, J. H. Kim and D. H. Kim, “An improved speech processing strategy for cochlear implants based on an active nonlinear filterbank model of the biological cochlea,” IEEE Trans. Biomed. Eng., 56: 828-836, 2009.
[9] J. J. Sit, A. M. Simonson, A. J. Oxenham, M. A. Faltys and R. Sarpeshkar, “A low-power asynchronous interleaved sampling algorithm for cochlear implants that encodes envelope and phase information,” IEEE Trans. Biomed. Eng., 54: 138-149, 2007.
[10] C. M. Zierhofer, I. J. Hochmair and E. S. Hochmair, “Electronic design of a cochlear implant for multi-channel high rate pulsatile stimulation strategies,” IEEE Trans. Rehabil. Eng., 3: 112-116, 1995.
[11] H. J. McDermott, A. E. Vandali, R. J. M. Van Hoesel, C. M. McKay, J. M. Harrison and L. T. Cohen, “A portable programmable digital sound processor for cochlear implant research,” IEEE Trans. Rehabil. Eng., 1: 94-100, 2002.
[12] H. J. McDermott, C. M. McKay and A. E. Vandali, “A new portable sound processor for the University of Melbourne Nucleus Limited multi-electrode cochlear implant,” J. Acoust. Soc. Am., 91: 3367-3371, 1992.
[13] K. B. Nie, B. Amy and F. G. Zeng, “Spectral and temporal cues in cochlear implant speech perception,” Ear Hear., 27: 208-217, 2006.
[14] X. Luo, Q. J. Fu, C. G. Wei and K. L. Cao, “Speech recognition and temporal amplitude modulation processing by Mandarin-speaking cochlear implant users,” Ear Hear., 29: 957-970, 2008.
[15] F. G. Zeng, K. B. Nie, S. S. Ginger, Y. Y. Kong, V. Michael, B. Ashish, C. G. Wei and K. L. Cao, “Speech recognition with amplitude and frequency modulations,” Proc. Natl. Acad. Sci. USA, 102: 2293-2298, 2005.
[16] C. A. Miller, N. Hu, F. Zhang, B. K. Robinson and P. J. Abbas, “Changes across time in the temporal responses of auditory nerve fibers stimulated by electric pulse trains,” JARO, 9: 122-137, 2008.
[17] C. M. John, “Auditory cortex phase locking to amplitude-modulated cochlear implant pulse trains,” J. Neurophysiol., 100: 76-91, 2008.
[18] K. Wang and S. A. Shamma, “Auditory analysis of spectro-temporal information in acoustic signal,” IEEE Eng. Med. Biol. Mag., 14: 186-194, 1995.
[19] P. J. Blamey, R. C. Dowell and G. M. Clark, “Acoustic parameters measured by a formant-estimating speech processor for a multiple-channel cochlear implant,” J. Acoust. Soc. Am., 82: 38-47, 1987.
[20] F. G. Zeng, S. Rebscher, W. Harrison, X. A. Sun and H. H. Feng, “Cochlear implants: system design, integration and evaluation,” IEEE Rev. Biomed. Eng., 1: 115-142, 2008.
[21] C. N. Jolly, F. A. Spelman and B. M. Clopton, “Quadrupolar stimulation for cochlear prostheses: modeling and experimental data,” IEEE Trans. Biomed. Eng., 43: 857-865, 1996.
[22] D. Marr (Ed.), Vision, New York: W. H. Freeman and Company, 1982.
[23] H. Q. Wang, Q. Y. Zhang and J. B. Xue, “Research of speech de-noising method based on multi-resolution of wavelet transform,” Comput. Eng. Des., 27: 235-237, 2006.
[24] B. Logan, “Information in the zero-crossings of band pass signals,” Bell Syst. Tech. J., 56: 487-510, 1977.
[25] G. E. Loeb, C. L. Byers, S. J. Rebscher, D. E. Casey, M. M. Fong, R. A. Schindler, R. F. Gray and M. M. Merzenich, “Design and fabrication of experimental cochlear prosthesis,” Med. Biol. Eng. Comput., 21: 241-254, 1983.
[26] L. Xu and E. P. Bryan, “Spectral and temporal cues for speech recognition: implications for auditory prostheses,” Hear. Res., 242: 132-140, 2008.