ELEC-C5340 Fundamentals of Spatial Hearing and Binaural Technology

Transcription

ELEC-C5340
Fundamentals of Spatial Hearing and Binaural Technology
Ville Pulkki
Department of Signal Processing and Acoustics
Aalto University, Finland
October 12, 2015
This lecture
How humans hear the spatial properties of sound
The changes the head imposes on incoming sound (HRTF / HRIR)
Binaural techniques for headphones
Introduction
Where and what?
Localization of sources
Beam-forming towards different directions
[Figure: a listener in a scene with many simultaneous sound sources: bird calls, a duck, a horn, wind, hoofbeats, and rustling]
Eye vs ear
Eye: the lens projects light onto the retina, which has a great number of light-sensitive cells; the retina is thus per se sensitive to direction.
Eye: very high spatial accuracy, limited range of wavelengths (400–800 nm)
Ear: the cochlea is sensitive to a vast range of wavelengths (2 cm – 20 m)
Ear: the cochlea has no direct sensitivity to the direction of sound
All human directional hearing is based on signal analysis in the brain
This chapter
Basic concepts
Head-related acoustics
Localization cues
Localization accuracy
Perception of direction in the presence of echoes
Ability to listen selectively to a specific direction
Distance perception
Coordinate system
Azimuth-elevation / median plane
[Figure: coordinate system: azimuth ϕ is measured in the horizontal plane from the front (x axis), elevation δ from the horizontal plane, and r is the distance; the median, frontal, and horizontal planes are shown]
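To make the convention concrete, here is a minimal sketch of the (ϕ, δ, r) to (x, y, z) conversion. The axis orientation (x to the front, y to the left, z up, azimuth increasing toward the left) is an assumption inferred from the figure, not stated on the slide.

```python
import numpy as np

def sph_to_cart(azi_deg, ele_deg, r=1.0):
    """Convert azimuth phi, elevation delta (degrees) and distance r
    to Cartesian coordinates. Assumed axes: x front, y left, z up."""
    azi = np.deg2rad(azi_deg)
    ele = np.deg2rad(ele_deg)
    x = r * np.cos(ele) * np.cos(azi)   # toward the front
    y = r * np.cos(ele) * np.sin(azi)   # toward the left
    z = r * np.sin(ele)                 # up
    return x, y, z

print(sph_to_cart(90, 0))   # source directly to the left: (~0, 1, 0)
```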
Coordinate system 2
Cone of confusion
[Figure: a cone of confusion at distance r, parameterized by the cone-of-confusion angles ϕcc and δcc]
Basic concepts
Listening conditions
Monaural hearing (hearing processes and cues that need the signal from only one ear)
Binaural hearing (differences between the ear signals also have an effect)
Spatial sound reproduction methods
Monotic (signal fed to one ear only)
Diotic (same signal fed to both ears)
Dichotic (different signals fed to the two ears)
Dichotic listening with headphones
[Figure: dichotic listening over headphones: a) the signal fed through per-ear delays τph1 and τph2; b) through per-ear attenuators a1 and a2]
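A minimal sketch of the delay/attenuator setup in the figure, turning a mono signal into a dichotic headphone pair; the function name and parameters are illustrative, not from the slide.

```python
import numpy as np

def dichotic(x, delay_samples=0, ild_db=0.0):
    """Mono signal in, dichotic stereo out: the right channel is delayed
    by `delay_samples` and attenuated by `ild_db` dB relative to the left."""
    pad = np.zeros(delay_samples)
    left = np.concatenate([x, pad])                       # pad to equal length
    right = np.concatenate([pad, x]) * 10.0 ** (-ild_db / 20.0)
    return np.stack([left, right], axis=-1)
```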
Head-related acoustics
Let us first consider only the free field
How does incoming sound change as it hits the listener?
What is the transfer function from source to ear canal?
[Figure: source signal X reaching the listener's left ear as Yleft]
$$Y_\mathrm{left} = H_\mathrm{left}\,X \qquad \Leftrightarrow \qquad y_\mathrm{left}(t) = h_\mathrm{left}(t) * x(t)$$
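A minimal sketch of the relation above, assuming the source signal x and a measured left-ear HRIR are available as NumPy arrays; a toy stand-in HRIR is used here instead of a measured one.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
x = np.random.randn(fs)                        # 1 s of noise as the source

# Toy stand-in for a measured left-ear HRIR: onset delay plus decaying tail.
h_left = np.zeros(128)
h_left[20:] = np.random.randn(108) * np.exp(-np.arange(108) / 10.0)

# Time-domain convolution y_left(t) = h_left(t) * x(t) ...
y_left = fftconvolve(x, h_left)

# ... equals multiplication Y_left = H_left X in the frequency domain.
n = len(x) + len(h_left) - 1
Y_left = np.fft.rfft(h_left, n) * np.fft.rfft(x, n)
assert np.allclose(np.fft.irfft(Y_left, n), y_left)
```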
Head-related acoustics measurements
Transfer function from the source to the ear canal
Place a microphone in the ear canal, or use a dummy head
Head-related transfer function (HRTF), head-related impulse response (HRIR)
Depends heavily on the direction of the source
Head-related transfer function (HRTF)
Measured with a real subject for three directions
[Figure: HRTF magnitude responses (0 to −40 dB, 10² – 10⁴ Hz) of the left and right ears for a) ϕ = 0°, δ = 0°; b) ϕ = 60°, δ = 0°; c) ϕ = 0°, δ = 60°]
Head-related impulse response (HRIR)
The HRIR is the HRTF in the time domain; quite often HRIRs are also called HRTFs
[Figure: HRIRs of the left and right ears (amplitude ±0.2, 0–2 ms) for a) ϕ = 0°, δ = 0°; b) ϕ = 60°, δ = 0°; c) ϕ = 0°, δ = 60°]
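Since the HRIR and HRTF are a Fourier pair, moving between them is a single transform. A minimal sketch, with a toy impulse response standing in for a measured one:

```python
import numpy as np

fs = 48000
hrir = np.zeros(128)
hrir[20] = 1.0                                  # toy stand-in for a measured HRIR

hrtf = np.fft.rfft(hrir)                        # complex frequency response
freqs = np.fft.rfftfreq(len(hrir), d=1 / fs)    # frequency axis in Hz
mag_db = 20 * np.log10(np.abs(hrtf) + 1e-12)    # magnitude as in the plots above
```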
HRTF in horizontal plane
[Figure: left-ear HRTF magnitude in the horizontal plane (0 to −25 dB) as a function of azimuth (−179° to 153°) and frequency (0.1–12.8 kHz)]
HRIR in horizontal plane
[Figure: left-ear HRIR in the horizontal plane as a function of azimuth (−179° to 153°) and time (0–2 ms)]
HRTF in median plane
[Figure: left-ear HRTF magnitude in the median plane (0 to −25 dB) as a function of cone-of-confusion elevation (−45° to 225°) and frequency (0.4–12.8 kHz)]
HRIR in median plane
[Figure: left-ear HRIR in the median plane as a function of cone-of-confusion elevation (−45° to 225°) and time (0–3.5 ms)]
Localization cues
HRTFs impose spatial information on the ear canal signals
What information does the auditory system dig out from them?
Binaural localization cues
Interaural time difference, ITD
Interaural level difference, ILD
Monaural localization cues: spectral cues
Dynamic cues
Interaural time difference (ITD)
Sound arrives slightly earlier at one ear than at the other
ITD varies between −0.7 ms and +0.7 ms
The JND of ITD is of the order of 20 µs
The extra path length ΔS traveled to the farther ear gives
$$\mathrm{ITD} = \frac{\Delta S}{c} = \frac{D}{2c}\left(\varphi + \sin\varphi\right),$$
where D is the diameter of the head and c is the speed of sound.
[Figure: ITD (0 to 600 μs) as a function of azimuth ϕ (0° to 90°): values computed from the equation compared with measured values]
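A small numeric sketch of the formula above, assuming a head diameter of D = 0.18 m and c = 343 m/s (typical textbook values, not given on the slide):

```python
import numpy as np

def itd(azimuth_deg, D=0.18, c=343.0):
    """ITD in seconds from the ray-path model ITD = D/(2c) * (phi + sin phi)."""
    phi = np.deg2rad(azimuth_deg)
    return D / (2 * c) * (phi + np.sin(phi))

print(itd(90.0) * 1e6)   # ~675 us, consistent with the ~0.7 ms maximum above
```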
Interaural time difference (ITD)
We are sensitive to
phase differences between the carriers at low frequencies (≈200–1600 Hz)
time differences between the envelopes at higher frequencies (> ≈1600 Hz)
[Figure: left- and right-ear waveforms; at low frequencies the ITD is measured as the time/phase delay between the carriers, at high frequencies as the time delay between the envelopes]
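As a rough numeric illustration of the high-frequency regime, the envelope delay can be estimated by cross-correlating Hilbert envelopes. A sketch with synthetic amplitude-modulated tones; all parameter values are illustrative.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(int(0.2 * fs)) / fs                 # 0.2 s test signals
itd = 5e-4                                        # true interaural delay: 0.5 ms

# 4 kHz carrier with 50 Hz amplitude modulation; right-ear signal lags by itd.
left = (1 + 0.8 * np.sin(2 * np.pi * 50 * t)) * np.sin(2 * np.pi * 4000 * t)
right = (1 + 0.8 * np.sin(2 * np.pi * 50 * (t - itd))) * np.sin(2 * np.pi * 4000 * (t - itd))

# Envelope ITD: peak of the cross-correlation of the Hilbert envelopes.
env_l = np.abs(hilbert(left)) - 1.0               # subtract the DC offset
env_r = np.abs(hilbert(right)) - 1.0
lag = np.argmax(np.correlate(env_r, env_l, mode="full")) - (len(t) - 1)
print(lag / fs * 1e3, "ms")                       # ~0.5 ms
```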
Interaural time difference
ITD of a noise source in free field, computed with an auditory model from HRTFs measured on a human
[Figure: ITD (−1 to +1 ms) as a function of azimuth (−90° to 90°) and frequency (0.2–18.2 kHz)]
Interaural level difference (ILD)
The head shadows the incoming sound
The effect depends on
wavelength: no shadowing at low frequencies
distance: larger ILD at very short distances (< 1 m)
ILD varies between −20 dB and +20 dB
The JND is of the order of 1 dB for sources near the median plane
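A simplified signal-level ILD can be read straight from an HRTF pair as the magnitude ratio in dB (the plot on the next slide instead shows a phon-scale value from an auditory model). A sketch assuming the HRIRs are NumPy arrays:

```python
import numpy as np

def ild_db(hrir_left, hrir_right, n_fft=512):
    """Frequency-dependent ILD in dB (left relative to right) from an HRIR pair."""
    hl = np.abs(np.fft.rfft(hrir_left, n_fft))
    hr = np.abs(np.fft.rfft(hrir_right, n_fft))
    return 20 * np.log10((hl + 1e-12) / (hr + 1e-12))
```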
Interaural level difference
ILD of a sound source in free field, computed with an auditory model from HRTFs measured on a human
[Figure: ILD (−30 to +30 phon) as a function of azimuth (−90° to 90°) and frequency (0.2–18.2 kHz)]
Basic lateralization results
Subjects report the position of the internalized auditory source along the interaural axis
[Figure: perceived lateralization reported on a left–right scale: a) as a function of interaural level difference, −12 to +12 dB, applied with attenuators (Sayers, 1964); b) as a function of interaural delay τph, −1500 to +1500 μs, applied with delays (Toole and Sayers, 1965)]
Additional cues
Binaural cues (ITD and ILD) in principle resolve only the cone of confusion on which the source lies
Other cues are used to resolve the direction within the cone
monaural spectral cues
dynamic cues (the effect of head movements on the cues)
[Figure: a sound source lying on a cone of confusion, defined by the angle θcc]
Monaural spectral cues
Effect of pinna diffraction
Effect of reflections and diffraction from the torso
Monaural spectral cues
[Figure: median-plane HRTF magnitudes (0 to −20 dB) of Subjects 012, 017, 018, and 019 as functions of cone-of-confusion elevation (−45° to 225°) and frequency (0.8–12.8 kHz)]
Monaural spectral cues
The source has to have a relatively flat spectrum
A relatively reliable cue for most natural sounds
Narrow-band sounds are hard to localize (a mosquito in a dim room)
Humans adapt relatively fast to new HRTFs, especially if visual cues are available
The ears also grow throughout life, so adaptation is constant
Dynamic cues
When the head rotates, the ITD, the ILD, and the spectral cues all change
A powerful though coarse cue
Efficiently resolves front/back/inside-the-head locations, but not fine details
[Figure: effect of head rotation on the cues: for one source direction the ITD and ILD stay constant, for another they change a lot, and for a third they change a lot in the opposite direction]
Interaction between spatial hearing and vision
Both senses try to find out "where" the source is
If the visual cue is clear, vision dominates
Ventriloquism: within about a 30° area, the visual image captures the auditory image
If the visual cue is blurry or not prominent, the auditory cue wins
In some cases two events occur: a visual event and an auditory event
Accuracy of localization
How well do we localize point-like sources?
What is the accuracy of directional hearing?
Azimuth / elevation / 3D
Accuracy of perceiving the spatial distribution of wide sound sources
Binaural techniques
The ear canal signals are the main input to hearing
Why not replicate just them?
Recording/reproduction/synthesis of ear canal signals
Challenges: dynamic cues (head movements), tactile perception
Binaural recording, headphone playback
careful microphone and headphone equalization
binaural cues and the auditory spectrum are reproduced as they were in the recording
in some cases this is an appealing solution
Applications: personalized recording, academic use, noise measurements, augmented reality audio
Binaural recording
Challenges
headphone equalization is problematic
listener head movements do not change the binaural cues, causing inside-head localization
front-to-back confusions
vision conflicts with audition
works best only with recordings made with your own head
Binaural synthesis, headphones
monophonic sound tracks are convolved with measured [individual] HRTFs (see the sketch below)
auditory objects can be positioned in 3D virtual space
problems: inside-head localization, front-back confusions, need for individual HRTFs
head tracking may be used to resolve this
applications: virtual reality, gaming, aviation, playback of surround audio content over multiple virtual loudspeakers
[Figure: binaural synthesis illustration (© Bill Gardner)]
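A minimal sketch of the convolution step referenced above, assuming a mono track and the left/right HRIR pair for the desired direction are available as NumPy arrays:

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Position a mono source by convolving it with an HRIR pair;
    returns an (N, 2) stereo array for headphone playback."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)
```

A moving source would additionally require selecting or interpolating HRIR pairs per direction and cross-fading between them.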
Practical implementation of HRTF synthesis
HRIRs are relatively long FIR filters (around 100–150 taps)
Computational constraints
HRIRs have leading zeros, which can be replaced with a simple delay
How to find the number of leading zeros? (sketches follow this list)
Thresholding
Cross-correlation between the HRIRs, which gives the time delay
How to implement the part of the HRIR that follows the zeros?
A minimum-phase version of the remaining FIR filter (with fewer taps)
An IIR filter with few taps
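Sketches of the two onset-estimation approaches listed above (thresholding, and cross-correlation between the two ears' HRIRs); the function names and the threshold value are illustrative.

```python
import numpy as np

def onset_by_threshold(hrir, threshold_db=-20.0):
    """Index of the first sample exceeding a threshold set relative to the
    peak; the samples before it can be implemented as a plain delay line."""
    thr = np.max(np.abs(hrir)) * 10.0 ** (threshold_db / 20.0)
    return int(np.argmax(np.abs(hrir) > thr))

def delay_by_xcorr(hrir_left, hrir_right):
    """Interaural lag in samples at the cross-correlation maximum."""
    xc = np.correlate(hrir_left, hrir_right, mode="full")
    return int(np.argmax(np.abs(xc))) - (len(hrir_right) - 1)

# Split one HRIR into a delay plus a shortened FIR:
# n0 = onset_by_threshold(h);  delay = n0;  fir = h[n0:]
```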