Surpassing Human-Level Face Verification Performance on LFW with GaussianFace (Supplementary Material)
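Chaochao Lu, Xiaoou Tang
Department of Information Engineering, The Chinese University of Hong Kong
{lc013, xtang}@ie.cuhk.edu.hk

Optimization

As discussed in the main text, learning the GaussianFace model amounts to minimizing the following objective,
\[
\mathcal{L}_{Model} = -\log p(Z_T, \theta \mid X_T) - \beta M. \tag{1}
\]
For the model optimization, we first expand Equation (1) to obtain the following equation (ignoring the constant items):
\[
\mathcal{L}_{Model} = -\log P_T + \beta P_T \log P_T + \frac{\beta}{S} \sum_{i=1}^{S} \left( P_{T,i} \log P_i - P_{T,i} \log P_{T,i} \right), \tag{2}
\]
where $P_i = p(Z_i, \theta \mid X_i)$ and $P_{i,j}$ means that its corresponding covariance function is computed on both $X_i$ and $X_j$. To obtain the optimal $\theta$ and $Z$, we need to optimize Equation (2) with respect to $\theta$ and $Z$, respectively. We first present the derivatives with respect to the hyper-parameters $\theta$. Differentiating Equation (2) term by term gives
\[
\frac{\partial \mathcal{L}_{Model}}{\partial \theta_j}
= \left( \beta (\log P_T + 1) - \frac{1}{P_T} \right) \frac{\partial P_T}{\partial \theta_j}
+ \frac{\beta}{S} \sum_{i=1}^{S} \frac{P_{T,i}}{P_i} \cdot \frac{\partial P_i}{\partial \theta_j}
+ \frac{\beta}{S} \sum_{i=1}^{S} \left( \log P_i - \log P_{T,i} - 1 \right) \frac{\partial P_{T,i}}{\partial \theta_j}.
\]
The above equation depends on the form of $\partial P_i / \partial \theta_j$, which is as follows:
\[
\frac{\partial P_i}{\partial \theta_j}
= P_i \frac{\partial \log P_i}{\partial \theta_j}
\approx P_i \left( \frac{\partial \log p(X_i \mid Z_i, \theta)}{\partial \theta_j}
+ \frac{\partial \log p(Z_i)}{\partial \theta_j}
+ \frac{\partial \log p(\theta)}{\partial \theta_j} \right).
\]
The above three terms can be easily obtained (ignoring the constant items) by
\[
\begin{aligned}
\frac{\partial \log p(X_i \mid Z_i, \theta)}{\partial \theta_j}
&\approx -\frac{D}{2} \mathrm{Tr}\left( K^{-1} \frac{\partial K}{\partial \theta_j} \right)
+ \frac{1}{2} \mathrm{Tr}\left( K^{-1} X_i X_i^\top K^{-1} \frac{\partial K}{\partial \theta_j} \right), \\
\frac{\partial \log p(Z_i)}{\partial \theta_j}
&\approx -\frac{1}{\sigma^2} \frac{\partial J_i^*}{\partial \theta_j} \\
&= -\frac{1}{\lambda \sigma^2} \left( a^\top \frac{\partial K}{\partial \theta_j} a
- a^\top \frac{\partial K}{\partial \theta_j} \tilde{A} K a
+ a^\top K \tilde{A} \frac{\partial K}{\partial \theta_j} \tilde{A} K a
- a^\top K \tilde{A} \frac{\partial K}{\partial \theta_j} a \right), \\
\frac{\partial \log p(\theta)}{\partial \theta_j}
&= \frac{1}{\theta_j},
\end{aligned}
\]
where $\tilde{A} = A (\lambda I_n + A K A)^{-1} A$. The derivatives with respect to the latent space $Z$ can be obtained similarly.

GaussianFace Inference for Face Verification

In the section GaussianFace Model for Face Verification, given a test feature vector $x_*$, we need to estimate its latent representation $z_*$. In this paper, we apply the same inference method as (Urtasun and Darrell 2007). For convenience, a brief review is given here.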
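As a rough illustration of this inference step, the Python sketch below evaluates the objective of Equation (3) for a candidate $z_*$ and picks the best candidate on a grid. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the RBF-plus-white-noise kernel, the three-point toy data set, the helper names (`rbf`, `solve`, `predict`, `inference_objective`), and the grid search, which merely stands in for whatever optimizer is actually used.

```python
import math

def rbf(z1, z2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel (an assumed stand-in for k_theta)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def predict(z_star, Z, X_centered, mean, noise=1e-2):
    """Return mu(z*) and sigma^2(z*) as defined alongside Equation (3)."""
    n = len(Z)
    # Training covariance K, with assumed white noise on the diagonal.
    K = [[rbf(Z[i], Z[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(z_star, z) for z in Z]        # K_*
    alpha = solve(K, k_star)                    # K^{-1} K_*
    mu = [m + sum(X_centered[i][d] * alpha[i] for i in range(n))
          for d, m in enumerate(mean)]          # mu + X^T K^{-1} K_*
    k_ss = rbf(z_star, z_star) + noise          # K_** = k(z*, z*)
    var = k_ss - sum(k * a for k, a in zip(k_star, alpha))
    return mu, var

def inference_objective(z_star, x_star, Z, X_centered, mean):
    """Data-fit term + (D/2) ln sigma^2 + (1/2) ||z*||^2."""
    mu, var = predict(z_star, Z, X_centered, mean)
    D = len(x_star)
    sq_err = sum((x - m) ** 2 for x, m in zip(x_star, mu))
    return (sq_err / (2.0 * var) + 0.5 * D * math.log(var)
            + 0.5 * sum(z * z for z in z_star))

# Hypothetical toy data: 1-D latent points and 2-D features (z, z^2).
Z = [[-1.0], [0.0], [1.0]]
Y = [[-1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
mean = [sum(col) / len(Y) for col in zip(*Y)]
X_centered = [[y - m for y, m in zip(row, mean)] for row in Y]

# Test feature equal to the third training feature; crude grid search on z*.
x_star = [1.0, 1.0]
grid = [[-2.0 + 0.05 * i] for i in range(81)]
z_best = min(grid, key=lambda z: inference_objective(z, x_star, Z, X_centered, mean))
```

In this toy setting the selected `z_best` lies close to the latent point whose feature matches `x_star`, because the data-fit term dominates wherever the predicted mean is far from the test feature. The features are mean-centered before use, which is an assumption about how $X$ and $\mu$ relate in the expression for $\mu(z_*)$.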
Given $x_*$, $z_*$ can be obtained by minimizing
\[
\mathcal{L}_{Inf} = \frac{\| x_* - \mu(z_*) \|^2}{2 \sigma^2(z_*)} + \frac{D}{2} \ln \sigma^2(z_*) + \frac{1}{2} \| z_* \|^2, \tag{3}
\]
where
\[
\mu(z_*) = \mu + X^\top K^{-1} K_*, \qquad
\sigma^2(z_*) = K_{**} - K_*^\top K^{-1} K_*,
\]
$K_*$ is the vector with elements $k_\theta(z_*, z_i)$ for all other latent points $z_i$ in the model, and $K_{**} = k_\theta(z_*, z_*)$.

Further Validations: Shuffling the Source-Target

To further validate our model, we also treat Multi-PIE and MORPH in turn as the target-domain dataset and the others as the source-domain datasets. The target-domain dataset is split into two mutually exclusive parts: one, consisting of 20,000 matched pairs and 20,000 mismatched pairs, is used for training; the other is used for testing. In the test set, similar to the protocol of LFW, we select 10 mutually exclusive subsets, where each subset consists of 300 matched pairs and 300 mismatched pairs. The experimental results are presented in Figure 1. Each time one dataset is added to the training set, the performance improves, even though the types of data in the training set are very different.

[Figure 1 plots: Accuracy versus The Number of SD (0 to 4) in each panel, with curves for GaussianFace-BC, GaussianFace-FE, and GaussianFace-FE + GaussianFace-BC.]
Figure 1: (a) The accuracy rate (%) of the GaussianFace model on Multi-PIE. (b) The accuracy rate (%) of the GaussianFace model on MORPH.

References

Urtasun, R., and Darrell, T. 2007. Discriminative Gaussian process latent variable model for classification. In ICML.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.