Short Update on Computing Complex Symmetric Eigenvalue
Short Update on Computing Complex Symmetric Eigenvalue Problem Derived from KKM Reaction Theory - A Simplified Interface for Parallel Processing
G. Arbanas, C. Bertulani, A. Kerman, K. Roche, K. Ushkala

First-order effort: map the complex problem to a numerically real problem.
Err = Cx-cx

Max backward error, direct factorization:

        Symmetric random complex matrix      Random complex matrix
  N     double/complex    single/real        double/complex    single/real
  64    7.35E-14          9.50E-07           2.52E-14          3.25E-07
  128   5.22E-14          5.19E-07           8.41E-14          1.36E-06
  256   1.15E-13          7.67E-07           1.88E-13          1.03E-06
  512   7.13E-13          2.13E-06           5.80E-13          2.60E-06
  1024  2.65E-13          1.93E-06           3.75E-13          2.16E-06

double/complex = double precision, direct complex factorization; single/real = single precision, direct real factorization.

The table checks the feasibility of the transformation; be careful drawing conclusions from the accuracy data. The study holds the memory demand fixed and shows the loss in accuracy from going to single precision together with the transformation from complex to real arithmetic:

(double) Complex variant: n*n*16 bytes
(double) Real variant: 4*n*n*8 bytes
(single) Real variant: 4*n*n*4 bytes

The single-precision real variant thus needs the same 16n^2 bytes as the double-precision complex matrix, while the double-precision real variant would need twice as much.

Try a Dense Direct Method
This method is not generally stable.
Block Cyclic Decomposition of Natural Data
  kfil_2dbc_rd(); kstr_2dbc_rd();
  kfil_2dbc_wr(); kstr_2dbc_wr();
• need a method for generating a large data set
• trying hard to create techniques that keep the user out of the decision-making process for parallelization
• want to leverage existing and useful software: ScaLAPACK, LAPACK, PBLAS, BLACS, BLAS

  int csyeig( MPI_Comm commw, int n, double complex *a, double complex *z, double complex *c );

  void test_cfnc( double complex *a, int m, int n, int seed,
                  void (*gen_fnc)( double complex *, int, int, int ) )
  {
      (*gen_fnc)( a, m, n, seed );
  }

MPI Tests (more testing needed)
• against KKM.f (n = 512)
• against zgeev() for general complex systems (n = 8192)
• self-consistent tests, |AZ - ZD| (n = 65536)
• against Toeplitz form (n = 32768)

Example XT5 Run

csym-mat-gen()  n = 32768
  PAPI_TOT_INS : Tot[ 111865019536195 ]  Rt[ 109248056535 ]
  PAPI_FP_INS  : Tot[ 2199627218944 ]  Rt[ 2148089464 ]
  PAPI_L2_DCM  : Tot[ 321977754 ]  Rt[ 263514 ]
  PAPI_real_cyc = 157816348692   PAPI_real_usec = 68615804
  PAPI_user_cyc = 157826000000   PAPI_user_usec = 68620000

csyeig()  n = 32768
  PAPI_TOT_INS : Tot[ 7502114505715511 ]  Rt[ 6706228558246 ]
  PAPI_FP_INS  : Tot[ 967948054098509 ]  Rt[ 1332067487703 ]
  PAPI_L2_DCM  : Tot[ 3181343933369 ]  Rt[ 2894696203 ]
  PAPI_real_cyc = 6407554186034   PAPI_real_usec = 2785893130
  PAPI_user_cyc = 6406765000000   PAPI_user_usec = 2785550000

Err|AZ-DZ|  n = 32768
  PAPI_TOT_INS : Tot[ 523619868353838 ]  Rt[ 519582386553 ]
  PAPI_FP_INS  : Tot[ 584968040019504 ]  Rt[ 589249121346 ]
  PAPI_L2_DCM  : Tot[ 306773490444 ]  Rt[ 287988286 ]
  PAPI_real_cyc = 219179680380   PAPI_real_usec = 95295514
  PAPI_user_cyc = 219121000000   PAPI_user_usec = 95270000

|AZ-ZD{a}|_inf ~ 4.05821e-10   [thy_err=3.68073e-07]
real 5h49m11.314s

herm-mat-gen()  n = 16384
  PAPI_TOT_INS : Tot[ 7034091200126 ]  Rt[ 27481087715 ]
  PAPI_FP_INS  : Tot[ 137581551616 ]  Rt[ 537443960 ]
  PAPI_L2_DCM  : Tot[ 87160426 ]  Rt[ 259682 ]
  PAPI_real_cyc = 39183316211   PAPI_real_usec = 17036224
  PAPI_user_cyc = 39192000000   PAPI_user_usec = 17040000

pzheev_()  n = 16384
  PAPI_TOT_INS : Tot[ 430200433828620 ]  Rt[ 1667297054044 ]
  PAPI_FP_INS  : Tot[ 103533019765351 ]  Rt[ 428900801349 ]
  PAPI_L2_DCM  : Tot[ 195552827152 ]  Rt[ 697091250 ]
  PAPI_real_cyc = 1598982738786   PAPI_real_usec = 695209893
  PAPI_user_cyc = 1598500000000   PAPI_user_usec = 695000000

Err|AZ-DZ|  n = 16384
  PAPI_TOT_INS : Tot[ 65548973697866 ]  Rt[ 260164538213 ]
  PAPI_FP_INS  : Tot[ 73124481940711 ]  Rt[ 294638569624 ]
  PAPI_L2_DCM  : Tot[ 37741819129 ]  Rt[ 144649371 ]
  PAPI_real_cyc = 109914371587   PAPI_real_usec = 47788858
  PAPI_user_cyc = 109848000000   PAPI_user_usec = 47760000

|AZ-ZD{a}|_inf ~ 1.46099e-10   [thy_err=9.22587e-08]
real 12m41.374s   user 0m0.180s   sys 0m0.112s

Playing with Hermitian Problems too

Summary and Plan
• parallel complex symmetric diagonalization routine
• software that removes the user from the process of parallelizing their dense numerical problem
• more testing necessary in the resource allocation / selection process
• pkkm.c needs to be tested: right now a FILE-based version exists and is being tested; the in-core variant is easier
• I/O routines
• more numerical testing
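The kfil_2dbc_* and kstr_2dbc_* routines on the block-cyclic slide are the authors' own, and their internals are not shown. Any such routine has to do the index arithmetic of the ScaLAPACK 2D block-cyclic layout, so here is a sketch of that arithmetic under stated assumptions: 0-based indices, the first block on process 0 (ScaLAPACK's usual source process), and column-major local storage. The function names are hypothetical; bc_local_count reproduces the logic of ScaLAPACK's NUMROC.

```c
#include <assert.h>

/* Global index g -> owning process and local index, for a dimension split
 * into blocks of nb spread round-robin over nprocs processes. */
void bc_global_to_local(int g, int nb, int nprocs, int *owner, int *local)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    *owner = (g / nb) % nprocs;
    *local = (g / (nb * nprocs)) * nb + g % nb;
}

/* Inverse map: local index l on process p -> global index. */
int bc_local_to_global(int l, int p, int nb, int nprocs)
{
    return ((l / nb) * nprocs + p) * nb + l % nb;
}

/* Number of indices of a length-n dimension held by process p
 * (same arithmetic as ScaLAPACK's NUMROC with source process 0). */
int bc_local_count(int n, int nb, int p, int nprocs)
{
    int nblocks = n / nb;
    int count = (nblocks / nprocs) * nb;   /* complete rounds of blocks */
    int extra = nblocks % nprocs;          /* leftover whole blocks */
    if (p < extra)
        count += nb;                       /* one more whole block */
    else if (p == extra)
        count += n % nb;                   /* the trailing partial block */
    return count;
}

/* Where global entry (i, j) of an m-row matrix lives on an nprow x npcol
 * process grid with mb x nb blocks: grid coordinates plus the offset into
 * that process's column-major local array. */
void bc_locate(int i, int j, int m, int mb, int nb, int nprow, int npcol,
               int *prow, int *pcol, int *offset)
{
    int li, lj;
    bc_global_to_local(i, mb, nprow, prow, &li);
    bc_global_to_local(j, nb, npcol, pcol, &lj);
    int lld = bc_local_count(m, mb, *prow, nprow);  /* local leading dimension */
    *offset = li + lj * lld;
}
```

With this mapping, a reader or writer can stream "natural" (globally ordered) data and drop each entry directly into the right local array, which is the kind of bookkeeping the slides aim to keep away from the user.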
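test_cfnc() takes the matrix generator as a function pointer, which is presumably how generators such as csym-mat-gen() and herm-mat-gen() are swapped in. The sketch below pairs test_cfnc() as written on the slide with a hypothetical csym_random_gen() that has the same gen_fnc signature and fills a reproducible random complex symmetric matrix (A^T = A, no conjugation) from a seed. The splitmix64 generator is an assumption, chosen so that a seed always yields the same matrix independent of the C library's rand(); the authors' generator is not shown.

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>

/* splitmix64: small deterministic PRNG step. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [-1, 1) built from the top 53 bits. */
static double uniform_pm1(uint64_t *state)
{
    return (double)(splitmix64(state) >> 11) * 0x1.0p-53 * 2.0 - 1.0;
}

/* Hypothetical generator with the gen_fnc signature: fills an m x n
 * column-major array with a random complex symmetric matrix (m == n). */
void csym_random_gen(double complex *a, int m, int n, int seed)
{
    assert(m == n && m > 0);
    uint64_t state = (uint64_t)(uint32_t)seed;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i) {
            double re = uniform_pm1(&state);
            double im = uniform_pm1(&state);
            a[i + j * m] = re + I * im;
            a[j + i * m] = a[i + j * m];   /* mirror without conjugation */
        }
}

/* Driver from the slide: hands the array to whichever generator is passed. */
void test_cfnc(double complex *a, int m, int n, int seed,
               void (*gen_fnc)(double complex *, int, int, int))
{
    (*gen_fnc)(a, m, n, seed);
}
```

Usage: test_cfnc(a, n, n, seed, csym_random_gen) fills a; a Hermitian generator would differ only in storing conj() of the mirrored entry and forcing a real diagonal.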