A comparison of sample size calculation algorithms
Hyun-Tae Kim
Technology Center for Nuclear Control
Korea Atomic Energy Research Institute
Taejon, Korea

Abstract

When a sample is taken without replacement from a finite population that is suspected to contain defects, the probability that the sample contains a given number of defects is described by the hypergeometric density function. A hypergeometric density function is usually approximated by one of two binomial density functions, depending on the approximation condition, and it is not always known before the sample size calculation which of the two satisfies that condition. Simultaneous application of the two binomial density functions is therefore often required. This paper compares three kinds of binomial approximation and a hypergeometric algorithm when applied to sample size calculation for various values of q, the over-all probability of classifying a defect as a defect when it is measured with up to three verification methods of the International Atomic Energy Agency (IAEA). The first approximation is the simply applied standard binomial approximation, which is currently used by the IAEA. The second is the correctly applied standard binomial approximation, with simultaneous application of two binomial density functions. The third is the improved binomial approximation developed by Mr. J. L. Jaech.

Introduction

Material Control and Accountancy (MC&A) is an essential part of the IAEA's conventional safeguards and strengthened safeguards. To verify that there is no diversion of nuclear material, the IAEA sampling plan specifies the number of items in a given stratum to be randomly selected and then measured by means of up to three verification methods [2, 3]. Here a stratum usually means a grouping of items or batches having similar physical and chemical characteristics (e.g. volume, weight, isotopic composition, location).

To determine the sample size, it is necessary to specify the non-detection probability β and the over-all classification probability q that a defect is properly classified as a defect given that it has been measured. The functional form for β, formula (5), involves summing a large number of terms, each containing as one of its factors a probability calculated by the hypergeometric density function h(N, D, n, d), where N is the number of items in the population, D is the number of defects in the population, n is the sample size, and d is the number of defects in the sample. The calculation of the non-detection probability is greatly simplified by approximating the hypergeometric density function with a binomial density function.

In the IAEA sampling plan, the standard binomial approximation is applied in a somewhat simplified way, and is therefore called here the simply applied standard binomial approximation (SB). Simultaneous application of the two binomial density functions gives a more accurate sample size calculation than SB, and is therefore called the correctly applied standard binomial approximation (CB). The improved binomial approximation (IB), developed by Mr. J. L. Jaech, gives a more accurate sample size calculation than CB. The purpose of this paper is to compare the sample sizes given by these approximations [8] with those given by the hypergeometric algorithm [7] (HY) for various values of q.

Hypergeometric density function

The exact probability that a sample of n items, randomly selected without replacement, contains d defects is given by the following hypergeometric density function:

$$h(N, D, n, d) = \frac{\binom{D}{d}\binom{N-D}{n-d}}{\binom{N}{n}} \tag{1}$$

Standard binomial approximation [1]

The standard binomial approximation to h(N, D, n, d) is as follows.

When N > 50, f = n/N ≤ 0.10 and D ≤ n:

$$h(N, D, n, d) \approx b(D, f, d) = \binom{D}{d} f^{d} (1 - f)^{D-d} \tag{2}$$

When N > 50, p = D/N ≤ 0.10 and n ≤ D:

$$h(N, D, n, d) \approx b(n, p, d) = \binom{n}{d} p^{d} (1 - p)^{n-d} \tag{3}$$

Improved binomial approximation [4, 5]

Under some assumptions, the improved binomial density function takes the form of formula (3) with p and n replaced by p1 and n1 respectively. These are found by equating the first two moments of the hypergeometric density function to the first two moments of the binomial density function and solving for p1 and n1:

$$p_1 = 1 - \frac{(N - n)(N - D)}{N(N - 1)}, \qquad n_1 = \frac{n \cdot D}{N \cdot p_1} \tag{4}$$

Non-detection probability

Let q be the probability that a defect is properly classified as a defect given that it has been measured. Since the IAEA uses up to three verification methods in its inspection activities, q is the weighted average of q1, q2 and q3. Here q1 is the probability of classifying a defect as a defect when it is measured with verification method 1; q2 and q3 are defined similarly. It is quite natural to assume that q < 1. Since the values of q2 from the 13th and the 19th columns of reference [9] are 91% and 85%, and the values of q3 are 86% and 83%, the values q = 100%, 99%, 98%, 97%, 96%, 95%, 94% and 93% were used in this paper. The non-detection probability is given as follows.

Hypergeometric density function

$$\beta = \sum_{d=0}^{\mathrm{Min}(n,D)} h(N, D, n, d)\,(1 - q)^{d} \tag{5}$$

Here Min(n, D) is the minimum of n and D.

Standard binomial approximation

When N > 50, f = n/N ≤ 0.10 and D ≤ n:

$$\beta = \left(1 - \frac{n}{N}\, q\right)^{D} = (1 - f \cdot q)^{D} \tag{6}$$

When N > 50, p = D/N ≤ 0.10 and n ≤ D:

$$\beta = \left(1 - \frac{D}{N}\, q\right)^{n} = (1 - p \cdot q)^{n} \tag{7}$$

Improved binomial approximation

$$\beta = (1 - p_1 \cdot q)^{n_1} \tag{8}$$

Formulas (6), (7) and (8) are approximate expressions of formula (5).

Sample size

Given β, sample sizes are calculated from formulas (5) through (8).

Correctly applied standard binomial approximation (CB)

When N > 50, f = n/N ≤ 0.10 and D ≤ n:

$$n = \frac{N}{q}\left(1 - \beta^{1/D}\right) \tag{9}$$

When N > 50, p = D/N ≤ 0.10 and n ≤ D:

$$n = \frac{\ln(\beta)}{\ln\left(1 - q\,\dfrac{D}{N}\right)} \tag{10}$$

Usually it is not known in advance whether D ≤ n or n ≤ D, so it is necessary to compare D with n after applying formula (9) or (10). If the assumed inequality is not met, the other formula is used to obtain the sample size.

Improved binomial approximation (IB) [4]

Under some assumptions, the improved binomial distribution is approximated as a statistical distribution. Its form, formula (3), is used for sample size calculation with excellent results:

$$n_1 = \frac{\ln(\beta)}{\ln(1 - p_1 \cdot q)} \tag{11}$$

Since formula (4) is symmetric in n and D, formula (11) can be used irrespective of whether D ≤ n or n ≤ D, which is a desirable property.

Simply applied standard binomial approximation (SB) [3, 6]

To counteract the possible diversion scenarios, the IAEA uses up to three verification methods. Formula (12) is used by the IAEA to calculate the sample size for verification methods 1, 2 and 3. Formula (10) is used to calculate the sample size for verification methods 2 and 3.

$$n = N\left(1 - \beta^{1/D}\right) \tag{12}$$

Comparison of sample size calculation algorithms

Estimating the value of q is the most important step. The q values, from 100% down to 93%, are shown in the upper-left corner of the sub-tables of Table 1. Table 1 shows the sample sizes calculated by the three binomial approximations described above and by the hypergeometric algorithm.

The 1st column, β, is the non-detection probability, with values 5%, 10%, 50% and 80%. The 2nd column, N, is the number of items in the stratum under inspection. The 3rd column, x, is the average weight of an item in the stratum under inspection, in the same unit as the goal amount M in the 4th column; the values 1.0 and 0.4 were used. The 4th column, M, is the goal amount (generally 1 significant quantity), with values 8 and 75. The 5th column, D1, is ⌈M/(γ1 x)⌉, the rounded-up number of defects with defect fraction γ1 = 1.0. The 6th column, n, has four sub-columns: SB, CB, IB and HY.
SB was calculated with formula (12). CB was calculated with formulas (9) and (10). IB was calculated with an iterative algorithm for formula (11). HY was calculated with an iterative algorithm for formula (5).

The 7th column, diff, has three sub-columns: S-, C- and I-. Here S- is the value of the first sub-column SB of n minus the fourth sub-column HY of n, C- is the value of the second sub-column CB of n minus HY, and I- is the value of the third sub-column IB of n minus HY.

The sub-column S- shows nonnegative values when q = 100%, 99% and 98%, but negative values when q is 97% or less. Therefore SB is not a conservative approximation to HY when q is 97% or less. Since the values of the sub-column S- are greater than those of C- and I-, SB is a poor approximation to HY compared with CB and IB. The sub-column C- shows no negative values for any of the values of q used in Table 1, so CB is a conservative approximation to HY. The sub-column I- also shows no negative values for any of the values of q used in Table 1, so IB is also a conservative approximation to HY. Since the values of CB are always equal to or greater than those of IB, IB is a better approximation to HY than CB.

The 8th column, r_diff, has three sub-columns: S-, C- and I-. Here S-, C- and I- are defined as follows:

$$S\text{-} = \frac{SB - HY}{HY} \tag{13}$$

$$C\text{-} = \frac{CB - HY}{HY} \tag{14}$$

$$I\text{-} = \frac{IB - HY}{HY} \tag{15}$$

Table 1 was calculated in Microsoft Excel using Visual Basic for Applications. The columns diff and r_diff of the sub-tables of Table 1 are summarized in Table 2.

The sub-column SB of n in the sub-table q = 100% of Table 1 is very close to, and conservative with respect to, the sub-column HY of n in the sub-table q = 98% of Table 1. Therefore, if we can assume that q = 98%, SB is a good approximation to HY. It is nevertheless more desirable to use CB, IB or HY to obtain a statistically accurate sample size.
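The four sample size calculations discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Visual Basic code used for Table 1: the function names are hypothetical, the results are rounded up to whole items (an assumption), the "iterative algorithm" for IB and HY is assumed here to be a simple linear search for the smallest n whose non-detection probability does not exceed β, and the CB switching rule is one possible reading of "compare D with n after applying formula (9) or (10)". It assumes 0 < D < N and 0 < β < 1.

```python
import math

def hypergeom_pmf(N, D, n, d):
    """Formula (1): probability of d defects in a sample of n items."""
    if d < 0 or d > min(n, D) or n - d > N - D:
        return 0.0
    return math.comb(D, d) * math.comb(N - D, n - d) / math.comb(N, n)

def beta_hy(N, D, n, q):
    """Formula (5): exact non-detection probability."""
    return sum(hypergeom_pmf(N, D, n, d) * (1 - q) ** d
               for d in range(min(n, D) + 1))

def n_hy(N, D, beta, q):
    """HY: smallest n with formula (5) <= beta (assumed linear search)."""
    for n in range(1, N + 1):
        if beta_hy(N, D, n, q) <= beta:
            return n
    return N

def n_sb(N, D, beta):
    """SB: formula (12), which does not depend on q."""
    return math.ceil(N * (1 - beta ** (1 / D)))

def n_cb(N, D, beta, q):
    """CB: try formula (9); if D <= n fails, fall back to formula (10)."""
    n9 = math.ceil(N / q * (1 - beta ** (1 / D)))
    if D <= n9:
        return min(n9, N)
    n10 = math.ceil(math.log(beta) / math.log(1 - q * D / N))
    return min(n10, N)

def n_ib(N, D, beta, q):
    """IB: smallest n for which formula (8), with p1 and n1 from (4), <= beta."""
    for n in range(1, N + 1):
        p1 = 1 - (N - n) * (N - D) / (N * (N - 1))
        n1 = n * D / (N * p1)
        if (1 - p1 * q) ** n1 <= beta:
            return n
    return N

if __name__ == "__main__":
    N, D, beta = 100, 8, 0.05  # illustrative inputs, not taken from Table 1
    for q in (1.00, 0.97, 0.93):
        print(q, n_sb(N, D, beta), n_cb(N, D, beta, q),
              n_ib(N, D, beta, q), n_hy(N, D, beta, q))
```

Because SB ignores q, its result stays fixed while the other three grow as q decreases, which is the mechanism behind SB losing its conservativeness at lower q.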
CB and IB can easily be implemented on a pocket calculator. Although the calculation of HY requires many more steps than IB and CB, with the powerful personal computers currently in use we notice no difference in calculation speed among CB, IB and HY. With a 16-bit or 32-bit personal computer, we can use HY directly for safeguards inspection activities.

Table 2. Relative degree of approximation to HY

Sub-table of Table 1   SB                        CB                    IB
q = 100%               conservative, poor        conservative, good    conservative, very good
q = 99%                conservative, poor        conservative, good    conservative, very good
q = 98%                conservative, poor        conservative, good    conservative, very good
q = 97%                not conservative, poor    conservative, good    conservative, very good
q = 96%                not conservative, poor    conservative, good    conservative, very good
q = 95%                not conservative, poor    conservative, good    conservative, very good
q = 94%                not conservative, poor    conservative, good    conservative, very good
q = 93%                not conservative, poor    conservative, good    conservative, very good

Conclusion

From Tables 1 and 2, SB (simply applied standard binomial approximation), the approximation algorithm used by the IAEA, is not a conservative approximation to HY (hypergeometric algorithm) when q is 97% or less, whereas CB (correctly applied standard binomial approximation) and IB (improved binomial approximation) are conservative approximations to HY. IB is a better approximation to HY than CB, but it requires an iterative algorithm. Furthermore, the improved binomial distribution is only approximated as a statistical distribution. With the powerful personal computers currently in use, we notice no difference in calculation speed among CB, IB and HY. Since SB can be regarded as a poor approximation to HY, it is recommended to use CB, IB or HY. To apply these methods, further investigation of the estimation of the value of q is required.

References

1. V. K. Rohatgi, Statistical Inference, New York: John Wiley & Sons, 1984, pp. 341-342.
2. International Atomic Energy Agency, IAEA Safeguards Statistical Concepts and Technique, Vienna, IAEA/SG/SGT/4, IAEA, 1989.
3. J. L. Jaech and M. Russell, Algorithm to Calculate Sample Sizes for Inspection Sampling Plans, IAEA STR-261 Rev. 1, 1991.
4. J. L. Jaech, "An improved binomial approximation to the hypergeometric density function", Journal of Nuclear Material Management: 36-41 (January 1994).
5. W. D. Sellinschegg, "Statistical Analysis employed in IAEA Safeguards", International Nuclear Safeguards 1994: Vision for the Future, Vol. 1, IAEA-SM-333/224, IAEA (July 1994).
6. Mingshih Lu, "Detection probabilities for random inspections in variable flow situations", International Nuclear Safeguards 1994: Vision for the Future, Vol. 1, IAEA-SM-333/124, IAEA (July 1994).
7. Hyun-Tae Kim, et al., "A Study on the application of hypergeometric distribution to the IAEA inspection sample size allocation algorithm (Korean)", Proceedings of the Korean Nuclear Society Spring Meeting: 1093-1098, Ulsan, Korea (May 1995).
8. Hyun-Tae Kim, "A Comparison between IAEA inspection sample size allocation algorithms (Korean)", Proceedings of the Korean Nuclear Society Autumn Meeting: 1029-1034, Seoul, Korea (October 1995).
9. Hyun-Tae Kim, "A Comparison of sample size allocation between simply applied standard binomial approximation, correctly applied standard binomial approximation, and improved binomial approximation", Proceedings of the 37th Annual Meeting of the Institute of Nuclear Materials Management: 113-118, Naples, FL, U.S.A. (July 1996).

Mr. Hyun-Tae Kim is a principal researcher at the Technology Center for Nuclear Control (TCNC) and is the Secretary of the INMM Korea Chapter. He is in charge of safeguards software development at the TCNC. He received an MBA from ChungNam National University, Korea. His fields of interest are safeguards information processing and fuzzy information processing.
Address:
Technology Center for Nuclear Control
Korea Atomic Energy Research Institute
P.O. Box 105, Yusung
Taejon, Korea
Telephone: +82-42-868-8939
FAX: +82-42-861-8819
Internet e-mail: htkim@nanum.kaeri.re.kr