Automatic Leukocyte Classification

Piyamas Suapang¹ and Sorawat Chivaprecha²
ABSTRACT
The number of white blood cells in each class helps doctors to diagnose patients. A technique for automating the differential count of white blood cells is presented. The proposed system takes as input a color image of a stained peripheral blood smear. The process involves segmentation, feature extraction and classification. For segmentation, a simple novel algorithm is proposed to localize white blood cells, and the different cell components are separated by automatic thresholding. The features extracted from the segmented nucleus are motivated by the visual cues of shape, color and texture. An artificial neural network is used for classification, with different combinations of feature sets. The results presented here are based on trials conducted with normal cells. For training the classifiers, a library set of 233 patterns is used. The test data consist of 134 samples and produce a correct classification rate close to 88.10%.
Keywords: Leukocyte Segmentation; Leukocyte
Classification; White Blood Cell Count
1. INTRODUCTION
A typical blood microscope image is digitized by a CCD and acquired by a frame-grabber system [1]. Microscope inspection of blood slides provides important qualitative and quantitative information about the presence of hematic pathologies, as shown in figure 1. The principal cells in the blood are red blood cells and white blood cells (leucocytes). Leucocytes that contain granules are called granulocytes (neutrophils, basophils and eosinophils); those without granules are called agranulocytes (lymphocytes and monocytes). The percentages of leucocytes in human blood typically fall in the following ranges: neutrophils 50-70%, eosinophils 1-5%, basophils 0-1%, monocytes 2-10% and lymphocytes 20-45% [2]. These cells provide the major defense against infection, and their specific concentrations can help specialists to detect or rule out important families of pathologies (e.g. mononucleosis, hepatitis, diabetes, allergy, arthritis, anaemia and many others).

Fig.1: A typical blood microscope image.
Fig.2: Examples of the five major groups of human leukocytes in peripheral blood: (a) Eosinophil, (b) Basophil, (c) Lymphocyte, (d) Neutrophil and (e) Monocyte.

Manuscript received on June 9, 2015; revised on December 12, 2015.
¹ Biomedical Engineering Program, Department of Physics, Rangsit University, Bangkok, Thailand. Email: piyamas suapang@.yahoo.com
² Department of Telecommunications Engineering, Engineering Faculty, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand. Email: kcsorawa@telecom.kmitl.ac.th

Typical examples of these types are shown in figure 2: neutrophils have small granules in the cytoplasm and a single nucleus with a variable number of lobes; eosinophils have a bilobed nucleus and coarse cytoplasmic granules; basophils have many cytoplasmic granules overlying the nucleus; monocytes have a kidney-shaped nucleus and slightly basophilic cytoplasm; and lymphocytes have a round nucleus and are devoid of specific granules. Table 1 summarizes the characteristic features of these cells and their relative size and number in normal blood.

For decades this operation has been performed by experienced operators, who basically carry out two main analyses. The first is a qualitative study of the morphology of the cells, which gives information on degenerative and tumoral pathologies such as leukemia. The second is quantitative and consists of the differential count of the white blood cells. Unfortunately, the accuracy of cell classification and counting is strongly affected by the individual operator's capabilities. In particular, the identification and differential count of blood cells is a time-consuming and repetitive task that can be influenced by the operator's accuracy and tiredness. The automated classification of the peripheral white blood cells (leukocytes) is the subject of this study. The peripheral blood leukocytes provide a very interesting and challenging medium for the study of biological image processing and classification. The taxonomy of the various blood cells has a classic history in histology [3], and a qualitative, verbally descriptive taxonomy has been established in the hematological literature [4]. One of the rapidly emerging areas of pattern recognition has been medical picture processing and image classification [5], [6]. Radioisotope scanning [7], [8], breast cancer diagnosis [9], the classification of Papanicolaou smears [10], chromosome analysis [11], [12], and the classification of white blood cells [13-16] have all proven to some extent to be amenable to pattern recognition techniques.

The goal of this research is to determine whether all steps in a fully automated system for the classification of white blood cells from microscopic images can be realized using image processing and supervised learning techniques. As a component of this, we attempt to determine which of a number of investigated classification techniques provides the best automated classifier for the classification of white blood cells into their five major types (neutrophils, lymphocytes, monocytes, eosinophils and basophils), based on a limited data set of visual images. This paper focuses on the automated detection and classification (the haematologists' term for this process is differentiation) of white blood cells from color images captured from a microscope. As outlined in figure 3, useful regions of interest, each centered on a white blood cell, can first be automatically extracted from color microscopic images. Secondly, image processing techniques can be used to automatically extract a feature set from these regions of interest that is useful for the classification of white blood cells. Finally, supervised classification techniques such as neural networks can be used to classify the white blood cells in these regions of interest, and their relative accuracies can be compared.

Fig.3: The structure of modules composing the leukocytes classification system.

INTERNATIONAL JOURNAL OF APPLIED BIOMEDICAL ENGINEERING
VOL.8, NO.1 2015

Table 1: The features of Leukocytes

Features | Neutrophil (granulocyte) | Eosinophil (granulocyte) | Basophil (granulocyte) | Monocyte (agranulocyte) | Lymphocyte (agranulocyte)
---------|--------------------------|--------------------------|------------------------|-------------------------|--------------------------
Diameter | 12-15 µm | 12-15 µm | 12-15 µm | 12-20 µm | 6-18 µm
Nucleus | U-shaped, S-shaped, 2-5 segmented | Segmented, bilobed | Poorly shown, S-shaped | Kidney shaped | Round
Granules | Azurophilic granules; specific granules | Eosinophilic granules | Basophilic granules of different sizes | Basophilic, bluish-gray | Scanty, light blue

2. METHODOLOGY
2. 1 Segmentation
First, the captured image file was split into its three component bands (red, green and blue), as shown in figure 4. The result was three grayscale files, one for each of the red, green and blue components of the image captured by the camera. Histogram analysis was used to examine the three grayscale components (corresponding to the red, green and blue bands) of 30 images covering all five basic white blood cell types. The green component was found to be consistently a better discriminator between the purple nuclear material and the rest of the image.
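The band split and histogram inspection described above can be illustrated with a minimal Python sketch. It assumes the image is held as rows of (r, g, b) tuples with 8-bit values; the function names `split_bands` and `histogram` and the toy pixel values are illustrative assumptions, not the authors' implementation.

```python
def split_bands(image):
    """Split an RGB image (rows of (r, g, b) tuples) into three grayscale bands."""
    red = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue = [[px[2] for px in row] for row in image]
    return red, green, blue


def histogram(band, levels=256):
    """Count how many pixels of a grayscale band take each value 0..levels-1."""
    counts = [0] * levels
    for row in band:
        for value in row:
            counts[value] += 1
    return counts


# Toy 2x2 image (hypothetical values): one purple "nuclear" pixel, three pale ones.
image = [[(230, 225, 220), (120, 40, 140)],
         [(225, 215, 220), (230, 225, 220)]]
red, green, blue = split_bands(image)
green_hist = histogram(green)
```

Comparing the three histograms for a set of images is one way to see which band separates the dark nuclear pixels most clearly from the rest.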
Thresholding was used to produce a binary bitmap image from the green band bitmap of each image. An automatic threshold value was selected by minimum value to discriminate between nuclear and nonnuclear pixels, as every white blood cell has a nucleus. Only the green band was used for this nuclear thresholding because, in this band, the nuclear material is much darker than either the cytoplasm (with the exception of basophilic cytoplasm) or the background. Experimentation showed that a threshold value of 95 (on a scale of 0-255) gave an acceptable discrimination between nuclear and nonnuclear pixels, leaving nuclear pixels black on a white background, as shown in figure 5.
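The thresholding step can be sketched as follows. The threshold of 95 is the value reported above; treating values strictly below it as nuclear, the function names and the sample green-band values are assumptions made for illustration.

```python
NUCLEAR_THRESHOLD = 95  # value reported in the text, on the 8-bit 0-255 scale


def threshold_nuclei(green, threshold=NUCLEAR_THRESHOLD):
    """Return a mask that is True where a green-band pixel is darker than the threshold.

    Treating values strictly below the threshold as nuclear is an assumption;
    the text does not say how a pixel exactly at the threshold is handled.
    """
    return [[value < threshold for value in row] for row in green]


def to_bitmap(mask):
    """Render the mask as described: nuclear pixels black (0), the rest white (255)."""
    return [[0 if is_nuclear else 255 for is_nuclear in row] for row in mask]


green = [[225, 40, 94],
         [95, 180, 60]]  # hypothetical green-band values
mask = threshold_nuclei(green)
bitmap = to_bitmap(mask)
```

Keeping the boolean mask separate from the black-on-white bitmap makes later steps, such as counting nuclear pixels, a matter of summing the mask.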
Due to edge effects in the captured images (the camera produced noticeable dark bands at the very edges of the images, as seen in figure 4 and figure 5), many dark nonnuclear pixels around the perimeter of the image fell below the threshold. An edge erosion filter, which simply set all of the pixels within 3 pixels of the edge of the image to white, was developed and used. This removed the dark bands around the edge of the image, as shown in figure 6.

As a first step towards identifying a region of interest, an algorithm was developed to identify blobs (contiguous connected groups of black, presumably nuclear, pixels) within a bitmap file and print out their details, including the number of pixels in the blob and the centroid (arithmetic mean of the x and y position values) of the blob. The x value of the centroid was found by summing the x values of all pixels in a blob and dividing the sum by the number of pixels in the blob. The y value of the centroid was found in a similar fashion, summing the y values of all pixels in the blob and dividing by the number of pixels. The (x, y) location represented by this centroid was used as the centre of the blob of nuclear material. The centre (cx, cy) and number of pixels were recorded for each blob found.

A region of interest (ROI) was defined around the centre (cx, cy) of each blob. This region of interest was defined to be a rectangle 110 by 110 pixels in size. The size of the ROI was originally chosen to preserve the aspect ratio of the camera's image and to be large enough to accommodate the largest leukocyte with some headroom. The rectangular aspect ratio was later found to be unnecessary; a square ROI would have been simpler. This process produced a series of smaller images like the one in figure 7.

Most white blood cells are roughly circular in shape, though some (monocytes, in particular) may deviate significantly from the circular. To obtain a good feature set, representative of the features of the cell only, we chose to find the largest circular area entirely within the cell. The resulting circle is shown in figure 8.

Fig.4: (a) Raw captured image and (b, c, d) 3 single color components.
Fig.5: Green component image and thresholded bitmap.
Fig.6: The results of (a) thresholding and (b) erosion.
Fig.7: Extracted image of white blood cell.
Fig.8: Finding circle.

2. 2 Feature Extraction
The segmented circular cell region was processed on a pixel-by-pixel basis, and statistical information (mean, standard deviation, maximum value and minimum value) was collected for each of 5 color bands. The color bands were the red, green and blue bands captured by the hardware (camera plus composite video capture card), together with the color ratios green/red and green/blue. The number of pixels within the circle was also determined. The red, green and blue band pixel values were simply the 8-bit (0-255) values captured by the Matrox Morphis (MOR/2VD/84*) capture card. The green/blue color ratio for a given pixel was determined by dividing the green band value by the blue band value. The green/red value was similarly obtained by dividing the green band value by the red band value.
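The geometric steps of Sections 2.1 and 2.2 (clearing the image border, locating blob centroids, and collecting per-band statistics inside a circle) can be sketched together in plain Python. This is a sketch under stated assumptions: 4-connectivity for blobs, the population standard deviation, a ratio of 0.0 where the denominator is zero, and all function names are illustrative choices that the text does not specify.

```python
from collections import deque
from statistics import mean, pstdev


def clear_border(mask, width=3):
    """Set every mask pixel within `width` pixels of the image edge to False (white)."""
    h, w = len(mask), len(mask[0])
    return [[bool(mask[y][x]) and width <= x < w - width and width <= y < h - width
             for x in range(w)] for y in range(h)]


def find_blobs(mask):
    """Return (cx, cy, n_pixels) for each 4-connected group of True pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            sum_x = sum_y = count = 0
            while queue:
                x, y = queue.popleft()
                sum_x, sum_y, count = sum_x + x, sum_y + y, count + 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Centroid: sum of the coordinates divided by the number of pixels.
            blobs.append((sum_x / count, sum_y / count, count))
    return blobs


def circle_stats(band, cx, cy, radius):
    """Mean, standard deviation, maximum and minimum of band values inside a circle."""
    values = [band[y][x]
              for y in range(len(band)) for x in range(len(band[0]))
              if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    return {"n": len(values), "mean": mean(values), "sd": pstdev(values),
            "max": max(values), "min": min(values)}


def ratio_band(numerator, denominator):
    """Per-pixel ratio of two bands, e.g. green/red; a zero denominator maps to 0.0."""
    return [[n / d if d else 0.0 for n, d in zip(row_n, row_d)]
            for row_n, row_d in zip(numerator, denominator)]
```

Calling `circle_stats` on the red, green and blue bands and on `ratio_band(green, red)` and `ratio_band(green, blue)` would give the four statistics for each of the five color bands, with `n` as the number of pixels within the circle.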
2. 3 Neural Network Architecture for Pattern
Classification
Neural networks are processing structures "consisting of many interconnecting processing elements (neurons)". These artificial neurons are connected together to form neural networks. An extremely simple example of such a network is shown in figure 9. In this example, a number of inputs are each connected to each of a number of neurons in an intermediate (hidden) layer. The neurons in the hidden layer are each connected to all the output neurons (one in this case). This is an example of a fully connected feed-forward neural network. Feed-forward networks of this type can be trained by back propagation, a procedure that trains the network by making small adjustments to the weights of each neuron in the direction that reduces the error at that neuron's output. The input layer took the 15 white blood cell features. The hidden layer was designed with 5 nodes. Finally, the number of outputs was equal to the number of white blood cell classes, and the class was determined from the probability of class membership. The whole process is shown schematically for training and testing in figures 10 and 11, respectively.

Fig.9: Architecture of a Three-layered Pattern Classification Neural Network.
Fig.10: Training process.

3. RESULTS AND DISCUSSION

Table 2: The features of Leukocytes. Values are X̄ (S.D.).

Feature | Neu | Eosi | Baso | Lym | Mono
--------|-----|------|------|-----|-----
Number of pixels in circle | 6808.31 (1362.10) | 12607.39 (3747.76) | 14357.71 (4464.72) | 5084.51 (942.12) | 10430.25 (5068.42)
Number of circle edge pixels | 290.58 (29.17) | 393.17 (60.45) | 420.29 (62.47) | 251.25 (23.19) | 355.58 (65.98)
Number of nuclear pixels | 2426.45 (533.36) | 4546.22 (1388.94) | 5700.71 (925.84) | 2481.21 (463.56) | 3582.04 (603.78)
Number of nuclear edge pixels | 655.02 (276.75) | 1018.26 (400.98) | 982.14 (535.10) | 473.26 (378.39) | 798.72 (276.92)
Number of nuclear pixels / Number of pixels in circle | 0.36 (0.04) | 0.36 (0.06) | 0.41 (0.06) | 0.49 (0.02) | 0.37 (0.07)
Number of nuclear edge pixels / Number of circle edge pixels | 2.24 (0.86) | 2.56 (0.86) | 2.24 (0.90) | 1.85 (1.33) | 2.28 (0.81)
Number of nuclear edge pixels / Number of nuclear pixels | 0.27 (0.11) | 0.24 (0.10) | 0.17 (0.07) | 0.19 (0.13) | 0.23 (0.10)
Number of circle edge pixels / Number of pixels in circle | 0.04 (0.001) | 0.03 (0.010) | 0.03 (0.001) | 0.05 (0.001) | 0.04 (0.001)
(Number of nuclear edge pixels / Number of nuclear pixels) / (Number of circle edge pixels / Number of pixels in circle) | 6.33 (2.52) | 7.39 (3.34) | 5.85 (3.39) | 3.83 (2.85) | 6.52 (2.79)
Red pixel value | 12188791.40 (82769.36) | 11800272.17 (234635.82) | 11721998.29 (103397.88) | 12181788.78 (88758.59) | 11913040.83 (598702.58)
Green pixel value | 12359095.73 (46540.65) | 12124605.65 (134409.78) | 12110284.43 (72337.32) | 12354375.56 (53731.40) | 12153595.91 (596521.49)
Blue pixel value | 12162288.74 (81247.04) | 11834367.09 (197198.01) | 11657510.29 (127915.85) | 12138643.53 (82304.77) | 11874442.01 (601881.29)
Green pixel value / Red pixel value | 0.41 (0.06) | 0.55 (0.05) | 0.46 (0.04) | 0.39 (0.07) | 0.46 (0.06)
Green pixel value / Blue pixel value | 0.45 (0.05) | 0.53 (0.03) | 0.49 (0.04) | 0.45 (0.07) | 0.50 (0.06)
Nuclear Texture | 12.98185884 (2.991436519) | 23.86688696 (5.068758689) | 22.76578014 (2.466787328) | 13.22287196 (3.682092695) | 17.98124036 (3.637544634)

Table 3: The results of testing by Leukocytes features set.

Leukocytes | Samples | Correct (Numbers) | Correct (%) | Incorrect (Numbers) | Incorrect (%)
-----------|---------|-------------------|-------------|---------------------|--------------
Neutrophils | 39 | 35 | 89.74 | 4 | 10.26
Eosinophils | 30 | 23 | 76.67 | 7 | 23.33
Basophils | 5 | 4 | 80.00 | 1 | 20.00
Lymphocytes | 30 | 30 | 100.00 | 0 | 0.00
Monocytes | 30 | 26 | 90.00 | 4 | 10.00
Total | 134 | 118 | 88.10 | 16 | 11.90

Figure 12 shows the discrimination between nuclear and nonnuclear pixels obtained with the automatic threshold value. As seen in Table 2, the leukocyte features are determined on the basis of the feature set. These parameters can be used as efficient input features for classifiers. The test using these leukocyte features was carried out, and the results are shown in Table 3. The results presented here are based on trials conducted with normal cells. For training the classifiers, a library set of 233 patterns was used. The test data consisted of 134 samples and produced a correct classification rate close to 88.10%.
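A minimal sketch of a fully connected feed-forward network with the layer sizes given in Section 2.3 (15 inputs, 5 hidden nodes, one output per class), trained by back propagation, together with the per-class tally used in Table 3. The sigmoid activation, squared-error loss, learning rate, weight initialisation, function names and the synthetic training data are all assumptions made for illustration; the paper does not specify them, and this is not the authors' implementation.

```python
import math
import random


def init_network(n_in=15, n_hidden=5, n_out=5, seed=0):
    """Small random weights; the last weight of each row acts as the bias."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return w1, w2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Return (hidden activations, output activations) for one feature vector."""
    w1, w2 = net
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in w2]
    return hidden, out


def train_step(net, x, target, lr=0.5):
    """One back-propagation update; returns the squared error before the update."""
    w1, w2 = net
    hidden, out = forward(net, x)
    d_out = [(o - t) * o * (1 - o) for o, t in zip(out, target)]
    d_hid = [h * (1 - h) * sum(d_out[k] * w2[k][j] for k in range(len(w2)))
             for j, h in enumerate(hidden)]
    for k, row in enumerate(w2):
        for j, v in enumerate(hidden + [1.0]):
            row[j] -= lr * d_out[k] * v
    for j, row in enumerate(w1):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * d_hid[j] * v
    return sum((o - t) ** 2 for o, t in zip(out, target)) / 2


def predict(net, x):
    """Class index with the largest output value."""
    _, out = forward(net, x)
    return max(range(len(out)), key=out.__getitem__)


def classification_rates(true_labels, predicted_labels, n_classes):
    """Per-class (samples, correct, percent correct)."""
    rows = []
    for c in range(n_classes):
        samples = sum(1 for t in true_labels if t == c)
        correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p == c)
        rows.append((samples, correct, 100.0 * correct / samples if samples else 0.0))
    return rows


# Synthetic stand-in data (NOT the paper's features): class k switches on features 3k..3k+2.
samples = [([1.0 if i // 3 == k else 0.0 for i in range(15)], k) for k in range(5)]
net = init_network()
losses = []
for epoch in range(300):
    losses.append(sum(train_step(net, x, [1.0 if j == k else 0.0 for j in range(5)])
                      for x, k in samples))
predicted = [predict(net, x) for x, _ in samples]
rates = classification_rates([k for _, k in samples], predicted, 5)
```

The percentage in `classification_rates` is simply correct/samples. Applied to the counts in Table 3, the same ratio gives 35/39 = 89.74% for neutrophils, 26/30 = 86.67% for monocytes and 118/134 = 88.06% overall; the last two differ slightly from the tabulated 90.00% and 88.10%.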
Fig.11: Testing process.
Fig.12: The nucleus segmentation.

4. CONCLUSION
This paper presented a methodology for the fully automated detection and classification of leucocytes in microscope color images, identifying the following classes: basophil, eosinophil, lymphocyte, monocyte and neutrophil. Experiments show that the final classification module, implemented by means of a parallel classifier composed of back-propagation neural classifiers, achieves an accurate solution with lower computational complexity than a traditional nearest-neighbor classifier. Results indicate that the morphological analysis of white blood cells is achievable, with a correct classification rate close to 88.10% on the 134 test samples.

Further studies will focus on other methods of determining the boundaries of cells. The method chosen here was, perhaps, a little naive and worked poorly in a number of instances. This is the only step in the process where manual intervention was required. Hence, it is the only obstacle to automating the entire process, from data acquisition (which could have been performed automatically had an automated microscope stage been affordable or at hand), through white cell segmentation, to classification. Counting the classified instances is seen as trivial. A second direction is the use of other features of the blood cell. Some of these features will require higher-resolution images. In particular, the images acquired for this research were of too low a resolution to show the characteristic granularity that is a key element in the haematologist's differentiation of the granular leukocytes (neutrophils, eosinophils and basophils) from lymphocytes and monocytes. A third direction is extension to other types of white blood cells. Blast cells, for instance, are characteristic of certain types of leukemia and would indicate further tests if found in blood. Being able to automatically classify these and flag samples accordingly could be a real boon to haematologists. This would of course require leukemic blood with these unusual cells in evidence to be available, together with some image acquisition (and manual classification by haematologists for the training set). A fourth direction is tests of statistical significance: the significance of the differences in performance of the different classification algorithms could be tested, given a number of trials with different data sets.

5. ACKNOWLEDGEMENT
This work is partially supported by Rangsit Research Institute at Rangsit University, Department of Telecommunications Engineering at King Mongkut's Institute of Technology Ladkrabang, and Department of Industry Physics and Medical Instrument at King Mongkut's University of Technology North Bangkok. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References
[1] F. Cillesen, W. Der Meer, Atlas of Blood Cell Differentiation, Elsevier Science B.V., Amsterdam, The Netherlands. 1998.
[2] Keith Breden Taylor and Julian B. Schorr, Blood,
Colliers Encyclopaedia. vol. 4 (1978).
[3] A. Hughes, A History of Cytology, New York:
Aberlard-Schuman, 1959.
[4] G. A. Daland and T. H. Ham, A Color Atlas
of Morphologic Hematology, Cambridge, Mass.:
Harvard University Press. 1967.
[5] R. S. Ledley, High-speed automatic analysis of biomedical pictures, Science. October (1964) 216-223.
[6] P. G. Stein, L. E. Lipkin, and H. M. Shapiro,
Spectra II: general-purpose microscope input for
a computer, Science. vol. 166 (1969) 328-333.
[7] J. W. Butler, Automatic analysis of bone autoradiographs, in Pictorial Pattern Recognition, G. C. Cheng et al., Eds. Washington, D.C.: Thompson Book Co., 1968, pp. 75-85.
[8] M. R. Evans and J. W. Sweeney, A computer technique for investigating and rationalization of scintillation scan reading, Methods Inform. Med. vol.
6 no. 1 (1967) 24-27.
[9] F. Winsberg et al., Detection of radiographic abnormalities in mammograms by means of optical
scanning and computer analysis, Radiology. vol.
89 (1967) 211-215.
[10] G. L. Wied et al., Taxonomic intracellular analytic system (TICAS) for cell identification, Acta Cytol. vol. 12 (1968) 180-204.
[11] J. W. Butler, M. K. Butler, and A. Stroud, Automatic classifications of chromosomes-II, in 1964
Proc. Rochester Conf. Data Acquisition and Processing in Biology and Medicine. New York: Pergamon, 1965.
[12] J. Hilditch and D. Rutovitz, Chromosome recognition, Ann. N.Y. Acad. Sci. vol. 157 (1969) 339-364.
[13] Pramit Ghosh, Debotosh Bhattacharjee, Mita
Nasipuri and Dipak Kumar Basu, Automatic
White Blood Cell Measuring Aid for Medical Diagnosis. Process Automation, Control and Computing (PACC). 20-22 July (2011) 1-6.
[14] Sawsan F. Bikher, Ahmed M. Darwish, Hany A.
Tolba, Samir I. Shaheen. Segmentation and Classification of White Blood Cells. IEEE International Conference on Acoustics, Speech, and Signal Processing. vol. 6 5-9 June (2000) 2259 - 2261.
[15] Neelam Sinha, A.G.Ramakrishnan. Automation
of Differential Blood Count. TENCON 2003:
Conference on Convergent Technologies for the
Asia-Pacific Region. vol. 2 15-17 Oct. (2003) 547
- 551.
[16] I. Cseke, “A fast segmentation scheme for white blood cell images,” in Proceedings of the 11th IAPR International Conference on Pattern Recognition, Conference C: Image, Speech and Signal Analysis, pp. 530-533, 1992.
[17] J. W. Bacus, “An automated classification of the
peripheral blood leukocytes by means of digital
image processing,” Ph.D. dissertation, Univ. Illinois, Chicago, 1970.
[18] C. W. Barnett, “The unavoidable error in the
differential count of the leukocytes of the blood,”
J. Clin. Invest., vol. 12, pp. 77-85, 1933.
[19] J. W. Butler, M. K. Butler, and A. Stroud,
“Automatic classifications of chromosomes-II,” in
1964 Proc. Rochester Conf. Data Acquisition and
Processing in Biology and Medicine. New York:
Pergamon, 1965.
[20] J. W. Butler, “Automatic analysis of bone autoradiographs,” in Pictorial Pattern Recognition,
G. C. Cheng et. al., Eds. Washington, D.C.:
Thompson Book Co., pp. 75-85, 1968.
[21] T. D. Caspersson, G. Lomakka, G. Svenson, and R. Saftstrom, “A versatile ultramicrospectrograph for multiple-line and surface scanning high resolution measurements employing automated data analysis,” Expt. Cell Res., vol. 3, pp. 40-51, 1955.
[22] H. J. Conn, Biological Stains, 7th ed. Baltimore,
Md.: Williams and Wilkens, 1961.
[23] G. A. Daland and T. H. Ham, A Color Atlas
of Morphologic Hematology. Cambridge, Mass.:
Harvard University Press, 1967.
[24] M. R. Evans and J. W. Sweeney, “A computer technique for investigating and rationalization of scintillation scan reading,” Methods Inform. Med., vol. 6, no. 1, pp. 24-27, 1967.
[25] J. Hilditch and D. Rutovitz, “Chromosome
recognition,” Ann. N.Y. Acad. Sci., vol. 157, pp.
339-364, 1969.
[26] A. Hughes, A History of Cytology. New York:
Aberlard-Schuman, 1959.
[27] M. Ingram, P. E. Norgren, and K. Preston, Jr.,
“Automatic differentiation of white blood cells,”
in Image Processing in Biological Sciences, D. M.
Ramsey, Ed. Berkeley, Calif.: Univ Calif. Press,
1968.
[28] M. Ingram and K. Preston, Jr., “Automatic
analysis of blood cells,” Sci. Amer. vol. 223, pp.
72-82, 1970.
[29] L. W. Koster, “Color photography of biological
stains,” J. Biol. Photogr. Ass., vol. 35, no. 1, Feb.,
1967.
[30] R. S. Ledley, “High-speed automatic analysis of
biomedical pictures,” Science, pp. 216-223, Oct.
1964.
[31] M. L. Mendelsohn, B. H. Mayall, J. M. S. Prewitt, R. C. Bostrom, and W. G. Holcomb, “Digital transformations and computer analysis of microscopic images,” in Advances in Optical and
Electron Microscopy, V. E. Coslett, Ed. New
York: Academic Press, 1968, pp. 77-150.
[32] N. J. Nilsson, Learning Machines. New York:
McGraw-Hill, 1965.
[33] J. M. S. Prewitt and M. L. Mendelsohn, “A
general approach to image analysis by parameter extraction,” in Proc. Computers in Radiology,
Chicago, 1966.
[34] P. G. Stein, L. E. Lipkin, and H. M. Shapiro,
“Spectra II: general-purpose microscope input for
a computer,” Science, vol. 166, pp. 328-333, 1969.
[35] G. L. Wied et al., “Taxonomic intracellular analytic system (TICAS) for cell identification,” Acta Cytol., vol. 12, pp. 180-204, 1968.
[36] F. Winsberg et al., “Detection of radiographic
abnormalities in mammograms by means of optical scanning and computer analysis,” Radiology,
vol. 89, pp. 211-215, 1967.
[37] I. T. Young, “Automated leukocyte recognition,”
Ph.D. dissertation, Massachusetts Inst. Technol.,
Cambridge, 1969.
Piyamas Suapang received a bachelor's degree in physical sciences from Naresuan University, Phitsanulok, Thailand, in 1999. She received a master of science degree in medical instrumentation from King Mongkut's University of Technology North Bangkok, Bangkok, Thailand, in 2004, and a doctoral degree in electrical engineering from King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand, in 2014. Over the past ten years, her research has focused on medical image processing. Her current research interests are computer vision, pattern recognition and computer-assisted diagnosis (CAD) in medicine.
Sorawat Chivaprecha received a bachelor's degree in telecommunication engineering from Suranaree University of Technology, Nakhon Ratchasima, Thailand, in 1998. He then obtained master's and doctoral degrees in electrical engineering from King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand, in 2002 and 2008, respectively. His research interests are digital filter design and implementation, VLSI in the area of digital signal processing, digital system design and FPGA applications, informatics, software-defined radio technology and remote sensing satellite systems.