Sychra et al. - Research Paper

Monte Carlo probabilities of cluster formation by image noise.

Jerry J. Sychra, Michael J. Blend, Noam Alperin, and Steven U. Brint

University of Illinois at Chicago

Correspondence should be addressed to:
Jerry Sychra, Ph.D.
Dept. of Radiology
University of Illinois
1740 W. Taylor St.
Chicago, Illinois 60612

Submitted for publication: October 15, 1998

Keywords: voxels, image noise, Monte Carlo analysis, brain mapping


When voxel values of a small tumor or of a brain activation are very low it is difficult to distinguish them from the random image noise.  A tumor or an activation is usually represented by a cluster of voxels. Consequently, when the probability of an observed cluster being formed by noise is negligible, the cluster can be interpreted as  a result of a deterministic process. The probability of a cluster generated by random noise is associated with the spatial autocorrelation of the noise. Assuming that the image noise is a Gaussian random field, we have used Monte Carlo approach to calculate cluster probabilities for selected autocorrelation values. The obtained results can be used by the  reader to detect tumors or activations in actual images.


In many 2-D or 3-D functional images the sought after event (e.g., brain activation, tumor) is represented by a small cluster of voxels with elevated intensity. Often, the corresponding intensity threshold is close or even lower than the intensity of the most noisy voxels. When the voxel intensity alone can not be used for the detection of the event, it is the clustering tendency of the event's voxels that may aid the image analysis. However, random noise may also form clusters of elevated intensity. Consequently, the mere existence of a cluster can not be used to confirm an event if there is a significant probability that the cluster is a result of noise. In this article we present two Monte Carlo based methods for the estimation of the probability that a cluster is formed by noise. 

A significant amount of published work 1-3 has approached this problem in the analytical mathematical direction. We have chosen a Monte Carlo approach for its (1) flexibility of modeling the noise process, ROI size and shape, and of the cluster defining voxel connectivity, and (2) because it does not depend on constraining assumptions that are needed to simplify mathematical derivations. Admittedly, our approach is computationally more demanding.


Let us assume that the voxel intensity I(xi,yi,zi) in the voxel [xi,yi,zi] of a (functional) image can be expressed as


where Io is the deterministic component of the signal I and N is a Gaussian noise. Because of the image generation process the noise is spatially correlated. When a voxel has intensity higher than the threshold that separates a very small subset of the highest intensity voxels from the rest, the probability that its neighbor has intensity also above the threshold is greater when the spatial autocorrelation value is high. Consequently, the probability of formation of a cluster is higher with higher autocorrelation values, and one must take them into consideration when calculating the clustering probability. One may assume2,3 that the noise N can be modeled by a convolution of approximately Gaussian point spread kernel G and of spatially uncorrelated noise No:




wx,wy,wz > 0, c > 0 is the normalization constant and m is the size of the carrier of the kernel


and the noise satisfies


where (54 bytes)o2 is the noise variance and (54 bytes) is the Kronecker delta. (In case of 2-D images the relationships above can be modified by dropping the third dimension).

The noise autocorrelation cx in the direction of the coordinate axis x at the distance d is then estimated as


where the noise variance


From (2) to (8) then follows


In case of continuous Gaussian kernel and of random Gaussian noise field the summation is replaced by integrals, and the correlation cx(d) is Gaussian,


However, in case of the discrete kernel (3), Eq. (9) can be rewritten as


i.e. while the smoothing function is Gaussian the used correlation model is Gaussian only approximately. The non-Gaussian factor (cox - cx ) is shown in Fig. 1 as a function of cx for the single pixel distance, d=1. (Significant values of this factor may caution against unjustified use of continuous models, such as continuous Gaussian random fields, in statistical analysis of noisy images). Analogous expressions can be derived for the autocorrelations cy and cz, respectively. Consequently, given the autocorrelations cx, cy and cz for d=1 one can generate a noise image N(i,j,k), i,j,k=1,2,..., by Monte Carlo method constrained by (2) to (6). The coefficients wx used in the generation of noise images were calculated by inverting (11) numerically at d=1.

It is reasonable to assume that the probability of clustering depends only on the autocorrelation value at the clustering distance of the pixels, i.e. on the autocorrelation value at the distance d=1 in case of 4-neighbor connectivity, and on the distances d=1 and d=2, respectively, in case of 8-neighbor connectivity. However, from (3) and (9) follows


 i.e. the autocorrelation at the diagonal clustering distance d=2 is determined by the autocorrelations at the axial distances d=1 and consequently there is no need to introduce it explicitly into the Monte Carlo process.

As the sought event (brain activation, tumor, etc.) is characterized by a cluster of voxels with intensities above a given threshold, noise may either enlarge the cluster by adding more pixels to it or may create a "false" cluster by noise only. The knowledge of the probability with which a "false" cluster is created is obviously important in the event detection process: it allows us to reject a cluster if the probability of it being made by noise is higher than beforehand accepted threshold. Estimates of these probabilities by Monte Carlo approach is the goal of this article. They will be obtained in the following way (the actual computer implementation - mainly the ordering and grouping of operations - may slightly differ to economize the computer use):

First, using the method described above, we will generate M (several hundred thousand) 2-D noise images with a priory requested autocorrelation cx = cy . All calculation will be done on a circular region of interest (ROI) of approximately 10,000 pixels. Instead of using a fixed threshold we will search for a thresholds that result in a predetermined number of voxels with intensities above the threshold on the given ROI, say 10, 20, 30, ... 200 pixels, respectively. (The increment of the number of thresholded pixels by ten is arbitrary and does not affect the described method in general).

During the second step clusters - if any - are detected. We consider separately two types of 2-D clusters: those defined by 4-neighbor and 8-neighbor connectivity, respectively. The existence (or absence) of clusters is recorded by updating cluster histogram H(i,t,s), where i is the image (experiment) ID number, t is the threshold ID number (t=1,2,...,20) and s identifies the cluster size. For example, H(1000,15,4) = 2 means that there are 2 clusters of 4-pixel size formed by the 150 brightest pixels in the 1000th noise image's ROI. The estimate of the probability P(t,k,s) of at least k clusters, each cluster containing at least s pixels is




and we will denote this cluster situation as the event [t,k,s]. The probability P(t,k,s) is used in the "single reading" mode. For example, (1) define 10,000 pixel ROI, (2) select the connectivity mode (4- or 8-), (3) select how many (10t) most intense pixels should be thresholded up, (4) find these pixels, (5) locate, measure and count clusters, and (6) accept or reject clusters based on the probability P. Note that our initial selection was not the signal threshold but the number of thresholded pixels, which in turn determines the threshold. (If we have a certain probability level and event type (number and size of clusters) in mind, we can consult the P graph - see below - to decide what number of thresholded pixels should be sought). The disadvantage of this approach is a possible rejection of a cluster that may be formed by pixels of significantly higher intensity than the found threshold, i.e. a cluster that would be found even if a smaller number of thresholded pixels would be requested and consequently would have a lower probability of being formed by noise. This can be overcome by the conditional probability approach: Let us start, for example, with only 10 thresholded pixels (t=1). If a sought event [1,k,s] takes place, its probability of being formed by noise is P(1,k,s). However, if it does not take place, let us add the next 10 most intensive pixels (t=2). The probability P' of the event [2,k,s] is now less then P(2,k,s) and its estimate is given by,


where the logical variable L'i is calculated from


In other words, P'(t,k,s) is the estimate of the probability of at least k clusters made by noise, each cluster containing at least s pixels, under the condition that this event has not taken place at any of the higher step-wise defined thresholds corresponding to 10, 20, ..., 10(t-1) thresholded pixels, respectively. (Let us note that we do not keep track of individual clusters when changing thresholds and consequently in the rare situation when clusters coalesce after lowering the intensity threshold no correction for "lost" clusters by coalescence is calculated, i.e. the true values of the histogram H are used only). Obviously, P is a cumulative function of P'.

The evaluated functional image I is often a linear combination of K input  images Ij,


If (1) is valid for the input images and if their noise components satisfy


where (54 bytes)j2 is the noise variance and cj's are noise autocorrelations, respectively, of the j-th input image, i,j=1,2,...,K.  Then the noise component N of the functional image (17) satisfies


and the noise autocorrelation cw of the image I


where  w stands for x, y, and z, respectively and


Consequently, one can use autocorrelation measurements on the original input images in derivation of the noise cluster probabilities of the combinatory functional image (17). As long as (1), (17) and (18) are valid, this approach may be applicable, for example, in case of SPECT and PET brain activation images, SPECT dual radioisotope tumor imaging4, fMRI brain activation imaging by image differences, and principal component or factor images of fMRI or radionuclei image sequences. Obviously, when the original images have the same spatial autocorrelations of noise, the noise autocorrelation in the resulting functional image (17) is also the same. 

We have calculated (see below) estimates of the probabilities P and P' by the Monte Carlo method on circular ROIs of about 10,000 pixels. One can show that if the size of the ROI is moderately changed with only a small change of its shape (for example, by changing the ratio of the ROI area and of the square of its boundary length only slightly(1)), the corresponding probabilities P and P' can be estimated by a simple transformation of the probabilities P and P' calculated on the original ROI. Let us assume that an image contains two ROI's and that the probabilities of an event on these ROI's are p1 and p2, respectively. The probability p12 that the event takes place on the union of these ROI's is


(We neglect events taking place across the common boundary of the ROI's). In a case of subsequent doubling (k>0) or halving (k<0) an ROI |k|-times the corresponding probabilities pk are


respectively, where p is the original probability (again, we assume that cluster sizes under consideration are much smaller than the size of the ROI). As any ROI size can be approximated by a sum of progressive halves and doubles of the original ROI size (in a manner similar to writing a binary floating point number), the corrected probabilities P and P' can be constructed as the analogical combinations of pk and p-k probabilities by using (23) in the corresponding iterative manner. For the actual estimates of the multiplicative correction factor see the following section.


We have calculated estimates of the unconditional and conditional probabilities P and P', respectively, in several studies (each study represents a unique combination of parameters described above). First, each study is based on over half a million simulated noise images. All studies were based on circular ROI's of ten thousand pixels each. Individual studies differed by the cluster connectivity condition (4-, and 8-neighbors) and by the spatial correlation of the noise (0, .25 and .5, respectively). Each study required over 100 hours on a PC with 200 MHz Pentium Pro II processor and was programmed in IDL in Windows NT environment (about 800 hours of total run time). The results of individual studies are presented as graphs in Fig. 2 to 15 for cluster sizes 2 to 8 pixels, and for minimal number of clusters one to five (or more). For each study, estimates of autocorrelation values at single pixel distance were calculated on more than ten thousand randomly selected images. The mean autocorrelation values differed less than 0.001from the requested values 0, 0.25 and 0.50, respectively, and the corresponding standard deviations were smaller than 0.01. 

Example #1 (Fig. 10 )  autocorrelation=0.25, 4-neighbor connectivity): The probability that at least one 3-pixel or larger cluster will be found in 10,000 pixel circular ROI when 50 thresholded pixels are available to form such a cluster is approximately 0.06 (the probability of at least two such clusters is less than 0.002).

Example #2 (Fig. 11) : If we have checked for the occurrence of the same kind of cluster as in the Example #1 at thresholds yielding 10, 20, 30 and 40 pixels and have not detected any, but found one or more when 50 pixels were thresholded up, the probability that such cluster(s) would be created by noise is approximately 0.03. 

Intuitively, when the event probability is very small (say, less than 0.01), doubling or halving both the number of thresholded voxels and the size of the ROI results in approximately doubling or halving the event probability on the new ROI, respectively. Using the approach described above we have calculated (Fig. 14) the correction factor R that enable us to estimate the event probability Pnew on new ROI


where Snew is the size of the new ROI, Sref is the area of the reference ROI for which the event probability Pref is known (tabulated). Note that for the reference probability Pref  <   0.07 the multiplicative correction factor R for ROI size fractions Snew / Sref between 0.5 and 1.1 differs from unit by less than 0.02.

Example #3 ( Fig. 2 and Fig. 14 ) :  Let us assume that the ROI has 5000 pixels and that 75 pixels are thresholded up. What is the unconditional probability Pref of 5-pixel cluster on this ROI? The reference ROI that we used to tabulate probabilities P and P' has 10000 pixels. Consequently, one can expect that the same threshold would result in approximately 150 pixels in the reference size ROI. The corresponding probability Pref of 4-pixel cluster on the reference ROI is approximately 0.03 (Fig. 2). As the ROI fraction Snew / Sref = 0.5 and R 1.013 (Fig. 14), the probability Pnew  approx.gif (57 bytes)  0.015. 

In the actual data acquisition the autocorrelation values depend on the scanner's parameters. For an illustration, we have scanned  calf brains in a plastic jar by GE SIGNA 1.5 T  MRI scanner with EPI pulse sequence, TR= 1066 ms, TE= 60 ms, 3  slices (thickness 7 mm, gap 1 mm), image size 64x64 pixels, FOV = 220 mm ,  total of 170 images of each slice in the sequence. The first four images of each slice were excluded from the analysis. After subtracting the time-average image to remove the "deterministic component", and after we defined relatively large brain ROIs (1100 to 1200 pixels)   on each slice, the following average autocorrelation values in brain ROIs were found: cx(1) 0.42 (in left to right direction) and cy(1) 0.18 (in superior to inferior direction, along the scanner's axis). The autocorrelation estimates varied from image to image with the standard deviations on the three slices between 0.06 and 0.08; actual autocorrelation differences between subsequent images were less than half as large, probably due to a slow "drift" of the scanner's acquisition parameters. (In case of brain activation studies one may consider to use as a reference an image sequences acquired under "no activation / quiet brain" conditions).

To make our results easier to use by the reader  we have generated a relatively large number of graphs of the cluster probabilities. The actual IDL computer code used in the Monte Carlo simulation  is available on request by e-mail from It permits the user to enter an arbitrary combination of cox and coy autocorrelation values. The complete list of graphs follows:

Fig_1 Autocorrelation difference (cox - cx ) between the continuous and discrete models at one pixel distance as a function of cx (5x5 Gaussian smoothing kernel and the continuous Gaussian kernel are using the same coefficients w - see Eqs. (3), (10) and (11).

Unconditional probabilities P and conditional probabilities P' (with the thresholded pixels' count incrementing by 10) for circular ROI of 10,004 pixels:
Fig__2  ( P)  and Fig__3 (P') : noise autocorrelation 0.00, and 8-neighbor connectivity.
Fig__4  ( P)  and Fig__5 (P') : noise autocorrelation 0.25, and 8-neighbor connectivity.
Fig__6  ( P)  and Fig__7 (P') : noise autocorrelation 0.50, and 8-neighbor connectivity.
Fig__8  ( P)  and Fig__9 (P') : noise autocorrelation 0.00, and 4-neighbor connectivity.
Fig_10 ( P)   and Fig_11 (P') : noise autocorrelation 0.25, and 4-neighbor connectivity.
Fig_12  ( P)  and Fig_13 (P') : noise autocorrelation 0.50, and 4-neighbor connectivity.

Fig_14 Cluster probability correction factor to compensate for change of ROI size.

Some segments of curves in Figures 2 to 13 for high (P, P' > 0.1) or very low ( P, P' < 0.001) probabilities, or for large clusters, are quite noisy. This is caused by limited number of noise images available during the Monte Carlo analysis. However, the associated inaccuracy of the curves at these probability ranges is of no practical importance: when the  probability is greater than 0.1, one usually does not reject the hypothesis that the clusters are a noise product, and when the probability is less than 0.001, the hypothesis is usually rejected anyway.

The used approach has the advantages in its intrinsic flexibility: (1) any connectivity model can be built in, (2) spatial noise autocorrelation values do not have to be isotropic (they may even vary locally), (3) the continuity assumption is not required, (4) simplifying assumptions to accomodate easy analytical derivations are not required, and (5) an arbitrary ROI may be used.


The authors wish to express their thanks to Drs. M. Mafee and D. Hier for their support, and to Dr. D. Fiat for his advice. This work has been also supported by internal UIC Neuro-science grant and by Cytogen, Inc..


  1. KJ Worsley, AC Evans, S Marrett, P Neelin. A three-dimensional statistical analysis for CBF activation studies in human brain. J. of Cer. Blood Flow and Metabolism, 12:900-918, 1992

  2. KJ Friston, KJ Worsley, RSJ Frackowiak, JC Mazota, AC Evans. Assessing the significance of focal activations using their spatial extent. Human Brain Mapping, 1:210-220, 1994

  3. J Xiong, J Gao, JL Lancaster, PT Fox. Clustered pixel analysis for functional MRI activation studies of the human brain. Human Brain Mapping, 3:287-301, 1995

  4. JJ Sychra, Q Lin, MJ Blend. The detection of metastatic prostate cancer with simultaneous dual radioisotope SPECT images. RSNA-EJ, 1, 1997

© 1999 Epress Inc.