Normalized cuts for image segmentation

Wei Liu (u0614581)
weiliu@sci.utah.edu


Introduction

This project implements normalized graph cuts for data clustering and image segmentation (which are essentially the same problem). First I give a brief introduction to the method; then I compare the effects of different definitions of the affinity matrix and their parameters. Next I compare graph cuts and normalized graph cuts on a simple image. Last, I test the method on a moderately complex real-world image.

Method

Standard graph cuts: The graph cuts method aims to optimize a constrained objective function

$\displaystyle \vec{x}^{T} \mathbf{A} \vec{x} + \lambda (\vec{x}^{T} \vec{x} - 1)$    

where $ \vec x$ is an indicator vector whose elements represent whether each data point belongs to the cluster. By definition the elements of $ \vec x$ take only the discrete values $ \{0, 1\}$ , but for the purpose of optimization this constraint is relaxed. The solution is given by the eigenvectors of the affinity matrix $ \mathbf{A}$ associated with its largest eigenvalues.
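As a minimal sketch of this relaxed solution (in Python rather than the Matlab used for the experiments; the toy affinity values below are made up for illustration), the leading eigenvector of the affinity matrix acts as a soft cluster indicator:

```python
import numpy as np

# Toy symmetric affinity matrix: points 0 and 1 are strongly similar,
# point 2 is weakly connected to both (values chosen only for illustration).
A = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

# For a symmetric matrix, eigh returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(A)
leading = vecs[:, np.argmax(vals)]  # eigenvector of the largest eigenvalue
```

The magnitudes of `leading` concentrate on the strongly connected pair, which is the relaxed analogue of the discrete indicator vector.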

Normalized graph cuts: Conventional graph cuts tends to classify a small number of isolated data points as a cluster, which is often not correct. Normalized graph cuts (NGC) defines a new objective function by normalizing the original graph-cut criterion:

$\displaystyle \mbox{Ncut}(A, B) = \frac{\mbox{cut}(A, B)}{\mbox{assoc}(A, V)} + \frac{\mbox{cut}(A, B)}{\mbox{assoc}(B, V)}$ (1)

The minimizer of the above objective function is, after relaxation, the solution of the following generalized eigen-decomposition problem

$\displaystyle (\mathbf{D} - \mathbf{W}) \vec{y} = \lambda \mathbf{D} \vec{y}$    

The eigenvector with the second smallest eigenvalue is the solution that minimizes (1). One can also use more than one eigenvector for clustering in multi-class problems.
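The generalized eigenproblem above can be sketched on a toy graph (a hypothetical 4-node example of my own; SciPy's `eigh` stands in for whatever solver the experiments used):

```python
import numpy as np
from scipy.linalg import eigh

# Toy weighted graph: nodes {0,1} and {2,3} form two tight pairs,
# joined by weak edges (weights chosen only for illustration).
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 0.9],
              [0.0, 0.0, 0.9, 0.0]])
D = np.diag(W.sum(axis=1))

# Solve (D - W) y = lambda * D y; eigenvalues come back in ascending order.
vals, vecs = eigh(D - W, D)
fiedler = vecs[:, 1]                 # second-smallest eigenvalue's eigenvector
labels = (fiedler > 0).astype(int)   # threshold at zero for a 2-way cut
```

Thresholding the second eigenvector at zero recovers the two tight pairs as the two clusters.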

Pattern recognition viewpoint: From a pattern recognition viewpoint, the eigenvectors (for either conventional or normalized graph cuts) can be seen as `features' of the data points. Hence for clustering we can use more eigenvectors than the number of clusters, and apply a general clustering method like K-means with the eigenvectors as input features. This strategy is better than manually thresholding eigenvectors (as the book chapter does), since it needs no human intervention.
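A minimal sketch of this strategy (the feature values and the tiny K-means implementation below are mine, not the report's Matlab code): stack the eigenvectors column-wise so each row is a per-point feature vector, then cluster the rows.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """A tiny Lloyd's-algorithm K-means, enough for small examples."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical feature matrix: one row per data point, one column per
# eigenvector (two eigenvectors used for a 2-cluster problem).
features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels = kmeans(features, k=2)
```

The same call works with more eigenvector columns than clusters, which is the point of treating eigenvectors as features.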

Affinity matrix: The affinity matrix is an $ N\times N$ matrix whose element $ A(i,j)$ is the affinity between data points $ x_i$ and $ x_j$ . In this project we use intensity-based affinity and distance-based affinity. Choosing the kernel size $ \sigma$ for these two affinities is tricky. For intensity-based affinity, the kernel size $ \sigma_i$ should be small, because two data points with a large intensity difference should never be in the same cluster. For distance-based affinity, the kernel size $ \sigma_d$ should not be too small, because even if two data points are far apart they can still be in the same cluster; hence the distance constraint should be looser.
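The two affinities can be sketched as Gaussian kernels multiplied element-wise (the function name and the exact kernel form are my assumption of the usual construction, with $\sigma_i \ll \sigma_d$ as argued above):

```python
import numpy as np

def affinity(coords, intensities, sigma_i=0.01, sigma_d=0.1):
    """Combined affinity: Gaussian kernel on intensity difference times
    Gaussian kernel on spatial distance. Both inputs are assumed to be
    normalized to [0, 1] beforehand."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    i2 = (intensities[:, None] - intensities[None, :]) ** 2
    return np.exp(-i2 / (2 * sigma_i ** 2)) * np.exp(-d2 / (2 * sigma_d ** 2))

# Toy example: two nearby pixels with equal intensity, one distant
# pixel with very different intensity.
coords = np.array([[0.0, 0.0], [0.0, 0.05], [1.0, 1.0]])
intensities = np.array([0.2, 0.2, 0.8])
W = affinity(coords, intensities)
```

The small $\sigma_i$ makes the intensity kernel collapse quickly for dissimilar pixels, while the larger $\sigma_d$ lets spatially distant but similar pixels keep nonzero affinity.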

Graph structure: There are different methods to generate the graph, such as the $ \varepsilon$ -neighborhood or $ k$ -nearest-neighbor method. In this project we simply generate a fully connected graph: each data point is connected to all other points, with different weights. For a large data set we may need a sparse representation of the affinity matrix, using the $ \varepsilon$ -neighborhood or nearest-neighbor method above to generate a sparse affinity matrix.
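A hedged sketch of the $\varepsilon$-neighborhood idea (function name and threshold value are mine): zero out affinities below a threshold and store the result as a sparse matrix.

```python
import numpy as np
from scipy import sparse

def sparsify(W, eps=1e-3):
    """Drop affinities below eps and return a compressed sparse matrix."""
    W = np.where(W >= eps, W, 0.0)
    return sparse.csr_matrix(W)

# Toy dense affinity: two strongly connected points plus one point whose
# affinities to the others fall below the threshold.
W_dense = np.array([[1.0, 0.5, 1e-5],
                    [0.5, 1.0, 1e-5],
                    [1e-5, 1e-5, 1.0]])
S = sparsify(W_dense)
```

Only the diagonal and the one strong pair survive, so the stored matrix has 5 nonzeros instead of 9.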

Normalization of distance and intensity affinity: Different data sets have different scales of distance between data points. For a $ 1000\times 1000$ image the largest distance between two data points is $ 1000\sqrt{2}$ , but for a $ 10\times 10$ image it is $ 10\sqrt{2}$ . It is therefore good to normalize the distance to the same scale $ [0, 1]$ , so that one parameter setting (i.e. the $ \sigma$ value) applies to most images. Analogously, the intensity affinity also needs normalization.

Experiments

Graph cuts on synthetic data: We first use a set of synthetic data points in 2D space to test graph cuts. The goal is to use a few data points with known ground truth to validate the behavior of graph cuts. In figure 1, I generated 80 data points in four clusters, with 20 points in each cluster. Only the distance-based affinity matrix is used, as no intensity information is available for this data set. Distances are normalized such that the distance between any two points lies in $ [0, 1]$ . After eigendecomposition, I visualized the eigenvectors and manually chose a threshold for each one. The clustering result is given in figure 2. It is not surprising that for this very simple problem even the most naive algorithm gets a good clustering.

Figure 1: Graph cuts on synthetic data points in 2D space. Top row, left: the generated data with ground-truth coloring for each cluster; middle: affinity matrix based on normalized distance ( $ \sigma_d = 0.1$ ); right: the affinity matrix after shuffling the data points. Bottom, from left to right: the 3 eigenvectors of the affinity matrix with the largest eigenvalues. The red dotted line is the threshold manually set for clustering.
\includegraphics[width = 0.3 \textwidth]{test_data.eps} \includegraphics[width = 0.3 \textwidth]{test_1_aff.eps} \includegraphics[width = 0.3 \textwidth]{test_1_aff_shfl.eps}
\includegraphics[width = 0.3 \textwidth]{test_1_ev_1.eps} \includegraphics[width = 0.3 \textwidth]{test_1_ev_2.eps} \includegraphics[width = 0.3 \textwidth]{test_1_ev_3.eps}

Figure 2: Clustering results for the data set in figure 1. Left: the true clusters; right: the clusters from graph cuts.
\includegraphics[width = 0.3 \textwidth]{test_data.eps}          \includegraphics[width = 0.3 \textwidth]{test_1_res.eps}

Comparing affinity matrices: In this experiment we compare distance-based affinity, intensity-based affinity, and the combination of both. As in the previous test, both distance and intensity differences are normalized to the range $ [0, 1]$ . (This normalization should not be confused with the normalization in `normalized graph cuts'.)

First we use only distance-based affinity. In this case the affinity does not use any information from the image, so I expect the result to contain big, coherent regions that do not match the image's regions. The result in figure 3 confirms this. In figure 3, it is hard to find a good threshold for either eigenvector. This is because, without image information, the eigenvectors obtained from the distance-based affinity matrix only represent the internal spatial coherence of the image, and are hence smooth. After thresholding at an arbitrary value as shown in the figure, the result is a circular region and two rectangular regions.

Figure 3: Clustering results on a synthetic $ 50\times 50$ gray-level image. Top left: original image; middle: affinity matrix based on distance ( $ \sigma_d = 0.1$ ); right: first eigenvector and threshold. Bottom left: second eigenvector and threshold; right: clustering result. The colored regions are the labels obtained from graph cuts, overlaid on the original image. The number of clusters given by the user is three.
\includegraphics[width = 0.3 \textwidth]{toy1_50.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1000_aff.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1000_ev_1.eps}
\includegraphics[width = 0.3 \textwidth]{toy1_50_1000_ev_2.eps}          \includegraphics[width = 0.25 \textwidth]{toy1_50_1000_res.eps}

Next we use only intensity-based affinity. I expect this scenario to give a much better result, as it uses information from the image. Because two pixels with a big intensity difference are unlikely to belong to the same cluster, intensity is an accurate and trustworthy measurement of affinity compared to spatial distance, so I choose a much smaller kernel size $ \sigma _i = 0.01$ for intensity. The result in figure 4 shows that, by carefully thresholding the eigenvectors, I am able to segment this simple image quite well. There are still small holes in the bigger regions, because there is no constraint on the data points' spatial neighborhood.

Figure 4: Clustering results on a synthetic $ 50\times 50$ gray-level image (the same image as in figure 3). Top left: original image; middle: affinity matrix based purely on intensity difference ( $ \sigma _i = 0.01$ ); right: first eigenvector and threshold. The red dotted lines are thresholds manually set by the user. Bottom left: second eigenvector and threshold; right: clustering result. The colored regions are the labels obtained from graph cuts, overlaid on the original image. The number of clusters given by the user is three.
\includegraphics[width = 0.3 \textwidth]{toy1_50.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_0100_aff.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_0100_ev_1.eps}
\includegraphics[width = 0.3 \textwidth]{toy1_50_0100_ev_2.eps}          \includegraphics[width = 0.25 \textwidth]{toy1_50_0100_res.eps}

Then we use both intensity- and distance-based affinity by multiplying them together. The result in figure 5 shows that the affinity matrix looks like the combination of the individual cases, as do the eigenvectors. The segmentation result is better than using only intensity-based affinity, as there are fewer holes in the big regions. This is because the distance constraint forces isolated pixels to merge with their neighbors. By choosing a larger $ \sigma_d$ , we could probably remove all holes in this particular data set. However, a bigger $ \sigma_d$ may not apply to other images in general.

Figure 5: Using both intensity- and distance-based affinity on a synthetic $ 50\times 50$ gray-level image (the same image as in figure 3). Top left: original image; middle: distance-based affinity matrix; right: combination of distance- and intensity-based affinity ( $ \sigma _i = 0.01, \sigma _d = 0.1$ ). Bottom left and middle: first and second eigenvectors and thresholds. The red dotted lines are thresholds manually set by the user. Right: clustering result. The colored regions are the labels obtained from graph cuts, overlaid on the original image. The number of clusters given by the user is three.
\includegraphics[width = 0.3 \textwidth]{toy1_50.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1100_affi.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1100_aff.eps}
\includegraphics[width = 0.3 \textwidth]{toy1_50_1100_ev_1.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1100_ev_2.eps} \includegraphics[width = 0.25 \textwidth]{toy1_50_1100_res.eps}

Linear combination of eigenvectors: For the examples in figures 3, 4 and 5 I did not observe a case where a single eigenvector is not enough for clustering. However, a combination of eigenvectors should have better discriminative strength than a single eigenvector. This is like feature extraction in pattern recognition: more features give better results, provided they are really good features and independent of each other.

Comparing graph cuts with normalized graph cuts: To compare the two methods, I use K-means for clustering instead of manually thresholding the eigenvectors. This guarantees a fair comparison between GC and NGC. Also note that K-means sometimes does not perform as well as manual thresholding, but as long as both GC and NGC use K-means, the comparison is fair.

Another motivation for using K-means is that it automatically clusters the data based on the eigenvectors, without human intervention.

Figure 6: Comparing GC and NGC on a synthetic $ 50\times 50$ gray-level image (the same image as in figure 3). The clustering step uses K-means. Left: original image; middle: segmentation using graph cuts; right: segmentation using normalized cuts.
\includegraphics[width = 0.3 \textwidth]{toy1_50.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1101_res.eps} \includegraphics[width = 0.3 \textwidth]{toy1_50_1111_res.eps}

NGC on a real image: To verify normalized graph cuts on a real image, I chose Matlab's built-in spine image. The results in figure 7 show that the method works. To human vision this image has three regions. If we set a strong constraint on distance, $ \sigma_d = 0.1$ , we get the first row of the figure. When given two clusters, NGC is able to find the foreground and background. When given three clusters, it finds one region of the third cluster (at the top left of the image), but not the other, spatially separated region of the same cluster (top right of the image). When given four clusters, NGC even tries to split the background. This is understandable, because the spatial distance constraint is so strong that even the background is classified into two clusters.

This made me wonder whether NGC would work better if I loosened the spatial constraint. The bottom row is for $ \sigma _d = 0.3$ . When given three clusters, NGC did find the other region of the third cluster (top right of the third image). When given 4 clusters, it tries to put some edge points into the fourth cluster. This is understandable, because to human vision there is no fourth cluster at all.

Figure 7: Test on a real-world spine image. The top row is for distance kernel $ \sigma_d = 0.1$ , the bottom row for $ \sigma _d = 0.3$ . 1st column: original image; 2nd column: segmented image given 2 clusters; 3rd column: given 3 clusters; 4th column: given 4 clusters.
\includegraphics[width = 0.23 \textwidth]{spine_70.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_1111_res.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_31111_res.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_41111_res.eps}
\includegraphics[width = 0.23 \textwidth]{spine_70.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_21111_0.3_res.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_31111_0.3_res.eps} \includegraphics[width = 0.24 \textwidth]{spine_70_41111_0.3_res.eps}

On large data: The affinity matrix has $ N\times N$ elements ($ N$ is the number of pixels in the image), and eigen-decomposition has $ O(N^3)$ complexity, which makes the computational cost prohibitive. I therefore tried to build a sparse affinity matrix and solve the eigenproblem on it at much smaller cost. To build the sparse affinity matrix, we redefine the connectivity between the vertices of the graph such that any affinity less than a certain value is set to zero. This $ \varepsilon$ -neighborhood definition generates a sparse matrix.
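The sparse route can be sketched with SciPy's `eigsh` standing in for Matlab's `eigs` (an assumption on my part; the toy graph below is the same kind of 4-node example used earlier, not data from the experiments). A small negative shift keeps the shift-invert factorization nonsingular while still targeting the eigenvalues nearest zero:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

# Toy sparse affinity: two tight pairs {0,1} and {2,3} with weak links.
W = sparse.csr_matrix(np.array([[0.0, 0.9, 0.1, 0.0],
                                [0.9, 0.0, 0.1, 0.0],
                                [0.1, 0.1, 0.0, 0.9],
                                [0.0, 0.0, 0.9, 0.0]]))
d = np.asarray(W.sum(axis=1)).ravel()
D = sparse.diags(d)
L = D - W

# Generalized problem (D - W) y = lambda D y; shift-invert around a small
# negative sigma finds the two smallest eigenvalues without densifying L.
vals, vecs = eigsh(L, k=2, M=D.tocsc(), sigma=-0.01)
order = np.argsort(vals)
fiedler = vecs[:, order[1]]          # second-smallest eigenvector
labels = (fiedler > 0).astype(int)
```

For an image, only the few needed eigenvectors are computed, which is the whole point over a full dense decomposition.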

Unfortunately, I got stuck on the usage of Matlab's eigs function and could not finish this experiment. But in principle this is doable.

Conclusion

Graph theory has been widely applied to computer vision problems in recent years, partly because of the rich set of methods available in this branch of mathematics. The good thing about graph cuts is the separation between the affinity matrix and the optimization of the clustering criterion (Ncut). We can freely choose any affinity (domain specific) between data points, and any optimization method (Ncut, Laplacian embedding, ...).

I am concerned about how to combine two affinity matrices (say, from two modalities of an image). Simple multiplication may not be the best method. We have to manually tune the kernel sizes of the two affinities to represent our belief in them, but there seems to be no general method for this combination.


Wei Liu 2010-04-30