# International journal of engineering technology and computer applications



Plant Clustering Using K-Means Approach

J Hencil Peter 1, Rev. Dr. A. Antonysamy, S.J 2

2 St. Xavier’s College, Palayamkottai, India – 627 002.

Abstract- Most plants have already been classified into several categories based on their nature, life style, etc. However, grouping/clustering them into a given number of clusters using their combined properties is an interesting task. In the proposed approach, the properties of the selected plants are listed, and each property is assigned a rank based on its importance. Once the ranks have been assigned, the clustering algorithm is applied to the input table, and the result table contains the grouped plants.
Keywords- Plant Clustering, Grouping Plants, K-Means Clustering Algorithm, Applications of K-Means Clustering Algorithm.

1. Introduction

Clustering is the process of partitioning or grouping a given set of patterns into disjoint clusters [2]. Many clustering algorithms have been proposed; notable ones include DBSCAN [3, 4], CLARA [10], CLARANS [9, 4], and Hierarchical Clustering [4, 5]. In this paper, we propose an approach for clustering plants using the K-Means Clustering Algorithm [1, 2, 4, 8]. The K-Means Clustering Algorithm groups the given N objects into K clusters (K <= N). However, this algorithm works only with numeric values and is not aware of plant details, so some initial work is needed before the plant details can be fed into the algorithm. This initial work involves assigning a rank (a numeric value) to each property of the plants based on its importance; the properties themselves are selected by a domain expert so that the plants are grouped correctly. In this paper, we have chosen 12 plants [7], with 3 properties each, for clustering. Since we use the K-Means clustering algorithm to solve the plant clustering problem, clustering is briefly explained in Section 2 and the K-Means algorithm in Section 3. Plant clustering and the relevant flow diagram are explained in Section 4. Experiment results are given in Section 5, and the paper ends with conclusions in Section 6.

2. Clusters and Clustering

A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters [4]. Clustering is the process of grouping data into classes or clusters; in other words, the process of grouping a set of physical or abstract objects into classes of similar objects is called clustering [4]. Clustering is also called data segmentation in some applications, because it partitions large data sets into groups according to their similarities [4, 6].

3. K-Means Clustering Algorithm

K-Means clustering is an algorithm for classifying or grouping objects, based on their attributes/features, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid [8]. Thus, the purpose of K-Means clustering is to classify the data.
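The quantity being minimized can be sketched as a small helper; this is an illustrative sketch, and the function name and sample points are ours, not from the paper:

```python
def within_cluster_ss(clusters, centroids):
    """Sum of squared distances between every object and its own cluster's
    centroid -- the objective that K-Means heuristically minimizes."""
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for p in points:
            # Squared Euclidean distance from point p to its centroid.
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total
```

For a single cluster {(0, 0), (0, 2)} with centroid (0, 1), each point contributes a squared distance of 1, so the objective value is 2.0.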

The K-Means Clustering Algorithm uses the partitioning method to group the given N objects into K clusters. While grouping the objects into K groups, the following conditions must be satisfied:

• Each group must contain at least one object.

• Each object must belong to exactly one group.

The algorithm takes the input parameter K and splits the set of N objects into K clusters so that the resulting intra-cluster similarity is high and the inter-cluster similarity is low. Cluster similarity (distance) is measured using the mean value of the objects in a cluster.

The following steps are used in the K-Means algorithm:

1. Select the K center points (centroids).

In the first iteration, the center points are chosen from the N objects sequentially or at random. From the second iteration onwards, the center points (centroids) are recomputed on a similarity basis: if a group (cluster) contains only one object, that object remains its centroid; otherwise, the average of all objects in the cluster becomes the new centroid.
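The centroid update described above can be sketched as follows (the function name and example points are ours, used only for illustration):

```python
def update_centroid(points):
    """New centroid of a cluster: the component-wise mean of its members.
    A single-member cluster keeps that member as its centroid, since the
    mean of one point is the point itself."""
    n = len(points)
    # zip(*points) groups coordinates by dimension: ((x1, x2, ...), (y1, y2, ...))
    return tuple(sum(coords) / n for coords in zip(*points))
```

For example, update_centroid([(1, 1), (3, 5)]) yields (2.0, 3.0), and update_centroid([(4, 7)]) yields (4.0, 7.0).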

2. Determine the distance from each object to the center points (centroids).

Usually, the Euclidean distance is used to find the distance between two points.

Given two points (x1, y1) and (x2, y2), the distance d between them is:

d = √((x2 − x1)² + (y2 − y1)²)
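The distance computation can be sketched directly from the formula (the function name is ours):

```python
import math

def euclidean_distance(p1, p2):
    """Euclidean distance d between (x1, y1) and (x2, y2):
    d = sqrt((x2 - x1)**2 + (y2 - y1)**2)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
```

For instance, euclidean_distance((0, 0), (3, 4)) returns 5.0, the classic 3-4-5 right triangle.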

3. Group (cluster) the objects based on the minimum distance.

Once step 2 is completed, we have a distance table showing the distance of each point from every center point (centroid). In this step, we group the objects based on the minimum distance between the centroids and the remaining (N − K) objects.

4. Repeat the above steps until no objects change group.

This condition can be checked by comparing the previous grouping with the present one: if they are the same, there is no need to repeat the steps.
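Steps 1–4 above can be combined into one minimal, self-contained sketch (sequential initial centroids, Euclidean distance; all names are ours, not from the paper):

```python
import math

def kmeans(objects, k, max_iter=100):
    """Minimal K-Means over points of any dimension, following steps 1-4.

    Step 1: take the first k objects as initial centroids (sequential pick).
    Step 2: compute the Euclidean distance from every object to each centroid.
    Step 3: assign each object to its nearest centroid.
    Step 4: recompute centroids as cluster means; stop when the grouping repeats.
    """
    centroids = [objects[i] for i in range(k)]           # step 1
    groups = None
    for _ in range(max_iter):
        new_groups = [[] for _ in range(k)]
        for obj in objects:                              # steps 2-3
            dists = [math.dist(obj, c) for c in centroids]
            new_groups[dists.index(min(dists))].append(obj)
        if new_groups == groups:                         # step 4 stopping test
            break
        groups = new_groups
        centroids = [                                    # step 4 centroid update
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups, centroids
```

On two well-separated pairs of points, kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2) converges in a few iterations to the two obvious clusters with centroids (0.0, 0.5) and (10.0, 10.5).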

4. Plant Clustering

Clustering the plants based on their similarities (here, similarity refers to the similarity between their selected properties) is the objective of this paper. To achieve this goal, the plants are first selected for clustering, and their properties are given ranks based on each property's importance/goodness. The more accurate the domain expert's rank selection, the more accurate the clustering result. After ranks are assigned to all the properties, a Rank Table is formed; each row of the table holds the rank values of the corresponding plant's properties. Since all the properties have been converted into numeric values, the K-Means algorithm can now be applied directly to group the plants. K (the number of clusters) and N are the inputs; the algorithm processes the input and groups the objects into K clusters based on their similarities.
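The rank assignment can be sketched as a simple lookup; the dictionaries below mirror the rank tables of Section 5, while the helper name and the normalized spellings (e.g. "Trees" throughout) are ours:

```python
# Rank lookups mirroring the rank tables in Section 5 (spellings normalized).
habitat_rank = {
    "Trees": 1, "Trees (or) Shrubs": 2, "Shrubs": 3, "Sub Shrubs": 4,
    "Perennial Herbs": 5, "Herbs": 6,
    "Biennial (or) Perennial Herbs": 7, "Annual (or) Perennial Herbs": 8,
}
leaves_rank = {
    "Evergreen & Spiral": 1, "Spiral": 2, "Stipulate": 3,
    "Stipulate (or) Exstipulate": 4, "Exstipulate": 5,
    "Succulent": 6, "Parallel Veined": 7, "Sheathing": 8,
}
flowering_rank = {"Bisexual": 1, "Bisexual (or) Unisexual": 2, "Unisexual": 3}

def to_rank_row(habitat, leaves, flowering):
    """Convert one plant's categorical properties into a numeric rank row,
    i.e. one row of the Rank Matrix fed to K-Means."""
    return (habitat_rank[habitat], leaves_rank[leaves], flowering_rank[flowering])
```

For example, to_rank_row("Trees", "Evergreen & Spiral", "Bisexual (or) Unisexual") gives (1, 1, 2), matching the first row of the Plant Table with Ranks in Section 5.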

Flow-Diagram - Clustering Plants

1. Select the plants and their important properties for clustering.
2. Assign a rank to each property based on its importance/goodness.
3. Feed the Rank Matrix (N) to the K-Means Algorithm.
4. Compute the center points (K) from the N objects.
5. Determine the distance of each object from the centroids.
6. Group the objects based on their minimum distance from the centroids.
7. If the grouping is unchanged, stop (Yes); otherwise repeat from step 4 (No).

5. Experiment Results

In the following experiment, 12 plants [7] are used, and 3 properties of each plant are chosen for clustering.

First Step: Plants and their important properties are selected.

| Botanical Name of the Plant | Dominated Habitat | Character of Leaves | Flowering |
| --- | --- | --- | --- |
| Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual |
| Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual |
| Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual |
| Asarum Canadense | Shrubs | Exstipulate | Bisexual |
| Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual |
| Acorus Calamus | Perennial Herbs | Sheathing | Bisexual |
| Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual |
| Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual |
| Aloe Marlothii | Herbs | Succulent | Bisexual |
| Brodiaea Elegans | Herbs | Sheathing | Bisexual |
| Lilium sp | Herbs | Spiral | Bisexual |
| Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual |

Second Step: A rank is assigned to each property.

The tables below show the property ranks.

Rank Table - Habitat Properties
| Habitat | Rank |
| --- | --- |
| Tree | 1 |
| Tree (or) Shrubs | 2 |
| Shrubs | 3 |
| Sub Shrubs | 4 |
| Perennial Herbs | 5 |
| Herbs | 6 |
| Biennial (or) Perennial Herbs | 7 |
| Annual (or) Perennial Herbs | 8 |

Rank Table – Leaves Character

| Leaves Character | Rank |
| --- | --- |
| Evergreen & Spiral | 1 |
| Spiral | 2 |
| Stipulate | 3 |
| Stipulate (or) Exstipulate | 4 |
| Exstipulate | 5 |
| Succulent | 6 |
| Parallel Veined | 7 |
| Sheathing | 8 |

Rank Table – Flowering Types

| Flowering | Rank |
| --- | --- |
| Bisexual | 1 |
| Bisexual (or) Unisexual | 2 |
| Unisexual | 3 |

Plant Table with Ranks

| Botanical Name of the Plant | Dominated Habitat | Leaves | Flowering |
| --- | --- | --- | --- |
| Laurus Nobilis | 1 | 1 | 2 |
| Annona Cherimola | 2 | 5 | 1 |
| Magnolia Grandiflora | 2 | 3 | 2 |
| Asarum Canadense | 3 | 5 | 1 |
| Piper Nigrum | 6 | 4 | 2 |
| Acorus Calamus | 5 | 8 | 1 |
| Agave Deserti | 4 | 7 | 1 |
| Allium praecox | 7 | 2 | 1 |
| Aloe Marlothii | 6 | 6 | 1 |
| Brodiaea Elegans | 6 | 8 | 1 |
| Lilium sp | 6 | 2 | 1 |
| Amaranthus | 8 | 2 | 3 |

Clustered Plants

Output When K = 3

| Botanical Name of the Plant | Dominated Habitat | Dominated Leaves Types | Flowering | Cluster |
| --- | --- | --- | --- | --- |
| Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual | 1 |
| Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual | 2 |
| Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual | 1 |
| Asarum Canadense | Shrubs | Exstipulate | Bisexual | 2 |
| Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual | 3 |
| Acorus Calamus | Perennial Herbs | Sheathing | Bisexual | 2 |
| Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual | 2 |
| Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual | 3 |
| Aloe Marlothii | Herbs | Succulent | Bisexual | 2 |
| Brodiaea Elegans | Herbs | Sheathing | Bisexual | 2 |
| Lilium sp | Herbs | Spiral | Bisexual | 3 |
| Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual | 3 |

Output when K = 4

| Botanical Name of the Plant | Dominated Habitat | Dominated Leaves Types | Flowering | Cluster |
| --- | --- | --- | --- | --- |
| Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual | 1 |
| Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual | 2 |
| Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual | 1 |
| Asarum Canadense | Shrubs | Exstipulate | Bisexual | 2 |
| Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual | 3 |
| Acorus Calamus | Perennial Herbs | Sheathing | Bisexual | 4 |
| Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual | 4 |
| Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual | 3 |
| Aloe Marlothii | Herbs | Succulent | Bisexual | 4 |
| Brodiaea Elegans | Herbs | Sheathing | Bisexual | 4 |
| Lilium sp | Herbs | Spiral | Bisexual | 3 |
| Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual | 3 |

6. Conclusions

In this paper, we have proposed a way of clustering plants using the K-Means approach. Besides K-Means, other clustering algorithms can also be applied to the Rank Matrix to obtain different clustering results; for example, if an arbitrary number of clusters is needed, the DBSCAN [3] algorithm can be applied to the Rank Matrix. The time-consuming part of this approach is creating the Rank Matrix, so it is preferable to have pre-processed Rank Matrix information available, both to minimize the overhead and to improve the result accuracy. Hence, a good rank-generation method needs to be developed to improve this approach.

References

1. R. C. Dubes and A. K. Jain. Algorithms for Clustering Data. Prentice Hall, 1988.

2. K. Alsabti, S. Ranka, and V. Singh, "An Efficient k-means Clustering Algorithm," Proc. First Workshop High Performance Data Mining, Mar. 1998.

3. Ester M., Kriegel H.-P., Sander J., and Xu X. (1996) “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise” In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland: Oregon, pp. 226-231.

4. Jiawei Han and Micheline Kamber, “Data Mining Concepts and Techniques”, 2006.

5. G. Karypis, E.H. Han, and V. Kumar. Chameleon: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 32(8):68–75, 1999.

6. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.

7. Michael G. Simpson, “Plant Systematics”, 2006.

8. Kardi Teknomo, "K-Mean Clustering Tutorials". Available at: http://people.revoledu.com/kardi/tutorial/kMean/index.html

9. R. T. Ng and J. Han. Efficient and Effective Clustering Methods for Spatial Data Mining. Proc. of the 20th Int’l Conf. on Very Large Databases, Santiago, Chile, pages 144–155, 1994.

10. L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.
