INTERNATIONAL JOURNAL OF ENGINEERING TECHNOLOGY AND COMPUTER APPLICATIONS.
Plant Clustering Using KMeans Approach
J. Hencil Peter^{1}, Rev. Dr. A. Antonysamy, S.J.^{2}
^{2} St. Xavier's College, Palayamkottai, India – 627 002.
Abstract – Most plants have already been classified into several categories based on their nature, life style, and similar characteristics. Grouping/clustering them into a given number of clusters using their combined properties, however, is an interesting problem. In the proposed approach, selected plant properties are listed and each property is assigned a rank based on its importance. Once the ranks have been assigned, a clustering algorithm is applied to the input table, and the result table contains the grouped plants.
Keywords—Plant Clustering, Grouping Plants, KMeans Clustering Algorithm, Applications of KMeans Clustering Algorithm.

Introduction
Clustering is the process of partitioning or grouping a given set of patterns into disjoint clusters [2]. Many clustering algorithms have been proposed; a few notable ones are DBSCAN [3, 4], CLARA [10], CLARANS [9, 4], and Hierarchical Clustering [4, 5]. In this paper, we propose an approach for clustering plants using the K-Means clustering algorithm [1, 2, 4, 8]. K-Means groups N given objects into K clusters (K <= N). However, the algorithm works only with numeric values and is unaware of plant details, so some initial work is needed to feed the plant details into the algorithm. This initial work involves assigning a rank (a numeric value) to each property of the plants based on its importance. The properties themselves are selected by a domain expert so that the plants are grouped correctly. In this paper, we have chosen 12 plants [7] with 3 properties each for clustering. Since the K-Means clustering algorithm is used to solve the plant-clustering problem, clustering is briefly explained in Section 2 and the K-Means algorithm in Section 3. Plant clustering and the corresponding flow diagram are described in Section 4, experimental results are given in Section 5, and conclusions follow in Section 6.

Clusters and Clustering
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters [4]. Clustering is the process of grouping data into classes or clusters; in other words, the process of grouping a set of physical or abstract objects into classes of similar objects is called clustering [4]. Clustering is also called data segmentation in some applications, because it partitions large data sets into groups according to their similarities [4, 6].

KMeans Clustering Algorithm
K-Means clustering is an algorithm that classifies or groups objects into K groups based on their attributes/features, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid [8]. Thus the purpose of K-Means clustering is to classify the data.
The K-Means clustering algorithm uses the partitioning method to group the N given objects into K clusters. While grouping the objects into K groups, the following conditions must be satisfied:

Each group must contain at least one object.

Each object must belong to exactly one group.
The algorithm takes the input parameter K and splits the set of N objects into K clusters so that the resulting intra-cluster similarity is high while the inter-cluster similarity is low. Cluster similarity (distance) is measured with respect to the mean value of the objects in a cluster.
The K-Means algorithm proceeds in the following steps:

Select the K center points (centroids).
In the first iteration, the center points are chosen from the N objects sequentially or at random. From the second iteration onwards, the centroids are recomputed on a similarity basis: if a cluster contains only one object (its centroid), no new centroid needs to be found for that cluster; otherwise, the average of all the objects in the cluster becomes its new centroid.

Determine the distance between each object and the center points (centroids).
Usually, the Euclidean distance is used to measure the distance between two points.
Formula for calculating the distance d between two points (x1, y1) and (x2, y2):

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

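As a sketch, the same computation generalises to any number of dimensions (the experiment below uses three ranks per plant):

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# For the two-dimensional case this reduces to the formula above:
# euclidean((x1, y1), (x2, y2)) == sqrt((x2 - x1)**2 + (y2 - y1)**2)
```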
Group (cluster) the objects based on the minimum distance.
Once step 2 is complete, we have a table of distances between each object and every center point (centroid). In this step, each of the remaining objects (N - K) is assigned to the cluster of its nearest centroid.

Repeat the above steps until the grouping no longer changes.
This condition is checked by comparing the previous grouping with the present one: if they are the same, the steps need not be repeated.
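The four steps above can be sketched in Python. This is a minimal illustration (random initial centroids, squared distances for the nearest-centroid comparison), not the exact implementation used in the paper:

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-Means following the four steps above."""
    rng = random.Random(seed)
    # Step 1: choose the initial K centroids from the objects at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 3: compute the distance from every object to every
        # centroid and assign each object to its nearest centroid
        # (squared Euclidean distance; omitting the square root does not
        # change which centroid is nearest).
        new_labels = [min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
                      for p in points]
        # Step 4: stop once the grouping no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its cluster's objects.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return labels

# Two well-separated 2-D groups; the function returns one label per point.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
print(kmeans(points, k=2))
```

On this toy data the first two points always end up in one cluster and the last two in the other, although which cluster is numbered 0 or 1 depends on the initial centroids.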

Plant Clustering
Clustering plants based on their similarities (similarity here refers to the similarity between their selected properties) is the objective of this paper. To achieve this goal, plants are first selected for clustering, and each selected property is assigned a rank based on its importance/goodness. The more accurately the domain expert selects the ranks, the more accurate the clustering result will be. After ranks have been assigned to all the properties, a Rank Table is formed in which each row holds the rank values of the corresponding plant's properties. Since all the properties have been converted into numeric values, the K-Means algorithm can now be applied directly for grouping the plants: K (the number of clusters) and the N objects are the inputs, and the algorithm groups the objects into K clusters based on their similarities.
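The rank-assignment step can be sketched as follows. The rank maps below cover only two of the paper's plants and the corresponding rank values from Section 5; in practice a domain expert assigns one rank per observed value of each property:

```python
# Partial rank maps (values taken from the rank tables in Section 5).
habitat_rank = {"Trees": 1, "Shrubs": 3}
leaves_rank = {"Evergreen & Spiral": 1, "Exstipulate": 5}
flowering_rank = {"Bisexual": 1, "Bisexual (or) Unisexual": 2}

plants = [
    ("Laurus Nobilis", "Trees", "Evergreen & Spiral", "Bisexual (or) Unisexual"),
    ("Asarum Canadense", "Shrubs", "Exstipulate", "Bisexual"),
]

# Each row of the Rank Table is the numeric encoding of one plant.
rank_table = [(habitat_rank[h], leaves_rank[l], flowering_rank[f])
              for _, h, l, f in plants]
print(rank_table)  # [(1, 1, 2), (3, 5, 1)]
```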
Flow Diagram – Clustering Plants
1. Select the plants and their important properties for clustering.
2. Assign a rank to each property based on its importance/goodness.
3. Feed the Rank Matrix (N objects) to the K-Means algorithm.
4. Compute the center points (K centroids) from the N objects.
5. Determine the distance of each object from the centroids.
6. Group the objects based on their minimum distance from the centroids.
7. If the grouping changed from the previous iteration (Yes), repeat from step 4; otherwise (No), stop.

Experiment Results
In the following experiment, 12 plants [7] are clustered, using 3 properties of each plant.
First step: the plants and their important properties are selected.
Botanical Name of the Plant | Dominated Habitat | Character of Leaves | Flowering
Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual
Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual
Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual
Asarum Canadense | Shrubs | Exstipulate | Bisexual
Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual
Acorus Calamus | Perennial Herbs | Sheathing | Bisexual
Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual
Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual
Aloe Marlothii | Herbs | Succulent | Bisexual
Brodiaea Elegans | Herbs | Sheathing | Bisexual
Lilium sp | Herbs | Spiral | Bisexual
Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual
Second step: a rank is assigned to each property value.
The tables below show the ranks of the properties.
Rank Table – Habitat Properties

Habitat | Rank
Trees | 1
Trees (or) Shrubs | 2
Shrubs | 3
Sub Shrubs | 4
Perennial Herbs | 5
Herbs | 6
Biennial (or) Perennial Herbs | 7
Annual (or) Perennial Herbs | 8
Rank Table – Leaves Character

Leaves Character | Rank
Evergreen & Spiral | 1
Spiral | 2
Stipulate | 3
Stipulate (or) Exstipulate | 4
Exstipulate | 5
Succulent | 6
Parallel Veined | 7
Sheathing | 8
Rank Table – Flowering Types

Flowering | Rank
Bisexual | 1
Bisexual (or) Unisexual | 2
Unisexual | 3
Plant Table with Ranks

Botanical Name of the Plant | Dominated Habitat | Leaves | Flowering
Laurus Nobilis | 1 | 1 | 2
Annona Cherimola | 2 | 5 | 1
Magnolia Grandiflora | 2 | 3 | 2
Asarum Canadense | 3 | 5 | 1
Piper Nigrum | 6 | 4 | 2
Acorus Calamus | 5 | 8 | 1
Agave Deserti | 4 | 7 | 1
Allium praecox | 7 | 2 | 1
Aloe Marlothii | 6 | 6 | 1
Brodiaea Elegans | 6 | 8 | 1
Lilium sp | 6 | 2 | 1
Amaranthus | 8 | 2 | 3
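With the Rank Table in place, the clustering step can be reproduced with a short script. The K-Means routine below is a minimal sketch rather than the paper's implementation, and since cluster numbering and composition depend on the initial centroids, a given run may differ from the paper's reported output:

```python
import random

# Rank matrix from the Plant Table with Ranks: (habitat, leaves, flowering).
plants = {
    "Laurus Nobilis": (1, 1, 2),       "Annona Cherimola": (2, 5, 1),
    "Magnolia Grandiflora": (2, 3, 2), "Asarum Canadense": (3, 5, 1),
    "Piper Nigrum": (6, 4, 2),         "Acorus Calamus": (5, 8, 1),
    "Agave Deserti": (4, 7, 1),        "Allium praecox": (7, 2, 1),
    "Aloe Marlothii": (6, 6, 1),       "Brodiaea Elegans": (6, 8, 1),
    "Lilium sp": (6, 2, 1),            "Amaranthus": (8, 2, 3),
}

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-Means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial centroids
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (squared distance).
        new = [min(range(k),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(p, centroids[c])))
               for p in points]
        if new == labels:                      # grouping unchanged: converged
            break
        labels = new
        for c in range(k):                     # recompute cluster means
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return labels

names, matrix = list(plants), list(plants.values())
labels = kmeans(matrix, k=3)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```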
Clustered Plants

Output when K = 3

Botanical Name of the Plant | Dominated Habitat | Dominated Leaves Types | Flowering | Cluster
Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual | 1
Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual | 2
Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual | 1
Asarum Canadense | Shrubs | Exstipulate | Bisexual | 2
Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual | 3
Acorus Calamus | Perennial Herbs | Sheathing | Bisexual | 2
Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual | 2
Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual | 3
Aloe Marlothii | Herbs | Succulent | Bisexual | 2
Brodiaea Elegans | Herbs | Sheathing | Bisexual | 2
Lilium sp | Herbs | Spiral | Bisexual | 3
Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual | 3
Output when K = 4

Botanical Name of the Plant | Dominated Habitat | Dominated Leaves Types | Flowering | Cluster
Laurus Nobilis | Trees | Evergreen & Spiral | Bisexual (or) Unisexual | 1
Annona Cherimola | Trees (or) Shrubs | Exstipulate | Bisexual | 2
Magnolia Grandiflora | Trees (or) Shrubs | Stipulate | Bisexual (or) Unisexual | 1
Asarum Canadense | Shrubs | Exstipulate | Bisexual | 2
Piper Nigrum | Herbs | Stipulate (or) Exstipulate | Bisexual (or) Unisexual | 3
Acorus Calamus | Perennial Herbs | Sheathing | Bisexual | 4
Agave Deserti | Sub Shrubs | Parallel Veined | Bisexual | 4
Allium praecox | Biennial (or) Perennial Herbs | Spiral | Bisexual | 3
Aloe Marlothii | Herbs | Succulent | Bisexual | 4
Brodiaea Elegans | Herbs | Sheathing | Bisexual | 4
Lilium sp | Herbs | Spiral | Bisexual | 3
Amaranthus | Annual (or) Perennial Herbs | Spiral | Unisexual | 3

Conclusions
In this paper, we have proposed a way of clustering plants using a K-Means approach. Besides K-Means, other clustering algorithms can also be applied to the Rank Matrix to obtain different clustering results; for example, if an arbitrary number of clusters is desired, the DBSCAN [3] algorithm can be applied to the Rank Matrix. The most time-consuming part of this approach is creating the Rank Matrix, so it is always better to have preprocessed Rank Matrix information available to minimize this overhead and improve result accuracy. Hence, a good rank-generation method needs to be developed to improve this approach.
References

[1] R. C. Dubes and A. K. Jain, Algorithms for Clustering Data, Prentice Hall, 1988.
[2] K. Alsabti, S. Ranka, and V. Singh, "An Efficient k-means Clustering Algorithm," Proc. First Workshop on High Performance Data Mining, Mar. 1998.
[3] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Proc. 2nd Int'l Conf. on Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, pp. 226-231, 1996.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2006.
[5] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling," IEEE Computer, 32(8):68-75, 1999.
[6] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[7] M. G. Simpson, Plant Systematics, 2006.
[8] K. Teknomo, "K-Mean Clustering Tutorials," available at: http://people.revoledu.com/kardi/tutorial/kMean/index.html
[9] R. T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. 20th Int'l Conf. on Very Large Databases, Santiago, Chile, pp. 144-155, 1994.
[10] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, 1990.