



How to use XLMiner in Excel for k-means clustering

I've been working to understand how to apply k-means clustering to a small data set for a list of companies. The mean and standard deviation are given so that I can compute the normalized data. From my understanding of k-means clustering, I have to pick the centroids at random, where k = 3, and then keep adjusting the centroid locations until they no longer move, that is, until the cluster assignments stay the same from one iteration to the next. Basically, what I am supposed to do is show a scatter plot after each adjustment of the centroids.

I am having difficulty applying these procedures to my data set. I've watched and searched through many examples of how to do this step by step, but none has helped me understand it. This seems to be the best example I've come across, but even then I'm still a bit lost, because my problem is slightly different from the one it covers. I believe I have to calculate the distance between two data items using the Euclidean distance, but does that mean the distance between the z-score of sales and the z-score of fuel cost, or something else? This is why I am lost, even after reading through about a dozen PowerPoint decks and watching multiple videos.

The most progress I've made was finding a variety of data mining software, such as WEKA, Orange, and Excel add-ins such as XLMiner. However, these tools seem to provide only the end result, not the procedure required to get there. Any help is appreciated.

Edit: I've found some more solutions and thought I should add them in case anyone runs into the same issues.

1) I calculated the Euclidean distance using the Excel formula mentioned in this video. This is what the formula looks like:

=SQRT((B28-$B$52)^2+(C28-$C$52)^2)

keeping in mind that each cell reference points to where your data is stored. In this case, my cells are listed in the image here. If more information is needed, please let me know.

Personally, I'd treat your data as 2D: just the (x, y) pairs of Sales and Fuel Cost, though you could use all 4 variables and work with 4D points instead.

Step 1: Either pick random centers (3 of them: c_1, c_2, c_3), or split your data into 3 random clusters.
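On whether the distance is between "z-score sales and z-score fuel": in k-means the Euclidean distance is measured between one company's point (its z-scored Sales and Fuel Cost taken together) and one centroid, never between the two variables themselves. The Python sketch below is illustrative only. It assumes that B28:C28 hold one company's normalized Sales and Fuel Cost and B52:C52 hold a centroid (the post describes the cell layout only through an image), and every number in it is hypothetical. Excel's STANDARDIZE(x, mean, standard_dev) performs the same z-score step as z_score here.

```python
import math

def z_score(value, mean, std):
    """Standardize a raw value with a given mean and standard deviation."""
    return (value - mean) / std

def euclidean(point, centroid):
    """2-D Euclidean distance between a data point and a centroid.

    Same arithmetic as the Excel formula =SQRT((B28-$B$52)^2+(C28-$C$52)^2),
    assuming column B holds normalized Sales and column C normalized Fuel Cost.
    """
    return math.sqrt((point[0] - centroid[0]) ** 2 + (point[1] - centroid[1]) ** 2)

# Hypothetical means, standard deviations, and raw values (not from the post).
sales_mean, sales_std = 100.0, 20.0
fuel_mean, fuel_std = 1.0, 0.25

# One company becomes one 2-D point: (z-score of Sales, z-score of Fuel Cost).
company = (z_score(140.0, sales_mean, sales_std), z_score(0.75, fuel_mean, fuel_std))
centroid = (0.0, 0.0)  # e.g. a randomly chosen center c_1

print(company)                       # (2.0, -1.0)
print(euclidean(company, centroid))  # sqrt(5), about 2.236
```

Each company is compared against each of the 3 centroids this way, and it joins the cluster whose centroid is closest.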

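The answer stops at Step 1. Here is a minimal sketch of the rest of the standard k-means (Lloyd's) loop in Python. It is not the answerer's own continuation. It assumes 2-D points, uses the first Step 1 option (k randomly chosen data points as the starting centers), and needs Python 3.8+ for math.dist. The names kmeans and sample_points are hypothetical, and the data is made up. Each history entry holds the centroids and cluster labels after one adjustment, which is what one scatter plot per adjustment would draw.

```python
import math
import random

def kmeans(points, k=3, seed=0, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Returns (centroids, labels, history). Each history entry is a pair
    (centroids, labels): one snapshot per centroid adjustment.
    """
    rng = random.Random(seed)
    # Step 1: use k distinct data points, chosen at random, as initial centers.
    centroids = rng.sample(points, k)
    labels = None
    history = [(list(centroids), None)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster: the centers would not move again.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        history.append((list(centroids), list(labels)))
    return centroids, labels, history

# Hypothetical z-scored (Sales, Fuel Cost) pairs, for illustration only.
sample_points = [
    (-1.5, -1.2), (-1.3, -1.0), (-1.6, -0.9),
    (0.1, 1.4), (0.3, 1.2), (-0.1, 1.5),
    (1.4, -0.3), (1.6, -0.1), (1.5, -0.5),
]

centroids, labels, history = kmeans(sample_points, k=3, seed=0)
for step, (snapshot, snap_labels) in enumerate(history):
    print(step, [tuple(round(v, 3) for v in c) for c in snapshot], snap_labels)
print("final labels:", labels)
```

Because the starting centers are random, a different seed can converge to a different (sometimes worse) clustering. A common remedy is to run it several times and keep the result with the smallest total squared distance from points to their centroids.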