Market Research Assignment-2
Cluster Analysis and ANOVA of iPad
M2 Batch
Group Members:
1. Samaya Rayaprolu
2. Shatakshi
3. Sneha Yadav
4. Paridhi Gangrade
Case Processing Summary

                        Cases
      Valid            Missing           Total
  N      Percent     N      Percent    N      Percent
  52     98.1        1      1.9        53     100.0

a. Squared Euclidean Distance used
b. Average Linkage (Between Groups)
Introduction and Objectives:
This analysis employs cluster analysis and ANOVA to assess variations in characteristics (e.g., display, processor, storage) among cases in a dataset of 53 observations. The goal is to identify natural groupings and evaluate the degree of distinction between these groups across various characteristics. The analysis uses Average Linkage (Between Groups) as the clustering method and Squared Euclidean Distance to calculate the distance between data points. These choices aim to build clusters based on similarities across multiple features and provide a detailed statistical view of the resulting clusters.
The process includes:
• Cluster Analysis: To examine how data points are grouped and understand the structural similarities within the dataset (a minimal code sketch of this setup follows the list).
• ANOVA: To measure whether significant differences exist between clusters on each characteristic, providing an understanding of which features most differentiate the groups.
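For readers who want to reproduce this setup outside SPSS, the following is a minimal sketch in Python using SciPy. The array `X`, the random placeholder ratings, and the attribute list are illustrative assumptions, not part of the original analysis; only the linkage method, the distance measure, and the two-cluster cut come from this report.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder data: 52 valid cases rated on 9 attributes
# (Display, Operating System, Battery Life, Processor, Camera,
#  Storage, Connectivity, Apple Pencil Support, Face ID).
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(52, 9)).astype(float)

# Average Linkage (Between Groups) with Squared Euclidean Distance,
# the settings listed under the Case Processing Summary.
Z = linkage(X, method="average", metric="sqeuclidean")

# Cut the dendrogram into two clusters, matching the two-cluster
# membership discussed later in this report.
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.bincount(labels)[1:])  # number of cases in each cluster
```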
Agglomeration Schedule

        Cluster Combined                              Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1       3           27          .000           0           0           29
2       36          44          3.000          0           0           8
3       20          43          3.000          0           0           22
4       10          29          3.000          0           0           27
5       17          53          4.000          0           0           13
6       22          26          4.000          0           0           15
7       4           23          4.000          0           0           8
8       4           36          4.500          7           2           25
9       46          51          5.000          0           0           17
10      2           48          5.000          0           0           33
11      24          38          5.000          0           0           22
12      6           21          5.000          0           0           25
13      12          17          5.000          0           5           16
14      37          50          6.000          0           0           21
15      22          28          6.000          6           0           32
16      1           12          6.000          0           13          23
17      41          46          6.500          0           9           20
18      14          39          7.000          0           0           38
19      9           16          7.000          0           0           24
20      41          47          7.667          17          0           31
21      34          37          8.000          0           14          32
22      20          24          8.000          3           11          31
23      1           5           8.000          16          0           29
24      9           18          8.500          19          0           38
25      4           6           8.500          8           12          30
26      13          35          9.000          0           0           40
27      10          30          9.500          4           0           37
28      42          45          10.000         0           0           41
29      1           3           10.200         23          1           37
30      4           8           10.333         25          0           33
31      20          41          10.875         22          20          36
32      22          34          10.889         15          21          36
33      2           4           11.357         10          30          43
34      15          19          12.000         0           0           40
35      7           11          12.000         0           0           39
36      20          22          12.417         31          32          39
37      1           10          12.619         29          27          48
38      9           14          13.833         24          18          43
39      7           20          15.500         35          36          42
40      13          15          16.000         26          34          47
41      42          52          17.000         28          0           46
42      7           49          17.063         39          0           44
43      2           9           17.267         33          38          45
44      7           25          18.529         42          0           45
45      2           7           18.952         43          44          47
46      33          42          20.333         0           41          51
47      2           13          21.375         45          40          48
48      1           2           26.561         37          47          49
49      1           32          32.435         48          0           50
50      1           40          36.021         49          0           51
51      1           33          47.031         50          46          0
Detailed Cluster Analysis
Case Processing Summary
The Case Processing Summary reveals that out of 53 cases, 52 are valid and only 1 is missing, ensuring that almost the entire dataset was considered. With a minimal number of missing cases, the analysis remains statistically robust, supporting reliable interpretations of the clustering and ANOVA results.
Agglomeration Schedule
The Agglomeration Schedule is central to understanding the clustering process, showing how clusters merge step by step as the distance between them increases. Each row includes:
• Stages: Clusters are combined in successive stages. Lower stages involve more similar clusters, while higher stages combine clusters with progressively greater dissimilarities.
• Coefficients: Coefficients reflect the squared Euclidean distances at which clusters merge, offering insight into their similarity.
Interpretation of Key Stages:
1. Early Stages: In the initial steps, clusters merge at low coefficients, reflecting high internal similarity. For instance:
o At Stage 1, clusters 3 and 27 merge with a coefficient of 0.000, meaning these clusters contain data points that are identical or nearly identical.
o By Stage 5, clusters 17 and 53 combine with a coefficient of 4.000, indicating moderate similarity but slightly more variance than earlier stages.
2. Middle Stages: By around Stage 25, the merging coefficient reaches 8.500, implying the clusters being combined are less homogeneous than those in earlier stages. This indicates that the clustering process has begun grouping data points with larger differences, creating broader clusters that are less specific but still retain identifiable similarities.
3. Later Stages: Towards the end, such as Stage 50 with a coefficient of 36.021, the merged clusters are highly dissimilar, showing that most natural clusters have already formed. These later stages reflect forced combinations of groups that are more heterogeneous, typical in hierarchical clustering as the process attempts to merge all data into fewer clusters. (A short sketch after this list shows how the coefficient jumps can guide the choice of cluster count.)
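The rising coefficients also suggest a simple elbow-style rule for choosing the number of clusters: look for the largest jump between consecutive stages. A minimal sketch of this check, reusing the linkage matrix `Z` from the earlier snippet (the variable name is carried over for illustration only):

```python
# The third column of the linkage matrix holds the merge distance
# at each stage, i.e., the Coefficients column of the schedule.
coefficients = Z[:, 2]

# A large increase between consecutive coefficients marks the point
# where the algorithm starts forcing dissimilar groups together.
jumps = np.diff(coefficients)
last_tight_stage = int(np.argmax(jumps)) + 1  # 1-based stage number

# Cutting the tree just after that stage leaves this many clusters:
n_clusters = len(coefficients) + 1 - last_tight_stage
print(last_tight_stage, n_clusters)
```

On the schedule above, the largest jump (36.021 to 47.031) occurs at the final merge, stage 51; cutting after stage 50 leaves two clusters, consistent with the two-cluster membership discussed below.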
Initial and Final Cluster Centres:
The Cluster Centres section provides insight into the values of each variable at the beginning and end of the clustering process, representing the average values for each characteristic in each cluster. This helps in understanding the underlying structure and distinctions among clusters.
• Initial Cluster Centres: Early cluster centres show the starting values for each variable, establishing the initial configurations based on characteristics like "processor" and "storage." For instance, one cluster may initially prioritize "processor" at a higher level, indicating a subgroup where processor performance is a stronger focal point.
• Final Cluster Centres: As the clustering process progresses, the centres for each cluster become more stable. By the final clusters, we see how each group has settled around specific average values for each characteristic. This stabilization implies that the clusters are well-defined by these final values, offering a clearer picture of the traits defining each cluster (illustrated in the sketch below).
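The initial-to-final stabilization described above is easiest to see with a k-means pass, which iteratively moves the centres until they settle. A minimal illustration with scikit-learn, reusing the hypothetical attribute matrix `X` from the earlier snippet (a sketch of the idea, not the exact SPSS procedure):

```python
from sklearn.cluster import KMeans

# Two clusters, matching the membership split reported below.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Final cluster centres: one row per cluster, one column per
# attribute; each entry is that cluster's mean on the attribute.
print(km.cluster_centers_.round(2))
```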
Cluster Membership and Distribution:
The Cluster Membership table assigns each case to a specific cluster and provides the distance of each case from its respective cluster centre.
With 31 cases in Cluster 1 and 21 in Cluster 2, we observe a slightly uneven distribution of cases:
• Cluster 1 contains a larger number of cases, suggesting it may capture a broader or more common set of characteristics.
• Cluster 2 is smaller, which could indicate a more specialized subset of cases with unique traits.
This membership structure allows us to see the grouping dynamics based on similarity in attributes and helps visualize how the dataset divides into naturally occurring groups. The distances indicate the relative closeness of cases to their respective centres, providing insight into cluster cohesion.
Distances Between Final Cluster Centres:
The distance between the final cluster centres measures the separation between clusters; its value of 3.476 is moderately large. This distance suggests:
• Moderate Distinction: A reasonable degree of differentiation exists between the clusters, indicating that each cluster has a unique set of characteristics that makes it distinct from the other.
• Interpretation: In hierarchical clustering, the distance between clusters helps assess the effectiveness of the clustering process. Here, 3.476 implies that while some overlap may exist, each cluster retains a specific identity. Both the cohesion distances and this separation can be recomputed directly from the centres and labels, as sketched below.
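A minimal sketch of this recomputation, under the same assumptions as the previous snippets (`km` and `X` are the hypothetical objects defined earlier):

```python
centres = km.cluster_centers_
labels = km.labels_  # cluster membership for each case

# Cases per cluster (the report observes a 31 / 21 split).
print(np.bincount(labels))

# Cohesion: distance of every case from its own cluster centre.
case_dist = np.linalg.norm(X - centres[labels], axis=1)
print(case_dist.round(2))

# Separation: Euclidean distance between the two final centres
# (reported as 3.476 for the actual data).
print(np.linalg.norm(centres[0] - centres[1]).round(3))
```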
ANOVA

                                        Sum of Squares   df   Mean Square     F      Sig.
Display               Between Groups        11.593        4      2.898      1.879    .130
                      Within Groups         72.484       47      1.542
                      Total                 84.077       51
Operating System      Between Groups        13.627        4      3.407      4.549    .003
                      Within Groups         35.200       47       .749
                      Total                 48.827       51
Battery Life          Between Groups         7.090        4      1.772      2.193    .084
                      Within Groups         37.987       47       .808
                      Total                 45.077       51
Processor             Between Groups        23.524        4      5.881      5.470    .001
                      Within Groups         50.534       47      1.075
                      Total                 74.058       51
Camera                Between Groups        16.436        4      4.109      4.366    .004
                      Within Groups         44.237       47       .941
                      Total                 60.673       51
Storage               Between Groups        11.679        4      2.920      3.225    .020
                      Within Groups         42.552       47       .905
                      Total                 54.231       51
Connectivity          Between Groups         6.483        4      1.621      1.366    .260
                      Within Groups         55.748       47      1.186
                      Total                 62.231       51
Apple Pencil Support  Between Groups        13.221        4      3.305      3.574    .013
                      Within Groups         43.471       47       .925
                      Total                 56.692       51
Face ID               Between Groups        34.262        4      8.566      7.634   <.001
                      Within Groups         52.738       47      1.122
                      Total                 87.000       51
Interpretation of ANOVA Results
1. Display
F-Statistic: 1.879 with a significance level (Sig.) of 0.130, indicating no significant difference in
“Display” across clusters. This variable does not appear to contribute meaningfully to group distinctions.
2. Operating System
F-Statistic: 4.549 with a p-value of 0.003, indicating a statistically significant difference in “Operating System” across clusters. This suggests that the operating system attribute varies across clusters, potentially contributing to the differences between groups.
3. Battery Life
F-Statistic: 2.193 with Sig. = 0.084, which is not below the 0.05 threshold, but close. This hints that battery life might show some differentiation across clusters, though not strongly significant.
4. Processor
F-Statistic: 5.470 with Sig. = 0.001, indicating a significant difference in “Processor” across clusters. This attribute likely contributes meaningfully to the distinctions between clusters.
5. Camera
F-Statistic: 4.366 with Sig. = 0.004, which is significant, suggesting that “Camera” varies significantly across clusters. This differentiation likely plays a role in distinguishing between the clusters.
6. Storage
F-Statistic: 3.225 with Sig. = 0.020, showing a significant difference in “Storage” across clusters. This result indicates that storage attributes contribute to group distinctions.
7. Connectivity
F-Statistic: 1.366 with Sig. = 0.260, indicating no significant difference in “Connectivity” across
clusters. Connectivity does not appear to contribute to cluster distinctions.
8. Apple Pencil Support
F-Statistic: 3.574 with Sig. = 0.013, suggesting a significant difference in “Apple Pencil Support” across clusters, indicating that this feature may play a role in cluster differentiation.
9. Face ID
F-Statistic: 7.634 with Sig. < 0.001, showing a highly significant difference in “Face ID” across clusters.
This attribute strongly differentiates the groups.
Summary of ANOVA Results:
The ANOVA results indicate significant differences across clusters for “Operating System,” “Processor,” “Camera,” “Storage,” “Apple Pencil Support,” and “Face ID,” which are likely key contributors to cluster distinctions. Attributes such as “Display,” “Battery Life,” and “Connectivity” show little to no meaningful differentiation across groups. This suggests that specific features, particularly Face ID and Processor, may be primary factors in defining the clusters, while others play a lesser role.
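Each row of the ANOVA table is a one-way ANOVA of a single attribute across the cluster groups. A minimal sketch with SciPy, reusing the hypothetical `X` and cluster `labels` from the earlier snippets (with the real data, the printed values would reproduce the table above):

```python
from scipy.stats import f_oneway

attributes = ["Display", "Operating System", "Battery Life",
              "Processor", "Camera", "Storage", "Connectivity",
              "Apple Pencil Support", "Face ID"]

for j, name in enumerate(attributes):
    # Split the j-th attribute's ratings by cluster membership.
    groups = [X[labels == k, j] for k in np.unique(labels)]
    f_stat, p_value = f_oneway(*groups)
    print(f"{name:22s} F = {f_stat:6.3f}   Sig. = {p_value:.3f}")
```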
ANOVA Effect Sizes

                                                       Point       95% Confidence Interval
                                                       Estimate      Lower       Upper
Display               Eta-squared                       .138          .000        .266
                      Epsilon-squared                   .065         -.085        .203
                      Omega-squared Fixed-effect        .063         -.083        .200
                      Omega-squared Random-effect       .017         -.020        .059
Operating System      Eta-squared                       .279          .041        .417
                      Epsilon-squared                   .218         -.040        .368
                      Omega-squared Fixed-effect        .214         -.040        .363
                      Omega-squared Random-effect       .064         -.010        .125
Battery Life          Eta-squared                       .157          .000        .289
                      Epsilon-squared                   .086         -.085        .228
                      Omega-squared Fixed-effect        .084         -.083        .225
                      Omega-squared Random-effect       .022         -.020        .068
Processor             Eta-squared                       .318          .069        .454
                      Epsilon-squared                   .260         -.010        .407
                      Omega-squared Fixed-effect        .256         -.010        .402
                      Omega-squared Random-effect       .079         -.002        .144
Camera                Eta-squared                       .271          .036        .409
                      Epsilon-squared                   .209         -.046        .359
                      Omega-squared Fixed-effect        .206         -.045        .355
                      Omega-squared Random-effect       .061         -.011        .121
Storage               Eta-squared                       .215          .004        .353
                      Epsilon-squared                   .149         -.081        .298
                      Omega-squared Fixed-effect        .146         -.079        .294
                      Omega-squared Random-effect       .041         -.019        .094
Connectivity          Eta-squared                       .104          .000        .221
                      Epsilon-squared                   .028         -.085        .155
                      Omega-squared Fixed-effect        .027         -.083        .153
                      Omega-squared Random-effect       .007         -.020        .043
Apple Pencil Support  Eta-squared                       .233          .013        .372
                      Epsilon-squared                   .168         -.071        .318
                      Omega-squared Fixed-effect        .165         -.069        .314
                      Omega-squared Random-effect       .047         -.016        .103
Face ID               Eta-squared                       .394          .135        .522
                      Epsilon-squared                   .342          .061        .481
                      Omega-squared Fixed-effect        .338          .060        .476
                      Omega-squared Random-effect       .113          .016        .185

a. Eta-squared and Epsilon-squared are estimated based on the fixed-effect model.
b. Negative but less biased estimates are retained, not rounded to zero.
Effect Size Interpretation:
Effect sizes provide additional insights by quantifying the extent to which variability in each characteristic is due to clustering:
– Eta-Squared: This measure ranges from low to moderate for most characteristics, indicating minimal to moderate cluster-based variance. For example, “Face ID” has an eta-squared of 0.394, suggesting it accounts for a substantial portion of cluster variance, whereas attributes like “Connectivity” have lower eta-squared values, indicating they contribute minimally to clustering differences.
– Epsilon-Squared and Omega-Squared: These measures adjust eta-squared to provide less biased estimates. Negative values for some characteristics reflect that these variables do not meaningfully vary by cluster, suggesting limited or non-significant contributions to the clustering structure. A worked check using the Face ID row follows this list.
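All of these effect sizes follow directly from the sums of squares in the ANOVA table. A minimal worked check for the Face ID row, using the standard fixed-effect formulas and only the numbers reported above:

```python
# Face ID row of the ANOVA table.
ss_between, ss_total = 34.262, 87.000
df_between, df_within = 4, 47
ms_within = (ss_total - ss_between) / df_within  # = 1.122

# Eta-squared: share of total variability attributable to clusters.
eta_sq = ss_between / ss_total

# Epsilon- and omega-squared: bias-adjusted alternatives.
epsilon_sq = (ss_between - df_between * ms_within) / ss_total
omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)

print(round(eta_sq, 3), round(epsilon_sq, 3), round(omega_sq, 3))
# -> 0.394 0.342 0.338, matching the Face ID row in the table.
```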
Conclusions and Implications:
This analysis offers a comprehensive examination of the clusters within the dataset, combining clustering insights with statistical evaluation through ANOVA.
Key takeaways include:
– Cluster Similarity and Distinctiveness: Clusters initially show clear distinctions, as seen in the early
agglomeration stages. However, later stages combine increasingly dissimilar points, resulting in broader and more heterogeneous clusters.
– ANOVA and Effect Sizes: Significant F-values and moderate to high effect sizes for variables like
“Operating System,” “Processor,” “Camera,” “Storage,” “Apple Pencil Support,” and “Face ID” suggest these attributes contribute notably to the cluster distinctions. In contrast, attributes with lower effect sizes, such as “Display” and “Connectivity,” show limited differentiation across clusters.
– Implications for Further Research: Future analyses could consider alternative clustering methods, additional variables, or larger sample sizes to detect more nuanced subgroups, enhancing insights into the factors driving subgroup characteristics.
In conclusion, while clustering helps organize the dataset and offers a structural overview, significant differences are primarily observed for a select set of attributes. This highlights the importance of these variables in defining clusters and suggests the potential benefit of further refinement in clustering approaches or variable selection to yield more detailed subgroup characteristics.