Cluster Analysis of Credit Cards
Authors: Anushka Namjoshi, Siddharth Singh
Introduction:
Cluster analysis is a statistical technique used to group a set of objects or data points into clusters based on their similarities. The goal is to categorize data in such a way that items in the same cluster are more similar to each other than to items in other clusters. This can help uncover patterns, structures, or relationships within the data.
Cluster analysis attempts to maximise the homogeneity of objects within the clusters while also maximising the heterogeneity between the clusters.
Objective:
The main objective of this report is to find out if there are any significant differences in the various characteristics among the clusters using Analysis of Variance (ANOVA). By understanding these differences and therefore the relationship between cluster membership and the mentioned attributes, a conclusion can be derived about the characteristics that differentiate the clusters.
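To make the ANOVA comparison concrete, the following is a minimal sketch of a one-way F statistic for a single rating item across clusters. The function name `one_way_anova_f` and the ratings are hypothetical illustrations, not values from the survey, and the sketch is not the SPSS procedure itself.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of ratings."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of clusters
    n = len(all_values)    # total number of cases

    # Between-cluster sum of squares: how far each cluster mean sits
    # from the grand mean, weighted by cluster size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within-cluster sum of squares: spread of cases around their own
    # cluster mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ratings for one characteristic (illustration only).
cluster_1 = [3, 2, 3, 3, 2]
cluster_2 = [1, 2, 2, 1, 2]

f_stat, df_b, df_w = one_way_anova_f([cluster_1, cluster_2])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F value means the cluster means differ by more than the spread within the clusters would suggest. Because K-means assigns cases so as to separate the clusters, F values computed on the clustering variables themselves are best read descriptively.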
Characteristics for the Analysis:
The cluster analysis used the following 10 characteristics as its primary variables:
- Cash Back
- Emergency financial access
- Credit limit
- Free airport lounge access
- CIBIL benefits
- Buy on credit
- Convenience of Transaction
- Improve spending habits
- Foreign Transaction Convenience
- Discount and offers
Data Collection:
The survey was conducted using Google Forms to evaluate perceptions of credit cards based on the 10 characteristics listed above.
The K-means clustering algorithm was applied, with the number of clusters set to two.
Data Preprocessing: Standardizing or normalizing the data ensures that all features contribute equally to the distance calculations.
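The standardization step can be sketched as a z-score transformation of each column. This is a minimal illustration under stated assumptions: the helper `standardize_columns` and the sample ratings are hypothetical, and the population standard deviation is used here, which may differ from the convention of a given statistics package.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Convert each column of `rows` to z-scores (mean 0, SD 1)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]  # population SD (an assumption)
    return [
        # A constant column has SD 0; map it to 0.0 instead of dividing by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Hypothetical ratings: one row per respondent, one column per characteristic.
ratings = [[1, 3], [2, 3], [3, 1], [2, 1]]
z_scores = standardize_columns(ratings)

for row in z_scores:
    print([round(v, 3) for v in row])
```

After this transformation every characteristic has the same scale, so no single variable dominates the Euclidean distances used by K-means.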
Data Analysis:
The data were analysed using IBM SPSS software.
- Cluster Analysis Results:
Cluster Membership:
Case Number | Cluster | Distance |
1 | 1 | 2.654 |
2 | 1 | 2.321 |
3 | 2 | 2.120 |
4 | 1 | 1.850 |
5 | 1 | 1.393 |
6 | 2 | .997 |
7 | 1 | 2.070 |
8 | 1 | 1.622 |
9 | 1 | 1.784 |
10 | 2 | 2.828 |
11 | 1 | 1.477 |
12 | 1 | 2.037 |
13 | 1 | 1.774 |
14 | 2 | 1.377 |
15 | 1 | 2.103 |
16 | 1 | 1.850 |
17 | 1 | 1.745 |
18 | 1 | 1.465 |
19 | 1 | 1.803 |
20 | 2 | 1.948 |
21 | 2 | 2.235 |
22 | 1 | 1.755 |
23 | 1 | 2.020 |
24 | 1 | 1.850 |
25 | 1 | 1.784 |
26 | 1 | 1.643 |
27 | 1 | 1.850 |
28 | 1 | 2.053 |
29 | 2 | .997 |
30 | 2 | .997 |
31 | 2 | 1.377 |
32 | 1 | 1.841 |
33 | 2 | .997 |
34 | 1 | 2.269 |
35 | 2 | 1.701 |
36 | 1 | 1.976 |
37 | 1 | 1.850 |
38 | 2 | 2.568 |
39 | 2 | 2.235 |
40 | 1 | 1.405 |
41 | 2 | 2.235 |
42 | 2 | 2.235 |
43 | 2 | 2.587 |
44 | 2 | 1.759 |
45 | 2 | 2.120 |
46 | 2 | .997 |
47 | 2 | 1.138 |
48 | 1 | 1.950 |
49 | 1 | 2.078 |
Number of Cases in each Cluster:
Group | Number of Cases |
Cluster 1 | 29.000 |
Cluster 2 | 20.000 |
Valid | 49.000 |
Missing | .000 |
Iteration History (a):
Iteration | Change in Cluster Center 1 | Change in Cluster Center 2 |
1 | 2.539 | 1.920 |
2 | .233 | .263 |
3 | .110 | .106 |
4 | .099 | .103 |
5 | .083 | .094 |
6 | .144 | .196 |
7 | .083 | .126 |
8 | .000 | .000 |
a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 8. The minimum distance between initial centers is 4.472.
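The iteration history records how far each cluster center moves per pass until the movement reaches zero. The following is a minimal sketch of that loop in plain Python. The function `k_means`, the starting centers, the sample points, and the stopping rule (largest center shift at or below `tol`) are assumptions chosen for illustration; they are not the exact SPSS criterion or the survey data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, centers, max_iter=10, tol=0.0):
    """Run k-means from the given starting centers.

    Returns (labels, centers, history); history[i] holds the distance
    each center moved during iteration i + 1.
    """
    centers = [list(c) for c in centers]
    history, labels = [], []
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        shifts = [euclidean(o, n) for o, n in zip(centers, new_centers)]
        history.append(shifts)
        centers = new_centers
        if max(shifts) <= tol:  # converged: no center moved
            break
    return labels, centers, history

# Hypothetical two-dimensional data (illustration only).
points = [[1, 1], [1, 2], [2, 1], [4, 4], [4, 5], [5, 4]]
labels, centers, history = k_means(points, centers=[[1, 1], [5, 4]])

for i, shifts in enumerate(history, start=1):
    print(i, [round(s, 3) for s in shifts])
print("distance between final centers:", round(euclidean(centers[0], centers[1]), 3))
```

Each printed row corresponds to one row of an iteration history table, and the last line shows the kind of between-center distance used to judge how well separated the final clusters are.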
Interpretation:
1. Cluster Composition:
- Cluster 1 (29 cases): This group generally rated items higher, with average ratings typically between 2.38 and 2.76. This suggests that individuals in this cluster are more positive in their evaluations.
- Cluster 2 (20 cases): This group exhibited lower average ratings, mostly falling between 1.55 and 1.90. These individuals appear to be less satisfied or more critical in their assessments.
2. Convergence and Stability:
- The analysis reached convergence by the 8th iteration, indicating that the cluster centers stabilized and that the classification of cases into clusters is robust. The minimal changes in cluster centers suggest that the groups are well-defined and consistent.
3. Significant Differences:
- The ANOVA results highlight statistically significant differences between the clusters for most rating items (p < .001). This indicates that the two clusters represent distinct perspectives or experiences related to the items evaluated. The significance reinforces the idea that these clusters are meaningful and not due to random variation.
4. Distance Between Clusters:
- The distance of approximately 2.536 between the final cluster centers indicates a substantial separation between the groups. This separation suggests that the characteristics of each cluster are quite different, further emphasizing the need to tailor strategies or interventions to address the specific needs or perceptions of each group.
Practical Implications:
- Targeted Strategies: Understanding that Cluster 1 consists of more positive evaluations and Cluster 2 represents a more critical viewpoint allows for tailored approaches. For instance, Cluster 2 might benefit from additional support or improvements in the areas they rated lower.
- Further Exploration: It may be beneficial to investigate the specific items or aspects that contribute to the differing ratings between clusters. This could provide insights into what drives satisfaction or dissatisfaction.
- Communication and Engagement: Different communication strategies may be required for each cluster. Engaging with the more critical group (Cluster 2) may involve addressing concerns directly, while the more positive group (Cluster 1) could be encouraged to share their positive experiences more widely.
In summary, the clustering analysis reveals two distinct groups with differing evaluations, which can inform targeted strategies for improvement and engagement based on the unique characteristics of each cluster.
Conclusion:
The K-means clustering analysis identified two distinct groups among the 49 cases based on item ratings.
1. Distinct Clusters: The analysis revealed two clusters, with Cluster 1 showing generally higher ratings than Cluster 2. This indicates a clear difference in how the two groups perceive credit cards.
2. Statistical Significance: ANOVA results indicated significant differences between the clusters for most rating items, confirming that the clusters are not only different but also that these differences are statistically meaningful.
3. Practical Implications: Understanding these clusters can help tailor strategies to address the varying perceptions of credit cards, potentially guiding marketing or product improvement efforts based on the preferences of each group.
Further investigation into the characteristics of each cluster could provide insights into the underlying reasons for these differences in ratings.