Advancing Data Clustering via Projective Clustering Ensembles
Projective Clustering Ensembles (PCE) are a very recent advance in data clustering research which combines the two powerful tools of clustering ensembles and projective clustering. Specifically, PCE enables clustering ensemble methods to handle ensembles composed of projective clustering solutions. PCE has been formalized as an optimization problem with either a two-objective or a single-objective function. Two-objective PCE has been shown to generally produce more accurate clustering results than its single-objective counterpart, although it can handle the object-based and feature-based cluster representations only independently of one another. Moreover, neither of the early formulations of PCE follows any of the standard approaches to clustering ensembles, namely instance-based, cluster-based, and hybrid.
In this paper, we propose an alternative formulation of the PCE problem which overcomes the above issues. We investigate the drawbacks of the early formulations of PCE and define a new single-objective formulation of the problem. This formulation is capable of treating the object-based and feature-based cluster representations as a whole, essentially tying them together in a distance computation between a projective clustering solution and a given ensemble. We propose two cluster-based algorithms for computing approximations to the proposed PCE formulation, which have the common merit of conforming to one of the standard approaches to clustering ensembles. Experiments on benchmark datasets have shown the significance of our PCE formulation, as both of the proposed heuristics outperform existing PCE methods.
F. Gullo, C. Domeniconi, A. Tagarelli. Advancing Data Clustering via Projective Clustering Ensembles. ACM International Conference on Management of Data (SIGMOD’11), pp. 733-744. Athens, Greece, June 12-16, 2011.
This paper has been awarded the SIGMOD'11 Repeatability/Workability Evaluation Test. Since 2008, SIGMOD has offered to verify the experiments published in papers accepted at the conference, by reproducing the experiments provided by the authors (repeatability) and by exploring changes to experiment parameters (workability).