ICA 2007
London, UK, 9-12 September 2007
Paper No: 120

Text Clustering on Latent Thematic Spaces: Variants, Strengths and Weaknesses

Author(s): Xavier Sevillano, Germán Cobo, Francesc Alías, Joan Claudi Socoró

Abstract

Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task due to several issues, such as the difficulty of determining a priori the optimal document indexing technique to apply. This work presents an empirical comparison of several latent thematic generative models applied to the text clustering problem. As the results demonstrate, document representations on latent thematic spaces can lead to improved clustering, but none of these models can be guaranteed a priori to be superior. To overcome this, we propose creating consensus clusterings from several document representations. Experiments conducted on subsets of two standard text corpora evaluate several clustering strategies applied to latent thematic spaces and highlight the appropriateness of our proposal.
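The idea of building a consensus clustering from several document representations can be illustrated with a minimal sketch. The abstract does not say which consensus function the authors use, so the co-association (evidence accumulation) approach below is an assumption chosen for illustration, as are the function names `co_association` and `consensus_labels`, the `threshold` parameter, and the toy label lists standing in for clusterings obtained on different latent spaces.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions in which documents i and j share a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Assumed consensus rule: link two documents when they are
    co-clustered in more than `threshold` of the partitions, then
    take connected components of that graph as consensus clusters."""
    coassoc = co_association(partitions)
    n = len(coassoc)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal over the thresholded co-association graph.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical clusterings of six documents, one per representation.
partitions = [
    [0, 0, 0, 1, 1, 1],  # e.g. clustering on one latent space
    [1, 1, 0, 0, 0, 0],  # a second space that misplaces document 2
    [0, 0, 0, 1, 1, 1],  # a third space agreeing with the first
]
print(consensus_labels(partitions))
```

In this toy case the majority of the partitions outvote the one that disagrees about document 2, which is the intuition behind combining representations when no single one can be trusted a priori.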
Last Updated: 14-Aug-2007