CUED Publications database

Scaling the iHMM: Parallelization versus Hadoop

Bratières, S and Van Gael, J and Vlachos, A and Ghahramani, Z (2010) Scaling the iHMM: Parallelization versus Hadoop. Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. pp. 1235-1240.

Full text not available from this repository.


This paper compares parallel and distributed implementations of an iterative, Gibbs-sampling machine learning algorithm. The distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM [1], whose parameters are learnt by blocked Gibbs sampling, with one step consisting of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is not on NLP-relevant scores but on runtime performance, measured by iteration duration, and on ease of development, deployment and debugging. © 2010 IEEE.
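The dynamic-program step referred to in the abstract is, in blocked Gibbs samplers for HMMs, typically a forward-filtering backward-sampling sweep that resamples the whole hidden state sequence at once. The following is a minimal sketch of that sweep for a finite (e.g. truncated) state space; all names and the specific formulation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def ffbs(obs, pi0, trans, emit, rng):
    """Forward-filtering backward-sampling (illustrative sketch).

    Draws one hidden state sequence from its posterior given the
    observations, which is the dynamic program inside a blocked
    Gibbs sweep over an HMM with K states.

    obs   : list of observation symbol indices, length T
    pi0   : (K,) initial state distribution
    trans : (K, K) transition matrix, rows sum to 1
    emit  : (K, V) emission matrix, rows sum to 1
    rng   : numpy Generator used for the backward sampling pass
    """
    T, K = len(obs), len(pi0)

    # Forward pass: filtered distributions p(s_t | obs[0..t]),
    # normalised at each step for numerical stability.
    alpha = np.zeros((T, K))
    alpha[0] = pi0 * emit[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: sample s_T from the last filtered distribution,
    # then each earlier state conditioned on its sampled successor.
    states = np.zeros(T, dtype=int)
    states[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = rng.choice(K, p=w / w.sum())
    return states
```

Because each iteration of the sampler runs this O(T K^2) sweep over the corpus, it is the natural unit to parallelize, whether across threads on one machine or across Hadoop map tasks holding disjoint sets of sequences.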

Item Type: Article
Divisions: Div F > Computational and Biological Learning
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:53
Last Modified: 18 Feb 2021 18:56
DOI: 10.1109/CIT.2010.223