MSRLM: a scalable language modeling toolkit
Patrick Nguyen, Jianfeng Gao, and Milind Mahajan
{panguyen,jfgao,milindm}@microsoft.com

November 2007

Technical Report
MSR-TR-2007-144

Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
http://www.research.microsoft.com

Abstract

MSRLM is the release of our internal language modeling tool chain used in Microsoft Research. It was used in our submission to NIST MT 2006. The main difference from other freely available tools is that it was designed to scale to large amounts of data. We successfully built a language model on 40 billion words of web data within 8 hours on high-end hardware. It supports only a minimal set of features. Large gigaword language models may be consumed in first-pass machine translation decoding without further processing. This document briefly describes the implementation and usage of the tools.

It is our stated goal and hope that this release will be useful to the scientific community. The tool may not be used in a commercial product, to build models used in a commercial product, or for any commercial purpose. In addition, we require that you cite this technical report when publishing results obtained with this language model tool chain.

This report describes the LM tool, which is available at:
http://research.microsoft.com/research/downloads/details/78e26f9c-fc9a-44bb-80a7-69324c62df8c/details.aspx

Contents
1 General Description
1.1 Introduction ...