discuss: Proposal: CMU Sphinx HOWTO

Previous by date:	26 Oct 2003 14:16:15 -0000 Re: How to create a HOWTO on Red Hat Linux 9?, David Lawyer
Next by date:	26 Oct 2003 14:16:15 -0000 Re: general, Machtelt Garrels
Previous in thread:
Next in thread:	26 Oct 2003 14:16:15 -0000 Re: Proposal: CMU Sphinx HOWTO, Guylhem Aznar

Subject: Proposal: CMU Sphinx HOWTO
From: "Ivan A. Uemlianin" ####@####.####
Date: 26 Oct 2003 14:16:15 -0000
Message-Id: <3F9BDBAA.80502@jurakm.com>

Dear All

I'm starting to write up my experiences with CMU Sphinx, a suite of free
software for speech recognition.  I intend to write them up in DocBook.
The documentation currently available is poor.

Would the LDP be interested in cataloging or hosting it (and maybe
reviewing my early efforts, as described in the LDP Author Guide)?

I include a first sketch of the proposed document below (as it's quite
short).  It's in emacs outline format.  Each section has a sentence or
two describing what will be in it.

HOWTO or not, I'll be writing this (e.g. as a SphinxTrain Companion) so
any feedback at all would be most appreciated.  I've checked out the LDP
Author Guide and its links, and I'm dabbling in DocBook.  I think I may
write the first few drafts in something simpler, though.

Best wishes

Ivan Uemlianin


<howto>

A SphinxTrain HOWTO

* Summary
This HOWTO describes the SphinxTrain acoustic model building software
from Carnegie Mellon University (CMU).  Its aim is (a) to provide some
easy-to-follow documentation for the software, and (b) to provide a
brief introduction to some of the technicalities of automatic speech
recognition (ASR).  It is not a general introduction to speech
recognition on Linux (cf. the Speech Recognition HOWTO).

* Legal Notices
** Copyright
** GNU Free Documentation License (GFDL)
** Disclaimer
** Trademarks

* Preface
** How this document came about
** Aims of this document
*** supplement to tinydoc
*** for use with sphinx2 (use with sphinx3 in later versions, see TODO)
** Acknowledgements
** Contact the author
** TODOs
*** cover use with Sphinx3
*** sphinx2 HOWTO
*** sphinx3 HOWTO

* Introduction
** What is SphinxTrain?

* Preparation
** Getting SphinxTrain
The source should be downloaded from Sourceforge [ref].  There are also
some bugs which need to be addressed before SphinxTrain (ST) is compiled
and installed.
*** The source
*** Configuration and debugging
*** Compilation and installation
** Getting Ancillary Software
Building acoustic models (AMs) in general, and ST in particular, has
various prerequisite tasks and software.  They are dealt with in this
section.
*** Pronunciation Dictionaries
**** A couple of examples (CMUDict, BEEP)
**** A note on phonetic alphabets
**** A note on Pronunciation Dictionary design
*** Other requirements
For example, ST requires perl and csh.
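A quick way to check for these before starting might look like the
following.  This is only a sketch: it looks for the two tools named
above (perl and csh) on the PATH, and the real list of prerequisites may
well be longer.  The names REQUIRED_TOOLS and missing_tools are made up
for this illustration and are not part of ST.

```python
import shutil

# Tools named in this outline; the full list of prerequisites may be longer.
REQUIRED_TOOLS = ["perl", "csh"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All listed prerequisites were found.")
```

Finding a tool on the PATH says nothing about whether its version is
suitable, so this is a first check only.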
** Preparing the Training Data
*** What kind of thing is training data for SphinxTrain?
Audio files with accompanying orthographic transcriptions.
*** Getting it
Outside scope of this doc, but some notes anyway.  List some corpora,
repositories.
*** Preprocessing
There is a fair bit of preprocessing to do on the training data before
it can be used by ST.  This section goes through the preprocessing
step-by-step.
**** Step-by-step
**** Troubleshooting

* Running SphinxTrain
** Setting Up
Running setup_SphinxTrain
** Running RunAll.pl
The discussion in this section aims to give the user enough information
(a) to have a practical understanding of what is going on, and (b) to
have some idea of what to do if things go wrong.
*** RunAll.pl overview
*** Scripts called by RunAll.pl
**** 00.verify/verify_all.pl
**** 01.vector_quantize/slave.VQ.pl
**** 02.ci_schmm/slave_convg.pl
**** 03.makeuntiedmdef/make_untied_mdef.pl
**** 03.make_mdef/make_mdef.pl
**** 03.make_mdef/make_alltri_mdef.pl
**** 04.cd_schmm_untied/slave_convg.pl
**** 05.buildtrees/make_questions.pl
**** 05.buildtrees/slave.treebuilder.pl
**** 06.prunetree/slave.state-tie-er.pl
**** 07.cd-schmm/slave_convg.pl
**** 08.deleted-interpolation/deleted_interpolation.pl
**** 09.make_s2_models/make_s2_models.pl

* Technical Notes
** Stages in Building an Acoustic Model
The discussion in this section can be interpreted as (a) going over
similar ground to `Scripts called by RunAll.pl' but at a more technical
level, and/or (b) introducing some central technical concepts involved
in building AMs for ASR.
** Limitations of SphinxTrain
This section discusses some perceived (by me) limitations of ST, and
explores some ways of working around or overcoming them.
*** The `front end'
*** The requirement for untimed orthographic transcription in the training data
** Things which aren't limitations of SphinxTrain but which might seem to be
The things dealt with in this section might look like they're
limitations of ST, but they're more to do with the underlying problem of
building AMs.
*** The requirement for audio files of less than 60 secs
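As a rough illustration, the check below flags audio files that reach or
exceed the 60 second limit.  It is a sketch under one big assumption:
that the training audio is stored as WAV files readable by Python's
standard wave module.  Data in other formats would need a different
reader.  The function names are invented for this example.

```python
import wave

MAX_SECONDS = 60.0  # limit taken from the heading above


def duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def too_long(paths, limit=MAX_SECONDS):
    """Return the files whose duration is not strictly below `limit`."""
    return [p for p in paths if duration_seconds(p) >= limit]
```

Files reported by too_long would need to be split into shorter
utterances, together with their transcriptions, before training.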

* Glossary
This section lists comprehensively every technical term used in the rest
of the HOWTO, with appropriate definitions, links, and so on.

* References

Cook, S. (2002).  Speech Recognition HOWTO.  [todo: LDP url]

* Index of `Help Wanted's
Throughout the HOWTO there will be `Help Wanted' notes for bits of
information I don't have but which would be useful in the HOWTO (e.g.
Pronunciation dictionaries for languages other than English; Acoustic
data repositories other than those listed, etc.).  This section will
list them all.

</howto>
