.\" Hey, EMACS: -*- nroff -*- .\" First parameter, NAME, should be all caps .\" Second parameter, SECTION, should be 1-8, maybe w/ subsection .\" other parameters are allowed: see man(7), man(1) .TH SWATH 1 "January 2008" .\" Please adjust this date whenever revising the manpage. .\" .\" Some roff macros, for reference: .\" .nh disable hyphenation .\" .hy enable hyphenation .\" .ad l left justify .\" .ad b justify to both left and right margins .\" .nf disable filling .\" .fi enable filling .\" .br insert line break .\" .sp insert n+1 empty lines .\" for manpage-specific macros, see man(7) .SH NAME swath \- General-purpose Thai word segmentation utility .SH SYNOPSIS .B swath [options] \<\ \fIinfile\fP\ \>\ \fIoutfile\fP .br .SH DESCRIPTION Thai script has no word delimitor. Applications need to recognize word boundaries before they can do useful things with Thai text, such as line wrapping. .sp Swath provides word analysis filter to insert word delimitors into a given text stream. It reads text from standard input, analyzes it for word boundaries by consulting a Thai word list, and outputs to standard output the same text with the predefined word delimitors inserted. .sp Currently, it can read plain text, HTML, RTF, LaTeX and Lambda (Unicode version of LaTeX with Omega typesetter kernel) documents and insert common word delimitors for each format (pipe `|' for plain text). But user can always override this with a preferred delimitor. .SH OPTIONS .TP .B \-b [\fIdelimitor\fP] Define the string to be inserted as word delimitor in the output text. .TP .B \-d [\fIdict-path\fP] Specify alternative dictionary location. \fIdict-path\fP must be either a directory containing the swath dictionary file `swathdic.tri', or a path to the dictionary file itself. The dictionary file must be a trie file prepared using \fBtrietool\fP(1) utility from the libdatrie package. .sp If this option is given, \fBswath\fP will override normal dictionary search and will exit if the given dictionary cannot be found. Otherwise, if \fBSWATHDICT\fP environment is set, it will try to open dictionary from the location specified by its value. Otherwise, it will try the current working directory, and finally the usual installed location. .TP .B \-f [\fIformat\fP] Specify format of the input. Possible formats are: html, rtf, latex, lambda. .TP .B \-m [\fIscheme\fP] Choose word matching scheme when analyzing word boundaries. Possible schemes are `long' (for longest or greedy matching) and `max' (for maximal matching, with least words preferred). Maximal matching is the default value. .TP .B \-u \fIinput-enc\fP,\fIoutput-enc\fP Specify encodings of the input and the output. \fIinput-enc\fP and \fIoutput-enc\fP can be one of 'u' (for UTF-8 encoding) and 't' (for TIS-620 encoding). Swath will convert the character encoding as necessary. If this option is omitted, TIS-620 encodings on both input and output are assumed. .TP .B \-v, \-\-verbose Turn on verbose mode. .TP .B \-help, \-\-help Show help. .SH ENVIRONMENT VARIABLES .IP SWATHDICT If specified, \fBswath\fP will search for dictionary from this location before the usual places (current working directory and usual installed directory, respectively). This value is overridden by \fB\-d\fP option. .SH EXAMPLES For LaTeX (to be used with babel-thai package): .sp $ \fBswath\fP \-f latex < thaifile.tex > thaifile.ttex .br $ \fBlatex\fP thaifile.ttex .sp For HTML (to provide web pages to web browsers that cannot wrap Thai lines properly, but support the tag): .sp $ \fBswath\fP \-f html < myweb.html > myweb-wbr.html .sp To preprocess a Thai UTF-8 encoded LaTeX file for babel-thai with tis620 inputenc: .sp $ \fBswath\fP \-f latex \-u u,t < thaifile.tex > thaifile.ttex .br $ \fBlatex\fP thaifile.ttex .sp This is equivalent to filtering with \fBiconv(1)\fP: .sp $ \fBiconv\fP \-f UTF-8 \-t TIS-620 thaifile.tex | \fBswath\fP \-f latex > thaifile.ttex .br $ \fBlatex\fP thaifile.ttex .sp To use longest matching scheme with LaTeX document: .sp $ \fBswath\fP \-f latex \-m long < thaifile.tex > thaifile.ttex .br $ \fBlatex\fP thaifile.ttex .sp To use an alternative dictionary from libthai: .sp $ \fBswath\fP \-f latex \-d /usr/share/libthai/thbrk.tri < thaifile.tex > thaifile.ttex .SH AUTHOR This manual page was written by Theppitak Karoonboonyanan .