Hacker's guide
TEItools is extremely poorly documented. Here is some information
and a few tips. Hopefully this part will eventually grow into decent
documentation for TEItools. Any takers?
Invocation
The user interface to TEItools is a single script named
sgml2any. It should be invoked under some other
name, like linuxdoc2tei or
tei2tex, by means of a symbolic link or just
a copy. The syntax is as follows:
scriptname source_file style_options
source_file is the name of the source SGML file to
be processed.
Each style_option looks like
-style name or like -style
name=value.
Each name modifies the output of TEItools
in one way or another. The order of -style options is
significant. A single -style option may list several
comma-separated styles. More on styles below.
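To make the argument syntax concrete, here is a minimal Python sketch of how a command line of the form scriptname source_file -style a,b -style name=value can be split into a source file and an ordered list of styles. The helper parse_style_args is hypothetical and only models the syntax described above; it is not taken from sgml2any, which is written in Tcl/CoST.

```python
def parse_style_args(args):
    """Split [source_file, "-style", spec, ...] into (source_file, styles).

    Hypothetical helper modelling the documented syntax only. The returned
    list keeps command-line order, because the order of -style options
    is significant.
    """
    if not args:
        raise ValueError("missing source_file")
    source_file, rest = args[0], args[1:]
    styles = []
    i = 0
    while i < len(rest):
        if rest[i] != "-style" or i + 1 >= len(rest):
            raise ValueError(f"unexpected argument: {rest[i]!r}")
        # One -style option may carry several comma-separated styles.
        styles.extend(s for s in rest[i + 1].split(",") if s)
        i += 2
    return source_file, styles


print(parse_style_args(["doc.sgml", "-style", "split,toc_depth=2",
                        "-style", "CSS=style.css"]))
# ('doc.sgml', ['split', 'toc_depth=2', 'CSS=style.css'])
```

Keeping the styles in a flat list, rather than a dictionary, preserves the ordering that the real script relies on.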
You will probably create your own extensions to the TEItools
skeleton like I do. These can include:
- DTDs you use locally;
- Local styles;
- Local scripts and libraries.
To facilitate such local development, the TEItools home
directory can have a site subdirectory (possibly
symlinked to some directory outside the TEItools tree for easy
upgrades). It should have the same structure as the main TEItools
tree. Files are looked up in the site tree first and
then in the main tree.
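The site-first lookup rule can be sketched in Python as follows. The function name find_file and the idea of passing the TEItools home directory explicitly are assumptions made for illustration; the sketch only shows the search order described above.

```python
from pathlib import Path
from typing import Optional, Union


def find_file(teitools_home: Union[str, Path], relative: str) -> Optional[Path]:
    """Return the first existing copy of `relative`, checking site/ first.

    Hypothetical helper: the site tree mirrors the main tree, so the same
    relative path is tried under <home>/site and then under <home>.
    """
    home = Path(teitools_home)
    for root in (home / "site", home):
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None
```

Because the site tree has the same layout as the main tree, a local file overrides the distributed one simply by existing at the same relative path.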
Front-ends and back-ends
sgml2any determines the frontend and backend to
use just by its name: if it is started as
tei2rtf, then the frontend is taken to be
tei, and the backend is rtf.
Each frontend is placed in a subdirectory under
$SGML_HOME. In turn, each backend lives in a subdirectory under
the frontend's directory. The backend directory contains
a CoST script named script plus any supplementary files it
needs. The backend directory also contains a styles subdirectory,
which in turn contains the style files.
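The name-to-directory mapping can be illustrated with a short Python sketch. It assumes the invocation name splits at its first "2" (as in tei2rtf or linuxdoc2tei); the text does not spell out the exact rule, and split_name and backend_layout are hypothetical helpers rather than part of TEItools.

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def split_name(invoked_as: str) -> Tuple[str, str]:
    """'tei2rtf' -> ('tei', 'rtf'); splits at the first '2' (an assumption)."""
    name = os.path.basename(invoked_as)
    frontend, sep, backend = name.partition("2")
    if not (sep and frontend and backend):
        raise ValueError(f"cannot derive frontend/backend from {name!r}")
    return frontend, backend


def backend_layout(sgml_home: str, invoked_as: str) -> Dict[str, Path]:
    """Paths of the backend's CoST script and its styles directory."""
    frontend, backend = split_name(invoked_as)
    backend_dir = Path(sgml_home) / frontend / backend
    return {"script": backend_dir / "script", "styles": backend_dir / "styles"}
```

For example, with $SGML_HOME set to /opt/sgml, invoking the script as tei2rtf would point at /opt/sgml/tei/rtf/script and /opt/sgml/tei/rtf/styles.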
Each backend script handles document presentation
independently, which distinguishes TEItools from DSSSL-based
software. (Please don't tell me I'm doing it the wrong way;
you won't be the first to do so, and not even in the first dozen
:-)). I try to be consistent across scripts as far as it
makes sense, so please report all divergences as bugs.
Style files are used by simply appending them to the main script,
so they can rename existing Tcl procedures and substitute
their own equivalents. Styles for, say, tei to
rtf conversion are looked up in the $SGML_HOME/tei/rtf/styles directory.
If a style has the name=val form, then a global Tcl
variable TEItools_name_value is set to
val. It can then be checked from within the main
script or other styles.
If a style has the plain name form, then a global Tcl
variable TEItools_name_in_use is set to some
value. This way, styles can check for the presence of other
styles.
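The naming rule for these variables can be sketched in Python. This assumes that name in TEItools_name_value and TEItools_name_in_use stands for the style's own name. The text only says the _in_use variable is set to "some value", so the "1" used here is a placeholder. style_variables is a hypothetical helper, not TEItools code.

```python
from typing import Dict, Iterable


def style_variables(styles: Iterable[str]) -> Dict[str, str]:
    """Map style options to the global Tcl variable names described above.

    name=val -> TEItools_<name>_value = val
    name     -> TEItools_<name>_in_use = "1" (placeholder; real value unspecified)
    Later styles overwrite earlier ones with the same name.
    """
    variables: Dict[str, str] = {}
    for style in styles:
        name, sep, value = style.partition("=")
        if sep:
            variables[f"TEItools_{name}_value"] = value
        else:
            variables[f"TEItools_{name}_in_use"] = "1"
    return variables
```

For instance, style_variables(["split", "toc_depth=2"]) yields TEItools_split_in_use and TEItools_toc_depth_value set to "2".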
To understand the scripts, you have to know Tcl,
CoST, and,
for the RTF backend, RATFINK.
Localization
TEItools is designed to be easily internationalized.
It works pretty well in Russian, since that's what I
use daily. English, Russian, French, Finnish and Czech
localization files are included in the distribution. To
create support data for a new language, clone one of
them.
Everything locale-dependent is in
$SGML_HOME/lib/locale.LANG.tcl
files. LANG is defined this way:
- If the top-level <tei.2> element carries a lang
attribute, its value is taken. Otherwise,
- If the SGML_LANG environment variable is
defined, its value is taken. Otherwise,
- If the LANG environment variable is defined,
its value is taken. Otherwise,
- en is taken.
Note that the LANG value is converted to lower case to form
the file name.
Locale files specify the following.
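The precedence rules for LANG amount to a short fallback chain, sketched here in Python. resolve_lang and locale_file are hypothetical helpers. Treating an empty variable as undefined is an assumption of the sketch, not something the text states.

```python
import os
from typing import Mapping, Optional


def resolve_lang(tei2_lang: Optional[str],
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Apply the documented precedence, then lower-case the result."""
    if env is None:
        env = os.environ
    # <tei.2 lang=...> beats SGML_LANG, which beats LANG; empty values are
    # treated as undefined here (an assumption of this sketch).
    for candidate in (tei2_lang, env.get("SGML_LANG"), env.get("LANG")):
        if candidate:
            return candidate.lower()
    return "en"


def locale_file(sgml_home: str, lang: str) -> str:
    """Path of the locale file for an already lower-cased LANG value."""
    return os.path.join(sgml_home, "lib", f"locale.{lang}.tcl")
```

For example, a document with <tei.2 lang="RU"> would pick $SGML_HOME/lib/locale.ru.tcl regardless of the environment.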
First, there should be a localize
substitution[2], which defines some wordings in your
native language, such as Table of contents.
Second, there should be functions whose actions are
locale-dependent. Currently, they are the following
(parentheses show which backend uses each function).
- openLang { lang } (LaTeX)
- Returns the appropriate \selectlanguage{}
command. If your TeX installation doesn't use the
babel package, return an empty string. Example:
\selectlanguage{english}.
- closeLang{} (LaTeX)
- Returns the closing \selectlanguage{} command.
- tabTitle { number } (RTF)
- Returns the table caption. number gives the table
number. Example: "Table ${number}:"
- figTitle { number } (RTF)
- Returns the figure caption. number gives the figure
number.
- appendixPrefix { number } (RTF)
- Appendix letter generator. For example, if
your appendices are lettered "A", "B" and so on,
then [appendixPrefix 0] should return "A", [appendixPrefix 1] "B", etc.
Styles
Note that each style applies to one backend only.
HTML
- number_heads
- Adds numbering to
divisions' headings.
- split
- Splits the output at
first-level divisions.
- splitLevel=NN
- Splits the output at
NN-level divisions.
- toc_ref
- Adds a link to the table of
contents at the end of each first-level
division.
- toc_depth=NN
- Sets TOC
depth to NN. Defaults to 3.
- forced_toc
- Appends a generated
table of contents.
- framed_toc
- Generates the table of
contents in a separate frame.
- frames=X:Y
- Sets the ratio of the left
frame to the right frame when using framed_toc. Defaults
to 1:4.
- indent_para
- Indents the first line
of each paragraph.
- CSS=URL
- Uses the cascading
style sheet at URL.
- no_signature
- Suppresses the
"Converted by TEItools" line at the bottom of the
HTML page.
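As a worked example of the frames=X:Y ratio, the Python sketch below turns a ratio into left and right width percentages. The text does not say how TEItools turns the ratio into frame sizes, so the proportional split and the helper name frame_widths are both assumptions made for illustration.

```python
from typing import Tuple


def frame_widths(spec: str = "1:4") -> Tuple[int, int]:
    """Hypothetical: split the window proportionally to an X:Y ratio.

    Returns (left_percent, right_percent); the right side absorbs rounding
    so the two always add up to 100.
    """
    left, right = (int(part) for part in spec.split(":"))
    if left <= 0 or right <= 0:
        raise ValueError("ratio parts must be positive")
    left_pct = round(100 * left / (left + right))
    return left_pct, 100 - left_pct
```

Under this reading, the default 1:4 gives the table-of-contents frame one fifth of the width: frame_widths("1:4") is (20, 80).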
TeX
- 11pt,12pt
- Changes the main font
size from the default 10pt to 11pt or 12pt.
- scale=NN
- Sets the
magnification factor to NN.
- page_headers
- Changes the page style to
headers with the section name and page number,
separated by a rule. By default, pages have the page
number at the bottom.
- running_title
- Creates "running
title" instead of usual title page.
- linuxdoc_title
- Creates a
linuxdoc-sgml-like title page. Requires the
running_title style.
- no_title
- Disables title page
generation.
- no_front_matter
- Disables numbering of
front matter pages with roman numerals.
- ps_fonts
- Switches to using
PostScript fonts as opposed to METAFONT ones.
- pdf
- Makes the resulting .tex file
suitable for processing with pdflatex to generate
PDF. Requires the ps_fonts style.
- skip=NN
- Sets the baseline
stretch to NN. Useful values are 1.5
and 2.
- parskip
- Adds 1ex of vertical
space between paragraphs.
- numdepth=N
- Sets N as the
maximum level of <div>s that get numbered.
Defaults to 3.
- openpage=gi
- gi here
should be one of div1, div2, etc. Each
gi element will start on a new page.
- firstpage=N
- Sets N as the
number of the first text page (that is, the one right
after the title page).
- landscape
- Switches to landscape
mode. You have to pass -t landscape to
dvips to obtain landscape PostScript output.
- float_pages
- Forces floats
(illustrations) to be placed on separate pages.
- twocolumn
- Typesets text in two
columns.
- two_side
- Typesets text for
double-sided printing.
- small_tables
- Reduces the font size
in tables.
- fancyvrb
- Encloses examples (<eg>
elements) in a frame.
- raggedright
- Sets the document body
text flush left.
RTF
- linuxdoc_title
- See above.
- dont_number_heads
- Removes
numbering from div headings.
- external_figs
- Uses separate BMP
files instead of inlining bitmaps into the RTF
code. StarOffice seems to require the former, MS
Word seems to require the latter, and
ApplixWords seems satisfied with
neither. Sigh.
RTF specifics
The RTF backend now supports user-defined styles. For an
example, see
$SGML_HOME/site/tei/rtf/styles/userstyle.
To update page references (e.g. in the table of
contents), load the resulting RTF file into MS Word and press
Ctrl-A, then F9 (select all and
update fields).