
Hacker's guide

TEItools is extremely poorly documented. Here are some notes and tips. Hopefully this part will eventually grow into decent documentation for TEItools. Any takers?

Invocation

The user interface to TEItools is a single script named sgml2any. It should be invoked under some other name, such as linuxdoc2tei or tei2tex, by means of a symbolic link or just a copy. The syntax is as follows:

scriptname source_file style_options

source_file is the name of the source SGML file to be processed.

Each style_option has the form -style name or -style name=value.

Each name modifies the output of TEItools in one way or another. The order of -style options is significant. A single -style option may list several comma-separated styles. More on styles below.
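As a rough illustration of how the -style arguments read, the shell sketch below expands a command line into one style per line, in the order given. expand_styles is a hypothetical helper written for this guide, not part of TEItools; the sample command line uses the tei2tex name mentioned above and styles from the TeX list in this guide.

```shell
# Hypothetical helper, not part of TEItools: print each style named on a
# command line of the form "source_file -style a,b -style c=val",
# one per line, preserving the order in which the styles were given.
expand_styles() {
    if [ "$#" -gt 0 ]; then
        shift                   # skip source_file
    fi
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "-style" ] && [ "$#" -ge 2 ]; then
            # A single -style may carry several comma-separated styles.
            printf '%s\n' "$2" | tr ',' '\n'
            shift 2
        else
            shift               # ignore anything that is not a -style pair
        fi
    done
}

# Same arguments as: tei2tex mydoc.sgml -style 12pt,twocolumn -style skip=1.5
expand_styles mydoc.sgml -style 12pt,twocolumn -style skip=1.5
```

The helper only lists styles; what each one does is up to the backend script.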

You will probably create your own extensions to the TEItools skeleton like I do. These can include:

To facilitate such local development, the TEItools home directory can have a site subdirectory, possibly symlinked to a directory outside the TEItools tree for easy upgrades. It should have the same structure as the main TEItools tree. Files are searched for in the site tree first and then in the main tree.
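The lookup order can be sketched in shell as follows. find_tei_file is a hypothetical helper, not part of TEItools, and the sketch assumes that $SGML_HOME is the TEItools home directory.

```shell
# Hypothetical helper, not part of TEItools: resolve a path relative to the
# TEItools tree, preferring the site subdirectory over the main tree.
# Assumption: $SGML_HOME is the TEItools home directory.
find_tei_file() {
    rel=$1
    for root in "$SGML_HOME/site" "$SGML_HOME"; do
        if [ -e "$root/$rel" ]; then
            printf '%s\n' "$root/$rel"
            return 0
        fi
    done
    return 1                    # found in neither tree
}

# Example: find_tei_file tei/rtf/styles/userstyle
```

A file placed under site shadows the file with the same relative path in the main tree, so local changes survive an upgrade of the main tree.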

Front-ends and back-ends

sgml2any determines the frontend and backend to use just by its name: if it is started as tei2rtf, then the frontend is taken to be tei, and the backend is rtf.
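A minimal shell sketch of this naming rule is shown below. split_script_name is a hypothetical helper, and splitting at the first "2" in the name is an assumption; the real sgml2any script may parse names differently.

```shell
# Hypothetical helper, not part of TEItools: split an invocation name such as
# "tei2rtf" into frontend and backend. Assumption: the name is split at its
# first "2"; the real sgml2any script may parse names differently.
split_script_name() {
    name=$(basename "$1")
    case $name in
        ?*2?*) ;;               # something before and after a "2"
        *) return 1 ;;          # not a frontend2backend name
    esac
    printf '%s %s\n' "${name%%2*}" "${name#*2}"
}

split_script_name /usr/local/bin/tei2rtf    # frontend tei, backend rtf
split_script_name linuxdoc2tei              # frontend linuxdoc, backend tei
```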

Each frontend is placed in a subdirectory under $SGML_HOME. In turn, each backend lives in a subdirectory under the frontend's directory. The backend directory contains a CoST script named script plus any supplementary files it needs. The backend directory also contains a styles directory, which in turn contains the style files.

Each backend script handles document presentation independently, which distinguishes TEItools from DSSSL-based software. (Please don't tell me I'm doing it the wrong way; you won't be the first to say so, or even among the first dozen :-)). I try to stay consistent across scripts as far as it makes sense, so please report any divergences as bugs.

Style files are applied by simply appending them to the main script, so they can rename existing Tcl procedures and substitute their own equivalents. Styles for, say, tei to rtf conversion are looked up in the $SGML_HOME/tei/rtf/styles directory.
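The appending step might be pictured with the shell sketch below. It is not the actual sgml2any code: assemble_script is a hypothetical helper, it ignores the site tree, and it assumes that every style passed to it is a plain name with a file of the same name in the styles directory.

```shell
# Hypothetical sketch, not the actual sgml2any code: build the effective
# script for one frontend/backend pair by appending style files, in the
# order given, to the backend's main script.
# Assumptions: the site tree is ignored, and every style is a plain name
# with a file of the same name in the styles directory.
assemble_script() {
    frontend=$1
    backend=$2
    shift 2
    dir=$SGML_HOME/$frontend/$backend
    cat "$dir/script" || return 1
    for style in "$@"; do
        cat "$dir/styles/$style" || return 1
    done
}

# Example: assemble_script tei rtf linuxdoc_title external_figs
```

Because later text is appended after earlier text, a style that comes later can redefine a procedure from the main script or from an earlier style, which is one reason the order of -style options matters.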

If a style has the name=val form, then a global Tcl variable TEItools_name_value is set to val. It can then be checked from within the main scripts or styles.

If a style has the name form, then a global Tcl variable TEItools_name_in_use is set to some value. In this way styles can check for the presence of other styles.
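The two naming rules can be summarized with a small shell sketch. style_var is a hypothetical helper that only prints the name of the Tcl variable a style option would set; it does not touch Tcl itself.

```shell
# Hypothetical helper, not part of TEItools: print the name of the global Tcl
# variable that a single style option sets, following the two rules above.
style_var() {
    case $1 in
        *=*) printf 'TEItools_%s_value\n' "${1%%=*}" ;;   # name=val form
        *)   printf 'TEItools_%s_in_use\n' "$1" ;;        # plain name form
    esac
}

style_var toc_depth=2       # prints TEItools_toc_depth_value
style_var number_heads      # prints TEItools_number_heads_in_use
```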

To understand the scripts, you have to know Tcl, CoST, and, for the RTF backend, RATFINK.

Localization

TEItools is designed to be easy to internationalize. It works pretty well in Russian, since that's what I use daily. English, Russian, French, Finnish and Czech localization files are included in the distribution. To add support for a new language, clone one of them.

Everything locale-dependent lives in the $SGML_HOME/lib/locale.LANG.tcl files. LANG is determined as follows:

  1. If the top-level <tei.2> element carries a lang attribute, its value is taken. Otherwise,
  2. If the SGML_LANG environment variable is defined, its value is taken. Otherwise,
  3. If the LANG environment variable is defined, its value is taken. Otherwise,
  4. en is taken.

Note that the LANG value is converted to lower case to form the file name.
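The precedence above can be sketched in shell as follows. resolve_lang and locale_file are hypothetical helpers, not part of TEItools, and the sketch assumes that an empty environment variable counts as undefined.

```shell
# Hypothetical helper, not part of TEItools: pick the locale name using the
# precedence listed above. $1 is the lang attribute of the top-level <tei.2>
# element, or empty if there is none.
# Assumption: an empty environment variable is treated as undefined.
resolve_lang() {
    lang=${1:-${SGML_LANG:-${LANG:-en}}}
    # The value is lower-cased before it is used in the file name.
    printf '%s\n' "$lang" | tr '[:upper:]' '[:lower:]'
}

# Path of the locale file that corresponds to the resolved language.
locale_file() {
    printf '%s/lib/locale.%s.tcl\n' "$SGML_HOME" "$(resolve_lang "$1")"
}

# Example: locale_file RU    (the path ends in lib/locale.ru.tcl)
```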

Locale files specify the following.

First, there should be a localize substitution[2], which defines wordings in your native language, such as Table of contents. Second, there should be functions whose behavior depends on the locale. Currently they are the following (parentheses show which backend uses each function).

openLang { lang } (LaTeX)
The appropriate \selectlanguage{} command. If your TeX installation doesn't use the babel package, return an empty string. Example: \selectlanguage{english}.

closeLang { } (LaTeX)
Closing \selectlanguage{}.

tabTitle { number } (RTF)
Table caption. number gives table number. Example: "Table ${number}:"

figTitle { number } (RTF)
Figure caption. number gives figure number.

appendixPrefix { number } (RTF)
Appendix name letter generator. For example, if your appendices are numbered with "A", "B" and so on, then [appendixPrefix 0] should return "A" etc.

Styles

Note that each style applies to one backend only.

HTML

number_heads
Adds numbering to divisions' headings.

split
Splits output by first-level divisions.

splitLevel=NN
Splits output by NN-level divisions.

toc_ref
Adds a link to the table of contents at the end of each first-level division.

toc_depth=NN
Sets TOC depth to NN. Defaults to 3.

forced_toc
Appends a generated table of contents.

framed_toc
Generates the table of contents in a separate frame.

frames=X:Y
Sets the left-to-right frame ratio used with framed_toc. Default is 1:4.

indent_para
Indents first paragraph line.

CSS=URL
Uses the cascading stylesheet at the given URL.

no_signature
Suppresses the "Converted by TEItools" line at the bottom of the HTML page.

TeX

11pt, 12pt
Changes the main font size from the default 10pt to 11pt or 12pt.

scale=NN
Sets magnification factor to NN.

page_headers
Changes the page style to headers showing the section name and page number, separated from the text by a rule. By default, pages have the page number at the bottom.

running_title
Creates a "running title" instead of the usual title page.

linuxdoc_title
Creates a linuxdoc-sgml-like title page. Requires the running_title style.

no_title
Disables title page generation.

no_front_matter
Disables numbering of front matter pages with roman numerals.

ps_fonts
Switches to PostScript fonts instead of METAFONT ones.

pdf
Makes the resulting .tex file suitable for processing with pdflatex to generate PDF. Requires the ps_fonts style.

skip=NN
NN is the baseline stretch value. Useful values are 1.5 and 2.

parskip
Adds 1ex of vertical space between paragraphs.

numdepth=N
N is the maximum level of <div>s that get numbered. The default is 3.

openpage=gi
gi should be one of div1, div2, etc. Each such element starts on a new page.

firstpage=N
N is the page number of the first text page (that is, the one right after the title page).

landscape
Switches to landscape mode. You have to specify -t landscape to dvips to obtain landscape PostScript output.

float_pages
Forces floats (illustrations) to be placed on separate pages.

twocolumn
Typesets text in two columns.

two_side
Typesets text for printing on a two-sided printer.

small_tables
Reduces the font size in tables.

fancyvrb
Encloses examples (<eg> elements) in a frame.

raggedright
Sets the document body text flush left (ragged right).
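Two of the TeX styles above name a prerequisite: linuxdoc_title requires running_title, and pdf requires ps_fonts. A small shell sketch for checking a style list before running the converter is given below. check_tex_styles is a hypothetical helper, not part of TEItools, and it only checks that the prerequisite is present somewhere in the list, not its position.

```shell
# Hypothetical helper, not part of TEItools: check a comma-separated TeX
# style list against the two prerequisites documented above
# (linuxdoc_title needs running_title, pdf needs ps_fonts).
# Only presence is checked; the order of styles is not examined.
check_tex_styles() {
    styles=",$1,"
    status=0
    for pair in linuxdoc_title:running_title pdf:ps_fonts; do
        style=${pair%%:*}
        needs=${pair#*:}
        case $styles in
            *",$style,"*)
                case $styles in
                    *",$needs,"*) ;;
                    *) echo "style $style requires $needs" >&2
                       status=1 ;;
                esac ;;
        esac
    done
    return $status
}

# Example: check_tex_styles pdf,ps_fonts,12pt && echo "style list looks fine"
```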

RTF

linuxdoc_title
See above.

dont_number_heads
Removes numbering from division headings.

external_figs
Uses separate BMP files instead of inlining bitmaps into the RTF code. StarOffice seems to require the former, MS Word seems to require the latter, and ApplixWords seems not to be satisfied with either. sigh

RTF specifics

The RTF backend now supports user-defined styles. For an example, see $SGML_HOME/site/tei/rtf/styles/userstyle.

To bring page references up to date (e.g. in the table of contents), load the resulting RTF file into MS Word and press Ctrl-A, then F9 (select all and recalculate fields).



Last modified: Thu Feb 19 18:09:49 MSK 2004
Produced by TEItools