
Markup

Read and understand the TEI Lite description before proceeding.

DTD

Practice has shown that using TEI Lite leads to extra typing, given that TEItools implements only a specific subset of TEI Lite and makes some assumptions about the markup used. To save users some trouble, I've created the "TEItools DTD". It is an almost verbatim copy of TEI Lite with minor changes. If you feel uncertain about using a non-standard DTD, just use TEI Lite: TEItools will work just as nicely. The TEItools DTD allows for some shortcuts in markup.

The FPI (formal public identifier) of the TEItools DTD is "-//TEI//DTD TEI Tools 0.1//EN".

Divisions

TEItools recognizes two hierarchies of divisions. This may seem confusing, but I've found TEI-encoded documents using both types of division, so I decided to support both.

Based on element
Here the hierarchy is established by the elements used: <div1> elements are first-level divisions, which can be subdivided by <div2> divisions, and so on.

Based on type
Here the TYPE attribute is what matters, since all divisions use the same <div> element. Recognized types are: "section", "subsection", "subsubsection".

Note that the TYPE attribute is mandatory for <div> elements anyway, unless you're using the TEItools DTD instead of TEI Lite (see section DTD).

Besides, any division can be tagged with a TYPE attribute having the value "abstract" or "appendix", and will be rendered as such. TYPE=stub generates a division whose header does not appear in page headers and does not make an entry in the table of contents.
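
As a rough sketch, the same two-level structure could be written in either hierarchy as follows. The headings and paragraph text are placeholders invented for illustration. Based on element:

 <div1><head>Introduction</head>
   <div2><head>Background</head>
     <p>Some text.</p>
   </div2>
 </div1>

Based on type:

 <div type='section'><head>Introduction</head>
   <div type='subsection'><head>Background</head>
     <p>Some text.</p>
   </div>
 </div>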

Recognized types of generated divisions (divGen) are:

ToC
generates a table of contents.

LoF
generates a list of figures.

LoT
generates a list of tables.
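
A minimal sketch of generated divisions in use. Placing them in <front> is an assumption made for illustration; the text above only fixes the TYPE values.

 <front>
   <divGen type='ToC'>
   <divGen type='LoF'>
   <divGen type='LoT'>
 </front>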

REND attribute

REND values should be obeyed everywhere, so report a bug if they aren't. If the same element looks different in the output of different backends, that is a bug as well.

The value of REND is a list of keywords, separated by spaces or commas. Currently recognized keywords include:

Font changes:
"bold", "italic", "slanted", "typewriter", "sans", "smallcap", "underline", "large", "small", "normal";

Rendering:
"quoted", "inline", "block", "fixed", "continuous";

Alignment:
"left", "center", "right", "justify", "flushleft", "flushright";

Width (of a figure in multicolumn typesetting):
"narrow";

Position in string:
"superscript", "subscript";

Direction:
"landscape".

Text will flow around a figure if the figure carries a REND attribute specifying which side of the page it should be flushed to.

Paragraphs will be numbered sequentially if divisions (<div>) carry the REND attribute with the value "ordered".

Some examples. Centered paragraph:

 <p rend='Center'>....</p> 

Bold word:

 <hi rend='Bold'>word</hi>

(It's always better to mark what an element is rather than how it should be rendered. So, <kw>function</kw> is always better than <hi rend='Bold,Typewriter'>function</hi>.)
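
A few more sketches of keyword lists in use. The keywords come from the list above; that "flushright" is the keyword selecting the side for text flow, and the entity name logo, are assumptions made for illustration.

Several keywords combined:

 <hi rend='bold italic'>word</hi>

A division with numbered paragraphs:

 <div type='section' rend='ordered'><head>Rules</head>
   <p>First rule.</p>
   <p>Second rule.</p>
 </div>

A figure with text flowing around it:

 <figure entity='logo' rend='flushright'><head>The logo</head></figure>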

Lists

<LABEL> elements work only inside <LIST TYPE='Gloss'>. <LIST TYPE='Alpha'> will label items with letters.
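
For illustration, a glossary list and a lettered list might look like this (the item texts are placeholders):

 <list type='Gloss'>
   <label>DTD</label> <item>Document type definition</item>
   <label>FPI</label> <item>Formal public identifier</item>
 </list>

 <list type='Alpha'>
   <item>first item</item>
   <item>second item</item>
 </list>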

Tables

The element <TABLE> must carry the attribute REND. Its value is a string denoting the format of the corresponding columns of the table. Two formats of REND are supported: "LaTeX-style" and "CALS-style".

LaTeX-style
Each character in the REND value corresponds to a column. The characters and their meanings follow.

Table formatting instructions

  Chars and Formats
  l    Flushed left
  c    Centered
  r    Flushed right
  j    Justified. Note that only justified cells will span several lines, other cells will only occupy single line. Note that only justified cells will span several lines, other cells will only occupy single line. This is very long cell to be formatted.

CALS-style
The REND value is a sequence of comma- or space-separated formats for individual columns. Each format consists of a width specifier and an alignment specifier, with no space between them. The width specifiers give the proportional widths of the columns: for example, if two columns carry the values "1" and "2", their widths relate as 1:2. The alignment specifier is a single letter; see the table above. Example:

<table rend='1l,2j'>
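
As a further sketch of the width arithmetic (the values are chosen only for illustration): three columns with widths "1", "2" and "1" relate as 1:2:1, so the first and last columns each take a quarter of the table width and the middle one takes half. Here the first column is flushed left, the second justified and the third flushed right:

<table rend='1l,2j,1r'>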

Here is the SGML source text for the LaTeX-style format table above, just as an example:

 <table rend="cj">
	      
	      <head>Table formatting instructions</head>
		
	      <row role="label">
 	        <cell cols="2"/Chars and Formats/
	      </row>
	      <row>
	        <cell role="label"/ l /
	        <cell/Flushed left/
	      </row>
	      <row>
	        <cell role="label"/ c /
	        <cell rend="center"/Centered/
	      </row>
	      <row>
	        <cell role="label"/ r /
	        <cell rend="right"/Flushed right/
	      </row>
	      <row>
	        <cell role="label"/ j / 
	        <cell rend="justified"/Justified. Note that only
		  justified cells will span several lines, other cells
		  will only occupy single line. Note that only justified
		  cells will span several lines, other cells will only
		  occupy single line. This is very long cell to be
		  formatted./
	      </row>
	      </table> 

<CELL ROWS=N> does not yet work in all backends; COLS=N, however, does. Only justified cells are formatted as paragraphs; others will probably take only one line.

In fact, <TABLE>'s REND gives the default format for columns. Each cell may, in turn, define its own REND, which is a one-character string with the same meaning. This overrides the default format.
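
A minimal sketch of a per-cell override, following the one-character form described above (the cell texts are placeholders). The second cell of the second row is centered instead of justified:

 <table rend='lj'>
   <row>
     <cell>left by default</cell>
     <cell>justified by default</cell>
   </row>
   <row>
     <cell>left by default</cell>
     <cell rend='c'>centered in this cell only</cell>
   </row>
 </table>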

Each <ROW> and <CELL> element can carry the ROLE attribute. The only recognized value is "label".

Attention: if the cell content needs to consist of more than simple running text, for example if it includes a list, you must use "justified" cells.

Landscape tables are produced if <TABLE> carries the N attribute set to "landscape". Be aware that this requires dvips, because xdvi cannot display such a table, so use ghostview, gv or another PostScript viewer. Such a table takes a full page by itself. The same holds true for <FIGURE>s, but using the REND attribute instead. (Yes, this IS inconsistent. I'll probably fix it one day.)

N can also have the value "notlined". In this case the table will have no frame drawn.
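
As an illustration of the N and REND values just described (the column formats and the entity name logo are placeholders, and the table content is elided):

 <table n='landscape' rend='lj'> ... </table>

 <table n='notlined' rend='lj'> ... </table>

 <figure rend='landscape' entity='logo'><head>A full-page figure</head></figure>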

Figures

TEItools used to expect figure files to be named:

filename.gif
for HTML backend;

filename.eps
for TeX backend;

filename.png or filename.pdf
for TeX backend with pdf style;

filename.rtfdata or filename.bmp
for RTF backend.

where filename is specified in the corresponding entity declaration. TEItools appended the appropriate suffix automatically.

While this still works for backward compatibility, the preferred way is to specify the correct filename and data format notation explicitly. Since filenames depend on the backend and even on the styles used, entity declarations should go into marked sections. TEItools defines parameter entities corresponding to the backends and styles used. An example follows:

<!doctype tei.2 public '-//TEI//DTD TEI Tools 0.1//EN' [
<!ENTITY % html "IGNORE">
<!ENTITY % tex "IGNORE">
<!ENTITY % rtf "IGNORE">
<!ENTITY % pdf "IGNORE">
<!ENTITY % externalfigs "IGNORE">
<![ %pdf; [
<!ENTITY logo SYSTEM "image.pdf" NDATA PDF>
]]>
<![ %tex; [
<!ENTITY logo SYSTEM "image.eps" NDATA EPS>
]]>
<![ %html; [
<!ENTITY logo SYSTEM "image.jpeg" NDATA JPEG>
]]>
<![ %externalfigs; [
<!ENTITY logo SYSTEM "image.wmf" NDATA WMF>
]]>
<![ %rtf; [
<!ENTITY logo SYSTEM "image.rtfdata" NDATA RTFDATA>
]]>
]>

Note that underscore characters are omitted from backend and style names.

Figures are emitted in their "native" size, i.e. pixel dimensions for RTF and HTML and bounding box for PostScript (TeX).

References to figures actually reference their <head>s, so only figures with <head>s can be referenced by <ptr>s.

The n attribute of <figure>, if any, becomes the alt attribute of the <img> element in HTML.
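
A sketch of a figure that can be referenced, assuming the usual TEI id and target attributes for the link; the identifier fig.logo, the entity name logo and the texts are placeholders.

 <figure id='fig.logo' entity='logo' n='Company logo'>
   <head>The company logo</head>
 </figure>

 <p>The logo is shown in <ptr target='fig.logo'>.</p>

Here the n value would become the alt text in HTML output, and the <ptr> works because the figure has a <head>.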

URLs

URLs are referenced with <xref> carrying the TYPE attribute. Use it like this:

 <xref type='URL' n='Some interesting place'>http://some.where</xref>

SUBDOCs

SUBDOC support is only beginning to be implemented. It should work at least with divisions in the tei2tex script. Here is an example. This is the main document:

<!doctype tei.2 public '-//TEI//DTD TEI Tools 0.1//EN' [
  <!entity chap1 system "chap1.tei" subdoc>
]>
<tei.2><text>


<body>&chap1;</body> </text></tei.2>

And here is the chap1.tei file:

<!doctype div1 public '-//TEI//DTD TEI Tools 0.1//EN'>
<div1><head/The section/
   <p>Something</p>
</div1>

Now you can run either the main file or chap1.tei through tei2tex and get identical results. You could benefit from the SUBDOC feature, for instance, by allowing authors to write separate divisions independently.

Notes

Notes can carry the PLACE attribute with one of the following values:

foot
footnotes;

inline
notes included in running text in parentheses;

interlinear
notes between text lines;

end
endnotes;

side
margin notes.
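
For illustration, a footnote and a margin note might be written like this (the texts are placeholders):

 <p>Some text<note place='foot'>This goes to the bottom of the page.</note> continues here.</p>

 <p>Other text<note place='side'>This goes to the margin.</note> continues here.</p>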

Miscellanea

Formulas

Formulas are supported with TeX notation only. Use it like this:

 <formula notation='TeX'>E=mc^2</formula>

which is then rendered thus: E=mc².

Sample code

Use the <eg> element for samples. White space is preserved inside <eg> content. Note, however, that SGML markup is still recognized within it. If you want to shield your text completely from any interpretation, use a CDATA marked section, like this:

<eg><![CDATA[ some text which goes into output unmodified ]]></eg>
	    

Note also that tab characters are rendered as a single space, so please convert them to spaces before including your program code in <eg>s.

Epigraph

The epigraph's author is encoded with the <bibl> element.
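
A minimal sketch; using <q> for the quotation itself is an assumption, and the texts are placeholders:

 <epigraph>
   <q>The quoted words.</q>
   <bibl>Name of the author</bibl>
 </epigraph>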

Processing instructions

useStyle

The useStyle instruction is used to include a style. An example follows:

<?TEItools useStyle skip=1.5>

is equivalent to invoking tei2something filename -style skip=1.5.

nextDivision

The nextDivision instruction changes the division numbering. The next <div1> occurring after it will have the specified number. For example:

<?TEItools nextDivision 77>
<div1><head/Division number seventy seven/

pageNo

The pageNo instruction changes page numbering so that the current page will have the specified number.
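
The text gives no example for pageNo. Assuming it takes its number the same way nextDivision does (an assumption, not confirmed by this document), making the current page number 10 might look like this:

<?TEItools pageNo 10>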

nextDivName

The nextDivName instruction sets the file name for the next division. It is useful when using tei2html -style split.



Last modified: 19 18:09:55 MSK 2004
Produced by TEItools

