Text2Text Documentation

1 Introduction

Text2Text takes a text document annotated with text types and generates a representation of it in a requested output format. Although currently only one input format (doclatex) is supported, the intention is that a document mixing text fragments of possibly different types is parsed into a common representation, from which a document in a single output format is generated.

Text2Text is meant to be used after Shuffle. On request, Shuffle annotates its output with the text types specified for its input chunks; Shuffle itself only passes this information on. In this way documentation can be written in different formats, independently of the output format in which it will eventually be presented.

A simple Text2Text input looks like:

@@[doclatex

@@[doclatex
@@[doclatex
@@]
@@]

@@[doclatex
@@]

@@]

Text fragments are delimited by @@[<texttype> and @@]. As mentioned before, currently the only <texttype> is doclatex, a restricted LaTeX for documentation purposes.
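
To illustrate how the delimiters determine the fragment structure, here is a minimal Python sketch that reads delimited text into a tree of fragments. It is not the Text2Text implementation; the names Fragment and parse_fragments are hypothetical, and we assume that the text type is the remainder of the @@[ line and that text outside any fragment can be dropped. Nested fragments are accepted, and delimiters are only recognised at the beginning of a line.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Fragment:
    texttype: str
    # Plain text lines and nested fragments, in their original order.
    content: List[Union[str, "Fragment"]] = field(default_factory=list)

def parse_fragments(text: str) -> List[Fragment]:
    top: List[Fragment] = []    # fragments not nested in another fragment
    stack: List[Fragment] = []  # currently open fragments, innermost last
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("@@["):
            frag = Fragment(line[3:].strip())
            # Attach to the innermost open fragment, or to the top level.
            (stack[-1].content if stack else top).append(frag)
            stack.append(frag)
        elif line.startswith("@@]"):
            if not stack:
                raise ValueError(f"line {lineno}: @@] without matching @@[")
            stack.pop()
        elif stack:
            stack[-1].content.append(line)
        # Text outside any fragment is dropped in this sketch.
    if stack:
        raise ValueError(f"unclosed @@[{stack[-1].texttype} fragment")
    return top
```

Keeping text lines and nested fragments in one list preserves their original order, which matters because a nested fragment may have another text type and has to be handled according to that type.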

Text fragments may be nested, but the delimiters must always start at the beginning of a line. Nested fragments may have different text types, and each fragment is treated according to its own type. For text type doclatex we can use LaTeX:

@@[doclatex
\documentclass[a4paper]{article}
\begin{document}
Foo bar
\section{Foo}
\subsection{With Bar}
\subsection{Without Bar}
\section{Bar}
\subsection{Barred foos}
\end{document}
@@]

We assume this text to be in file foo.

Output is generated by the command text2text (EHCHOME/bin/text2text), which for simplicity we assume to be on the shell search path:

text2text --doclatex foo

This produces a copy of the contents of foo that is identical except that the Text2Text annotations are left out. More interesting is the translation to TWiki:

text2text --twiki foo

This gives the following output:

Foo bar
---+ 1 Foo
---++ 1.1 With Bar
---++ 1.2 Without Bar
---+ 2 Bar
---++ 2.1 Barred foos

LaTeX-specific commands such as \begin{document} produce no output, but section headings do, and numbering is even added.
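
The heading translation can be sketched in a few lines of Python. This is a hypothetical illustration, not the code of text2text: headings_to_twiki is an invented name, it assumes one sectioning command per line, it always adds numbering (text2text itself controls this with --gen-header-numbering), and it drops only the preamble, document and title commands.

```python
import re
from typing import List

# Nesting depth of each sectioning command; TWiki uses one '+' per level.
LEVELS = {"section": 1, "subsection": 2, "subsubsection": 3}

# A line holding exactly one sectioning command, e.g. \subsection{With Bar}.
HEADING = re.compile(r"^\\(section|subsection|subsubsection)\{(.*)\}\s*$")

# Commands that produce no TWiki output at all.
SILENT = re.compile(r"^\\(documentclass|usepackage|title|author|maketitle"
                    r"|begin\{document\}|end\{document\})")

def headings_to_twiki(lines: List[str]) -> List[str]:
    counters = [0, 0, 0]  # current number at each of the three levels
    out: List[str] = []
    for line in lines:
        m = HEADING.match(line)
        if m:
            level = LEVELS[m.group(1)]
            counters[level - 1] += 1
            for deeper in range(level, 3):
                counters[deeper] = 0  # a new heading restarts deeper numbering
            number = ".".join(str(c) for c in counters[:level])
            out.append(f"---{'+' * level} {number} {m.group(2)}")
        elif not SILENT.match(line):
            out.append(line)  # ordinary text passes through unchanged
    return out
```

Applied to the lines of foo (without the @@ delimiters), the sketch yields the TWiki output shown above.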

2 Formatting commands

In the following sections the permitted formatting commands are described in terms of document LaTeX (referred to earlier as doclatex). The following general rules apply:

  • The original text structure is maintained as much as possible. Only limitations imposed by a specific output format break this rule.
  • Parameters (text between curly braces) and text within environments are as unrestricted as possible, unless noted otherwise.

In its current state Text2Text does not yet deal with all peculiarities of each output format. These peculiarities are discussed in a separate section (section 3).

Text2Text delimiters are omitted in the examples in this section.

2.1 Document setup and meta information

2.1.1 Document structure

The global document structure follows LaTeX; a minimal document is specified by:

\documentclass[a4paper]{article}
\begin{document}
\end{document}

Allowed commands:

  • \documentclass[X]{Y}, where X may be a4paper.
  • \usepackage{X}

2.1.2 Document title

Allowed commands:

  • \title{X}
  • \author{X}
  • \maketitle

2.1.3 Table of contents

Allowed commands:

  • \tableofcontents

2.2 Structure

2.2.1 Section commands

Three levels of section commands are allowed:

  • \section{X}
  • \subsection{X}
  • \subsubsection{X}

2.2.2 Itemizing

Allowed itemizing environments, with \item for individual items:

  • \begin{itemize} \item ... \end{itemize}
  • \begin{enumerate} \item ... \end{enumerate}

2.2.3 Table

The tabular environment \begin{tabular}{FMT} ... \end{tabular} can be used, with \\ terminating a row and & separating columns. Allowed column specifiers in FMT are l, r, c, p{...} and |.
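
A FMT string can be checked against the allowed specifiers with a small tokenizer. The Python sketch below is hypothetical (parse_tabular_format is not part of Text2Text) and assumes that the width argument of p{...} contains no nested braces.

```python
import re
from typing import List

# One allowed column specifier: l, r, c, | or p{...}, optionally preceded by spaces.
SPEC = re.compile(r"\s*(l|r|c|\||p\{[^{}]*\})")

def parse_tabular_format(fmt: str) -> List[str]:
    specs: List[str] = []
    fmt = fmt.strip()
    pos = 0
    while pos < len(fmt):
        m = SPEC.match(fmt, pos)
        if not m:
            # Anything else (e.g. @{...} or *{...}) is not in the allowed set.
            raise ValueError(f"unsupported column specifier at {fmt[pos:]!r}")
        specs.append(m.group(1))
        pos = m.end()
    return specs
```

Rejecting unknown specifiers early, instead of passing them through, makes it obvious when a table uses LaTeX features that an output format may not be able to render.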

2.3 Referencing

References can be made to labels defined locally in the document, to hyperlinks, to pages of the EHC and UHC twikis, to source files relative to the root of the EHC source tree, and to citations. Allowed referencing commands:

  • \lref{X}{Y}, refer to local label X, displaying Y. Example: \lref{CommandLine}{command line} gives: command line (the label is defined elsewhere in this document by \label{CommandLine}).
  • \href{X}{Y}, refer to hyperlink (url) X, displaying Y. Example: \href{http://www.cs.uu.nl/groups/ST/Projects/ehc/text2text-doc.pdf}{this doc as pdf} gives: this doc as pdf.
  • \eref{X}{Y}, refer to sub twiki page X of EHC, displaying Y. Example: \eref{Text2Text}{this doc} gives: this doc.
  • \uref{X}{Y}, refer to sub twiki page X of UHC, displaying Y. Example: \uref{WebHome}{UHC home page} gives: UHC home page.
  • \sref{X}{Y}, refer to source file X, displaying Y. Example: \sref{text/ToolDocText2Text.cltex}{source of this doc} gives: source of this doc (EHCHOME/text/ToolDocText2Text.cltex).
  • \cref{X}{Y}, refer to citation X, displaying Y. Currently no actual reference to X is generated; the command is a placeholder for later use. Example: \cref{xxx}{Some Author} gives: Some Author.

Abbreviations (syntactic sugar):

  • \secRef{X}, expands to \lref{X}{section}.
  • \figRef{X}, expands to \lref{X}{figure}.
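
Because \secRef and \figRef are pure syntactic sugar, they can be expanded textually before any other processing. A minimal Python sketch, with the hypothetical helper expand_abbreviations and assuming the argument X contains no braces:

```python
import re

# Display text used by each abbreviation when it expands to \lref.
ABBREVIATIONS = {"secRef": "section", "figRef": "figure"}

# \secRef{X} or \figRef{X}, capturing the command name and the label X.
ABBREV = re.compile(r"\\(secRef|figRef)\{([^{}]*)\}")

def expand_abbreviations(text: str) -> str:
    # A function replacement is inserted literally, so backslashes need no escaping.
    return ABBREV.sub(
        lambda m: f"\\lref{{{m.group(2)}}}{{{ABBREVIATIONS[m.group(1)]}}}", text)
```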

Allowed labeling commands:

  • \label{X}, need only be usable locally within the file.
  • \glabel{X}, must be referable from outside the document, if such a reference mechanism exists. Currently this is supported for twiki only, where X must comply with the twiki anchor rules.

2.4 Pictures

Pictures created by external tools can be included by \includegraphics[options]{picture}, where picture refers to a file holding the picture. Currently only PDF pictures, with suffix .pdf, are allowed. The following options can be used:

  • scale=fraction, scale picture down/up by fraction.
  • label=text, add label.
  • caption=text, add caption.
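
The options are key=value pairs. A hypothetical Python sketch of parsing them (parse_graphics_options is an invented name), assuming that, as in LaTeX, options are separated by commas, so that a caption in this sketch cannot itself contain a comma:

```python
from typing import Dict, Optional

ALLOWED_OPTIONS = {"scale", "label", "caption"}

def parse_graphics_options(options: Optional[str]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    if not options:
        return result  # no [options] part given
    for item in options.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        # Every option must have the form key=value with one of the allowed keys.
        if not sep or key not in ALLOWED_OPTIONS:
            raise ValueError(f"unsupported option {item.strip()!r}")
        result[key] = value.strip()
    if "scale" in result:
        float(result["scale"])  # the scale must be a number; raises ValueError otherwise
    return result
```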

2.5 Text style

2.5.1 Verbatim

Two equivalent environments for verbatim text can be used:

  • \begin{verbatim} ... \end{verbatim}
  • \begin{pre} ... \end{pre}

Inline verbatim text is written as \verb|X|; instead of |, another delimiter character not occurring in X may be used. Example: \verb+Some text+ renders as: Some text.
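
As the example shows, the delimiter after \verb is not fixed: the verbatim text runs up to the next occurrence of the same character on the same line. A small Python sketch of this rule; verb_texts is a hypothetical helper, and rejecting a letter, whitespace or * as delimiter is an assumption of the sketch.

```python
import re
from typing import List

# \verb, then a delimiter character, then the shortest text up to the next
# occurrence of that same delimiter (the backreference \1), on one line.
VERB = re.compile(r"\\verb([^\sA-Za-z*])(.*?)\1")

def verb_texts(text: str) -> List[str]:
    return [m.group(2) for m in VERB.finditer(text)]
```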

2.5.2 Fonts

  • \textbf{X}, bold. Example: \textbf{Some text} renders as: Some text.
  • \textit{X}, italic. Example: \textit{Some text} renders as: Some text.
  • \texttt{X}, teletype. Example: \texttt{Some text} renders as: Some text.
  • \emph{X}, emphasized. Example: \emph{Some text} renders as: Some text.

2.6 Non LaTeX compatible commands

The following non-LaTeX notation is borrowed from lhs2tex:

  • Text between vertical bars, like |Some text|, is shorthand for \emph{Some text}, displaying as Some text.
  • Text between @ characters, like @Some text@, is shorthand for \texttt{Some text}, displaying as Some text.

No complex commands are allowed between these delimiters. Escaping is only possible by mutual delimiting, that is, by writing | inside @...@ or @ inside |...|, or by using \verb. This notation cannot be used inside options to commands.
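
The two shorthands, including escaping by mutual delimiting, can be sketched with a single regular expression in Python. This is a hypothetical illustration (expand_shorthands is not a Text2Text function); it does not exclude \verb text, command options or other places where | or @ must be taken literally, so it only shows the core rule.

```python
import re

# Either |X| or @X@ on one line; inside one kind of delimiter the other
# character is ordinary text, which is how the two escape each other.
SHORTHAND = re.compile(r"\|([^|\n]*)\||@([^@\n]*)@")

def expand_shorthands(text: str) -> str:
    def repl(m):
        if m.group(1) is not None:
            return "\\emph{" + m.group(1) + "}"   # |X| becomes \emph{X}
        return "\\texttt{" + m.group(2) + "}"     # @X@ becomes \texttt{X}
    return SHORTHAND.sub(repl, text)
```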

2.7 Remaining commands

Remaining allowed commands:

  • \hline

3 Output specific behaviour

3.1 Document LaTeX - doclatex

Assumed/included styles/packages:

  • External styles: fancyvrb, geometry, graphicx, hyperref.
  • Internal styles: mainsty (EHCHOME/text/mainsty.clsty). This is the main style file used by all documentation and papers; it includes the external styles and is processed and included as part of building the documentation.

Other peculiarities:

  • When a label and/or caption is given for \includegraphics, the result is additionally wrapped inside \begin{figure}[h] ... \end{figure}.
  • \glabel is equivalent to \label.
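
The figure wrapping for doclatex output can be sketched as follows in Python. This is a hypothetical helper, not Text2Text code; in particular, putting \caption before \label inside the figure is our assumption (it is the usual LaTeX order, so that the label refers to the figure number).

```python
from typing import Optional

def includegraphics_to_doclatex(picture: str, scale: Optional[str] = None,
                                label: Optional[str] = None,
                                caption: Optional[str] = None) -> str:
    if scale is not None:
        graphic = f"\\includegraphics[scale={scale}]{{{picture}}}"
    else:
        graphic = f"\\includegraphics{{{picture}}}"
    # Without label and caption the picture is emitted as is.
    if label is None and caption is None:
        return graphic
    # Otherwise everything is wrapped in a figure environment.
    lines = ["\\begin{figure}[h]", graphic]
    if caption is not None:
        lines.append(f"\\caption{{{caption}}}")
    if label is not None:
        lines.append(f"\\label{{{label}}}")
    lines.append("\\end{figure}")
    return "\n".join(lines)
```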

3.2 TWiki - twiki

Peculiarities:

  • \documentclass[X]{Y} is ignored.
  • \usepackage{X} is ignored.
  • Title related commands are ignored.
  • Spacing around \item and its content is not yet ignored, so items are not properly rendered.
  • Labels may only be defined globally (because anchors must start at the beginning of a line).
  • Table formatting is not implemented (yet).
  • \includegraphics ignores all options. The included picture is assumed to have suffix .gif instead of .pdf.
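
A hypothetical Python sketch of the TWiki treatment of pictures, which extracts the assumed .gif file names while discarding all options. twiki_pictures is an invented name, and appending .gif to a picture name without suffix is an assumption of the sketch.

```python
import re
from typing import List

# \includegraphics with an optional [options] part, which TWiki output ignores.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^{}]*)\}")

def twiki_pictures(text: str) -> List[str]:
    names: List[str] = []
    for m in INCLUDEGRAPHICS.finditer(text):
        picture = m.group(1)
        # Replace a .pdf suffix by .gif; a name without suffix just gets .gif appended.
        if picture.endswith(".pdf"):
            picture = picture[: -len(".pdf")]
        names.append(picture + ".gif")
    return names
```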

4 Commandline invocation

As printed by text2text --help:

Usage: text2text [options] [file|-]

options:
    --doclatex                       generate doclatex
    --html                           generate html
    --twiki                          generate twiki
    --help                           output this help
    --gen-header-numbering[=yes|no]  generate header numbering, default=no


5 Installation via cabal

From the README file:

Text2Text documentation processing
==================================

See:

http://www.cs.uu.nl/wiki/bin/view/Ehc/Text2TextDocumentation
(part of http://www.cs.uu.nl/wiki/bin/view/Ehc/Documentation)


Cabal installation
==================

In the UHC installation root directory:

# configure UHC, and the cabal file for text2text
> ./configure
> make text2text

Source files are now generated and ready for use by cabal.

In this directory:

> cabal configure
> cabal build
> cabal install
> cabal clean

6 Further reading

See also

  • Shuffle, for manipulating source chunks.
  • LaTeX, for the assumed meaning of doclatex commands.