Documents possess structure
Most of the electronic documents read and written in an office environment possess structure. For example, common kinds of documents are:
- Text documents: articles, reports, documentation. Format examples: MS Word, LaTeX, etc.
- Web pages. Format: HTML.
- Presentations. Format example: MS PowerPoint.
- Spreadsheets. Format example: MS Excel.
- Appointments, to-do list items, address book items. Format: any organiser.
- Computer programs. Format: any programming language.
A user who edits or constructs such a structured document often uses just a few abstract operations, which are similar for different structures:
- Copy and paste parts of the document.
- Add a new 'structure element' (e.g. new section, new table), or delete part of a document.
- Enter text.
- Select and position.
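The four operations listed above can be sketched on a generic document tree. This is only an illustration: the `Node` type, the path-of-indices addressing, and all function names are assumptions made for this example, not part of any existing editor.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical structure element: a kind, some text, and child elements."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def select(root, path):
    """Select and position: follow a list of child indices from the root."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def insert(root, path, index, new):
    """Add a new structure element below the node at `path`."""
    select(root, path).children.insert(index, new)


def delete(root, path):
    """Delete the subtree at `path` (the path must be non-empty) and return it."""
    parent = select(root, path[:-1])
    return parent.children.pop(path[-1])


def copy_paste(root, source, target, index):
    """Copy the subtree at `source` and paste the copy below `target`."""
    insert(root, target, index, deepcopy(select(root, source)))


def enter_text(root, path, text):
    """Enter text in the element at `path`."""
    select(root, path).text = text
```

Because the functions only rely on the tree shape, the same code applies whether the nodes stand for sections of a report, slides of a presentation, or rows of a spreadsheet.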
Documents and presentations
When editing a document, a user looks at a presentation of the document. If this presentation shows how the document will look on paper or in a browser, the editor is called a WYSIWYG editor. Examples of such editors are MS Word and FrameMaker. Sometimes the presentation of a document can be specified to some extent by the user, for example in a style sheet. Layout and content are then separated. Separating content from layout is very desirable: content can be reused on different media, it is easier to query content, etc. Often, however, the flexibility of style sheets is limited. HTML is an example of a language in which content and layout are mixed, and this was one of the most important reasons for creating XML.
An average computer user uses many editors to construct or edit structured documents. Some of these editors are flat text editors such as NotePad; others are dedicated structure editors such as MS Excel. By a dedicated structure editor we mean an editor for a particular kind of information, such as spreadsheets in the case of MS Excel. The presentation of these structures, on screen or on paper, might be specified by a style sheet.
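A minimal sketch of what separating content from layout buys: the same content rendered by two independent presentation functions. The `article` structure and both function names are hypothetical and chosen only for illustration.

```python
from html import escape

# Hypothetical content: structure only, no layout information.
article = {
    "title": "Proxima",
    "paragraphs": ["Documents possess structure.", "Editors present them."],
}


def present_html(doc):
    """One presentation: a heading followed by paragraphs, for a browser."""
    body = "".join(f"<p>{escape(p)}</p>" for p in doc["paragraphs"])
    return f"<h1>{escape(doc['title'])}</h1>{body}"


def present_plain(doc):
    """Another presentation of the same content: underlined title, plain lines."""
    underline = "=" * len(doc["title"])
    return "\n".join([doc["title"], underline, *doc["paragraphs"]])
```

Neither function changes `article`, so adding a third medium means adding a third function rather than editing the content.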
Specifying derived values
Many documents need a formalism for specifying computations. Examples can be found in the proposed use of XML for interactive applications, in which it will be necessary to react in real time to events and to adjust the presentation of an XML document accordingly. One may also envisage interdependent XML documents which, together with their presentation, are updated in real time when an author changes part of one of the documents, e.g. the value of an attribute. Furthermore, spreadsheet-like functionality, as in the what-if decisions for a tax form or for making appointments in an organiser, can be supported if an XML editor allows for real-time calculations and presentations.
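The tax-form what-if scenario can be sketched as a document whose derived values are recomputed after every edit. Everything here is an assumption made for illustration: the `Form` class, the field names, and the rules. For simplicity the sketch recomputes all derived values on each edit; it does not attempt incremental evaluation.

```python
class Form:
    """Hypothetical document whose derived fields are recomputed on every edit."""

    def __init__(self, values, rules):
        self.values = dict(values)  # fields the author edits directly
        self.rules = rules          # derived field name -> function of the form
        self.derived = {}
        self.recompute()

    def get(self, name):
        """Look up a field, whether it is edited directly or derived."""
        return self.derived[name] if name in self.derived else self.values[name]

    def recompute(self):
        """Evaluate the rules in the order given; later rules may use earlier ones."""
        self.derived = {}
        for name, rule in self.rules.items():
            self.derived[name] = rule(self)

    def edit(self, name, value):
        """Change an edited field and bring all derived fields up to date."""
        self.values[name] = value
        self.recompute()


# Hypothetical rules for a tiny tax form (amounts are whole currency units).
rules = {
    "taxable": lambda f: max(0, f.get("income") - f.get("deduction")),
    "tax": lambda f: f.get("taxable") * f.get("rate") // 100,
}
form = Form({"income": 30000, "deduction": 5000, "rate": 20}, rules)
```

A what-if question then becomes a single edit, for example `form.edit("deduction", 8000)` followed by `form.get("tax")`.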
Problems with editors
There are several problems with the current editing situation:
- Integration: a document constructed in one dedicated structure editor can almost never be included in another dedicated structure editor, and if it is edited in a flat text editor, the structure is lost. Even in the relatively integrated editors of Microsoft Office, a user cannot, for example, edit an Excel document in MS Word, or a PowerPoint presentation in MS Excel.
- Presentation: often there is a restricted number of ways in which a document can be presented. For example, although MS Word uses an internal format for a document, it is impossible to view this internal format.
- Inconsistency: most editors use (sometimes slightly) different user interfaces. Sometimes even different user interface styles can be found within a single editor.
- Flexibility: many editors are not only inflexible with respect to their input format, but also with respect to the order in which certain edit steps are to be performed. Furthermore, most editors are closed in the sense that the user has limited access to the functionality of an editor.
- Expressivity: few editors allow the user to specify computations. In most editors computations are built in, such as the computation of a table of contents, and it is almost always difficult or impossible to change these, or to add new kinds of computations.
- Safety: most HTML/XML editors allow the user to edit the source text of an HTML/XML document and save it afterwards. However, the safety and other advantages of structural editing are then lost.
This list is not exhaustive: more problems can be found in the literature and, unfortunately, in existing editors themselves.
XML: a document structure standard
The integration problem is solved if there is a standard for describing document structure. XML (eXtensible Markup Language) is a recent standard for describing structured documents. It is a simplified version of SGML (Standard Generalised Markup Language). Since XML was completed in early 1998 by the World Wide Web Consortium (the international body that controls the standards for Web-related concepts), the standard has spread like wildfire through science and into industries ranging from manufacturing to medicine. XML is much more widely accepted than its predecessor SGML.
XML is a method for putting structured data in a text file. Programs that produce structured data often also have to store it on disk. They can either use a binary format or a text format. The latter allows you, if necessary, to look at the data without the program that produced it. XML is a set of rules (or guidelines, conventions) for designing text formats for such data, in a way that produces formats that are easy to generate and read (by a computer), that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for internationalization/localization, and platform-dependency. One can see XML as a standard for describing both context-free grammars and documents that can be generated by a context-free grammar.
The structure of an XML document is given by its document type definition (DTD). A DTD is a notation for a context-free grammar.
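To illustrate the reading of a DTD as a grammar, the sketch below writes the content models of a small, hypothetical DTD as regular expressions over child element names and checks a parsed document against them. The DTD, the `GRAMMAR` table, and the `conforms` function are assumptions made for this example. This is not a DTD validator: attributes, mixed content, and entities are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown here only as a comment:
#   <!ELEMENT report  (title, section+)>
#   <!ELEMENT section (title, para*)>
#   <!ELEMENT title   (#PCDATA)>
#   <!ELEMENT para    (#PCDATA)>
# Each declaration acts as a production. The right-hand side is written as a
# regular expression over the space-terminated names of an element's children.
GRAMMAR = {
    "report": r"title (section )+",
    "section": r"title (para )*",
    "title": r"",
    "para": r"",
}


def conforms(element):
    """Check an element and all of its descendants against GRAMMAR."""
    model = GRAMMAR.get(element.tag)
    if model is None:
        return False  # element name not declared in the grammar
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False  # children do not match the content model
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<report><title>T</title>"
    "<section><title>S</title><para>p</para></section></report>"
)
bad = ET.fromstring("<report><section><title>S</title></section></report>")
```

Here `conforms(good)` holds, while `conforms(bad)` fails because the `report` element lacks the leading `title` that its production requires.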
Since XML has been accepted as a standard for describing document structure, an XML-based editor is very desirable. Many SGML editors already exist, and these can relatively easily be transformed into XML editors. However, we want to improve significantly on these editors to obtain a presentation-oriented editor with which computations can be specified.
Structure in Computer Science
Although XML is a recent standard for describing structured documents, the notion of structure has been central in several research fields in Computer Science since the 1960s. For example, program development tools have been an important research field almost since the start of Computer Science. One of the main achievements of this research is that the structure of a programming language determines the tools for the language. Parser generators, editor generators, and code generator generators all take a structure as input. Many of the results from these fields are directly reusable in an XML editor.
The WWW will be a primary source for XML documents, and the impact of XML editors will be dramatically increased if they can also be used for Web browsing and editing of distributed documents. This requires that the specified computations can be performed and constraints can be incrementally propagated across the Web. The Web context also increases the urgency of allowing multi-user editing.
The goal of this project is to develop Proxima: a single, generic, presentation-oriented editor for all kinds of XML documents, in which computations can be specified.
- 01 Jul 2008