Gw Requirements

The requirements (user stories) for the 2005 project are maintained in JIRA at

The following pages contain a few lines to explain the basic idea:


Below are the requirements for the 2004 precursor project; these have been partially implemented.

Storage Management

Advisor/Customer: Eelco Dolstra

Most current Wikis are based on RCS or CVS. This has some limitations, the main one being that files (Wiki pages), not directories, are versioned. This means that:

  • It is not possible to rename or move files, refactor the directory structure, etc.

  • Wikis generally do not support a recursive directory structure.

  • Commits are per-file. So changes to sets of files are logically distinct, i.e., not atomic. So if you want to make related changes to a bunch of pages, the Wiki may be in an inconsistent state while you're making the changes, and it's hard to undo (back out) the changes.

Since Subversion solves all these problems, we will use it as the storage back-end of GW. An entire Wiki will be stored in a single Subversion repository.

The goal of the storage team is to implement the Subversion storage back-end. I envision the following milestone releases (subject to change):

Release 0 - initial GW

On startup GW does a checkout of the repository. Edits happen on the working copy. However, there are no commits, so all changes to the Wiki are lost when the server is restarted. There is no locking of any kind.

Release 1 - persistent storage

Save operations should cause a commit to happen. When a user edits a new page, the page should of course be added first. Still no locking, though.

R2 - multi-file edits ("transactions")

When a user starts editing a file, create a new working copy.

Edits happen on this per-session working copy and are not visible to other users. So a "Save" causes the per-session working copy to be modified, but nothing is committed.

There should now be a "Commit" operation that causes the entire per-session working copy to be committed. After this the per-session working copy can be deleted, and the global working copy should be updated. This makes the changes globally visible.

Merge conflicts are ignored for now.
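
The R2 flow above can be sketched in a few lines of Python. This is only an in-memory model under stated assumptions, not Subversion code: a dict stands in for a working copy, and the names WikiStore, start_session, save and commit are hypothetical.

```python
import copy


class WikiStore:
    """In-memory stand-in for the global working copy (hypothetical names)."""

    def __init__(self, pages):
        self.global_wc = dict(pages)   # path -> text, visible to all users
        self.sessions = {}             # session id -> private working copy

    def start_session(self, session_id):
        # R2: starting an edit clones the whole working copy.
        self.sessions[session_id] = copy.deepcopy(self.global_wc)

    def save(self, session_id, path, text):
        # "Save" only touches the per-session copy; nothing is committed.
        self.sessions[session_id][path] = text

    def commit(self, session_id):
        # "Commit" publishes all changes of the session at once and drops
        # the per-session copy. Conflicts are ignored, so this simply
        # overwrites whatever other sessions committed in the meantime.
        self.global_wc = self.sessions.pop(session_id)


store = WikiStore({"/Home": "welcome"})
store.start_session("alice")
store.save("alice", "/Home", "welcome!")
store.save("alice", "/News", "first post")
before = dict(store.global_wc)   # other users still see the old state
store.commit("alice")
```

After the commit both changes become visible together, which is the atomicity that per-file commits cannot offer.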

R3 - merge conflicts

If on commit a merge conflict occurs, the per-session working copy should be retained, the user should be presented with a list of conflicting pages (showing the conflicts in those pages) and be allowed to edit those pages. Most of the work here should be done by the versioning UI team, but the storage team has to provide the supporting infrastructure.

R4 - use RA layer

Using working copies is inefficient. For instance, to start an edit session, we have to clone the entire working copy. This doesn't scale well. So instead of using Subversion's working copy (WC) layer, we should use the remote access (RA) layer, which allows us to fetch and edit just those files that are involved in an operation.

It's possible that the high-level Subversion bindings only support WC operations, not RA operations. So additional C/Java bindings might have to be created.

R5 - caching for RA operations

Using the RA layer is scalable, but it's also slow. For instance, to view a page, we have to fetch it from the Subversion server every time (while before R4, we could just get it from the working copy). So RA fetches should be cached.

Of course, the cache should be properly invalidated on edit operations.
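
A minimal sketch of such a cache, assuming the RA fetch can be wrapped as a plain function from path to text. CachedFetcher and slow_fetch are hypothetical names, and a dict plays the role of the repository.

```python
class CachedFetcher:
    """Caches page fetches; `fetch` is any callable path -> text (assumed)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, path):
        if path not in self._cache:
            self._cache[path] = self._fetch(path)   # slow remote fetch
        return self._cache[path]

    def invalidate(self, path):
        # Must be called by every edit operation that touches `path`.
        self._cache.pop(path, None)


repo = {"/Home": "v1"}
calls = []


def slow_fetch(path):
    calls.append(path)          # record how often the "server" is hit
    return repo[path]


pages = CachedFetcher(slow_fetch)
first = pages.get("/Home")
second = pages.get("/Home")     # served from the cache
repo["/Home"] = "v2"            # an edit lands in the repository
pages.invalidate("/Home")
third = pages.get("/Home")      # fetched again after invalidation
```

Three views cost two fetches here; without the invalidate call the third view would still show the stale "v1".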

Additional complications

The storage layer is quite fundamental. The entire Wiki depends on it. The storage team should design a simple but sufficient interface to the storage layer that other teams can develop against. In particular, the work of the versioning UI team is closely related to that of the storage team. For instance, for merge support it is necessary that the storage layer offers the upper layers a notification that there is a merge conflict, a way to query what the conflicts are, and a way to clear the conflict situation. Close communication and frequent syncing with the versioning UI team is probably required. Big bang integration is not an option.
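
To make the three merge-related operations concrete, here is one possible shape of such an interface, sketched in Python. Everything in it is an assumption for illustration (the names Storage, Session, MergeConflict and resolve, and the rule that a page conflicts when it changed after the session started); the real interface is for the storage team to design.

```python
class MergeConflict(Exception):
    """Raised by commit() to notify the upper layers of a conflict."""


class Session:
    def __init__(self, base_revision):
        self.base_revision = base_revision
        self.changes = {}      # path -> new text
        self.conflicts = {}    # path -> (mine, theirs), filled by commit()


class Storage:
    def __init__(self):
        self.revision = 0
        self.pages = {}
        self.last_changed = {}   # path -> revision of the last change

    def begin(self):
        return Session(self.revision)

    def commit(self, session):
        # A page conflicts if someone committed it after the session began.
        session.conflicts = {
            path: (text, self.pages.get(path))
            for path, text in session.changes.items()
            if self.last_changed.get(path, 0) > session.base_revision
        }
        if session.conflicts:
            raise MergeConflict(sorted(session.conflicts))
        self.revision += 1
        for path, text in session.changes.items():
            self.pages[path] = text
            self.last_changed[path] = self.revision

    def resolve(self, session, path, text):
        # Clear one conflict by accepting `text` as the merged result.
        session.changes[path] = text
        del session.conflicts[path]
        if not session.conflicts:
            session.base_revision = self.revision


store = Storage()
a, b = store.begin(), store.begin()
a.changes["/Home"] = "from a"
b.changes["/Home"] = "from b"
store.commit(a)
try:
    store.commit(b)
    notified = False
except MergeConflict:
    notified = True
conflicts = dict(b.conflicts)            # query what the conflicts are
store.resolve(b, "/Home", "merged")      # clear the conflict situation
store.commit(b)
```

The exception is the notification, Session.conflicts is the query, and resolve clears the situation so that the retained session can be committed again.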

Maybe R4/R5 aren't such a good idea, since replicating a lot of the functionality in the WC layer (such as support for moving files) is a lot of work. However, something should be done about the scalability problem in R3. Alternatives might be to clone working copies using hard or symbolic links, lazily cloning the working copy, and so on.
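
The lazy-cloning alternative can be illustrated with Python's collections.ChainMap, again using dicts as stand-ins for working copies. This is a sketch of the idea only; a real implementation would work on files in a Subversion working copy.

```python
from collections import ChainMap

global_wc = {"/Home": "welcome", "/About": "about us"}

# A session "clone" is an empty overlay on top of the shared copy; nothing
# is copied until the session writes a page.
session_wc = ChainMap({}, global_wc)
session_wc["/Home"] = "edited"        # lands in the overlay only

copied = len(session_wc.maps[0])      # pages actually stored per session
```

Starting a session is now independent of the size of the Wiki. The sketch has two known gaps: deleting a page that exists only in the shared copy would need a tombstone entry in the overlay, and later changes to the shared copy show through, which a true clone would not allow.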

Versioning User Interface

Advisor/Customer: Martin Bravenboer

The cornerstone of a Wiki is the ability to edit the data and structure of a Wiki online in a web browser from any workstation. The web browser is an extremely distributed and concurrent interface for editing the data of the Wiki. Hence, the generalized Wiki requires an appropriate user interface to the underlying version control system (Subversion). The Versioning User Interface project will develop a web-based user interface for viewing and editing the Subversion repository of the Generalized Wiki.

Editing Files

Files contain the data of a Wiki. Files can be edited, moved and copied like ordinary files in a file system. Files can be organized in directories. GW supports binary files. Thus, there is no need for a notion of attachments as in many existing Wiki implementations. Binary files are under version control, since they are just files in the Wiki.

  • Edit, preview and commit in a single atomic action.
    • Editing a file is only supported for the head revision. This restriction will not be repeated in the description of all features.

  • Allow a commit message for a commit.

  • Edit transactions:
    • Use a session or author specific working copy
    • In cooperation with storage group
    • See also "Status Overview".

  • Create a new file.

  • Delete a file.

  • Move a file
    • Change all references
    • In cooperation with query and search

  • Revert the changes of a file to a certain previous revision.

  • Show working copy status if the current user is in an editing transaction.

  • Action for updating a file with the latest server revision (only in editing transaction).

  • Resolving conflicts: three buttons: cancel edit, save, save and resolve.

  • Create and remove a symbolic link to a file.

Editing Directories

GW has no notions of webs or subwebs: it is entirely hierarchic by allowing directories. Hence, it allows arbitrary levels of nesting. Furthermore, directories can be moved to different levels and are under version control.

  • Implement a directory listing. A URL that ends with a / should show the listing of that directory (if the user has the rights for this). The / URLs should not forward to a default file (such as index).
    • Optional: show access rights for files and subdirectories.

  • Directories are versioned as well. Implement support for viewing the directory listing of a specific revision of a directory. Don't show edit actions.

  • Create a subdirectory (head revision only)

  • Create a file in this directory.

  • Delete this directory

  • Move this directory

  • Show working copy status if the current user is in an editing transaction.

  • Action for updating a directory with the latest server revision (only in editing transaction).

Version Information of Files and Directories

The user-interface must show versioning information of an item.

  • Show revision, last change date and last author on the item page itself.

  • Action to view the history of a file (log)
    • Commit log: author, date and time, commit message.
      • Provide link to the actual commit to see the files that have been changed in the same commit.
    • Show the copy/move history in the Wiki file hierarchy.

  • Action to view the blame (praise) of a Wiki topic: for each line show the author responsible for it.

  • Action to view differences between revisions of a file or directory. The diffs might be visualized in several ways.

Status Overview

During an editing transaction the user has their own working copy. The versioning user interface must implement tools for viewing the current status of this working copy. This status overview makes clear what has been changed and supports some additional actions for preparing a commit.

  • Show 'first column' information of SVN status:
    • Possible changes:
      • Modified
      • Added
      • Deleted
      • Conflicted
      • Merged
      • ... (see SVN documentation)
    • Use an attractive icon set

  • Show information on properties
    • Conflicted
    • Modified
      • Use custom visualizations for known properties (access properties)

  • Optional, status overview with -u:
    • Show if newer revision exists on server

  • Show working and last-committed revision.
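
As a starting point for the status overview, the first two columns of a status line can be mapped to labels as sketched below. The sample lines are hand-built, not real command output, and the column at which the path starts (8 here) is an assumption that may differ between Subversion versions. Only a handful of status letters are listed; see the SVN documentation for the full set.

```python
ITEM_STATUS = {
    " ": "unchanged",
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "C": "conflicted",
    "?": "unversioned",
    "!": "missing",
}
PROP_STATUS = {" ": "unchanged", "M": "modified", "C": "conflicted"}


def parse_status_line(line, path_column=8):
    """Split one status line into (path, item status, property status)."""
    item = ITEM_STATUS.get(line[0], "other")
    props = PROP_STATUS.get(line[1], "other")
    return line[path_column:].strip(), item, props


# Hand-built sample lines: status letters padded to the assumed path column.
sample = [
    "M".ljust(8) + "Home.txt",
    "A".ljust(8) + "News/First.txt",
    "CM".ljust(8) + "Users/Foo.txt",
]
rows = [parse_status_line(line) for line in sample]
```

Each resulting tuple carries what the overview needs to pick an icon for the item and a separate marker for its properties.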

Actions from status overview:

  • Update a file from the server.

  • Edit a file in order to resolve the conflict of a file.

  • Revert a file to the state at the server.

  • Commit.

Integrate user authentication

The user authentication group will implement methods for authenticating users and determining their rights. This work is highly related to the versioning user interface, since the versioning user interface involves many actions that require user authentication. The integration of user authentication is a cross-cutting feature and therefore it is currently not described in detail.

Global information and statistics

  • Recent changes
    • For specific subdirectories (webs) only
    • HTML Commit log at the Wiki
    • RSS and Atom feeds for feed aggregators
    • Support Live Bookmarks of Firefox
    • In cooperation with query and search group.

  • Statistics:
    • Author statistics (all-time and per month)
    • File views (based on logs, database, or existing statistics system)
    • HotSpots: frequently modified files

Local information and statistics

  • Show referred-by
    • HotReferrers
    • In cooperation with query and search team.

  • Who 'owns' this file: author statistics.

User Management

Advisor/Customer: Eelco Dolstra

GW should support the creation of users for the following reasons:

  • To provide access control - not everybody might be allowed to edit or view some pages.

  • To provide per-user customisation: preferences, style sheets, etc. (user management should be restricted to provide pointers to the preferred resources; stylesheets are in the rendering and templates package)

  • To provide edit traceability - who edited what?

Requirements

Make it possible to create and edit users. The information maintained for each user should be flexible (so that arbitrary properties can be added later on), but should include at least the full name and e-mail address.

Make it possible to create and edit ACLs (access control lists) that define who has the right to do what.

The most important access rights that must be implemented are read and write access. A more interesting access right is who has the ability to modify the access rights (for instance, in the Unix filesystem only the owner of a file can change the access rights, but this is often very limiting). (So maybe the concept of owner should be adapted as well; not just the person who last edited the file. -- EelcoVisser)

ACLs should be per-page; they should not be global to the Wiki. I.e., different parts of the Wiki can have different access controls.

It should be possible to define recursive ACLs, e.g., if we make /A readable to user Foo, then user Foo should also be able to read /A/B.

Make it possible to create and edit groups (sets of users that can be referred to in ACLs).

It must not be necessary to log in to access/edit the Wiki if allowed by the ACLs (so there should be an "anonymous" user). A login should be prompted only when trying to access a restricted resource.
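
The per-page ACLs, recursive inheritance and the anonymous user described above could be combined as in the following sketch. The ACL table layout and the rule that the nearest entry up the path wins are assumptions made for illustration; groups are left out.

```python
import posixpath

# Hypothetical ACL table: path -> {user: set of rights}. "anonymous" stands
# for everybody, including users who are not logged in.
ACLS = {
    "/": {"anonymous": {"read"}},
    "/A": {"Foo": {"read", "write"}},
    "/A/Private": {"anonymous": set(), "Foo": {"read"}},
}


def allowed(user, right, path):
    """Walk from `path` up to the root; the nearest matching entry wins."""
    while True:
        entry = ACLS.get(path, {})
        # Prefer an entry for the user, then fall back to "anonymous".
        for name in (user, "anonymous"):
            if name in entry:
                return right in entry[name]
        if path == "/":
            return False
        path = posixpath.dirname(path)
```

With this table, allowed("Foo", "read", "/A/B") is granted through the entry on /A, while allowed("anonymous", "read", "/A/Private/X") is denied, which is the point at which the Wiki would prompt for a login.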

It would be interesting to be able to form subwikis for which user management is done locally in that subwiki. --EelcoVisser

(Maybe:) Account creation should use an e-mail confirmation scheme to verify that the user supplied a valid e-mail address.

I envision the following milestone releases (subject to change):

Release 1 - user creation

Allow users to be created. User account information should probably just be stored in the Wiki and edited through the normal Wiki mechanisms, for instance, through a Wiki page /Users/Foo for a user named Foo. This makes it unnecessary to create special forms for this purpose.

Release 2 - basic access control

Allow simple, non-recursive ACLs to be defined per-page. Probably setting the ACLs for some page /X should happen by editing /X/ACL.

(Note that it may be problematic to let /X be a page and a directory at the same time, unless we establish some kind of implicit extension scheme. But we (might) also want to use GW to edit arbitrary subversion repositories. -- EelcoVisser)

Use the ACLs to check access to pages.

Release 3 - groups

Allow groups to be created and used in ACLs.

Release 4 - recursive ACLs

Add recursive ACLs.

Release 5 - efficiency?

Perhaps loading and parsing the ACLs for each page access is too slow, so it may be necessary to add caching.

Difficult issues

It's not quite clear how ACLs and versioning should interact. For instance, it will be possible to view old revisions of pages. But should we then use the ACLs in the old revision of the repository, or the current ACLs? The latter is necessary to withdraw or grant access to old revisions of pages. On the other hand, it is not clear how current ACLs should be applied to pages that have been deleted in the current revision.

Rendering and Template

Advisor/Customer: Eelco Visser

An essential component of any wiki is the rendering of text in a simple wiki markup language to full-blown HTML files. Following the tradition of the original c2 wiki, this rendering phase is regular-expression based. As is known from programming languages, this leads to badly designed languages when adding new features.

  • line by line processing makes it awkward to write well structured markup

  • rendering operations interfere with each other, e.g., a wiki word in a forced link

The TWiki clone of wiki extends the basic wiki markup scheme with a server-side template mechanism, which allows configuration of page layout. The template mechanism is 'proprietary' and ad hoc. Being server-side, it cannot be maintained via the wiki itself.

GW should support

  • A wiki markup language based on context-free parsing

  • Extensibility to new markup languages through an XML-based intermediate format, which separates parsing from rendering

  • Stylesheets for rendering of wiki markup and composition of pages

The following ingredients are necessary to achieve this.

GWML: XML Schema for the GW Intermediate Representation

Design an XML schema for structured representation of wiki markup. The first version should support all the markup in the TWiki TextFormattingRules. The second version should also incorporate TWiki variables.

WikiML would be a logical name, but is already claimed. See http://wiki.wikiml.org/ also for inspiration.

Parsing TWiki Markup

Parse the TWiki markup. Since we want to migrate several existing wikis (ST Wiki, program-transformation.org, stratego-language.org) to the new wiki, the TWiki format should be supported. The parser should produce an XML document in DOM format to be passed on to an XSLT transformer.

Rendering: GWML representation to HTML in XSLT

When viewing a page, its GWML representation is converted to HTML by an XSLT stylesheet (or wiki formatting template). The stylesheet is obtained from the wiki.

Cascading Stylesheet: pretty layout

The amount of formatting in the XSLT stylesheet can be limited by separating logical layout and graphical layout. The latter should be achieved using cascading stylesheets. The CSS for viewing pages should also be part of the wiki. A good set of conventions for element classes and a nice initial graphical layout should be designed.

Templates provide context for generated HTML

In addition to rendering the contents of a page to HTML, templates can add context information such as navigation bars, headers, footers, and menus. Define templates for viewing, editing, previewing, and saving pages. See the FlexibleSkin templates for inspiration. For example, in the ST wiki the WebContents page is included in the navigation bar on the left.

Finding Preferences

Rather than just providing a single template, subtrees or specific pages of the wiki may adopt a different presentation style. For that purpose it should be possible to override the default templates with new templates. There should be a scheme for finding the applicable templates. Also, it should be possible to reuse (inherit) as much code as necessary from existing templates.
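
One candidate lookup scheme is to search upwards from the page for the nearest override, sketched here in Python. The "Templates" directory name, the file names and the FILES set are hypothetical; in GW the existence check would be a query against the repository.

```python
import posixpath

# Hypothetical repository contents: default templates at the root and an
# override for everything below /Projects.
FILES = {
    "/Templates/view.xsl",
    "/Templates/edit.xsl",
    "/Projects/Templates/view.xsl",
}


def find_template(page, name):
    """Return the closest template at or above the page's directory."""
    directory = posixpath.dirname(page)
    while True:
        candidate = posixpath.join(directory, "Templates", name)
        if candidate in FILES:
            return candidate
        if directory == "/":
            return None
        directory = posixpath.dirname(directory)
```

A page such as /Projects/GW/Plan is then viewed with /Projects/Templates/view.xsl but edited with the root-level edit.xsl, so a subtree overrides only the templates it cares about.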

Type-Based Rendering

A wiki hierarchy does not necessarily only consist of proper wiki files. Rather it can also contain pure HTML files, files in other document formats such as PDF or Postscript, files in data formats such as XML, Bibtex, plain text, CSS. Each of these file types should be 'viewed' and 'edited' using the appropriate wiki operations. This requires a rendering function for each type and thus the rendering engine should be extensible with new rendering functions. There should be a standard interface for such rendering functions.

(Note that 'rendering' a pdf file may simply mean providing a link to the file, or to copy it to the browser untouched.)
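
A standard interface for rendering functions could be as small as "take a path and the file contents, return HTML", with a registry keyed on file type. The sketch below assumes dispatch on the file extension and uses hypothetical names (renderer, render_plain, render_link).

```python
import html

RENDERERS = {}


def renderer(*extensions):
    """Register a function as the renderer for the given file extensions."""
    def register(func):
        for ext in extensions:
            RENDERERS[ext] = func
        return func
    return register


@renderer("txt", "css", "bib")
def render_plain(path, data):
    # Show text formats verbatim, escaped for HTML.
    return "<pre>" + html.escape(data.decode("utf-8")) + "</pre>"


@renderer("pdf", "ps")
def render_link(path, data):
    # "Rendering" a binary document may just mean linking to it.
    return '<a href="%s">%s</a>' % (html.escape(path), html.escape(path))


def render(path, data):
    """Pick a renderer by extension; unknown types fall back to a link."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return RENDERERS.get(ext, render_link)(path, data)
```

Adding support for a new type is then one decorated function, without touching the dispatch code.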

See also relation with Form rendering in Forms project

Context Variables

Wiki pages may use a series of variables to refer to their context. That way a page is easier to port (move to another location) and can include dynamic information (current time, logged-in user) or search results. It also allows the creation of generic files for inclusion in other files. See TWikiVariables for inspiration and legacy requirements.
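
A minimal sketch of variable expansion, assuming a %NAME% syntax in the style of TWiki. The variable names USER and TOPIC and the expand function are illustrative only.

```python
import re


def expand(text, context):
    """Replace %NAME% with context["NAME"]; unknown names are left alone."""
    def lookup(match):
        return str(context.get(match.group(1), match.group(0)))
    return re.sub(r"%([A-Z]+)%", lookup, text)


page = "Hello %USER%, you are viewing %TOPIC% (%UNDEFINED%)."
result = expand(page, {"USER": "Foo", "TOPIC": "/A/B"})
```

Because the page text only mentions %TOPIC%, moving the page changes the rendered location without any edit to the page itself.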

Access Control

Integrate with access control; don't show a page if no access to it is granted. What happens if a component of a page is not accessible (e.g., reviews for a paper are not accessible to its authors until notification)?

Dependencies

When the creation of page views becomes involved, the efficiency of presenting pages may suffer. Therefore, it may become necessary to maintain a cache of page renderings. To make sure that a page in the cache is invalidated when a file used for its creation changes, it is necessary to infer its dependencies, i.e., all files accessed for rendering the page and all authentication needed to obtain them. This should be achievable by asking user and storage management for a trace.
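
The dependency idea can be sketched as a cache that remembers, per rendered page, which files were read. The sketch assumes the render function itself returns that trace; RenderCache and its method names are hypothetical, and authentication dependencies are not modelled.

```python
class RenderCache:
    def __init__(self):
        self._html = {}    # page -> rendered HTML
        self._deps = {}    # page -> set of files read while rendering

    def get(self, page, render):
        # `render(page)` must return (html, files_read); the list of files
        # is the trace that storage management would be asked to provide.
        if page not in self._html:
            html, files = render(page)
            self._html[page] = html
            self._deps[page] = set(files) | {page}
        return self._html[page]

    def file_changed(self, path):
        # Drop every cached page whose rendering read `path`.
        stale = [p for p, deps in self._deps.items() if path in deps]
        for page in stale:
            del self._html[page], self._deps[page]
        return sorted(stale)


files = {"/Home": "home", "/Templates/view": "[%s]"}


def render(page):
    return files["/Templates/view"] % files[page], ["/Templates/view"]


cache = RenderCache()
old = cache.get("/Home", render)
files["/Templates/view"] = "<%s>"          # the template is edited
stale_view = cache.get("/Home", render)    # cache not yet told about it
dropped = cache.file_changed("/Templates/view")
new = cache.get("/Home", render)
```

The example shows why the trace matters: /Home itself never changed, yet its rendering became stale when the template did.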

Integration with Forms

A wiki page is a form with a textarea field, which can also have other fields such as parents and access control. There may be different access controls for different fields.

Risks

The prime risk seems to be finding a good parsing technology, which is key to (1) obtaining a working parser for TWiki markup and (2) its maintainability.

Wiki Form

Advisor/Customer: Martin Bravenboer and Eelco Visser

Introduction

Wiki webs are used to maintain and develop a web of information with a group of people. In current Wiki implementations this information is restricted to text documents. The text documents are written in a simple markup language and are presented in a browser. The goal of the Forms project is to experiment with collaborative maintenance of structured information. This information could be structured in XML, but it could also be stored in ad hoc notations that are part of the text documents. Examples of this are the WebNotify, Preferences, and Category features of TWiki.

Structured information can be edited at the source level, i.e., an XML document could simply be edited as an ordinary plain text file, but this will not make the web application accessible to users who are not deeply involved in the structure of the web application. This approach would be comparable to diving into a relational database and executing SQL queries and updates.

To make the manipulation of structured data more user-friendly, we need to be able to attach forms to edit topics. These forms are an alternative to editing the structured source of a Wiki topic. The ultimate goal is to have

If GW Forms are to be compared with the Model View Controller pattern, then the most striking difference is that GW Forms will probably not have a controller: GW Forms are restricted to editing the structured data of a single file and will not be used to implement web applications with complex control flow.

This project will have a more experimental nature than the other projects. Hence, the requirements are not very specific and might be changed, based on the experiences of the team working on the Forms project. If one of the approaches appears to be particularly useful, then it might be used for the development of an entire web application.

Overview

A GW Form should do several things:

  • Describe the structure and presentation of a form.
  • Create an instance of a form, based on existing data.
  • Map the changes performed by the user back to the data.
  • Verify that the data is correct and send feedback if it is not.

We will mostly focus on data stored as XML documents.
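
The round trip from XML data to form values and back can be sketched with Python's standard ElementTree module. The flat user document, the field names and the functions load_form and store_form are assumptions for illustration; real GW Forms would also carry a description of the presentation.

```python
import xml.etree.ElementTree as ET


def load_form(xml_text, fields):
    """Create the form values from existing data (missing fields are empty)."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name, default="") for name in fields}


def store_form(xml_text, values):
    """Map the values edited by the user back into the XML document."""
    root = ET.fromstring(xml_text)
    for name, value in values.items():
        element = root.find(name)
        if element is None:
            element = ET.SubElement(root, name)
        element.text = value
    return ET.tostring(root, encoding="unicode")


doc = "<user><name>Foo</name><email>foo@example.org</email></user>"
form = load_form(doc, ["name", "email", "homepage"])
form["email"] = "foo@example.com"
form["homepage"] = "/Users/Foo"
updated = store_form(doc, form)
reloaded = load_form(updated, ["name", "email", "homepage"])
```

Verification of the submitted values would sit between the two steps, before store_form writes anything back.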

Server-side Validation

The first step towards structured information in a Wiki is to ensure that the information stays structured. For information stored in XML this means that the Wiki files should remain well-formed and valid.

More concrete requirements are:

  • Verify the well-formedness of XML documents, or rather, files that are supposed to be XML documents.

  • Verify the validity of XML documents against a schema or multiple schemas. Schema languages that should be supported: Schematron, RELAX NG, W3C XML Schema.

  • Arbitrary validators implemented in a scripting language (Groovy or Jython).

  • Arbitrary validators implemented in Java.

The validators that are to be applied must be defined in meta files.
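
A chain of validators could look like the following sketch, in which each validator returns None on success or an error message. The well-formedness check uses Python's standard XML parser; has_root is a hypothetical custom validator, and schema validation is left out because it needs a library outside the standard distribution.

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """Return None if `text` parses as XML, else an error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return "not well-formed: %s" % error
    return None


def has_root(tag):
    """Build a custom validator that insists on a given root element."""
    def check(text):
        root = ET.fromstring(text)
        return None if root.tag == tag else "root must be <%s>" % tag
    return check


def validate(text, validators):
    """Run the validators in order and stop at the first failure."""
    for validator in validators:
        message = validator(text)
        if message is not None:
            return message
    return None


# In GW this list would be read from the meta file of the page.
checks = [well_formed, has_root("user")]
```

Running well_formed first means later validators can assume that the document parses.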

Some form techniques that we will experiment with in this project also allow client-side validation. This is only suitable for improving the user experience, since a client cannot be trusted.

HTML Forms

The only advantage of HTML Forms is that they are available in every browser. However, they are not suitable for developing Forms without any scripting or other server-side code. Still, it is in practice the only option for creating web forms.

  • Develop a method for defining GW Forms as HTML Forms. Use server-side code that can be edited online to bind the form to the structured data of a GW file. The server-side code might for example be written in Java, Groovy, etc.

Client-side XForms

Client-side XForms improve the user experience by allowing client-side validation, offering more widgets, etc. Although most browsers do not currently support XForms, this is probably the future of forms in web applications. Mozilla will include XForms in future builds and plugins are available for Internet Explorer. XForms client-side and server-side libraries exist for platforms like Java.

  • Develop a method for defining XForms to edit the XML data of GW files.

Server-side XForms

The lack of browser support for XForms is a serious problem in any website that is going to be used in practice. This could be solved by using server-side XForms libraries, or client-side plugins (maybe Java).

  • Develop an alternative implementation for browsers that do not support XForms. The form definitions in GW should remain the same.

XUL Forms

XUL is the user-interface technique used by the Mozilla platform. The ultimate goal of XUL is to provide a platform for the development of platform-independent graphical user interface applications. Most applications rely heavily on JavaScript, which makes the Form less declarative. Another limitation is that XUL-based web applications will probably only work in Mozilla browsers: there are currently no efforts to make XUL a true web standard.

  • Develop examples of XUL based web applications that are hosted and maintained in a Wiki.

Use Cases

Some applications that can be used to test the various approaches are:

  • GW Configuration (variables)
  • User preferences
  • List of users in web notification service
  • Category editing
  • Form for BibTeX entry editing

Full web applications:

  • Conference management (paper submission)

Search and Query

Advisor/Customer: Merijn de Jonge

Search

  • topic search
  • keyword search
  • full text search
  • category search
  • search in search
  • constrained search
  • multi-web search
  • presentation of search results

Query

  • querying web structure
  • dead pages
  • live pages
  • new pages
  • most recently changed pages
  • page readers/visitors
  • page writers
  • changes
  • most visited pages
  • web site activity browser (i.e., a graphical presentation of the activities of all parts of the web site)
  • web counters/statistics

General

  • search/query expression language
  • (caching of results; this may conflict with the access control mechanism)