Calliope Developer Documentation

1. Calliope Service
2. Mongo DB
3. C support libraries
4. pdef-tool
5. Calliope API

1. Calliope Service

Calliope is designed to provide, at a high level, all the services needed to implement DSEs (digital scholarly editions). Calliope is a Java-based web service that can run standalone in Jetty or as a web application within a web-app container such as Tomcat. Although it is conceived as a faceless service, there is a test interface at http://localhost/tests/ for testing various aspects of its behaviour. Everything that Calliope does is configurable (see below). Calliope performs the following tasks:

  • Imports XML, plain text or HTML files into MVD formats
  • Exports a digital scholarly edition as a pdef-archive, consisting of a set of TEXT + markup files or as pure MVDs. We hope to add XML and HTML export to this soon.
  • Compares any two versions in an MVD and renders the result as HTML
  • Combines text and separate markup properties (which may overlap) into HTML. For example, it can render an imagemap as a set of areas, or add a set of links to point to image areas. Multiple markup sets can be combined to enhance a single document.

Calliope Service Proxy Setup


Calliope runs as a service or Tomcat web application on port 8080, but a proxy is set up in Apache to translate requests on port 80 to port 8080, so the server does not have to open that port to the world. On Linux the proxy settings are held in /etc/apache2/mods-available/proxy.conf and on Mac OS X in /etc/apache2/httpd.conf.

Calliope in Jetty

Calliope can be run as a standalone service in Jetty. Instead of creating a war file (which requires an XML configuration file) calliope can run in a purely XML-free environment somewhat faster than in Tomcat. It is also much easier to debug in that form. However, calliope uses external C libraries for speed, portability and code longevity. This means that, until the C code fully stabilises, even a minor error in these libraries will cause the entire Java Virtual Machine to crash, bringing down the calliope service.

Because of the way that nmerge computes differences, it requires stack space proportional to the document's complexity. As a result, specifying a large stack size for calliope is essential. In practice we have found that a maximum memory size of 2GB and a stack size of 8MB work well. Turning on incremental garbage collection is also necessary on Linux to avoid using up all memory when merging large documents. (This is the default on Mac OS X.) These settings are contained in the script, found in the calliope-0.2.2 directory of the distribution.

Calliope in Tomcat

In Tomcat calliope runs as a standard web application. Errors that would crash the Jetty installation get caught by the container, and instead of failing it relaunches the service. Since many services are typically run in Tomcat alongside calliope, adding one more is easy. On the other hand, the URLs are more complex, because each must be prefixed by "calliope" or whatever the war file is called. This prefix is stripped from all urls passed to calliope so that the behaviour is the same in Jetty and Tomcat.

Tomcat's JVM settings should be -Xss8m -Xmx2048m -Xincgc, and are contained to

The calliope.war file

This is built from the calliope.jar file contained in the calliope distribution. Although it is probably unnecessary to recreate a fresh one, a script to build it is included. This pulls the calliope.jar file from dist/calliope.jar and adds it to the calliope folder, then jars the whole directory. To install the war file, copy it to Tomcat's webapps directory, stop Tomcat, remove the old expanded calliope directory and restart Tomcat.

The file

In order to find the C libraries, Tomcat needs to have its libpath set before it starts. The best way to do this is via a script added to $tomcat_home/bin. In the calliope distribution is a folder called tomcat-bin. Copy its contents into $tomcat_home/bin and then restart Tomcat. If calliope complains that it can't find AeseSpeller or any of the other libs, it is because this script has failed. The script is written in dash (not bash) because the Tomcat script that calls it is also written in dash. It calls LibPath.class to get the default Java library path, then appends "/usr/local/lib", where the C libraries are, and sets the new path into CATALINA_OPTS. So when Tomcat starts up it will look in /usr/local/lib for the aese* libraries.

Java libraries needed by calliope

These are all contained in the war archive.
nmerge - this is the merging library. Makes MVDs
commons-codec - this is used for base64 encoding to read MVDs
commons-fileupload - this is used for importing and reads the mime multipart format
commons-io - needed to read uploaded files
jetty - needed for standalone running
jtar - needed to generate tar.gz pdef-archives for export
jtidy - used for HTML import
juniversalchardet - Java port of Mozilla's universalchardet library, for guessing character encodings of plain text
lucene-* - for content based searching
mongo-driver - driver to support mongo database
servlet-api - needed for running calliope servlets

Netbeans project

The NetBeans project files are included in the distribution. These can be used to modify and rebuild the source code. The output is in dist/calliope.jar.

2. Mongo DB

Calliope Database

Calliope stores all its data in MongoDB, though it can also use CouchDB. MongoDB is much faster and, unlike CouchDB, it doesn't save old versions, so uploading a new copy of a document obliterates the old one (except for images). MongoDB can be installed on Debian Linux via apt-get install mongodb. The admin user will also need to be created before starting calliope. The javascript to do this is contained in calliope/calliope-0.2.2/mongouser.js, which can be run on the commandline as:

mongo mongouser.js

The mongo database used by calliope is also called calliope, and it has five collections (tables): cortex, corcode, corform, config and corpix. Mongo uses BSON files in a flat structure (without relations). To see the collections enter the mongo client by typing:

use calliope
show collections


Each document in calliope has a "docid", which identifies it to calliope. This is a hierarchical and human-readable identifier and is in addition to the unreadable id that Mongo generates. Docids were originally based on languages like english/shakespeare/kinglear/act1/scene1, but the tendency now is to start with a project name. Docids can't contain spaces, and must not start or end with a slash. To get a list of docids in cortex type the following on the commandline:


To obtain a list of docids for a particular docid prefix (note the escaped "/"):

db.cortex.find({docid: /^english\/harpur/},{docid:1})

To delete a particular document remember that entries are paired between corcode and cortex. So if you delete one cortex, be sure to delete all related corcodes with the same prefix as the cortex. After listing to find the correct docid try:

db.cortex.remove({docid: "english/harpur/test"})
db.corcode.remove({docid: "english/harpur/test/default"})


cortex stores multi-version plain text MVDs ("format":"MVD/TEXT") or single-version plain text files ("format":"TEXT"). Try this in the mongo client:


This will retrieve the BSON document in the collection cortex at that docid, which has a number of fields:

body: the body of the MVD (in base64) or plain text if only one version.

format: is usually MVD/TEXT - this indicates that it is in MVD (merged) format and the contents of the MVD is plain text. If it is TEXT the contents must be plain text, because this document has only one version.

version1: the default version for display. This becomes the left hand version in the compare view unless you specify otherwise. This is the full version ID, which includes the group path, and begins with a slash.

style: the docid of a format in the corform collection, which will be used to format the contents

author, title, section are created on import based on the docid but are not really needed.
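The relationship between the format and body fields can be sketched in Python. This is only an illustration under stated assumptions: the function name cortex_payload is hypothetical, and it assumes the cortex document has already been fetched into a plain dictionary whose body is a string.

```python
import base64


def cortex_payload(doc: dict):
    """Return the usable content of a cortex document (hypothetical helper).

    Merged documents ("MVD/TEXT") carry a base64-encoded MVD in "body";
    single-version documents ("TEXT") carry the plain text itself.
    """
    fmt = doc.get("format", "")
    body = doc["body"]
    if fmt.startswith("MVD/"):
        # Merged format: decode the base64 wrapper to get the raw MVD bytes.
        return base64.b64decode(body)
    # Single version: the body is already plain text.
    return body
```

The raw MVD bytes are not readable text; they would still have to be handed to nmerge to extract a particular version.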

corcode is a collection that stores STIL (standard interval language) format standoff properties. Since there may be more than one corcode for each cortex they are stored at the same docid as for the corresponding cortex, but the alternative corcode names are added on, so the default corcode for the cortex english/shakespeare/kinglear/act1/scene1 is english/shakespeare/kinglear/act1/scene1/default. If there are other corcodes for a cortex they may be merged with the text but unless specified otherwise only the "default" corcode is used. The default corcode for shakespeare would give it a simple speech/line/stage structure as per TEI. Each corcode has a default style, which is the docid of the default corform to use when rendering it. The format of a merged corcode is MVD/STIL, or just STIL if there is only one version, in which case the contents of the corcode is the STIL document itself.
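The naming rule for corcodes is simple enough to express as a pair of helpers. The following Python sketch is illustrative only; the function names are hypothetical and are not part of calliope.

```python
def corcode_docid(cortex_docid: str, name: str = "default") -> str:
    """Build the docid of a corcode from its cortex docid and corcode name."""
    return f"{cortex_docid}/{name}"


def split_corcode_docid(docid: str):
    """Split a corcode docid into (cortex docid, corcode name)."""
    cortex_docid, _, name = docid.rpartition("/")
    return cortex_docid, name
```

For example, corcode_docid("english/shakespeare/kinglear/act1/scene1") gives "english/shakespeare/kinglear/act1/scene1/default", which is the docid to remove from corcode when the matching cortex is deleted.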

corform is a collection of CSS formats. The docids are usually simpler than those for cortexs and corcodes because they are reused for many documents. For example, there is a corform called TEI/default. To list all corform docids in the mongo client:


By changing the corform used with a particular corcode you can change the way it is rendered. So bold could become italic. By providing a STYLE<n> parameter, e.g. STYLE1, you can specify the docid of another corform instead of the default. To see the CSS of a corform type in the client:


This shows the default rendering of all TEI-Lite formats. Since unspecified formats are ignored, they must at least have an empty definition for them to appear in the HTML output. All CSS selectors of the form tag.property will render as <tag class="property">...</tag> in HTML. For example the format br.l {} will render as a line of text with a <br> at the end (because in HTML br is empty), and p.formula {} will render text tagged with the property "formula" (originally a TEI element) as <p class="formula">...</p>. To rename the attributes of properties, write a CSS property whose name is "-aese-" followed by the name of the original attribute. The value of the CSS property is then taken as the name of the HTML attribute, and the attribute's value remains the same, e.g.

a.ref { -aese-target: href }

will render properties tagged with "ref" and the attribute target="/images/123.png" as the HTML element <a href="/images/123.png">...</a>.
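The renaming rule can be sketched in Python as follows. This is not the formatter's actual code (which is written in C); the function name and the use of dictionaries for CSS declarations and property attributes are assumptions made for illustration.

```python
PREFIX = "-aese-"


def rename_attributes(css_declarations: dict, attributes: dict) -> dict:
    """Apply -aese- renaming rules to the attributes of one property.

    css_declarations maps CSS property names to values, e.g.
    {"-aese-target": "href"}; attributes holds the original attribute
    names and values of the standoff property.
    """
    html_attributes = {}
    for css_name, html_name in css_declarations.items():
        if not css_name.startswith(PREFIX):
            continue  # ordinary CSS, left for the browser
        original_name = css_name[len(PREFIX):]
        if original_name in attributes:
            # The name changes; the value is carried over untouched.
            html_attributes[html_name] = attributes[original_name]
    return html_attributes
```

With the rule a.ref { -aese-target: href } and the attribute target="/images/123.png", the sketch returns {"href": "/images/123.png"}.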

Other more complex types of CSS selector are ignored by calliope, but all the CSS is passed through verbatim and can be used to render the HTML output.

config is a collection of configuration files in JSON format. These are specific to various functions within calliope. You would need to look at the import Java code to see how they are applied. For example, those configs whose docIDs begin with "stripper/" configure the tag-stripping on import, and specify how tags get renamed and which ones combine with their attributes. Those starting with "text/" govern the behaviour of the text import filters. Those starting with "splitter/" govern the behaviour of the TEI-XML splitter, which divides a document containing version tags like <app>, <rdg>, <add>, <del>, <abbrev> etc into layers or sub-versions.

corpix stores images using Mongo's GridFS, which splits larger images into chunks. To access these files you should use the API calls provided in AustESE, since GridFS is a separate part of Mongo. However, as in the other collections, each image in the calliope database has its own docID.

Uploading the default database values

A bare bones set of collection data stored in the archive folder in calliope/calliope-0.2.2 can be uploaded by using the pdef-tool. Without these values calliope won't work. The command to upload from the calliope-0.2.2 folder is:

pdef-tool archive

This uploads the contents of the archive directory to the running calliope host named in archive/archive.conf. If you are using the Tomcat war file you will have to change the "base_url" property in archive.conf at the top level of the archive folder to http://localhost:8080/calliope/. The pdef-tool can be downloaded separately from the git repository at AESEInfrastructure/pdef-tool. Once it is installed, the help file is accessible via man pdef-tool.

3. C Libraries

Calliope uses three custom C libraries for speed and portability (it is envisaged that calliope will be ported to PHP at some stage). These are:

  1. Formatter - this converts the JSON standard interval language (STIL) standoff markup + plain text into HTML. Formatter requires a format file in CSS format stored in the corform collection. The code for formatter is in calliope-0.2.2/formatter, and can be rebuilt separately. The following command rebuilds the formatter library in /usr/local/lib:
    sudo ./
  2. Stripper - this strips tags from HTML and XML files and stores the markup as STIL standoff properties and the rest as plain Unicode text. It joins up hyphenated words split over lines using aspell. When stripping HTML it first tidies the HTML into XHTML and then strips as per XML. Stripper requires the Debian packages aspell, libaspell-dev, expat, libexpat1-dev, tidy, libtidy-dev, and can be rebuilt as per formatter.
  3. Speller - this is used when doing plain text import and for generating lists of installed dictionaries for the user to specify the language on import. The speller can be rebuilt as per formatter.

4. pdef-tool

About pdef-tool

pdef-tool is for uploading and downloading DSEs (digital scholarly editions) in an interoperable form. Interoperability is achieved by storing the editions in multiple formats: MVD (the native format of Calliope), plain text+external markup, XML, HTML.

The plain TEXT format is coherent, that is, each file cannot contain sub-versions in the form of additions, deletions, substitutions that may be present in the original documents. Instead, these features are expressed via copies with the alterations explicitly carried out. For example, a DSE based on two physical copies A and B, with one layer of internal corrections would be stored in a hierarchical directory structure thus:


Other layers can be created to represent expansions and regularisations of spelling etc. If desired, the layers can be compared with one another in any comparison program to obtain a list of differences.

HTML is a coherent rendered format, that is, it contains the markup as originally specified via textual properties as a set of explicit formats. This makes it difficult to change, but also easy to read in any web browser. HTML is not yet supported by pdef-tool.

XML is not coherent like HTML or TEXT, since it may contain alternative internal versions. Separate physical copies are represented by separate files. Although not yet supported by pdef-tool for export (though it is for import), XML files should be renderable from the internal cortex+corcode files using the formatter program.

MVD format contains the merged cortex + corcode files dumped from calliope. Uploading/downloading them in this form is the most efficient way to store a DSE, but it is not usable in other programs.


Uploading: pdef-tool <source-folder>


-h <host> the url for download (defaults to http://localhost:8080/)

-f <formats> a comma-separated list of TEXT,XML,MVD,MIXED (defaults to MVD)

-d <docid> the prefix of a docid as a regular expression, e.g. english/poetry.* (defaults to ".*")

-n <name> the name of the archive to download (defaults to archive)

-z <zip-type> specifies the type of zip archive, either tar_gz or zip (defaults to tar_gz)

-r download required corforms and all configs on server


pdef-tool is used to upload or download digital scholarly editions using the PDEF (portable digital edition format). A PDEF archive is a specially formatted collection of nested folders and files.

Some folders begin with metacharacters, which are used to specify the docids of the data on the server. Paths may be literal or relative.

CONFIG files

Config files, ending in ".conf" are JSON files containing key-value pairs as described below. A config file’s values apply to the directory in which it occurs and also to any subordinate folders.


A folder name beginning with ’@’ designates a literal path. The foldername minus the ’@’ designates the database collection to which the contained files will be uploaded. The remaining folders and files nested within a literal path folder form the docids. e.g. a file in a folder structure:


will upload the file "capuana.json" to the database collection "config" with the docid "stripper/play/italian/capuana.json". Literal paths are useful for specifying images, configs and corforms.
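The mapping from a literal path to a collection and docid can be sketched in Python. The function name is hypothetical, and the example path "archive/@config/stripper/play/italian/capuana.json" is reconstructed from the description above rather than taken from a real archive.

```python
from pathlib import PurePosixPath


def literal_target(path: str):
    """Return (collection, docid) for a file inside a literal '@' folder."""
    parts = PurePosixPath(path).parts
    for index, part in enumerate(parts):
        if part.startswith("@"):
            collection = part[1:]  # folder name minus the '@'
            docid = "/".join(parts[index + 1:])  # remaining folders and file
            return collection, docid
    raise ValueError("path contains no literal ('@') folder")
```

Calling literal_target("archive/@config/stripper/play/italian/capuana.json") yields the collection "config" and the docid "stripper/play/italian/capuana.json".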


Relative paths begin with "+" and end with "%", so the directory path:


will upload its contents to the docid "english/shakespeare/kinglear/act1/scene1".

Relative paths contain at least one subordinate folder called "MVD", "TEXT", "XML" or "MIXED". Their formats are as follows:

MVD folders

An MVD folder contains one cortex.mvd file containing all the text versions at that docid. It also must contain a folder "corcode" containing all the corcodes of that cortex, for example the file "default" in the corcode sub-folder would be the default corcode for that cortex.

Other information about the corcode or cortex may be contained in a .conf file with the same name as the cortex/corcode, e.g. for the cortex.mvd file the file cortex.conf would be a JSON file containing keys about the document. The following keys are recognised for cortexs and corcodes:

author: The author’s name

title: The title of the work

style: the docid of the desired corform

format: One of "TEXT" (cortexs) or "STIL" (corcodes)

section: The section of the document this MVD refers to e.g. "Act 1, scene 1"

version1: The short ID of the first version to display by default, e.g. "/Base/F1". (Version IDs always start with a slash, docids do not)

TEXT folders

These contain files whose names will be used to compose the short-version names. If there are subordinate folders within a TEXT folder, these are used to specify group-names. A versions.conf file may be used to specify the long names of these short-names. This consists of a single array keyed with the tag "versions". The array consists of objects with the keys "key" and "value", where "key" refers to the version short-name and "value" to its long name. The short-names must be the same as the file-names.
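A versions.conf file with the structure just described could be generated with a few lines of Python. This is a sketch only: the helper name is hypothetical, and the short-name/long-name pair used in the usage note is illustrative.

```python
import json


def versions_conf(long_names: dict) -> str:
    """Serialise a short-name -> long-name mapping as versions.conf JSON."""
    entries = [
        {"key": short_name, "value": long_name}
        for short_name, long_name in long_names.items()
    ]
    # A single array keyed with "versions", as the format requires.
    return json.dumps({"versions": entries}, indent=2)
```

For a TEXT folder holding a file named F1, versions_conf({"F1": "First Folio"}) produces a JSON object whose "versions" array contains one entry with "key" set to "F1" and "value" set to "First Folio".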

XML folders

May be specified as per TEXT folders, but extra config keys are recognised to facilitate import. In addition to the versions key, other recognised keys are:

corform: specifies the docid of a corform file (a CSS file wrapped in JSON) as the default format for files in this and child directories.

stripper: specifies the docid of a stripper config file to direct stripping of markup from files in this and in child directories.

splitter: specifies the docid of the splitter config to use for this and all child directories.

filter: designates the name of a Java filter program to be used for filtering text files.

Config keys recognised in TEXT and XML folders

dict: the country code name of the aspell dictionary to use for upload, e.g. ’it’ or ’en_GB’. The default behaviour of hyphens at line-end is to join the last word to the next word, by deleting the intervening line-feed and by flagging the hyphen as ’weak’. However, if the two words are both in the dictionary and the compound word (without a hyphen) is not, then the hyphen will be flagged as ’hard’.

hh_exceptions: a white-space delimited list of compound words (no hyphens) that must always be hard-hyphenated, overriding the dictionary rule described above for the dict key. e.g. adding the compound word ’underfoot’ to an hh_exceptions list will cause the hyphen to be flagged as hard, i.e. ’under-foot’.
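The interaction between dict and hh_exceptions can be sketched in Python. The real stripper consults aspell; here a plain set of words stands in for the dictionary, and the function name is hypothetical. That exceptions are checked before the dictionary is an assumption drawn from the description above.

```python
def classify_hyphen(first, second, dictionary, hh_exceptions=()):
    """Classify a line-end hyphen between two word halves as 'weak' or 'hard'.

    dictionary is a set of known words (a stand-in for aspell);
    hh_exceptions lists compounds that must always keep their hyphen.
    """
    compound = first + second
    if compound in hh_exceptions:
        return "hard"  # explicitly listed: always hyphenated
    if first in dictionary and second in dictionary and compound not in dictionary:
        return "hard"  # both halves are words, the joined form is not
    return "weak"  # default: the halves are joined into one word
```

With a dictionary containing "under", "foot" and "underfoot", the hyphen in "under-" / "foot" is weak by default, but becomes hard once "underfoot" is added to hh_exceptions.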

Other config keys

At the topmost level a PDEF archive should contain a .conf file with at least

base_url: The url to upload to, e.g. http://localhost:8080/


pdef-tool archive

pdef-tool -n shakespeare -r -d "english/shakespeare/*"

(the quotes are required to get around substitution by bash)

5. Calliope API

The calliope service is accessed over the Web on port 80 or 8080. The API (application programming interface) follows the REST (representational state transfer) style, whereby the state of the server between calls remains the same. Instead of calling procedures as with SOAP, calliope responds to the basic HTTP verbs: GET, POST, PUT (update) and DELETE. Any extra information needed is passed either in the URL (GET) or in the message body in mime-multipart format (POST).


Documents in Calliope may be specified by their docid. A Docid is a sequence of path components as in a file name, separated by slashes. A Docid has no leading or trailing slash.
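The constraints on a docid (no spaces, no leading or trailing slash) can be checked before a request is made. The following Python helper is a minimal sketch; its name is hypothetical and it is not part of calliope.

```python
def is_valid_docid(docid: str) -> bool:
    """Check a docid against the documented rules."""
    if not docid:
        return False
    if any(ch.isspace() for ch in docid):
        return False  # docids can't contain spaces
    # No leading or trailing slash is allowed.
    return not (docid.startswith("/") or docid.endswith("/"))
```

For example, "english/shakespeare/kinglear/act1/scene1" passes, whereas "/english/harpur" and "english/king lear" do not.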


A version within a multi-version document is identified by a VersionID, a series of path components separated by slashes. This only applies to Cortex and Corcode collections, and only if they contain multiple versions. The last component is the version's short-name or siglum in the MVD. The other components are version-group names. A VersionID has a leading slash. Example: /C376/add0.
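The structure of a VersionID can be illustrated with a small Python function that separates the group names from the short-name. The function name is hypothetical.

```python
def split_version_id(version_id: str):
    """Split a VersionID into (list of group names, short-name)."""
    if not version_id.startswith("/"):
        raise ValueError("a VersionID has a leading slash")
    parts = version_id[1:].split("/")
    # Everything but the last component names a version group.
    return parts[:-1], parts[-1]
```

Applied to the example above, split_version_id("/C376/add0") gives the group list ["C376"] and the short-name "add0".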


A CorForm is a CSS file with special rules that may contain extended Calliope properties that will be used for transformation. All other rules, selectors and properties are still usable by the browser. A Corform thus transforms and formats a CorCode/CorTex combination.

GET Methods

1. Text version list

URI format: http://host/list/DocID

Description: Get a plain text description of the versions of a CorTex MVD. Each version is quoted in its short-name form, preceded by its group-path if not at the top-level. e.g. "F1,F2,F3,F4,Q1,Q2" or "/C376/add0,/C376/base,/A88/base,/A88/add0".

2. JSON version list

URI format: http://host/json/list/DocID

Description: Get a JSON description of the versions of a CorTex MVD. The full version table is returned, including the MVD description, the full VersionIDs of each version, and their long names.

3. JSON list dictionaries

URI format: http://host/json/dicts

Description: Get a JSON description of the dictionaries installed on the server. Dictionaries help in resolving hyphenation at line-end. Two values are returned for each dictionary: language: a plain name for a language, and code: the name of the dictionary which describes its variety. If other dictionaries are desired they must first be installed via aspell.

4. HTML version list

URI format: http://host/html/list/DocID

Param Name | Param Value | Notes
NAME | name | The given name will be assigned to the <select> element when using /list/default.
FUNCTION | javascript function | The javascript function will be called (when using the /list/default) when the onchange event occurs on the HTML list element.
LONG_NAME_ID | ID | The ID is any desired HTML ID. This specifies the ID of the element to hold the long name of the currently selected version.

Description: Gets a HTML formatted list. The HTML content is governed by a corform, which defaults to /list/default. This produces a <select> list suitable as a dropdown. If another format is desired then create another corform and place it on the server at a suitable URI. Then specify it when invoking the HTML list service.

5. HTML Text

URI format: http://host/html/DocID

Param Name | Param Value | Notes
version1 | VersionID | This specifies the version to fetch. Required.
SELECTED_VERSIONS | "all" or a comma-separated list of VersionIDs | If this optional parameter is supplied, IDs will be added to all continuous spans of the version1 text that is shared by this set of versions.
CORCODE<n> | DocID | Specifies a CorCode to use to structure the text. There may be several with the parameter names CORCODE1, CORCODE2 etc. Defaults to "default".
STYLE<n> | CorformID | Specifies the CSS styles and transformations to apply to the text. Defaults to "default".

Description: Produces a HTML rendition of the chosen version, using the supplied CorCodes and CSS styles.
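A request URL for this service can be assembled with the Python standard library. The sketch below is an illustration, not a client shipped with calliope: the function name and the host "localhost:8080" are assumptions, and it assumes the parameters travel in the query string as described for GET calls.

```python
from urllib.parse import quote, urlencode


def html_text_url(host, docid, version1, corcodes=(), styles=()):
    """Build the URL for an HTML rendition of one version of a document."""
    params = [("version1", version1)]
    # Numbered parameters: CORCODE1, CORCODE2, ... and STYLE1, STYLE2, ...
    params += [(f"CORCODE{n}", c) for n, c in enumerate(corcodes, start=1)]
    params += [(f"STYLE{n}", s) for n, s in enumerate(styles, start=1)]
    # quote() leaves the slashes of the docid intact in the path.
    return f"http://{host}/html/{quote(docid)}?{urlencode(params)}"
```

Slashes inside parameter values are percent-encoded by urlencode, so a version1 of "/Base/F1" appears in the URL as "%2FBase%2FF1".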

6. HTML Compare

URI format: http://host/html/compare/DocID

Param Name | Param Value | Notes
version1 | VersionID | This specifies the first version to compare with the second. Required.
version2 | VersionID | This specifies the second version to compare with the first. Required.
diff_kind | DELETED or ADDED | This specifies how fragments unique to version 1 are to be displayed. Defaults to DELETED.
CORCODE<n> | DocID | Specifies a CorCode to use to structure the text. There may be several with the parameter names CORCODE1, CORCODE2 etc. Defaults to "default".
STYLE<n> | CorformID | Specifies the CSS styles and transformations to apply to the text. Defaults to "default". The /diffs/default CorForm will be added to all comparisons to specify coloured spans for additions, deletions etc.

Description: Compares the first version with the second version. Currently only compares the CorTex. The differences are computed as a CorCode and merged with the document's own CorCode(s). The result is formatted via the specified or default CorForm and returned as a HTML fragment containing the text of version 1. A single column (div) is produced.

7. HTML Table

URI format: http://host/html/table/DocID

Param Name | Param Value | Notes
version1 | VersionID | This specifies the base version to appear at the bottom.
OFFSET | integer | The offset of the range within version1 to build the table from.
LENGTH | integer | The length of the range within version1 to build the table from.
WHOLE_WORDS | "1" or "0" | If set to "1" part-word differences are extended to whole word boundaries.
COMPACT | "1" or "0" | If "1" the table is compacted: that is, lines more than 90% similar are merged and a nested collapsible/expandable table is used to store the minor differences.
HIDE_MERGED | "1" or "0" | Wherever the text is the same across all versions, do not repeat text in versions other than the base.
SELECTED_VERSIONS | "all" or a comma-separated list of VersionIDs | By default all versions encountered in the range are used.
SOME_VERSIONS | "1" or "0" | This parameter must be set to "1" if "SELECTED_VERSIONS" is set to anything but "all".
FIRSTID | integer | This sets the first ID of the base version of the table for alignment with the surrounding text. Each cell receives a successive ID from this number starting with "t". These ids correspond to those in the main text base version, which start with "v". If the SELECTED_VERSIONS option is used then the same parameter must be passed to the get version method.

Description: Produces a HTML table of stacked variants for the specified text range in the base version. Not configurable by a corform as it is produced directly by nmerge.
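The dependency between SELECTED_VERSIONS and SOME_VERSIONS is easy to get wrong, so a client might derive one from the other. The Python sketch below shows one way to do this; the function name is hypothetical, and sending SOME_VERSIONS as "0" when all versions are wanted is an assumption.

```python
def table_params(version1, offset, length, selected=None):
    """Build the parameter set for an HTML table request.

    selected is an optional list of VersionIDs; when omitted, all
    versions encountered in the range are requested.
    """
    params = {
        "version1": version1,
        "OFFSET": str(offset),
        "LENGTH": str(length),
        "SELECTED_VERSIONS": "all",
        "SOME_VERSIONS": "0",
    }
    if selected:
        # Anything other than "all" requires SOME_VERSIONS to be "1".
        params["SELECTED_VERSIONS"] = ",".join(selected)
        params["SOME_VERSIONS"] = "1"
    return params
```

The resulting dictionary can be passed to urllib.parse.urlencode to form the query string of the request.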

8. PDEF (portable digital edition format) download

URI format: http://host/pdef/

Param Name | Param Value | Notes
FORMAT | "MVD" or "TEXT" [or "XML"] | An array of formats, each with the key "FORMAT". In future the format "XML" will also be possible. This governs the formats in which the text and markup will be exported from the database. The default is "MVD".
DOC_ID | DocID | This should be a regular expression to specify the set of CorTex resources and their associated CorCode and CorForm resources that should be downloaded. The default is ".*", which means everything.
NAME | archive-name | This specifies the name of the archive file to create. Defaults to "archive".
add_required | "true" or "false" | If "true" all required resources such as the contents of the config database and general CorForms will also be downloaded. Default is "false".
zip_type | "TAR_GZ" or "ZIP" | Specify the zip format to use for the download. Defaults to "TAR_GZ".

Description: This service allows the user to download part or all of the digital scholarly editions stored on a calliope instance. It is completely handled by the commandline pdef-tool, which can also upload a previously downloaded archive. The archive format is described in the pdef-tool man page.
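Although pdef-tool normally makes this call, the request it represents can be sketched in Python. The function name and the host are assumptions, as is the choice to send the array of formats as a repeated FORMAT key in the query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def pdef_url(host, formats=("MVD",), doc_id=".*", name="archive",
             add_required=False, zip_type="TAR_GZ"):
    """Build a PDEF download URL using the documented defaults."""
    params = [("FORMAT", fmt) for fmt in formats]  # one FORMAT key per format
    params += [
        ("DOC_ID", doc_id),
        ("NAME", name),
        ("add_required", "true" if add_required else "false"),
        ("zip_type", zip_type),
    ]
    return f"http://{host}/pdef/?{urlencode(params)}"


# Usage: request MVD and TEXT copies of everything under english/shakespeare.
url = pdef_url("localhost:8080", formats=("MVD", "TEXT"),
               doc_id="english/shakespeare/.*", name="shakespeare",
               add_required=True)
print(parse_qs(urlsplit(url).query)["FORMAT"])
```

The regular expression in DOC_ID is percent-encoded by urlencode, so characters such as "/" and "*" survive the round trip unchanged once the server decodes the query string.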

POST Methods

1. Mixed Import (Text/XML)

URI format: http://host/import/mixed/

Param Name | Param Value | Notes
DOC_ID | DocID | The desired location in the hierarchical classification scheme for the uploaded document.
STYLE | CorformID | The corform ID of the desired CSS stylesheet in the database. Defaults to TEI/default.
DEMO | anything | If present this field name will disable all uploads. Server responds only with "Not enabled on public server".
dict | dictionary code as returned by json/dicts/ | If omitted this parameter defaults to en_GB.
hh_exceptions | a space-delimited list of compound words | The list specifies words that will be hard-hyphenated over line-endings even if they occur in a dictionary.
FILTER | filter-name | Name of a Java filter class in the package calliope.importer.filters. Defaults to "Empty". May be one of "Empty", "Poem" or "Play". Or define your own filter and add it to Calliope. A filter is a Java program to convert each plain text file into a CorTex+CorCode pair.
TEXT | ConfigID | Configuration data for the text filter. Forms the LAST component of the ConfigID: text/filter/text. Defaults to "default". It is useful to override this setting if more than one configuration for that filter is present.
SPLITTER | ConfigID | Defaults to "default". Contains configuration parameters to control the splitting of XML files into separate layers or versions e.g. <add> or <del>.
STRIPPER | ConfigID | Defaults to "default". Contains configuration parameters to control the simplification of XML tag names and attributes to properties.
TEXT | ConfigID | The corform ID of the desired CSS stylesheet in the database. Defaults to TEI/default.
XSLT | ConfigID | The corform ID of the desired CSS stylesheet in the database. Defaults to TEI/default.

Description: This service creates one or more CorTexs and their corresponding CorCodes based on a mime-multipart upload of files. The uploaded files are all assumed to be versions of the one work. Several CorTexs may be created, however, if XML files are uploaded and the files contain external interpretations. Several CorCodes may also result from different layers of markup within the XML, e.g. page references. In these cases the DocIDs generated on the server will be docID-notes and docID-pages respectively.