While the name and emphasis of the project, as well as the practical means of achieving its aims, have changed over the more than 30 years of its existence, one constant has been to provide access to carefully curated sources without insisting on a single understanding or interpretation of these sources. As on an archaeological excavation site, multiple layers of interpretation may coexist.
If a single leading principle had to be named, one might look to Morimoto Kakuzō 森本角蔵, who in 1921 published 四書索引, a concordance to the Four Books of the Confucian canon. The first of his usage notes reads 理屈よりも便利を主としました: when in doubt, I preferred convenience over principle.
The list in Main features appeared on earlier iterations of the TLS website and is reproduced here for reference. None of its statements is wrong, but they reflect the perspective of earlier stages of this project and not necessarily the current understanding.
These pages document the evolving project, which can now be characterized as an attempt to explore, collaboratively and interactively, the written cultural tradition of the East Asian cultural hemisphere, insofar as it is based on variants of pre-modern Chinese.
In an ideal world, this database would be compiled with the highest degree of confidence in every item of information it contains, but that would make it impossible ever to publish. We therefore have to live with the current bricolage of bits and pieces, some elaborate and detailed, others rough and uneven, with embarrassing gaps everywhere. But since this is a collaborative project, everybody is invited to cure the ills they detect, for the delectation of all users.
Very broadly speaking, the TLS consists of two main parts:
- A corpus of texts
- Records of observations made on these texts
1. The text corpus
The list of available texts can be visited directly at The TLS text list (on HXWD); the page TLS Text list in this manual gives some background on the scope and organization of that list.
Generally speaking, texts have to be prepared in a very specific format before they can be annotated. However, a text no longer necessarily has to be stored in the database itself; instead, we are moving towards a federated system of distributed text databases. See Search in the Kanseki Repository and Setting up a text repository for more information.
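The idea of a federated system can be pictured with a small Python sketch. Everything in it is hypothetical: the `TextRepository` interface, the in-memory stand-ins, the text IDs, and the rule that repositories are queried in a fixed order with the first match winning are assumptions for illustration, not the actual TLS or Kanseki Repository API.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class TextRepository(Protocol):
    """Hypothetical interface: anything that can return a text by its ID."""

    name: str

    def get_text(self, text_id: str) -> Optional[str]:
        ...


@dataclass
class InMemoryRepository:
    """Stand-in for a local database or a remote repository (illustration only)."""

    name: str
    texts: dict[str, str] = field(default_factory=dict)

    def get_text(self, text_id: str) -> Optional[str]:
        return self.texts.get(text_id)


@dataclass
class FederatedCorpus:
    """Hypothetical resolver that asks each repository in turn."""

    repositories: list[TextRepository]

    def locate(self, text_id: str) -> Optional[tuple[str, str]]:
        # Return (repository name, text) from the first repository holding the ID.
        for repo in self.repositories:
            text = repo.get_text(text_id)
            if text is not None:
                return repo.name, text
        return None  # no repository knows this ID


# Usage with made-up IDs: the local copy of text-001 shadows the remote one.
local = InMemoryRepository("local", {"text-001": "乾：元亨利貞。"})
remote = InMemoryRepository("remote", {"text-001": "乾元亨利貞", "text-002": "學而時習之"})
corpus = FederatedCorpus([local, remote])
print(corpus.locate("text-001"))
print(corpus.locate("text-002"))
```

Querying in a fixed order lets a locally prepared copy take precedence over a remote one; a real system would presumably also need to deal with differing editions of the same text.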
2. Observations
Many kinds of observations can be recorded on the texts. Some concern semantic units at the phrase level; others concern spans of text beyond the level of a phrase.
Attributions
We call the phrase-level observations attributions. They are organized on several levels. At the most basic level, the one closest to the text, an attribution records at least the syntactic function of a word in the phrase and an explication of the meaning to be understood in this instance. Such an attribution is called a Syntactic Word Location (SWL), since it ties a Syntactic Word (SW) to a specific location in the text. In practice, the SWLs for one phrase (a line of text in the view presented by our system), together with an optional translation of that line, are grouped into a set of information nodes we call a Citation.
SWs that use the same characters and belong to a similar semantic field are grouped together to form what might be called, for the sake of simplicity and in the absence of a better term, a Word. These words are in turn grouped with other words of a similar semantic field into broader groups called Concepts. Again, this is just a convenient name for such a group, which might otherwise be called ‘a set of near-synonyms’ that has been given a label so that it can be conveniently talked about. The concepts themselves are also organized in taxonomic hierarchies along different axes, most importantly the ‘kind-of’ and ‘is-a’ relationships.
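The layering of attributions can be sketched as a small Python data model. The class and field names, the single `broader` link standing in for the taxonomic relations, and the example concept labels and syntactic-function string are illustrative assumptions, not the actual TLS schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A labelled set of near-synonymous words, placed in a taxonomy."""

    name: str
    broader: list["Concept"] = field(default_factory=list)  # hypothetical 'kind-of' links

    def ancestors(self) -> list[str]:
        # Walk the broader links upward, breadth-first, listing each concept once.
        seen: list[str] = []
        queue = list(self.broader)
        while queue:
            c = queue.pop(0)
            if c.name not in seen:
                seen.append(c.name)
                queue.extend(c.broader)
        return seen


@dataclass
class Word:
    """SWs sharing the same characters and semantic field; belongs to one Concept."""

    characters: str
    concept: Concept


@dataclass
class SyntacticWord:
    """A word in one syntactic function, with an explication of its meaning."""

    word: Word
    syntactic_function: str
    definition: str


@dataclass
class SWL:
    """Syntactic Word Location: ties an SW to a position in a text."""

    sw: SyntacticWord
    location: str  # hypothetical line identifier


@dataclass
class Citation:
    """All SWLs on one line, plus an optional translation of that line."""

    location: str
    text: str
    swls: list[SWL] = field(default_factory=list)
    translation: Optional[str] = None


# Usage with illustrative labels only.
knowledge = Concept("KNOW")
learn = Concept("LEARN", broader=[knowledge])
xue = Word("學", learn)
sw = SyntacticWord(xue, "verb, transitive", "study")
line = Citation("line-1", "學而時習之", translation="To study and practise it at due times")
line.swls.append(SWL(sw, line.location))
print([s.sw.word.concept.name for s in line.swls])  # ['LEARN']
print(learn.ancestors())  # ['KNOW']
```

Keeping each level as its own record means a single SW can be attributed at many locations, while concepts can be regrouped in the taxonomy without touching the attributions.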
Annotation of larger textual units
(Work in progress)
A wide variety of annotation types is available; in addition, some of these annotations are organized in taxonomic hierarchies. It is tempting to call this a dictionary, especially since there is a lot of emphasis on linguistic annotation, but that would be misleading. It is better described as a collection of citations that serve as raw material for a dictionary, since it lacks coherently edited entries that organize these citations according to editorial principles.
The following annotation types are available:
- Concepts
- Translations
- Word relations
- Rhetorical Devices and other observations.
- Citations: a composite annotation of a Syntactic Word at a specific location in a text, possibly with a translation.