A new relationship for multidisciplinary knowledge organization systems: dependence
Claudio Gnoli 1, Mela Bosch 2, Fulvio Mazzocchi 3
1 University of Pavia. Department of Mathematics, Pavia, Italy.
2 University of La Plata. Faculty of Journalism, La Plata, Argentina.
3 National Research Council. Institute for Atmospheric Pollution. Environmental Knowledge Organization Laboratory, Monterotondo staz. (Rome), Italy.
Most existing knowledge organization systems (KOS) are based on disciplines. However, as research is increasingly multidisciplinary, scholars need tools allowing them to explore relations between phenomena throughout the whole spectrum of knowledge. We focus on the dependence relationship, holding between one phenomenon and those at lower integrative levels on which it depends for its existence, like alpinism on mountains, and mountains on rocks. This relationship was first described by D.J. Foskett in the context of the CRG's work towards a non-disciplinary scheme. We discuss its possible status and representation in three kinds of KOS: thesauri, classification schemes, and ontologies. In thesaural structures, dependence could be one of the subtypes of the associative relationship (RT) that, according to several authors, should be defined in order to enrich its semantic functions. In classification, it could act together with hierarchy as a structuring principle, providing a way of connecting and sorting main classes based on integrative levels. In ontologies, it could be defined as a dependsOn direct slot, expressing the fact that through it a class does not inherit all properties of the other class on which it depends. We argue that providing search interfaces with cross-disciplinary links of this kind can give users more adequate tools to examine recorded knowledge through creative paths, overcoming some limitations of its canonical segmentation into disciplines.
Keywords: classification, dependence, disciplines vs. phenomena, integrative levels, ontologies, relationships, thesauri
Disciplines vs. phenomena
Disciplinarity is a key feature of knowledge organization systems (KOSs), as most of them are structured primarily according to disciplines. Special KOSs are focused on one single domain of knowledge, which usually corresponds to a discipline, as is the case with the Medical Subject Headings or the Agrovoc thesaurus. This gives them some practical advantages, but also produces problems in indexing topics that are marginal for that discipline, or that belong to a different discipline but are nevertheless relevant to the literature of the field at hand: to cope with such cases, indexers need to refer to at least an outline of a general scheme [Foskett 1991]. General KOSs, on the other hand, are often not much more than an aggregate of special schemes, one for each discipline [Kyle 1959]. This can be seen quite clearly in faceted classification schemes, where each disciplinary main class, like chemistry, economics, or literature, has its own set of facets and subclasses, having little to do with those of the other main classes. In this sense, Colon Classification has been said to be a meta-classification [Gatto 2006], where the only structure shared throughout all disciplines is the set of fundamental categories of Personality, Matter, Energy, Space, and Time, serving as an ordering pattern of facets. Thus, general schemes too are based on disciplines, or in Langridge's [1992] words on the "forms of knowledge", rather than on the phenomena studied by them. In this situation, multidisciplinary search is problematic: as each discipline is a separate universe with its own hierarchical structure, its own facets, its own terminology, and its own notation, searching for a given concept across multiple disciplines is difficult, and many relevant items can be lost as they are filed under a different term or notation. Indeed, several authoritative voices have suggested that KOSs should be improved in order to allow for better multidisciplinary search.
The International Council for Scientific and Technical Information has started a group on Interdisciplinary searching, which gives recommendations for standardizing databases across disciplines [Weisgerber 1993]. Beghtol [1995, 1998a, 1998b] notices that multidisciplinary research has increased considerably, so that "a paradigm shift" is needed to support it by more flexible and hospitable systems. Along similar lines, Williamson [1998] claims that "it is absolutely imperative that the search for, and the development of, other kinds of classificatory structures continue", and suggests that "the first division of knowledge would be the phenomena with the disciplinary field subordinated to it". This idea is also put forward by Szostak [2004, 2007], who calls for a "classification by phenomena, theory, and method" as three separate components of an analytico-synthetic system, where the phenomenon studied should be first in the citation order. McIlwaine [2000] supports more conservative solutions, suggesting that multidisciplinary topics be handled by revising traditional KOSs like UDC. However, Weinberg [1996] believes that "the Dewey Decimal Classification is not an appropriate scheme for organizing electronic documents because its primary facet is discipline, not concrete topic".
Classification by integrative levels
English members of the Classification Research Group (CRG) were among the first to realize that the disciplinary approach imposes problematic constraints on the indexing and organization of knowledge. Kyle claimed that "the classification-maker should break away from the terminology peculiar to each discipline and endeavour to reduce these terms to underlying concepts". The Group explored the possibility of a non-disciplinary general scheme. In the absence of disciplines, some other general principle was needed to hold together the list of phenomena treated by the literature, and to sort them in a predictable order: this was identified in the notion of integrative levels of increasing organization and complexity – from elementary particles, through molecules and cells, up to organisms, societies, and cultural products – among which phenomena can be distributed. In the CRG draft scheme, main classes are not disciplines but phenomena of increasing integrative levels, and classmarks can be built by combining phenomena in reversed order of levels: the subject "alpine oak-groves" can be expressed as forests : oaks : mountains [Foskett 1961, Austin 1969, Gnoli & Poli 2004]. On the basis of the previous CRG work, the Integrative Level Classification (ILC) research project was started in 2004 within the ISKO Italian chapter [ISKO Italia 2004, Hong 2005]. The project is testing non-disciplinary classification by integrative levels on bibliographic samples from different domains, like local culture, bioacoustics, and facet analysis itself. Web interfaces are produced, allowing users to exploit freely faceted notation in retrieving and sorting relevant bibliographic references [Gnoli & Merli 2005, Gnoli 2006, Gnoli & Hong 2006]. Integrative level KOSs are basically structured according to two principles. One is the classical hierarchical relationship holding between a class and its subclasses.
The other is the relationship holding between two integrative levels: for example, between mountains and rocks, or between alpinism and mountains, or between society and individuals. According to the theory, as formulated by philosophers like James Feibleman and Nicolai Hartmann, the level above depends on the level below to exist (there cannot be mountains without rocks, or societies without individuals), but at the same time it has a more complex organization with new emergent properties, which make the higher level an essentially different thing [Foskett 1961, Gnoli & Poli 2004]. We call this a dependence relationship. The term implies its asymmetry: while alpinism requires mountains to be performed, mountains themselves can happily exist even in the absence of alpinism. Foskett already suggested that this relationship should be used as an access tool in a KOS based on integrative levels:
"If we index only at the level of the term naming a complex as a whole, such as laundry or parliament, are we in danger of forgetting that documents on these topics may be valuable to those who study particular qualities, such as "hot, damp workplaces", or "the conduct of formal meetings", but who are unaware of their presence in the complexes? [...] A new feature has to be consciously incorporated in the alphabetical index, namely, a system of references from certain combinations of terms to at least the level above. [...] The whole trend of modern research, in the humanities and social sciences as well as in the natural sciences, proves that some of the most fruitful investigations spring from awareness of these less obvious relations. If each level derives its parts from the level below, it seems reasonable to use this as a principle in making upward references. I surmise that only the upward references would be necessary."
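As an illustration only, the upward references described in the quotation can be sketched as an index from each phenomenon to the phenomena that depend on it. The pairs below are the examples used in the text; the function names are hypothetical, and following the references through more than one level is an assumption of this sketch, since Foskett only asks for references "to at least the level above".

```python
from collections import defaultdict

# Illustrative (dependent, base) pairs taken from the examples in the text.
DEPENDS_ON = [
    ("alpinism", "mountains"),
    ("mountains", "rocks"),
    ("society", "individuals"),
]

def build_upward_index(pairs):
    """Map each phenomenon to the higher-level phenomena that depend on it."""
    index = defaultdict(set)
    for dependent, base in pairs:
        index[base].add(dependent)
    return index

def upward_references(term, index):
    """Collect every phenomenon reachable by following upward references."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for dependent in index.get(current, ()):
            if dependent not in found:
                found.add(dependent)
                stack.append(dependent)
    return sorted(found)

index = build_upward_index(DEPENDS_ON)
print(upward_references("rocks", index))     # ['alpinism', 'mountains']
print(upward_references("alpinism", index))  # []
```

The empty result for alpinism reflects the asymmetry of the relationship: only upward references are stored, so nothing leads from alpinism back down to mountains.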
In the following, we will examine how the dependence relationship could be incorporated in several kinds of KOS, like thesauri, classification schemes, and ontologies.
Dependence in thesaural structures
Three basic types of relationships are defined in thesauri: hierarchical (BT-NT) and associative (RT) relationships at the conceptual level, plus equivalence (UF-USE), which mainly concerns lexicalization. Recent works [ALA 1997; Schmitz-Esser 1999; Tudhope et al. 2001; Soergel et al. 2004; Mazzocchi & Plini 2007] have suggested that information retrieval could be improved by allowing thesauri to record a richer variety of relationships. The traditional structure is, in fact, considered not refined enough, and lacking in a well-defined semantics. A more detailed relational structure seems to be required, for example, in order to enhance the suitability of thesauri for uses in artificial intelligence and Semantic Web applications, or to increase the capability of simultaneous searching of networked vocabularies. Moreover, the standard structure itself is not always applied correctly – relationships lacking a precise semantics most likely contribute to this trend – with the result that many existing thesauri suffer from structural inconsistency. Some advanced thesauri, mainly in the medical domain, have already introduced or are introducing additional relationships. Another example of this can be found in the Italian CNR's EARTh (Environmental Applications Reference Thesaurus) project [Mazzocchi & Plini 2007]. Other projects are concerned with the reengineering of thesauri into ontologies. An example is the attempt to convert the FAO's Agrovoc thesaurus into an ontology of agriculture [Soergel et al. 2004]. In order to ensure compatibility with existing thesauri, as well as interoperability between systems adopting different strategies, it seems necessary to maintain the shared core set of three main relationships at the top of the relational structure. Hence, in order to enrich the structure, hierarchical, associative and equivalence relationships have to be differentiated into subtypes.
The dependence relationship discussed here could be one of the possible RT subtypes. RT relationships are not easy to specify, as they concern a heterogeneous set of relations among terms, which are not hierarchically but thematically based. A number of studies have explored the possibility of refining RT structures by enriching their specification and semantics in order to improve information retrieval. The ALA proposal [1997], for example, includes about 100 subtypes of associative relationship. This shows the wide range of possibilities for the identification of RTs, which most likely form an inherently open category. Their relevance is also connected to the features of the domain or of the operational context. In this sense, Tudhope et al. [2001] have proposed a restricted expansion of the RTs at the second level, and a richer domain specialization at lower levels. The same authors have also emphasized the need to differentiate between the subtypes deriving from the application of some heuristics (one of the pragmatic methods to identify RTs is the occurrence of one term in any definition of the other [International Organization for Standardization 1986]), and those originating in refining the RT semantics for retrieval purposes. Thus, although so far it has not been considered for such a role, dependence can find its place among the possible subtypes of the associative relationship, in a form like depends on/is necessary for. Coming back to the above example, in order to make alpinism possible, the existence of a mountain is a necessary but not sufficient condition. Obviously, other conditions should be satisfied, e.g. there should be a living being provided with a culture and able to perform highly complex activities such as sports. Another example of dependence is the one between plants, belonging to the integrative level of organisms, and forests, belonging to the higher level of ecosystems. Forests depend on plants, as they could not exist without them.
This relationship could alternatively be expressed as a kind of partitive relationship, the collection-member relationship [Winston et al. 1987]. Collection-member indicates membership in a collection and is determined on the basis of spatial or temporal proximity, or by a social connection. However, such an approach risks falling into the shortcomings of reductionism. In the emergentist perspective of integrative levels – but also from an ecological point of view – a forest is not just a collection of plants, but an integrated system consisting of a complex network of relations between organisms and their inorganic environment. Therefore, in order to represent such complexity, a different kind of partitive relationship would be necessary. The relationship between forest and plants could, instead, be regarded as dependence.
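A minimal sketch of how a depends on/is necessary for pair could be stored as a subtype of RT, while systems that know only the three core relationships still see a plain associative link. The class and method names are hypothetical, and "forestry" is an invented term used only to show an undifferentiated RT next to the typed ones.

```python
from dataclasses import dataclass, field

# Hypothetical subtype labels; the text only suggests the form
# "depends on / is necessary for" as a refinement of RT.
INVERSE = {"depends_on": "is_necessary_for", "is_necessary_for": "depends_on"}

@dataclass
class Term:
    label: str
    # Each entry is (subtype, target label); subtype None means a plain RT.
    related: list = field(default_factory=list)

class Thesaurus:
    def __init__(self):
        self.terms = {}

    def term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_rt(self, source, target, subtype=None):
        """Record an associative link together with its reciprocal."""
        self.term(source).related.append((subtype, target))
        self.term(target).related.append((INVERSE.get(subtype, subtype), source))

    def rt(self, label, subtype=None):
        """All RTs of a term, or only those of the requested subtype."""
        return [t for s, t in self.term(label).related
                if subtype is None or s == subtype]

th = Thesaurus()
th.add_rt("forests", "plants", "depends_on")
th.add_rt("alpinism", "mountains", "depends_on")
th.add_rt("forests", "forestry")  # undifferentiated RT (invented example)

print(th.rt("forests"))                     # ['plants', 'forestry']
print(th.rt("forests", "depends_on"))       # ['plants']
print(th.rt("plants", "is_necessary_for"))  # ['forests']
```

Keeping the subtype as an optional label on an ordinary RT entry is one way to preserve the shared core of three relationships at the top of the structure, as argued above.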
Dependence in classification
Classification schemes are based on hierarchical relationships; other kinds of relationship can be expressed by common auxiliaries, facets, and phases [Broughton 2004]. Users can browse a classification scheme by moving from a class to a more general or a more specific one, e.g. from Km mountains to K landforms. In addition to this, a dependence link like that envisaged by Foskett would allow one to move from a class to a dependent class in a different part of the scheme, e.g. from Km mountains to Xwou alpinism. The related phenomena will be listed in different parts of the schedules: mountains belong to the class of landforms, while alpinism belongs to the class of sport, lying at the higher level of human activity and arts. Therefore the relationship will not be expressed in notation. However, it can be recorded by a link:
K landforms
X art and leisure works
Xw games, sports
Xwo open air sports
Xwou alpinism « Km
The user searching for "mountains" will thus have a hint that she can expand her search to include alpinism too. This example has been used in the ILC search interface of the bibliography on the Apennine local culture. In a classification of phenomena by integrative levels, main classes will be connected to each other by the dependence relationship:
E atoms
F molecules « E
G bulk matter « F
H rocks « G
I celestial objects « H
As can be seen, dependence also provides a principle according to which classes can be ordered, instead of being just listed in a canonical sequence. On the other hand, the resulting structure is not always linear: a first major branching was noticed by the CRG [Foskett 1961, Gnoli & Poli 2004] to occur between inorganic and organic phenomena, which both depend on aggregates of molecules:
G bulk matter « F
L cells « F
Other branches occur at many levels, especially the higher ones, which according to Poli [2001] can be described more as tangled than as linear structures. Therefore, criteria should be established to convert branching conceptual structures into the linear order needed for browsing and expressed by notation.
L cells « F
M organisms « L
N populations « M
O perception « M
P consciousness « O
Q signals, language « O
R communities « Q P M
T artifacts « R H
U wealth « T
V institutions « U R
X art works « T Q P
The storage of these links in a database, and their exploitation in dynamic Web search interfaces, is described by Gnoli & Hong [2006]. On a more philosophical side, the notion of dependence can help to shed light on the general principles by which phenomena can be classified. Gnoli [in prep.] discusses how the two basic principles of common origin and similarity can be combined and applied to the classification of phenomena: dependence is clearly connected with this, as new levels of phenomena originate from pre-existing ones, and at the same time are different enough to show emergent properties which make them worth forming a new class.
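The conversion of a branching dependence structure into a linear browsing order, mentioned earlier in this section, can be illustrated by a topological sort. The sketch below is not the criterion adopted by ILC, which the text leaves open: it simply applies Kahn's algorithm to the main-class links listed in the schedules, breaking ties by notation (an assumption made here to obtain a reproducible result). The function name is hypothetical.

```python
import heapq

# Dependence links from the main-class schedules: class -> classes it depends on.
DEPENDS_ON = {
    "E": [], "F": ["E"], "G": ["F"], "H": ["G"], "I": ["H"],
    "L": ["F"], "M": ["L"], "N": ["M"], "O": ["M"], "P": ["O"], "Q": ["O"],
    "R": ["Q", "P", "M"], "T": ["R", "H"], "U": ["T"], "V": ["U", "R"],
    "X": ["T", "Q", "P"],
}

def linearize(depends_on):
    """Kahn's algorithm; ties between ready classes are broken by notation."""
    pending = {cls: len(bases) for cls, bases in depends_on.items()}
    dependents = {cls: [] for cls in depends_on}
    for cls, bases in depends_on.items():
        for base in bases:
            dependents[base].append(cls)
    ready = [cls for cls, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        cls = heapq.heappop(ready)
        order.append(cls)
        for dep in dependents[cls]:
            pending[dep] -= 1
            if pending[dep] == 0:
                heapq.heappush(ready, dep)
    if len(order) != len(depends_on):
        raise ValueError("dependence links contain a cycle")
    return order

order = linearize(DEPENDS_ON)
print("".join(order))  # EFGHILMNOPQRTUVX
```

With these links the result coincides with the alphabetical order of the notation, which means that every class is filed after all the classes it depends on.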
Dependence in ontologies
As is now widely recognized, an ontology is a formal, explicit description of concepts in a domain of discourse. It consists of classes (sometimes called concepts). Classes can have subclasses, representing concepts more specific than their superclass. Together with a set of individual instances of its classes, an ontology constitutes a knowledge base. All classes have [Noy & McGuinness 2001]:
slots (sometimes called roles or properties), that is the properties of each concept describing various features and attributes of it. All subclasses of a class inherit its slots. A slot should be attached to the most general class that can have that property;
facets (sometimes called role restrictions); restrictions on slots can have different facets describing the value type, allowed values, the number of the values (cardinality), and other features of the values the slot can take.
Notice that this meaning of facet is completely different from the one used in thesauri and classification schemes. Indeed, the latter corresponds rather to that of slots, as "facets" in classification express relationships that can be subsumed under general categories, such as Matter, Energy, Space or Time. These can be designated in an ontology as general properties to be inherited. Ontologies allow one to express any kind of relationship, provided it has been defined, usually in the form of a slot. Thus, dependence could be incorporated in an ontology by defining it as, say, the dependsOn slot. Together with the hierarchical relationship (isA), it could act as a basic structuring principle of an ontology based on integrative levels.
[Table: ILC caption/OWL class | ILC dependence symbol | OWL direct slot and facet]
We are currently exploring the possible conversion of the ILC classification scheme into an ontology, using the OWL language and the Protégé ontology editor. As we are starting from a general scheme, the result will be a general ontology, also known as an upper ontology or a top-level ontology. The utility of upper ontologies has been questioned, as they tend to express the worldviews of their authors, not necessarily shared by other users [Shirky 2005]. However, this objection concerns any kind of KOS, and does not prevent one from using it for practical purposes. A more technical question is whether the dependence relationship can be represented in a satisfactory way with the OWL syntax. OWL is based on description logic, a subset of first-order logic, where crossed links can generate problems. Can such a multidisciplinary relationship be expressed without violating the logical rules of the ontology network? Can the deductive properties typical of ontologies, such as the following, be applied to it?
Alpinism dependsOn Mountains
Mountains dependsOn Rocks
→ Alpinism dependsOn Rocks
Not all properties of a class are inherited through the dependence relationship: e.g., texture and acidity are properties of rocks, but not of alpinism. It is possible to express this situation by using Protégé direct slots, i.e. the slots attached directly to a class. Therefore we have to model dependence as a restriction of an association, rather than a deductive relationship. The diagram in Figure 1 represents this type of restriction in UML (Unified Modeling Language). We are aware that UML has some limitations regarding ontology development; on the other hand, it is easy for humans to understand and it has great potential for describing Web resources in a machine-accessible way [Cranefield 2001].
Fig. 1: UML diagram of the association with restriction dependsOn.
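The contrast between isA, along which slots are inherited, and dependsOn, along which they are not, can be made concrete with a toy model. This is a plain Python sketch rather than OWL or Protégé code; the class name OntClass, the subclass Granite and the slots elevation and difficulty are invented for illustration, and treating dependsOn as transitive is an assumption that the text raises as an open question.

```python
class OntClass:
    """A toy ontology class with direct slots, an isA parent and dependsOn links."""

    def __init__(self, name, parent=None, slots=(), depends_on=()):
        self.name = name
        self.parent = parent                # isA link: slots are inherited along it
        self.direct_slots = set(slots)      # slots attached directly to this class
        self.depends_on = list(depends_on)  # dependence links: no slot inheritance

    def slots(self):
        """Direct slots plus those inherited through isA only."""
        inherited = self.parent.slots() if self.parent else set()
        return self.direct_slots | inherited

    def all_dependencies(self):
        """Names of all classes reachable by following dependsOn transitively."""
        seen = set()
        stack = list(self.depends_on)
        while stack:
            cls = stack.pop()
            if cls.name not in seen:
                seen.add(cls.name)
                stack.extend(cls.depends_on)
        return seen

rocks = OntClass("Rocks", slots={"texture", "acidity"})
granite = OntClass("Granite", parent=rocks)  # invented subclass, for contrast
mountains = OntClass("Mountains", slots={"elevation"}, depends_on=[rocks])
alpinism = OntClass("Alpinism", slots={"difficulty"}, depends_on=[mountains])

print(sorted(alpinism.all_dependencies()))  # ['Mountains', 'Rocks']
print(sorted(granite.slots()))              # ['acidity', 'texture']
print(sorted(alpinism.slots()))             # ['difficulty']
```

Granite receives texture and acidity through isA, while Alpinism is linked to Rocks by dependence without acquiring either property.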
Independently of its technical implementation, the notion of dependence appears to be potentially useful to link related concepts at different integrative levels, and thus to help searchers cross disciplinary boundaries. Most KOSs justify their disciplinary structure by the assumption that users, while searching for information, will follow the disciplinary organization they are familiar with. This may be an effective way to reproduce the literary warrant faithfully. However, the function of knowledge organization is not only to represent the existing literature, but also to suggest new paths of research through the discovery of relations in published knowledge. For the latter purpose, cross-disciplinary relations must be represented and made searchable. Projects like Szostak's and ILC go in this direction. Cross-disciplinary principles, such as integrative levels or general systems, seem to be especially suitable to index and retrieve contemporary knowledge, as this is more and more interdisciplinary and interconnected. While theories like domain analysis [Hjørland & Albrechtsen 1995] emphasize the differences between terminologies and conceptualizations in separate research communities, it seems that our age of planetary information exchange also requires interoperable subject tools, able to work independently of any particular domain, and to serve people who are looking throughout the whole of knowledge.
References
American Library Association. Final report to the ALCTS/CCS Subject analysis committee. 1997. [Consulted: 3 nov. 2006].
Austin, D. Prospects for a new general classification. Journal of librarianship, 1969, 1, n. 3, p. 149-169.
Beghtol, C. Knowledge domains: multidisciplinarity and bibliographic classification systems. Knowledge organization, 1998, 25, n. 1-2, p. 1-12.
Beghtol, C. General classification systems: structural principles for multidisciplinary specification. In: Mustafa el Hadi, W.; Maniez, J.; Pollitt, S.A. (eds.). Structures and relations in knowledge organization: proc. 5th International ISKO conference, Lille, 25-29 aug. 1998. Würzburg: Ergon, 1998, p. 89-96.
Broughton, V. Essential classification. London: Facet, 2004.
Cranefield, S. Networked knowledge representation and exchange using UML and RDF [electronic resource]. Journal of digital information, 2001, 1, n. 8. [Consulted: 3 nov. 2006].
Dextre Clarke, S.G. Thesaural relationships. In: Bean C. & Green R. (eds.). Relationships in the organization of knowledge. Dordrecht: Kluwer, 2001, p. 37-52.
Fisher, D.H. From thesauri towards ontologies? In: Mustafa el Hadi, W.; Maniez, J.; Pollitt, S.A. (eds.). Structures and relations in knowledge organization: proc. 5th International ISKO conference, Lille, 25-29 aug. 1998. Würzburg: Ergon, 1998, p. 18-30.
Foskett, D.J. Classification and integrative levels. In: Foskett, D.J; Palmer, B.I. (eds.). The Sayers memorial volume. London: Library Association, 1961, p. 136-150. Republished in: Chan, L.M.; Richmond, P.A.; Svenonius, E. (eds.). Theory of subject analysis. Littleton: Libraries unlimited, 1985, p. 210-220.
Foskett, D.J. Concerning general and special classifications. International classification, 1991, 18, n. 2, p. 87-91.
Gatto, E. Variazione locale e comunicabilità globale [electronic resource]. In: Classificare la documentazione locale: giornata di studio, San Giorgio di Nogaro, 17 dic. 2005. ISKO Italia. [Consulted: 11 nov. 2006].
Gnoli, C. The meaning of facets in non-disciplinary classification. In: Budin, G.; Swertz, C.; Mitgutsch, K. (eds.). Knowledge organization for a global learning society: proc. 9th International ISKO conference, Vienna, 4-7 jul. 2006. Würzburg: Ergon, 2006, p. 11-18.
Gnoli, C. Phylogenetic classification. Knowledge organization, 2006, 33, in prep.
Gnoli, C.; Hong, M. Freely faceted classification for Web-based information retrieval. New review of hypermedia & multimedia, 2006, 12, n. 1, p. 63-81.
Gnoli, C.; Merli, G. Notazione e interfaccia di ricerca per una classificazione a livelli. AIDA informazioni, 2005, 23, n. 1-2, p. 57-72.
Gnoli, C.; Poli, R. Levels of reality and levels of representation. Knowledge organization, 2004, 31, n. 3, p. 151-160.
Hjørland, B.; Albrechtsen, H. Toward a new horizon in information science: domain-analysis. Journal of the American society for information science, 1995, 46, n. 6, p. 400-425.
Hong, M. A phenomenon approach to faceted classification [in Japanese]. In: Japan society of Library and information science 53rd conference, Keio university, 22-23 oct. 2005. English abstract: . [Consulted: 11 nov. 2006].
International Organization for Standardization. ISO 2788:1986: Guidelines for the establishment and development of monolingual thesauri. 2nd. ed. Geneva: ISO, 1986.
ISKO Italia. Integrative level classification: research project [electronic resource]. 2004-. [Consulted: 11 nov. 2006].
Kyle, B. Towards a classification for social science literature. American documentation, 1958, 9, p. 168-183.
Kyle, B. An examination of some of the problems involved in drafting general classifications and some proposals for their solution. Review of documentation, 1959, 26, n. 1, p. 17-21.
Langridge, D. Bliss, the disciplines and the New Age. Bliss classification bulletin, 1992, 34, p. 8-13.
Mazzocchi, F.; Plini, P. Refining thesaurus relational structure: implications and opportunities. In: Compatibility and heterogeneity, ethics and future of knowledge organization: proc. 10th German ISKO conference, Vienna, 3-5 jul. 2006. Würzburg: Ergon, 2007.
McIlwaine, I.C. Interdisciplinarity: a new retrieval problem? In: Beghtol, C.; Howarth, L.C.; Williamson, N. (eds.). Dynamism and stability in knowledge organization: proc. 6th International ISKO conference, Toronto, 10-13 jul. 2000. Würzburg: Ergon, 2000, p. 261-267.
Noy, N.F.; McGuinness, D.L. Ontology development 101: a guide to creating your first ontology. Stanford University, 2001. [Consulted: 3 nov. 2006].
Poli, R. The basic problem of the theory of levels of reality. Axiomathes, 2001, 12, n. 3-4, p. 261-283.
Schmitz-Esser, W. Thesaurus and beyond: an advanced formula for linguistic engineering and information retrieval. Knowledge organization, 1999, 26, n. 1, p. 10-22.
Shirky, C. Ontology is overrated: categories, links, and tags [electronic resource]. In: Clay Shirky's writings about the Internet. 2005. [Consulted: 11 nov. 2006].
Soergel, D. [et al.]. Reengineering thesauri for new applications: the Agrovoc example [electronic resource]. Journal of digital information, 2004, 4, n. 4. [Consulted: 3 nov. 2006].
Szostak, R. Classification, interdisciplinarity, and the study of science. Journal of documentation, 2007, 63, in prep.
Trigari, M. Old problems in a new environment: the impact of the Internet on multilingual thesauri as research interfaces. In: Thesauri and taxonomies: an international conference and workshop: Multites, London, 29-30 sep. 2003.
Tudhope, D.; Alani, H.; Jones, C. Augmenting thesauri relationships: possibilities for retrieval [electronic resource]. Journal of digital information, 2001, 1, n. 8. [Consulted: 3 nov. 2006].
Weinberg, B.H. Complexity in indexing systems: abandonment and failure: implications for organizing the Internet [electronic resource]. In: Hardin, S. (ed.). Global complexity: information chaos and control: proc. 59th ASIS annual meeting. ASIS, 1996. [Consulted: 11 nov. 2006].
Weisgerber, D.W. Interdisciplinary searching: problems and suggested remedies. Journal of documentation, 1993, 49, n. 3, p. 231-254.
Williamson, N. An interdisciplinary world and discipline based classification. In: Mustafa el Hadi, W.; Maniez, J.; Pollitt, S.A. (eds.). Structures and relations in knowledge organization: proc. 5th International ISKO conference, Lille, 25-29 aug. 1998. Würzburg: Ergon, 1998, p. 116-124.
Winston, M.E.; Chaffin, R.; Herrmann, D. A taxonomy of part-whole relations. Cognitive science, 1987, 11, p. 417-444.