工厂
TermFactory Manual

© Lauri Carlson 2007



This manual documents the TermFactory ontology-based terminology repository, its tools, and its workflow.

Repository

(TF 0.0) The current TermFactory ontology resides in http://grapson.com/TF/TermFactory.pprj.

Ontology

The TermFactory ontology is a multilingual terminological ontology conformant to terminology and other language technology standards, rich enough to support semantic inference and language technology applications.

TermFactory is currently expressed in the OWL Full dialect of the Web Ontology Language and edited with the de facto standard ontology editor Protege (Stanford).

Subontologies

The demo TF ontology imports the following subontologies:

space.owl
mobilite space ontology downloaded from the web
yso-r.owl
YSO subontology under the letter R, extracted from the FinnOnto thesaurus ontology using Sesame RDF
rakli8.owl
building maintenance ontology converted from PDF via TBX
paperi.owl
fi-de-en paper industry terminology converted from MultiTerm format
4m.owl
the 4M project ontology converted to TF

Bridge ontologies

Bridge ontologies may be used to import third-party OWL ontologies into TF. Bridge ontologies are (preferably relatively small) ontologies which interface between other ontologies. They import the component ontologies as is and define the contact points where the ontologies "plug in" to one another, by adding properties or concepts that relate them.

For instance, an excerpt of the YSO ontology is embedded into TermFactory using the bridge ontology YSO_bridge.owl. It embeds the root concept(s) of the excerpt under the appropriate node of the TermFactory ontology, and places any implied concepts that the excerpt imports from third-party ontologies under the node YSO_bridge in the TermFactory bridge namespace. (Without a bridge ontology, recursive imports clutter the recipient ontology with irrelevant 'orphan' root concepts.)

Query imports

In the ContentFactory project, the plan is to extend the owl:imports element with an element of the form <cf:query>...</cf:query>, which contains a query specifying the part of the ontology to import. (The precise syntax is still to be defined.) Pending editor support for such query imports, we need a converter which converts them into subontologies. This may be a specialisation of the Sesame RDF extractor.
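
As a rough illustration of the first step of such a converter, the sketch below locates cf:query elements in an ontology header using only the JDK's DOM parser. Since the syntax is not yet defined, everything specific here is an assumption: the namespace URI bound to cf, the placement of cf:query inside owl:imports, the query text, and the class name QueryImportFinder.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class QueryImportFinder {
    // Hypothetical namespace URI; the manual does not define one.
    static final String CF_NS = "http://example.org/cf#";

    /** Returns the text of every cf:query element found in the document. */
    static List<String> findQueries(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookup
            NodeList nodes = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rdfXml)))
                .getElementsByTagNameNS(CF_NS, "query");
            List<String> queries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                queries.add(nodes.item(i).getTextContent().trim());
            }
            return queries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse ontology header", e);
        }
    }

    public static void main(String[] args) {
        // Made-up header; the real query import syntax is not yet defined.
        String header =
            "<owl:Ontology xmlns:owl='http://www.w3.org/2002/07/owl#'"
          + " xmlns:cf='" + CF_NS + "'>"
          + "<owl:imports><cf:query>string LIKE \"*rakennus*\"</cf:query></owl:imports>"
          + "</owl:Ontology>";
        System.out.println(findQueries(header));
    }
}
```

Each query found this way would then be handed to an extractor, which writes the matching part of the source ontology out as an ordinary subontology that existing editors can import.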

Formats

This section documents formats used in TF.

MultiTerm.dtd and MultiTerm.xsd

The TF demo includes a partial XML grammar for the MultiTerm XML export format in both DTD and XML Schema forms.

Tools

This section documents tools used in TF.

Editors

This section documents editors for manual refinement of ontology data.

Protege

(TF 0.0) The current version of Protege is 3.2.1. It can be downloaded from http://protege.stanford.edu/. After Protege has been installed, the TF top ontology can be downloaded from the TF repository as a Protege project named TermFactory.pprj.

XMLMind TBX addon

The TermFactory demo includes a plugin (configuration addon) for the free XML document editor XMLMind which allows structured editing of the LISA Oscar TBX terminology format in WYSIWYG mode.

Validators

This section documents format validation tools used in TF.

Conversions

This section documents conversions to and from TF.

tbx2owl: TBX to OWL conversion

This section is a guide to converting a terminology in the TermBase eXchange (TBX) format into TF. (TF 0.0) The converter tbx2owl.xsl is an XSLT stylesheet which transforms a TBX XML document into an OWL RDF/XML document.
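
A stylesheet of this kind can be applied from Java with the standard JAXP transformation API, as in the minimal sketch below. The inline stylesheet and the tiny input document are illustrative stand-ins, not the actual tbx2owl.xsl or a complete TBX file; the class name XsltRunner and the mapping of each termEntry to an owl:Class are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRunner {
    // Hypothetical miniature stylesheet: maps each termEntry to an owl:Class.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
      + " xmlns:owl='http://www.w3.org/2002/07/owl#'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<rdf:RDF><xsl:apply-templates select='//termEntry'/></rdf:RDF>"
      + "</xsl:template>"
      + "<xsl:template match='termEntry'>"
      + "<owl:Class rdf:ID='{@id}'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML text and returns the result. */
    static String transform(String stylesheet, String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Skeleton input with two entries; real TBX carries far more structure.
        String tbx = "<martif><text><body>"
                   + "<termEntry id='c1'/><termEntry id='c2'/>"
                   + "</body></text></martif>";
        System.out.println(transform(STYLESHEET, tbx));
    }
}
```

For files on disk, the StreamSource and StreamResult constructors that take a java.io.File can be used instead of the string readers and writers shown here.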

Terminator2: TF to cparse conversion

A TF terminology can be converted into multilingual lexicons for the cparse parser/generator. This allows parsing and generating natural language text using vocabulary stored in TF. (TF 0.0) The converter (Terminator2) is implemented in Java using the Jena RDF library.

FM2TF.java: 4M ontology converted to TF

The 4M project multilingual ontology has been converted to TF with a Java Jena OWL converter.

MultiTerm2xhtml and MultiTerm2FO

The conversion scripts MultiTerm2xhtml.xsl and MultiTerm2FO.xsl convert MultiTerm vocabularies into XHTML and FO formats. Using the Apache FOP processor, the FO output can be further converted into multilingual PDF (Chinese).

Querying

This section documents tools for querying ontologies to produce extracts or views.

Extracting YSO using Sesame 2.0

The FinnOnto YSO thesaurus ontology is too big to handle with Protege or Jena tools in the memory available on normal desktop computers. We used the Sesame 2 RDF repository library and its query language SeRQL to extract a manageably sized, coherent subset of YSO around the concepts matching a given pattern.

The extraction tool currently consists of the following pieces:

SuperGraph.java
a Java program built on the Sesame RDF library which reads the YSO ontology, finds all concepts matching a given pattern, and extracts from YSO an upward closure of the matched concepts under the YSO schema. The superclasses of each concept are included in the closure, but narrower, related or associated concepts are not included recursively.
YSO_schema.owl
a manually extracted schema subset of YSO.owl
YSO_header.owl
an RDF header file to be included in the extract
YSO.owl
local copy of the YSO ontology
foaf.owl
local copy of the Friend of a Friend ontology referenced by YSO
skos.owl
local copy of the Simple Knowledge Organization Systems ontology referenced by YSO
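
The upward closure described for SuperGraph.java can be sketched independently of Sesame as a plain graph traversal. The code below is not the SuperGraph source. It assumes the class hierarchy is already available as an in-memory map from each concept to its direct superclasses, and the concept names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UpwardClosure {
    /** Returns the hits together with all their direct and indirect superclasses. */
    static Set<String> closure(Map<String, List<String>> superclasses, Set<String> hits) {
        Set<String> result = new LinkedHashSet<>(hits);
        Deque<String> todo = new ArrayDeque<>(hits);
        while (!todo.isEmpty()) {
            String concept = todo.pop();
            for (String parent : superclasses.getOrDefault(concept, List.of())) {
                if (result.add(parent)) { // add() returns false if already visited
                    todo.push(parent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical hierarchy: kerrostalo < rakennus < rakennelma < esine
        Map<String, List<String>> superclasses = Map.of(
            "kerrostalo", List.of("rakennus"),
            "rakennus", List.of("rakennelma"),
            "rakennelma", List.of("esine"));
        // The narrower concept kerrostalo is not pulled in; only ancestors are.
        System.out.println(closure(superclasses, Set.of("rakennus")));
        // prints [rakennus, rakennelma, esine]
    }
}
```

Tracking visited concepts in the result set keeps the traversal finite even if the hierarchy contains shared ancestors or cycles.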

The hits are marked up with the string property hit:pattern, which holds the search pattern. They can be retrieved from the extract using the Protege SPARQL query tool with the query SELECT ?subj WHERE { ?subj hit:pattern ?obj }.

The SuperGraph script is called with

java SuperGraph YSO.owl YSO_schema.owl YSO_header.owl pattern

The pattern is given as an expression in the Sesame query language SeRQL, where 'string' refers to the string value of the rdfs:label of a class, for example

"string LIKE \"*rakennus*\" AND NOT string LIKE \"*kehon*\"".
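
To make the effect of this example pattern concrete, the sketch below evaluates the same condition on plain Java strings. It is only an approximation of how the query engine treats the expression: it assumes that * stands for any run of characters (including none) and that matching is case-sensitive. The class name LabelFilter and the sample labels are invented for the example.

```java
import java.util.regex.Pattern;

public class LabelFilter {
    /** Translates a pattern that uses '*' as a wildcard into an equivalent regex. */
    static Pattern wildcard(String pattern) {
        String[] literals = pattern.split("\\*", -1); // keep empty leading/trailing parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes "any characters"
            }
            regex.append(Pattern.quote(literals[i])); // literal text is matched verbatim
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Mirrors: string LIKE "*rakennus*" AND NOT string LIKE "*kehon*". */
    static boolean matches(String label) {
        return wildcard("*rakennus*").matcher(label).matches()
            && !wildcard("*kehon*").matcher(label).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("asuinrakennus"));  // true
        System.out.println(matches("kehon rakennus")); // false: excluded by the NOT clause
        System.out.println(matches("talo"));           // false: no "rakennus" in the label
    }
}
```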

Workflow

This section documents workflows and best practices in TF based terminology work.

Contents

Version history

References

Standards

Other

Appendices

TermFactory top ontology

TBX to OWL converter

Index of topics

O
ontology
P
Parser
Protege
R
References
