Monday, 19 September 2011

Classification & ontology

"Classification & ontology - Formal approaches and access to knowledge" is the title of the UDC Seminar 2011. UDC stands for Universal Decimal Classification, and the consortium explains its main purpose as follows:

"The Universal Decimal Classification (UDC) is the world's foremost multilingual classification scheme for all fields of knowledge, a sophisticated indexing and retrieval tool. It was adapted by Paul Otlet (Rayward's Otlet page; Wikipedia entry) and Nobel Prizewinner Henri La Fontaine from the Decimal Classification of Melvil Dewey, and first published (in French) between 1904 and 1907."

The International UDC Seminar 2011 takes place at the Royal Library of the Netherlands (Koninklijke Bibliotheek) in The Hague.

(this is a live blog that has not been properly tidied up yet..)

Patrick Hayes: On being the same as. Why something so simple is so hard.

Attempt at a summary:
Patrick Hayes warns against using owl:sameAs (widely used in Linked Data contexts) uncritically. He stresses what sameAs actually means: that two names/concepts point to the same referent. But often that is not the case; often the two names point to two referents that we then declare to be sameAs. That is wrong! For A sameAs B to be true, every statement about 'A' must also hold for 'B'. If there is even one statement that holds for 'A' and not for 'B', we cannot use sameAs! [I am looking for the figures he used to show this]

Presentation:
sameAs:
There is ONE thing; two names referring to the same referent.
NOT two things; not two referents that are the same
This is very important! And here is where much goes wrong and where the ontological sameAs creates confusion (or we create confusion by misusing sameAs).

A sameAs B is true when the referent of 'A' is the same as the referent of 'B'. Everything you say about 'A' also has to be true for 'B'. If there is even one thing we can say about 'A' that isn't true for 'B', they are not the same.
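A minimal Python sketch of why this strictness matters, assuming triples are plain (subject, predicate, object) tuples. The function name `same_as_closure`, the `ex:` names and the figures are made up for illustration and do not come from the talk. Asserting sameAs allows every statement about one name to be restated with the other, so one careless link copies facts onto the wrong thing.

```python
def same_as_closure(triples, a, b):
    """Return all triples entailed by asserting `a sameAs b`.

    Wherever one of the two names occurs as subject or object, the other
    name may be substituted for it.
    """
    names = {a, b}
    entailed = set()
    for s, p, o in triples:
        subjects = names if s in names else {s}
        objects = names if o in names else {o}
        for s2 in subjects:
            for o2 in objects:
                entailed.add((s2, p, o2))
    return entailed


# Hypothetical data: a city and its wider metropolitan area are two things.
triples = {
    ("ex:ParisCity", "ex:areaKm2", 105),
    ("ex:ParisMetroArea", "ex:contains", "ex:Versailles"),
}

entailed = same_as_closure(triples, "ex:ParisCity", "ex:ParisMetroArea")
for triple in sorted(entailed, key=str):
    print(triple)
```

After the (wrong) sameAs link, the entailed set also says that the metropolitan area has the city's area and that the city contains Versailles. One statement that holds for only one of the two names is enough to make the link false.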

Continuants vs. occurrents (continuous over time vs. incidents/events that happen only once) [But that's not really true either. If we look at the tsunami back in 2004, can we say that it was an occurrent or a continuant? It happened over some time.]

Fractals ruin individuation
- how long is the coastline of Britain? It depends on how you view it

Frege's puzzle
- I tell you that the morning star is the same thing as the evening star
- If true, then these names must co-refer
- But you did gain information from learning this, so you must be understanding the names not purely referentially
- Frege: "morning star" and "evening star" have different senses


What is the sense of a name?
- When we hear a name and 'understand' it, what exactly are we understanding? Apparently not (just) the referent

sameRealThingAs might be glossed as 'in the same bundle as'. Paris is a bundle of ways to look at the "actual" city. In a sense it is the same as all of them, but not strictly the logical sameAs any of them

But Occam says: You had N and you wanted 1. Now you have N + 1. Sigh!

Our logics have been put under huge stress by being put to use in the semantic web. Identity crisis.

What is an ontology? "Formalisation of a conceptualisation"

The concepts of knowledge organization systems as hubs in the Web of data
Thomas Baker, Dublin Core Metadata Initiative

Summary:
Thomas Baker showed examples of Linked Data and how triples are structured (subject - predicate - object), and that they are built from URIs or names/text (strings).
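To make the subject - predicate - object structure concrete, here is a small Python sketch that writes triples as N-Triples lines, with URIs in angle brackets and strings in quotes. The `URI` class, the helper names and the example.org addresses are assumptions for illustration; the two Dublin Core property URIs are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


def term(x):
    """Serialize a URI as <...> and any other value as a quoted string literal."""
    if isinstance(x, URI):
        return f"<{x.value}>"
    text = str(x).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def to_ntriple(subject, predicate, obj):
    """Format one triple as a line of N-Triples (no validation is done)."""
    return f"{term(subject)} {term(predicate)} {term(obj)} ."


book = URI("http://example.org/book/1")
# Object is a URI: the triple links one resource to another.
line1 = to_ntriple(book, URI("http://purl.org/dc/terms/creator"), URI("http://example.org/person/7"))
# Object is a string: the triple attaches a piece of text to the resource.
line2 = to_ntriple(book, URI("http://purl.org/dc/terms/title"), 'Notes on "sameAs"')
print(line1)
print(line2)
```

Only the URI-valued object can be followed further into the graph; the string is a dead end, which is one reason URIs are preferred where a shared identifier exists.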

Presentation:
URIs are like footnotes to data

VIVO - network of scientists based on linked data, developed at Cornell Univ.

Six degrees of (linked-data) separation

A paradigm shift: From records to graphs

Where is the record?
- implicit
- description pulled together "just in time" (real time)
- as opposed to the record that pulls together description "just in case" (information put in place beforehand)
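The "just in time" record can be sketched in a few lines of Python, assuming the graph is simply a list of (subject, predicate, object) tuples. The data and the function name `describe` are hypothetical; the point is only that no record is stored anywhere until someone asks for one.

```python
from collections import defaultdict

# Hypothetical triples that could have come from different sources on the web.
graph = [
    ("ex:person/7", "foaf:name", "A. Researcher"),
    ("ex:person/7", "ex:worksAt", "ex:org/cornell"),
    ("ex:org/cornell", "foaf:name", "Cornell University"),
    ("ex:paper/3", "dcterms:creator", "ex:person/7"),
]


def describe(graph, subject):
    """Assemble a record-like dict for `subject` at the moment it is asked for."""
    record = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            record[p].append(o)
    return dict(record)


record = describe(graph, "ex:person/7")
print(record)
```

A "just in case" system would instead store such a dict per person in advance and would have to update it whenever any contributing source changed.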

Questions:

Persistence in URIs?
- mostly a social question of how to organize cultural memory

How to preserve context?
- this is one of several challenges that are being worked on, by W3C and others

Do we know that two concepts really are the same?
- instead of "enthusiastic" we should be "cautiously enthusiastic"

What about maintenance when concepts change and schemes change?
- absolutely an issue, and no easy answer

Issues in publishing and aligning Web vocabularies
Guus Schreiber, Vrije Universiteit, Amsterdam

Attempt at a summary:
Guus Schreiber takes a pragmatic attitude to the semantic web and linked data, and to the ontologies and vocabularies that support these areas. He is an advocate of "the simple is often the best" and points to SKOS as a good option for publishing vocabularies. Guus also warns against putting ontologies on a pedestal, and points to vocabularies as valuable in themselves and as carriers of much semantic information.

It is important not only to publish your vocabulary but also your "alignments", that is, the mappings you make so that your own vocabulary reflects other, more widely used/known vocabularies. (Guus Schreiber also emphasizes use as central: "A good ontology/vocabulary is a used ontology/vocabulary".)

Guus Schreiber, too, warns against uncritical use of owl:sameAs. If skos:closeMatch had existed earlier, it would have been the preferred way to "link" instances.

Presentation:
Examples drawn from Amsterdam Museum
- everything in the museum is published online as linked data

Vocabularies
- often looked down upon by ontologists
- rich semantic sources
- semantics difficult to explicate with current instruments (OWL,..)
- the Holy Grail of the universal thesaurus (vocabulary builders should be satisfied with partial coverage)

Ontologies
- a model written in an ontology language is not necessarily an ontology
- diff between ontologies & data model is not language but scope:
   - technique for interop.
   - "I wrote my own ontology" is a contradictio in terminis
- modesty required of ontologist
   - diff between domains: medicine -
   - "errors" are often based on subtleties of the domain

Ontological commitment
- ontologists tend to be "trigger-happy"
   - i.e. define as many axioms as possible
- over-commitment makes ontologies less usable
   - the art of being minimal
   - "in der Beschränkung zeigt sich der Meister"
- design criterion SKOS: minimal commitment

SKOS has been a major success
Initial work: SWAD-Europe

Publication of W3C WordNet
- URLs for synsets, word senses and words plus all 17 relations in Princeton WordNet

Vocabulary conversion
- you need a structured way of doing it
- input typically XML files with vocabulary entries

Step1: Pure syntactic transf. XML -> RDF
- keep as much original information as possible
- ensure standard datatypes for value conv
- choice of concept URL
- vocabulary URL; our strategy
    - purl strategy
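A minimal sketch of what such a purely syntactic step could look like in Python. The XML layout, the element names and the purl.example.org base URLs are all assumptions made for illustration, not the format used in the talk; datatype conversion is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary export; element names and base URLs are assumptions.
XML = """
<vocabulary>
  <entry id="1042">
    <term>canal house</term>
    <broader>1000</broader>
    <scopeNote>House built along a canal.</scopeNote>
  </entry>
  <entry id="1000">
    <term>house</term>
  </entry>
</vocabulary>
"""

CONCEPT_BASE = "http://purl.example.org/vocab/concept/"
SCHEMA_BASE = "http://purl.example.org/vocab/schema#"


def xml_to_triples(xml_text):
    """Turn every child element of every <entry> into one triple.

    Original element names are kept as property names; nothing is
    interpreted or mapped to SKOS at this stage.
    """
    triples = []
    for entry in ET.fromstring(xml_text).iter("entry"):
        subject = CONCEPT_BASE + entry.get("id")
        for child in entry:
            value = (child.text or "").strip()
            triples.append((subject, SCHEMA_BASE + child.tag, value))
    return triples


triples = xml_to_triples(XML)
for t in triples:
    print(t)
```

Because the element names survive as property names, nothing from the source is lost, and any later interpretation can be redone without going back to the XML.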

Step 2: Map collection schema to SKOS on paper (alternative mapping to RDA)
- equivalences?!
- sub-properties
- mapping preserves orig schema names
- in practice
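One way to read "sub-properties" and "mapping preserves orig schema names" is sketched below in Python: the original properties are declared sub-properties of SKOS properties, and the SKOS triples are derived while the originals stay in place. The `orig:` and `ex:` names are hypothetical, and only a single level of sub-property is followed.

```python
RDFS_SUBPROPERTY = "rdfs:subPropertyOf"

# Hypothetical original schema names (prefix "orig:") mapped to SKOS.
mapping = [
    ("orig:term", RDFS_SUBPROPERTY, "skos:prefLabel"),
    ("orig:broader", RDFS_SUBPROPERTY, "skos:broader"),
    ("orig:scopeNote", RDFS_SUBPROPERTY, "skos:scopeNote"),
]

data = [
    ("ex:1042", "orig:term", "canal house"),
    ("ex:1042", "orig:broader", "ex:1000"),
    ("ex:1042", "orig:acquiredFrom", "ex:donor/5"),  # no SKOS counterpart
]


def apply_subproperties(data, mapping):
    """Add the triples implied by rdfs:subPropertyOf, keeping the originals."""
    parent = {sub: sup for sub, pred, sup in mapping if pred == RDFS_SUBPROPERTY}
    result = list(data)
    for s, p, o in data:
        if p in parent:
            result.append((s, parent[p], o))
    return result


result = apply_subproperties(data, mapping)
for t in result:
    print(t)
```

A SKOS-aware tool can now read the vocabulary, while the property without a SKOS counterpart is still there under its original name.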

Step 3: Align with other vocabularies
Toolkit: XMLRDF
- multitude of alignment techn. available
   - direct syntactic match
   - lexical manipulation
   - structured, ...
- precision & recall varies
- large evaluation initiative
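A toy Python sketch of the first two techniques and of measuring precision and recall against a hand-checked set. The vocabularies, the crude plural stripping and all function names are assumptions for illustration; real alignment tools are far more careful.

```python
import re


def normalize(label):
    """Lexical manipulation: lowercase, strip punctuation, crude plural removal."""
    words = re.sub(r"[^a-z0-9 ]", " ", label.lower()).split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def align(small, large):
    """Propose (source, target, technique) candidates from a small to a large vocabulary."""
    exact = {label: cid for cid, label in large.items()}
    normalized = {normalize(label): cid for cid, label in large.items()}
    candidates = []
    for cid, label in small.items():
        if label in exact:
            candidates.append((cid, exact[label], "syntactic"))
        elif normalize(label) in normalized:
            candidates.append((cid, normalized[normalize(label)], "lexical"))
    return candidates


def precision_recall(candidates, gold):
    """Compare proposed (source, target) pairs with a manually checked gold set."""
    proposed = {(s, t) for s, t, _ in candidates}
    correct = proposed & gold
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical vocabularies: concept id -> preferred label.
small = {"a:1": "Canal houses", "a:2": "windmill", "a:3": "Spring"}        # a:3 is the season
large = {"b:10": "canal house", "b:11": "windmill", "b:12": "springs"}     # b:12 is the metal part
gold = {("a:1", "b:10"), ("a:2", "b:11")}

candidates = align(small, large)
precision, recall = precision_recall(candidates, gold)
print(candidates, precision, recall)
```

The lexical step finds the plural variant but also pairs the season with the metal spring, so recall is complete while precision drops to two out of three. That is the kind of result only a human check will catch.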

Guidelines for alignment
- do not rely on one technique
- map small vocabularies to larger ones
- manual evaluation of result almost always required
- publish alignments (not only your vocabularies but also your alignments!)

skos:closeMatch
skos:exactMatch
skos:broadMatch
skos:narrowMatch
skos:relatedMatch

In Linked Data people often use owl:sameAs which is a very precise statement and which will often be wrong!

We have not agreed on an adequate
- misuse of owl:sameAs

The fact that sameAs works is a technical thing; in many circumstances it is wrong. If skos:closeMatch had been there from the start, it would have been the preferred one to use.
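One formal difference behind this advice: owl:sameAs is symmetric and transitive, while skos:closeMatch is symmetric but deliberately not transitive (skos:exactMatch is both). A small Python sketch, with hypothetical identifiers and function names, shows how a chain of links behaves under each.

```python
def entailed_links(links, prop, transitive):
    """Return the pairs linked by `prop`, applying symmetry and, if asked, transitivity."""
    pairs = {(a, b) for a, p, b in links if p == prop}
    pairs |= {(b, a) for a, b in pairs}  # all the properties discussed here are symmetric
    if transitive:
        changed = True
        while changed:
            new = {(a, d) for a, b in pairs for c, d in pairs if b == c and a != d}
            changed = not new <= pairs
            pairs |= new
    return pairs


def chain(prop):
    """Three datasets, each linking its own 'Paris' to the next one."""
    return [("x:paris", prop, "y:paris"), ("y:paris", prop, "z:paris")]


same = entailed_links(chain("owl:sameAs"), "owl:sameAs", transitive=True)
close = entailed_links(chain("skos:closeMatch"), "skos:closeMatch", transitive=False)

print(("x:paris", "z:paris") in same)   # sameAs links the two ends of the chain
print(("x:paris", "z:paris") in close)  # closeMatch does not
```

With sameAs, two slightly inexact links add up to a claim nobody made; with closeMatch, each link stays a local statement.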

Take home messages:
- Vocabularies are (in)valuable!
- Alignment of vocabularies should have high priority in classification communities

Networks, Links & Topics
- Classifying and collaborating in the Web
Dan Brickley, Vrije Universiteit, Amsterdam

Attempt at a summary:
Dan Brickley's presentation is not easy to summarize. He had changed his mind about the topic just before he was due to speak (so he was allowed to present last in the session), and that was reflected in the presentation. It was rather unclear and hard to get hold of.

His main point was the three conceptions, or ways in which we use, the concepts 'network'/'graph' (he equated the two) and 'information'. DB did not like the term Linked Data and would rather have called it Linked Information.

Presentation:
Three notions of 'network' (or 'graph')
- "Hypertext Graph": Linked documents
- "Social Graph": Linked people
- "Factual Graph": "semantic graph", "data graph", "semantic web", "linked data"
   - descriptive
   - it can describe hypertext graphs, social graphs, ... any graphs

Three notions of 'information'
- Factual information
- Documents and artifacts
- "In people's head", skills, abilities, ...

Linked Data & Linked Information
- Factual graphs + hypertext graphs = Linked Data
   - share factual graphs using hypertext graphs
   - e.g. as RDF documents in the web
   - each gives a partial description

Friend of a Friend (FoaF)
- Linked data = Hypertext + Factual

Mistakes to avoid:
- "docs bad, data good"
- idea that "semantic web" replaces "doc web"
- that ontologies are always better than earlier subject-based approaches
- that posting factual claims...
- ...

Three uses of RDF
- to share simple factual data directly in the web
- as metadata, to describe other useful information
- describing people

Think Linked Information, not Linked Data (data is just part of the picture)
- SKOS and subject classification at its heart

Pat Hayes (concluding comments): Despite all the warnings from the speakers opening this seminar, the semantic web is happening right now, at an astonishing speed!


Towards a relation ontology for the Semantic Web
Dagobert Soergel, University at Buffalo (USA)


The problem:
- Linked data are built on relationships
- Relationship types are not standardized
- Makes finding and linking data sets difficult

The solution:
- Develop a relation ontology
- Describe each dataset

Not a new idea. Relationship type registries have been talked about for a long time in the thesaurus community with no result
With the semantic web the issue becomes more urgent
One does not get semantics by syntax alone!

Note: Relationship type = RDF type

Building a comprehensive relation ontology
- comprehensive and specific to cover all LOD data
- a relation type registry (a type of metadata registry)
- should be developed and maintained collaboratively

Sources:
- Bottom up: the LD sets themselves
- Top down: existing schemes (SUMO - ontologyportal.org, FrameNet, OBO rel. ontology, UMLS Semantic Network, CYC, Soergel 1967, DCMI and many similar schemes, markup languages)

The registry as basis for an index to datasets
- refer to datasets that use a relationship type

Need a "seed registry"
- introduce as good practice (data set owners submit <relUse> data to the extended registry system, mapping the relationship types they use to the proper relationships in the registry)

If a rel.ship type is not found, they submit a new relationship type to the registry
- need an editing community
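The workflow above can be sketched as a toy registry in Python. The class, the method names and the `reg:` identifiers are hypothetical and only mirror the notes: owners map their local relationship names to registry types, unknown types become new entries, and the registry doubles as an index to datasets.

```python
class RelationRegistry:
    """Toy registry of relationship types plus an index of datasets using them."""

    def __init__(self, seed_types):
        self.types = set(seed_types)
        self.usage = {}  # registry type -> {dataset: set of local names}

    def submit_rel_use(self, dataset, local_name, registry_type):
        """Record that `dataset` uses `local_name` for a registry type.

        An unknown type is added as a new entry and True is returned; in a
        real registry an editing community would review it first.
        """
        is_new = registry_type not in self.types
        self.types.add(registry_type)
        by_dataset = self.usage.setdefault(registry_type, {})
        by_dataset.setdefault(dataset, set()).add(local_name)
        return is_new

    def datasets_using(self, registry_type):
        """Index lookup: which datasets use this relationship type?"""
        return sorted(self.usage.get(registry_type, {}))


registry = RelationRegistry(seed_types={"reg:hasPart", "reg:createdBy"})
first = registry.submit_rel_use("museum-data", "am:maker", "reg:createdBy")
second = registry.submit_rel_use("library-data", "dcterms:creator", "reg:createdBy")
third = registry.submit_rel_use("library-data", "lib:translatedBy", "reg:translatedBy")
print(registry.datasets_using("reg:createdBy"))
```

The lookup is what makes the registry useful for finding and linking data sets: two datasets with different local names turn up under the same relationship type.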

Developing a small registry of rel.ship types needed for KOS and representing them in RDF ready to use with SKOS

Conclusion:
A rel.ship type registry would bring the promise of LOD closer to reality.
We need a Wikipedia-type organization to get there.

Q&A
Dagobert Soergel: RDF restricts many to think of only binary relationships. That is not the case, RDF can express n-ary rel.ships, but it should have been done more easily than it is now. Pat Hayes: It has already been done in the recent ver. of OWL. One should not change RDF itself, but develop better input tools.
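For readers who have not seen it, one common way of expressing an n-ary relationship with binary triples is to introduce a node for the relationship itself and hang each participant off it. The Python sketch below shows that pattern with hypothetical names; it is not necessarily what either speaker had in mind.

```python
import itertools

_ids = itertools.count(1)


def n_ary_to_triples(relation_type, roles):
    """Express one n-ary relationship as binary triples around a fresh node."""
    node = f"_:rel{next(_ids)}"
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, role, value) for role, value in roles.items()]
    return triples


# Hypothetical three-place relationship: who sold what to whom.
triples = n_ary_to_triples(
    "ex:Sale",
    {"ex:seller": "ex:alice", "ex:buyer": "ex:museum", "ex:item": "ex:painting42"},
)
for t in triples:
    print(t)
```

It works, but one statement has become four triples and an extra node, which illustrates the complaint that this should be easier than it is.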

Pat Hayes: Seems you are missing the essential power of the web. The web itself is going to do this without the need for a top-down approach.
Reuse of relations is emerging from the ground up. Look at the sheer amount of triples: how could one possibly relate these to a central definition/ontology?

DS: I doubt that the glue that is missing now will evolve gradually by publishing more and more linked data sets. I think there is a need for central coordination, albeit maintained by a wide community a la Wikipedia.


A faceted classification of general concepts
Ingetraut Dahlberg (Germany)


[Ingetraut Dahlberg must be said to be Knowledge Organization's grand old lady (84 years old!). She can point to an impressive biography. Incidentally, she asked a "sweet" question yesterday about the semantic web/linked data: is it something that really exists, or is it just a (pilot) project? Perhaps the question was not so silly after all?]

Starts out with definitions:
Facet: Comes from 'face', the French made it smaller, 'facet'

Dahlberg compares different ways of categorization and facets (Dahlberg - Aristotle - Ranganathan)
(Ranganathan did not distinguish between categories and facets)

General concepts:
Otlet (1897): Auxiliaries
Allgemeinewörte

The nine general object areas of the ICC (Information Coding Classification - faceted classification system developed by I. Dahlberg)

1. Form and structure area
2. Energy and matter area
3. Cosmos and earth area
4. Bio-area
5. Human area
6. Societal area
7. Technology/production area
8. Knowledge & information area
9. Culture area

The ICC 'systematiser'
1. Theoretical and general problems
2. Objects, their kinds and their elements
3. Methods and activities exerted upon objects

4. Properties relevant or 1st kind of field specialty
5. Persons involved or 2nd kind of field specialty
6. Institutions involved or 3rd kind of field specialty

7. Technology, production, or influences from outside
8. Application of methods & activities to other subject groups or fields
9. Propagation of the knowledge of a subject group or field











