Blog of Svein Ølnes. Over time the blog has come to be mostly about Bitcoin, cryptocurrency and blockchain technology. I occasionally write about other interests too, such as jazz and cars. What is published here is my own responsibility.
Thursday, 26 May 2011
WIMS'11 - day 2
Peter Mika, Yahoo! Research
2001: The birth of the Semantic Web
- Scientific American article
- Web Working Symposium at Stanford
- Semantic Web standard
- EU funding (OnToKnowledge) and DAML funding
- the Semantic Web starts a career.... and so do I
2004-2006: Reality sets in
- engineers are not logicians
- humans will have to do most of the job
- no funding
2007: A Second Chance
- data first, schema second... logic third
- Linked Data
Why Semantic Search?
Improvements in IR are harder and harder to come by
- machine learning using hundreds of features
- heavy investment in computational power
Remaining challenges are not computational, but in modeling and user recognition
Poorly solved information needs
- multiple interpretations ('Paris Hilton')
- Long tail queries (george bush - the beer brewer in Arizona)
- multimedia search
- imprecise or overly precise searches
- searches for descriptions (probably the most important)
Searching for 'roi blanco' [colleague of Peter] gives fairly good organic results, but the advertisements attached are quite bad
Don't solve the sparsity problem where it doesn't exist!
Why Semantic Search? (Part II)
Swoogle - the first semantic web search engine (2007)
- The Semantic Web is now a reality (not in 2007)
- end users use keyword queries, not SPARQL
Novel search tasks
- aggregation of search results (e.g. price comparison across websites)
- analysis and prediction (e.g. world temperature by 2020)
- semantic profiling
- semantic log analysis
- support for complex tasks (search apps)
Why Semantic Search? (Part III)
There is a model
- publishers are (increasingly) interested in making their content searchable, linkable and easier to aggregate and reuse
- rNews: RDFa vocabulary for news articles
Facebook's Like and the Open Graph Protocol (also mentioned by Jim H.)
Semantic Search
Definition:
- makes use of the structure of the data
- exploits this understanding at some part of the search process
Data on the web
Two solutions:
- extraction using Information Extraction techniques
- NLP
- extraction of triples
- filling web forms automatically
- extraction from HTML tables
- wrapper induction
Have to teach your search engine how to crawl the semantic web
- linked data
- RDFa
- SPARQL endpoints (problem of discovering the endpoints)
Data fusion
- ontology matching
- entity resolution
- blending
Query interpretation
- provide a higher level representation of queries in some conceptual space
- interpretation happens before the query is executed
Query interpretation in Semantic Search
- "Snap to grid"
- larger user involvement
- guiding the user in constructing the queries (e.g. Freebase Suggest)
Indexing
- matching and ranking
- indexing for speeding up matching
- type of index depends on the query language to support
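To make the indexing points concrete, here is a minimal sketch of an inverted index for plain keyword queries over triples. It was not shown in the talk; the triples, the `ex:` identifiers and the ranking rule (count of matching query tokens) are all assumptions chosen for illustration. A different query language, such as SPARQL, would call for a different kind of index.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("ex:paris", "rdfs:label", "Paris"),
    ("ex:paris", "ex:type", "City"),
    ("ex:paris_hilton", "rdfs:label", "Paris Hilton"),
    ("ex:paris_hilton", "ex:type", "Person"),
]

def build_index(triples):
    """Map each lowercased token of an object value to the subjects that use it."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for token in obj.lower().split():
            index[token].add(subject)
    return index

def search(index, query):
    """Rank subjects by the number of query tokens they match (ties by name)."""
    scores = defaultdict(int)
    for token in query.lower().split():
        for subject in index.get(token, ()):
            scores[subject] += 1
    return sorted(scores, key=lambda s: (-scores[s], s))

index = build_index(TRIPLES)
print(search(index, "paris hilton"))
```

The index only speeds up matching; the ranking step is deliberately naive and is where a real system would spend most of its effort.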
Semantic Search evaluation
- critical component in developing IR systems
- keyword search over RDF data
- focus on relevance, not efficiency
- real queries and real data
- TREC style evaluation
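As a rough illustration of relevance-focused evaluation, the sketch below computes precision at k and average precision for one ranked result list. These are measures commonly used in TREC-style evaluations; whether the evaluation described in the talk used exactly these is an assumption, and the document ids and relevance judgments are invented.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical system output and human relevance judgments for one query.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}

print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```

Averaging `average_precision` over a set of real queries gives mean average precision, which lets two systems be compared on the same data.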
Search interface
- snippet generation (enriched search results for pages that contain microformats or RDFa)
- adaptive and interactive presentation
- aggregated search
- query and task templates
Sig.ma - Semantic Information Mashup (DERI)
Time Explorer
Future work in Semantic Web Search
- semi-automated ways of metadata creation (how to go from 5 % metadata to 95 %?)
- data quality (how to assess the quality of data?)
- reasoning (automatic ontology mapping, instance mapping and blending)
- scalability
- ontology reuse (how to get people to reuse ontologies?)
- display (how to automatically generate effective displays for data we don't understand, or only partially understand?)
In 2011
- the Semantic Web is still evolving (finally, a JSON syntax for RDF!)
- leaner and meaner
- bottom-up approaches
- we get some credit in other fields
- RDF data management is now a topic at VLDB and other venues
The Semantic Web has finally grown up - and so have I!
Sören Auer (Universität Leipzig): Creating Knowledge out of Interlinked Data
Based on the EU FP7 project LOD2
Why the Semantic Web won't work
- reasoning does not scale on the web (web scalable DL reasoning is out-of-sight)
"What is the only former Yugoslav republic in the EU?"
- this question still cannot be answered by IBM's Watson
But we can do what works already now:
A global, distributed platform for data, information and knowledge integration
Linked Data Lifecycle:
... -> Interlinking/Fusing -> Classification/Enrichment -> Quality analysis -> Evolution/Repair -> Search/Browse/Explore -> Extraction -> Storage/Querying -> Manual revision/authoring -> ...
Extraction
- recently launched DBpedia Live (the most important parts of Wikipedia are the Infoboxes, which are highly interlinked in the original source)
DBpedia Live http://live.dbpedia.org/sparql (DBpedia Live is constantly updated, while the "official" DBpedia SPARQL endpoint is only updated about twice a year)
Mappings Wiki - http://mappings.dbpedia.org
DBpedia inline: use the typed links approach to provide more information about internal links
Semantic wikis: currently do not scale to Wikipedia's needs, but are perfectly OK for smaller wikis
LinkedGeoData - revealing the data behind OpenStreetMap
OpenStreetMap: "Wikipedia for GeoData"
- extremely rich source of data
- LinkedGeoData will try to exploit this rich information source with Linked Data technology
An important ongoing effort is the standardization of how to map relational databases (RDB) to RDF
- W3C RDB2RDF Working Group
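The general idea of mapping RDB to RDF can be sketched as follows: each row becomes a subject URI built from its primary key, and each remaining column becomes a predicate. This is only a simplified illustration, not the mapping language the working group is standardizing; the base URI, the `city` table and the URI patterns are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from the talk

def rows_to_triples(conn, table, key):
    """Yield one (subject, predicate, object) triple per non-key, non-NULL cell.

    The table name is interpolated directly, so it must come from trusted code.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)

# In-memory example database with a single made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Leipzig', 'Germany')")

triples = list(rows_to_triples(conn, "city", "id"))
for triple in triples:
    print(triple)
```

A real mapping would also have to handle foreign keys (turning them into links between subjects), datatypes and composite keys.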
From unstructured data:
- deploy existing NLP approaches (OpenCalais, Ontos API) - NLP2RDF
From semi-structured sources
- efficient bidirectional synchronization
From structured sources
- declarative syntax and semantics
RDF Data Management
- still 5-50 times slower than RDBMS
- performance increases steadily
- a little performance decrease is acceptable, but not too much
DBpedia Benchmark
- uses DBpedia as data and a selection of 25 frequently executed queries
- ranking between different systems as expected, but the differences were bigger than in other benchmarks
The performance gap between RDB and RDF must be reduced
More realistic benchmarks
Authoring
Two kinds of Semantic Wikis:
- semantic (text) wikis (edit the text)
- semantic data wikis (edit triples)
OntoWiki - a semantic data wiki
- serves as Linked Data/SPARQL endpoint
RDFaCE - RDFa Content Editor (rdface.aksw.org)
- especially targeted for rNews
LOD Linking
- automatic
- semi-automatic (SILK, LIMES)
- manual
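A semi-automatic linking workflow can be sketched roughly like this: compare labels from two datasets with a similarity measure, accept clear matches automatically and queue borderline ones for manual review. This is not how SILK or LIMES work internally; the similarity measure (`difflib` from the Python standard library), the thresholds and the `a:`/`b:` identifiers are assumptions made for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def propose_links(source, target, accept=0.9, review=0.6):
    """Split candidate pairs into automatic links and pairs needing review."""
    links, to_review = [], []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= accept:
                links.append((s_uri, "owl:sameAs", t_uri))
            elif score >= review:
                to_review.append((s_uri, t_uri, round(score, 2)))
    return links, to_review

# Two tiny made-up datasets: identifier -> label.
source = {"a:leipzig": "Leipzig", "a:galway": "Galway"}
target = {"b:Leipzig": "leipzig", "b:Oslo": "Oslo"}

links, to_review = propose_links(source, target)
print(links)
print(to_review)
```

Comparing every pair is quadratic in the number of resources, so a sketch like this only works on small datasets; at Data Web scale the candidate pairs have to be narrowed down first.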
Interlinking challenges: Only 5 % of the information on the Data Web is actually linked
Enrichment
Linked Data is mainly instance data
Quality Analysis
Challenges: Establish measures for assessing the authority, provenance, reliability of Data Web resources
Evolution
Exploration
LOD Lifecycle supported by the Debian-based LOD2 stack (to be released in September)
- but will also be available through a web interface directly accessible from the web (without downloading or pre-installing)
Use cases:
- especially suited for governmental data (open data)
- Publicdata.eu
- scoreboard.lod2.eu
Michael Hausenblas (DERI, Univ. of Galway): Utilising Linked Open Data in Applications
Six steps for utilising LOD in applications:
1. Data awareness
- opendata.ie
- LOD cloud
2. Modeling
- Neologism
- Data Cube
- prefix.cc (established and controlled by DERI)
3. Publishing
- Google Refine plug-in
- RDB2RDF/D2R
4. Discovery
- VoID/DCAT
- Sindice
- CKAN
5. Integration
- LATC
- Sig.ma
6. Use cases
- DERI in-house
- CSO/schools pilot
Workshop part II (Sören Auer)
Linked Data provides a global data-space with a uniform API (due to RDF as the data model)
Access methods:
- dereference URIs via HTTP GET
- SPARQL
- data dumps (RDF/XML)
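The first two access methods can be sketched with the Python standard library. The example only builds the HTTP requests and does not send them. The resource URI is an illustrative DBpedia-style URI that was not mentioned in the workshop, the endpoint is the DBpedia Live address noted earlier (it may no longer be reachable), and the requested media types are assumptions about what a server would offer.

```python
from urllib.parse import urlencode
from urllib.request import Request

def dereference_request(uri):
    """HTTP GET request asking for RDF instead of HTML via content negotiation."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})

def sparql_request(endpoint, query):
    """SPARQL query sent as an HTTP GET with the query in a 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

# Illustrative resource URI; not taken from the workshop.
deref = dereference_request("http://dbpedia.org/resource/Leipzig")
query = sparql_request(
    "http://live.dbpedia.org/sparql",
    "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Leipzig> ?p ?o } LIMIT 5",
)

print(deref.get_method(), deref.full_url, deref.get_header("Accept"))
print(query.full_url)
# To actually send a request: urllib.request.urlopen(deref) (needs network access).
```

Both methods are ordinary HTTP, which is what makes the data-space accessible through one uniform API regardless of who publishes the data.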
When to use RDB/SQL and when to use RDF/SPARQL:
RDB/SQL:
- well defined RDB schema that won't change very much
- performance is an important issue
RDF/SPARQL
- getting more information out of your data
- highly dynamic data with frequent changes in structure/schema
All in all, the two technologies should be seen as complementing each other rather than competing.