The Strongest Link: Libraries and Linked Data (6)

2010/11/18   Views: 1168

[Author] lisgirl

[Affiliation] lis_girlの部落格

[Abstract] The current stable version is Drupal 6, which does not include core support for RDF, but several contributed modules produce RDF export (including evoc, which lets you import external vocabularies into Drupal and expose their classes and properties to other Drupal modules for reuse). In Drupal 7, the ability to map the data structure to RDF and expose it as RDFa will be ported to core. This means that even if the site manager knows nothing about RDF, Drupal 7 sites will expose common elements such as title, author, and date as RDFa.

[Keywords] Libraries, Linked Data



Appendix - Semantic Web Tools for Libraries

RDF Converters

·MARC/MODS RDFizer

·Marc2rdf-Modeler

·OAI-PMH RDFizer

·BibTeX -> RDF

·Dublin Core -> RDF Crosswalk

·DC.Metadata Gen

·Simile Project RDFizers

·D2R Server

RDF Publishing Tools

Drupal 6 & 7

The current stable version is Drupal 6, which does not include core support for RDF, but several contributed modules produce RDF export (including evoc, which lets you import external vocabularies into Drupal and expose their classes and properties to other Drupal modules for reuse). In Drupal 7, the ability to map the data structure to RDF and expose it as RDFa will be ported to core. This means that even if the site manager knows nothing about RDF, Drupal 7 sites will expose common elements such as title, author, and date as RDFa.
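To make the idea of "common elements exposed as RDFa" concrete, here is a minimal Python sketch that wraps a few fields in RDFa attributes. It is not Drupal code: the field-to-property mapping, the function name `render_rdfa`, and the example.org URI are assumptions chosen for illustration, and the markup Drupal 7 actually emits may look different.

```python
from html import escape

# Hypothetical mapping from site fields to RDF properties.
# Drupal 7's real default mapping may use different terms.
FIELD_TO_PROPERTY = {
    "title": "dc:title",
    "author": "dc:creator",
    "date": "dc:date",
}


def render_rdfa(node_uri, fields):
    """Return an HTML fragment that annotates each mapped field with RDFa."""
    lines = [
        '<div xmlns:dc="http://purl.org/dc/terms/" about="%s">'
        % escape(node_uri, quote=True)
    ]
    for name, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(name)
        if prop is None:
            continue  # fields without a mapping are left out of this sketch
        lines.append('  <span property="%s">%s</span>' % (prop, escape(value)))
    lines.append("</div>")
    return "\n".join(lines)
```

Calling `render_rdfa("http://example.org/node/1", {"title": "...", "author": "...", "date": "..."})` returns a `div` whose `span` children carry the properties, which is roughly what an RDFa-aware crawler would pick up without the site manager writing any RDF by hand.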

Eprints 3.2.1+

Eprints is open-source repository software that is widely deployed in libraries around the world. As of version 3.2.1, Eprints includes several semantic features: export formats (RDF+XML, N3, and N-Triples), URIs for derived entities such as authors, events, or locations, and an extensible RDF system that uses the BIBO Ontology by default.
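For readers who have not seen N-Triples, one of the export formats listed above, the following Python sketch serializes a record into it using BIBO and Dublin Core terms. This is not Eprints code; the helper names (`uri`, `literal`, `to_ntriples`) and the example.org record URI are assumptions, and only the simplest literal escaping is handled.

```python
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return "<%s>" % value


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"%s"' % escaped


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "".join("%s %s %s .\n" % (s, p, o) for s, p, o in triples)


# Hypothetical record URI, used only for illustration.
record = uri("http://example.org/id/eprint/1")
example = to_ntriples([
    (record, uri(RDF_TYPE), uri(BIBO + "Article")),
    (record, uri(DCTERMS + "title"), literal("Libraries and Linked Data")),
])
```

Each line of `example` is a complete statement, which is why N-Triples is convenient for bulk export and line-by-line processing.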

Fedora

Fedora, a popular open-source digital library platform, is natively semantic, with an integrated RDF triple store called Mulgara. Fedora allows libraries to offer RDF output of record-level citations.

Zemanta

Bloggers can incorporate RDF content by using semantic tagging services. Zemanta is a real-time semantic analysis tool that plugs into Movable Type, TypePad, and Drupal. As you blog, Zemanta performs on-the-fly term extraction, disambiguates terms by examining the surrounding context, and suggests appropriate enrichment material.

Semantic MediaWiki

Ontoprise has developed a set of Semantic MediaWiki (SMW)+ extensions to MediaWiki that provide a community-based environment for authoring ontologies and creating semantically enhanced wikis. Semantic mark-up of end-user data is enabled through structured web forms and easy-to-use tagging and annotation tools. Data can be output in a variety of visual formats, as well as in standard formats such as vCard and BibTeX.

OpenCalais Semantic Proxy

SemanticProxy translates the content of any URL on the web into its semantic representation in RDF, HTML, or Microformats. Give SemanticProxy.com the address of a web page and it returns rich semantic metadata about the people, companies, events, and relationships on that page.

Calais Viewer

To see the entity extraction process in action, paste some unstructured text into the Calais Viewer. It will return the major entities, topics, and relationships. The Calais web service automatically attaches rich semantic metadata to the content you submit. Using natural language processing, machine learning, and other methods, Calais categorizes your document and links it with entities (people, places, organizations, etc.), facts (person "x" works for company "y"), and events (person "z" was appointed chairman of company "y" on date "x").

Virtuoso

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional RDBMS, ORDBMS, virtual database, RDF store, XML store, free-text index, web application server, and file server in a single system. The open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso.

Talis Platform

Supporting data publishers and developers, the Talis Platform provides dedicated cloud storage for RDF data stores. The content and metadata become immediately accessible over the Web and discoverable using both SPARQL and a free-text search system with built-in ranking of results by relevance to the search terms.

Linking Hubs

The major production linking hub of interest to libraries is the Library of Congress Authorities and Vocabularies Service, which offers all of the Library of Congress Subject Headings (LCSH), including genre/form headings, children's subject headings, subdivision records, and validation records, in SKOS and JSON formats. In the future, the Library of Congress plans to release other vocabularies in the same way, including the Thesaurus for Graphic Materials, MARC Geographic Area Codes, MARC Language Codes, and MARC Relator Codes.

The Virtual International Authority File (VIAF), a joint project of several national libraries with support from OCLC, was released in linked data format in September 2009. It contains over 10 million personal names drawn from 17 participating institutions.

There are many other linking hubs that may be useful to libraries, including Dewey Summaries, MeSH headings, and RAMEAU subject headings from the French National Library.

Ontologies & Schemas

MarcOnt is an initiative in development that attempts to capture concepts from MARC21 and other legacy bibliographic systems and transform them into machine-readable data. It involves a set of tools, including an ontology, mediation services, an RDF translator, and the MarcOnt Portal, and is closely connected to the JeromeDL semantic digital library platform.

Bibliographic Ontology Specification, known as BIBO, has as its goal to "be used as a citation ontology, as a document classification ontology, or simply as a way to describe any kind of document in RDF." It is used as a format for exporting data from several semantic platforms, including Talis Aspire.

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) was developed to solve the problem of identifying an aggregation of resources on the web, such as a Flickr photo stream. OAI-ORE uses Resource Maps to give each aggregation its own URI and machine-readable data describing it, including information about the aggregation itself (such as who created it) as well as information about the relationships between the resources in the aggregation. OAI-ORE can be expressed in several semantic formats such as RDFa, RDF/XML, Turtle, and N3. A powerful tool for digital libraries, OAI-ORE is being used by Chronicling America to provide linked data about aggregations of items such as the individual pages in an issue of a newspaper.
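The Resource Map idea is easier to see as data. The Python sketch below builds the statements for a newspaper issue aggregated from its pages, using the `ore:describes` and `ore:aggregates` terms from the OAI-ORE vocabulary. It is a simplified illustration, not a complete or validated Resource Map: the function name `build_resource_map`, all example.org URIs, and the choice to store the creator as a plain string are assumptions.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_resource_map(rem_uri, aggregation_uri, creator, page_uris):
    """Return (subject, predicate, object) tuples for a minimal Resource Map."""
    triples = [
        # The Resource Map is a document that describes the aggregation.
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        # Information about the aggregation itself, such as who created it.
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
        (aggregation_uri, DCTERMS + "creator", creator),
    ]
    # One statement per aggregated resource, e.g. each page of an issue.
    triples.extend(
        (aggregation_uri, ORE + "aggregates", page) for page in page_uris
    )
    return triples


# Hypothetical URIs for one newspaper issue with two pages.
issue = build_resource_map(
    "http://example.org/issue/1/rem",
    "http://example.org/issue/1",
    "Example Library",
    ["http://example.org/issue/1/page/1", "http://example.org/issue/1/page/2"],
)
```

The point of the separation is that the aggregation (the issue) and the document describing it (the Resource Map) have different URIs, so statements about one are not confused with statements about the other.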

Other ontologies/schemas that may be of use to libraries include the EDI Ontology and the FRBR Ontology from SchemaWeb.

Projects

LIBRIS is a Swedish union catalogue that contains approximately six million bibliographic records from 175 libraries. LIBRIS provides a URI for every resource and exposes data using FOAF, SKOS, BIBO, and Dublin Core. It also links to external linked data sources such as DBpedia and LCSH.

The Library of Congress has developed its Chronicling America Digitized Newspaper Database and Directory as linked data, using Dublin Core, Bibliographic Ontology, FOAF, and Object Reuse and Exchange (OAI-ORE). The database contains digital views of historical newspapers as well as a directory of newspapers in the United States, 1690 to the present.

The Open Library has a goal of providing "One web page for every book". It provides a URI for each item in its system. The project is currently exploring other linked data approaches in order to enhance "the ability for people to connect our records with many more systems online."

Journal publishers are also doing work with semantic publishing:

The DBLP database provides bibliographic information on major computer science journals and conference proceedings. The database contains more than 800,000 articles and 400,000 authors. The D2R Server is based on the XML dump of the DBLP database, which has been read into a MySQL database. The complete RDF view on the database consists of approximately 15 million RDF triples, which are served in small easily consumable chunks.
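The general pattern behind such a setup, turning relational rows into RDF triples and serving them in small chunks, can be sketched in a few lines of Python. This is a hand-written stand-in, not D2R Server or its mapping language: it uses an in-memory SQLite database in place of MySQL, and the `article` table, its columns, the sample rows, and the example.org base URI are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/dblp/"  # hypothetical base URI
DCTERMS = "http://purl.org/dc/terms/"


def demo_connection():
    """Create a tiny in-memory table standing in for the bibliographic database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.executemany(
        "INSERT INTO article (id, title, author) VALUES (?, ?, ?)",
        [
            (1, "First Paper", "A. Author"),
            (2, "Second Paper", "B. Author"),
            (3, "Third Paper", "C. Author"),
        ],
    )
    return conn


def rows_to_triples(conn):
    """Yield one title triple and one creator triple per article row."""
    cursor = conn.execute("SELECT id, title, author FROM article ORDER BY id")
    for article_id, title, author in cursor:
        subject = "%sarticle/%d" % (BASE, article_id)
        yield (subject, DCTERMS + "title", title)
        yield (subject, DCTERMS + "creator", author)


def chunked(items, size):
    """Group an iterable into lists of at most `size` items."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

For example, `list(chunked(rows_to_triples(demo_connection()), 4))` groups the generated triples into batches of at most four, a toy version of serving a large RDF view piece by piece instead of as one dump.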

The Royal Society of Chemistry launched its Prospect Structure Search in 2007. It uses enhanced HTML to incorporate standard metadata in articles that links related articles together and allows for search on structures and sub-structures in RSC articles.

Elsevier has experimented with Structured Digital Abstracts (SDAs) in the journal FEBS Letters. The abstracts of ninety papers were converted to machine-readable data that captures named entities such as genes, as well as the results of the papers, using controlled vocabularies.

Other efforts include the ChemSpider Journal of Chemistry and Public Library of Science (PLoS) Neglected Tropical Diseases (NTD).

Hosted on the Talis Platform, the Linked Periodicals Project provides a conversion of the journal metadata supplied by CrossRef, HighWire, and the NLM, and outputs citations as RDF/XML, JSON, and Turtle.

Original link: http://blog.sina.com.cn/s/blog_6c1cadb10100n4av.html