Open design issues
This is a living document that will eventually include all design issues and questions that crop up in tDAR. The hope is to grow it into an actual spec / design document from the bottom up.
Ontology enhanced search
- The user maps data values (strings) to specific nodes in the ontology via the web interface. I've deferred this task for the time being to deal with bug fixes first, but we can still have a test harness that creates these mappings internally, in a hard-coded way, in order to test the ontology-based search.
- Initial ontology-enhanced-search use cases that we'd like to implement:
- synonym search, where a given search term maps into a set S of equivalent terms / synonyms (within an ontology), returning any resources / information resources with hits for any term within S. There are some issues here that I think we need to clarify before we can properly implement it:
- where will the synonyms / equivalence classes be generated? Are they editable by users or expected to be already encoded in the ontology?
- how should we represent the synonyms for a given node in the ontology? In our metadata RDBMS or within the OWL file itself as <owl:sameAs> or <owl:equivalentClass> elements, e.g.,
<owl:Class rdf:ID="FootballTeam"> <owl:sameAs rdf:resource="http://sports.org/US#SoccerTeam"/> </owl:Class>
or
<owl:Class rdf:ID="Wine"> <owl:equivalentClass rdf:resource="&vin;Wine"/> </owl:Class>
- children-of search, where a given search term maps to a particular node in an ontology (potentially using the synonyms from the previous point), and all children of that node in the ontology (including synonyms?) are relevant to the search.
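The two use cases above can be sketched as plain term expansion before the query is run. This is only a minimal in-memory sketch: the class OntologySketch and all of its methods are hypothetical and do not come from the tDAR codebase, and it assumes that synonyms of child nodes are included in a children-of search, which is still an open question above.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory ontology, for illustration only.
class OntologySketch {
    // Each term maps to the shared synonym set S it belongs to.
    private final Map<String, Set<String>> synonyms = new HashMap<>();
    // Each term maps to its direct children in the ontology.
    private final Map<String, Set<String>> children = new HashMap<>();

    // Declares all given terms equivalent, merging any existing groups.
    void addSynonyms(String... terms) {
        Set<String> group = new LinkedHashSet<>();
        for (String term : terms) {
            Set<String> existing = synonyms.get(term);
            if (existing != null) {
                group.addAll(existing);
            }
            group.add(term);
        }
        for (String term : group) {
            synonyms.put(term, group);
        }
    }

    void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(child);
    }

    // Synonym search: the term itself plus every term in its set S.
    Set<String> synonymsOf(String term) {
        Set<String> result = new LinkedHashSet<>();
        result.add(term);
        Set<String> group = synonyms.get(term);
        if (group != null) {
            result.addAll(group);
        }
        return result;
    }

    // Children-of search: the term, its synonyms, all descendants,
    // and (by assumption) the synonyms of those descendants.
    Set<String> expandWithDescendants(String term) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(synonymsOf(term));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!result.add(current)) {
                continue; // already visited
            }
            queue.addAll(synonymsOf(current));
            queue.addAll(children.getOrDefault(current, Collections.<String>emptySet()));
        }
        return result;
    }
}
```

The expanded set would then be handed to whatever search backend is in use as a disjunction of terms. Where the synonym sets come from (the RDBMS or the OWL file) does not matter to this sketch.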
Representing ontologies and mapping to nodes in an ontology
Proposal: refer to a node by its URI, either absolute, e.g., http://www.tdar.org/ontology/master/fauna#Artiodactyla, or relative, e.g., #Artiodactyla. User-uploaded ontologies would be published to an accessible location, e.g., http://www.tdar.org/ontologies/<resource-id>/2009/12/09/fauna.owl, and all of our "master" ontologies would be published in a similar manner.
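If both absolute and relative node references end up being stored, they can be normalized against the published location of the ontology. A minimal sketch using java.net.URI follows; the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
class OntologyNodeUri {
    // Resolves a node reference against the ontology's published base URI.
    // A fragment-only reference such as "#Artiodactyla" is attached to the base;
    // an absolute URI is returned unchanged.
    static URI resolve(URI ontologyBase, String nodeReference) {
        return ontologyBase.resolve(URI.create(nodeReference));
    }
}
```

For example, resolving "#Artiodactyla" against http://www.tdar.org/ontology/master/fauna gives the absolute node URI mentioned above.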
Resources
Resources in tDAR contain the numerous pieces of metadata and bookkeeping that we collect. These include keywords, temporal and geographic context as well as linkages to other resources.
InformationResources are important in particular. Add more detail here.
Relationships between resources
Many-to-many relationships between resources are captured in the resource_relationships table and ResourceRelationship/ResourceRelationshipService/ResourceRelationshipDao. These linkages are only established when created by a person who is not the submitter of the ResourceS in question.
Specific one-to-many relationships between resources are generally kept on the specific resource entities themselves for convenience.
For example, InformationResourceS always belong to a Project which can be retrieved via InformationResource.getProject().
- CodingSheetS and OntologyS can be associated with an arbitrary number of ResourceS but have a specific parent Project with which they were submitted. In order to get the Project that they were originally submitted with, use their getProject() method.
- If I associate an external CodingSheet (one that I didn't create) with a Project, this creates a ResourceRelationship with ResourceRelationshipType.CODING_SHEET_PROJECT. The CodingSheet is encoded as the first Resource (gettable via ResourceRelationship.getFirst()), and the Project associated with the CodingSheet can be retrieved via getSecond(). To get all CodingSheets associated with a Project, use ResourceRelationshipService.getCodingSheets(Project).
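The first/second convention described above can be sketched with simplified stand-ins. The classes below reuse the names from this section but are NOT the real tDAR entities: they have no persistence, the associate method and the in-memory list are assumptions made for illustration, and only getFirst(), getSecond(), CODING_SHEET_PROJECT and getCodingSheets(Project) are taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; not the actual tDAR classes.
class RelationshipSketch {
    static class Resource {
        final String title;
        Resource(String title) { this.title = title; }
    }
    static class Project extends Resource {
        Project(String title) { super(title); }
    }
    static class CodingSheet extends Resource {
        CodingSheet(String title) { super(title); }
    }

    enum ResourceRelationshipType { CODING_SHEET_PROJECT }

    static class ResourceRelationship {
        private final Resource first;
        private final Resource second;
        private final ResourceRelationshipType type;

        ResourceRelationship(Resource first, Resource second, ResourceRelationshipType type) {
            this.first = first;
            this.second = second;
            this.type = type;
        }
        Resource getFirst() { return first; }
        Resource getSecond() { return second; }
        ResourceRelationshipType getType() { return type; }
    }

    static class ResourceRelationshipService {
        // Stands in for the resource_relationships table.
        private final List<ResourceRelationship> relationships = new ArrayList<>();

        // Hypothetical method: the CodingSheet goes first, the Project second.
        ResourceRelationship associate(CodingSheet sheet, Project project) {
            ResourceRelationship relationship = new ResourceRelationship(
                    sheet, project, ResourceRelationshipType.CODING_SHEET_PROJECT);
            relationships.add(relationship);
            return relationship;
        }

        List<CodingSheet> getCodingSheets(Project project) {
            List<CodingSheet> sheets = new ArrayList<>();
            for (ResourceRelationship r : relationships) {
                if (r.getType() == ResourceRelationshipType.CODING_SHEET_PROJECT
                        && r.getSecond() == project) {
                    sheets.add((CodingSheet) r.getFirst());
                }
            }
            return sheets;
        }
    }
}
```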
Translating data table columns with a coding sheet
Data table columns can be annotated with a variety of metadata. When a data table column is associated with a coding sheet, we translate the values in that column using that coding sheet.
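As a rough sketch of what the translation step might look like, the coding sheet is treated here as a simple code-to-term map. The class name, the whitespace trimming, and the choice to leave unmapped values unchanged are all assumptions, not decisions recorded in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical translator, for illustration only.
class ColumnTranslator {
    // Replaces each coded value with the term the coding sheet gives for it.
    // Values with no entry in the coding sheet are passed through unchanged (assumption).
    static List<String> translate(List<String> columnValues, Map<String, String> codingSheet) {
        List<String> translated = new ArrayList<>(columnValues.size());
        for (String value : columnValues) {
            String code = (value == null) ? null : value.trim();
            if (code != null && codingSheet.containsKey(code)) {
                translated.add(codingSheet.get(code));
            } else {
                translated.add(value);
            }
        }
        return translated;
    }
}
```

Whether the translated values replace the original column or are stored alongside it is left open here.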
Re-using metadata mapping
If someone uses the same table structure, column names, etc., and wants to re-apply the metadata mapping they used for a previous dataset, how do they apply it? Right now a metadata mapping may be associated with a coding sheet, so if the user selects a coding sheet to translate their dataset, perhaps the metadata mapping for that coding sheet could be selected automatically. This needs more thought.
File storage scheme
Currently there is a file.store.location property that acts as the root directory under which files are placed (this can be either an absolute or a relative path). In a production system this should probably be an absolute path to a location that is backed up on a regular basis. We should come up with a consistent naming / path convention relative to this file.store.location root directory.
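One possible convention, offered only as a starting point for discussion, is <file.store.location>/<resource-id>/<filename>. The sketch below assumes that layout; the class and method names are hypothetical, and only the file.store.location property comes from the current setup.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical path convention, for illustration only.
class FileStoreSketch {
    // Builds <root>/<resource-id>/<filename>, where root is the value of file.store.location.
    // A relative root is made absolute against the working directory.
    static Path resolve(String fileStoreLocation, long resourceId, String filename) {
        Path root = Paths.get(fileStoreLocation).toAbsolutePath().normalize();
        Path resourceDir = root.resolve(Long.toString(resourceId));
        Path target = resourceDir.resolve(filename).normalize();
        // Reject filenames such as "../x" that would escape the resource directory.
        if (!target.startsWith(resourceDir)) {
            throw new IllegalArgumentException("Filename escapes the file store: " + filename);
        }
        return target;
    }
}
```

Keying the directory on the resource id keeps uploads with the same filename from colliding. A date component, as in the ontology publishing URL above, could be added in the same way.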
Authenticated access
View and search access to metadata should be open and not require authentication. Downloading the actual uploaded file or inspecting data does require authentication.
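The rule above is small enough to state as a single check. This is a sketch only: the Action names are made up for illustration and are not tied to the current URL or action mappings.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical access policy, for illustration only.
class AccessPolicySketch {
    enum Action { VIEW_METADATA, SEARCH_METADATA, DOWNLOAD_FILE, INSPECT_DATA }

    // Actions that require a logged-in user; everything else is open.
    private static final Set<Action> REQUIRES_LOGIN =
            EnumSet.of(Action.DOWNLOAD_FILE, Action.INSPECT_DATA);

    static boolean isAllowed(Action action, boolean authenticated) {
        return authenticated || !REQUIRES_LOGIN.contains(action);
    }
}
```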
URL mappings
Enhance current URLs with a more RESTful set of URLs? This may be beneficial for future interoperability efforts.
Fedora Objects
The following items have been identified as those which will be first-class objects in Fedora:
- Project
- Dataset
- Ontology
- Coding Sheet
- Image
- Document
- Citation
It is assumed that archiving will be done at the project level: when archiving is initiated, all child (associated) resources will be updated or added as appropriate.
The relationships between the objects (expressed in Fedora as RELS-EXT) can come from the RELS-EXT ontology or from one of our own making. For instance, we can use info:fedora/fedora-system:def/relations-external#isMemberOfCollection or create our own http://tdar.org/namespace/relations#isMemberOfProject.
What are some of the relationships that we want to capture?
- isCodingSheetFor
- isOntologyOf – hasOntology
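To make the discussion concrete, here is a sketch of what a RELS-EXT datastream using our own namespace could look like, built as a plain string. The PIDs (tdar:101, tdar:42), the tdar prefix, and the class and method names are all hypothetical. Only the namespace http://tdar.org/namespace/relations# and the relationship names come from the notes above, and the sketch assumes PIDs and predicate names need no XML escaping.

```java
// Hypothetical RELS-EXT builder, for illustration only.
class RelsExtSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String TDAR_REL_NS = "http://tdar.org/namespace/relations#";

    // Emits one RDF statement: <subject> tdar:<predicate> <object>.
    // Fedora refers to objects by URIs of the form info:fedora/<pid>.
    static String relsExt(String subjectPid, String predicate, String objectPid) {
        StringBuilder xml = new StringBuilder();
        xml.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS).append("\"")
           .append(" xmlns:tdar=\"").append(TDAR_REL_NS).append("\">\n");
        xml.append("  <rdf:Description rdf:about=\"info:fedora/")
           .append(subjectPid).append("\">\n");
        xml.append("    <tdar:").append(predicate)
           .append(" rdf:resource=\"info:fedora/").append(objectPid).append("\"/>\n");
        xml.append("  </rdf:Description>\n");
        xml.append("</rdf:RDF>\n");
        return xml.toString();
    }
}
```

Calling relsExt("tdar:101", "isMemberOfProject", "tdar:42") would describe a resource that belongs to a project. isCodingSheetFor and isOntologyOf would be expressed in the same way.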