Should I model internal identifiers or fields as part of my mapping?

Priority 2, Best Practice Established

Problem Statement

Within our published data, there are many fields that exist as part of our internal systems, but do not have any meaning outside the system. Database IDs, Object IDs, timestamps, or generated GUIDs are examples of these fields.

Are these fields important to model as part of our mapping?

Best Practice:

Fields that have no use outside of the system do not need to be mapped. However, it is often difficult to determine which fields could be useful to an external user. For example, reconciling against an internal identifier requires that identifier to be present in the URI, and legacy systems within your own institution often count as "external users".

Our recommendation is that any ID that has been used by any system be mapped as part of the linked data. These IDs should be assigned a meaningful, unique type, and these types should be given a meaningful label.

We also recommend that the ID most commonly used by humans be mapped using the P48 has preferred identifier predicate. For museum objects, this is often the accession number, but the choice of ID is up to the mapping institution.
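The recommendations above can be sketched as a small, standard-library-only Python example that emits both an object ID and an accession number as typed, labeled identifiers. It is a minimal illustration, not the AAC's actual mapping: the CRM namespace, the use of rdf:value to hold the identifier string, the `Literal` helper, and every `data.example.org` URI are assumptions made for this sketch.

```python
# Assumed namespace for the CIDOC-CRM; substitute the one your project has agreed on.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Literal(str):
    """Marks a plain string literal, as opposed to a URI (hypothetical helper)."""


def identifier_triples(object_uri, id_uri, value, type_uri, type_label, preferred=False):
    """Return (subject, predicate, object) triples for one typed, labeled identifier."""
    # The ID most commonly used by humans is linked with P48; all others with P1.
    link = "P48_has_preferred_identifier" if preferred else "P1_is_identified_by"
    return [
        (object_uri, CRM + link, id_uri),
        (id_uri, RDF_TYPE, CRM + "E42_Identifier"),
        # Holding the ID string in rdf:value is an assumption of this sketch.
        (id_uri, RDF_VALUE, Literal(value)),
        (id_uri, CRM + "P2_has_type", type_uri),
        (type_uri, RDF_TYPE, CRM + "E55_Type"),
        (type_uri, RDFS_LABEL, Literal(type_label)),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (plain string literals only)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical object and type URIs, for illustration only.
obj = "http://data.example.org/object/1234"
types = "http://data.example.org/type/"
triples = identifier_triples(
    obj, obj + "/id/accession", "1965.12.3",
    types + "accession-number", "Accession Number", preferred=True,
) + identifier_triples(
    obj, obj + "/id/object-id", "1234",
    types + "object-id", "Object ID",
)
print(to_ntriples(triples))
```

Because each identifier carries its own type and label, a consumer can tell the accession number from the database ID without knowing anything about the source system, and reconciliation against the internal ID remains possible.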

Discussion:

(From Duane)

Both object ID and accession number should be emitted as identifiers in the Linked Data.

(From David)

Do we need to distinguish between different institutional terminology for things when labeling fields? (Object ID vs Accession Number)

(From Rob)

I would answer the first question as no, if the semantics are actually the same and thus the data is comparable. We should maintain some degree of separation between the semantic data and the way that’s rendered to users by different applications. If (e.g.) Princeton wishes to use Object ID in their application, and Colby prefer Accession Number, no problem. On the other hand, if there is a real semantic difference between those two, we should model them that way.

(From Rob, via email, 9/19/2016)

P48 seems pointless in RDF. The preferred identifier in Linked Open Data is surely the URI?

Otherwise agree with the discussion that identifiers are important to map.

(From David, via email, 9/19/2016)

I could argue that it's good to have a preferred human-readable ID, but I don't know that it's that big a deal, so I'm not going to.

Reference:
