Associations Between the Principle of Provenance and Metadata: Practical Implementation and Cases

Abstract: The principle of provenance is the fundamental theory of archival science and has important reference value in many areas of research and practice. This paper first investigates the core value of the principle of provenance. It treats the principle as a unity of theory and method whose core value lies in following historicalism and respecting the essential attributes of archives. The authors then focus on government information management and explore how metadata implements this core value in the era of electronic records. In essence, the principle of provenance requires that the source information of a record be fully understood and preserved. In the age of electronic records this requirement has grown into the need to capture "contextual information", which is usually fixed as metadata in electronic records and reflects the process of a record's creation, maintenance and long-term preservation. Source information is the contextual information that reflects the record creation process, and its typical expression in metadata is provenance metadata. The authors therefore explore the implications of provenance metadata and its essential forms. The authors hold that source information is the indispensable characteristic information in records and archives.


Introduction
The concept of provenance has for many decades been a focus of archival discourse, and it forms the basis of a core archival principle, namely the Principle of Provenance, which was first prescribed for use in Denmark as an instruction on the arrangement of financial archives [1]. In more recent times, the need to manage, preserve, and make accessible new digital forms of records has prompted archivists to re-examine the principle of provenance and its actual practice. Meanwhile, many other disciplines, such as library and information science, computer science, and digital forensics, have also shown a keen interest in provenance and have begun to actively research what it is and how it can be effectively represented in different contexts [2].
The provenance of records can be represented through metadata [3]. In the digital environment, metadata associated with or embedded in records may provide relevant information on the provenance of the records themselves or of the systems in which they reside. Provenance, as a key form of metadata, can assist in evaluating the quality of a record as well as its trustworthiness [4]. Recently there has been some discussion of provenance concepts in relation to diverse metadata in digital libraries [5,6]. However, there is very little discussion in the literature of the application of provenance concepts to the metadata of government records. Extending the discussion to this area would be academically significant: it would further the exploration of the notions of provenance, as well as their potential uses and implementations in various contexts.
In the context of government information management, access to information with clear provenance ensures that government records and data can be trusted and reliably used. This paper, based on the core value of provenance, aims to explore how the Principle of Provenance is associated with metadata in the realm of government information management, especially how traditional definitions and applications of provenance may be enriched and expanded via provenance metadata.
The remainder of this paper is organized as follows. The conception and core values of the principle of provenance are discussed in Section 2, which leads to the analysis of the metadata manifestation of the core value in the next section. In Section 3, the authors explain the concept of source information, highlight the relationship between provenance metadata and the core value of the principle of provenance, and set out the composition and function of provenance metadata in detail. The paper concludes in Section 4, where the authors also make suggestions for future research and practice.

Core Values of the Principle of Provenance
The principle of provenance is the fundamental theory of archival science [7]. A good understanding of its core values makes it possible to apply archival theories successfully in various contexts.

Understanding Principle of Provenance: Unity of Principle and Method
The principle of provenance can be understood simply as organizing and arranging records according to the institutions (agents) that created them. Understood this way, it stands squarely opposed to the principle of pertinence (a principle of arranging records based on content, without regard for their provenance or original order). Although this reading is easy to grasp, it is only the starting point for a full interpretation of the principle of provenance. The principle has gone through a process of continual development. Its genesis is the French principle of "Respect des fonds". Germany's "Registraturprinzip" christened it. The Dutch Manual of 1898 made systematic theoretical arguments for it for the first time. The 1910 International Congress of Archivists and Librarians in Brussels formally established it as the basic principle of archival science. Afterwards the principle underwent international development. Countries such as the UK, France, the USA, Germany and the USSR drew on their national contexts to understand and apply the principle of provenance from different perspectives. In the era of electronic records it has also been challenged and rediscovered. Its theoretical connotations can thus be continually deepened and developed.
Many eminent archival scholars have proposed different understandings of the principle of provenance. Among them, the understanding of the renowned German archival scholar Brenneke is the most comprehensive. He considered the principle of provenance to be a principle of arrangement as well as a principle of organization and a principle of research. [8] Explaining Brenneke's ideas, the American archival scholar Schellenberg wrote that as an arrangement principle, "it enables archives to have their own classification units"; as an organization principle, "it results in archival depositories and libraries to be distinguished in fundamental methodology, so that the independence of the archival profession is guaranteed"; and as a research principle, "it preserves the evidential value of archives, thus facilitating researchers to explore the process of historical development of events and clarify its entirety". [9] These concise generalizations by the two scholars capture the connotations of the principle of provenance. The three aspects (arrangement, organization, research) are of great significance for understanding the principle correctly.
Judging from the developmental history of the principle of provenance and from the interpretations of it by archival scholars, the principle is a foundational theory of archival science that takes provenance as its guiding thought and the fonds as its unit of arrangement. [10] Guided by provenance, it demands respect for the creator of the records externally and for the original order of the records internally; this is its side as a principle. With the fonds as the core unit of arrangement, it demands in practice that the fonds be the first level of classification and management of archives [11]; this is its side as a method. Of course, the practical value of the principle of provenance is not limited to the arrangement of records. It also gives direction to the appraisal, retrieval and use of archives. The principle of provenance thus has both a theoretical and a practical side. It is a unity of principle and method.

Grasp the Core Value: Following Historicalism and Respecting Natural Attributes of Archives
The core value is the most stable and most ontological standard in the development of the principle of provenance. Unlike concrete methods, which can be applied in various modes (e.g., the UK, the USA and the USSR interpret the unit of arrangement of archives differently), the core value allows the principle of provenance to keep its role as the theoretical support of archival science while the principle itself develops and changes. The core value also guides the new thinking and new practices that emerge in archival work. The core value of the principle of provenance is precisely the following of historicalism and the respect for the natural attributes of archives, which give archives an existence independent of other kinds of documents and make archival science independent of information science and library science. [12] As a way for humans to understand reality and get a handle on the world, historicalism is by nature a direct identification of objective reality. As a methodology, historicalism emphasizes respect for the historical, the objective and the factual. [13] Archives are historical records formed in the practices and activities of social organizations and individuals. Only by adhering to the method of historicalism, namely arranging archives in accordance with the original form in which they came into being, is it possible to maintain and reflect the original picture of the creators' activities. [14] In arranging archival entities, preserving the historical linkages between records is the concrete application of the historicalism method. [15] From the principle of pertinence to the principle of provenance, and then to the "new concept of provenance", the theory has always been in line with historicalism. The contest between the principle of pertinence and the principle of provenance is in fact the contest between logicalism and historicalism for control over the management of archival entities.
Logicalism is the pondering and reconstruction of events, while historicalism is their comprehension and faithful depiction. The principle of provenance demands that archives from the same creator be preserved independently and in full, and that the original order of the records be respected as far as possible, so as to truly reflect the original historical picture of the creator's activities. [16] The emergence of the "new concept of provenance" requires archivists to obtain various data related to the creation, preservation and use of electronic records. Only then can the creation process of records be better reflected and the original condition of the archives maintained. What this expresses is also historicalism. As Prof Huang Xiaoyu remarked, the principle of provenance "hinges its spirit on historicalism". The principle is the concrete manifestation of historicalism. It seeks the basis for arrangement in the objective reality of the archives. It found an objective unit of classification that is best able to maintain the original appearance and intrinsic linkages of archives as they were formed: the fonds. [17] The essential attribute of archives is that they are original records. Archives are a paramount carrier through which a country and its national culture build the memory of their civilization and pass on their history. Archives are condensates of history and the "image capture" of human activities in history. [15] The essential attributes of archives also demand that archival practice follow the method of historicalism. The respect for the original record in the principle of provenance has continually developed and deepened since the principle's birth. The French "Respect des fonds" emphasizes preserving the independence and integrity of a fonds. The German "Registraturprinzip" demands maintaining the original order of arrangement in the archives.
The Anglo-American "archive group" and "record group", as well as the "fonds theory" and "free provenance" of the USSR and Germany, are concepts that archivists in different countries have advanced under the banner of the principle of provenance. All these terms denote the basic units by which archival depositories manage archival entities: record entities are gathered into distinct archival entities so as to preserve their objective "provenance". [18] "Concept provenance" and "fonds alienation" in the era of electronic records emphasize the original linkage between records and the process of their creation, as well as the logical structural relationships between records. The use of metadata greatly enhances the authenticity, integrity, reliability and usability of records. Why could the principle of provenance become the theoretical pillar of archival science? Because it draws the line between archival science and other disciplines, demonstrates and maintains the natural attributes of archives, establishes the theoretical basis and methodological principle of the archival profession, and plays the fundamental supporting role in archival science. [19]

Metadata Manifestation of the Core Value
By its nature, the principle of provenance demands that the source information of records be fully understood and preserved. In the electronic era this demand has grown into the need to capture "contextual information", which is usually fixed in electronic records in the form of metadata. Such metadata reflects the process of the record's creation, maintenance and long-term preservation.

Source Information: Natural Form for Expression of Connotations of "Provenance"
The renowned information theorist and mathematician Shannon held that "Information is what's used to eliminate random uncertainty". [20] The "provenance" in the principle of provenance needs to be expressed and controlled through information. Source information is the indispensable characteristic information in records. It is also the natural form in which the principle of provenance expresses "provenance".
Early source information was mainly based on "entity provenance", referring to information about the record's creator or creating institution. Entry 380 of the Dictionary of Archival Terminology, published in 1984 by the International Council on Archives, reads, "Provenance --The agency, institution, organization or individual that created, accumulated and maintained records/archives in the conduct of its business prior to their transfer to a records centre/archive". [21] The understanding in the terminology dictionary thus largely refers to the creating institution of the records. In essence the record creator, simple and clear, becomes the principle for arranging record entities. This principle imprints every record with the characteristic information of its creator, i.e., record creator information. In the age of electronic records, with the spread of information technology and the Internet and the growth of large-scale collaboration among organizations and institutions, the creation of records has become more and more complex. Given also the heterogeneity of record media and the changeability of archival information, creator information alone has become ever less suited to developments in archival work. Out of this the "new concept of provenance" emerged. Its source information, based on "concept provenance", enriches record creation information by expanding it from information about the record's creator to background information about the record's creation process. Creator information is of course one factor in the record creation process, so traditional provenance (mainly its institutional or organizational aspect) becomes one aspect of the new provenance. The new concept of provenance is broader in scope than the traditional principle of provenance. [22] In reality, a record's characteristics need to be reflected through record information.
Content information, contextual information and structural information are the three basic aspects of record information. [23] The new concept of provenance brings about a paradigm shift in source information from "entity" to "context". In essence, the source information in electronic records is a kind of contextual information. However, it cannot be equated with contextual information as such. It mainly reflects the contextual information arising in the course of the record's creation. Concretely, it refers to information about the record creator, the process of the record's creation and its context, from the record's inception to its transfer for preservation as an archive. Apart from source information, a record's contextual information also includes preservation information and usage information. The exact relationship can be seen in Figure 1. Preservation information refers to information about the record's registration, maintenance, long-term preservation, etc., created while the archival depository keeps the record in centralized custody after its transfer. Usage information refers to information about the user, usage method, usage time, etc., created when the record is used as an archival item after it has been filed. In the era of electronic records, the principle of provenance has even broader practical significance. This is shown concretely in the indispensable function of source information in the retrieval, appraisal and arrangement of electronic records and in the confirmation of their evidential value, which renews and expands the ways in which archives management is carried out in practice. Thus the principle of provenance will not become obsolete. It will stand side by side with the management of electronic records. [18]
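The division of contextual information into source, preservation and usage information can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the event names (such as "drafted" or "migrated") and the ContextEvent structure are hypothetical and do not come from any standard or from the authors' work.

```python
from dataclasses import dataclass
from enum import Enum


class ContextKind(Enum):
    SOURCE = "source information"              # creation, up to transfer
    PRESERVATION = "preservation information"  # custody after transfer
    USAGE = "usage information"                # use as an archival item


# Hypothetical event names, mapped to the category the text assigns them to.
KIND_BY_EVENT = {
    "drafted": ContextKind.SOURCE,
    "approved": ContextKind.SOURCE,
    "transferred": ContextKind.SOURCE,
    "registered": ContextKind.PRESERVATION,
    "maintained": ContextKind.PRESERVATION,
    "migrated": ContextKind.PRESERVATION,
    "accessed": ContextKind.USAGE,
    "cited": ContextKind.USAGE,
}


@dataclass(frozen=True)
class ContextEvent:
    record_id: str
    event: str
    detail: str = ""


def classify(event: ContextEvent) -> ContextKind:
    """Return the kind of contextual information an event belongs to."""
    try:
        return KIND_BY_EVENT[event.event]
    except KeyError:
        raise ValueError(f"unknown event type: {event.event!r}") from None


def group_by_kind(events):
    """Group events into source, preservation and usage information."""
    grouped = {kind: [] for kind in ContextKind}
    for ev in events:
        grouped[classify(ev)].append(ev)
    return grouped
```

In this sketch only the events grouped under ContextKind.SOURCE would count as source information in the sense used here; the other two groups are contextual information but not source information.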

Provenance Metadata: Important Means for Realizing the Value of Principle of Provenance
The strength of metadata is that its connotations, structure and contents can all be clearly defined, and that it is easy to understand and work with. As long as metadata is recorded and preserved accurately and according to standard throughout the whole process of creating, managing and using electronic records, the original historical picture of the record's creation can be objectively displayed and maintained. [19] Moreau identifies a category of "Provenance as Annotations", in which standards such as the Dublin Core metadata standard provide structure and semantics for the metadata of resources by design, and notes that many aspects of these schemas are provenance related, such as author, creation date, and version. Moreau argues that such metadata can also be seen as a specialization of a process-oriented definition of provenance. [24] Provenance metadata realizes the core value of the principle of provenance. It plays an important part in ensuring the authenticity, integrity, reliability and usability of electronic records, and it is the basis for their long-term preservation and use.
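The observation that many elements of a general descriptive schema are provenance related can be sketched in Python as a simple filter over a Dublin Core style description. The term names used (creator, created, isVersionOf and so on) exist in the DCMI Metadata Terms, but the choice of which terms to treat as provenance related is an assumption made for this illustration, not a classification taken from Moreau or from the standard.

```python
# Terms from the DCMI Metadata Terms vocabulary that this sketch treats as
# provenance related. The selection itself is an illustrative assumption.
PROVENANCE_RELATED_TERMS = frozenset({
    "creator",
    "contributor",
    "publisher",
    "created",
    "modified",
    "source",
    "isVersionOf",
    "hasVersion",
    "provenance",
})


def provenance_view(description: dict) -> dict:
    """Keep only the provenance-related terms of a flat description."""
    return {
        term: value
        for term, value in description.items()
        if term in PROVENANCE_RELATED_TERMS
    }


# Usage with invented sample values.
description = {
    "title": "Annual budget report",
    "subject": "budget",
    "creator": "Finance Department",
    "created": "2015-03-01",
    "isVersionOf": "draft-2015-02",
}
print(provenance_view(description))
```

The filter leaves the descriptive terms (title, subject) aside and keeps the author, creation date and version information that the passage singles out.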

Provenance Metadata Inheriting Historicalism and Natural Attributes of Archives
In view of the characteristics of electronic records, the new concept of provenance squarely faces the insufficiency of traditional "institutional provenance" and introduces the relatively abstract "record creation process" as the expression of provenance. Since this concept is comparatively abstract and not easy to grasp, Chinese and foreign archives professionals have broadly adopted the concept of "metadata" to denote the source information of the creation of electronic records. [25] However, not all source information is metadata. "Only data from description of background information or the inductive process of automatically discerning, separating, extracting and analyzing from computers and network systems belong to metadata." [26] A record's source information has to be preserved through description, which at the same time facilitates the retrieval and use of records. Prof Feng Huiling remarks, "In the world of electronic records, without the concrete, compound and fluid, source information and confirmation of the record creation process are lost. Thus often a record's authenticity and reliability cannot be confirmed. Conversely, if we possess the detailed record of the creation process of an electronic record, as if a birth certificate for every record, we will then have the basis for proving that record's authenticity." [27] This kind of detailed record is metadata. For every record, then, metadata is as important as a birth certificate.
The principle of provenance is the theoretical basis for the file specification record in archives, which is highly significant for the information value of archives, especially for guaranteeing their credential value. [28] Metadata is how the file specification record of records is exhibited. When metadata is used to describe, manage and display source information, the main mode of expression is provenance metadata, which is one way in which source information takes the form of metadata. History is the origin of provenance metadata. Provenance metadata is a type of metadata that reflects the record and its creator, the activities and the environment in which it was created, and the historical linkage between the record and other records. Provenance metadata objectively displays archives as historical records. Prof He Jiasun considers that describing the source information of electronic records in metadata can guarantee their evidential value at the root. [29] Provenance metadata can preserve the historical connections of a record's creation, maintain the originality and credential competence of the record, and display the record's integrity and systematic form. Compared with the abstract record creation process, it is therefore easier to understand and grasp. Furthermore, provenance metadata can record all the original information in the process of the record's creation. "It is the best reflection of the organic connection of the record, truthfully displaying the historical picture of the archive's formation." [30] Provenance metadata therefore inherits the historicalism and the respect for the natural attributes of archives that make up the core value of the principle of provenance. It is an important way of realizing that core value in the era of electronic records.

Composition and Function of Provenance Metadata
Provenance metadata is composed of entities and elements. Concretely, it includes metadata entities such as institution, business, environment, relationship and time. They record the source information of the record from its creation until its transfer to the depository. In practice, provenance metadata entities can be divided into multiple elements (see Table 1).
Institution metadata records information about the units, persons and departments responsible for the creation of the electronic records. The main attributes include the full names, abbreviations and history of units and departments, as well as the functions, titles, eponyms and translated names of individuals. Institution metadata exhibits the respect for the record creator that the traditional principle of provenance emphasizes. Its function is to reflect the combined information about the record creator and to make clear who was responsible for creating the electronic records, which facilitates classification by creator in the traditional sense. The metadata also allows information about the person responsible for a record to serve as an entry point for search services, ensuring the clarity of the electronic record's source and the record's credential competence.
Business metadata records information about the business activity in which the electronic record was created. Since the record is a product of the institution's official business, business metadata is best able to reflect the purpose for which the record was created. Sub-elements of business metadata can include: the function of the institution, e.g., function area information such as party and public work, financial management, and administrative management; the institutional activity (conducted within the range of duties), e.g., information on activities such as budget management, account management, and financial supervision under the function of financial management; and the actual business matter (based on the institutional activity), e.g., information on business matters such as foreign exchange budgeting, budget changes, infrastructure budgeting, and budget execution actually carried out in budget management activities. [31] Business metadata exhibits the "record creation process" with which the new concept of provenance is concerned. Its function is to reflect information about the business activity in which the record was created, making clear the process and purpose of the record's creation so as to facilitate classification by official function. At the same time, business metadata provides a business background reference for appraising the record and a provenance index based on official function. It also greatly enhances the richness and efficiency of searching electronic records and helps ensure their reliability and authenticity.
Environment metadata records the software and hardware information, the geographical position information, and the legal and policy standards information relevant to the creation of the electronic record. Examples include the standards that the encoding format follows, whether the electronic record is subject to a specific genre stipulation (e.g., electronic official record), and whether the person responsible for the electronic record created it under some authorized permission. Environment metadata exhibits regard for the external environment in which the record was created. Its function is to reflect the integrated environmental context of the creation. Among the information recorded, software and hardware environment information ensures the usability of the electronic record and facilitates its later preservation and use; geographical position information establishes that the electronic record was authentically created and helps to retrace its historical appearance; and legal and policy standards information ensures the credential competence and reliability of the electronic record, facilitating regulated management and legitimate reference use afterwards.
Relationship metadata records the hierarchical relationship in which the electronic record is filed, such as hierarchies of the type "fonds-category-file-record" or "group-series-file-record". It also includes relationships among electronic records and relationships with other information, e.g., see-reference, quotation, main text-appendix, new and old versions, and received-reply. Relationship metadata is both a development of the traditional fonds system in the era of electronic records and an expression of respect for the historical connections in the creation of the record. Its function is to reveal the organic connections in the process of the creation of electronic records and to reflect the complex and varied relationships between records as well as between records and information, so that the fonds structure and external connections of each record are clear and visible, thus facilitating the retrieval and use of electronic records.
Time metadata records the creation time of the electronic record and the time of its transfer to the archives. The former is the record's starting point. The latter marks the point at which the electronic record is transferred and preserved as an electronic archive. Time metadata exhibits the vertical relationship in the creation of the electronic record. Its function is to make clear the key nodes in the electronic record's life cycle, facilitating analysis of the course of events during the record's creation and ensuring the electronic record's reliability.
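The five entities described above can be sketched as a small Python data model. This is a minimal illustration under stated assumptions: the class and field names are hypothetical and chosen to follow the prose, they do not reproduce Table 1 or any particular metadata standard, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class InstitutionMetadata:
    """Who created the record (unit, department, responsible person)."""
    full_name: str
    abbreviation: Optional[str] = None
    history: Optional[str] = None
    responsible_person: Optional[str] = None


@dataclass
class BusinessMetadata:
    """Why the record was created: function > activity > business matter."""
    function: str
    activity: str
    matter: str


@dataclass
class EnvironmentMetadata:
    """Technical, geographical and legal setting of creation."""
    software: Optional[str] = None
    hardware: Optional[str] = None
    location: Optional[str] = None
    legal_basis: Optional[str] = None


@dataclass
class RelationshipMetadata:
    """Filing hierarchy plus links to other records."""
    hierarchy: Tuple[str, ...]  # e.g. (fonds, category, file, record)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (link type, target id)


@dataclass
class TimeMetadata:
    """Key nodes: creation and transfer to the archives."""
    created: date
    transferred: Optional[date] = None

    def __post_init__(self) -> None:
        # A record cannot be transferred before it exists.
        if self.transferred is not None and self.transferred < self.created:
            raise ValueError("transfer date precedes creation date")


@dataclass
class ProvenanceMetadata:
    institution: InstitutionMetadata
    business: BusinessMetadata
    environment: EnvironmentMetadata
    relationship: RelationshipMetadata
    time: TimeMetadata

    def function_key(self) -> Tuple[str, str, str]:
        """Key for classifying or indexing the record by official function."""
        b = self.business
        return (b.function, b.activity, b.matter)


# Usage with invented sample values.
record = ProvenanceMetadata(
    institution=InstitutionMetadata(full_name="Finance Department", abbreviation="FD"),
    business=BusinessMetadata("financial management", "budget management", "budget execution"),
    environment=EnvironmentMetadata(software="hypothetical ERMS 1.0", location="Head office"),
    relationship=RelationshipMetadata(
        hierarchy=("fonds-001", "finance", "file-12", "record-3"),
        links=[("main text-appendix", "record-4")],
    ),
    time=TimeMetadata(created=date(2015, 3, 1), transferred=date(2016, 1, 15)),
)
print(record.function_key())
```

Keeping each entity as its own class mirrors the entity and element levels in the text: an entity is a class, its elements are fields, and the function-based key shows how business metadata could support classification by official function.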

Provenance Metadata in Electronic Records Management Metadata Standards
Individual metadata schemes or standards should be designed according to national or international standards, not only for the purpose of interoperability, but also because they usually include specific metadata elements conveying information on provenance.
ISAAR-CPF (International Standard Archival Authority Record for Corporate Bodies, Persons and Families), which the International Council on Archives compiled in 2004, and EAC (CPF) (Encoded Archival Context for Corporate Bodies, Persons and Families) provide overall principles, from the side of semantics and of grammar respectively, for standardizing the archival description of record creators and of the background of record creation. [32] Of the two, ISAAR-CPF stipulates the logical structure of contextual information in four areas that characterize the background information of electronic records, namely the designation area, description area, relationship area and control area. It is concerned with standardizing metadata elements such as name of creator, creation date, function and activities, place, legal authorization, relationships, and mandate. This offers an approach to standardizing the entities and elements of provenance metadata. Apart from these dedicated markup standards for the contextual information of electronic records, the authors have also surveyed the provenance metadata contained in existing international, national and regional electronic records metadata standards. The results can be seen in Tables 2 and 3. Although the standards differ in their levels and objects of application, and their delineation of entities and elements is not identical, they largely include, at least in part, provenance metadata such as institution, business, environment, relationship, and time.
The international level is shown in Table 2. Provenance metadata in the international standard ISO 23081 is largely expressed at the entity level. The standard places more emphasis on the logical classification of elements. Its element set does not go deep, and its practical operability is not high. However, as an international standard it provides a sound logical framework for drawing up later related standards. It is noteworthy that in ISO 23081 time metadata is included in the time elements of the record object and not as an entity of its own. This shows that, compared with other provenance metadata, time metadata has a low degree of independence. This is related to the fact that record movement proceeds in stages, so that time is treated as a factor of each stage. In comparison, the types of provenance metadata in Encoded Archival Description (EAD) are richer. EAD mainly classifies at the element level. It describes record provenance more fully and is more operable. For example, its institution metadata records not only information about the institution itself but also its history. This exhibits respect for the historical picture of the archive's formation and allows records in a collection from the same source to be displayed together through description. Furthermore, the relationship metadata not only describes relationships between records and between records and information; it also includes levels of classification for the records, exhibiting the principle of multilevel description. [33] Thus the logical and historical connections of the archives in a fonds are reflected, giving metadata levels within the finding aid and forming a systematic relationship from the whole to its parts. [34] The United Nations Standards on Recordkeeping Metadata lies between the two. Some of its metadata is quite rich and some quite meager, e.g., the relationship metadata includes information on relationship and hierarchy, while the environment metadata consists only of location information.
At the national and local jurisdiction levels, this paper has selected countries that are more representative in electronic records management: China, Australia, the UK, Canada and the USA. For the USA, since what the Federal Government issues is mostly policy papers in the form of guides (e.g., "Metadata Guidance for the Permanent Transfer of Electronic Records" [38] by the National Archives), the authors have selected a more typical local standard (the Minnesota Recordkeeping Metadata Standard). The provenance metadata in the electronic records metadata standards of the different countries and local jurisdictions shows varying degrees of difference and similarity. First, in terms of the range of provenance metadata types covered, apart from the UK, which does not include business metadata, the other countries and local jurisdictions all include the five main types of provenance metadata. Second, in terms of the choice between the entity and element levels, China and Australia have chosen the entity level, except for time metadata, while the other countries and local jurisdictions work at the element level. Lastly, in terms of the richness of provenance metadata, the Government of Canada Records Management Metadata Standard has the richest content. It includes different sub-elements for the various provenance metadata and delineates them more fully and concretely.
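The coverage comparison at the national and local levels can be expressed as a small Python check. The sketch encodes only what the paragraph above states (the UK standard lacks business metadata, and the other four cover all five types); it does not reproduce Table 3, and the dictionary keys are shorthand labels chosen for this illustration.

```python
PROVENANCE_TYPES = ("institution", "business", "environment", "relationship", "time")

# Coverage as stated in the text; labels are shorthand, not official titles.
COVERAGE = {
    "China": set(PROVENANCE_TYPES),
    "Australia": set(PROVENANCE_TYPES),
    "UK": set(PROVENANCE_TYPES) - {"business"},
    "Canada": set(PROVENANCE_TYPES),
    "USA (Minnesota)": set(PROVENANCE_TYPES),
}


def missing_types(coverage):
    """Map each standard with gaps to the provenance types it lacks."""
    gaps = {}
    for name, covered in coverage.items():
        missing = [t for t in PROVENANCE_TYPES if t not in covered]
        if missing:
            gaps[name] = missing
    return gaps


print(missing_types(COVERAGE))
```

A check of this kind could be rerun whenever a standard is revised, provided the coverage table is updated from the standard's own text.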

Conclusion
In the realm of government information management, the principle of provenance further manifests its core value in the form of metadata and takes on new vitality. As one researcher puts it, "Principles and methods of electronic records management with universal suitability will only come from application of principles and concepts already widely employed and used in records management in traditional environments." [44] In future, it will be necessary to establish a specific provenance metadata scheme or standard for government electronic records, even though existing national and international metadata standards for government electronic records already contain some specific provenance elements. Those general standards may not adequately represent the complexity of the concepts of provenance. [45] It is also important that the provenance of digital records be captured from metadata generated automatically when digital records are created, revised and used [46]. This follows a path first suggested by Bearman, that is, "archivists should find, not make, the information in their descriptive systems" [47]. In addition, developing models of metadata interoperability to govern and represent provenance in a cross-domain environment could be an important topic for future research.
Changle Tang and Xiaojuan Zhang: Associations Between the Principle of Provenance and Metadata: Practical Implementation and Cases