Archival Data
FAQ: Use of Archival Data
Archival data are data that were collected in the past, usually for a purpose other than research. The complete data set must exist prior to the initiation of a research project using the data.
Archival data generally fall under the following categories:
- Publicly available data sets - Various government agencies and academic institutions make the data they collect available to the public for research purposes. Any data set that is made available to the public and does not require special permission to access is considered a publicly available data set.
- Private data sets - Private data sets may include (but are not limited to) data collected previously by another researcher for another study, data collected by another agency for evaluative or research purposes, or your own data that you collected for a previous study. Private data sets generally require permission to access, and the Board will need to know that you will obtain (or have already obtained) proper permission. Unless you already own the rights to the data, having access to data as part of your profession does not mean that you “own” it, and you will need to obtain permission to use it for your research. For example, if you are a principal at a school and have access to student records as part of your job, you cannot access those records for research purposes without first obtaining the proper permission.
- Private records - Private records are data that were not collected with the intent to conduct research, but instead exist for the purpose of collecting information on individuals for the individual’s own sake. For example, student records, medical records, and credit histories are private records that are maintained by agencies other than the individual but contain personal information about the individual. Some of these records are collected by government agencies and, by law, are accessible to the public; these fall under the publicly available data sets category. Otherwise, private records are governed by privacy laws and regulations and therefore require special permission to access, as well as additional safeguards for using the data.
Generally, IRB review is required for projects that are considered research involving human subjects.

Oral history activities are typically designed to record, preserve, and at times interpret specific historical events or the experiences of individuals.
- If the oral history activities, such as open-ended interviews, are done without any intent to draw conclusions or generalize findings, then these activities would NOT constitute “research” as defined in 45 CFR 46.
- However, some oral history activities, such as a systematic investigation involving open-ended interviews, are designed to develop or contribute to generalizable knowledge and WOULD constitute “research” under 45 CFR 46. These activities require review and approval by the IRB before they can begin.
The 2018 Requirements at 45 CFR 46.102(l) provide a definition of “research” and identify scholarly and journalistic activities that focus directly on specific individuals as one of four categories of activities deemed not to be research. The objective of the activities in this category is to provide an accurate and evidence-based portrayal of the specific individual involved, and not to develop generalizable knowledge.
Journalistic activities focused on the collection, verification, reporting, and analysis of information or facts on current events, trends, issues or individuals involved in such events or issues do not meet the definition of research. There is no intent to test hypotheses, and activities cannot reasonably be characterized as systematic investigations.
It is not the particular field that removes the activity from the definition, but rather that the purpose and design of the particular activity is to focus on specific individuals and not to extend the activity’s findings to other individuals or groups. The list of examples of types of scholarly and journalistic activities is not exhaustive; there could be other such activities if they focus directly on specific individuals about whom the information is collected.
Studies using methods such as participant observation and ethnographic studies, in which investigators gather information from individuals in order to understand the beliefs, customs, and practices not only of those individuals but also of the community or group to which they belong, would not meet the category found at 45 CFR 46.102(l)(1). The purpose and design of such studies or activities is to reveal something about the community or group, that is, to develop generalizable knowledge. Because the inquiry is not limited to knowledge about the particular individuals being observed, 45 CFR part 46 would require prior IRB review and approval of these projects.
A human subject is defined as a living individual about whom an investigator (whether professional or student) conducting research:
- Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or
- Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens. 45 CFR 46.102(f) (pre-2018) / 45 CFR 46.102(e)(1) (1/19/2017)
Deceased individuals do not fall within the definition of a human subject, so research that uses only archival data collected from deceased individuals would not require IRB review.
Public use data files are data files prepared by investigators or data suppliers with the intent of making them available for public use. Examples include portions of U.S. Census data and data from the National Center for Education Statistics and the National Center for Health Statistics. The data available to the public are not individually identified or maintained in a readily identifiable form and, therefore, the analysis of such data would not involve human subjects.
The key to qualifying for exemption is demonstrating that the data set has been stripped of identifiers and that the identity of an individual cannot be readily ascertained, either by the person receiving the data or through association with other available information.
IRB review is not required if the data are:
- Publicly available AND
- De-identified so that it is impossible to link a record to a particular individual.
An IRB submission is required if the data are both:
- Not publicly available AND
- Identifiable.
Research using such a data set is considered “human subjects” research.
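As an illustration only, the two rules above can be written as a small screening function. This is a minimal sketch, not an official determination tool: the names `Screening` and `screen_data_set` are hypothetical, and any combination not covered by the two rules is deliberately left for the IRB to decide.

```python
from enum import Enum


class Screening(Enum):
    """Hypothetical labels for the outcomes described in the two rules above."""
    NO_REVIEW = "IRB review not required"
    SUBMISSION_REQUIRED = "IRB submission required"
    ASK_IRB = "not covered by these two rules; consult the IRB"


def screen_data_set(publicly_available: bool, identifiable: bool) -> Screening:
    """Apply only the two rules stated above (illustrative, not authoritative)."""
    # Rule 1: publicly available AND de-identified.
    if publicly_available and not identifiable:
        return Screening.NO_REVIEW
    # Rule 2: not publicly available AND identifiable.
    if not publicly_available and identifiable:
        return Screening.SUBMISSION_REQUIRED
    # Mixed cases are not decided here; the IRB makes the determination.
    return Screening.ASK_IRB


if __name__ == "__main__":
    print(screen_data_set(publicly_available=True, identifiable=False).value)
    print(screen_data_set(publicly_available=False, identifiable=True).value)
```

The sketch assumes that "de-identified" and "identifiable" are simple opposites, which is a simplification of how identifiability is actually assessed.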
Research involving the analysis of publicly available data containing private identifiable information, or the analysis of non-publicly available data that will not be recorded by the investigator in a manner that allows the direct or indirect identification of individuals, qualifies for Exempt review (45 CFR 46.101(b)(4)).
Investigators must submit an exempt application to the IRB before accessing the data.
Investigators submitting protocols involving these research procedures are asked to provide the following information in their protocol submissions to aid the IRB in making a determination of exemption:
- Description of data set and availability;
- Description of data to be accessed for analysis, including what identifiers are present; and
- Copies of data use agreements required by the data holder.
Research that utilizes stored data (retrospective or prospective data, various outcome measures or artifacts, photographs and recordings) or materials (cells, tissues, fluids, and body parts) from individually identifiable living persons for research purposes qualifies as human subject research, and requires IRB review.
When data or materials are stored in a bank or repository for use in future research, the IRB will need to review a protocol detailing the repository’s policies and procedures for obtaining, storing, and sharing its resources, for verifying informed consent provisions, and for protecting subjects’ privacy and maintaining the confidentiality of data. The IRB may then determine the parameters under which the repository may share its data or materials with, or without, IRB review of individual research protocols.
If, in the original consent form, the participants agreed to allow their data to be used in future studies, you will not be required to obtain their consent again. However, the research may not be considered exempt if you continue to use identifiable information.
If you did not obtain consent to use the participants’ data in future studies and the data are identifiable, the Board may ask that you contact the participants again and obtain consent to use their data in the new study.
Secondary research can fall under various levels of IRB review, depending on what is done with identifying information:
- Not human subjects research: the study team can neither view nor record identifying information.
- Exempt: the study team may view identifiers but may not record them.
- Expedited: the study team may both view and record identifying information.
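The three levels above can be summarized as a simple lookup. The following is a rough sketch under stated assumptions: the function name `secondary_review_level` and its parameters are hypothetical, and it assumes that recording identifiers implies the team also views them. The actual review level is always determined by the IRB.

```python
def secondary_review_level(views_identifiers: bool, records_identifiers: bool) -> str:
    """Map handling of identifiers to the likely review level (illustrative only)."""
    # Assumption: a team that records identifiers necessarily views them too.
    if records_identifiers:
        return "expedited"
    # Identifiers are viewed but never recorded.
    if views_identifiers:
        return "exempt"
    # Identifiers are neither viewed nor recorded.
    return "not human subjects"


if __name__ == "__main__":
    print(secondary_review_level(views_identifiers=True, records_identifiers=False))
```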
