PSID FAQ

ALL FAQs

1.	How can I get started analyzing PSID data?

The PSID user guide provides historical context and basic design features of the PSID. There are several tutorials which provide step-by-step instructions on downloading and analyzing the data in a variety of ways.

The Data Center is the most popular means for obtaining PSID data, and it delivers thousands of customized data files to researchers and quantitative social science students each year. The Data Center is fully automated and allows for user-specified subsetting criteria when downloading and merging data. Data can be generated in a variety of formats including ASCII, SAS, SPSS, and Stata.

2.	How do I merge family- and individual-level files?

The Data Center provides automatic and customized merges of files. For the analyst who prefers to write their own programming code to merge data downloaded from our zip packages, sample SAS and SPSS programs have also been prepared to assist users with creating cross-year analysis files.

3.	How can I identify families from year to year?

Each family unit in a specific wave is assigned a unique "Family Interview (ID) Number" valid for that wave only. In addition, each family also has a "1968 Family Identifier", also known as the "1968 ID". This is the Family Interview (ID) Number that was assigned to the original family in the 1968 interviewing wave. When sample members in any family move out and establish their own household, we interview them (these families are called "splitoffs" in the first year they are formed). These new "splitoff" families have the same 1968 ID as the family they moved out of, and keep that same 1968 ID each year. All families with the same 1968 ID contain at least one of the original members from the 1968 family or their lineal descendants born after 1968.

4.	Do family ID numbers vary from year to year?

For each family, the family ID number will almost certainly vary from year to year. Yearly IDs are assigned based on the order in which interviews are received: the first interview in from the field is numbered 1, the second, 2, and so on. This means it's very unlikely that a family with the Family ID Number 1234 in one year will get the same Family ID Number the next year, or any other year.

5.	When is a new Reference Person ('Head' prior to 2017) selected?

A new Reference Person (the term ‘Reference Person’ has replaced ‘Head’ in 2017) is selected if any of the following conditions apply:

last year's Reference Person moved out of the FU (family unit), died, or (in some cases) became incapacitated; or
a female Reference Person has gotten married to a male, or now has a male Partner; or
this is a splitoff family. (Note that this new Reference Person may have been the Reference Person of the family this new family split off from)

6.	How can I tell the current Reference Person and Spouse/Partner from mover-out Reference Person and Spouse/Partner? Why is this important? (the term ‘Reference Person’ has replaced ‘Head’ in 2017)

To tell the current Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’) and Spouse/Partner from a mover-out Reference Person and Spouse/Partner, use the Sequence Number (SN) from the individual file. The current Reference Person will always have SN=1, and the current Spouse/Partner (if there is one) will always have SN=2. A mover-out Reference Person or Spouse/Partner will have an SN in the range 50-89, depending on the move-out circumstances. The SN allows you to identify an individual's status with regard to the family unit and determine family composition change. It's important to understand family composition change to avoid spurious correlations in a longitudinal analysis where you are looking at variables pertinent to the same person(s) over time.

7.	How do I assemble a Reference Person/Spouse file from an individual file? (the term ‘Reference Person’ has replaced ‘Head’ in 2017)

The easiest way to do this is by visiting the PSID Data Center which will create a customized dataset for you automatically.

Instructions for creating a Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’)/Spouse file from an individual file by writing your own programming code:

To create a single year Reference Person (‘Head’ prior to 2017)/Spouse file: Select individuals with Relationship to Reference Person of "Reference Person" (a code value of 1 for 1968-1982; code 10 from 1983 onward) and with the Sequence Number=1. The reason for using the Sequence Number variable is that non-response movers out have relationships to the PREVIOUS YEAR's Reference Person, so two individuals within one family may have relationships of Reference Person. One, however, is the real, current Reference Person; the other is a mover out. (The type of mover-out can be determined from the value for Sequence Number. Refer to the individual file codebook for details.) To illustrate the importance of Sequence Number, assume that in the last wave we have an elderly married couple. He is the Reference Person (‘Head’ prior to 2017) and she is the Spouse--Sequence Number=1 and Relationship to Reference Person=10 for him, Sequence Number=2 and Relationship to Reference Person=20 for her. When we find them for the new interview, he has died and she has become the new Reference Person--his Sequence Number=81 and Relationship to Reference Person=10, her Sequence Number=1 and Relationship to Reference Person=10. All the family data items about Reference Person in the current wave refer to HER, not to him. Information about his income, etc. is located in OFUM (other family unit members) variables only. Similarly, to subset Spouses or Partners in a current wave--select Relationship to Reference Person=20 or 22 and Sequence Number = 2.

To create a cross-year Reference Person (‘Head’ prior to 2017)/Spouse file: These concepts can be expanded to subset persons who have been the Reference Person over a period of years--the yearly values for Sequence Number must be 1, and 1 or 10 for Relationship to Reference Person. As a corollary, to select individuals who have been either Reference Persons or Spouses/Partners, yearly Sequence Numbers must equal 1 or 2 and yearly Relationships to Reference Person must be in the range 1, 2, 10, 20, or 22. Once that subset is made and family data are merged, information about an individual can be found in Reference Person variables (Reference Person's work hours, Reference Person's labor income, etc.) when his or her Relationship to Reference Person=1 or 10. When Relationship to Reference Person is 2, 20, or 22, then his or her information is found in variables about the Spouse/Partner.
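The single-year selection rules above can be sketched in Python. This is a minimal illustration, not PSID-supplied code: the record layout (dicts with the keys "sn" and "rel") and the function names are hypothetical stand-ins for the Sequence Number and Relationship to Reference Person variables of one wave. The real variable names differ by year and should be taken from the individual file codebook.

```python
# Hypothetical record layout: one dict per individual for a single wave.
#   "sn"  -> Sequence Number for that wave
#   "rel" -> Relationship to Reference Person for that wave

def reference_person_code(year):
    """Relationship code for the Reference Person: 1 for 1968-1982, 10 from 1983 onward."""
    return 1 if year <= 1982 else 10

def is_current_reference_person(person, year):
    # Both conditions are needed: a mover-out can still carry relationship code 10.
    return person["sn"] == 1 and person["rel"] == reference_person_code(year)

def is_current_spouse_or_partner(person):
    # Codes 20 and 22 with Sequence Number 2, as described for a current wave.
    return person["sn"] == 2 and person["rel"] in (20, 22)

def is_mover_out(person):
    # Sequence Numbers in the range 50-89 mark movers-out.
    return 50 <= person["sn"] <= 89

# The elderly-couple example: he has died, she is the new Reference Person.
family = [
    {"name": "husband", "sn": 81, "rel": 10},
    {"name": "wife", "sn": 1, "rel": 10},
]
current = [p["name"] for p in family if is_current_reference_person(p, 2019)]
print(current)  # ['wife']
```

Filtering on the relationship code alone would return both records here, which is why the Sequence Number check matters.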

8.	How can I identify splitoffs from the main family?

Select only the current Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’) (Sequence Number=1 and Relationship to Head/Reference Person=10) from the individual file for the wave in question. Then, if the Reference Person's moved in/out indicator=1 and month moved in/out=0, it's a splitoff. Otherwise it's a main family.
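As a rough sketch, the rule above translates into a two-condition check. The keys "moved_indicator" and "moved_month" are hypothetical names for the moved in/out indicator and the month moved in/out. The input is assumed to be the record of the current Reference Person, already selected as described.

```python
def family_type(ref_person):
    """Classify a family from its current Reference Person's record.

    Keys are hypothetical stand-ins for the wave-specific variables:
      "moved_indicator" -> moved in/out indicator
      "moved_month"     -> month moved in/out
    """
    if ref_person["moved_indicator"] == 1 and ref_person["moved_month"] == 0:
        return "splitoff"
    return "main"

print(family_type({"moved_indicator": 1, "moved_month": 0}))  # splitoff
print(family_type({"moved_indicator": 0, "moved_month": 0}))  # main
```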

9.	How is an individual uniquely identified?

The combination of the 1968 ID and the person number uniquely identify each individual.

To identify an individual across waves use the 1968 ID and Person Number (Summary Variables ER30001 and ER30002). Though you can combine them uniquely in many ways we find that many researchers use the following method:

(ER30001 * 1000) + ER30002

(1968 ID multiplied by 1000) plus Person Number
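For illustration, the same calculation in Python, together with its inverse. The function names are our own, and the sketch assumes Person Numbers stay below 1000, which is what makes the multiplier of 1000 produce a unique value.

```python
def person_id(er30001, er30002):
    """Combine the 1968 ID (ER30001) and Person Number (ER30002) into one identifier."""
    return er30001 * 1000 + er30002

def split_person_id(pid):
    """Recover (1968 ID, Person Number) from the combined identifier."""
    return divmod(pid, 1000)

pid = person_id(1234, 30)
print(pid)                   # 1234030
print(split_person_id(pid))  # (1234, 30)
```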

10.	Why is using the latest version of the Cross Year Individual File essential?

The Cross Year Individual File has records for every person who has ever lived in a PSID study family (including some who moved out just before the initial interviewing year for each part of the sample, or were institutional in the first interviewing year; see the codebook for ER30002). Each study individual has a record for each study year; years when that individual was not listed in a study family are zero-filled. The file is organized by ID68, PN, and year.

In addition to those variables, the file contains individual-level information such as YEARID, SN, Relationship to Head/Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’), AGE, SEX, birthdates, move-in and move-out dates, follow status, type of individual, why non-response, several health insurance variables, variables indicating eligibility for supplementary studies such as CDS and DUST, and individual level longitudinal and cross-sectional weights.

It is essential to use the latest version of the file for analysis. The entire file is regenerated for every wave’s release not only to add the latest wave’s info, but to make corrections based on information received in the latest wave’s interviewing. These corrections are often to information such as birthdate or Relationship to Head/Reference Person, and may affect several waves of data.

Most importantly, however, we also make corrections to ID68 and PNs, based on new information. Sometimes we discover that a person we believed to be a separate individual is actually the same as a family member we already know about, with a different person number. If the two individuals were response in different waves, we can combine the information into one record, keeping the record for one PN and deleting the record for the other. In other cases, we might learn that someone we thought was the biological child of a sample member actually isn’t. This would necessitate a change in PN from the "born-in sample" range (030-169) to the 170-and-up range, and also a change in follow status, since on the new information the child would no longer be sample. In another instance, we found that a woman in the immigrant sample with PN 170 was actually a spouse who had moved out prior to the first year of immigrant interviewing (1997 for this case), so should have had PN 227 (a PN with special meaning, see the codebook for ER30002).

Currently, we uncover about 100 of these person-number fixes each wave. So failure to use the latest cross-year individual file may result in errors in both data and analysis.

11.	How can I determine if data will be collected about an individual who is not present in an interviewed family?

It depends on the situation.

For persons who are mover out deceased, some OFUM (other family unit member) information is collected for the wave they are reported to have died.
For persons moving out to an institution, some OFUM information is collected during the wave they are reported to have moved out from the family.
For persons moving out to another household where no interview is conducted with the new family unit, some OFUM information is collected for the wave they are reported to have moved out.
For persons already in institutions, no new information is collected.
For persons who attrited from the study, no new information is collected. However, a large recontact effort was initiated in 1992.
For persons not yet born or not yet appearing in the study, no information is collected that wave.

Beginning with the 1999 wave, when the PSID switched from annual to biennial interviews, the rules above for movers out apply only if the person moved out in the calendar year before the interview. For example, in the 2007 data, there will be information for movers out on or after 1/1/2006, but no information for movers out before that date.

12.	How can I determine which variables are comparable across years?

The cross-year index can help you identify comparable variables across the years.

You can also look at the ‘Years Available’ section of each variable’s codebook entry for a year-by-year listing of when that variable is available in the data center.

13.	How can I tell if a variable value is actual or imputed?

A missing data value is either identified as such (value=9) or an imputed value is assigned in lieu of a missing data code. If an imputed value is assigned, an associated "accuracy code" variable describes the nature of the assignment.

14.	How can I identify the SEO (Survey of Economic Opportunity) sample and the SRC (Survey Research Center) sample?

You will need to look at the 1968 family interview number available in the individual-level files (variable ER30001).

SRC sample families have values less than 3000.
SEO sample families have values greater than 5000 and less than 7000.

Immigrant sample families have values greater than 3000 and less than 5000. (Values from 3001 to 3441 indicate that the original family was first interviewed in 1997; values from 3442 to 3511 indicate the original family was first interviewed in 1999; values from 4001-4851 indicate the original family was first interviewed in 2017; values from 4700-4851 indicate the original family was first interviewed in 2019.)

Latino sample families have values greater than 7000 and less than 9309. (Values from 7001 to 9043 indicate the original family was first interviewed in 1990; values from 9044 to 9308 indicate the original family was first interviewed in 1992.)
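The ranges above can be turned into a small lookup. This is only a sketch: the function name and labels are ours, IDs are assumed to be positive integers, and values that fall exactly on a boundary not covered by the text (3000, 5000, 7000) are returned as "Unknown" rather than guessed.

```python
def sample_type(er30001):
    """Classify a 1968 family interview number (ER30001) into its sample."""
    if 0 < er30001 < 3000:
        return "SRC"
    if 3000 < er30001 < 5000:
        return "Immigrant"
    if 5000 < er30001 < 7000:
        return "SEO"
    if 7000 < er30001 < 9309:
        return "Latino"
    return "Unknown"  # boundary or out-of-range value; check the codebook

for value in (1500, 3100, 5500, 9044):
    print(value, sample_type(value))
# 1500 SRC
# 3100 Immigrant
# 5500 SEO
# 9044 Latino
```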

15.	For what years are Latino data available? How do Latino data differ from immigrant data?

In 1990 the PSID added 2,000 Latino households consisting of families originally from Mexico, Puerto Rico, and Cuba. But while this sample did represent three major groups of immigrants, it missed out on the full range of post-1968 immigrants, Asians in particular. Because of this crucial shortcoming, and a lack of sufficient funding, the Latino sample was dropped after 1995, and a sample of 441 post-1968 immigrant families was added in 1997. In 1999, an additional 70 families were added, for a total of 511 immigrant families as of 1999. These families are included on the files along with the core PSID families.

16.	Where can I find information on the 1997-1999 and 2017-2019 Immigrant Samples?

Information on the Immigrant Sample is available in the 1997 and 1999 main interview documentation.

17.	Codebook information for some variables does not show index information. Why?

Variables from supplemental files, with the exception of CDS and TAS, are not yet in the index. We plan to add these variables to the index in the future.

18.	Why do some supplemental files have fewer observations than the main family files?

Some supplemental files were created only for sub-samples. For example, the Disability and Use of Time Supplement (DUST) only collects information on Heads ('Reference Person' starting in the 2017 wave) and Spouse/Partners of a certain age.

19.	What identification variables should I download?

The Data Center automatically includes all appropriate identification variables for your file.

20.	How does one analyze data from families from one wave to the next?

Users often want to look at data from the "same" family in adjacent waves. It is important to understand that there is no absolute definition of "same" family. Families are made up of individuals who may move in or out of study families from wave to wave. It is up to the user to decide what he or she means by "same" family. The user may want to restrict this definition to option 1) absolutely no changes in the composition of the family since the previous wave. All the individuals that were in the prior wave are still in the current wave: no one has moved in and no one has moved out. Alternatively, the user may want to define "same" family as option 2) those who have the same Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’) in both waves.

In order to subset those cases which the user has defined as "same" family, he or she will find the Family Composition Change variable most useful. The Family Composition Change variable indicates the degree of change in this family since the prior wave's data collection. For option 1, the user would subset the families in the current wave where Family Composition Change variable = 0. For option 2, the user would subset the families in the current wave where Family Composition Change variable in (0,1, 2).

For 2019, for example, the Family Composition Change variable is ER72007.
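A minimal sketch of the two subsetting options, using the 2019 variable ER72007. The list-of-dicts layout, the "family_id" key, the option labels, and the toy values are assumptions for illustration only.

```python
# Family Composition Change codes accepted under each definition of "same" family.
ALLOWED_CODES = {
    "no_change": {0},                    # option 1
    "same_reference_person": {0, 1, 2},  # option 2
}

def subset_same_family(families, definition):
    """Keep families whose Family Composition Change code fits the chosen definition."""
    allowed = ALLOWED_CODES[definition]
    return [fam for fam in families if fam["ER72007"] in allowed]

# Toy 2019 family records; the values are made up for the example.
families_2019 = [
    {"family_id": 1, "ER72007": 0},
    {"family_id": 2, "ER72007": 2},
    {"family_id": 3, "ER72007": 5},
]
option1 = [f["family_id"] for f in subset_same_family(families_2019, "no_change")]
option2 = [f["family_id"] for f in subset_same_family(families_2019, "same_reference_person")]
print(option1)  # [1]
print(option2)  # [1, 2]
```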

21.	What is the CDS and how does it relate to the PSID?

The Child Development Supplement (CDS) is one research component of the Panel Study of Income Dynamics (PSID).

While the PSID has always collected some information about children, in 1997, PSID supplemented its main data collection with additional information on 0-12 year-old children and their parents. The objective was to provide researchers with a comprehensive, nationally representative, and longitudinal data base of children and their families with which to study the dynamic process of early human capital formation. The Original CDS was collected in three waves: CDS-I in 1997, CDS-II in 2002/2003, and CDS-III in 2007/2008.

In 2014 the CDS methodology was changed to a steady state design, collecting information on all sample children aged 0-17. For more information on the Ongoing CDS, please see the CDS-2014 User Guide, the CDS-2019 User Guide, the CDS-2020 User Guide, and the CDS-2021 User Guide.

By nature of the CDS being a supplement to the PSID, the study takes advantage of an extensive amount of family demographic and economic data about the CDS target child's family, providing more extensive family data than any other nationally-representative longitudinal survey of children and youth in the U.S. In addition, the PSID-CDS data are "intergenerational" in structure with information contained in several decades of data about multiple family members. This rich data structure allows analysts a unique opportunity to fully link information on children, their parents, their grandparents, and other relatives to take advantage of the rich intergenerational and long-panel dimensions of the data.

22.	What information does the CDS collect about its sample children?

Within the context of family, neighborhood, and school environments, CDS studies a broad array of developmental outcomes including (but not limited to) physical health, emotional well-being, intellectual achievement, and social relationships with family and peers. These outcomes are measured through reliable, age-graded assessments of cognitive and behavioral development and health status indicators obtained from the primary caregiver and the sample children/youth themselves; anthropometric measures of height and weight of the sample children/youth; a comprehensive accounting of parental (or caregiver) time inputs to children/youth as well as other aspects of the way the children/youth spent their time; and other-than-time-use measures of other resources, for example, the learning environment in the home (using the HOME Scale measures), school resources as reported through the National Center for Education Statistics Common Core of Data, and decennial-census-based measurement of neighborhood resources. The multi-level, interdisciplinary, and longitudinal nature of the research design facilitates analysis of the relationships between these developmental measures and changes in family structure and living arrangements, neighborhood economic and social conditions, and school resources and programs.

23.	Who funds the CDS?

The Original CDS (1997-2007) was made possible by the generous funding of the National Institute of Child Health and Human Development, the National Science Foundation, and the Economic Research Service of U.S. Department of Agriculture.

The William T. Grant Foundation, the Annie E. Casey Foundation, and the U.S. Department of Education provided additional funding for CDS-I.

The Ongoing CDS (2014 and beyond) is made possible by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the Economic Research Service of the US Department of Agriculture, MARS-Waltham, and the Center on Philanthropy at Indiana University.

24.	Where can I obtain copies of the questionnaires and other study documentation?

Questionnaires and supporting documentation for the PSID and its supplemental studies are located on the questionnaires and supporting documents page.

25.	Do I need to use the sample weights with CDS and TAS data?

The Original CDS-TAS sample was drawn from PSID families with children 0-12 years in 1997, and the Ongoing CDS sample was drawn from PSID families with children 0-17 years old in 2014, 2019, 2020, and 2021. The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples. Both the CDS-TAS and PSID samples are probability samples (i.e., samples for which every element in the population has a known nonzero chance of selection). Their combination is also a probability sample. The combination, however, is a sample with unequal selection probabilities, and as a result, compensatory weighting is needed in estimation, at least for descriptive statistics. Weight adjustments are also needed to attempt to compensate for differential nonresponse across waves. Weights supplied on CDS and TAS data files are designed to compensate for both unequal selection probabilities and differential attrition.

In the 2002, 2007, 2014, 2019 and 2021 CDS demographic files, you will find a set of indicator variables for each module that specify (a) if a case was eligible for that module and (b) if a record exists for that case in the corresponding data file. These variables are helpful to merge onto your Data Center data request if you are merging variables from multiple CDS modules. The sample weight in the Demographic file is adjusted only for the non-response in the main module, Primary Caregiver (sections A-H, J). The module indicator variables, however, will inform you about item missing data across modules. It is up to you to then decide on your preferred approach for addressing item missing data that results from differential response rates across modules (for example, you may leave it as missing, impute scores, etc). The TAS data files contain wave-specific sample weights.

More documentation on the CDS and TAS weights can be found on the documentation page.

26.	How do I find information about the CDS Target Child's demographic background?

Every individual in the PSID - including the children - has both an "ID68" (1968 Family Identifier - ER30001) and "PN" (Person Number - ER30002) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers to find information about the CDS targeted child and caregivers in the PSID data files. Background information about the CDS target child, such as birth date, sex, and relationship to the PSID family household Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’) can be obtained from the PSID individual and sampling variables files. Use the ER30001 and ER30002 combination to select the PSID variables for just the CDS target child sample, or, when you get to the "Output Options" page in the Data Center, after selecting the variables you want, select "CDS Children" at the bottom.

27.	Are there additional data files from the PSID that would be useful to me as a CDS data user?

There are two PSID family history files that may be of particular interest to CDS users: the Childbirth and Adoption History File and the Parent Identification File.

Childbirth and Adoption History File: The Childbirth and Adoption History File is specifically designed to facilitate access to detailed information collected since 1985 regarding histories of childbirth and adoption. Variables on this file include the identifiers for each parent and child, month and year of birth for both parent and child, birth order, birth weight and date of death for a child, year of most recent report and number of births/adoptions, etc. Data on this file are structured in a one-record-per-event format, with each record representing a specific childbirth or adoption event.

Parent Identification File: The Parent Identification File synopsizes information collected from various sources since the 1983 wave of PSID about parent-child relationships. This file consists of identifier variables that link children with their parents. The file is intended to be used to facilitate linking children's and parents' data records from the Individual File. Linkages can be done from either the child's or a parent's standpoint.

28.	How do I obtain information collected in the main PSID about the CDS target child's caregivers?

There are a large number of variables in the PSID that can be used along with CDS.

Demographic, health, economic, and other family data about PCG (primary caregivers) and OCG (other caregivers) can be found in the PSID data files. Every individual in the PSID has both "ID68" (1968 Family Identifier - ER30001) and "PN" (Person Number - ER30002) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers to find information about the CDS targeted child and caregivers in the PSID data files. These identifier variables are available through a Child to Caregiver Map, provided with each Data Center download.

29.	How do I find the identification numbers of the CDS target child's caregivers?

The child to caregiver map, provided with each CDS data download, provides "1968 INTERVIEW NUMBER" (ID68) and "PERSON NUMBER 68" (PN) for CDS individuals. These CDS individuals are the target child, the target child's primary caregiver (PCG) in both the Original and Ongoing CDS, as well as the target child's other caregiver (OCG) in the Original CDS, if one exists. Missing data means that the child did not have an OCG for the CDS interview year.

All CDS files, by default, contain variables ER30001 (1968 INTERVIEW NUMBER) and ER30002 (PERSON NUMBER 68). Since these variables are also in the map file, the map file can be used to merge PCG and OCG data from PSID Individual data to CDS Child level data in a two-step process.

30.	How do I identify siblings in the CDS data files?

There are two steps to locating data for siblings in the CDS data files:

In the Demographic Data File, there is a sibling indicator variable that tells you if a CDS target child had a sibling who also participated in the CDS data collection.

Automatically appended to your data download is the "Family Interview Identification Number" for the corresponding PSID main interview. This variable uniquely identifies the family.

Using these two variables, you can locate data on a wide range of information about the target children and their siblings in the CDS.

See also the codebook explanation text for the family identification number. There is a variable for any year in the PSID in both the individual and family files.
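One way to combine the two variables is to group target children by family identifier, keeping only those flagged as having a participating sibling. The sketch below is an assumption-laden illustration: the keys "child_id", "family_id", and "has_cds_sibling" are hypothetical names, and the sibling indicator is assumed to have been recoded to True/False. Check the codebook for the actual variable names and code values.

```python
from collections import defaultdict

def sibling_groups(children):
    """Group CDS target children who have a participating sibling by family ID.

    Each child is a dict with hypothetical keys:
      "child_id"        -> any unique child identifier
      "family_id"       -> Family Interview Identification Number for the wave
      "has_cds_sibling" -> sibling indicator, recoded to True/False
    """
    groups = defaultdict(list)
    for child in children:
        if child["has_cds_sibling"]:
            groups[child["family_id"]].append(child["child_id"])
    return dict(groups)

children = [
    {"child_id": "A", "family_id": 101, "has_cds_sibling": True},
    {"child_id": "B", "family_id": 101, "has_cds_sibling": True},
    {"child_id": "C", "family_id": 202, "has_cds_sibling": False},
]
groups = sibling_groups(children)
print(groups)  # {101: ['A', 'B']}
```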

31.	How was height and weight measured in the CDS?

Original CDS: In CDS-I, height of the child was measured by the interviewer and weight was reported by the parent. In CDS-II and CDS-III, both height and weight were measured by the interviewer.

Ongoing CDS: In CDS-2014, CDS-2019, CDS-2020 and CDS-2021 height and weight were measured by the interviewer for families that participated in the in-home module. For children not included in the in-home module, height and weight were reported by the parent.

32.	What is the Behavior Problem Index (BPI) and how is it scored?

The Behavior Problem Index was originally developed by James Peterson and Nicholas Zill from the Achenbach Behavior Problems Checklist to measure in a survey setting the incidence and severity of child behavior problems. The BPI scale is based on responses by the primary caregiver as to whether a set of 32 problem behaviors is often, sometimes, or never true of the targeted child.

These items are then divided into two subscales: 1) a measure of externalizing or aggressive behavior and 2) a measure of internalizing, withdrawn or sad behavior. The User Guide specifies the individual items that map into the internalizing and externalizing subscales.

We performed a confirmatory factor analysis on our two expected subscales. The results showed that the items grouped into these two factors quite readily, with one variable overlapping on both subscales, as it did in CDS-I, and two variables not loading at all. We constructed an overall or total BPI score, using all 32 items, as well as separate scores for each of the two subscales, internal or withdrawn and external or aggressive. Before scoring, the individual items are recoded such that a score of "1" becomes "0" and a score of "2" or "3" becomes "1". Scores for the total BPI and Externalizing and Internalizing are sum scores. Higher scores on these measures imply a greater level of behavior problems. Cases were included if they had approximately 75% valid data on the variables contributing to the BPI Indices.
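The recode-then-sum step can be sketched as follows. This is an illustration under stated assumptions, not the scoring program used for the released scores: any value other than 1, 2, or 3 is treated as missing, the "approximately 75%" rule is implemented as a strict 0.75 cutoff, and only valid items are summed. The function names are ours.

```python
def recode_item(value):
    """Recode a raw item: 1 -> 0, 2 or 3 -> 1, anything else -> None (assumed missing)."""
    if value == 1:
        return 0
    if value in (2, 3):
        return 1
    return None

def bpi_sum_score(items, min_valid_share=0.75):
    """Sum of recoded items, or None when too few items are valid.

    The 0.75 cutoff and the treatment of missing items are assumptions.
    """
    recoded = [recode_item(v) for v in items]
    valid = [r for r in recoded if r is not None]
    if not items or len(valid) / len(items) < min_valid_share:
        return None
    return sum(valid)

print(bpi_sum_score([1, 2, 3, 1]))        # 2
print(bpi_sum_score([1, None, None, 2]))  # None
```

The same function applies to the total score or to either subscale, depending on which items are passed in.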

In CDS-2020, PSID transitioned from using the BPI to the Strengths and Difficulties Questionnaire (SDQ) for assessing children’s personality and behavior. Please see the FAQ 19 (of CDS only FAQs) or 39 (of all PSID FAQs) for more information on the SDQ.

33.	What is the HOME-SF and how is it scored?

The Home Observation for Measurement of the Environment-Short Form from the Caldwell and Bradley HOME Inventory is used as a measure of cognitive stimulation and emotional support that parents provide to their children. The particular items used in the PSID Child Development Supplement were taken directly from the National Longitudinal Survey of Youth, Mother-Child Supplement so that the scales would be as similar as possible. The HOME-SF items include both parent/caregiver-reported items and interviewer observations of the home and neighborhood environment. The HOME-SF is divided into four parts:

Infant/Toddler (IT) HOME, designed for use during infancy (birth to age three);
Early Childhood (EC) HOME, designed for use between 3 and 6 years of age;
Middle Childhood (MC) HOME, for use between 6 and 10 years; and
Early Adolescent (EA) HOME, designed for use from 10 to 15 years old.

We have included three scores for HOME-SF for each age module appropriate for CDS-II and CDS-III data: 1) a total raw score, 2) an emotional support subscale raw score, and 3) a cognitive stimulation subscale raw score. The total and subscale raw scores for the HOME-SF are a summation of the recoded individual item scores and vary by age group, as the number of individual items varies according to the age of the targeted child / youth.

Additional information about the HOME-SF in the CDS can be found in the CDS User Guide.

34.	What is the Woodcock-Johnson Revised Test of Achievement and how is it scored?

The Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R) provides a normed set of tests for measuring cognitive abilities and academic achievement. In the Original CDS, CDS-I-III, we selected three subtests as a measure of reading and math achievement: the Letter-Word, the Passage Comprehension, and the Applied Problems tests (the Calculation test was additionally administered in CDS-I). These scales can be used individually, or in the case of the four subscales, combined to create scores for Broad Reading and Broad Math. When applicable, the Spanish version of the WJ-R (Batería-R, Form A) was used for children whose primary language was Spanish.

The Woodcock-Johnson Revised (WJ-R) tests of achievement have standardized administrative and scoring protocols. The tests are designed to provide a normative score that shows the CDS target child's reading and math abilities in comparison to the national average for the child's age. The normed scores are constructed based on the child's raw score on the test (essentially the number of correct items completed) and the child's age to the nearest month. Raw scores are charted on normative tables based on the child's age and what percentile the child falls into. More information on scoring is provided in the CDS User Guides.

35.	Why isn't there a Broad Math Score for CDS-II?

In CDS I, we included two Woodcock Johnson - Revised math-skill tests: Calculations and Applied Problems. A broad math score was constructed based on these two tests. In CDS II, we only included the Applied Problems; hence, no broad math score can be constructed - just a score for applied problems.

36.	How do I know if PSID or CDS data files have been updated?

File release information is available through the News section of our website. You can also sign up to have the news delivered to your email by logging in and selecting to receive updates on the "Settings" page.

37.	Why won't the Data Center let me create a file merging CDS Time Diary data files with other data?

Only one file is allowed in your cart if CDS Time Diary is selected. To add CDS Time Diary variables to your cart, you must select variables from just one file, and there cannot be any variables from other files in your cart. Time diary data are not at an individual or family level (like other data in the data center), so the data center cannot merge them automatically.

38.	How do I open my data files from my Data Center download?

To open the .txt files in Stata (SAS, or SPSS), save both the .txt and .do (.sas or .spss) file from your download to your machine and take note of the path where the .txt file is located.

Once you have saved the .txt file to your computer, open the read-in statements (.do, .sas, or .spss) in your statistical program. Replace the “[path]” section of the read-in statements with the path where you saved your data text file, for example, C:/yourcomputer/yourfiles/. Once you have set the path (the location of the .txt file), run the read-in statements; the program will read in the data and label your variables accordingly. You can then save the resulting data set as a data file.
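The path substitution described above can also be scripted. The following Python sketch is only an illustration: the function name fill_in_path and the one-line sample statement are hypothetical, and it assumes the placeholder appears literally as "[path]" immediately before the data file name. Actual read-in files from the Data Center may be laid out differently.

```python
def fill_in_path(read_in_text: str, data_dir: str, placeholder: str = "[path]") -> str:
    """Return the read-in statements with the placeholder replaced by data_dir."""
    if placeholder not in read_in_text:
        raise ValueError(f"placeholder {placeholder!r} not found in read-in statements")
    # Keep exactly one trailing slash so the file name that follows joins cleanly.
    directory = data_dir.rstrip("/\\") + "/"
    return read_in_text.replace(placeholder, directory)


# Hypothetical one-line excerpt; a real read-in file is much longer.
sample = 'FILE = "[path]MYDATA.txt"'
print(fill_in_path(sample, "C:/yourcomputer/yourfiles/"))
```

After the substitution, the edited statements can be saved and run in the statistical program as usual.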

For step-by-step instructions, please see the web tutorial Accessing and Downloading PSID Data.

39.	What is the Strengths and Difficulties Questionnaire (SDQ) and how is it scored?

The Strengths and Difficulties Questionnaire (SDQ) was originally developed by Robert N. Goodman. It consists of 25 items that are used for assessing children’s personality and behavior. As of CDS-2020, PSID has transitioned from using the Behavior Problems Index (BPI) to the SDQ. Please see FAQ 12 (of CDS only FAQs) or 32 (of all PSID FAQs) for more information on the BPI.

The SDQ is based on responses by the primary caregiver (PCG), for all children aged 3-18 years, on whether each listed behavior is not true, somewhat true, or certainly true of the child over the last 6 months.

These items are divided into 5 subscales of 5 items each that assess: 1) prosocial behavior; 2) hyperactivity/inattention; 3) emotional problems; 4) conduct problems; and 5) peer relationship problems. The prosocial subscale is available for all cases with a valid response to each of the five items in the scale. The subscales for hyperactivity/inattention, emotional problems, conduct problems, and peer relationship problems are the rounded mean of non-missing responses and are only calculated when at least 3 of the 5 component items have a valid response.
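The missing-data rule for the subscales can be sketched in Python as follows. This is an illustration only, not the PSID scoring program: the function name is hypothetical, the item coding (0 = not true, 1 = somewhat true, 2 = certainly true) is an assumption, and so is the choice to round halves upward, since the rounding rule is not stated here. Consult the CDS User Guides for the actual construction of the released scores.

```python
import math
from typing import Optional, Sequence


def sdq_subscale(items: Sequence[Optional[int]], min_valid: int = 3) -> Optional[int]:
    """Rounded mean of the non-missing item responses, or None if too few are valid.

    items: the five item responses for one subscale; None marks a missing response.
    min_valid: minimum number of valid items required (3 of 5 for the four
               difficulty subscales; pass 5 to require every item).
    """
    valid = [x for x in items if x is not None]
    if len(valid) < min_valid:
        return None
    mean = sum(valid) / len(valid)
    # Assumption: halves round upward (Python's built-in round() rounds to even).
    return math.floor(mean + 0.5)


print(sdq_subscale([1, 2, 2, 2, None]))        # four valid items, mean 1.75
print(sdq_subscale([2, 2, None, None, None]))  # only two valid items
```

The first call has four valid responses, so a score is computed; the second has only two, so no score is returned.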

Additional information about the SDQ can be found in the CDS User Guides.

40.	How does the geographic information in the public release files differ from the restricted files?

The public release files, which can be downloaded directly from the PSID website, contain geographic information of a more generalized nature such as region and state of residence. A collapsed version of the Beale rural-urban code is available for some years in the data center, as well as in the supplemental files located here:

http://simba.isr.umich.edu/Zips/AuxiliaryFiles.aspx?pane=TURBC

These data will meet the needs of most users. Users in need of more specialized geographic information may want to request use of the restricted PSID Geocode Match files. These files include the identification codes necessary to link data from the PSID annual family files to Census data. This linkage allows the addition of information regarding the characteristics of the geographic area in which individuals and families lived (e.g., the neighborhood and/or the labor market area) to the PSID individual- or family-level data. This should in turn allow investigation of the effects of non-family "context" variables on family and individual outcomes.

In the past, we provided selected variables from the Census in aggregated forms (i.e., Census Extract Files); however, we no longer support these files. In recent years, there has been a rapid growth of external sources that provide an increasing variety of measures of the neighborhood environment.

41.	Who may obtain restricted data?

Individuals who conduct scientific research, hold a full-time, permanent, doctoral-level faculty appointment, and obtain approval of their research and data protection plan through a human subjects institutional review board may request use of restricted data.

42.	What application materials need to be submitted to obtain restricted data?

The following materials must be submitted:

1. Curriculum vitae
2. Research plan
3. Institutional Review Board (IRB) approval
4. Data request form
5. MiCDA acceptable use policy (AUP)
6. VDI-Data security plan
7. Institute for Social Research Confidentiality pledge

43.	How long does the process take to obtain restricted data?

Once all application materials have been submitted, review by the PSID restricted data committee usually occurs within two weeks. Once the application is approved, the contracts must be signed and submitted along with the non-refundable administrative fee. After these steps have been completed, the data are provided via secure remote access through a Virtual Data Enclave. The average processing time is between one and two months, and on rare occasions as long as six months. The vast majority of delays occur as a result of contract language change requests by the requesting institution.

44.	May more than one type of restricted data set be requested?

Yes, more than one type of restricted data may be requested; however, the researcher must document in their research plan the need and purpose for use of more than one type of data set.

45.	May the restricted data be used for more than one research project?

No. Each contract is project specific and is validated through the research plan.

46.	May more than one person use the restricted data for their own specific project?

Multiple investigators may use the restricted data if they are all involved in the same research project and named on the research plan and on the IRB. All investigators must provide a CV, an AUP, DSP and Pledge of Confidentiality and sign the restricted data contract.

47.	Can other researchers who are not members of my institution work on my restricted use research project?

Yes. A PI may include investigators from other institutions who are collaborating on the research project by describing their roles and qualifications in the research plan, and naming them in the IRB. All investigators must provide a CV, an AUP, DSP and Pledge of Confidentiality and sign the restricted data contract.

48.	I am a graduate student. May I obtain the PSID restricted data files?

Graduate students must work with the restricted data under the supervision of a full-time, permanent, doctoral-level faculty member at their institution. The faculty advisor is named as the Investigator on the contract and is responsible for ensuring that all confidentiality and security measures are upheld. Should a faculty advisor leave the receiving institution, they are responsible for notifying the PSID of this change. Graduate students must then obtain a new faculty advisor who will be named on the contract until their research project is completed.

49.	My research project is quite complex and may take some time to complete. Is there a time limit on a contract?

Contracts are limited to three years, with a possible three-year extension. IRB update paperwork is due annually, as requested by the PSID contract administrator.

50.	My role will be an acting faculty advisor to a graduate student. What are my responsibilities as faculty advisor?

The faculty advisor is named as the Investigator on the contract, and assumes responsibility for everyone on the project accessing the PSID restricted data. They are also responsible for ensuring the contract paperwork is kept current. This includes returning a completed Request for Extension form to the PSID every 180 days while the contract remains active. The faculty advisor must also be responsible for ensuring that security measures are kept in force for all restricted data work, and for seeing that the contract closure occurs once the dissertation has been published.

51.	As a faculty advisor, am I able to use my student's restricted data through their contract?

Multiple contracts for a number of research projects are permissible as long as an individual contract is submitted for each research project. A contract will be processed and established for each research project requested. Each contract has its own paperwork, contract and fee.

52.	My institution does not have an IRB. How may we meet this application requirement?

Your institution may have a Board that approves, monitors, and reviews research in order to protect human subjects from risk and harm. These boards may be known by various names including an Independent Ethics Committee (IEC), Ethical Review Board (ERB), or Research Ethics Board (REB).

If your institution does not have any such group, please contact the help desk for further assistance at [email protected], as an alternative must be established.

53.	Our IRB only responds via email when approval has been obtained. How can we provide the PSID with this information?

Researchers may forward the email approval to the PSID.

54.	Our IRB only grants approval for one year. Since contracts are for three years, how can we fulfill this requirement?

The researcher is responsible for maintaining active IRB approval during the entire course of the contract and must submit updates or renewals to the PSID.

55.	One application requirement is the Curriculum Vitae [CV]. Whose CV should be submitted with the application materials?

Everyone involved in the research project should submit a CV. Graduate students should submit their CV as well as the most recent CV for their faculty advisor.

56.	My research project has received approval from the PSID restricted data committee. What is required next?

Researchers or graduate students are required to obtain signatures on the original contract and submit it to the PSID so it may be fully executed through the University of Michigan. An electronic copy of the signed contract will be returned to your institution.

57.	We are confused regarding the signatures on the contract - who signs where?

The “Representative of the Receiving Institution” is someone who is able to legally enter into negotiations with the University of Michigan. The faculty adviser signs as Principal Investigator. Co-investigators may also sign, where applicable. There is also a Supplemental Agreement form where Computing Support and Research Assistants may sign when appropriate, at which time the Investigator must also sign at the bottom of the supplemental page.

58.	Our state has its own legal requirements regarding entering into contracts. Is the contract language negotiable?

Possibly. Please contact the help desk about any concerns regarding contract language issues. It is important to understand that any requests for language modification can create significant delays in fully executing the contract and access to the data.

59.	As a researcher, what are my responsibilities during the contract period?

During the three year period, the PSID will send the primary contract holder a Request for Extension every 180 days which requests updated contact information and must be returned to the PSID in order to keep the contract active. After the three year period, or at the conclusion of the project if this occurs before three years, access to the PSID data will be discontinued.

60.	What paperwork will be required for the new contract [second contract]?

The new contract will require all the same application documents as the first contract. For ease of transition, updated or modified documents are acceptable. An updated or new IRB approval must also be submitted. Once these are submitted, a new contract must be signed. The researcher may request updated restricted data if a new version has been released since their original project began.

61.	What happens if I do not submit the updated DSP/VDI form?

Your contract is considered Out-of-Compliance. Contracts that are active and in compliance are eligible to request Restricted Data Set updates. The updated data is provided at no additional charge to these researchers; however no such updates are provided in the event that a contract is out of compliance. Also, failure to submit this documentation during a contract could jeopardize a researcher's future request for restricted data. Sometimes unavoidable circumstances can cause delays in the submission of the Extension form and PSID staff are more than willing to work with institutions to facilitate the process.

62.	Another researcher wants to use my restricted use data files. Is this possible?

No. Under no circumstances can the restricted data be shared with individuals who are not named on the contract. Contracts are "project specific" - each researcher must obtain their own contract/user name and password for their research to ensure that they maintain respondent confidentiality.

63.	I have completed my research using the restricted use data. What are the next steps that need to be taken?

Contact the help desk to coordinate contract closure paperwork at [email protected]

64.	Can I keep all my data files that were created with the restricted data files?

All restricted data and derived files must either be destroyed or returned to the PSID for secured storage. Many researchers elect to return their files to the PSID for secured storage allowing them to use the data in the future. PSID Help will coordinate with a researcher the paperwork requirements and return of previously created data sets and derived files.

65.	The data files that are posted for each new wave are called Public Release. What does Public Release mean?

All Public Release data files have been processed and edited, and should meet the research needs of all users.

Over the past several years the PSID staff, using Computer Assisted Telephone Interview (CATI) technology and companion processing software, have significantly improved the quality and reliability of the timely release of data files. We now refer to the files posted for each new wave as Public Release Data. Note that:

1. Longitudinal data are subject to revision based on the most recent information received from individuals and families. New information that we find during family composition and economic editing in one wave may require revisions to previous waves. As additional data are collected through time on our two year collection cycle, prior files may be edited in light of the new information. Both the values of the variables themselves and the relationships of individuals to the families to which they are connected may be edited. Normally such changes are made only for a small number of cases.

2. An extensive set of computed or generated variables are included in the Public Release Data. As time and resources allow we occasionally add selected new generated variables for later release.

Like the data files from any complex longitudinal study, the PSID data files are subject to minor changes and subsequent updated releases, due primarily to economic and family composition editing activities. We therefore highly recommend that users retain and save all data files that they download from this site and on which their research analysis depends. Only the most current data files are retained by PSID staff for distribution.

66.	Some older documents reference Public Release II and Public Release I data; what does that mean?

The term "Public Release I" is used to refer to files released for general public use after they have been reviewed for data quality checks and consistency in both the reported family listing and the relationships among family members (this review process is called "family composition editing").

The term "Public Release II" was previously used to refer to files which had undergone additional data checks to correct a very small number of cases and had been formatted in a more convenient form.

Because of successive improvements in our Computer Assisted Telephone Interviewing (CATI) software that PSID began using in 1993, the quality of the Public Release I files improved in recent waves, allowing the use of these data with confidence. There is now no longer a necessity to release two versions of the Public Release files.

67.	What is the definition of a main family, a reinterview family, and a split-off?

A reinterview family is a family unit that was interviewed in the prior wave.

A main family is one that is the source of a splitoff family (a new study family formed by a sample member who moves out and forms his or her own family unit). In some divorce or separation situations, both resulting families will contain sample members, so both will be interviewed. We interview the first spouse we are able to contact as the main family, while the other spouse will be in the splitoff family. In the case of children leaving home, the main family is almost always the parental family.

A split-off family consists of a person or group of people (at least one of whom is a "follow" person of any age) who moved out from a main family since the prior wave's interview to form a new, economically independent family unit living in a separate housing unit. Several criteria must be met for a split-off to occur. In addition to having moved out since the prior wave, and to being 'followable', the person or group of people in general may not have moved to an institution such as college or prison or to another family unit within the panel study. Moreover, the person or group of people who have moved out and formed their own family unit must be economically independent from the family unit from which they split off. These are general rules, however, and sometimes unique situations arise that determine whether a person or group of persons becomes a split-off. For example, while moving to an institution such as college does not generally meet the criteria for becoming a split-off, if the person is working, paying their own living expenses, and paying their own educational expenses in addition to attending school, then this person could be interviewed as a split-off. The living situation and interview data for each and every possible split-off case are first reviewed before split-off status is granted. Note that a splitoff family is only designated as a splitoff in the wave in which the family is newly formed and interviewed for the first time. In subsequent waves, they are considered a reinterview family.

68.	What is the difference between a family unit (FU), a household unit (HU), and a family unit member?

In the PSID study, we are attempting to learn about our sample members, and the families in which they live. Each of these families is called a family unit (FU). The FU is defined as a group of people living together as a family. They are almost always related by blood, marriage, or adoption. And they must all be living in the same HU (see below).

Occasionally, unrelated persons can be part of an FU. They need to be permanently living with the family and share both income and expenses.

Any person in a study family is a family unit member. The term "other family unit member" (OFUM) is used for members who are not the Reference Person (the term ‘Reference Person’ replaced ‘Head’ in 2017) or Spouse/Partner.

The household unit (HU) is the physical dwelling where the members of the FU reside. It can be a house, townhouse, apartment, a room in a rooming house, even a tent or a car.

Not everyone living in an HU is automatically part of the FU. There may be other people living in the HU temporarily who do not meet the criteria of relatedness and economic integration. The PSID data is about FU Members only.

69.	What is the difference between 'Head' and 'Reference Person'?

Historically, PSID has used the term Head to refer to the husband in a heterosexual married couple and to a single adult of either sex. From 1968 to 2015, PSID conformed to the Census Bureau conventions in place at the time the study began by designating the husband in households with heterosexual married adults as the ‘Head’. In the last 50 years, however, substantial diversification in both family formation and composition has taken place. To reflect these changes in societal norms, as of 2017 the term ‘Head’ has been replaced by ‘Reference Person’. This change is not retroactive, however, so in historical contexts in 2015 and before we will continue to use the term Head.

70.	What is the difference between 'Wife/"Wife"' and 'Spouse/Partner'?

The term Wife has been used for a female in a married couple, and “Wife” for a cohabiting female. This terminology was adopted from the Census Bureau in 1968 at the start of the PSID and has been maintained for consistency through the 2013 wave. Starting with the 2015 wave, the term Spouse/Partner has replaced Wife/“Wife”. Spouse indicates a legal marriage, while Partner is a cohabiting, non-legally married partner, where the couple can consist of heterosexual or same sex couples.

71.	Who is a Sample Member and what is Follow Status?

Sample Members are individuals who were living in the original family unit (FU) at the time of the very first interview and their lineal descendants born after 1968. (For subsequent samples, such as the immigrants, the year of the first interview serves as the base for determining who is an original sample member, and all individuals present in the family at that time qualify.)

Follow status indicates whether we are interested in continuing to interview an individual. In general, sample members are always considered Followable. Non-Sample Members can be Followable too, if they represent a population of current interest. For example, we have in the past followed such people as Non-Sample parents of sample children who were aged 25 or younger.

You can tell who is a sample member by looking at the individual's Person Number and Follow Status. Original Sample Members who were living in the original study FU in the first year of interviewing were given Person Numbers in the range of 001-019. Any Reference Person's (the term ‘Reference Person’ replaced ‘Head’ in 2017) Spouse in the original interviewing year who was living in an institution was given a Person Number of 020. In addition, children of the Reference Person (and Spouse/Partner, if present) who were under age 25 and in an institution the first year were considered Original Sample members and given Person Numbers in the range 021-029. All of these people are followable.

Individuals who were born into a sample family after the first interviewing year and have a sample parent are considered "born-in Sample Members" and receive Person Numbers in the range of 030-169. All born-in sample members are followable.

Some individuals who qualify as sample members (because they have a sample parent) are not born into a study family, but move in later. These "Moved in Sample Members" have Person Numbers of 170 or greater and are Followable.

All other people who have ever lived in a PSID family are not sample individuals. They also receive Person Numbers of 170 or greater, but are not Followable.
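The Person Number ranges above can be summarized in a small lookup. The Python sketch below is only an illustration of those ranges; the function name and the wording of the labels are hypothetical. Because Person Numbers of 170 or greater are shared by moved-in sample members and non-sample individuals, the Person Number alone cannot separate those two groups, and Follow Status must be checked as well.

```python
def describe_person_number(person_number: int) -> str:
    """Map a Person Number to the category described for its range."""
    if 1 <= person_number <= 19:
        return "original sample member living in the original FU"
    if person_number == 20:
        return "original-year spouse living in an institution"
    if 21 <= person_number <= 29:
        return "original sample child (under 25) in an institution"
    if 30 <= person_number <= 169:
        return "born-in sample member"
    if person_number >= 170:
        # Shared range: followable moved-in sample members and
        # non-sample individuals both receive these numbers.
        return "moved in: check Follow Status to determine sample membership"
    raise ValueError(f"unexpected Person Number: {person_number}")


for pn in (1, 20, 25, 31, 170):
    print(pn, "->", describe_person_number(pn))
```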

72.	What is the difference between response and nonresponse family unit members?

Response family unit members are those residing in an interviewed family at the time of interview. Nonresponse family unit members are those not residing in an interviewed family at the time of interview; they may have attrited, not yet appeared in the study, or not yet been born by a particular wave.

The phrase "main family nonresponse" means that both the individual and his or her family have at that time become lost to our study, although either or both may reappear in the study in subsequent waves. In the wave just prior to becoming nonresponse, the individual was connected with a family interviewed by our study; thus, both family and individual data are available for that prior year, and the individual's Sequence Number at that time was 01-59. However, data were collected for neither the individual nor his or her family in the nonresponse wave. The data for the wave in which nonresponse occurs (and all subsequent waves if and until the individual reappears as a member of a responding family unit, including a recontact family) are zeroes excepting the variables for type of individual record and reason for nonresponse, and if an individual was selected for recontact, follow status and reason for following the individual.

In contrast, mover-out nonresponse individuals have left a family that was still in the study. Since such individuals were usually present in that family for at least part of the calendar year preceding nonresponse, they have some additional nonzero data for the wave in which they became nonresponse, such as part-year income information. In later waves, mover-out nonresponse individuals are treated in two ways, depending on why they left the family. Those who moved out to institutions have several variables (Sequence Number, age, sex, Relationship to Reference Person--in 2017 the term 'Reference Person' replaced 'Head', type of individual and reason for nonresponse) with nonzero values, although income, housework, and other individual-level variables are filled with zeroes. Eventually, such an individual may (a) become response by moving into a family or by becoming a splitoff, (b) move from the institution and remain mover-out nonresponse (shown when Sequence Number=71-89), or (c) become main family nonresponse because the family itself became nonresponse. (See the preceding paragraph for an explanation of main family nonresponse data records.) The other type of mover-out nonresponse individual has either moved out, but not to an institution, or died. Later waves of data contain zeroes, as described above for main family nonresponse, unless they subsequently rejoined a responding family or were selected for recontact.
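As a rough aid to reading the Sequence Number values mentioned above, the following Python sketch maps a value to the status described in this FAQ. It is an illustration under stated assumptions: the function name is hypothetical, and treating a Sequence Number of 0 as a zero-filled record is inferred from the statement that nonresponse waves are filled with zeroes. Only the ranges 01-59 and 71-89 are given explicitly here; the codebook for each wave is the authoritative source.

```python
def describe_sequence_number(seq: int) -> str:
    """Describe the response status implied by a Sequence Number."""
    if seq == 0:
        # Assumption: zero-filled record (no data collected in this wave).
        return "no data record for this wave"
    if 1 <= seq <= 59:
        return "connected with an interviewed family"
    if 71 <= seq <= 89:
        return "mover-out nonresponse"
    raise ValueError(f"Sequence Number {seq} is outside the ranges described in this FAQ")


for seq in (0, 1, 59, 71):
    print(seq, "->", describe_sequence_number(seq))
```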

The data are released as one file, which includes not only those individuals with nonzero data records in the current data collection year (i.e., current response plus mover-out nonresponse), but also all other individuals: those who have zero data records for the current year (i.e., current year main family nonresponse and all nonresponse of either kind from earlier waves).

73.	How is Reference Person defined in the PSID? (the term ‘Reference Person’ has replaced ‘Head’ in 2017)

Within each wave of data, each FU (family unit) has one, and only one, current Reference Person (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head'). Originally, if the family contained a husband-wife pair, the husband was arbitrarily designated the Reference Person to conform with Census Bureau definitions in effect at the time the study began. The person designated as Reference Person may change over time as a result of other changes affecting the family. When a new Reference Person must be chosen (see conditions for selecting a new Reference Person below), the following rules apply:

The Reference Person (‘Head’ prior to 2017) of the FU must be at least 18 years old and the person with the most financial responsibility for the FU. If this person is female and she has a (male) spouse or partner in the FU, then he is designated as Reference Person. If she has a boyfriend with whom she has been living for at least one year, then he is Reference Person. However, if the husband or boyfriend is incapacitated and unable to fulfill the functions of Reference Person, then the FU will have a female Reference Person.
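The selection rule above can be written out as a short decision function. The Python sketch below is an illustration only, not the procedure used by PSID staff: the Adult class, its field names, and the function name are hypothetical. It assumes the caller has already identified the adult with the most financial responsibility, and it assumes the one-year condition applies to unmarried partners only. It covers opposite-sex couples as described in this answer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adult:
    label: str
    age: int
    sex: str  # "F" or "M"
    incapacitated: bool = False


def choose_reference_person(
    primary: Adult,
    partner: Optional[Adult] = None,
    married: bool = False,
    years_living_together: float = 0.0,
) -> Adult:
    """Pick the Reference Person given the adult with the most financial
    responsibility (primary) and that adult's spouse or partner, if any."""
    if primary.age < 18:
        raise ValueError("the Reference Person must be at least 18 years old")
    if (
        primary.sex == "F"
        and partner is not None
        and partner.sex == "M"
        and not partner.incapacitated
        # Assumption: a husband qualifies at once, a boyfriend after one year.
        and (married or years_living_together >= 1)
    ):
        return partner
    return primary


woman = Adult("A", 40, "F")
man = Adult("B", 42, "M")
print(choose_reference_person(woman, man, married=True).label)
print(choose_reference_person(woman, man, years_living_together=0.5).label)
```

In the first call the husband is returned; in the second the boyfriend has lived in the FU for less than one year, so the woman is returned.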

74.	What is a Husband of Reference Person, Uncooperative Spouse, or Uncooperative Partner? (the term ‘Reference Person’ has replaced ‘Head’ in 2017)

From 1968 to 2015, a married male Head ('Reference Person' starting in the 2017 wave) might become incapacitated in some way. (He might still be in the FU, or in an institution such as a nursing home.) In these cases, the female half of the couple was made Head and the husband became Husband of Head. A Husband of Head was asked the same questions as an OFUM. A male Head could also have been made Husband of Head if the female half of the couple insisted on being the Head, the female half of the couple was adamant about not giving out information about her husband, or the husband was adamant about not wanting to be included in the study. A Husband of Head had the Relationship to Head code 9 or 90. Once the study started coding same sex relationships in 2017, the Husband of Head Relationship was dropped. In its place, the study uses Uncooperative Spouse (Relationship to Reference Person (‘Head’ before 2017) code 90), or Uncooperative Partner (Relationship to Reference Person code 92). These designations are used when one half of the couple is adamant about not giving information about the other half, or when one half adamantly refuses to have their information included. In rare cases, these Relationships to Reference Person will be used when the sample half of a couple has moved out of the FU (family unit) and into an institution and is still in an institution the next wave.

75.	Are cohabitors treated differently from legally married couples?

Prior to 2017, when a new (opposite sex) romantic partner of Head ('Reference Person' starting in the 2017 wave) moved into the FU (family unit), but had been living there less than 1 year at the time of the interview, that person was labeled a Boyfriend or Girlfriend (code 88). However, if the cohabitor had been living in the FU one year or more, the couple was designated (male) Head and "Wife" (code 22 from 1983 on). If a Girlfriend or Boyfriend was still in the FU in the next wave, and the couple were not married, they became (male) Head and "Wife". If the person who moves in is married to the Head, they are, of course, male Head and Wife (code 20), regardless of time living in the FU.
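The pre-2017 coding rules in the paragraph above can be condensed into a small function. The Python sketch below is an illustration only: the function and parameter names are hypothetical, and it applies to the waves in which code 22 exists (1983 through 2015). The returned value is the Relationship to Head code for the member of the couple who is not recorded as Head.

```python
def non_head_relationship_code(
    legally_married: bool,
    years_in_fu: float,
    in_fu_prior_wave: bool = False,
) -> int:
    """Relationship to Head code for the non-Head half of an opposite-sex couple."""
    if legally_married:
        return 20  # Wife, regardless of time living in the FU
    if years_in_fu >= 1 or in_fu_prior_wave:
        return 22  # "Wife" (cohabitor of one year or more, or present last wave)
    return 88  # Boyfriend or Girlfriend (first-year cohabitor)


print(non_head_relationship_code(legally_married=False, years_in_fu=0.5))
print(non_head_relationship_code(legally_married=False, years_in_fu=2))
print(non_head_relationship_code(legally_married=True, years_in_fu=0.1))
```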

Boyfriends and Girlfriends are treated like other family members who are not Reference Person (‘Head' prior to 2017), Spouse or Partner. Considerably less information is obtained about them. In the waves since the late 1970s, information typically gathered for a Spouse has been gathered as well about a Partner ("Wife" before 2017).

Starting in 2017, the Girlfriend or Boyfriend can be the same sex as the Reference Person (‘Head' prior to 2017). In unmarried male plus female couples, the male still becomes the Reference Person once the "living in the FU for at least one year" criterion has been met, and the female half of the couple would be Partner. However, in same sex couples, the sample member, whether male or female, remains the Reference Person and the other person becomes the Partner.

Prior to 1983, the Relationship to Head ('Reference Person' starting in the 2017 wave) codes did not distinguish between legal Wives and long-term female cohabitors. However, first year cohabitors can be detected prior to 1983 with a little bit of work. For example, their Relationship to Head would be 8 (nonrelative), their gender would be the opposite of Head's, and in subsequent years they may become Wives or Heads, while the Head would stay as Head or become a Wife. Anyone fitting this pattern can be decisively identified as a cohabitor. PSID did not distinctively label same sex cohabitors prior to 2017.

76.	Why are data available as "Packaged" if they are also in the Data Center?

Before the Data Center was created, PSID data were distributed as "packaged" files. Because some users prefer the packaged files, we continue to provide data in this format.

77.	How long are my data available for download after they are created?

Data files are deleted from our servers when they are 7 days old. After that, you can re-create your data file by logging into the Data Center and selecting "Previous carts".

78.	The website seems to be displaying incorrectly. Why is this?

For security reasons, we are no longer supporting some older web browsers. Make sure your web browser is up-to-date, and please note that we suggest the following web browsers:

Google Chrome
Mozilla Firefox
Microsoft Edge (Windows only)

If you are using the latest version of Internet Explorer and are having display problems, click on Settings > Compatibility View Settings and add psid.org to your list.

If problems persist, please contact us at [email protected].

79.	How does the amount of data collected in each wave vary by family unit members?

In general, a substantial amount of detailed data is collected for the Reference Person (the term ‘Reference Person’ replaced ‘Head’ starting with the 2017 wave) and Spouse/Partner, if present. Considerably less detail is collected for other family unit members (OFUMs).

80.	What data are available in the area of housing?

The PSID collects many data elements about housing, including housing type, characteristics, ownership, tax, insurance, etc. A list of such items collected in each wave is available here.

81.	Where can I obtain information regarding release dates for files?

File release information is available through the News section of our website.

82.	How does the PSID distinguish between main and secondary jobs in the data files?

Up through the 2001 interviewing year, the PSID distinguished between Main and Extra jobs. Someone could not have an Extra job unless he/she held a Main job during the same time period. The extra job must be held simultaneously with the main job. We made this distinction between main and extra jobs throughout. If two (or more) employers overlapped, the interviewer was supposed to ask which was the main one during that time and note in an open ended question the overlap and the hours and earnings of both jobs. Then this overlap period was to be included in the extra job sequences (BD82-BD106/CE74-CE98). Those who are only temporarily laid off are still employed at a main job and, therefore, could have an extra job during that time period. However, those who are unemployed, whether looking or not, have no main job employer during the time in question. Hence, any small job they may have is considered a main job--since it's the ONLY job. Use the month strings and dates of beginning and ending employment in the work history to tell whether time at B/D72-74a or C/E64-66a is temporary layoff or unemployment.

Beginning with the 2003 interviewing year, the PSID dropped the main vs. extra job distinction as defined above. Jobs are now classified as "current main job", "most recent main job" or "other" job. If someone reports 2 or more current jobs, or 2 or more recent jobs that ended at the same time, the interviewer asks which job he/she considers his/her main job. That one is listed as the current (or most recent) main job. Any other job is listed as an "other" job. A job can be an "other" job even when it does not overlap with a current or most recent main job. This situation could arise, for instance, when someone reports two jobs, with the current main job beginning before the old ("other") job ended.

83.	What information about physical and mental health is collected by the PSID?

The PSID contains a wealth of information that can be used to study the health of Americans and their family members. Information collected in the main interview is summarized here. Health information collected in the Child Development Supplement is summarized here.

84.	How has the occupation-industry code classification system changed?

The PSID used a one-digit, and later a two-digit, occupation code until 1981, when the three-digit 1970 Census code became standard for the main jobs of employed Reference Persons (starting with the 2017 wave, the term ‘Reference Person’ has replaced ‘Head’) and Spouses/Partners. It was also used for the most recent jobs held by Reference Persons and Spouses/Partners who were currently unemployed and looking for work and for any job held in 1980 by a Reference Person or Spouse/Partner who was currently retired or no longer in the labor force. From 2003, all occupation-industry data were coded using the three-digit 2000 Census code. A retrospective coding project used the 2000 Census codes to classify the first occupation and industry of all Reference Persons and Spouses as of 2003 and those of their fathers and mothers. Starting in 2017, the study is using the 4-digit 2010 occupation and 2012 industry codes.

85.	In some cases there are discrepancies from wave to wave for the age of the individual. Why is this?

Ages of individuals are asked and reported in each wave of the study. But interviews are seldom taken exactly twelve months apart for the same family from wave to wave. In fact, a family responding early in the interviewing period one year might respond late in the next year’s interviewing period, with 18 or more months between interviews for annual interviews (from 1968-1997). Conversely, a late responder in one wave could be an early responder in the next wave. Since the PSID transitioned to biennial interviewing (1999 through the present), the age gap can widen even further. Because interview dates vary in this way, there is a good possibility that an individual appears to have aged excessively or not at all. Also, individuals’ ages or birthdates can be misreported. Consistency checks for age discrepancies have always been done internally, but reported ages are not altered if it cannot be determined which age is correct.
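
The arithmetic behind these apparent jumps can be illustrated with a short Python sketch. The birthdate and interview dates below are made up for illustration and do not come from PSID data.

```python
from datetime import date


def age_on(birth, on):
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


birth = date(1950, 6, 15)  # hypothetical birthdate

# Early interview one year, late interview the next (about 20 months apart).
early_then_late = age_on(birth, date(1981, 11, 20)) - age_on(birth, date(1980, 3, 10))

# Late interview one year, early interview the next (about 4 months apart).
late_then_early = age_on(birth, date(1981, 3, 10)) - age_on(birth, date(1980, 11, 20))

print(early_then_late, late_then_early)
```

In the first case the person appears to age two years between consecutive annual waves, and in the second case not at all, even though every reported age is correct.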

86.	Why does fertility appear to be higher in the PSID than in the US for Black individuals in some birth cohorts?

A modest upward distortion in the weighted estimates of Black individuals with children has been identified in PSID, beginning in the late 1990s, for selected cohorts. We recommend that analysts use caution when using PSID data from 1997 onward to estimate the percentages with children in the household (and related statistics such as childlessness and fertility) for Black women born in the late 1960s and in the 1970s, and for Black men born in the late 1960s, 1970s, and early 1980s. This distortion may also affect estimates of multigenerational Black families in later years. Analysts who are not addressing questions of this type can reasonably ignore this concern. When estimating multivariate models with fertility-related outcomes for any year, we recommend that analysts include as a control variable the CDS eligibility indicator available on the 1997 Individual File (ER33418). Some analysts may wish to post-stratify the assigned PSID weights for Black individuals to the Current Population Survey (CPS) totals by presence of a child under the age of 13. PSID Technical Series Paper 16-01 provides additional details.
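
For analysts considering the post-stratification mentioned above, the following Python sketch shows the general idea: weights within each cell are rescaled so that their sum matches an external control total. The function name, cell labels, weights, and target totals are all hypothetical (the targets are not CPS figures), and this is not the procedure described in Technical Series Paper 16-01, which should be consulted for the recommended approach.

```python
from collections import defaultdict


def poststratify(weights, cells, targets):
    """Rescale weights so each cell's weighted total equals its target total."""
    totals = defaultdict(float)
    for w, c in zip(weights, cells):
        totals[c] += w
    for c, total in totals.items():
        if total == 0:
            raise ValueError(f"cell {c!r} has zero total weight")
    return [w * targets[c] / totals[c] for w, c in zip(weights, cells)]


# Made-up weights and cells: presence of a child under 13 in the household.
weights = [10.0, 30.0, 20.0, 40.0]
cells = ["child", "child", "no_child", "no_child"]
targets = {"child": 60.0, "no_child": 40.0}  # hypothetical control totals

adjusted = poststratify(weights, cells, targets)
print(adjusted)
```

Within a cell the relative sizes of the weights are preserved; only the cell total changes.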

87.	How do I cite my use of PSID data?

PSID reminds data users to cite the data and acknowledge our funding source in all publications using the data in one of two ways:

Citation:
Panel Study of Income Dynamics, public use dataset [restricted use data, if appropriate]. Produced and distributed by the Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI (year data were downloaded).
BibLaTeX:
@dataset{PSID,
author = {{Survey Research Center}},
title = {Panel Study of Income Dynamics, public use dataset [restricted use data, if appropriate].},
publisher = {Institute for Social Research, University of Michigan},
address = {Ann Arbor, MI},
urldate = {date data was downloaded}
}

Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of Health under grant numbers R01 HD069609 and R01 AG040213, and the National Science Foundation under award numbers SES 1157698 and 1623684.

Effective May 25, 2008, anyone submitting an application, proposal, or progress report to the NIH must include the PubMed Central reference number (PMCID) or NIH Manuscript Submission reference number when citing applicable articles that arise from their NIH funded research: http://publicaccess.nih.gov/citation_methods.htm. In consideration of this policy, PSID requests that all journal articles based on analysis of PSID data or its supplements (either public or restricted-use) receive a PubMed Central reference number (PMCID). Journal articles must be submitted to PubMed Central to receive a PMCID. The method of PubMed Central submission and Investigator responsibility for submission depend on the journal and its publisher:

Some journals automatically submit published articles to PubMed Central, while other journal publishers may submit the articles to PubMed Central automatically or upon request by the author.
If neither the journal nor the journal publisher will submit the article to PubMed Central, the Investigator will be responsible for the submission. For detailed instructions on the process of submitting a journal article to PubMed Central, please see the NIH website: https://pmc.ncbi.nlm.nih.gov/about/public-access-info/

Researchers with PSID restricted-use contracts should include PMCIDs in their list of PSID publications submitted in biennial reports.

Researchers using PSID public-use data should send citations for publications based on PSID data that they have authored and that have PMCIDs to [email protected].

88.	The journal I am publishing in wants me to submit my data. Am I allowed to do that?

Though your code may be posted to other sites, the data themselves should not be, per our Conditions of Use. We now have an account through ICPSR’s Data Repository, where you can safely store your data extracts, code, and documentation for your project.

89.	When are data collected for each wave of the study?

The interview period (field season) is roughly between March and November, with a few years being exceptions and going into December. If a user is interested in when a specific interview was conducted, there is a variable in the dataset (Date of Interview) which indicates month and day of interview.

90.	How often are main interview data collected for the PSID study?

Between 1968 and 1997, data were collected every year. Starting in 1999, the PSID collected data biennially (i.e., every other year). All waves of data starting with 1968 are available on the website, with each wave's public release file being posted on the website as soon as editing and processing can be completed.
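
When building cross-year files it can help to generate the list of wave years from this schedule. A minimal Python sketch follows; the function name and the end year used in the example are chosen only for illustration.

```python
def wave_years(last_wave):
    """Annual waves 1968-1997, then every other year from 1999 through last_wave."""
    annual = list(range(1968, 1998))
    biennial = list(range(1999, last_wave + 1, 2))
    return annual + biennial


years = wave_years(2021)  # 2021 is an arbitrary end year chosen for illustration
print(len(years), years[:3], years[-3:])
```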

91.	Why does the PSID provide weights for analysis?

The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples. Both samples are probability samples (i.e., samples for which every element in the population has a known nonzero chance of selection). Their combination is also a probability sample. The combination, however, is a sample with unequal selection probabilities, and as a result, compensatory weighting is needed in estimation, at least for descriptive statistics. Weight adjustments are also needed to attempt to compensate for differential nonresponse in 1968 and subsequent waves. Weights supplied on PSID data files are designed to compensate for both unequal selection probabilities and differential attrition.
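
To see why unequal selection probabilities call for weighting, consider the following Python sketch. The values and selection probabilities are invented for illustration, and the sketch covers only the selection-probability component; the weights supplied on PSID files also adjust for differential attrition, which is not modeled here.

```python
def weighted_mean(values, weights):
    """Mean of values with each case weighted by its analysis weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# (value, selection probability): the first group is sampled at twice the rate
# of the second, so it is over-represented in the raw sample.
sample = [(100.0, 0.02), (100.0, 0.02), (200.0, 0.01)]

values = [v for v, _ in sample]
weights = [1.0 / p for _, p in sample]  # inverse selection probability

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted, weighted)
```

The unweighted mean is pulled toward the over-sampled group, while the weighted mean restores each group to its population share.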

In 1997, the Panel Study of Income Dynamics (PSID) underwent several important design changes that would affect weighting. Leading these changes was a roughly 1/3 reduction in the number of PSID Core families that would be eligible for continuous longitudinal data collection. A second important change to the 1997 PSID was the addition of a nationally representative sample of immigrant households and individuals that would not be eligible for PSID under the original 1968 sample recruitment and sample family "following rules". The 1997 data collection year also began the transition to every second year data collection for PSID. Finally, the 1997 PSID data collection included a special supplemental study of children age 0-12 in PSID Core and Immigrant Supplement families. Additional documentation describing the weights is provided on the documentation page.

92.	What variables should I use for complex sample survey variance estimation?

Variables ER31996 and ER31997 are used for computing complex sample design corrected standard errors/variance estimates via the Taylor Series Linearization or Repeated Replication methods. These variables may be used with a variety of software programs that incorporate the complex sample design into variance estimation, including Stata, SAS, Sudaan, SPSS and others. The Sampling Error Stratum variable (ER31996) may be specified as the "Stratum variable" in the design specification and the Sampling Error Cluster variable (ER31997) may be specified as the "Cluster Variable". Documentation on sampling error estimation in design-based analysis of the PSID data can be found here.
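
For readers who want to see what the software is doing, here is a minimal Python sketch of Taylor Series Linearization for a weighted mean, using a stratum variable and a cluster variable in the roles described above. It assumes the usual with-replacement approximation with clusters nested within strata, and the function name and toy data are hypothetical. For real analyses, use the survey procedures of one of the packages named above.

```python
from collections import defaultdict
from math import sqrt


def linearized_mean_and_se(y, w, stratum, cluster):
    """Weighted mean and its Taylor-linearized standard error.

    Clusters are treated as nested within strata, so the same cluster
    code may recur in different strata.
    """
    total_w = sum(w)
    mean = sum(yi * wi for yi, wi in zip(y, w)) / total_w

    # Sum the linearized values w * (y - mean) / sum(w) within each cluster.
    totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, cluster):
        totals[h][c] += wi * (yi - mean) / total_w

    variance = 0.0
    for h, by_cluster in totals.items():
        n_h = len(by_cluster)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two clusters")
        z = list(by_cluster.values())
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zc - z_bar) ** 2 for zc in z)
    return mean, sqrt(variance)


# Toy data: two strata, two clusters per stratum, equal weights.
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 1.0, 1.0, 1.0]
stratum = [1, 1, 2, 2]   # role of ER31996
cluster = [1, 2, 1, 2]   # role of ER31997

mean, se = linearized_mean_and_se(y, w, stratum, cluster)
print(mean, se)
```

With equal weights and one observation per cluster, the result matches the textbook variance of a stratified sample mean, which is a convenient check on the formula.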

93.	Why do some cases exist where there is data available for certain variables, but the family weight is equal to zero?

These are families that contained no sample members. These are not mistakes in the data, but rather show cases where information was gathered about individuals not directly linked to a sample member. The PSID purposely followed some nonsample individuals, e.g., the nonsample elderly (1990-1996) and nonsample parents (1994-2003). In some cases, responding families contain only followable but nonsample individuals, and therefore all the individual weights, and thus the family weight, for these cases are zero.