1.How can I get started analyzing PSID data? 
The PSID user manual provides historical context and basic design features of the PSID. There are several tutorials which provide step-by-step instructions on downloading and analyzing the data in a variety of ways.

The Data Center is the most popular means for obtaining PSID data, and it delivers thousands of customized data files to researchers and quantitative social science students each year. The Data Center is fully automated and allows for user-specified subsetting criteria when downloading and merging data. Data can be generated in a variety of formats including ASCII, SAS, SPSS, and Stata.
2.How do I merge family- and individual-level files?
The Data Center provides automatic and customized merges of files. For the analyst who prefers to write their own programming code to merge data downloaded from our zip packages, sample SAS and SPSS programs have also been prepared to assist users with creating cross-year analysis files.
3.How can I identify families from year to year? 
Each family unit in a specific wave is assigned a unique "Family Interview (ID) Number" valid for that wave only. In addition, each family also has a "1968 Family Identifier", also known as the "1968 ID". This is the Family Interview (ID) Number that was assigned to the original family in the 1968 interviewing wave. When sample members in any family move out and establish their own household, we interview them; in the first year they are formed, these families are called "splitoffs". These new splitoff families have the same 1968 ID as the family they moved out of, and keep that same 1968 ID each year. All families with the same 1968 ID contain at least one of the original members of the 1968 family or their lineal descendants born after 1968.
4.Do family ID numbers vary from year to year? 
For each family, the Family ID Number will almost certainly vary from year to year. Yearly IDs are assigned based on the order in which interviews are received from the field: the first interview received is numbered 1, the second 2, and so on. This means it is very unlikely that a family with Family ID Number 1234 in one year will have the same Family ID Number in the next year, or in any other year.
5.When is a new Head selected? 
A new Head is selected if any of the following conditions apply:
  • last year's Head moved out of the FU (family unit), died or (in some cases) became incapacitated; or
  • a female Head has gotten married; or
  • this is a splitoff family. (Note that the new Head may have been the Head of the family from which this family split off.)
6.How can I tell the current Head and Wife from mover-out Head and Wife? Why is this important? 
To tell the current Head and Wife from mover-out Head and Wife, use the Sequence Number (SN) from the individual file. The current Head will always have SN=1, the current Wife/"Wife" (if there is one) will always be SN=2. A mover-out Head or Wife will have a SN in the range 50-89, depending on the move-out circumstances. The SN allows you to identify an individual's status with regard to the family unit and determine family composition change. It's important to understand family composition change to avoid spurious correlations in a longitudinal analysis where you are looking at variables pertinent to the same person(s) over time.
7.How do I assemble a Head/Wife file from an individual file? 
The easiest way to do this is by visiting the PSID Data Center which will create a customized dataset for you automatically.

Instructions for creating Head/Wife file from an individual file by writing your own programming code:

To create a single-year Head/Wife file: Select individuals with Relationship to Head of "Head" (code value 1 for 1968-1982; code 10 from 1983 onward) and with Sequence Number values in the range 1-20. The Sequence Number is needed because non-response movers-out have relationships to the PREVIOUS YEAR's Head, so two individuals within one family may both have a Relationship to Head of "Head". One of them is the real, current Head; the other is a mover-out. (The type of mover-out can be determined from the value of Sequence Number; refer to the individual file codebook for details.)

To illustrate the importance of Sequence Number, assume that in the last wave we had an elderly married couple. He was the Head and she was the Wife: Sequence Number=1 and Relationship to Head=10 for him, Sequence Number=2 and Relationship to Head=20 for her. When we find them for the new interview, he has died and she has become the new Head: his Sequence Number=81 and Relationship to Head=10, her Sequence Number=1 and Relationship to Head=10. All the family data items about the Head in the current wave refer to HER, not to him. Information about his income, etc., is located in OFUM (other family unit member) variables only.

Similarly, to subset Wives or "Wives" in a current wave, select Relationship to Head=20 or 22 and Sequence Number=1-20.
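The single-year selection rule can be sketched in plain Python. This is a minimal illustration, not PSID-supplied code: the `Person` record and its field names are hypothetical stand-ins for one wave's Sequence Number and Relationship to Head variables, and the widowed couple from the example serves as test data.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical single-wave individual record (field names are illustrative)."""
    id68: int  # 1968 ID
    pn: int    # person number
    sn: int    # this wave's Sequence Number
    rth: int   # this wave's Relationship to Head


HEAD_CODES = {1, 10}      # 1 for 1968-1982, 10 from 1983 onward
WIFE_CODES = {2, 20, 22}  # 2 for 1968-1982; 20 or 22 from 1983 onward


def in_family_unit(p: Person) -> bool:
    # SN 1-20 marks people actually in the FU this wave; movers-out
    # (SN 50-89) fail this test even if their RTH still reads "Head".
    return 1 <= p.sn <= 20


def current_heads(people):
    return [p for p in people if p.rth in HEAD_CODES and in_family_unit(p)]


def current_wives(people):
    return [p for p in people if p.rth in WIFE_CODES and in_family_unit(p)]


# The widowed couple from the example.
wave = [
    Person(id68=1234, pn=1, sn=81, rth=10),  # deceased former Head
    Person(id68=1234, pn=2, sn=1, rth=10),   # his widow, now the Head
]
print([p.pn for p in current_heads(wave)])  # [2]
print(current_wives(wave))                  # []
```

Filtering on RTH alone would return both people as Heads; the SN test is what drops the deceased mover-out.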

To create a cross-year Head/Wife file: These concepts can be extended to subset persons who have been Heads over a period of years: the yearly values of Sequence Number must be 1-20, and those of Relationship to Head must be 1 or 10. As a corollary, to select individuals who have been either Heads or Wives/"Wives", yearly Sequence Numbers must be 1-20 and yearly Relationships to Head must be 1, 2, 10, 20, or 22. Once that subset is made and family data are merged, information about an individual can be found in Head variables (Head's work hours, Head's labor income, etc.) when his or her Relationship to Head=1 or 10. When Relationship to Head is 2, 20, or 22, her information is found in variables about the Wife/"Wife".
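A sketch of the cross-year version, again with hypothetical data structures: each person's history maps a year to that year's (Sequence Number, Relationship to Head) pair. The second function shows which block of merged family variables (Head or Wife) describes the person in a given year.

```python
HEAD_CODES = {1, 10}
WIFE_CODES = {2, 20, 22}
HEAD_OR_WIFE_CODES = HEAD_CODES | WIFE_CODES


def in_role_every_year(history, years, rth_codes):
    """True if, in every listed year, the person is in the FU (SN 1-20)
    with a Relationship to Head in rth_codes.

    history is a hypothetical mapping year -> (SN, RTH); years in which the
    person was not in a study family are treated as zero-filled, as in the
    cross-year individual file.
    """
    for year in years:
        sn, rth = history.get(year, (0, 0))
        if not (1 <= sn <= 20 and rth in rth_codes):
            return False
    return True


def family_variables_for(rth):
    """Which block of merged family variables describes this person."""
    if rth in HEAD_CODES:
        return "Head"
    if rth in WIFE_CODES:
        return "Wife"
    raise ValueError(f"RTH {rth} is neither a Head nor a Wife code")


# A woman who was the Wife in 2003 and had become the Head by 2005.
history = {2003: (2, 20), 2005: (1, 10)}
years = [2003, 2005]
print(in_role_every_year(history, years, HEAD_CODES))          # False
print(in_role_every_year(history, years, HEAD_OR_WIFE_CODES))  # True
print([family_variables_for(history[y][1]) for y in years])    # ['Wife', 'Head']
```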
8.How can I identify splitoffs from the main family? 
Select only current Heads (SN=1 and RTH=10) from the individual file for the wave in question. Then, if the Head's moved in/out indicator=1 and month moved in/out=0, the family is a splitoff; otherwise, it is a main family.
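The splitoff test can be written as a small function. The parameter names are illustrative, not PSID variable names; map them to the wave-specific Sequence Number, Relationship to Head, moved in/out indicator, and month moved in/out variables in the codebook. RTH=10 is the post-1982 Head code.

```python
def family_type(sn, rth, moved_in_out, month_moved):
    """Classify a family as "splitoff" or "main" from its current Head's
    individual-file values (parameter names are hypothetical)."""
    if not (sn == 1 and rth == 10):
        raise ValueError("only the current Head's record (SN=1, RTH=10) is used")
    if moved_in_out == 1 and month_moved == 0:
        return "splitoff"
    return "main"


print(family_type(sn=1, rth=10, moved_in_out=1, month_moved=0))  # splitoff
print(family_type(sn=1, rth=10, moved_in_out=0, month_moved=0))  # main
```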
9.How is an individual uniquely identified? 
The combination of the 1968 ID and the person number uniquely identifies each individual.

To identify an individual across waves, use the 1968 ID and Person Number 68 summary variables, ER30001 and ER30002. Though you can combine them uniquely in many ways, we find that many researchers use the following method:

(ER30001 * 1000) + ER30002

(1968 ID multiplied by 1000) plus Person Number 68
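The formula above translates directly into code. This sketch assumes the person number never exceeds three digits (0-999), which is what makes the factor of 1000 collision-free and lets the two parts be recovered with integer division.

```python
def person_id(id68: int, pn: int) -> int:
    """(ER30001 * 1000) + ER30002, assuming PN fits in three digits."""
    if not 0 <= pn <= 999:
        raise ValueError(f"person number {pn} does not fit in three digits")
    return id68 * 1000 + pn


def split_person_id(pid: int) -> tuple[int, int]:
    """Recover (1968 ID, person number) from a combined ID."""
    return divmod(pid, 1000)


pid = person_id(1234, 170)
print(pid)                   # 1234170
print(split_person_id(pid))  # (1234, 170)
```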
10.Why is it essential to use the latest version of the Cross Year Individual File?
The Cross Year Individual File has records for every person who has ever lived in a PSID study family (including some who moved out just before the initial interviewing year for each part of the sample, or were institutionalized in the first interviewing year; see the codebook for ER30002). Each study individual has a record for each study year; years when that individual was not listed in a study family are zero-filled. The file is organized by ID68, PN, and year.

In addition to those variables, the file contains individual-level information such as YEARID, SN, RTH, AGE, SEX, birthdates, move-in and move-out dates, follow status, type of individual, why non-response, several health insurance variables, variables indicating eligibility for supplementary studies such as CDS and DUST, and individual level longitudinal and cross-sectional weights.

It is essential to use the latest version of the file for analysis. The entire file is regenerated for every wave’s release, not only to add the latest wave’s information but also to make corrections based on information received in the latest wave’s interviewing. These corrections often involve information such as birthdate or RTH and may affect several waves of data.

Most importantly, however, we also make corrections to ID68 and PNs based on new information. Sometimes we discover that a person we believed to be a separate individual is actually the same as a family member we already know about, under a different person number. If the two records have response data in different waves, we can combine the information into one record, keeping the record for one PN and deleting the record for the other. In other cases, we might learn that someone we thought was the biological child of a sample member actually isn’t. This would necessitate a change in PN from the "born-in sample" range (030-169) to the 170-and-up range, and also a change in follow status, since, given the new information, the child would no longer be a sample member. In another instance, we found that a woman in the immigrant sample with PN 170 was actually a spouse who had moved out prior to the first year of immigrant interviewing (1997 in this case), and so should have had PN 227 (a PN with special meaning; see the codebook for ER30002).

Currently, we uncover about 100 of these person-number fixes each wave. So failure to use the latest cross-year individual file may result in errors in both data and analysis.
11.How can I determine if data will be collected about an individual who is not present in an interviewed family? 

It depends on the situation.

  • For persons who are deceased movers-out, some OFUM (other family unit member) information is collected for the wave in which they are reported to have died.
  • For persons moving out to an institution, some OFUM information is collected during the wave in which they are reported to have moved out of the family.
  • For persons moving out to another household whose new family unit is not interviewed, some OFUM information is collected in the wave in which they are reported to have moved out.
  • For persons already in institutions, no new information is collected.
  • For persons who attrited from the study, no new information is collected. However, a large recontact effort was initiated in 1992.
  • For persons not yet born or not yet appearing in the study, no information is collected that wave.

Beginning with the 1999 wave, when the PSID switched from annual to biennial interviews, the rules above for movers-out apply only if the person moved out no earlier than the calendar year before the interview. For example, the 2007 data contain information for movers-out on or after 1/1/2006, but no information for movers-out before that date.
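The timing rule for biennial waves can be expressed as a date comparison. This sketch checks timing only; whether any OFUM information is collected still depends on the mover-out situations listed above. Waves before 1999 are rejected because the rule stated here covers only the biennial period.

```python
from datetime import date


def mover_out_info_collected(interview_year: int, move_out: date) -> bool:
    """True if a move-out on this date falls inside the coverage window:
    on or after January 1 of the calendar year before the interview year."""
    if interview_year < 1999:
        raise ValueError("this timing rule applies to the 1999 wave onward")
    return move_out >= date(interview_year - 1, 1, 1)


print(mover_out_info_collected(2007, date(2006, 1, 1)))    # True
print(mover_out_info_collected(2007, date(2005, 12, 31)))  # False
```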

12.How can I determine which variables are comparable across years? 
The cross-year index can help you identify comparable variables across the years.

You can also look at the ‘Years Available’ section of each variable’s codebook entry for a year-by-year listing of when that variable is available in the data center.
13.How can I tell if a variable value is actual or imputed? 
Missing data are either identified as such (value=9) or replaced by an imputed value in lieu of a missing data code. If an imputed value is assigned, an associated "accuracy code" variable describes the nature of the assignment.
14.How can I identify the SEO (Survey of Economic Opportunity) sample and the SRC (Survey Research Center) sample? 
You will need to look at the 1968 family interview number available in the individual-level files (V30001 and ER30001 in 2007).

SRC sample families have values less than 3000.
SEO sample families have values greater than 5000 and less than 7000.

Immigrant sample families have values greater than 3000 and less than 5000. (Values from 3001 to 3441 indicate that the original family was first interviewed in 1997; values from 3442 to 3511 indicate the original family was first interviewed in 1999.)

Latino sample families have values greater than 7000 and less than 9309. (Values from 7001 to 9043 indicate the original family was first interviewed in 1990; values from 9044 to 9308 indicate the original family was first interviewed in 1992.)
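The ranges above can be encoded as a lookup function. It follows the documented sub-ranges exactly and raises an error for any 1968 ID outside them (including the boundary values 3000, 5000, and 7000, which the ranges do not cover); the lower bound of 1 for SRC is an assumption that valid IDs are positive.

```python
def psid_sample(id68: int) -> str:
    """Map a 1968 family interview number (e.g., ER30001) to its sample."""
    if 1 <= id68 < 3000:
        return "SRC"
    if 3001 <= id68 <= 3441:
        return "Immigrant (first interviewed 1997)"
    if 3442 <= id68 <= 3511:
        return "Immigrant (first interviewed 1999)"
    if 5001 <= id68 <= 6999:
        return "SEO"
    if 7001 <= id68 <= 9043:
        return "Latino (first interviewed 1990)"
    if 9044 <= id68 <= 9308:
        return "Latino (first interviewed 1992)"
    raise ValueError(f"1968 ID {id68} is outside the documented ranges")


for id68 in (25, 3100, 3500, 6001, 7500, 9100):
    print(id68, psid_sample(id68))
```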
15.For what years are Latino data available? How do Latino data differ from immigrant data? 
In 1990 the PSID added 2,000 Latino households consisting of families originally from Mexico, Puerto Rico, and Cuba. But while this sample did represent three major groups of immigrants, it missed out on the full range of post-1968 immigrants, Asians in particular. Because of this crucial shortcoming, and a lack of sufficient funding, the Latino sample was dropped after 1995, and a sample of 441 post-1968 immigrant families was added in 1997. In 1999, an additional 70 families were added, for a total of 511 immigrant families as of 1999. These families are included on the files along with the core PSID families.
16.Where can I find information on the 1997-1999 Immigrant Sample? 
Information on the Immigrant Sample is available in the 1997 and 1999 main interview documentation.
17.Codebook information for some variables does not show index information. Why? 
Variables from supplemental files are not yet in the index. We plan to add these variables to the index in the future.
18.Why do some supplemental files have fewer observations than the main family files? 
Some supplemental files were created only for sub-samples. For example, the Disability and Use of Time Supplement (DUST) only collects information on Heads and Wives of a certain age.
19.What identification variables should I download? 
The Data Center automatically includes all appropriate identification variables for your file.
20.How does one analyze data from families from one wave to the next? 

Users often want to look at data from the "same" family in adjacent waves. It is important to understand that there is no absolute definition of the "same" family. Families are made up of individuals who may move into or out of study families from wave to wave, so it is up to the user to decide what he or she means by the "same" family. Option 1 is the strictest: absolutely no change in the composition of the family since the previous wave, meaning that all the individuals in the prior wave are still in the current wave and no one has moved in or out. Option 2 is looser: the user may define the "same" family as one that has the same Head in both waves.

To subset the cases that fit the chosen definition of "same" family, the user will find the Family Composition Change variable most useful. This variable indicates the degree of change in the family since the prior wave's data collection. For option 1, the user would subset the families in the current wave where Family Composition Change = 0. For option 2, the user would subset the families in the current wave where Family Composition Change is 0, 1, or 2.

For 2007, for example, the Family Composition Change variable is ER36007.
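Both options reduce to a membership test on the Family Composition Change value. In this sketch, family records are plain dicts keyed by variable name; the default key is the 2007 variable ER36007, so pass the wave-specific name for other years. The "fam" labels and the value 5 (standing for any code other than 0-2) are illustrative.

```python
FCC_ALLOWED = {
    1: {0},        # option 1: no composition change at all
    2: {0, 1, 2},  # option 2: same Head in both waves
}


def same_families(families, option, fcc_var="ER36007"):
    """Keep current-wave family records that count as the "same" family."""
    allowed = FCC_ALLOWED[option]
    return [f for f in families if f[fcc_var] in allowed]


fams = [{"fam": "A", "ER36007": 0},
        {"fam": "B", "ER36007": 2},
        {"fam": "C", "ER36007": 5}]  # 5: any code other than 0-2
print([f["fam"] for f in same_families(fams, option=1)])  # ['A']
print([f["fam"] for f in same_families(fams, option=2)])  # ['A', 'B']
```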

21.What is the CDS and how does it relate to the PSID? 
The Child Development Supplement (CDS) is one research component of the Panel Study of Income Dynamics (PSID), a longitudinal study of a representative sample of U.S. individuals and the families in which they reside. Since 1968, the PSID has collected data on family composition changes, housing and food expenditures, marriage and fertility histories, employment, income, time spent in housework, health, consumption, wealth, and more.

While the PSID has always collected some information about children, in 1997 it supplemented its main data collection with additional information on children aged 0-12 and their parents. The objective was to provide researchers with a comprehensive, nationally representative, and longitudinal database of children and their families with which to study the dynamic process of early human capital formation. The CDS-I successfully completed interviews with 2,394 families (88%), providing information on 3,563 children. In 2002-2003, CDS re-contacted families in CDS-I who remained active in the PSID panel as of 2001 for CDS-II, and again in 2007-2008 for CDS-III. A new cohort of the CDS was begun in 2014.

Because the CDS is a supplement to the PSID, the study takes advantage of an extensive amount of demographic and economic data about the CDS target child's family, providing more extensive family data than any other nationally representative longitudinal survey of children and youth in the U.S. In addition, the PSID-CDS data are "intergenerational" in structure, containing several decades of data about multiple family members. This structure gives analysts a unique opportunity to link information on children, their parents, their grandparents, and other relatives, taking full advantage of the intergenerational and long-panel dimensions of the data.
22.What information does the CDS collect about its sample children? 

Within the context of family, neighborhood, and school environments, CDS studies a broad array of developmental outcomes including (but not limited to) physical health, emotional well-being, intellectual achievement, and social relationships with family and peers. These outcomes are measured through reliable, age-graded assessments of cognitive and behavioral development and health status indicators obtained from the primary caregiver, a secondary caregiver, the elementary school teacher (for the younger children), and the sample children/youth themselves; anthropometric measures of height and weight of the sample children/youth; a comprehensive accounting of parental (or caregiver) time inputs to children/youth, as well as other aspects of the way the children/youth spent their time; and measures of resources other than time, for example, the learning environment in the home (using the HOME Scale measures), school resources as reported through the National Center for Education Statistics Common Core of Data, and decennial-census-based measures of neighborhood resources. The multi-level, interdisciplinary, and longitudinal nature of the research design facilitates analysis of the relationships between these developmental measures and changes in family structure and living arrangements, neighborhood economic and social conditions, and school resources and programs.

23.Who funds the CDS? 
The CDS is made possible by the generous funding of the National Institute of Child Health and Human Development, the National Science Foundation, and the Economic Research Service of the U.S. Department of Agriculture.

The William T. Grant Foundation, the Annie E. Casey Foundation, and the U.S. Department of Education provided additional funding for CDS-I.
25.Where can I obtain copies of the questionnaires and other study documentation? 
CDS questionnaires are located at the questionnaires and supporting documents page.
26.Do I need to use the sample weights with CDS and TA data? 

The CDS-TA sample was drawn from PSID families with children 0-12 years in 1997. The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples. Both the CDS-TA and PSID samples are probability samples (i.e., samples for which every element in the population has a known nonzero chance of selection). Their combination is also a probability sample. The combination, however, is a sample with unequal selection probabilities, and as a result, compensatory weighting is needed in estimation, at least for descriptive statistics. Weight adjustments are also needed to attempt to compensate for differential nonresponse across waves. Weights supplied on CDS and TA data files are designed to compensate for both unequal selection probabilities and differential attrition.
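To see why weights matter for descriptive statistics, here is a minimal weighted-mean sketch in plain Python. The scores and weights are made-up numbers, not real CDS values; in practice the weight would be the appropriate CDS or TA weight variable, and variance estimation for this complex sample design requires more than a weighted mean.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [10.0, 20.0]
weights = [1.0, 3.0]  # illustrative weights only
print(sum(scores) / len(scores))       # 15.0 (unweighted)
print(weighted_mean(scores, weights))  # 17.5 (weighted)
```

The second case counts three times as much because it stands in for three times as many children in the population, which pulls the estimate toward its value.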

In the 2002 and 2007 CDS demographic files, you will find a set of indicator variables for each module that specify (a) if a case was eligible for that module and (b) if a record exists for that case in the corresponding data file. These variables are helpful to merge onto your Data Center data request if you are merging variables from multiple CDS modules. The sample weight in the Demographic file is adjusted only for nonresponse in the main module, Primary Caregiver (sections A-H, J). The module indicator variables, however, will inform you about item missing data across modules. It is up to you to then decide on your preferred approach for addressing item missing data that results from differential response rates across modules (for example, you may leave it as missing, impute scores, etc.). The TA data files contain wave-specific sample weights.

More documentation on the CDS and TA weights can be found at the documentation page.

27.How do I find information about the CDS Target Child's demographic background?
Every individual in the PSID - including the children - has both an "ID68" (1968 Family Identifier - ER30001 in 2007, for example) and "PN" (Person Number- ER30002 in 2007, for example) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers to find information about the CDS targeted child and caregivers in the PSID data files. Background information about the CDS target child, such as birth date, sex, and relationship to the PSID family household head can be obtained from the PSID individual and sampling variables files. Use the ER30001 and ER30002 combination to select the PSID variables for just the CDS target child sample, or, when you get to the "Output Options" page in the Data Center, after selecting the variables you want, select "CDS Children" at the bottom.
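If you are subsetting downloaded data yourself rather than using the "CDS Children" output option, selecting by the ER30001/ER30002 pair is a set-membership test. The rows below are hypothetical dicts, and "SEX" is an illustrative column name, not a specific PSID variable.

```python
def select_people(records, wanted_keys):
    """Keep individual-file rows whose (ER30001, ER30002) pair is wanted."""
    wanted = set(wanted_keys)
    return [r for r in records if (r["ER30001"], r["ER30002"]) in wanted]


individuals = [
    {"ER30001": 1234, "ER30002": 1, "SEX": 1},
    {"ER30001": 1234, "ER30002": 30, "SEX": 2},
    {"ER30001": 5678, "ER30002": 31, "SEX": 1},
]
cds_children = [(1234, 30), (5678, 31)]
print([r["ER30002"] for r in select_people(individuals, cds_children)])  # [30, 31]
```

Matching on the pair matters: PN 30 in family 1234 and PN 30 in another family are different people.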
28.Are there additional data files from the PSID that would be useful to me as a CDS data user? 

There are two PSID family history files that may be of particular interest to CDS users: the Childbirth and Adoption History File and the Parent Identification File.

Childbirth and Adoption History File: The Childbirth and Adoption History File is specifically designed to facilitate access to detailed information collected since 1985 regarding histories of childbirth and adoption. Variables on this file include the identifiers for each parent and child, month and year of birth for both parent and child, birth order, birth weight and date of death for a child, year of most recent report and number of births/adoptions, etc. Data on this file are structured in a one-record-per-event format, with each record representing a specific childbirth or adoption event.

Parent Identification File: The Parent Identification File summarizes information about parent-child relationships collected from various sources since the 1983 wave of PSID. This file consists of identifier variables that link children with their parents. The file is intended to be used to facilitate linking children's and parents' data records from the Individual File. Linkages can be done from either the child's or a parent's standpoint.

29.How do I obtain information collected in the main PSID about the CDS target child's caregivers? 
There are a large number of variables in the PSID that can be used along with CDS.

Demographic, health, economic, and other family data about PCG (primary caregivers) and OCG (other caregivers) can be found in the PSID data files. Every individual in the PSID has both "ID68" (1968 Family Identifier - ER30001) and "PN" (Person Number- ER30002) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers to find information about the CDS targeted child and caregivers in the PSID data files. These identifier variables are available through a Child to Caregiver Map.
30.How do I find the identification numbers of the CDS target child's caregivers? 
The "child to caregiver map" provides "1968 INTERVIEW NUMBER" (ID68) and "PERSON NUMBER 68" (PN) for CDS individuals. These CDS individuals are the target child, the target child's primary caregiver (PCG) and the target child's other caregiver (OCG), if one exists. Missing data means that the child did not have an OCG for the CDS interview year.

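All CDS files, by default, contain variables ER30001 (1968 INTERVIEW NUMBER) and ER30002 (PERSON NUMBER 68). Since these variables are also in the map file, the map file can be used to merge PCG and OCG data from the PSID individual data onto CDS child-level data in a two-step process: first, match each child to its map record on the child's ER30001 and ER30002; second, use the caregiver's ID68 and PN from that map record to pull the caregiver's PSID data.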
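The two-step merge can be sketched with dictionaries. The caregiver key names in the map (PCG_ID68, PCG_PN) and the AGE column are hypothetical; use the names given in the map file's codebook. The OCG merge works the same way with the OCG keys, and a child with no OCG simply gets None.

```python
def attach_pcg(children, cg_map, individuals):
    """Two-step merge: child -> map row -> PCG's individual-file record."""
    # Step 1 lookup: map rows indexed by the child's (ER30001, ER30002).
    map_by_child = {(m["ER30001"], m["ER30002"]): m for m in cg_map}
    # Step 2 lookup: PSID individual rows indexed by person.
    person_rows = {(r["ER30001"], r["ER30002"]): r for r in individuals}

    merged = []
    for child in children:
        row = dict(child)
        m = map_by_child.get((child["ER30001"], child["ER30002"]))
        # Use the PCG's ID68/PN from the map to fetch the PCG's record.
        row["pcg"] = (person_rows.get((m["PCG_ID68"], m["PCG_PN"]))
                      if m is not None else None)
        merged.append(row)
    return merged


children = [{"ER30001": 1234, "ER30002": 30}]
cg_map = [{"ER30001": 1234, "ER30002": 30, "PCG_ID68": 1234, "PCG_PN": 2}]
individuals = [{"ER30001": 1234, "ER30002": 2, "AGE": 38}]
print(attach_pcg(children, cg_map, individuals)[0]["pcg"])
```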
31.How do I identify siblings in the CDS-II data files? 
There are two steps to locating data for siblings in the CDS-II data files:

In the Demographic Data File, there is a sibling indicator variable that tells you if a CDS target child had a sibling who also participated in the CDS-II data collection.

Automatically appended to your data download is the Family Interview or Identification number for the corresponding PSID main interview. This variable uniquely identifies the family.

Using these two variables, you can locate data on a wide range of information about the target children and their siblings in the CDS.
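Combining the two variables amounts to grouping flagged target children by family. The field names (SIB_IND, FAMILY_ID) and the coding (1 = has a participating sibling) are hypothetical; check the Demographic Data File codebook for the actual variable names and codes.

```python
from collections import defaultdict


def sibling_groups(children, sib_flag="SIB_IND", family_id="FAMILY_ID"):
    """Group flagged CDS-II target children by PSID family interview number."""
    groups = defaultdict(list)
    for child in children:
        if child[sib_flag] == 1:
            groups[child[family_id]].append((child["ER30001"], child["ER30002"]))
    return dict(groups)


children = [
    {"ER30001": 1234, "ER30002": 30, "SIB_IND": 1, "FAMILY_ID": 42},
    {"ER30001": 1234, "ER30002": 31, "SIB_IND": 1, "FAMILY_ID": 42},
    {"ER30001": 5678, "ER30002": 30, "SIB_IND": 0, "FAMILY_ID": 77},
]
print(sibling_groups(children))  # {42: [(1234, 30), (1234, 31)]}
```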

See also the codebook explanation text for the family identification number. There is one such variable for each PSID year in both the individual and family files.
32.How was height and weight measured in CDS-I , CDS-II, and CDS-III? 
In CDS-I, height of the child was measured by the interviewer and weight was reported by the parent. In CDS-II and CDS-III, both height and weight were measured by the interviewer.
33.What is the Behavior Problem Index (BPI)? 
The Behavior Problem Index was originally developed by James Peterson and Nicholas Zill from the Achenbach Behavior Problems Checklist to measure, in a survey setting, the incidence and severity of child behavior problems. The BPI scale is based on the primary caregiver's report of whether each of 32 problem behaviors is often, sometimes, or never true of the target child.
34.What subscales are available on the Behavior Problem Index (BPI)? 
The 32 BPI items are divided into two subscales: 1) a measure of externalizing, or aggressive, behavior and 2) a measure of internalizing, or withdrawn or sad, behavior. The User Guide specifies the individual items that map into the internalizing and externalizing subscales.
35.How is BPI scored? 
We performed a confirmatory factor analysis on our two expected subscales. The results showed that the items grouped into these two factors quite readily, with one variable overlapping on both subscales (as in CDS-I) and two variables not loading at all. We constructed an overall or total BPI score using all 32 items, as well as separate scores for each of the two subscales, internalizing (withdrawn) and externalizing (aggressive). Before scoring, the individual items are recoded so that a score of "1" becomes "0" and a score of "2" or "3" becomes "1". Scores for the total BPI and the Externalizing and Internalizing subscales are sum scores; higher scores imply a greater level of behavior problems. Cases were included if they had valid data on approximately 75% of the items contributing to the BPI indices.
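The recode-and-sum procedure can be sketched as follows. Two details are assumptions, not the official scoring: the missing-data code 9 in the example is illustrative (any value other than 1, 2, or 3 is treated as missing), and the sketch sums the valid items without prorating. Consult the CDS User Guide for the exact treatment of partially missing cases. The same function applies to each subscale given that subscale's item list.

```python
def recode_bpi_item(response):
    """1 -> 0; 2 or 3 -> 1; any other value is treated as missing (None)."""
    if response == 1:
        return 0
    if response in (2, 3):
        return 1
    return None


def bpi_sum_score(responses, min_valid_share=0.75):
    """Sum of recoded items, or None if fewer than ~75% of items are valid."""
    if not responses:
        return None
    recoded = [recode_bpi_item(r) for r in responses]
    valid = [x for x in recoded if x is not None]
    if len(valid) / len(responses) < min_valid_share:
        return None
    return sum(valid)


print(bpi_sum_score([1, 2, 3, 1]))  # 2
print(bpi_sum_score([3, 9, 2, 2]))  # 3 (3 of 4 items valid)
print(bpi_sum_score([2, 9, 9, 1]))  # None (only 2 of 4 valid)
```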
36.What is the HOME-SF? 

The Home Observation for Measurement of the Environment-Short Form from the Caldwell and Bradley HOME Inventory is used as a measure of cognitive stimulation and emotional support that parents provide to their children. The particular items used in the PSID Child Development Supplement were taken directly from the National Longitudinal Survey of Youth, Mother-Child Supplement so that the scales would be as similar as possible. The HOME-SF items include both parent/caregiver-reported items and interviewer observations of the home and neighborhood environment. The HOME-SF is divided into four parts:

  • Infant/Toddler (IT) HOME, designed for use during infancy (birth to age three);
  • Early Childhood (EC) HOME, designed for use between 3 and 6 years of age;
  • Middle Childhood (MC) HOME, for use between 6 and 10 years; and
  • Early Adolescent (EA) HOME, designed for use from 10 to 15 years old.

Additional information about the HOME-SF in the CDS can be found in the CDS User Guide.

37.How is the HOME-SF Scored? 
We have included three scores for HOME-SF for each age module appropriate for CDS-II and CDS-III data: 1) a total raw score, 2) an emotional support subscale raw score, and 3) a cognitive stimulation subscale raw score. The total and subscale raw scores for the HOME-SF are sums of the recoded individual item scores and vary by age group, because the number of individual items varies with the age of the target child/youth.
38.What is the Woodcock-Johnson Revised Test of Achievement? 
The Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R) provides a normed set of tests for measuring cognitive abilities and academic achievement. In the CDS-I, CDS-II and CDS-III, we selected three subtests as a measure of reading and math achievement: the Letter-Word, the Passage Comprehension, and the Applied Problems tests (the Calculation test was additionally administered in CDS-I). These scales can be used individually or, in the case of the four subtests, combined to create scores for Broad Reading and Broad Math. When applicable, the Spanish version of the WJ-R (Batería-R, Form A) was used for children whose primary language was Spanish.
39.How are the Woodcock Johnson Tests scored? 
The Woodcock-Johnson Revised (WJ-R) tests of achievement have standardized administration and scoring protocols. The tests are designed to provide a normative score that shows the CDS target child's reading and math abilities in comparison to the national average for the child's age. The normed scores are constructed based on the child's raw score on the test (essentially the number of correct items completed) and the child's age to the nearest month. Raw scores are charted on normative tables based on the child's age and the percentile into which the child falls. More information on scoring is provided in the CDS User Guides.
40.Why isn't there a Broad Math Score for CDS-II? 
In CDS I, we included two Woodcock Johnson - Revised math-skill tests: Calculations and Applied Problems. A broad math score was constructed based on these two tests. In CDS II, we only included the Applied Problems; hence, no broad math score can be constructed - just a score for applied problems.
41.How do I know if PSID or CDS data files have been updated? 
File release information is available through the News section of our website. You can also sign up to have the news delivered to your email by logging in and choosing to receive updates on the "Settings" page.
42.Why won't the Data Center let me create a file merging CDS Time Diary data files with other data? 
Only one file is allowed in your cart if CDS Time Diary is selected. To add CDS Time Diary variables to your cart, you must select variables from just one file, and there cannot be any variables from other files in your cart. Time diary data are not at an individual or family level (like other data in the data center), so the data center does not "know" how to merge them.
43.How does the geographic information in the public release files differ from the restricted files? 
The public release files, which can be downloaded directly from the PSID website, contain geographic information of a more generalized nature, such as region and state of residence. A collapsed version of the Beale rural-urban code is available for some years in the data center, as well as in the supplemental files.

These data will meet the needs of most users. Users in need of more specialized geographic information may want to request use of the restricted PSID Geocode Match files. These files include the identification codes necessary to link data from the PSID annual family files to Census data. This linkage allows the addition of information regarding the characteristics of the geographic area in which individuals and families lived (e.g., the neighborhood and/or the labor market area) to the PSID individual- or family-level data. This should in turn allow investigation of the effects of non-family "context" variables on family and individual outcomes.

In the past, we provided selected variables from the Census in aggregated forms (i.e., Census Extract Files); however, we no longer support these files. In recent years, there has been a rapid growth of external sources that provide an increasing variety of measures of the neighborhood environment.
44.Who may obtain restricted data? 
Individuals who conduct scientific research, hold a full-time, permanent, doctoral-level faculty appointment, and obtain approval of their research and data protection plan from a human subjects institutional review board may request use of restricted data.
45.What application materials need to be submitted to obtain restricted data? 
Materials needed to initiate a restricted data contract include:
  1. research plan - outlines and validates the need for restricted data use
  2. restricted data protection plan - describes in detail the work environment in which the data analysis will occur, the qualifications of the personnel that will perform the work, and the steps in place to ensure that the data will remain secure and confidential
  3. IRB approval from your institution
  4. Curriculum vitae (CV) - for each person involved in the research project
  5. PSID Order Request Form - indicates which restricted data files are requested for use
  6. Contract - legal agreement between your institution and the University of Michigan - Panel Study of Income Dynamics [PSID]
  7. Non-refundable administrative fee of $750.00 USD.
46.How long does the process take to obtain restricted data? 
Once all application materials have been submitted, review by the PSID restricted data committee usually occurs within two weeks. Once the application is approved, the contracts must be signed and submitted along with the non-refundable administrative fee. After these steps have been completed, the data are provided via secure FTP or shipped on a CD. The average processing time is between one and two months, and on rare occasions as long as six months. The vast majority of delays result from contract language change requests by the requesting institution.
47.May more than one type of restricted data set be requested? 
Yes, more than one type of restricted data may be requested; however, the researcher must document in their research plan the need and purpose for use of more than one type of data set.
48.May the restricted data be used for more than one research project? 
No. Each contract is project specific and is validated through the research plan.
49.May more than one person use the restricted data for their own specific project? 
Multiple research projects are not permitted on one contract. There may be multiple persons involved in a research project, but all must be working to achieve a common goal/outcome which is described in the research plan.
50.Can other researchers who are not members of my institution work on my restricted use research project? 
Yes, but the restricted data may only be accessed and used by the personnel named on the contract, at the contract institution, and within the work environment described in the data protection plan.
51.Is it possible for others outside my institution to also have access to the restricted data? 
Yes. Each PhD-level faculty member not at the receiving institution must enter into a separate contract. This requires that they submit all the required documentation, sign a separate contract, and pay their own non-refundable fee. When this occurs, each contract holder is provided with the restricted data.
52.I am a graduate student. May I obtain the PSID restricted data files? 
Graduate students must work with the restricted data under the supervision of a full-time, permanent, doctoral-level faculty member at their institution. The faculty advisor is named as the Investigator on the contract and is responsible for ensuring that all confidentiality and security measures are upheld. Should a faculty advisor leave the receiving institution, they are responsible for notifying the PSID of this change. Graduate students must then obtain a new faculty advisor, who will be named on the contract until their research project is completed.
53.My research project is quite complex and may take some time to complete. Is there a time limit on a contract? 
Contracts are limited to three years with extension paperwork due every six months, as requested by the PSID contract administrator.
54.My role will be an acting faculty advisor to a graduate student. What are my responsibilities as faculty advisor? 
The faculty advisor is named as the Investigator on the contract, assumes responsibility for maintaining the protection of the data, and is responsible for ensuring that the contract paperwork is kept current. This includes returning a completed Request for Extension form to the PSID every 180 days while the contract remains active. The faculty advisor is also responsible for ensuring that security measures are kept in force for all restricted data work and restricted data storage, and for seeing that contract closure occurs once the dissertation has been published.
55.As a faculty advisor, am I able to use my student's restricted data through their contract? 
Not for your own research. Each contract covers a single research project, but multiple contracts are permissible as long as an individual contract is submitted for each research project. A contract will be processed and established for each research project requested, and each contract has its own paperwork and fee.
56.My institution does not have an IRB. How may we meet this application requirement? 
The PSID has a list of external IRB agencies that researchers may use to fulfill this application requirement. These agencies do charge a fee for their services, and the researcher must work directly with them to obtain approval. Contact the help desk for further assistance.
57.Our IRB only responds via email when approval has been obtained. How can we provide the PSID with this information? 
Researchers may forward the email approval to the PSID.
58.Our IRB only grants approval for one year. Since contracts are for three years, how can we fulfill this requirement? 
The researcher is responsible for maintaining active IRB approval during the entire course of the contract and must submit updates or renewals to the PSID.
59.One application requirement is the Curriculum Vitae [CV]. Whose CV should be submitted with the application materials? 
Everyone involved in the research project should submit a CV. Graduate students should submit their CV as well as the most recent CV for their faculty advisor.
60.My research project has received approval from the PSID restricted data committee. What is required next? 
Researchers or graduate students are required to obtain signatures on three original contracts and submit them to the PSID so they may be fully executed through the University of Michigan. One original signed contract will be returned to your institution. To expedite the contract process, researchers are encouraged to include payment within the contract packet sent to the PSID.
61.We are confused regarding the signatures on the contract - who signs where? 
The researcher signs as Principal Investigator. Co-investigators may also sign, where applicable. There is also a Supplemental Agreement form where Computing Support and Research Assistants may sign when appropriate.
62.Who signs in the Receiving Institution Section on the contract? 
This section should be completed by those personnel who are designated by your institution to enter into contract on behalf of the Receiving Institution. Usually this is the General Counsel's Office, or Legal Counsel. At times the Director of Purchasing may be designated to sign since the contract is being issued to obtain restricted data. If further clarification is needed, please contact the help desk for assistance.
63.Our state has its own legal requirements regarding entering into contracts, is the contract language negotiable? 
Possibly. Please contact the help desk about any concerns regarding contract language issues. It is important to understand that any requests for language modification can create significant delays in fully executing the contract and shipment of the data.
64.What methods of payment are acceptable to cover the administrative fee? 
Payment may be received through personal check, purchase order with institutional check or by personal or institutional credit card.
65.My institution will not pay for the restricted data without an invoice. How may a request for invoice be made? 
Invoices may be obtained by emailing the help desk.
66.What is the purpose of the administrative fee? 
The administrative fee covers contract administration expenditures, production of CD-ROMs, and four hours of consulting time. This fee also partially covers the cost of administering the contract over the three-year span.
67.Can the administrative fee for using the restricted use data be waived or the amount negotiated? 
The administrative fee is not negotiable because the PSID is federally funded and is required to provide the same services and follow the same regulations and guidelines for all contract holders.
68.As a researcher, what are my responsibilities during the contract period? 
During the three-year period, the PSID will send the primary contract holder a Request for Extension every 180 days, which requests updated contact information and must be returned to the PSID to keep the contract active. After the three-year period, or at the conclusion of the project if this occurs earlier, the data must be destroyed or returned to the PSID. The PSID will hold the data for the contract holder as long as needed.
69.What paperwork will be required for the new contract [second contract]? 
The new contract will require all the same application documents as the first contract. For ease of transition, updated or modified documents are acceptable. An updated or new IRB approval must also be submitted. Once these are submitted, a new contract must be signed and the $750 non-refundable fee also submitted. The researcher may request updated restricted data if a new version has been released since their original project began.
70.I have received the restricted use data. What is next? 
Once you have checked that you received the data you expected, please email the help desk to confirm receipt of the data.
71.Are there any other requirements now that I have the restricted use data and my contract is now active? 
Yes. Researchers must submit a Request for Extension Form every 180 days [twice per year] until contract closure. There will be a total of five Requests for Extension submitted during a three-year contract. The 180-day period begins from the date your shipment was received. This form notifies the PSID that there are no changes to your contract, that your contact information has not changed, and that your research is still actively being conducted. Graduate students will need to work with their faculty advisors for signatures and for e-mailing the form to the PSID.
72.What happens if I do not submit the Request for Extension Form? 
Your contract is considered Out-of-Compliance. Contracts that are active and in compliance are eligible to request Restricted Data Set updates. The updated data is provided at no additional charge to these researchers; however no such updates are provided in the event that a contract is out of compliance. Also, failure to submit this documentation during a contract could jeopardize a researcher's future request for restricted data. Sometimes unavoidable circumstances can cause delays in the submission of the Extension form and PSID staff are more than willing to work with institutions to facilitate the process.
73.Another researcher wants to use my restricted use data files. Is this possible? 
No. Under no circumstances can the restricted data be shared with individuals who are not named on the contract. Contracts are "project specific" - each researcher must obtain their own contract for their research to ensure that they maintain respondent confidentiality.
74.I have completed my research using the restricted use data. What are the next steps that need to be taken? 
Contact the help desk to coordinate contract closure paperwork.
75.Can I keep all my data files that were created with the restricted data files? 
All restricted data and derived files must either be destroyed or returned to the PSID for secured storage. Many researchers elect to return their files to the PSID for secured storage allowing them to use the data in the future. PSID Help will coordinate with a researcher the paperwork requirements and return of previously created data sets and derived files.
76.We have destroyed the restricted data and all derived files. What is next? 
The Certificate of Compliance Form needs to be completed, signed and returned to the PSID. All data sets may also be sent to the PSID at this time. For assistance in completing this paperwork, contact PSID Help.
77.The data files that are posted for each new wave are called Public Release. What does Public Release mean?  

All Public Release data files have been processed and edited, and should meet the research needs of all users.

Over the past several years, the PSID staff, using Computer Assisted Telephone Interviewing (CATI) technology and companion processing software, have significantly improved the quality and reliability of the data files and the timeliness of their release. We now refer to the files posted for each new wave as Public Release Data. Note that:

1. Longitudinal data are subject to revision based on the most recent information received from individuals and families. New information that we find during family composition and economic editing in one wave may require revisions to previous waves. As additional data are collected through time on our two-year collection cycle, prior files may be edited in light of the new information. Both the values of the variables themselves and the relationships of individuals to the families to which they are connected may be edited. Normally such changes are made only for a small number of cases.

2. An extensive set of computed or generated variables is included in the Public Release Data. As time and resources allow, we occasionally add selected new generated variables for later release.

Because the PSID data files, like the data files from any complex longitudinal study, are subject to minor changes and subsequent updated releases (due primarily to economic and family composition editing), we strongly recommend that users save all data files they download from this site on which their research analyses depend. Only the most current data files are retained by PSID staff for distribution.

78.Some older documents reference Public Release II and Public Release I data; what does that mean? 
The term "Public Release I" used to refer to files released for general public use after they had been reviewed for data quality and for consistency in both the reported family listing and the relationships among family members (this review process is called "family composition editing").

The term "Public Release II" was previously used to refer to files which had undergone additional data checks to correct a very small number of cases and had been formatted in a more convenient form.

Because of successive improvements in our Computer Assisted Telephone Interviewing (CATI) software that PSID began using in 1993, the quality of the Public Release I files improved in recent waves, allowing the use of these data with confidence. There is now no longer a necessity to release two versions of the Public Release files.
79.What is the definition of a main family, a reinterview family, and a split off? 
A reinterview family is a family unit that was interviewed in the prior wave.

A main family is one that is the source of a splitoff family (a new study family formed by a sample member who moves out and forms his or her own family unit). In some divorce or separation situations, both resulting families will contain sample members, so both will be interviewed. We interview the first spouse we are able to contact as the main family, while the other spouse will be in the splitoff family. In the case of children leaving home, the main family is almost always the parental family.

A split-off family consists of a person or group of people (at least one of whom is a "follow" person of any age) who moved out from a main family since the prior wave's interview to form a new, economically independent family unit living in a separate housing unit. Several criteria must be met for a split-off to occur. In addition to having moved out since the prior wave, and to being 'followable', the person or group of people in general may not have moved to an institution such as college or prison or to another family unit within the panel study. Moreover, the person or group of people who have moved out and formed their own family unit must be economically independent from the family unit from which they split off. These are general rules, however, and sometimes unique situations arise that determine whether a person or group of persons becomes a split-off. For example, while moving to an institution such as college does not generally meet the criteria for becoming a split-off, if the person is working, paying their own living expenses, and paying their own educational expenses in addition to attending school, then this person could be interviewed as a split-off. The living situation and interview data for each and every possible split-off case are first reviewed before split-off status is granted. Note that a splitoff family is only designated as a splitoff in the wave in which the family is newly formed and interviewed for the first time. In subsequent waves, they are considered a reinterview family.
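The general split-off rules above read like a checklist. The Python sketch below encodes them as a first-pass screen, assuming a hypothetical `MoverOut` record whose field names are illustrative and are not PSID variable names. It cannot replace the case-by-case review PSID staff perform before granting split-off status.

```python
from dataclasses import dataclass


@dataclass
class MoverOut:
    """Hypothetical summary of a person or group who left a main family since the prior wave."""
    moved_out_since_prior_wave: bool
    includes_followable_person: bool       # at least one "follow" person, of any age
    separate_housing_unit: bool
    economically_independent: bool
    moved_to_institution: bool = False     # e.g., college or prison
    moved_into_panel_family: bool = False  # joined another family unit already in the study
    self_supporting_student: bool = False  # working and paying own living and school expenses


def meets_general_splitoff_rules(m: MoverOut) -> bool:
    """First-pass screen against the general rules; actual status is granted only after review."""
    if not (m.moved_out_since_prior_wave and m.includes_followable_person):
        return False
    if m.moved_into_panel_family:
        return False
    # Moving to an institution generally disqualifies, except the self-supporting student case.
    if m.moved_to_institution and not m.self_supporting_student:
        return False
    return m.separate_housing_unit and m.economically_independent
```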
80.What is the difference between a family unit (FU), a household unit (HU), and a family unit member? 

In the PSID study, we are attempting to learn about our sample members, and the families in which they live. Each of these families is called a family unit (FU). The FU is defined as a group of people living together as a family. They are almost always related by blood, marriage, or adoption. And they must all be living in the same HU (see below).

Occasionally, unrelated persons can be part of an FU. They need to be permanently living with the family and share both income and expenses.

Any person in a study family is a family unit member. The term "other family unit member" (OFUM) is used for members who are not the Head or Wife/"Wife".

The household unit (HU) is the physical dwelling where the members of the FU reside. It can be a house, townhouse, apartment, a room in a rooming house, even a tent or a car.

Not everyone living in an HU is automatically part of the FU. There may be other people living in the HU temporarily who do not meet the criteria of relatedness and economic integration. The PSID data is about FU Members only.

81.Who is a Sample Member and what is Follow Status? 

Sample Members are individuals who were living in the original FU at the time of the very first interview, and their lineal descendants born after 1968. (For subsequent samples, such as the immigrants, the year of the first interview serves as the base for determining who is an original sample member, and all individuals present in the family at that time qualify.)

Follow status indicates whether we are interested in continuing to interview an individual. In general, sample members are always considered Followable. Non-Sample Members can be Followable too, if they represent a population of current interest. For example, we have in the past followed such people as Non-Sample parents of sample children who were aged 25 or younger.

You can tell who is a sample member by looking at the individual's Person Number and Follow Status. Original Sample Members who were living in the original study FU in the first year of interviewing were given Person Numbers in the range of 001-019. Any Head's Spouse in the original interviewing year who was living in an institution was given a Person Number of 020. In addition, children of the Head (and Wife if present) who were under age 25 and in an institution the first year were considered Original Sample Members and given Person Numbers in the range 021-029. All of these people are followable.

Individuals who were born into a sample family after the first interviewing year and have a sample parent are considered "born-in Sample Members" and receive Person Numbers in the range of 030-169. All born-in Sample Members are followable.

Some individuals who qualify as sample members (because they have a sample parent) are not born into a study family, but move in later. These "Moved in Sample Members" have Person Numbers of 170 or greater and are Followable.

All other people who have ever lived in a PSID family are not sample individuals. They also receive Person Numbers of 170 or greater, but are not Followable.
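The Person Number ranges above translate directly into a lookup. The following Python sketch assumes you already have an individual's Person Number and a boolean Follow Status (reading them from the data files is not shown) and uses the ranges as described for the original sample. Because some Non-Sample Members have also been followed, splitting Person Numbers of 170 or greater by Follow Status alone is only a heuristic.

```python
def classify_person_number(person_number: int, followable: bool) -> str:
    """Map a Person Number (plus Follow Status) to the sample categories described above."""
    if 1 <= person_number <= 19:
        return "original sample member"  # living in the original study FU
    if person_number == 20:
        return "original sample member (institutionalized spouse of Head)"
    if 21 <= person_number <= 29:
        return "original sample member (institutionalized child)"
    if 30 <= person_number <= 169:
        return "born-in sample member"
    if person_number >= 170:
        # Moved-in sample members are followable; other people in this range generally are not.
        return "moved-in sample member" if followable else "non-sample member"
    raise ValueError(f"unexpected Person Number: {person_number}")
```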

82.What is the difference between response and nonresponse family unit members? 

Response family unit members are those residing in an interviewed family at the time of interview. Nonresponse family unit members are those not residing in an interviewed family at the time of interview; they may have attrited, not yet appeared in the study, or not yet been born by a particular wave.

The phrase "main family nonresponse" means that both the individual and his or her family have at that time become lost to our study, although either or both may reappear in the study in subsequent waves. In the wave just prior to becoming nonresponse, the individual was connected with a family interviewed by our study; thus, both family and individual data are available for that prior year, and the individual's Sequence Number at that time was 01-59. However, data were collected for neither the individual nor his or her family in the nonresponse wave. The data for the wave in which nonresponse occurs (and for all subsequent waves, until and unless the individual reappears as a member of a responding family unit, including a recontact family) are zeroes except for the variables for type of individual record and reason for nonresponse and, if the individual was selected for recontact, follow status and reason for following the individual.

In contrast, mover-out nonresponse individuals have left a family that was still in the study. Since such individuals were usually present in that family for at least part of the calendar year preceding nonresponse, they have some additional nonzero data for the wave in which they became nonresponse, such as part-year income information. In later waves, mover-out nonresponse individuals are treated in two ways, depending on why they left the family. Those who moved out to institutions have several variables (Sequence Number, age, sex, Relationship to Head, type of individual and reason for nonresponse) with nonzero values, although income, housework, and other individual-level variables are filled with zeroes. Eventually, such an individual may (a) become response by moving into a family or by becoming a splitoff, (b) move from the institution and remain mover-out nonresponse (shown when Sequence Number=71-89), or (c) become main family nonresponse because the family itself became nonresponse. (See the preceding paragraph for an explanation of main family nonresponse data records.) The other type of mover-out nonresponse individual has either moved out, but not to an institution, or died. Later waves of data contain zeroes, as described above for main family nonresponse, unless they subsequently rejoined a responding family or were selected for recontact.

The data are released as one file, which includes not only those individuals with nonzero data records in the current data collection year (i.e., current response plus mover-out nonresponse), but also all other individuals, namely those who have zero data records for the current year (i.e., current-year main family nonresponse and all nonresponse of either kind from earlier waves).
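Because the individual file mixes nonzero and zero data records, a common first step is to check each person's Sequence Number for the wave. The Python sketch below uses only the ranges mentioned in this answer (01-59 while connected with an interviewed family, 71-89 for mover-out nonresponse, and zero for records filled with zeroes); the `people` list and its `seq` key are hypothetical, and any other value should be checked against the codebook.

```python
def sequence_number_status(seq: int) -> str:
    """Rough reading of one wave's Sequence Number, limited to the ranges named in this FAQ."""
    if seq == 0:
        return "zero record"  # e.g., main family nonresponse or not yet in the study
    if 1 <= seq <= 59:
        return "connected with an interviewed family"
    if 71 <= seq <= 89:
        return "mover-out nonresponse"
    return "check codebook"


# Hypothetical records for one wave; real files use wave-specific variable names.
people = [
    {"id": "A", "seq": 1},
    {"id": "B", "seq": 0},
    {"id": "C", "seq": 72},
]
current_records = [p["id"] for p in people if sequence_number_status(p["seq"]) != "zero record"]
```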

83.How is Head defined in the PSID? 
Within each wave of data, each FU (family unit) has one and only one current Head. Originally, if the family contained a husband-wife pair, the husband was arbitrarily designated the Head to conform with Census Bureau definitions in effect at the time the study began. The person designated as Head may change over time as a result of other changes affecting the family. When a new Head must be chosen (see conditions for selecting a new Head below), the following rules apply:
The Head of the FU must be at least 16 years old and the person with the most financial responsibility for the FU. If this person is female and she has a husband in the FU, then he is designated as Head. If she has a boyfriend with whom she has been living for at least one year, then he is Head. However, if the husband or boyfriend is incapacitated and unable to fulfill the functions of Head, then the FU will have a female Head.
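The selection rules above can be written as a small function. This Python sketch assumes a hypothetical `Member` record with a numeric `financial_responsibility` score (a stand-in for the judgment described above, not a PSID variable) and the names of any spouse or cohabiting partner in the FU. It applies only when a new Head must be chosen.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    age: int
    sex: str                         # "M" or "F"
    financial_responsibility: float  # hypothetical score; higher means more responsibility
    incapacitated: bool = False
    spouse: Optional[str] = None     # name of a husband/wife living in the FU
    partner: Optional[str] = None    # name of a cohabiting boyfriend/girlfriend in the FU
    years_cohabiting: float = 0.0


def choose_new_head(fu: List[Member]) -> Member:
    """Apply the FAQ's rules for choosing a new Head of a family unit."""
    by_name = {m.name: m for m in fu}
    eligible = [m for m in fu if m.age >= 16]
    if not eligible:
        raise ValueError("no FU member is at least 16 years old")
    # Start from the person with the most financial responsibility.
    candidate = max(eligible, key=lambda m: m.financial_responsibility)
    if candidate.sex == "F":
        man = None
        if candidate.spouse in by_name:
            man = by_name[candidate.spouse]
        elif candidate.partner in by_name and candidate.years_cohabiting >= 1:
            man = by_name[candidate.partner]
        # A husband or long-term boyfriend becomes Head unless he is incapacitated.
        if man is not None and not man.incapacitated:
            return man
    return candidate
```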
84.Who are the Husbands of Heads? 
Husbands of Heads are extremely rare in the study. Early on, an FU might have a Female Head and a Husband of Head (instead of Head and Wife) if the Head was incapacitated in some way. (He may be still in the FU or he may have moved to an institution.) There are also a few cases where the female half of a married couple insists on being the Head, or where the male half of a married couple is adamant about not wanting to have his information included in the study. A Husband of Head has the Relationship to Head code 9 or 90.
85.Are cohabitors treated differently from legally married couples? 

In the PSID, an opposite sex romantic partner who has moved into an FU less than 1 year prior to the interview is labeled a boyfriend or girlfriend (code 88) in that first wave that he or she appears in the study. If the cohabitor has moved in at least one year before the interview, the couple will be coded as Head and "Wife" (code 22 from 1983 on). In the next wave, if the boyfriend or girlfriend is still living in the FU and the couple is still unmarried, they are recoded as Head and "Wife" (That is, a male head will remain head but his girlfriend will be labeled "Wife" or a Female Head will become "Wife" while her boyfriend will become Head).

Boyfriends and girlfriends are treated like family members who are not Heads or Wives/"Wives": considerably less information is obtained about them. In waves since the late 1970s, information typically gathered for Wives has also been gathered about "Wives".

Starting in 1983, the Relationship to Head (RTH) code allowed for differentiation between legal Wives and long-term female cohabitors. However, first year cohabitors can be detected prior to 1983 with a little bit of work. For example, their RTH would be 8 (nonrelative), their gender would be opposite that of Head's, and in subsequent years they may become Wives or Heads, while the Head would stay as Head or become a Wife. Anyone fitting this pattern can be decisively identified as a cohabitor. PSID has not distinctively labeled same sex cohabitors.
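The pre-1983 detection pattern described above can be expressed as a simple check. In this Python sketch the current-wave Relationship to Head is the numeric code (8 = nonrelative, as stated above), while later-wave roles are passed as the plain labels "Head" and "Wife" to avoid assuming any other numeric codes; the function and argument names are hypothetical.

```python
def looks_like_first_year_cohabitor(rth: int, sex: str, head_sex: str,
                                    later_role: str, head_later_role: str) -> bool:
    """Pre-1983 pattern: a nonrelative (RTH 8) of the opposite sex to the Head
    who later appears with that Head as a Head/Wife pair."""
    if rth != 8 or sex == head_sex:
        return False
    # Either the Head stays Head and the partner becomes Wife, or the roles swap.
    return (later_role, head_later_role) in {("Wife", "Head"), ("Head", "Wife")}
```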

86.Why are data available as "Packaged" if they are also in the Data Center? 
Before the Data Center was created, PSID data were distributed as "packaged" files. Because some users prefer the packaged files, we continue to provide data in this format.
87.How long are my data available for download after they are created? 
Data files are deleted from our servers when they are 7 days old. After that, you can re-create your data file by logging into the Data Center and selecting "Previous carts".
88.How does the amount of data collected in each wave vary by family unit members? 
In general, a substantial amount of detailed data is collected for the Head, and Wife/"Wife" if present. Considerably less detail is collected for other family unit members (OFUMs).
89.What data are available in the area of housing? 
The PSID collects many data elements about housing, including housing type, characteristics, ownership, tax, insurance, etc. A list of such items collected in each wave is available here.
90.Where can I obtain information regarding release dates for files? 
File release information is available through the News section of our website.
91.How does the PSID distinguish between main and secondary jobs in the data files? 

Up through the 2001 interviewing year, the PSID distinguished between Main and Extra jobs. Someone could not have an Extra job unless he/she held a Main job during the same time period. The extra job must be held simultaneously with the main job. We made this distinction between main and extra jobs throughout. If two (or more) employers overlapped, the interviewer was supposed to ask which was the main one during that time and note in an open ended question the overlap and the hours and earnings of both jobs. Then this overlap period was to be included in the extra job sequences (BD82-BD106/CE74-CE98). Those who are only temporarily laid off are still employed at a main job and, therefore, could have an extra job during that time period. However, those who are unemployed, whether looking or not, have no main job employer during the time in question. Hence, any small job they may have is considered a main job--since it's the ONLY job. Use the month strings and dates of beginning and ending employment in the work history to tell whether time at B/D72-74a or C/E64-66a is temporary layoff or unemployment.

Beginning with the 2003 interviewing year, the PSID dropped the main vs. extra job distinction as defined above. Jobs are now classified as "current main job", "most recent main job" or "other" job. If someone reports 2 or more current jobs, or 2 or more recent jobs that ended at the same time, the interviewer asks which job he/she considers his/her main job. That one is listed as the current (or most recent) main job. Any other job is listed as an "other" job. A job can be an "other" job even when it does not overlap with a current or most recent main job. This situation could arise, for instance, when someone reports two jobs, with the current main job beginning before the old ("other") job ended.

92.What information about physical and mental health is collected by the PSID? 
The PSID contains a wealth of information that can be used to study the health of Americans and their family members. Information collected in the main interview is summarized here. Health information collected in the Child Development Supplement is summarized here.
93.How has the occupation-industry code classification system changed? 
The PSID used a one-digit occupation code, and later a two-digit code, until 1981, when the three-digit 1970 Census code became standard for the main jobs of employed Heads and Wives. It was also used for the most recent jobs held by Heads and Wives who were currently unemployed and looking for work, and for any job held in 1980 by a Head or Wife who was currently retired or no longer in the labor force. Starting in 2003, all occupation-industry data have been coded using the three-digit 2000 Census code. A retrospective coding project used the 2000 Census codes to code the first occupation and industry of all Heads and Wives as of 2003, as well as the occupations and industries of their fathers and mothers.
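When pooling occupation or industry codes across waves, it helps to know which scheme applies to each interviewing year. The Python lookup below is a sketch based only on the timeline above and covers the main-job codes for Heads and Wives. The exact year the PSID moved from one-digit to two-digit codes is not given here, so all pre-1981 years are grouped together.

```python
def occupation_coding_scheme(interview_year: int) -> str:
    """Return the occupation-industry coding scheme for an interviewing year
    (main jobs of Heads and Wives, per the timeline above)."""
    if interview_year < 1968:
        raise ValueError("PSID interviewing began in 1968")
    if interview_year >= 2003:
        return "2000 Census three-digit"
    if interview_year >= 1981:
        return "1970 Census three-digit"
    return "PSID one- or two-digit"
```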
94.In some cases there are discrepancies from wave to wave for the age of the individual. Why is this? 
Ages of individuals are asked and reported in each wave of the study. But interviews are seldom taken exactly twelve months apart for the same family from wave to wave. In fact, a family responding early in the interviewing period one year might respond late in the next year’s interviewing period, with 18 or more months between interviews even when interviewing was annual (1968-1997). Conversely, a late responder in one wave could be an early responder in the next wave. Since the PSID transitioned to biennial interviewing (1999 through the present), the gap can widen even further. Because of interview dates, an individual may well appear to have aged excessively or not at all. Individuals’ ages or birthdates can also be misreported. Consistency checks for age discrepancies have always been done internally, but reported ages are not altered if it cannot be determined which age is correct.
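A short calculation shows why reported ages can jump by two years, or not change at all, between annual interviews. The Python sketch below uses a hypothetical birth date and hypothetical interview dates and computes age in completed years on each date.

```python
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in that calendar year
    return years


birth = date(1960, 6, 15)  # hypothetical respondent

# Early interview one year, late interview the next (18 months apart): age rises by 2.
jump = age_on(birth, date(1991, 9, 1)) - age_on(birth, date(1990, 3, 1))

# Late interview one year, early interview the next (8 months apart): age does not change.
no_change = age_on(birth, date(1991, 3, 1)) - age_on(birth, date(1990, 7, 1))
```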
95.How do I cite my use of PSID data? 
PSID reminds data users to cite the data and acknowledge our funding source in all publications using the data.

Citation: Panel Study of Income Dynamics, public use dataset [restricted use data, if appropriate]. Produced and distributed by the Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI (year data were downloaded).

Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.

Effective May 25, 2008, anyone submitting an application, proposal, or progress report to the NIH must include the PubMed Central reference number (PMCID) or NIH Manuscript Submission reference number when citing applicable articles that arise from their NIH-funded research. In consideration of this policy, PSID requests that all journal articles based on analysis of PSID data or its supplements (either public or restricted-use) receive a PMCID. Journal articles must be submitted to PubMed Central to receive a PMCID. The method of PubMed Central submission and the Investigator's responsibility for submission depend on the journal and its publisher:

  1. Some journals automatically submit published articles to PubMed Central.

  2. Some journal publishers may submit articles to PubMed Central automatically or
    upon request by the author.

  3. If neither the journal nor the journal publisher will submit the article to PubMed
    Central, the Investigator will be responsible for the submission. For detailed instructions
    on the process of submitting a journal article to PubMed Central, please see the NIH

Researchers with PSID restricted-use contracts should include PMCIDs in the list of PSID publications submitted with their biennial reports.

Researchers using PSID public-use data should send citations of PSID-based publications they have authored that have PMCIDs to
96.When is data collected for each wave of the study? 
The interview period (field season) runs roughly from March through November; in a few years it extended into December. If you need to know when a specific interview was conducted, the Date of Interview variable in the dataset gives the month and day of the interview.
97.How often are main interview data collected for the PSID study? 
Between 1968 and 1997, data were collected every year. Starting in 1999, the PSID has collected data biennially (i.e., every other year). All waves of data from 1968 onward are available on the website; each wave's public release file is posted as soon as editing and processing are complete.
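When looping over waves (for example, to build a list of cross-year files), a small helper can generate the main-interview years described above. This is a sketch: `psid_wave_years` is a hypothetical helper, and `last_year` is an assumption the analyst supplies, since the most recent released wave is not stated here.

```python
def psid_wave_years(last_year: int) -> list[int]:
    """Main-interview wave years: annual 1968-1997, then biennial from 1999."""
    annual = list(range(1968, 1998))                 # 1968 through 1997 inclusive
    biennial = list(range(1999, last_year + 1, 2))   # 1999, 2001, ... up to last_year
    return annual + biennial

waves = psid_wave_years(2003)
print(len(waves), waves[-5:])  # 33 [1996, 1997, 1999, 2001, 2003]
```

Note that there is no 1998 wave: the annual series ends with 1997 and the biennial series begins with 1999.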
98.Why does the PSID provide weights for analysis? 
The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples. Both samples are probability samples (i.e., samples for which every element in the population has a known nonzero chance of selection). Their combination is also a probability sample. The combination, however, is a sample with unequal selection probabilities, and as a result, compensatory weighting is needed in estimation, at least for descriptive statistics. Weight adjustments are also needed to attempt to compensate for differential nonresponse in 1968 and subsequent waves. Weights supplied on PSID data files are designed to compensate for both unequal selection probabilities and differential attrition.
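The effect of unequal selection probabilities on a descriptive statistic can be seen with a weighted mean. The numbers below are toy values, not PSID data, and `weighted_mean` is a hypothetical helper; in practice you would pass the weight variable supplied on the PSID data file.

```python
def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Ratio estimator sum(w*y) / sum(w); zero-weight cases contribute nothing."""
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(values, weights)) / total_w

# Toy values: the first two cases come from an oversampled group,
# so each represents fewer population members and carries a smaller weight.
income = [20.0, 22.0, 60.0, 58.0]
weight = [0.5, 0.5, 2.0, 2.0]

unweighted = sum(income) / len(income)
weighted = weighted_mean(income, weight)
print(unweighted, weighted)  # 40.0 51.4
```

The unweighted mean over-represents the oversampled group; weighting restores each case to its share of the population.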

In 1997, the Panel Study of Income Dynamics (PSID) underwent several important design changes that affected weighting. Chief among these was a reduction of roughly one-third in the number of PSID Core families eligible for continued longitudinal data collection. A second important change was the addition of a nationally representative sample of immigrant households and individuals who would not have been eligible for the PSID under the original 1968 sample recruitment and sample-family "following rules". The 1997 data collection year also began the transition to biennial data collection. Finally, the 1997 PSID data collection included a special supplemental study of children aged 0-12 in PSID Core and Immigrant Supplement families. Additional documentation describing the weights is provided on the documentation page.
99.What variables should I use for complex sample survey variance estimation? 
Variables ER31996 and ER31997 are used to compute standard errors and variance estimates corrected for the complex sample design, via either Taylor Series Linearization or Repeated Replication methods. These variables can be used with a variety of software programs that incorporate the complex sample design into variance estimation, including Stata, SAS, SUDAAN, SPSS and others. In the design specification, the Sampling Error Stratum variable (ER31996) is specified as the "Stratum variable" and the Sampling Error Cluster variable (ER31997) as the "Cluster variable". Guidance on sampling error estimation in design-based analysis of the PSID data can be found here.
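For readers curious about what the stratum and cluster variables do inside survey software, here is a minimal from-scratch sketch of Taylor series linearization for the standard error of a weighted mean. It assumes the with-replacement (ultimate cluster) approximation, a common default in survey packages. In a real analysis you would pass ER31996 as the stratum and ER31997 as the cluster to your package's design specification rather than use this code; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def taylor_se_weighted_mean(y, w, stratum, cluster):
    """Weighted mean and its linearized SE under a with-replacement PSU design."""
    total_w = sum(w)
    mean = sum(wi * yi for yi, wi in zip(y, w)) / total_w
    # Linearized values for the ratio estimator sum(w*y) / sum(w).
    z = [wi * (yi - mean) / total_w for yi, wi in zip(y, w)]
    # Sum linearized values within each cluster, nesting clusters inside
    # strata so the same cluster code in two strata is two different PSUs.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for zi, h, c in zip(z, stratum, cluster):
        psu_totals[h][c] += zi
    variance = 0.0
    for h, clusters in psu_totals.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 clusters")
        totals = list(clusters.values())
        z_bar = sum(totals) / n_h
        # Between-cluster variability within the stratum.
        variance += n_h / (n_h - 1) * sum((t - z_bar) ** 2 for t in totals)
    return mean, sqrt(variance)

# Toy design: two strata, two clusters each, one case per cluster.
y = [1.0, 3.0, 2.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
stratum = [1, 1, 2, 2]
cluster = [1, 2, 1, 2]
mean, se = taylor_se_weighted_mean(y, w, stratum, cluster)
print(mean, round(se, 4))  # 2.5 0.7071
```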
100.Why do some cases exist where there is data available for certain variables, but the family weight is equal to zero? 
These are families that contain no sample members. They are not errors in the data; rather, they are cases where information was gathered about individuals not directly linked to a sample member. The PSID purposely followed some nonsample individuals, e.g., the nonsample elderly (1990-1996) and nonsample parents (1994-2003). In some cases, a responding family contains only followable nonsample individuals; because all of the individual weights in such a family are zero, the family weight is zero as well.
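Because zero-weight families are legitimate records, they appear in unweighted counts but drop out of weighted estimates. A minimal sketch, assuming family records with hypothetical field names (`family_weight`, `income`) standing in for the actual PSID variables, and toy values:

```python
families = [
    {"family_id": 1, "family_weight": 1.5, "income": 40.0},
    {"family_id": 2, "family_weight": 0.0, "income": 95.0},  # e.g. only nonsample members
    {"family_id": 3, "family_weight": 2.5, "income": 50.0},
]

# Keep positive-weight families for weighted descriptive statistics.
positive = [f for f in families if f["family_weight"] > 0]
n_zero_weight = len(families) - len(positive)

total_w = sum(f["family_weight"] for f in positive)
mean_income = sum(f["family_weight"] * f["income"] for f in positive) / total_w
print(n_zero_weight, mean_income)  # 1 46.25
```

Reporting how many zero-weight families were set aside makes it clear why the weighted and unweighted sample sizes differ.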
101.Why does fertility appear to be higher in the PSID than in the US for Black individuals in some birth cohorts? 
A modest upward distortion in the weighted estimates of Black individuals with children has been identified in the PSID for selected cohorts, beginning in the late 1990s. We recommend that analysts use caution when using the PSID from 1997 onward to estimate the percentage with children in the household (and related statistics such as childlessness and fertility) for Black women born in the late 1960s and in the 1970s, and for Black men born in the late 1960s, 1970s, and early 1980s. This distortion may also affect estimates of multigenerational Black families in later years. Analysts who are not addressing questions of this type can reasonably ignore this concern. When estimating multivariate models with fertility-related outcomes for any year, we recommend including as a control variable the CDS eligibility indicator available on the 1997 Individual File (ER33418). Some analysts may wish to post-stratify the assigned PSID weights for Black individuals to Current Population Survey (CPS) totals by presence of a child under age 13. PSID Technical Series Paper 16-01 provides additional details.
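The post-stratification option can be sketched as simple cell-based post-stratification: within each cell (here, presence of a child under 13), weights are multiplied by a common factor so that their sum matches a control total. This is an illustrative assumption, not necessarily the procedure described in Technical Series Paper 16-01. `poststratify` is a hypothetical helper, the cell labels and control totals are toy values rather than CPS figures, and the input is assumed to be already restricted to Black individuals.

```python
from collections import defaultdict

def poststratify(weights, cells, control_totals):
    """Rescale weights so each cell's weighted total equals its control total."""
    unknown = set(cells) - set(control_totals)
    if unknown:
        raise ValueError(f"no control total for cells: {sorted(unknown)}")
    # Current weighted total in each cell.
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    factors = {}
    for c, target in control_totals.items():
        if cell_sums.get(c, 0.0) <= 0:
            raise ValueError(f"no positive-weight cases in cell {c!r}")
        factors[c] = target / cell_sums[c]
    # Every case in a cell gets the same adjustment factor.
    return [w * factors[c] for w, c in zip(weights, cells)]

weights = [1.0, 1.0, 2.0, 2.0]
cells = ["child_under_13", "child_under_13", "no_child_under_13", "no_child_under_13"]
toy_totals = {"child_under_13": 3.0, "no_child_under_13": 8.0}
print(poststratify(weights, cells, toy_totals))  # [1.5, 1.5, 4.0, 4.0]
```

Relative weights within a cell are preserved; only the cell totals change to match the external benchmark.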