|The PSID user manual provides historical context and basic design features of the PSID. There are several
tutorials which provide step-by-step instructions on downloading and analyzing the data in a variety of ways. |
The Data Center is the most popular means for obtaining PSID data, and it delivers thousands of customized data
files to researchers and quantitative social science students each year. The Data Center is fully automated and allows for user-specified
subsetting criteria when downloading and merging data. Data can be generated in a variety of formats including ASCII, SAS, SPSS, and Stata.
The Data Center provides automatic and customized merges of files. For the analyst who prefers to write their own programming code to merge data downloaded from our zip packages, sample SAS and SPSS programs have also been prepared to assist users with creating cross-year analysis files.
Each family unit in a specific wave is assigned a unique "Family Interview (ID) Number" valid for that wave only. In addition, each family also has a "1968 Family Identifier", also known as the "1968 ID". This is the Family Interview (ID) Number that was assigned to the original family in the 1968 interviewing wave. When sample members in any family move out and establish their own household, we interview them; these families are called "splitoffs" in the first year they are formed. These new "splitoff" families have the same 1968 ID as the family they moved out of, and keep that same 1968 ID each year. All families with the same 1968 ID contain at least one of the original members from the 1968 family or their lineal descendants born after 1968.
For each family, the family ID number will almost certainly vary from year to year. Yearly IDs are assigned based on the order in which interviews are received--the first interview in from the field is numbered 1, the second, 2, and so on. This means it's very unlikely that a family with the Family ID Number 1234 in one year will get the same Family ID Number the next year, or any other year.
A new Head is selected if any of the following conditions apply:|
- last year's Head moved out of the FU (family unit), died or (in some cases) became incapacitated; or
- a female Head has gotten married; or
- this is a splitoff family. (Note that this new Head may have been the Head of the family this new family split off from.)
To tell the current Head and Wife from mover-out Head and Wife, use the Sequence Number (SN) from the individual file. The current Head will always have SN=1, the current Wife/"Wife" (if there is one) will always be SN=2. A mover-out Head or Wife will have a SN in the range 50-89, depending on the move-out circumstances. The SN allows you to identify an individual's status with regard to the family unit and determine family composition change. It's important to understand family composition change to avoid spurious correlations in a longitudinal analysis where you are looking at variables pertinent to the same person(s) over time.
|The easiest way to do this is by visiting the PSID Data Center which will create a customized dataset for you automatically.
Instructions for creating Head/Wife file from an individual file by writing your own programming code:
To create a single year Head/Wife file: Select individuals with Relationship to Head of "Head" (a code value of 1 for 1968-1982; code 10 from 1983 onward) and with values for Sequence Number in the range 1-20. The reason for using the Sequence Number variable is that non-response movers out have relationships to the PREVIOUS YEAR's Head, so two individuals within one family may have relationships of Head. One, however, is the real, current Head; the other is a mover out. (The type of mover-out can be determined from the value for Sequence Number. Refer to the individual file codebook for details.) To illustrate the importance of Sequence Number, assume that in the last wave we have an elderly married couple. He is the Head and she is the Wife--Sequence Number=1 and Relationship to Head=10 for him, Sequence Number=2 and Relationship to Head=20 for her. When we find them for the new interview, he has died and she has become the new Head--his Sequence Number=81 and Relationship to Head=10, her Sequence Number=1 and Relationship to Head=10. All the family data items about Head in the current wave refer to HER, not to him. Information about his income, etc. is located in OFUM (other family unit members) variables only. Similarly, to subset Wives or "Wives" in a current wave--select Relationship to Head=20 or 22 and Sequence Number=1-20.
To create a cross-year Head/Wife file: These concepts can be expanded to subset persons who have been Heads over a period of years--the yearly values for Sequence Number must be 1-20, and 1 or 10 for Relationship to Head. As a corollary, to select individuals who have been either Heads or Wives/"Wives", yearly Sequence Numbers must equal 1-20 and yearly Relationships to Head must be in the range 1, 2, 10, 20, or 22. Once that subset is made and family data are merged, information about an individual can be found in Head variables (Head's work hours, Head's labor income, etc.) when his or her Relationship to Head=1 or 10. When Relationship to Head is 2, 20, or 22, then her information is found in variables about the Wife/"Wife".
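As a rough illustration of the selection rules above, the following Python sketch filters in-memory records. The field names `rth` (Relationship to Head) and `sn` (Sequence Number) and the sample records are hypothetical stand-ins for the wave-specific variables in the individual file; only the code values come from the text above.

```python
# Hypothetical field names: "rth" = Relationship to Head, "sn" = Sequence Number.
HEAD_CODES = {1, 10}        # 1 for 1968-1982, 10 from 1983 onward
WIFE_CODES = {2, 20, 22}    # Wife / "Wife" codes listed above


def in_selection_range(sn):
    """True when the Sequence Number is in the 1-20 range used above."""
    return 1 <= sn <= 20


def current_heads(records):
    """Single-year Head file: Head code plus Sequence Number 1-20."""
    return [r for r in records
            if r["rth"] in HEAD_CODES and in_selection_range(r["sn"])]


def current_wives(records):
    """Single-year Wife/"Wife" file: Wife code plus Sequence Number 1-20."""
    return [r for r in records
            if r["rth"] in WIFE_CODES and in_selection_range(r["sn"])]


def head_or_wife_every_year(yearly):
    """Cross-year rule. yearly is a list of (rth, sn) pairs, one per wave."""
    allowed = HEAD_CODES | WIFE_CODES
    return all(rth in allowed and in_selection_range(sn) for rth, sn in yearly)


# The elderly couple from the example: he has died (SN=81), she is now Head (SN=1).
wave = [
    {"person": "husband", "rth": 10, "sn": 81},
    {"person": "wife", "rth": 10, "sn": 1},
]
heads = current_heads(wave)
print([r["person"] for r in heads])
```

Filtering on Relationship to Head alone would return both spouses here; the Sequence Number condition is what drops the mover-out.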
|Select only current heads (sn=1 and rth=10) from the individual file for the wave in question. Then, if head's moved in/out indicator=1 and month moved in/out=0, it's a splitoff. Otherwise it's a main family.|
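The rule above can be sketched in Python as follows. The field names `sn`, `rth`, `moved_indicator`, and `month_moved` are hypothetical; the real variable names differ by wave and should be taken from the individual file codebook.

```python
def family_type(head):
    """Classify a family from its current Head's record.

    head is a dict with hypothetical keys:
      sn, rth          -- Sequence Number and Relationship to Head
      moved_indicator  -- the Head's moved in/out indicator
      month_moved      -- the Head's month moved in/out
    """
    if head["sn"] != 1 or head["rth"] != 10:
        raise ValueError("record is not a current Head (sn=1 and rth=10)")
    if head["moved_indicator"] == 1 and head["month_moved"] == 0:
        return "splitoff"
    return "main"


print(family_type({"sn": 1, "rth": 10, "moved_indicator": 1, "month_moved": 0}))
print(family_type({"sn": 1, "rth": 10, "moved_indicator": 0, "month_moved": 0}))
```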
The combination of the 1968 ID and the person number uniquely identifies each individual.
To identify an individual across waves, use the 1968 ID and Person Number 68 Summary Variables ER30001 and ER30002. Though you can combine them uniquely in many ways, we find that many researchers use the following method:
(ER30001 * 1000) + ER30002
(1968 ID multiplied by 1000) plus Person Number 68
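A minimal Python sketch of this formula, together with its inverse, is shown here. The function names are illustrative, and the sketch assumes person numbers stay below 1000 (the values quoted in this document are three digits at most), which is what makes the combined ID reversible.

```python
def person_id(id68, pn):
    """Combine ER30001 (1968 ID) and ER30002 (Person Number 68) into one key."""
    if not 0 <= pn < 1000:
        # Assumption: PN has at most three digits, so nothing is lost.
        raise ValueError("person number outside the assumed 0-999 range")
    return id68 * 1000 + pn


def split_person_id(pid):
    """Recover (1968 ID, Person Number 68) from the combined key."""
    return divmod(pid, 1000)


pid = person_id(1234, 30)
print(pid)                    # 1234030
print(split_person_id(pid))   # (1234, 30)
```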
The Cross Year Individual File has records for every person who has ever lived in a PSID study family (including some who moved out just before the initial interviewing year for each part of the sample, or were institutional in the first interviewing year; see the codebook for ER30002). Each study individual has a record for each study year; years when that individual was not listed in a study family are zero-filled. The file is organized by ID68, PN, and year.
In addition to those variables, the file contains individual-level information such as YEARID, SN, RTH, AGE, SEX, birthdates, move-in and move-out dates, follow status, type of individual, why non-response, several health insurance variables, variables indicating eligibility for supplementary studies such as CDS and DUST, and individual level longitudinal and cross-sectional weights.
It is essential to use the latest version of the file for analysis. The entire file is regenerated for every wave’s release not only to add the latest wave’s info, but to make corrections based on information received in the latest wave’s interviewing. These corrections are often to information such as birthdate or RTH, and may affect several waves of data.
Most importantly, however, we also make corrections to ID68 and PNs, based on new information. Sometimes we discover that a person we believed to be a separate individual is actually the same as a family member we already know about, with a different person number. If the two individuals were response in different waves, we can combine the information into one record, keeping the record for one PN and deleting the record for the other. In other cases, we might learn that someone we thought was the biological child of a sample member actually isn’t. This would necessitate a change in PN from the "born-in sample" range (030-169) to the 170-and-up range, and also a change in follow status, since, given the new information, the child would no longer be sample. In another instance, we found that a woman in the immigrant sample with PN 170 was actually a spouse who had moved out prior to the first year of immigrant interviewing (1997 for this case), so should have had PN 227 (a PN with special meaning, see the codebook for ER30002).
Currently, we uncover about 100 of these person-number fixes each wave. So failure to use the latest cross-year individual file may result in errors in both data and analysis.
It depends on the situation.
- For movers out who are deceased, some OFUM (other family unit member) information is collected for the wave in which they are reported to have died.
- For persons moving out to an institution, some OFUM information is collected during the wave they are reported to have moved out from the family.
- For persons who move out to another household where no interview is conducted with the new family unit, some OFUM information is collected for the wave in which they are reported to have moved out.
- For persons already in institutions, no new information is collected.
- For persons who attrited from the study, no new information is collected. However, a large recontact effort was initiated in 1992.
- For persons not yet born or not yet appearing in the study, no information is collected that wave.
Beginning with the 1999 wave, when the PSID switched from annual to biennial interviews, the rules for movers out listed above apply only if the person moved out in the calendar year before the interview. For example, in the 2007 data, there will be information for movers out on or after 1/1/2006, but no information for movers out before that date.
The cross-year index can help you identify comparable variables across the years.
You can also look at the ‘Years Available’ section of each variable’s codebook entry for a year-by-year listing of when that variable is available in the data center.
A missing data value is either identified as such (value=9) or an imputed value is assigned in lieu of a missing data code. If an imputed value is assigned, an associated "accuracy code" variable describes the nature of the assignment.
You will need to look at the 1968 family interview number available in the individual-level files (V30001 and ER30001 in 2007). |
SRC sample families have values less than 3000.
SEO sample families have values greater than 5000 and less than 7000.
Immigrant sample families have values greater than 3000 and less than 5000. (Values from 3001 to 3441 indicate that the original family was first interviewed in 1997; values from 3442 to 3511 indicate the original family was first interviewed in 1999.)
Latino sample families have values greater than 7000 and less than 9309. (Values from 7001 to 9043 indicate the original family was first interviewed in 1990; values from 9044 to 9308 indicate the original family was first interviewed in 1992.)
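The ranges above translate directly into a small lookup. The following Python sketch is illustrative only; the function names and the "unknown" fallback for values outside the listed ranges are assumptions, not part of the PSID documentation.

```python
def sample_of(id68):
    """Map a 1968 family interview number to its sample, per the ranges above."""
    if 0 < id68 < 3000:
        return "SRC"
    if 3000 < id68 < 5000:
        return "Immigrant"
    if 5000 < id68 < 7000:
        return "SEO"
    if 7000 < id68 < 9309:
        return "Latino"
    return "unknown"  # assumption: anything outside the listed ranges


def first_interview_year(id68):
    """Year the original family was first interviewed, where stated above."""
    if 3001 <= id68 <= 3441:
        return 1997
    if 3442 <= id68 <= 3511:
        return 1999
    if 7001 <= id68 <= 9043:
        return 1990
    if 9044 <= id68 <= 9308:
        return 1992
    return None  # not specified in the text above for this range


for value in (2930, 3450, 6872, 9100):
    print(value, sample_of(value), first_interview_year(value))
```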
In 1990 the PSID added 2,000 Latino households consisting of families originally from Mexico, Puerto Rico, and Cuba. But while this sample did represent three major groups of immigrants, it missed out on the full range of post-1968 immigrants, Asians in particular. Because of this crucial shortcoming, and a lack of sufficient funding, the Latino sample was dropped after 1995, and a sample of 441 post-1968 immigrant families was added in 1997. In 1999, an additional 70 families were added, for a total of 511 immigrant families as of 1999. These families are included on the files along with the core PSID families.
Information on the Immigrant Sample is available in the 1997 and 1999 main interview documentation.
Variables from supplemental files are not yet in the index. We plan to add these variables to the index in the future.
Some supplemental files were created only for sub-samples. For example, the Disability and Use of Time Supplement (DUST) only collects information on Heads and Wives of a certain age.|
|The Data Center automatically includes all appropriate identification variables for your file.|
Users often want to look at data from the "same" family in adjacent waves. It is important to understand that there is no absolute definition of "same" family. Families are made up of individuals who may move in or out of study families from wave to wave. It is up to the user to decide what he or she means by "same" family. The user may want to restrict this definition to option 1) absolutely no changes in the composition of the family since the previous wave. All the individuals that were in the prior wave are still in the current wave - no one has moved in and no one has moved out. Alternatively, the user may want to define "same" family as option 2) those who have the same Head in both waves.
In order to subset those cases which the user has defined as "same" family, he or she will find the Family Composition Change variable most useful. The Family Composition Change variable indicates the degree of change in this family since the prior wave's data collection. For option 1, the user would subset the families in the current wave where Family Composition Change variable = 0. For option 2, the user would subset the families in the current wave where Family Composition Change variable in (0, 1, 2).
For 2007, for example, the Family Composition Change variable is ER36007.
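The two options can be sketched in Python as a simple filter. The records and the `family_id` key are hypothetical; `ER36007` is used as the column name only because it is the 2007 Family Composition Change variable named above.

```python
# Allowed Family Composition Change values for each definition of "same" family.
ALLOWED = {
    1: {0},        # option 1: no change in family composition
    2: {0, 1, 2},  # option 2: same Head in both waves
}


def same_family(families, option, fcc_var="ER36007"):
    """Keep families whose Family Composition Change value fits the option."""
    allowed = ALLOWED[option]
    return [f for f in families if f[fcc_var] in allowed]


# Hypothetical 2007 family records.
families_2007 = [
    {"family_id": 1, "ER36007": 0},
    {"family_id": 2, "ER36007": 2},
    {"family_id": 3, "ER36007": 5},
]
print([f["family_id"] for f in same_family(families_2007, option=1)])
print([f["family_id"] for f in same_family(families_2007, option=2)])
```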
|The Child Development Supplement (CDS) is one research component of the Panel Study of Income Dynamics (PSID), a longitudinal study of a representative sample of U.S.
individuals and the families in which they reside. Since 1968, the PSID has collected data on family composition changes, housing and food expenditures, marriage and fertility
histories, employment, income, time spent in housework, health, consumption, wealth, and more.
While the PSID has always collected some information about children, in 1997, PSID supplemented its main data collection with additional information on 0-12 year-old children and their parents.
The objective was to provide researchers with a comprehensive, nationally representative, and longitudinal data base of children and their families with which to study the dynamic process of early human capital formation.
The CDS-I successfully completed interviews with 2,394 families (88%), providing information on 3,563 children. In 2002-2003, CDS re-contacted families in CDS-I who remained active in the PSID panel as of 2001 for CDS-II,
and again in 2007-2008 for CDS-III. A new cohort of the CDS was begun in 2014.
By nature of the CDS being a supplement to the PSID, the study takes advantage of an extensive amount of family demographic and economic data about the CDS target child's family, providing more extensive family data
than any other nationally-representative longitudinal survey of children and youth in the U.S. In addition, the PSID-CDS data are "intergenerational" in structure with information contained in several decades of data about multiple
family members. This rich data structure allows analysts a unique opportunity to fully link information on children, their parents, their grandparents, and other relatives to take advantage of the rich intergenerational and long-panel
dimensions of the data.
Within the context of family, neighborhood, and school environments, CDS studies a broad array of developmental outcomes including (but not limited to) physical health, emotional well-being, intellectual achievement, and social relationships with family and peers. These outcomes are measured through reliable, age-graded assessments of cognitive and behavioral development and health status indicators obtained from the primary caregiver, a secondary caregiver, the elementary school teacher (for the younger children), and the sample children/youth themselves; anthropometric measures of height and weight of the sample children/youth; a comprehensive accounting of parental (or caregiver) time inputs to children/youth as well as other aspects of the way the children/youth spent their time; and measures, other than time use, of other resources, for example the learning environment in the home (using the HOME Scale measures), school resources as reported through the National Center for Education Statistics Common Core of Data, and decennial-census-based measurement of neighborhood resources. The multi-level, interdisciplinary, and longitudinal nature of the research design facilitates analysis of the relationships between these developmental measures and changes in family structure and living arrangements, neighborhood economic and social conditions, and school resources and programs.
|CDS questionnaires are located at the questionnaires and supporting documents page.|
The CDS-TA sample was drawn from PSID families with children 0-12 years in 1997.
The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples.
Both the CDS-TA and PSID samples are probability samples (i.e., samples for which every element in the population has a known nonzero chance
of selection). Their combination is also a probability sample. The combination, however, is a sample with unequal selection probabilities, and as a result,
compensatory weighting is needed in estimation, at least for descriptive statistics. Weight adjustments are also needed to attempt to
compensate for differential nonresponse across waves. Weights supplied on CDS and TA data files are designed to compensate for both unequal selection probabilities and differential attrition.
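To illustrate why weighting matters for descriptive statistics, here is a minimal Python sketch of a weighted mean. The values and weights are made up, and the function is a generic estimator rather than anything specific to the CDS or TA weight variables. It covers point estimates only.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Made-up example: the second case represents three times as many people.
scores = [10.0, 20.0]
weights = [1.0, 3.0]
print(sum(scores) / len(scores))        # unweighted: 15.0
print(weighted_mean(scores, weights))   # weighted: 17.5
```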
In the 2002 and 2007 CDS demographic files, you will find a set of indicator variables for each module that specify (a)
if a case was eligible for that module and (b) if a record exists for that case in the corresponding data file.
These variables are helpful to merge onto your Data Center data request if you are merging variables from multiple CDS modules.
The sample weight in the Demographic file is adjusted only for the non-response in the main module, Primary Caregiver (sections A-H, J).
The module indicator variables, however, will inform you about item missing data across modules. It is up to you to then decide on your preferred
approach for addressing item missing data that results from differential response rates across modules (for example, you may leave it as missing,
impute scores, etc). The TA data files contain wave-specific sample weights.
More documentation on the CDS and TA weights can be found at the documentation page.
Every individual in the PSID - including the children - has both an "ID68" (1968 Family Identifier - ER30001 in 2007, for example) and "PN" (Person Number- ER30002 in 2007, for example) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers to find information about the CDS targeted child and caregivers in the PSID data files. Background information about the CDS target child, such as birth date, sex, and relationship to the PSID family household head can be obtained from the PSID individual and sampling variables files. Use the ER30001 and ER30002 combination to select the PSID variables for just the CDS target child sample, or, when you get to the "Output Options" page in the Data Center, after selecting the variables you want, select "CDS Children" at the bottom.
There are two PSID family history files that may be of particular interest to CDS users: the Childbirth and Adoption History File and the Parent Identification File.
Childbirth and Adoption History File:
The Childbirth and Adoption History File is specifically designed to facilitate access to detailed information collected since 1985 regarding histories of childbirth and adoption. Variables on this file include the identifiers for each parent and child, month and year of birth for both parent and child, birth order, birth weight and date of death for a child, year of most recent report and number of births/adoptions, etc. Data on this file are structured in a one-record-per-event format, with each record representing a specific childbirth or adoption event.
Parent Identification File:
The Parent Identification File synopsizes information collected from various sources since the 1983 wave of PSID about parent-child relationships. This file consists of identifier variables that link children with their parents. The file is intended to be used to facilitate linking children's and parents' data records from the Individual File. Linkages can be done from either the child's or a parent's standpoint.
There are a large number of variables in the PSID that can be used along with CDS.
Demographic, health, economic, and other family data about PCG (primary caregivers) and OCG (other caregivers) can be found in the PSID data files. Every individual in the PSID has
both "ID68" (1968 Family Identifier - ER30001) and "PN" (Person Number- ER30002) that combine to uniquely identify that individual. As a user of the CDS data, you can use these identifiers
to find information about the CDS targeted child and caregivers in the PSID data files. These identifier variables are available through a Child to Caregiver Map.
The "child to caregiver map" provides "1968 INTERVIEW NUMBER" (ID68) and "PERSON NUMBER 68" (PN) for CDS individuals.
These CDS individuals are the target child, the target child's primary caregiver (PCG) and the target child's other caregiver (OCG),
if one exists. Missing data means that the child did not have an OCG for the CDS interview year. |
All CDS files, by default, contain variables ER30001 (1968 INTERVIEW NUMBER) and ER30002 (PERSON NUMBER 68). Since these variables are also in the map file, the map file can be used to merge PCG and OCG data from PSID Individual data to CDS Child level data in a two step process.
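One plausible reading of the two-step process is sketched below in Python: first look up the caregiver's identifiers in the map using the child's ER30001 and ER30002, then use those identifiers to pull the caregiver's record from the PSID Individual data. The map column names (`pcg_id68`, `pcg_pn`, `ocg_id68`, `ocg_pn`), the `age` field, and all records are hypothetical; check the map file codebook for the real variable names.

```python
def attach_caregiver(child_rows, map_rows, psid_rows, role="pcg", field="age"):
    """Add one caregiver field from PSID individual data to each child row."""
    map_by_child = {(m["ER30001"], m["ER30002"]): m for m in map_rows}
    psid_by_person = {(p["ER30001"], p["ER30002"]): p for p in psid_rows}
    merged = []
    for child in child_rows:
        # Step 1: child identifiers -> caregiver identifiers via the map.
        link = map_by_child.get((child["ER30001"], child["ER30002"]))
        caregiver_key = (link[role + "_id68"], link[role + "_pn"]) if link else None
        # Step 2: caregiver identifiers -> caregiver's PSID individual record.
        caregiver = psid_by_person.get(caregiver_key)
        row = dict(child)
        row[role + "_" + field] = caregiver[field] if caregiver else None
        merged.append(row)
    return merged


# Hypothetical records. Missing OCG identifiers mean the child had no OCG.
cds_children = [{"ER30001": 4, "ER30002": 30, "bpi_total": 7}]
caregiver_map = [{"ER30001": 4, "ER30002": 30,
                  "pcg_id68": 4, "pcg_pn": 2,
                  "ocg_id68": None, "ocg_pn": None}]
psid_individuals = [{"ER30001": 4, "ER30002": 2, "age": 35}]

with_pcg = attach_caregiver(cds_children, caregiver_map, psid_individuals, "pcg")
with_ocg = attach_caregiver(cds_children, caregiver_map, psid_individuals, "ocg")
print(with_pcg[0]["pcg_age"], with_ocg[0]["ocg_age"])
```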
There are two steps to locating data for siblings in the CDS-II data files:|
In the Demographic Data File, there is a sibling indicator variable that tells you if a CDS target child had a sibling who also participated in the CDS-II data collection.
Automatically appended to your data download is the Family Interview or Identification number for the corresponding PSID main interview. This variable uniquely identifies the family.
Using these two variables, you can locate data on a wide range of information about the target children and their siblings in the CDS.
See also the codebook explanation text for the family identification number. There is a variable for any year in the PSID in both
the individual and family files.
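The two variables described above can be combined as in the following Python sketch, which groups CDS children who share a family identification number and have the sibling indicator set. The key names `family_id`, `has_cds_sibling`, and `child`, the indicator coding of 1, and the records are all hypothetical.

```python
from collections import defaultdict


def sibling_groups(children):
    """Group children flagged as having a CDS sibling by family ID number."""
    by_family = defaultdict(list)
    for c in children:
        if c["has_cds_sibling"] == 1:  # assumption: 1 means "has a sibling in CDS"
            by_family[c["family_id"]].append(c["child"])
    # Keep only families where more than one flagged child was found.
    return {fam: kids for fam, kids in by_family.items() if len(kids) > 1}


# Hypothetical CDS-II demographic records.
children = [
    {"child": "A", "family_id": 101, "has_cds_sibling": 1},
    {"child": "B", "family_id": 101, "has_cds_sibling": 1},
    {"child": "C", "family_id": 202, "has_cds_sibling": 0},
]
print(sibling_groups(children))
```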
|In CDS-I, height of the child was measured by the interviewer and weight was reported by the parent. In CDS-II and CDS-III, both height and weight were measured by the interviewer.|
|The Behavior Problem Index was originally developed by James Peterson and Nicholas Zill from the Achenbach Behavior Problems Checklist to measure in a survey setting the incidence and severity of child behavior problems. The BPI scale is based on responses by the primary caregiver as to whether a set of 32 problem behaviors is often, sometimes, or never true of the targeted child.|
|These items are then divided into two subscales: 1) a measure of externalizing or aggressive behavior and 2) a measure of internalizing, withdrawn or sad behavior. The User Guide specifies the individual items that map into the internalizing and externalizing subscales.|
|We performed a confirmatory factor analysis on our two expected subscales. The results showed that the items grouped into these two factors quite readily, with one variable overlapping on both subscales, as it did in CDS-I, and two variables not loading at all. We constructed an overall or total BPI score, using all 32 items, as well as separate scores for each of the two subscales, internal or withdrawn and external or aggressive. Before scoring, the individual items are recoded such that a score of "1" becomes "0" and a score of "2" or "3" becomes "1". Scores for the total BPI and Externalizing and Internalizing are sum scores. Higher scores on these measures imply a greater level of behavior problems. Cases were included if they had approximately 75% valid data on the variables contributing to the BPI Indices.|
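The recoding and summing described above can be sketched in Python as follows. This is not the PSID scoring program: treating any value other than 1, 2, or 3 as missing, applying the 75% threshold as a strict cutoff, and leaving missing items out of the sum are all assumptions made for illustration.

```python
def recode(item):
    """Recode a raw item: 1 -> 0, 2 or 3 -> 1, anything else -> missing."""
    if item == 1:
        return 0
    if item in (2, 3):
        return 1
    return None  # assumption: other values are treated as missing


def bpi_score(items, min_valid=0.75):
    """Sum score over recoded items, or None when too few items are valid."""
    recoded = [recode(x) for x in items]
    valid = [x for x in recoded if x is not None]
    if len(valid) < min_valid * len(items):
        return None  # below the (assumed strict) 75% valid-data threshold
    return sum(valid)  # assumption: missing items are simply left out


print(bpi_score([1, 2, 3, 1]))         # 2
print(bpi_score([2, 2, 2, None]))      # 3 (three of four items valid)
print(bpi_score([1, 2, None, None]))   # None (only half the items valid)
```

The same function can be applied to the item subset for either subscale once the User Guide's item lists are in hand.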
The Home Observation for Measurement of the Environment-Short Form from the Caldwell and Bradley HOME Inventory is used as a measure of cognitive stimulation and emotional support that parents provide to their children. The particular items used in the PSID Child Development Supplement were taken directly from the National Longitudinal Survey of Youth, Mother-Child Supplement so that the scales would be as similar as possible. The HOME-SF items include both parent/caregiver-reported items and interviewer observations of the home and neighborhood environment. The HOME-SF is divided into four parts:
- Infant/Toddler (IT) HOME, designed for use during infancy (birth to age three);
- Early Childhood (EC) HOME, designed for use between 3 and 6 years of age;
- Middle Childhood (MC) HOME, for use between 6 and 10 years; and
- Early Adolescent (EA) HOME, designed for use from 10 to 15 years old.
Additional information about the HOME-SF in the CDS can be found in the CDS User Guide.
We have included three scores for HOME-SF for each age module appropriate for CDS-II and CDS-III data: 1) a total raw score, 2) an emotional support subscale raw score, and 3) a cognitive stimulation subscale raw score. The total and subscale raw scores for the HOME-SF are a summation of the recoded individual item scores and vary by age group, as the number of individual items varies according to the age of the targeted child / youth.
The Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R) provides a normed set of tests for measuring cognitive abilities and academic achievement. In the CDS-I, CDS-II and CDS-III, we selected three subtests as a measure of reading and math achievement: the Letter-Word, the Passage Comprehension, and the Applied Problems tests (the Calculation test was additionally administered in CDS-I). These scales can be used individually, or in the case of the four subscales, combined to create scores for Broad Reading and Broad Math. When applicable, the Spanish version of the WJ-R (Batería-R, Form A) was used for children whose primary language was Spanish.
The Woodcock-Johnson Revised (WJ-R) tests of achievement have standardized administrative and scoring protocols. The tests are designed to provide a normative score that shows the CDS target child's reading and math abilities in comparison to the national average for the child's age. The normed scores are constructed based on the child's raw score on the test (essentially the number of correct items completed) and the child's age to the nearest month. Raw scores are charted on normative tables based on the child's age and what percentile the child falls into. More information on scoring is provided in the
CDS User Guides.
|In CDS I, we included two Woodcock Johnson - Revised math-skill tests: Calculations and Applied Problems. A broad math score was constructed based on these two tests. In CDS II, we only included the Applied Problems; hence, no broad math score can be constructed - just a score for applied problems.|
File release information is available through the News section of our website. You can also sign up to have the news delivered to your
email by logging in and selecting to receive updates on the "Settings" page.
|Only one file is allowed in your cart if CDS Time Diary is selected. To add CDS Time Diary variables to your cart, you must select variables from just one file, and there cannot be any variables from other files in your cart. Time diary data are not at an individual or family level (like other data in the data center), so the data center does not "know" how to merge it.|
The public release files, which can be downloaded directly from the PSID website, contain geographic information of a more generalized nature such as region and state of residence. A collapsed version of the Beale rural-urban code is available for some years in the data center, as well as in the supplemental files.
These data will meet the needs of most users. Users in need of more specialized geographic
information may want to request use of the restricted PSID Geocode Match files. These files
include the identification codes necessary to link data from the PSID annual family files to
Census data. This linkage allows the addition of information regarding the characteristics
of the geographic area in which individuals and families lived (e.g., the neighborhood and/
or the labor market area) to the PSID individual- or family-level data. This should in turn
allow investigation of the effects of non-family "context" variables on family and individual outcomes.
In the past, we provided selected variables from the Census in aggregated forms (i.e.,
Census Extract Files); however, we no longer support these files. In recent years, there has
been a rapid growth of external sources that provide an increasing variety of measures of
the neighborhood environment.
|Individuals who conduct scientific research, hold a full-time, permanent, doctoral-level faculty appointment, and obtain approval of their research and data protection plan through a human subjects institutional review board may request use of restricted data.|
|The following materials must be submitted:
1. Curriculum vitae
2. Research plan
3. Institutional Review Board (IRB) approval
4. Data request form
5. MiCDA acceptable use policy (AUP)
6. VDI-Data security plan
7. Institute for Social Research Confidentiality pledge
8. After approval, a $750 non-refundable administration fee will be collected.
|Once all application materials have been submitted, review by the PSID restricted data committee usually occurs within two weeks. Once the application is approved, the contracts must be signed and submitted along with the non-refundable administrative fee. After these steps have been completed, the data are provided via secure remote access through a Virtual Data Enclave. The average processing time is between one and two months, and on rare occasions as long as six months. The vast majority of delays occur as a result of contract language change requests by the requesting institution.
|Yes, more than one type of restricted data may be requested; however, the researcher must document in their research plan the need and purpose for use of more than one type of data set.|
|No. Each contract is project specific and is validated through the research plan.|
|Multiple investigators may use the restricted data if they are all involved in the same research project and named on the
research plan and on the IRB. All investigators must provide a CV, an AUP, DSP and Pledge of Confidentiality and sign the restricted data contract.|
|Yes. A PI may include investigators from other institutions who are collaborating on the research project by describing
their roles and qualifications in the research plan, and naming them in the IRB.
All investigators must provide a CV, an AUP, DSP and Pledge of Confidentiality and sign the restricted data contract.|
Graduate students must work with the restricted data under the supervision of a full-time, permanent, doctoral-level faculty member at their institution. The faculty advisor is named as the Investigator on the contract and is responsible for ensuring that all confidentiality and security measures are upheld. Should a faculty advisor leave the receiving institution, they are responsible for notifying the PSID of this change. Graduate students must then obtain a new faculty advisor who will be named on the contract until their research project is completed.
|Contracts are limited to three years with extension paperwork due every six months, as requested by the PSID contract administrator.|
|The faculty adviser is named as the Investigator on the contract, and assumes responsibility for everyone on the project accessing the PSID restricted data.
They are also responsible for ensuring the contract paperwork is kept current. This includes returning a complete Request for Extension form to the PSID
every 180 days while the contract remains active. The faculty adviser must also be responsible for ensuring that security measures are kept in force for all
restricted data work, and for seeing that the contract closure occurs once the dissertation has been published.|
Multiple contracts for a number of research projects are permissible as long as an individual contract is submitted for each research project. A contract will be processed and established for each research project requested. Each contract has its own paperwork, contract and fee.
There are external IRB agencies that researchers may use to fulfill this application requirement. These agencies do charge a fee for their services, and
the researcher must work directly with them to obtain approval. Contact the help desk for further assistance.
|Researchers may forward the email approval to the PSID.|
|The researcher is responsible for maintaining active IRB approval during the entire course of the contract and must submit updates or renewals to the PSID.|
Everyone involved in the research project should submit a CV. Graduate students should submit their CV as well as the most recent CV for their faculty advisor.
|Researchers or graduate students are required to obtain signatures on the original contract and submit it to the PSID so it may be fully
executed through the University of Michigan. An electronic signed contract will be returned to your institution.|
|The “Representative of the Receiving Institution” is someone who is able to legally enter into negotiations with the University of Michigan.
The faculty adviser signs as Principal Investigator. Co-investigators may also sign, where applicable. There is also a Supplemental Agreement
form where Computing Support and Research Assistants may sign when appropriate, at which time the Investigator must also sign at the bottom
of the supplemental page.|
Possibly. Please contact the help desk about any concerns regarding contract language issues. It is important to understand that any requests for language modification can create significant delays in fully executing the contract and shipment of the data.
|Regarding payment of the $750 administrative fee for use of the data, there are several ways that PSID accepts payment: credit card,
check or wire/ACH (Automated Clearing House). Users will be directed by the PSID help desk to the ISR Business Office for detailed payment options.|
|Our Business Office will provide an invoice upon request. Contact information for the Business Office can be provided by the
PSID help desk. |
|The administrative fee covers contract administration expenditures, consulting time, and charges for use of the MiCDA enclave, among other expenses.|
The administrative fee is not negotiable because the PSID is federally funded and is required to provide the same services and keep the same regulations and guidelines for all contract holders.|
|During the three year period, the PSID will send the primary contract holder a Request for Extension every 180 days which requests updated contact information
and must be returned to the PSID in order to keep the contract active. After the three year period, or at the conclusion of the project if this occurs before three
years, access to the PSID data will be discontinued.|
|The new contract will require all the same application documents as the first contract. For ease of transition, updated or modified documents are acceptable. An updated or new IRB approval must also be submitted. Once these are submitted, a new contract must be signed and the $750 non-refundable fee also submitted. The researcher may request updated restricted data if a new version has been released since their original project began.|
Your contract is considered Out-of-Compliance. Contracts that are active and in compliance are eligible to request Restricted Data Set updates. The updated data is provided at no additional charge to these researchers; however no such updates are provided in the event that a contract is out of compliance. Also, failure to submit this documentation during a contract could jeopardize a researcher's future request for restricted data. Sometimes unavoidable circumstances can cause delays in the submission of the Extension form and PSID staff are more than willing to work with institutions to facilitate the process.
|No. Under no circumstances can the restricted data be shared with individuals who are not named on the contract.
Contracts are "project specific" - each researcher must obtain their own contract/user name and password for their
research to ensure that they maintain respondent confidentiality.
Contact the help desk to coordinate contract closure paperwork at email@example.com|
|All restricted data and derived files must either be destroyed or returned to the PSID for secured storage. Many researchers elect to return their files to the PSID for secured storage allowing them to use the data in the future. PSID Help will coordinate with a researcher the paperwork requirements and return of previously created data sets and derived files.|
All Public Release data files have been processed and edited, and should meet the research needs of all users.
Over the past several years the PSID staff, using Computer Assisted Telephone Interview (CATI) technology and companion processing software, have significantly improved the quality and reliability of the timely release of data files. We now refer to the files posted for each new wave as Public Release Data. Note that:
1. Longitudinal data are subject to revision based on the most recent information received from individuals and families. New information that we find during family composition and economic editing in one wave may require revisions to previous waves. As additional data are collected through time on our two-year collection cycle, prior files may be edited in light of the new information. Both the values of the variables themselves and the relationships of individuals to the families to which they are connected may be edited. Normally such changes are made only for a small number of cases.
2. An extensive set of computed or generated variables are included in the Public Release Data. As time and resources allow we occasionally add selected new generated variables for later release.
Because PSID data files, like those from any complex longitudinal study, are subject to minor changes and updated releases, primarily due to economic and family composition editing, users are strongly advised to retain all data files downloaded from this site on which their research analysis depends. Only the most current data files are retained by PSID staff for distribution.
The term "Public Release I" was previously used to refer to files released for general public use after they had been reviewed for data quality and for consistency in both the reported family listing and the relationships among family members (this review process is called "family composition editing").|
The term "Public Release II" was previously used to refer to files which had undergone additional data checks to correct a very small number of cases and had been formatted in a more convenient form.
Because of successive improvements in the Computer Assisted Telephone Interviewing (CATI) software, which PSID began using in 1993, the quality of the Public Release I files improved in recent waves, allowing these data to be used with confidence. It is no longer necessary to release two versions of the Public Release files.
A reinterview family is a family unit that was interviewed in the prior wave. |
A main family is one that is the source of a splitoff family (a new study family formed by a sample member who moves out and forms his or her own family unit). In some divorce or separation situations, both resulting families will contain sample members, so both will be interviewed. We interview the first spouse we are able to contact as the main family, while the other spouse will be in the splitoff family. In the case of children leaving home, the main family is almost always the parental family.
A split-off family consists of a person or group of people (at least one of whom is a "follow" person of any age) who moved out from a main family since the prior wave's interview to form a new, economically independent family unit living in a separate housing unit. Several criteria must be met for a split-off to occur. In addition to having moved out since the prior wave, and to being 'followable', the person or group of people in general may not have moved to an institution such as college or prison or to another family unit within the panel study. Moreover, the person or group of people who have moved out and formed their own family unit must be economically independent from the family unit from which they split off. These are general rules, however, and sometimes unique situations arise that determine whether a person or group of persons becomes a split-off. For example, while moving to an institution such as college does not generally meet the criteria for becoming a split-off, if the person is working, paying their own living expenses, and paying their own educational expenses in addition to attending school, then this person could be interviewed as a split-off. The living situation and interview data for each and every possible split-off case are first reviewed before split-off status is granted. Note that a splitoff family is only designated as a splitoff in the wave in which the family is newly formed and interviewed for the first time. In subsequent waves, they are considered a reinterview family.
In the PSID study, we are attempting to learn about our sample members, and the families in which they live. Each of these families is called a family unit (FU). The FU is defined as a group of people living together as a family. They are almost always related by blood, marriage, or adoption. And they must all be living in the same HU (see below).
Occasionally, unrelated persons can be part of an FU. They need to be permanently living with the family and share both income and expenses.
Any person in a study family is a family unit member. The term "other family unit member" (OFUM) is used for members who are not the Head or Wife/"Wife".
The household unit (HU) is the physical dwelling where the members of the FU reside. It can be a house, townhouse, apartment, a room in a rooming house, even a tent or a car.
Not everyone living in an HU is automatically part of the FU. There may be other people living in the HU temporarily who do not meet the criteria of relatedness and economic integration. The PSID data is about FU Members only.
Sample Members are individuals who were living in the original FU at the time of the very first interview, and their lineal descendants born after 1968. (For subsequent samples, such as the immigrants, the year of the first interview serves as the base for determining who is an original sample member, and all individuals present in the family at that time qualify.)
Follow status indicates whether we are interested in continuing to interview an individual. In general, sample members are always considered Followable. Non-Sample Members can be Followable too, if they represent a population of current interest. For example, we have, in the past, followed such people as Non-Sample parents of sample children who were aged 25 or younger.
You can tell who is a sample member by looking at the individual's Person Number and Follow Status. Original Sample Members who were living in the original study FU in the first year of interviewing were given Person Numbers in the range of 001-019. Any Head's Spouse in the original interviewing year who was living in an institution was given a Person Number of 020. In addition, children of the Head (and Wife if present) who were under age 25 and in an institution the first year were considered Original Sample Members and given Person Numbers in the range 021-029. All of these people are followable.
Individuals who were born into a sample family after the first interviewing year and have a sample parent are considered "born-in Sample Members" and receive Person Numbers in the range of 030-169. All born-in Sample Members are followable.
Some individuals who qualify as sample members (because they have a sample parent) are not born into a study family, but move in later. These "Moved in Sample Members" have Person Numbers of 170 or greater and are Followable.
All other people who have ever lived in a PSID family are not sample individuals. They also receive Person Numbers of 170 or greater, but are not Followable.
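The Person Number ranges above can be expressed as a small lookup. The following Python sketch is only an illustration: the function name, its arguments, and the category labels are hypothetical and are not PSID variable names. It also simplifies the 170-and-above range by relying on follow status alone, even though some nonsample persons can be followable, so the PSID documentation should be consulted for real analyses.

```python
def classify_person_number(person_number, followable):
    """Map a Person Number and follow status to a descriptive category.

    The ranges come from the text above; the labels are illustrative only.
    """
    if 1 <= person_number <= 19:
        return "original sample member, living in the FU"
    if person_number == 20:
        return "original-year spouse of Head, in an institution"
    if 21 <= person_number <= 29:
        return "original sample child, in an institution"
    if 30 <= person_number <= 169:
        return "born-in sample member"
    if person_number >= 170:
        # 170 and above is shared by moved-in sample members and by
        # nonsample persons. Follow status is used here as a rough
        # separator (a simplification, since some nonsample persons
        # have also been followed).
        if followable:
            return "moved-in sample member"
        return "nonsample person"
    raise ValueError("Person Numbers start at 001")


if __name__ == "__main__":
    for number, follow in [(3, True), (20, True), (45, True), (171, False)]:
        print(number, classify_person_number(number, follow))
```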
Response family unit members are those residing in an interviewed family at the time of interview. Nonresponse family unit members are those not residing in an interviewed family at the time of interview; they may have attrited, not yet appeared in the study, or not yet been born by a particular wave.
The phrase "main family nonresponse" means that both the individual and his or her family have at that time become lost to our study, although either or both may reappear in the study in subsequent waves. In the wave just prior to becoming nonresponse, the individual was connected with a family interviewed by our study; thus, both family and individual data are available for that prior year, and the individual's Sequence Number at that time was 01-59. However, data were collected for neither the individual nor his or her family in the nonresponse wave. The data for the wave in which nonresponse occurs (and all subsequent waves unless and until the individual reappears as a member of a responding family unit, including a recontact family) are zeroes, except for the variables for type of individual record and reason for nonresponse and, if an individual was selected for recontact, follow status and reason for following the individual.
In contrast, mover-out nonresponse individuals have left a family that was still in the study. Since such individuals were usually present in that family for at least part of the calendar year preceding nonresponse, they have some additional nonzero data for the wave in which they became nonresponse, such as part-year income information. In later waves, mover-out nonresponse individuals are treated in two ways, depending on why they left the family. Those who moved out to institutions have several variables (Sequence Number, age, sex, Relationship to Head, type of individual and reason for nonresponse) with nonzero values, although income, housework, and other individual-level variables are filled with zeroes. Eventually, such an individual may (a) become response by moving into a family or by becoming a splitoff, (b) move from the institution and remain mover-out nonresponse (shown when Sequence Number=71-89), or (c) become main family nonresponse because the family itself became nonresponse. (See the preceding paragraph for an explanation of main family nonresponse data records.) The other type of mover-out nonresponse individual has either moved out, but not to an institution, or died. Later waves of data contain zeroes, as described above for main family nonresponse, unless they subsequently rejoined a responding family or were selected for recontact.
The data are released as one file, which includes not only those individuals with nonzero data records in the current data collection year (i.e., current response plus mover-out nonresponse), but also all other individuals: those who have zero data records for the current year (i.e., current year main family nonresponse and all nonresponse of either kind from earlier waves).
|Within each wave of data, each FU (family unit) has one and only one current Head. Originally, if the family contained a husband-wife pair, the husband was arbitrarily designated the Head to conform with Census Bureau definitions in effect at the time the study began. The person designated as Head may change over time as a result of other changes affecting the family. When a new Head must be chosen (see conditions for selecting a new Head below), the following rules apply:
The Head of the FU must be at least 16 years old and the person with the most financial responsibility for the FU. If this person is female and she has a husband in the FU, then he is designated as Head. If she has a boyfriend with whom she has been living for at least one year, then he is Head. However, if the husband or boyfriend is incapacitated and unable to fulfill the functions of Head, then the FU will have a female Head.
Husbands of Heads are extremely rare in the study. Early on, an FU might have a Female Head and a Husband of Head (instead of Head and Wife) if the Head was incapacitated in some way. (He may be still in the FU or he may have moved to an institution.) There are also a few cases where the female half of a married couple insists on being the Head, or where the male half of a married couple is adamant about not wanting to have his information included in the study. A Husband of Head has the Relationship to Head code 9 or 90.
In the PSID, an opposite-sex romantic partner who has moved into an FU less than 1 year prior to the interview is labeled a boyfriend or girlfriend (code 88) in the first wave that he or she appears in the study. If the cohabitor has moved in at least one year before the interview, the couple will be coded as Head and "Wife" (code 22 from 1983 on). In the next wave, if the boyfriend or girlfriend is still living in the FU and the couple is still unmarried, they are recoded as Head and "Wife" (that is, a male Head will remain Head but his girlfriend will be labeled "Wife", or a female Head will become "Wife" while her boyfriend will become Head).
Boyfriends and girlfriends are treated like family members who are not Heads or Wives/"Wives" — considerably less information is obtained about them. In waves since the late 1970s, information typically gathered for Wives has been gathered as well about "Wives".
Starting in 1983, the Relationship to Head (RTH) code allowed for differentiation between legal Wives and long-term female cohabitors. However, first year cohabitors can be detected prior to 1983 with a little bit of work. For example, their RTH would be 8 (nonrelative), their gender would be opposite that of Head's, and in subsequent years they may become Wives or Heads, while the Head would stay as Head or become a Wife. Anyone fitting this pattern can be decisively identified as a cohabitor. PSID has not distinctively labeled same sex cohabitors.
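The pre-1983 detection pattern described above can be sketched in Python. This is a minimal illustration under stated assumptions: the dictionary keys, the role strings, and the function name are hypothetical, and only the nonrelative code 8 is taken from the text. The Head's own later role (staying Head or becoming a Wife) is not checked here.

```python
NONRELATIVE_RTH = 8  # pre-1983 Relationship to Head code for a nonrelative


def cohabitor_status(person, head_sex, later_roles):
    """Screen one FU member for the first-year cohabitor pattern.

    person: dict with hypothetical keys "rth" and "sex"
    head_sex: sex of the FU Head in the same wave
    later_roles: roles the person held in later waves, as plain strings
    """
    if person["rth"] != NONRELATIVE_RTH or person["sex"] == head_sex:
        return "not a candidate"
    # Becoming a Wife or a Head in a later wave strengthens the match.
    if {"Head", "Wife"} & set(later_roles):
        return "likely cohabitor"
    return "candidate"


if __name__ == "__main__":
    member = {"rth": 8, "sex": "F"}
    print(cohabitor_status(member, "M", ["Wife"]))
```

Keeping "candidate" separate from "likely cohabitor" reflects the text's wording that such persons may, but need not, become Wives or Heads in subsequent years.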
Before the Data Center was created, PSID data were distributed as "packaged" files. Because some users prefer the packaged files, we continue to provide data in this format.
Data files are deleted from our servers when they are 7 days old. After that, you can re-create your data file by logging into the
Data Center and selecting "Previous carts".
In general, a substantial amount of detailed data is collected for the Head, and Wife/"Wife" if present. Considerably less detail is collected for other family unit members (OFUMs).
The PSID collects many data elements about housing, including housing type, characteristics, ownership, tax, insurance, etc. A list of such items collected in each wave is available here.
File release information is available through the News section of our website.
Up through the 2001 interviewing year, the PSID distinguished between Main and Extra jobs. Someone could not have an Extra job unless he/she held a Main job during the same time period. The extra job must be held simultaneously with the main job. We made this distinction between main and extra jobs throughout. If two (or more) employers overlapped, the interviewer was supposed to ask which was the main one during that time and note in an open ended question the overlap and the hours and earnings of both jobs. Then this overlap period was to be included in the extra job sequences (BD82-BD106/CE74-CE98). Those who are only temporarily laid off are still employed at a main job and, therefore, could have an extra job during that time period. However, those who are unemployed, whether looking or not, have no main job employer during the time in question. Hence, any small job they may have is considered a main job--since it's the ONLY job. Use the month strings and dates of beginning and ending employment in the work history to tell whether time at B/D72-74a or C/E64-66a is temporary layoff or unemployment.
Beginning with the 2003 interviewing year, the PSID dropped the main vs. extra job distinction as defined above. Jobs are now classified as "current main job", "most recent main job" or "other" job. If someone reports 2 or more current jobs, or 2 or more recent jobs that ended at the same time, the interviewer asks which job he/she considers his/her main job. That one is listed as the current (or most recent) main job. Any other job is listed as an "other" job. A job can be an "other" job even when it does not overlap with a current or most recent main job. This situation could arise, for instance, when someone reports two jobs, with the current main job beginning before the old ("other") job ended.
The PSID contains a wealth of information that can be used to study the health of Americans and their family members. Information collected in the main interview is summarized here. Health information collected in the Child Development Supplement is summarized here.
|The PSID used a one-digit occupation code, and later a two-digit, until 1981 when the three-digit 1970 Census code became standard for the main jobs of employed Heads and Wives. It was also used for the most recent jobs held by Heads and Wives who were currently unemployed and looking for work and for any job held in 1980 by a Head or Wife who was currently retired or no longer in the labor force. Starting in 2003, all occupation-industry data has been coded using the three-digit 2000 Census code. A retrospective coding project used the 2000 Census to code first occupation and industry of all Heads and Wives as of 2003 and that of their fathers and mothers.|
|Ages of individuals are asked and reported in each wave of the study. But interviews are seldom taken exactly twelve months apart for the same family from wave to wave. In fact, a family responding early in the interviewing period one year might respond late in the next year’s interviewing period, leaving 18 or more months between interviews during the annual interviewing years (1968-1997). Conversely, a late responder in one wave could be an early responder in the next wave. Since the PSID transitioned to biennial interviewing (1999 through the present), the age gap can widen even further. Because interview dates vary in this way, an individual may well appear to have aged excessively or not at all.
Also, individuals’ ages or birthdates can be misreported. Consistency checks for age discrepancies have always been done internally, but reported ages are not altered if it cannot be determined which age is correct.|
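To see how interview timing alone can produce an apparent age jump, the following Python sketch computes which changes in reported age (in completed years) are consistent with two interview dates when the birthdate is unknown. The function name and the example dates are hypothetical; the Date of Interview variables in the data would supply the real dates.

```python
from datetime import date


def plausible_age_changes(first_interview, second_interview):
    """Return the set of age changes consistent with two interview dates.

    The change in completed years equals the number of birthdays that
    fall after the first interview and on or before the second one.
    """
    full_years = second_interview.year - first_interview.year
    first_md = (first_interview.month, first_interview.day)
    second_md = (second_interview.month, second_interview.day)
    if second_md < first_md:
        full_years -= 1
    if second_md == first_md:
        # Exactly a whole number of years apart: only one value is possible.
        return {full_years}
    return {full_years, full_years + 1}


if __name__ == "__main__":
    # Early interview one year, late interview the next year.
    print(plausible_age_changes(date(1990, 3, 15), date(1991, 11, 1)))
    # Late interview one year, early interview the next year.
    print(plausible_age_changes(date(1990, 11, 1), date(1991, 3, 15)))
```

A reported age change outside the returned set points to a reporting discrepancy rather than to interview timing.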
|PSID reminds data users to cite the data and acknowledge our funding source in all publications using the data.
Citation: Panel Study of Income Dynamics, public use dataset [restricted use data, if appropriate]. Produced and
distributed by the Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
(year data were downloaded).
Acknowledgement: The collection of data used in this study was partly supported by the National Institutes of
Health under grant number R01 HD069609 and the National Science Foundation under award number 1157698.
Effective May 25, 2008, anyone submitting an application, proposal, or progress report to the NIH
must include the PubMed Central reference number (PMCID) or NIH Manuscript Submission
reference number when citing applicable articles that arise from their NIH funded
research: http://publicaccess.nih.gov/citation_methods.htm. In consideration of this policy, PSID
requests that all journal articles based on analysis of PSID data or its supplements (either public
or restricted-use) receive a PubMed Central reference number (PMCID). Journal articles must be
submitted to PubMed Central to receive a PMCID. The method of PubMed Central submission and
Investigator responsibility for submission depend on the journal and its publisher:
Some journals automatically submit published articles to PubMed Central.
Some journal publishers may submit the articles to PubMed Central automatically or
upon request by the author: http://publicaccess.nih.gov/select_deposit_publishers.htm#b.
If neither the journal nor the journal publisher will submit the article to PubMed
Central, the Investigator will be responsible for the submission. For detailed instructions
on the process of submitting a journal article to PubMed Central, please see the NIH
Researchers with PSID restricted-use contracts should include PMCIDs in their list of PSID
publications submitted in biennial reports.
Researchers using PSID public-use data should send citations for PSID-based publications they have
authored that have PMCIDs to firstname.lastname@example.org.
The interview period (field season) is roughly between March and November, with a few years being exceptions and going into December. If a user is interested in when a specific interview was conducted, there is a variable in the dataset (Date of Interview) which indicates month and day of interview.
Between 1968 and 1997, data were collected every year. Starting in 1999, the PSID collected data biennially (i.e., every other year). All waves of data starting with 1968 are available on the website, with each wave's public release file being posted on the website as soon as editing and processing can be completed.
|The PSID sample combines the SRC (Survey Research Center) and SEO (Survey of Economic Opportunity) samples. Both samples are probability
samples (i.e., samples for which every element in the population has a known nonzero chance of selection). Their combination is also a probability sample.
The combination, however, is a sample with unequal selection probabilities, and as a result, compensatory weighting is needed in estimation, at least for descriptive statistics.
Weight adjustments are also needed to attempt to compensate for differential nonresponse in 1968 and subsequent waves. Weights supplied on PSID data files are designed
to compensate for both unequal selection probabilities and differential attrition.
In 1997, the Panel Study of Income Dynamics (PSID) underwent several important design changes that would affect weighting. Leading these changes was a reduction of roughly one third in the number of PSID Core
families that would be eligible for continuous longitudinal data collection. A second important change to the 1997 PSID was the addition of a nationally representative sample of immigrant households and individuals
that would not be eligible for PSID under the original 1968 sample recruitment and sample family "following rules". The 1997 data collection year also began the transition to every second year data collection for PSID.
Finally, the 1997 PSID data collection included a special supplemental study of children age 0-12 in PSID Core and Immigrant Supplement families. Additional documentation describing the weights is provided on the
Variables ER31996 and ER31997 are used for computing complex sample design corrected standard errors/variance estimates via the
Taylor Series Linearization or Repeated Replication methods. These variables may be used with a variety of software programs that
incorporate the complex sample design into variance estimation, including Stata, SAS, Sudaan, SPSS and others. The Sampling Error Stratum
variable (ER31996) may be specified as the "Stratum variable" in the design specification and the Sampling Error Cluster variable (ER31997)
may be specified as the "Cluster Variable". Sampling error estimation in design-based analysis of the PSID data can be found
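As an illustration of how a stratum variable and a cluster variable enter a Taylor Series Linearization estimate, the following Python sketch computes a weighted mean and its linearized standard error. It is a simplified teaching example, not the PSID's or any package's implementation: it assumes clusters are treated as sampled with replacement within strata, and the row layout (stratum, cluster, weight, value) and function name are hypothetical. In practice the stratum and cluster columns would hold ER31996 and ER31997, and survey software such as Stata or SAS would normally be used.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(rows):
    """Weighted mean and linearized standard error.

    rows: iterable of (stratum, cluster, weight, value) tuples.
    """
    rows = list(rows)
    total_weight = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_weight

    # Linearized contribution of each observation to the ratio mean,
    # summed to the cluster level within each stratum.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        cluster_totals[stratum][cluster] += w * (y - mean) / total_weight

    variance = 0.0
    for clusters in cluster_totals.values():
        n = len(clusters)
        if n < 2:
            raise ValueError("each stratum needs at least two clusters")
        z_bar = sum(clusters.values()) / n
        variance += n / (n - 1) * sum((z - z_bar) ** 2 for z in clusters.values())
    return mean, sqrt(variance)


if __name__ == "__main__":
    example = [
        (1, 1, 1.0, 1.0), (1, 1, 1.0, 3.0), (1, 2, 1.0, 5.0),
        (2, 1, 2.0, 2.0), (2, 2, 2.0, 4.0),
    ]
    print(weighted_mean_with_se(example))
```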
|These are families that contained no sample members. These are not mistakes in the data, but rather show cases where information was gathered about individuals not directly linked to a sample member. The PSID purposely followed some nonsample individuals, e.g., the nonsample elderly (1990-1996), nonsample parents (1994-2003). In some cases, families are response and contain only followable but nonsample individuals and therefore all the individual weights and thus the family weight for these cases are zero.|
|A modest upward distortion in the weighted estimates of Black individuals with children has been identified in PSID, beginning in the late 1990s, for selected cohorts. We recommend that analysts use caution when using PSID data from 1997 onward to estimate the percentage of individuals with children in the household (and related statistics such as childlessness and fertility) for Black women born in the late 1960s and in the 1970s, and for Black men born in the late 1960s, 1970s, and early 1980s. This distortion may also affect estimates of multigenerational Black families in later years. Analysts who are not addressing questions of this type can reasonably ignore this concern. When estimating multivariate models with fertility-related outcomes for any year, we recommend that analysts include as a control variable the CDS eligibility indicator available on the 1997 Individual File (ER33418). Some analysts may wish to post-stratify the assigned PSID weights for Black individuals to the Current Population Survey (CPS) totals by presence of a child under the age of 13.
PSID Technical Series Paper 16-01 provides additional details.|
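Post-stratification of the kind mentioned above rescales weights so that, within each cell, they sum to an external control total. The following Python sketch shows the basic arithmetic only. The function name, the cell labels, and the control totals are made up for illustration and are not CPS figures; PSID Technical Series Paper 16-01 should be consulted for the actual procedure.

```python
from collections import defaultdict


def post_stratify(weights, cells, control_totals):
    """Rescale weights so each cell's weights sum to its control total.

    weights: list of original weights
    cells: list of cell labels, one per weight
    control_totals: dict mapping each cell label to its external total
    """
    weighted_sums = defaultdict(float)
    for weight, cell in zip(weights, cells):
        weighted_sums[cell] += weight

    adjusted = []
    for weight, cell in zip(weights, cells):
        if weighted_sums[cell] == 0:
            raise ValueError("cell has zero total weight: %r" % (cell,))
        adjusted.append(weight * control_totals[cell] / weighted_sums[cell])
    return adjusted


if __name__ == "__main__":
    # Hypothetical cells: presence or absence of a child under age 13.
    weights = [1.0, 1.0, 2.0]
    cells = ["child under 13", "child under 13", "no child under 13"]
    totals = {"child under 13": 10.0, "no child under 13": 4.0}
    print(post_stratify(weights, cells, totals))
```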