Friday, May 9

PSID File Structure and Merging PSID Data Files

***Note: The PSID data center automatically merges PSID and CDS data. The instructions below are intended for informative purposes only and will help you understand the structure of the PSID data.***

Contents

  • A. PSID File Structure
  • B. Assembling A Cross-Year Family-Individual File
      • Method 1 -- creating a 1990-1992 family-individual file
          • SAS setup
          • SPSS setup
      • Method 2 -- creating a 1968-1992 family-individual file
          • SAS setup
          • SPSS setup
  • C. Assembling A Cross-Year Family File
  • D. Single-Year Family Files And Single-Year Family-Individual Files
  • E. Additional Help

    This information is presented in four main sections: a) PSID file structure, b) two methods of assembling a cross-year family-individual file, c) assembling a cross-year family file, and d) single-year family files and single-year family-individual files. A final section, E, explains where to get additional help.


    A. PSID File Structure

    The traditional cross-year family-individual file used for the PSID through 1989 has been replaced by separate single-year family files and a cross-year individual file. For instance, through the 1992 data collection year there are 25 single-year family files containing family-level variables collected in each wave of the study from 1968 through 1992 and a single cross-year individual file containing all individual-level variables collected from 1968 to 1992 for both respondents and non-respondents. Thus the "main" PSID data files include two types of data files -- a) single-year family files and b) a cross-year individual file.

    1. The single-year family files

    Each single-year family file contains one record for each family interviewed in the specified year. The twenty-five single-year family files (one for each year of the study from 1968 through 1992) contain all of the family-level variables collected in each wave. The records in each file are identified by, and sorted on, the family Interview Number for that year, and contain the family-level variables for that year.

    Annual Family Files -- Contain Family-Level Data Collected In A Single Wave
    +-----+
    |68fam|
    +-----+
    format:     family data 1968
    records:    one record for each family in 1968
    ids:        1968 family Interview Number
    sort order: 1968 family Interview Number
    N:          4,802 families
    MB of data: 3.4 MB
    +-----+
    |69fam|
    +-----+
    format:     family data 1969
    records:    one record for each family in 1969
    ids:        1969 family Interview Number
    sort order: 1969 family Interview Number
    N:          4,460 families
    MB of data: 4.4 MB
    .
    .
    .
    .
    +-----+
    |92fam|
    +-----+
    format:     family data 1992
    records:    one record for each family in 1992
    ids:        1992 family Interview Number
    sort order: 1992 family Interview Number
    N:          9,829 families
    MB of data: 22.0 MB
    

    2. The cross-year individual file

    The cross-year individual file contains one record for each person ever in a PSID family from the beginning of the study through the current year. The records in the cross-year individual file are identified by 1968 family Interview Number (V30001) and Person Number (V30002) and are in sort order by these variables. The file also contains the Interview Number of the family with which the person was associated in each year after 1968 and all other individual-level variables from 1968 through 1992.

    1968-1992 Cross-Year Individual File -- Contains All Individual-Level Data Collected From 1968-1992
    +--------+ +-----+-----+   +-----+
    |sortid's| |68ind|69ind|...|92ind|
    +--------+ +-----+-----+   +-----+
    format:     individual data for 1968-1992
    records:    one record for each person ever-in through 1992
    ids:        1968 family Interview Number and Person Number
    sort order: 1968 family Interview Number and Person Number
    N:          50,915 persons
    MB of data: 91.7 MB
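    The two kinds of record described above can be pictured as simple data structures. The Python sketch below is illustrative only: the class, field, and function names are hypothetical, and only the identifiers V30001 and V30002 and the convention of 0-filled Interview Numbers for years with no data (described in Section B) come from this page. It shows which identifier each file is keyed and sorted on.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyRecord:
    """One record of a single-year family file (hypothetical layout)."""
    year: int
    interview_number: int  # that year's family Interview Number (e.g. V442 in 1969)
    variables: dict = field(default_factory=dict)  # family-level variables for that year


@dataclass
class IndividualRecord:
    """One record of the cross-year individual file (hypothetical layout)."""
    id68: int  # V30001: 1968 family Interview Number
    person_number: int  # V30002: Person Number
    # year -> family Interview Number; 0 means no data were collected that year
    interview_numbers: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)  # individual-level variables

    @property
    def key(self):
        """(V30001, V30002): identifies the person and defines the file's sort order."""
        return (self.id68, self.person_number)

    def in_family(self, year):
        """True if the record carries a non-zero family Interview Number for that year."""
        return self.interview_numbers.get(year, 0) != 0


def sort_family_file(records):
    """Sort a single-year family file by that year's family Interview Number."""
    return sorted(records, key=lambda rec: rec.interview_number)


def sort_individual_file(records):
    """Sort the cross-year individual file by V30001, then V30002."""
    return sorted(records, key=lambda rec: rec.key)


# Toy usage with made-up values.
people = [
    IndividualRecord(5, 2, {1968: 5, 1969: 0}),
    IndividualRecord(5, 1, {1968: 5, 1969: 118}),
    IndividualRecord(3, 1, {1968: 3, 1969: 40}),
]
print([p.key for p in sort_individual_file(people)])  # [(3, 1), (5, 1), (5, 2)]
print(people[0].in_family(1969))  # False
```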
    


    B. Assembling A Cross-Year Family-Individual File

    Few analysts will want to analyze the full data file for all persons ever in the study, and so your first step is to decide which variables, individuals and years of data interest you.

    The basic principle in merging data from a single-year family file with data from the cross-year individual file involves matching the two files using annual Interview Numbers for the year in which the family variables were collected. Thus it is critical that the annual Interview Number variables be retained as part of any subsetted data, either family or individual. The chart below shows the family Interview Number variables for the single-year family files and cross-year individual file.

    Family Interview Numbers in Single-year Family Files and in Cross-year Individual File
    ------------------------------------
    Year   Family File   Individual File
    ------------------------------------
    1968   V3            V30001
    1969   V442          V30020
    1970   V1102         V30043
    1971   V1802         V30067
    1972   V2402         V30091
    1973   V3002         V30117
    1974   V3402         V30138
    1975   V3802         V30160
    1976   V4302         V30188
    1977   V5202         V30217
    1978   V5702         V30246
    1979   V6302         V30283
    1980   V6902         V30313
    1981   V7502         V30343
    1982   V8202         V30373
    1983   V8802         V30399
    1984   V10002        V30429
    1985   V11102        V30463
    1986   V12502        V30498
    1987   V13702        V30535
    1988   V14802        V30570
    1989   V16302        V30606
    1990   V17702        V30642
    1991   V19002        V30689
    1992   V20302        V30733
    ------------------------------------
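    Because every merge depends on picking the right pair of Interview Number variables, it can help to keep the table above in machine-readable form. The following is a minimal Python sketch, assuming you refer to variables by their names as strings; the helper names are hypothetical. The second helper builds the list of individual-file variables to retain when subsetting, so the annual Interview Numbers needed for later merges are never dropped.

```python
# year -> (Interview Number variable in the single-year family file,
#          Interview Number variable in the cross-year individual file)
INTERVIEW_NUMBER_VARS = {
    1968: ("V3", "V30001"),     1969: ("V442", "V30020"),
    1970: ("V1102", "V30043"),  1971: ("V1802", "V30067"),
    1972: ("V2402", "V30091"),  1973: ("V3002", "V30117"),
    1974: ("V3402", "V30138"),  1975: ("V3802", "V30160"),
    1976: ("V4302", "V30188"),  1977: ("V5202", "V30217"),
    1978: ("V5702", "V30246"),  1979: ("V6302", "V30283"),
    1980: ("V6902", "V30313"),  1981: ("V7502", "V30343"),
    1982: ("V8202", "V30373"),  1983: ("V8802", "V30399"),
    1984: ("V10002", "V30429"), 1985: ("V11102", "V30463"),
    1986: ("V12502", "V30498"), 1987: ("V13702", "V30535"),
    1988: ("V14802", "V30570"), 1989: ("V16302", "V30606"),
    1990: ("V17702", "V30642"), 1991: ("V19002", "V30689"),
    1992: ("V20302", "V30733"),
}


def interview_number_vars(year):
    """Return the (family-file, individual-file) Interview Number variables for a year."""
    try:
        return INTERVIEW_NUMBER_VARS[year]
    except KeyError:
        raise ValueError(f"no Interview Number variables listed for {year}") from None


def individual_vars_to_keep(years, extra=()):
    """Variables to retain when subsetting the cross-year individual file.

    Always keeps the person identifiers V30001 and V30002 and the annual
    Interview Number for every year you plan to merge, then any extra variables.
    """
    keep = ["V30001", "V30002"]
    for year in sorted(years):
        ind_var = interview_number_vars(year)[1]
        if ind_var not in keep:  # 1968's V30001 is already present
            keep.append(ind_var)
    for var in extra:
        if var not in keep:
            keep.append(var)
    return keep


print(interview_number_vars(1990))  # ('V17702', 'V30642')
print(individual_vars_to_keep([1992, 1968, 1990]))
# ['V30001', 'V30002', 'V30642', 'V30733']
```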
    

    Note that not every record in the cross-year individual file will have a matching record in every single-year family file. This happens when an individual who was once part of a responding family moves away or dies and is no longer associated with a family in the study; such a person is said to be a nonresponse person. For years in which no data were collected about him or her, the nonresponse person's Interview Number in the cross-year individual file is filled with 0s, as are the other variables for that year.

    When merging the cross-year individual file with a single-year family file, both SPSS and SAS will fill in system-missing values for the 19nn family variables for individuals who were not associated with a responding family in 19nn. Depending on your analysis needs, you may or may not wish to include individuals with missing family-year records; instruct the program you use for merging to include or exclude them accordingly.
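    The include-or-exclude choice amounts to a left join versus an inner join of individuals onto families. The Python sketch below assumes each file has already been read into a list of dicts keyed by variable name; the function name, the toy records, and the FAM_SIZE variable are all made up for illustration. A 0 Interview Number is treated as "no family data this year", and family variables for unmatched individuals are set to None, playing the role of SAS/SPSS system-missing values.

```python
def merge_family_year(individuals, families, ind_var, fam_var, keep_unmatched=True):
    """One-to-many (family-to-individual) match for a single year.

    individuals:    list of dicts from the (subsetted) cross-year individual file
    families:       list of dicts from one single-year family file
    ind_var:        that year's Interview Number variable in the individual file
    fam_var:        that year's Interview Number variable in the family file
    keep_unmatched: True keeps people with no family record that year (their
                    family variables become None); False drops them
    """
    by_id = {}
    for fam in families:
        if fam[fam_var] in by_id:
            raise ValueError(f"duplicate family Interview Number {fam[fam_var]}")
        by_id[fam[fam_var]] = fam
    fam_vars = sorted({var for fam in families for var in fam} - {fam_var})

    merged = []
    for person in individuals:
        number = person[ind_var]
        fam = by_id.get(number) if number != 0 else None  # 0 = no data that year
        if fam is None and not keep_unmatched:
            continue
        row = dict(person)
        for var in fam_vars:
            row[var] = fam.get(var) if fam is not None else None
        merged.append(row)
    return merged


# Made-up 1969 example: V30020 (individual file) matches V442 (family file).
people = [
    {"V30001": 1, "V30002": 1, "V30020": 12},
    {"V30001": 1, "V30002": 2, "V30020": 12},
    {"V30001": 2, "V30002": 1, "V30020": 0},  # nonresponse in 1969
]
fam69 = [{"V442": 12, "FAM_SIZE": 2}]  # FAM_SIZE is a hypothetical variable

kept = merge_family_year(people, fam69, "V30020", "V442")
dropped = merge_family_year(people, fam69, "V30020", "V442", keep_unmatched=False)
print(len(kept), len(dropped))  # 3 2
print(kept[2]["FAM_SIZE"])  # None
```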

    We can think of several approaches to creating a cross-year family-individual file from the components. Two are described and illustrated below. SAS and SPSS statements provided in the SAS and SPSS sub-directories can be used to help construct the programs.


    1. Method 1 - Merge Using Family Data Added Sequentially To Cross-Year Individual Data.

    First select individuals and variables from the cross-year individual file (remembering to retain all relevant annual family Interview Number variables) and then match that data with the desired variables from a single-year family file, matching on the appropriate annual family Interview Number variable, using a one-to-many match.

    Next, match the resulting file (which now contains one record for each individual with selected variables from the cross-year individual file and the first family file) with a second family file matching on the appropriate annual family Interview Number variable, using a one-to-many match.

    Repeat with additional single-year family files until all required family data are obtained and merged with the cross-year individual data, as the diagram below shows.

    See SPSS or SAS examples for an illustration of this approach using three years of family data.

    Merge Using Family Data Added Sequentially To Cross-Year Individual Data
            +---------------------------+ +--------------+
            |1968-1992 Individual File  | |1st Family    |
            |N=inds, subset if desired  | |   File       |
            |                           | |N=1yr fam     |
            +---------------------------+ +--------------+
                      |                        |
                      +------------------------+
                            |
     STEP 1:  Sort and match on first annual family Interview Number
                            |
            +-------------------------+ +-----------+
            |1st Family + 1968-1992   | |2nd Family |
            |Individual File          | |   File    |
            |N=inds, subset if desired| |N=2yr fam  |
            +-------------------------+ +-----------+
                      |                        |
                      +------------------------+
                            |
     STEP 2:  Sort and match on second annual family Interview Number
                            |
            +-------------------------+ +-----------+
            |1st Family + 2nd Family  | |3rd Family |
            |+ 1968-1992 Individual   | |   File    |
            |N=inds, subset if desired| |N=3yr fam  |
            +-------------------------+ +-----------+
                      |                        |
                      +------------------------+
                            |
     STEP 3:  Sort and match on third annual family Interview Number
                            |
            +------------------------------------+
            |1st Family + 2nd Family + 3rd Family|
            |+ 1968-1992 Individual File         |
            |N=inds, subset if desired           |
            +------------------------------------+
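    Method 1 can be sketched in a few lines of Python, using the three years of the Method 1 example (1990-1992). This is an illustration under stated assumptions, not the SAS or SPSS setup: records are lists of dicts, a dictionary lookup stands in for the sort-and-match of each step (so no sorting is needed), and the family variables SIZE90, SIZE91, and SIZE92 and all toy values are hypothetical. Individuals without a family in a given year are kept, with None for that year's family variables, and the output keeps the order of the input individual file.

```python
# Interview Number variables for three years, copied from the table in Section B:
# year -> (family-file variable, individual-file variable)
ID_VARS = {1990: ("V17702", "V30642"), 1991: ("V19002", "V30689"), 1992: ("V20302", "V30733")}


def add_family_year(individuals, families, fam_var, ind_var):
    """One step of Method 1: attach one year's family variables to every individual."""
    by_id = {fam[fam_var]: fam for fam in families}
    fam_vars = sorted({var for fam in families for var in fam} - {fam_var})
    out = []
    for person in individuals:
        number = person[ind_var]
        fam = by_id.get(number) if number != 0 else None  # 0 = no data that year
        row = dict(person)
        for var in fam_vars:
            row[var] = fam.get(var) if fam is not None else None
        out.append(row)
    return out


def method1(individuals, family_files):
    """Add the family files one year at a time, starting from the individual subset."""
    merged = individuals
    for year in sorted(family_files):
        fam_var, ind_var = ID_VARS[year]
        merged = add_family_year(merged, family_files[year], fam_var, ind_var)
    return merged


# Made-up records; SIZE90-SIZE92 stand in for real family variables.
people = [
    {"V30001": 7, "V30002": 1, "V30642": 101, "V30689": 205, "V30733": 310},
    {"V30001": 7, "V30002": 3, "V30642": 101, "V30689": 0, "V30733": 0},
]
family_files = {
    1990: [{"V17702": 101, "SIZE90": 3}],
    1991: [{"V19002": 205, "SIZE91": 2}],
    1992: [{"V20302": 310, "SIZE92": 2}],
}
result = method1(people, family_files)
print(result[1]["SIZE90"], result[1]["SIZE91"])  # 3 None
```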
    


    2. Method 2 - Merge Using Multiple Family-Individual Files.

    Alternatively, you could do a series of one-to-many matches of the single-year family files and the cross-year individual file matching on the appropriate annual family Interview Number and then merge the resulting single-year family-individual files in a one-to-one match using the 1968 Interview Number and Person Number. Detailed steps are noted below.

    Step 1: Subset the annual family Interview Number variables and other selected variables, and select cases, from the cross-year individual file.

    Step 2a: Subset selected variables from the year-n family file.

    Step 2b: Sort the subsetted year-n family file from Step 2a by the year-n family Interview Number.

    Step 2c: Sort the subsetted cross-year individual file from Step 1 by the year-n family Interview Number.

    Step 2d: Merge the sorted cross-year individual file from Step 2c with the sorted, subsetted year-n family file from Step 2b (a one-to-many, family-to-individual, match), matching on the year-n family Interview Number.

    Step 2e: Sort the resulting year-n family-individual file from Step 2d by the individual identifiers, 1968 family Interview Number (V30001) and Person Number (V30002).

    ... Repeat Steps 2a-2e for all other years.

    Step 3: Merge the family-individual files from Step 2e by the individual identifiers, 1968 family Interview Number (V30001) and Person Number (V30002).

    See the diagram below for an illustration of this approach.

    See SPSS or SAS examples for an illustration of this approach using 25 years of family data.

    Illustration Of Merge Using Multiple Family-Individual Files

        +---------++---------++---------++---------++---------++---------+
        |68-92 In-||1st      ||68-92 In-||2nd      ||68-92 In-||3rd      |
        |dividual ||Family   ||dividual ||Family   ||dividual ||Family   |
        |File     ||File     ||File     ||File     ||File     ||File     |
        |N=inds   ||N=1yr fam||N=inds   ||N=2yr fam||N=inds   ||N=3yr fam|
        +---------++---------++---------++---------++---------++---------+
           |           |        |             |         |            |
           +-----------+        +-------------+         +------------+
                 |                     |                       |
    Step 2:
          Match on 1st year    Match on 2nd year       Match on 3rd year
          Interview Number     Interview Number        Interview Number
                 |                     |                       |
         +---------------+     +---------------+       +---------------+
         |1st Family-    |     |2nd Family-    |       |3rd Family-    |
         |Individual File|     |Individual File|       |Individual File|
         |N=inds         |     |N=inds         |       |N=inds         |
         +---------------+     +---------------+       +---------------+
               |                      |                       |
               +----------------------+-----------------------+
                                      |
    Step 3:     Match on 1968 Interview Number and Person Number
                                      |
                   +-----------------------------------+
                   |                                   |
                   | Cross-year Family-Individual File |
                   | N=inds                            |
                   +-----------------------------------+
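    A comparable Python sketch of Method 2, under the same kind of assumptions: records are lists of dicts, a dictionary lookup replaces the sorts in Steps 2b and 2c, and the family variables SIZE90 and SIZE91 and all toy values are hypothetical. Each year produces its own family-individual file sorted by V30001 and V30002 (Step 2e); Step 3 then combines those files one-to-one on the same two identifiers.

```python
# year -> (family-file variable, individual-file variable), from the Section B table
ID_VARS = {1990: ("V17702", "V30642"), 1991: ("V19002", "V30689")}


def person_key(row):
    """Individual identifiers: 1968 Interview Number (V30001), Person Number (V30002)."""
    return (row["V30001"], row["V30002"])


def family_individual_file(individuals, families, fam_var, ind_var):
    """Steps 2b-2e for one year: match family to individuals, then sort by person."""
    by_id = {fam[fam_var]: fam for fam in families}
    fam_vars = sorted({var for fam in families for var in fam} - {fam_var})
    rows = []
    for person in individuals:
        number = person[ind_var]
        fam = by_id.get(number) if number != 0 else None  # 0 = no data that year
        row = dict(person)
        for var in fam_vars:
            row[var] = fam.get(var) if fam is not None else None
        rows.append(row)
    return sorted(rows, key=person_key)  # Step 2e


def merge_on_person(files):
    """Step 3: one-to-one merge of the yearly files on (V30001, V30002).

    Family variables from different years are assumed to have distinct
    names, as the year-specific V-numbers do.
    """
    merged = {}
    for rows in files:
        for row in rows:
            merged.setdefault(person_key(row), {}).update(row)
    return [merged[key] for key in sorted(merged)]


def method2(individuals, family_files):
    yearly = [
        family_individual_file(individuals, family_files[year], *ID_VARS[year])
        for year in sorted(family_files)
    ]
    return merge_on_person(yearly)


# Made-up records; SIZE90 and SIZE91 stand in for real family variables.
people = [
    {"V30001": 7, "V30002": 3, "V30642": 101, "V30689": 0},
    {"V30001": 7, "V30002": 1, "V30642": 101, "V30689": 205},
]
family_files = {
    1990: [{"V17702": 101, "SIZE90": 3}],
    1991: [{"V19002": 205, "SIZE91": 2}],
}
result = method2(people, family_files)
print([person_key(r) for r in result])  # [(7, 1), (7, 3)]
print(result[1]["SIZE91"])  # None
```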
    


    C. Assembling A Cross-Year Family File

    To assemble a 1992 cross-year family file from these files, a procedure similar to one of the above would be followed, but only the cross-year individual records of the 1992 Head would be selected from the cross-year individual file. Merge data from the single-year family files using the annual family Interview Number variables to match as described in Method 1 or Method 2 above to create a merged 1968-1992 family-level file for currently responding families.

    In each wave, every member of a family carries the same family Interview Number as all the other members of that family in that year. In addition, in every year except 1968, each individual is assigned a Sequence Number, unique within the family for that year, which indicates the person's position and status in that year's list of family members. Thus the first person listed, always the Head of the family, is 01, the second person listed is 02, and so on.

    To create a 1992 cross-year family-level file, select from the cross-year individual file those cases where V30734 (1992 Sequence Number) is equal to 01. Every family has at least one member, and the first member listed is always its Head, so this selects one record per family whether or not the family has other members.*

    __________________________________________________________________________

    * Variable V30734, Sequence Number, should be used instead of V30735, Relationship to Head, because although each family has one and only one current Head (i.e., where V30734 = 01-20 and V30735 = 10), it is possible that the prior year's Head has moved out since the previous interview and a new Head is present for the current interview. Relationship to Head for movers-out is coded with reference to the previous year's Head, so for both the current Head and the previous Head, V30735 = 10.

    There is no 1968 Sequence Number variable; use V30003, Relationship to Head, instead. There was only one Head per household in 1968.

    __________________________________________________________________________

    To create cross-year family-level files ending in other years, use the Sequence Number variable for the latest desired year of data and merge with the appropriate single-year family files. Again, this produces a file of families that were still responding in the latest year and eliminates families that had already become nonresponding.
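    The selection rule and the footnote's warning can be checked on toy data. In this Python sketch the records are made up, Sequence Number 01 is stored as the integer 1, the mover-out's Sequence Number of 71 is only an illustrative value outside 01-20, and the helper names are hypothetical. Selecting on V30734 = 01 yields one record per 1992 family, whereas selecting on V30735 = 10 would also pick up the previous Head.

```python
def select_heads(individuals, seq_var="V30734"):
    """Keep each family's Sequence Number 01 record (its current Head)."""
    return [person for person in individuals if person[seq_var] == 1]


def check_one_per_family(heads, ind_var="V30733"):
    """Raise if any 1992 family Interview Number appears more than once."""
    numbers = [head[ind_var] for head in heads]
    if len(numbers) != len(set(numbers)):
        raise ValueError("a family has more than one selected record")
    return len(numbers)


# Made-up 1992 records for one family (1992 Interview Number V30733 = 310).
people = [
    # current Head
    {"V30001": 7, "V30002": 1, "V30733": 310, "V30734": 1, "V30735": 10},
    # another family member (codes illustrative)
    {"V30001": 7, "V30002": 4, "V30733": 310, "V30734": 2, "V30735": 20},
    # previous Head who moved out (Sequence Number 71 is illustrative only)
    {"V30001": 7, "V30002": 2, "V30733": 310, "V30734": 71, "V30735": 10},
]

heads = select_heads(people)
print([h["V30002"] for h in heads])  # [1]
print(check_one_per_family(heads))  # 1
# Selecting on Relationship to Head instead would return two "Heads":
print(len([p for p in people if p["V30735"] == 10]))  # 2
```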


    D. Single-Year Family Files And Single-Year Family-Individual Files

    Producing a single-year family file for cross-sectional analysis requires no merging at all: simply use the single-year family file for the desired year.

    Single-year family-individual files are also relatively simple. Select all individuals whose Sequence Number for the desired year is non-zero (for 1968, use V30003, Relationship to Head, instead) and match the family Interview Number for that year from the individual file with the family Interview Number from the corresponding family file. The family Interview Numbers in the family and individual files are listed in a table in Section "B. Assembling A Cross-Year Family-Individual File", above.
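    A minimal Python sketch of the single-year case for 1992, using the variable names from this page (V30734 Sequence Number, V30733 and V20302 Interview Numbers) and otherwise hypothetical toy data and function names. Unlike the cross-year merges, individuals with a zero Sequence Number are dropped before matching, so every remaining individual is expected to find a family record.

```python
def single_year_family_individual(individuals, families, seq_var, ind_var, fam_var):
    """Individuals with a non-zero Sequence Number that year, each joined to its family."""
    by_id = {fam[fam_var]: fam for fam in families}
    rows = []
    for person in individuals:
        if person[seq_var] == 0:
            continue  # selection rule from the text: Sequence Number must be non-zero
        fam = by_id.get(person[ind_var])
        if fam is None:
            raise ValueError(f"no family record for Interview Number {person[ind_var]}")
        row = dict(person)
        row.update({var: value for var, value in fam.items() if var != fam_var})
        rows.append(row)
    return rows


# Made-up 1992 data: V30734 = Sequence Number, V30733 / V20302 = Interview Numbers.
people = [
    {"V30001": 7, "V30002": 1, "V30733": 310, "V30734": 1},
    {"V30001": 7, "V30002": 3, "V30733": 0, "V30734": 0},
]
fam92 = [{"V20302": 310, "SIZE92": 2}]  # SIZE92 is a hypothetical family variable

rows = single_year_family_individual(people, fam92, "V30734", "V30733", "V20302")
print(len(rows), rows[0]["SIZE92"])  # 1 2
```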


    E. Additional Help

    If you have questions regarding this file, please contact us.

     



    Institute for Social Research | University of Michigan