/er95read.me

PANEL STUDY OF INCOME DYNAMICS
1995 EARLY RELEASE FILES DOCUMENTATION

CONTENT OUTLINE

I.   INTRODUCTION
     A. Reasons for and limitations of the early release files
     B. What's new for 1995
        1. Education supplement
        2. Supplemental question on food shortage
        3. New questions about recipients of Supplemental Security Income
II.  CHARACTERISTICS
     A. Files and format
     B. Documentation (or paucity thereof!) and codes
        1. Questionnaire
        2. Codes
     C. Variable numbers, positions and generated variables
        1. Family file
        2. Cross-year individual file
     D. Bringing forward background information for Head and Wife/"Wife"
     E. Problem variables, missing variables
     F. Comparability with 1992 final release file and earlier final release files
     G. Additional notes: Sample supplements in 1993, 1994 and 1995
III. A CONCLUDING NOTE

...................................................................

I. INTRODUCTION

A. Reasons for and limitations of the early release files

The more than two-year interval between the completion of interviewing on a given PSID wave and the public release of a fully cleaned and documented data file has prompted demand for speedier release of an "early release" version of PSID data files. In response to this demand, the PSID staff produced an early release version of the 1990, 1991 and 1992 family files and the 1968-1992 cross-year individual file; all these files are now available in final release form. These were followed by early release versions of the 1993 and 1994 waves of data. The latest in this series is the early release version of the 1995 wave of data -- an early release version of the 1995 family file and the 1968-1995 cross-year individual file. The early release files are available both at the PSID's Internet site and from the ICPSR (Inter-university Consortium for Political and Social Research).
The preliminary nature of these files (most notably, very incomplete documentation and limited PSID staff counselling) leads us to recommend them primarily to experienced PSID data analysts; analysts not experienced with the PSID may wish to wait for the fully documented final release data files. All analysts should be aware that a few records and values of some variables may change between the early release and final release versions of these files. We trust that the experienced research community will be able to make effective use of these data, despite their very preliminary form, without increasing the workload on PSID staff.

In a nutshell, the advantage of the early release files is quicker access to recent waves of data; the disadvantages include:

a) LACK OF DOCUMENTATION -- no documentation other than this treatise, univariate frequencies of categorical variables, univariate statistics for continuous variables, and the 1995 questionnaire;

b) MISSING VARIABLES -- an incomplete set of family and individual variables. The most prominent missing variables in the family file are the annualized work and income components and totals (including total family income), sampling weights and prorated poverty thresholds; in the individual file they are the "summary variables", including variables about marital and fertility histories;

c) MINOR PROBLEMS WITH DATA VALUES -- no imputations for missing data; some wild codes; scrambled data for some variables in the few cases where a different person was determined to be the correct Head or Wife/"Wife" during family composition editing; and zero values for most or all cases on a handful of variables (see the detailed description of these limitations in Section II below); and

d) LIMITED PSID STAFF COUNSELLING -- nobody wants the release of these files to add to the time it takes us to release the fully cleaned and documented versions.

B. What's new for 1995

1. Education supplement

A supplemental module on education, conducted in 1995, covered a wide range of issues regarding the schooling experiences of family members between the ages of 5 and 49. The module gathered retrospective reports about attendance at public vs. private schools (non-religious, Catholic or other religion), highest grade completed, last date of attendance, high school graduation or GED, special education (including gifted programs), whether an individual had repeated a grade, and extracurricular activities. Questions were also asked about pre-school experiences, including involvement in Head Start, nursery school and day care programs. In addition, there were questions about behavior problems in school (including suspension and expulsion), contact with police (not including minor traffic offenses) and time spent in correctional institutions such as reform school or prison. Data for this supplement are included in the 1968-1995 early release individual file (ER33220-ER33274).

2. Supplemental question on food shortage

The PSID staff and the PSID Board of Overseers encourage researchers to propose supplemental questions to the PSID that are suitable for inclusion in a panel study and add value to existing and planned PSID questions. The Board of Overseers selects supplemental questions after reviewing proposals that explain their scientific merit. Short supplemental questions are paid for by the PSID core budget. In 1995 the supplemental question was:

   F23. In the last twelve months, did you ever run out of the foods that you needed to make a meal and didn't have the money to get more?

This variable is included in the 1995 early release family file as ER6091.

3. New questions about recipients of Supplemental Security Income

Follow-up questions were permanently added to the Supplemental Security Income (SSI) section so that amounts received on behalf of someone else can be excluded.
If receipt of SSI income was reported for the Head, Wife/"Wife" or any OFUM (other family unit member), a follow-up question asked whether it was received for that person or for someone else. These variables are included in the 1995 early release family file as ER6277 and ER6584 for the Head and Wife/"Wife", respectively.

II. CHARACTERISTICS

A. Files and format

The early release package for the 1995 wave consists of two data files, 1) ER95F.DAT, the 1995 early release family data file, and 2) ER68-95I.DAT, the 1968-1995 early release individual data file, plus a number of files associated with the two data files. In addition to this file, ER95READ.ME, the 1995 early release family files include:

   File name        Records   LRECL         Bytes
   ER95F.DAT         10,406   3,228    33,595,230
   ER95F.SAS            ...     ...        93,656
   ER95F.SPS            ...     ...        93,511
   ER95FMEA.TXT         ...     ...       134,054
   ER95FTAB.TXT         ...     ...       505,494
   ER95FMD.TXT          ...     ...        90,440

and the 1968-1995 early release individual files include:

   File name        Records   LRECL         Bytes
   ER68-95I.DAT      56,417   1,930   108,997,644
   ER68-95I.SAS         ...     ...        60,165
   ER68-95I.SPS         ...     ...        58,029
   ER95ITAB.TXT         ...     ...       151,498
   ER95IVAR.TXT         ...     ...         6,469

The contents of these files are described in detail below.

.DAT files

The data are in raw ASCII form. The 1995 early release family data file contains one record for each family interviewed in 1995 and includes all family-level variables collected in 1995. The 1968-1995 early release individual data file is a merged cross-year file with a record for each individual who was in an interviewed family in any wave of the study, whether that individual was a 1995 respondent or a 1995 nonrespondent. It includes individual-level variables collected from 1968 through 1995. Refer to the corresponding .SAS or .SPS file for record layout information, variable names and variable labels.
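As an illustration of what "raw ASCII form" means in practice, here is a minimal Python sketch (Python is used only for illustration; the same logic can be written in SAS or SPSS) that slices named fields out of one fixed-width record. It assumes you copy each variable's starting and ending columns from the corresponding .SAS or .SPS file; the positions in LAYOUT below are hypothetical placeholders, not the real ER95F.DAT positions.

```python
from typing import Dict, Iterator, Tuple

# HYPOTHETICAL layout: variable name -> (first column, last column), 1-based and
# inclusive, the way column ranges are written in SAS INPUT statements.
# Replace these placeholders with the positions given in ER95F.SAS or ER95F.SPS.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "ER5001": (1, 1),  # placeholder position
    "ER5002": (2, 6),  # placeholder position (1995 interview number)
    "ER5008": (7, 9),  # placeholder position (age of Wife/"Wife")
}


def parse_record(line: str, layout: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Slice one raw ASCII record into integer fields using 1-based column ranges."""
    values: Dict[str, int] = {}
    for name, (first, last) in layout.items():
        field = line[first - 1:last]  # 1-based inclusive range -> Python slice
        # Fields should be numeric (inappropriate questions are zero-padded);
        # a blank field, if one occurs, is treated as 0 here (an assumption).
        values[name] = int(field) if field.strip() else 0
    return values


def read_records(path: str, layout: Dict[str, Tuple[int, int]]) -> Iterator[Dict[str, int]]:
    """Yield one dict of integer fields per record of a .DAT file."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            yield parse_record(line.rstrip("\r\n"), layout)


if __name__ == "__main__":
    # A made-up 9-column record, not real PSID data.
    print(parse_record("112345042", LAYOUT))
    # {'ER5001': 1, 'ER5002': 12345, 'ER5008': 42}
```

Keeping the layout as data rather than code makes it easy to paste in only the handful of variables you need from the much longer .SAS file.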
.SAS and .SPS files

These files contain, respectively, SAS and SPSS data definition statements that provide variable names, locations and variable labels. They do not include missing data statements. You should check the questionnaire and the frequencies or means for each variable you intend to include in your analysis to determine which code values should be defined as missing. We do plan to include traditional missing data information with the final release versions.

The SAS and SPSS statements are NOT intended to be complete programs for the respective statistical packages to run extracts, analyses, etc. You must provide all other SAS or SPSS statements needed to complete a program.

.TXT files

These ASCII text files provide additional information about the early release family and individual files. ER95FTAB.TXT contains univariate frequencies for the categorical family variables; ER95ITAB.TXT contains univariate frequencies for the categorical individual variables and univariate statistics for the continuous individual variables. These files may be used to check for wild codes.

ER95FMEA.TXT contains univariate statistics for the continuous family variables. For your information, the ad-hoc missing data statements used in producing these means are included at the end of the file. The missing data statements for the final release file will differ from these ad-hoc statements, which were modified to treat some inappropriately entered values as missing.

ER95FMD.TXT contains missing data statements: SAS code that may be used to recode user-defined missing data codes to the SAS system missing value.

B. Documentation (or paucity thereof!) and codes

1. Questionnaire

We have not produced the traditional codebooks for the early release files.
However, a 1995 questionnaire is available at our Internet site in PDF format, suitable for viewing with an Adobe Acrobat viewer (the Acrobat viewer is available free of charge; see our home page for further information). Use the SAS and SPSS data definition statements to match variables with questions in the questionnaire. The questionnaire contains codes for most data items.

For family data, the codebook in Section II, Part 1 of the 1992 documentation can also be helpful in deciphering the early release data. For individual data, use the codebook in Section II, Part 2 of the 1992 documentation; similar variables for 1993, 1994 and 1995 are coded identically to those from earlier waves.

2. Codes

In general, codes follow our traditional structure, although "don't know" responses are now largely distinguished from other missing data responses. Unless the questionnaire indicates otherwise, code 8 (or 98, 998, etc.) represents "don't know" and code 9 (or 99, 999, etc.) represents other missing data or a refusal. Inappropriate questions are padded with zeroes. A few fields contained non-numeric characters; these have also been converted to zeroes for the early release file. If a variable contains a code value that is neither included in the questionnaire nor one of the zero, eight or nine codes just mentioned, you should treat that value as missing data. We will clean such cases for final release, but time constraints do not permit this sort of cleaning for early release.

The inevitable exception: codes 21 through 24 for month variables in event dating questions were not printed in the questionnaire but were used throughout the CATI application to indicate mentions of season only. These codes are:

   21. DK month, but season was winter
   22. DK month, but season was spring
   23. DK month, but season was summer
   24. DK month, but season was autumn

C. Variable numbers, positions and generated variables

All variable numbers in both the 1995 early release family file and the 1968-1995 early release individual file are prefixed with "ER" rather than "V", to help both analysts and study staff tell whether a reference is to the early release file or to the final release version.

1. Family file

The 1995 early release family variables are in the range ER5001-ER6856. Most of these variables will eventually be incorporated into the final version of the 1995 data, but their variable numbers will change and the data will be cleaner. Variable numbers and locations for the 1995 early release family file are not the same as those we intend for the final version.

The 1995 early release family file includes neither variable numbers nor positions for so-called "edited" and "generated" family-level variables. By "edited" variables we mean the first 300 or so variables usually present in each wave's family-level data, beginning with the state of residence and ending with income detail for other family unit members. By "generated" variables we mean those traditionally located at the end of the raw data, after the Head's background information. In short, all variables equivalent to the 1992 variable ranges V20303-V20620 and V21481-V21549 are absent.

Variables not included in the early release file for which component items are available include: annual mortgage and rent payments, annual food costs, poverty thresholds, annual hours of work, unemployment, etc., annual income of any sort for Head and Wife/"Wife", Head's total labor income, numbers of children in various age and sex categories, education of Head and Wife/"Wife", and average hourly earnings of Head and Wife/"Wife". Since the component items exist on the early release file, you may generate these variables yourself. Needless to say, imputations have NOT been done for missing data.
To create variables from the 1995 early release data that resemble those on final files from 1992 and earlier waves, we suggest you consult the 1992 codebooks, which contain enough information about how the variables were created for 1992 to let you create them for 1995.

Background information is not asked about Heads and Wives/"Wives" every year; we ask these questions about new Heads and new Wives/"Wives" only. (Note, however, that in 1995 we asked about the education of all 1995 Heads and Wives/"Wives" in Section M of the questionnaire; these data are included in the 1968-1995 early release individual file.) During processing, we have traditionally "brought forward" the background information from previous waves for Heads or Wives/"Wives" who are the same persons as in the prior year. In every wave, each set of background variables is preceded by a variable indicating whether data needed to be brought forward. The 1995 early release file, in keeping with our practice for other early release files, has not undergone this "bringing forward". See Section D below for a detailed description of how you can do this yourself.

Other variables cannot be generated from the early release data, in some cases because income components of individuals other than the Head and Wife/"Wife" are not included. Variables not included in the early release file that cannot be generated from available information include: annual income of any sort for other family members, total family money income, poverty thresholds (because of missing income components), family income deciles, sampling weights, state and region of residence, urbanicity, Head's geographic mobility, county unemployment rate, and variables linking related families.

2. Cross-year individual file

Recent cross-year individual PSID files have consisted of annual measures and a set of "summary variables" that appear at the end of the individual data record.
In the 1968-1995 early release individual file, most of the annual measures (e.g., Sequence Number, Relationship to Head, Family Identification Numbers) are available. However, virtually NONE of the "summary variables" (i.e., V31996-V32049) are included. The single exception is V32000, Sex of Individual, which was too important to omit; it appears in the 1968-1995 early release individual file as ER32000. Variables ER30001 through ER30794 will remain the same in the final release version (with the prefix changed from "ER" to "V"). A few more variables will be added to the 1992-1995 individual data, most notably the sampling weights.

The order of variables in the 1968-1995 early release individual file is as follows:

   RELEASE NUMBER                                  ER30000
   1968-1992 individual data, arranged by wave     ER30001-ER30794
   the lone summary variable, SEX OF INDIVIDUAL    ER32000
   1993 early release individual data              ER33001-ER33018
   1994 early release individual data              ER33101-ER33118
   1995 early release individual data              ER33201-ER33274

For the final release version, the 1993, 1994 and 1995 variables will be moved to follow the completed 1992 individual data, and ER32000 will appear in its usual place among the summary variables as V32000. Some 1993, 1994 and 1995 equivalents of traditional annual individual variables are not included in the early release file: individual income components and totals, linking measures for splitoffs, reason for nonresponse, and sampling weights. In the final release individual files, these variables will be located near the end of the yearly data, just as in 1992 and earlier waves.

D. Bringing forward background information for Head and Wife/"Wife"

As noted above, the background information for Head and Wife/"Wife" has not been "brought forward" for the 1995 early release family file.
Background information is complete for 1992 on the 1992 final release family file but, as of this writing, the 1993 and 1994 family data are available only in early release form and have not yet undergone the bringing-forward process. Only families whose Head or Wife/"Wife" was new in 1995 have background data in the 1995 early release family file. To complete the 1995 background variables, you must search the 1994, 1993 and 1992 family data, in that order.

Before you attempt to bring forward prior-wave background information, carefully compare the background variables, item for item and code for code, in the 1992 final release family file and the 1993, 1994 and 1995 early release family files. Be aware that the 1993, 1994 and 1995 background variables are not completely identical to each other! In addition, some 1992 background variables are not included at all in the 1993, 1994 and 1995 early release family files because they have NOT YET been created; among these are variables about the Head's father's occupation, the state and county where the Head and his/her parents grew up, and the number of states and regions in which the Head has lived.

One more factor complicates bringing forward background data: the 1992, 1993 and 1994 family ID numbers are absent from the 1995 family file. You must obtain them from the Head's record in the 1968-1995 early release individual file in order to match with the 1992, 1993 and 1994 family files.

Below is a suggested procedure for bringing forward the Head's and Wife's/"Wife's" background information. In the pseudocode, "i'w" stands for interview number.

1. First, add the 1995 Head's 1992, 1993 and 1994 interview numbers from the 1968-1995 early release individual file to the 1995 early release family file.
   sort er95fam by 95 i'w (ER5002 "1995 INTERVIEW #")
   sort er68-95ind by 95 i'w (ER33201 "1995 INTERVIEW NUMBER")
   for 1995 Heads (ER33202 "INDIVIDUAL SEQUENCE NUMBER 95" = 01):
      merge er95fam & er68-95ind by 95 i'w
      add Head's 1994, 1993 and 1992 i'w # (ER33101 "1994 INTERVIEW NUMBER 94",
         ER33001 "1993 INTERVIEW NUMBER 93", ER30733 "1992 INTERVIEW NUMBER")
         to er95fam

2. Next, determine whether the 1995 family includes a Wife/"Wife" and whether new Head and new Wife/"Wife" information is present in the 1995 early release family file. If it is, the appropriate background information is already part of the 1995 early release family file, and the case needs no further processing. Here statwife=1 and stathead=1 mark a Wife/"Wife" or Head whose background information needs no further work (statwife=1 also covers families with no Wife/"Wife"); the later steps fill in only cases still flagged 0.

   if no Wife/"Wife" (ER5008 "AGE OF WIFE" = 0) then statwife=1 else statwife=0
   if new Wife/"Wife" 95 (ER6733 "K1 CKPT: WTR WIFE" = 1) then statwife=1
   if new Head 95 (ER6787 "L1 CKPT: WTR NEW HEAD" = 1) then stathead=1 else stathead=0

3. If new Head or new Wife/"Wife" information was not present in the 1995 early release family file, check whether it is present in the 1994 early release family file. If it is, replace the values of the variables in the 1995 early release family file with the values of the corresponding variables from the 1994 early release family file. Remember that these variables differ slightly from year to year.

   sort er95fam by 94 i'w (ER33101 "1994 INTERVIEW NUMBER 94")
   sort er94fam by 94 i'w (ER2002 "1994 INTERVIEW #")
   for new 94 Wives/"Wives" (ER3863 "K1 CKPT: WTR WIFE" = 1)
      or new 94 Heads (ER3917 "L1 CKPT: WTR NEW HEAD" = 1):
      merge er95fam & er94fam by 94 i'w
      if statwife=0 and ER3863=1, bring forward 94 new Wife/"Wife" info and set statwife=1
      if stathead=0 and ER3917=1, bring forward 94 new Head info and set stathead=1

4. If new Head or new Wife/"Wife" information was present in neither the 1995 nor the 1994 early release family file, check whether it is present in the 1993 early release family file.
If it is, replace the values of the variables in the 1995 early release family file with the values of the corresponding variables from the 1993 early release family file. Again, recall that these variables differ slightly from year to year.

   sort er95fam by 93 i'w (ER33001 "1993 INTERVIEW NUMBER 93")
   sort er93fam by 93 i'w (ER2 "1993 INTERVIEW # 93")
   for new 93 Wives/"Wives" (ER1777 "K1 CKPT: WTR WIFE" = 1)
      or new 93 Heads (ER1850 "L1 CKPT: WTR NEW HEAD" = 1):
      merge er95fam & er93fam by 93 i'w
      if statwife=0 and ER1777=1, bring forward 93 new Wife/"Wife" info and set statwife=1
      if stathead=0 and ER1850=1, bring forward 93 new Head info and set stathead=1

5. If new Head or new Wife/"Wife" information was not present in the 1995, 1994 or 1993 early release family files, obtain the information from the 1992 final release family file. There is no need to check the 1992 indicator, since all 1992 cases contain background information. Replace the values of the variables in the 1995 early release family file with the values of the corresponding variables from the 1992 final release family file. Again, recall that these variables do not match perfectly.

   sort er95fam by 92 i'w (ER30733 "1992 INTERVIEW NUMBER")
   sort 92fam by 92 i'w (V20302 "1992 INTERVIEW NUMBER")
   merge er95fam & 92fam by 92 i'w
   if statwife=0, bring forward 92 Wife/"Wife" info and set statwife=1
   if stathead=0, bring forward 92 Head info and set stathead=1

E. Problem variables, missing variables

Some variables included in the 1995 early release files are known to contain bad or completely missing data. These will be corrected for the final version of the files; in the meantime, we want you to be aware of the following known problems with the early release data.

The 1995 early release family file includes many series of variables concerning monthly dating of events during the prior calendar year.
For example, ER5118-ER5129 indicate the months during which the Head worked on his or her present main job in 1994 (questionnaire question B39). Each "string" consists of a set of twelve dummy variables, one for each month: a code value of 0 indicates that the activity did not occur during that month, and a code value of 1 indicates that it did. However, because of a programming error, the January value in each monthly "string" is suspect: it can be 1 when it should be 0. If you are using any of these strings, you may want to inspect them for unusual patterns and recode the January value where that seems appropriate. For instance, cases with a value of 1 for January but not for February may warrant special handling. Otherwise, annual incomes could be miscalculated if the monthly strings were used in the computation.

Other variables in the 1995 early release family file with suspect code distributions include ER6795 and ER6797 (questions L15 and L17), literacy of Head's father and mother, respectively, which have 21 cases with values of 9 that should probably have been 0.

The following variables contain only missing data: ER5054 (question A36), reason why the family neither owns nor rents the housing unit (HU); and ER6720 (question G111), a checkpoint for number of dependents.

Missing variables include: employment status for individuals other than the Head or Wife/"Wife"; question G113, the number of persons dependent on this family for more than half of their support; and questions G9a-G9d, whether the Head and Wife/"Wife" spent time working at a business and, if so, whether they reported those work hours.

In the 1968-1995 early release individual file, ER33211, Employment Status, contains zeroes for every person on the file.

F. Comparability with 1992 final release file and earlier final release files

Beginning with the 1993 wave, the data were collected using CATI (Computer Assisted Telephone Interviewing).
This means that each answer was recorded electronically by the interviewer and, in effect, coded at the time of data collection. Conversion to standardized units of measurement, formerly performed as part of our coding operation, has not yet been done. As a result, the data in the early release files resemble the answers to questionnaire questions much more directly than the 1992 and earlier years' data did. For example, instead of one variable indicating monthly rental expense, rent costs now exist as two variables, one for the dollar amount and one for the time unit; $500 per month and $100 per week are typical responses to the question about rent payments. Therefore many of the 1993 through 1995 early release variables are not directly equivalent to variables from 1992 and earlier waves.

As mentioned above, dollar amounts are often associated with time units in the early release file. BE AWARE THAT WE ARE NOT COMMITTED TO INCLUDING THESE COMPONENT AMOUNT AND TIME UNIT DATA AS PART OF THE MAIN FINAL RELEASE FILE. Our current plans are to release final data that resemble our traditional data files as closely as possible. However, if the amount-time unit (and similar) data collected in CATI but not generally part of our prior final files are not included in the final release file, they MAY be made available as a separate, subsidiary file so that analysts who need this detail can access it.

Unlike data collected through 1992, the family data have NOT been cleaned with our manual economic edit process (nor have imputations been made), so you must convert these kinds of amounts into some consistent unit for inter-case comparison and make your own decisions about handling missing data. In addition, we expect that values for quite a few cases will change when we do perform economic edit operations.
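One possible way to put amount-time unit pairs on a common annual basis is sketched below in Python. It assumes you have already mapped each time-unit code in the questionnaire to one of the hypothetical unit names in PERIODS_PER_YEAR (the actual codes must be taken from the questionnaire), treats a year as 52 weeks, and leaves "other" and missing units as None rather than imputing them. Because annual amounts may also be built from the twelve-month strings described in Section E, the sketch includes a screen for the suspect January value; it only flags a case and cannot tell a genuine January-only spell from the programming error.

```python
from typing import List, Optional

# HYPOTHETICAL unit names -> number of such periods in a year (52-week year).
# Map the questionnaire's time-unit codes onto these names yourself.
PERIODS_PER_YEAR = {
    "week": 52,
    "two weeks": 26,
    "month": 12,
    "year": 1,
}


def annualize(amount: Optional[float], unit: Optional[str]) -> Optional[float]:
    """Convert an amount per time unit to an annual amount.

    Returns None when the amount is missing or the unit is "other", missing,
    or otherwise not in PERIODS_PER_YEAR; no imputation is attempted.
    """
    if amount is None or unit not in PERIODS_PER_YEAR:
        return None
    return amount * PERIODS_PER_YEAR[unit]


def suspect_january(months: List[int]) -> bool:
    """Flag a twelve-month 0/1 string with January = 1 but February = 0,
    the pattern Section E suggests may warrant special handling."""
    if len(months) != 12:
        raise ValueError("expected one 0/1 value per month")
    return months[0] == 1 and months[1] == 0


if __name__ == "__main__":
    print(annualize(500, "month"), annualize(100, "week"), annualize(500, "other"))
    # 6000 5200 None
    print(suspect_january([1] + [0] * 11), suspect_january([1] * 12))
    # True False
```

Leaving unconvertible cases as None keeps them visible, so you can decide case by case whether to drop, recode or impute them before the final release does so.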
The unedited data are also internally inconsistent in places: time spent working, being laid off, unemployed, out of the labor force, etc., does not sum to 52 weeks per year in about 10 per cent of the cases in the early release file! In addition, all time unit questions include an "other" code, as well as options for missing data; amounts associated with these "other" codes will be recoded from missing data or else imputed when the data are cleaned.

Beginning with the 1990 data, we faced problems with the size of our merged cross-year record formats: for 1990 the logical record length of a merged cross-year record would have exceeded 32,767. The 1989 files were the last released in the theretofore traditional cross-year family-individual and cross-year family formats. You can recreate cross-year files with the variables needed for your particular analysis by merging the necessary information from the appropriate family files. The needed family identification numbers appear in both the final release and early release cross-year individual files. They also appear on the final release family data files. While they do not appear on the early release family data files, you can obtain them from the Head's record in the 1968-1995 early release cross-year individual file (see Section D, step 1, above). Detailed instructions for creating the cross-year files are included in the 1990, 1991 and 1992 family documentation volumes, are also available at our Internet site, and are not repeated here.

Much of our usual inter-year consistency checking was performed for the early release 1968-1995 cross-year individual file, so we expect the records in this file to remain relatively stable for final release.

G. Additional notes: Sample supplements in 1993, 1994 and 1995

In 1990 we added a Latino sample of 2,043 families to the PSID. This sample is described in detail in the 1990 documentation.
It was derived from a sample selected and interviewed by the Temple University Institute for Survey Research for their Latino National Political Survey (LNPS). The Latino addition was made congruent with our usual ID scheme and unique identifier formats. Latino sample cases are easily identified in the family and individual files by the code values for 1968 ID Number (V20302 in the 1992 family file and V30001 in the 1968-1992 individual file) -- the Latino sample has code values in the range 7001-9043.

In 1992 several different kinds of recontacts were attempted. These are described in detail in the 1992 family documentation but, briefly, three groups were selected: 1) all 1991 nonresponse; 2) a random subset of SRC and Census sample members who had become nonresponse in 1990 or earlier; and 3) all of Temple University's Latino sample persons who were not successfully interviewed by us in 1990. The successfully recontacted Latino families have 1968 ID Numbers in the range 9044-9308.

Our recontact effort for 1993 included the resurrection of many nonresponse sample persons who shared a 1968 ID number with families still responding in 1992, similar to the second group selected for 1992 as described above. In contrast to this 1992 group, however, priority was given to families with connected individuals under age 18. All sample individuals within such a family were selected for recontact, even if they themselves were older.

The main focus of the 1994 recontact effort was to follow nonsample ex-spouses of sample members; these ex-spouses had one or more children with the sample members, and at least one of those children was expected to be under age 18 by 1994.
In addition, recontacts were attempted with persons who had become nonresponse in 1992 or 1993, with nonresponse core sample persons who had no other family members still responding by 1993 (some of whom had become nonresponse as early as 1969), and with some children formerly designated nonsample but born to sample members since the study began.

The 1993, 1994 and 1995 waves included a change in PSID following rules. We now follow all sample persons who leave home, regardless of age. So, for example, when a sample male Head leaves his nonsample wife and their sample children, we attempt an interview not only with him but also with her, because her household contains their sample children. Beginning with the 1994 data collection, we also now consider as sample those children who are born to a sample parent in a year when the sample parent was not in an interviewed family.

III. A CONCLUDING NOTE

We close by repeating our warnings:

a) We expect that these files will be most useful for experienced PSID data analysts, especially those who want to pull a limited number of variables to be merged onto analysis files constructed from prior-wave data.

b) You should be aware that a few records and values of some variables may change from the early release to the final release version of these files.

c) You should check the distribution of each potential analysis variable for wild codes.

d) The absence of complete documentation may make it difficult to determine the precise coding of a number of variables on the family file.

e) The absence of sampling weight variables makes it problematic to use these files by themselves to produce nationally representative estimates from either the original or the Latino samples. (The most recent sampling weights included in the 1968-1995 early release individual file are the 1991 individual sampling weights.)

We hope you find these files useful. ;)