CONTENT OUTLINE

I.  Introduction

II.  Characteristics
      A.  Variable Numbers, Positions and Generated Variables
      B.  Documentation (or Paucity Thereof!) and Codes
      C.  Problem Variables, Missing Variables and the 1993-1994 Family
            Index
      D.  Files and Format
      E.  Additional Notes:  Sample Supplements in 1993 and 1994

III.  A Concluding Note


                    PANEL STUDY OF INCOME DYNAMICS, 1994
                        PRELIMINARY FILES FOR 1994


I.  Introduction

The more than two-year interval between the completion of interviewing on a
given PSID wave and the public release of a fully-cleaned and documented data
file has prompted demand for speedier release of a "preliminary" version of
PSID data files.  In response to this demand, the PSID staff has produced a
"preliminary" version of the 1994 family and individual files.

The files' preliminary nature (including, most notably, very incomplete
documentation and virtually no PSID staff counselling) leads us to recommend
these data files only to very experienced PSID data users.  Relatively
inexperienced users should wait for the regular release of the data files.

We trust that the experienced research community will be able to make effective
use of these data, despite their very preliminary form, without increasing the
workload on PSID staff.  This document is written for experienced users and
will be the ONLY documentation of this version of these files released by PSID.

In a nutshell, the advantage of these preliminary files is that of quicker
access to recent waves of data.

Disadvantages include:  a) no documentation other than this treatise and the
1994 questionnaire; b) an incomplete set of family-level variables, with the
most prominent missing variables being annualized work and income components
and totals, weights and prorated poverty thresholds, and individual "summary
variables" about marital and fertility histories; c) zero values for most or
all cases for a handful of variables (detailed below in Section II, Part C); d)
rather "dirty" data containing some wild codes and no imputations; and e)
extremely limited and very grumpy PSID staff counselling (nobody wants the
release of these files to add to the time it takes us to release the
fully-cleaned and documented versions of them).

Many of the 1994 early release variables are not directly equivalent to
variables from 1992 and earlier waves (although they are equivalent to the 1993
early release data).  See Section II below for further details.


II.  Characteristics

Two data files are included as part of the "preliminary" data-file package: 
the 1994 single-year family-level data with almost 11,000 data records and the
1968-1994 early-release individual file with about 50,000 records, including
both response and nonresponse individuals for all waves of the study.  Data
from the Latino sample are also included.

Beginning with the 1993 wave, the data were collected using CATI (Computer
Assisted Telephone Interviewing).  This means that information about each
question is collected electronically by the interviewer and in effect is coded
at the time of collection.  Of course this data collection method replaces
contingency and wild code checking, formerly performed as part of our coding
operation.  But because of the way in which questions are asked and responses
are given, the data much more closely resemble answers to questionnaire
questions.  For example, rent costs now exist as two variables:  one for the
dollar amount and one for the time unit, e.g., $400 per month and $100 per week
are typical of responses to the question about rent payments.
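Each such answer has to be annualized before amounts can be compared across cases. The Python sketch below is a minimal illustration. The time-unit code values in PERIODS_PER_YEAR are hypothetical placeholders, not PSID codes, and must be replaced with the codes printed in the 1994 questionnaire. Unit codes the sketch does not recognize, including "other" and the missing-data codes, return None so that the user can decide how to handle them.

```python
# Hypothetical time-unit codes: replace with the codes printed in the
# 1994 questionnaire for each amount/time-unit pair.
PERIODS_PER_YEAR = {
    1: 365,  # per day (assumed code)
    2: 52,   # per week (assumed code)
    3: 26,   # every two weeks (assumed code)
    4: 24,   # twice a month (assumed code)
    5: 12,   # per month (assumed code)
    6: 1,    # per year (assumed code)
}


def annualize(amount, unit_code):
    """Convert an amount/time-unit pair to an annual dollar amount.

    Returns None for unit codes not in PERIODS_PER_YEAR (for example the
    "other" code and the missing-data codes), leaving the decision on how
    to treat those cases to the analyst.
    """
    periods = PERIODS_PER_YEAR.get(unit_code)
    if periods is None:
        return None
    return amount * periods
```

Under these assumed codes, the two rent answers in the text become annualize(400, 5) = 4800 and annualize(100, 2) = 5200 dollars per year. Missing-data codes in the amount field itself are not screened here.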

Unlike data collected through 1992, the family data have NOT been cleaned with
our manual economic edit process (nor have imputations been made), so the user
must convert these kinds of amounts into some sort of consistent unit for
inter-case comparison and make decisions about handling missing data.  In
addition, we expect that values for quite a few cases will change when we do
perform economic edit operations.  For instance, time spent working, being laid
off, unemployed, out of the labor force, etc. does not sum to 52 weeks per year
in more than 10 per cent of the cases.
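A simple screen for this problem is to add up the weeks reported in each status and compare the total with 52. The following Python sketch assumes the week counts have already been pulled into one dictionary per case. The status labels and case IDs are illustrative only, and the sketch does not identify which ER variables hold the week counts.

```python
def weeks_gap(weeks_by_status, weeks_in_year=52):
    """Return reported weeks minus a full year (0 means consistent)."""
    return sum(weeks_by_status.values()) - weeks_in_year


def inconsistent_cases(cases, weeks_in_year=52):
    """Return IDs of cases whose status weeks do not sum to a full year.

    cases maps a case ID to a dict of {status label: weeks reported}.
    """
    return [case_id for case_id, weeks in cases.items()
            if weeks_gap(weeks, weeks_in_year) != 0]


# Illustrative data only; labels and IDs are hypothetical.
example = {
    1: {"working": 48, "unemployed": 4},
    2: {"working": 40, "out_of_labor_force": 8},
}
print(inconsistent_cases(example))  # [2]
```

The sign of weeks_gap shows whether a flagged case over- or under-reports; here case 2 is four weeks short.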

As mentioned above, dollar amounts generally are associated with time units. 
All time unit questions include an "other" code, as well as options for missing
data.  Amounts associated with these "other" codes will have to be recoded from
missing data or else imputed when the data are cleaned.  Also beware that we
are not intending to include these component amount and time unit data as part
of the main final release file.  Our current plans are to release final data
that resemble as closely as possible our datasets from the past.  We believe
that the majority of users will not be interested in computing amounts
differently than we have in the past.  However, the amount-time unit (and
similar) data collected in CATI but not generally part of our prior final files
will probably be available as a separate, subsidiary file so that users who
desire this detail can access it.

Our usual inter-year consistency checking was performed for the 1968-1994
individual file, so we expect the individual-level data to remain quite stable
for final release.

Beginning with the 1990 data, we faced problems with our merged record formats,
i.e., the cross-year distribution files would exceed a logical record length of
32,767, the maximum allowed on most systems (including our own).  So, in
contrast to the structure of the cross-year family and cross-year
family-individual files issued prior to 1990, family-level data files have NOT
been merged across waves to form a single data record.  Thus the analyst must
merge the necessary information from the appropriate files.  This is not
difficult since the needed family identification numbers appear both in the
cross-year individual file and in the single-year family data files.  Detailed
instructions for the merging process are located in the 1990 through 1992
family documentation and are not repeated here.
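In concrete terms, the merge amounts to looking up each person's family ID for a given wave in that wave's family file. The Python sketch below assumes records are held as dictionaries keyed by variable name. For 1994 the keys would be ER33101 on the individual file and ER2002 on the family file (the pairing used for the background procedure later in this section). FAM_VAR is a hypothetical stand-in for whatever family variables the analyst needs. The sketch is not a substitute for the instructions in the 1990 through 1992 documentation.

```python
def attach_family_data(individuals, families, ind_key, fam_key, keep_vars):
    """Copy selected family-level variables onto individual records.

    individuals: list of dicts from the cross-year individual file
    families:    list of dicts from one single-year family file
    ind_key:     that wave's family ID on the individual record (e.g. ER33101)
    fam_key:     the family ID on the family record (e.g. ER2002)
    keep_vars:   family variables to copy; persons without a matching
                 family record that wave receive None
    """
    by_id = {fam[fam_key]: fam for fam in families}
    merged = []
    for person in individuals:
        fam = by_id.get(person[ind_key])
        row = dict(person)
        for var in keep_vars:
            row[var] = fam.get(var) if fam is not None else None
        merged.append(row)
    return merged


# Toy records; FAM_VAR is hypothetical, and the second person has no
# matching 1994 family record in this toy data.
people = [{"ER33101": 101, "ER33102": 1}, {"ER33101": 0, "ER33102": 0}]
fam94 = [{"ER2002": 101, "FAM_VAR": 7}]
rows = attach_family_data(people, fam94, "ER33101", "ER2002", ["FAM_VAR"])
```

Repeating the call once per wave, each time with that wave's family ID variable and family file, attaches several years of family data to each person.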


A.  Variable Numbers, Positions and Generated Variables

All variable numbers for both family and individual early release files are
prefaced with "ER", rather than "V", to assist both users and study staff in
the future to determine whether reference is to the early release file or to
the final release version.  All 1994 early release family-level variables are
in the range ER2001 through ER4016.  Most of these variables will eventually be
incorporated into the final version of the 1994 data, but their variable
numbers will change and the data will be cleaner.  Variable numbers and
locations for the 1994 family file are not the same as those we intend for the
final version.

In addition, the family file includes neither variable numbers nor positions
for so-called "edited" and "generated" family-level variables.  By "edited"
variables we mean the first 300 or so variables usually present in each wave's
family-level data, beginning with the state of residence and ending with income
detail for other family unit members.  The term "generated" variable refers to
those variables traditionally located at the end of the raw data after the
Head's background information.

Omitted variables include:  annual mortgage and rent payments, annual food
costs, annual work hours, annual hours unemployed, out of the labor force, etc., annual income of any
sort for all family members, total family money income, Head's total labor
income, pro-rated food and poverty thresholds, education of Head and
Wife/"Wife", family income deciles, and average hourly earnings of Head and
Wife/"Wife".  Component items exist on the file, however, so that the user may
generate these items.  Needless to say, imputations have NOT been done for
missing data.

Some other omitted variables cannot be generated from information available. 
These include:  weights, state and region of residence, urbanicity, Head's
geographic mobility, numbers of children in various age and sex categories,
county unemployment rate, and variables linking related families.

In short, all variables equivalent to the 1992 variable ranges V20303-V20620
and V21481-V21549 are absent.

Background information is not asked about Heads and Wives/"Wives" each and every
year.  We ask the questions about new Heads and new Wives/"Wives" only.  If a
female Head marries and becomes a Wife, then she is reasked the background
information, and her new husband, the Head, is also asked.  During processing,
we have traditionally brought forward the background information from previous
waves for Heads or Wives/"Wives" who are the same persons as in the prior year.
In every wave, each set of background variables is preceded by a variable
indicating whether data need to be brought forward.

The 1994 early release file, in keeping with our practice for other early
release files, has not undergone this bringing forward.  In addition, the
process is somewhat less straightforward than for our previous early release
files.  The background data include questions about Head's father's occupation,
state and county variables for the locations where Head and his parents grew
up, and number of states and regions in which Head has lived.  These variables
have NOT YET been created for the 1994 early release, so the user must
carefully compare the list and codes of background variables included in the
1994 early release data set with those of earlier waves before bringing forward
prior-wave information.

Background information is complete for 1992 on the 1992 final release file, but
as of this writing, the 1993 family data are available only in early release
form and therefore have not yet undergone the bringing-forward process.  Only
Heads and Wives/"Wives" who were new in 1993 have actual background data in the
1993 early release file, so the user must search both 1992 and 1993 data to
complete 1994 background variables.

There is another complicating factor in bringing forward background data:  the
absence of the 1992 and 1993 family ID numbers on the 1994 family file. 
Therefore, the user must take these ID numbers from the individual data file in
order to match with 1992 and 1993 background information.

Below we detail the procedure for bringing forward Head's background
information in a series of steps.

          COMPARE THE 1992 BACKGROUND VARIABLES ITEM FOR ITEM WITH 1993 AND
          1994 DATA FOR COMPARABILITY OF CODES AND FOR IDENTICAL ITEMS; SOME
          1992 BACKGROUND QUESTIONS ARE NOT INCLUDED IN THE 1993 AND 1994 SETS
          OF BACKGROUND DATA.  IN ADDITION, THE 1993 AND 1994 VARIABLES ARE NOT
          COMPLETELY IDENTICAL TO EACH OTHER!!!!!

          Next, match the 1994 family file with the 1994 Head's record from the
          1968-1994 individual file (1994 family variable ER2002 with
          individual variable ER33101 where ER33102=01).  Copy the 1992 and
          1993 family IDs from the individual file (ER30733 and ER33001) to the
          1994 family file.

          Check values for 1994 family variable ER3917, the indicator for
          whether background information exists for Head on the 1994 file.  If
          ER3917=1, then the appropriate background information is already
          part of the 1994 data, and this case needs no further processing.

          If 1994 ER3917=5, match the 1993 family ID that you attached to the
          1994 family file with the 1993 family ID number from the 1993 file
          (1993 ER1850).  Check the value for the 1993 Head's background
          indicator variable (1993 ER1850) on the 1993 family file.  If
          ER1850=1, then the background data are located on the 1993 file.
          Copy the data from the 1993 family file (1993 ER1851-ER1944) to
          equivalent variables in the 1994 family file (1994 ER3918-ER3986),
          recalling that there is not a one-to-one match.

          If 1993 ER1850=5, then it is necessary to go back to the 1992 final
          release family file for the background information.  Match the 1992
          family ID from the 1994 family file with the 1992 family ID number
          from the final 1992 file (V20302).  There is no need to check the
          value for the 1992 indicator (V21388), as all 1992 cases contain
          background information.  Copy the data from the 1992 family file
          (1992 V21389-V21461) to the corresponding variables in the 1994
          family file (ER3918-ER3986), again recalling that these variables do
          not match perfectly.
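The decision logic of these steps can be sketched in Python as follows. The sketch only locates the wave that holds a Head's background data. Because the 1992, 1993 and 1994 items do not match one-to-one, the actual copying still has to be done item by item. It assumes records are dictionaries keyed by variable name, that the 1993 early release family records have already been indexed by the 1993 family ID (the sketch does not name that variable), and that the 1992 records are indexed by V20302. Cases with indicator codes other than 1 or 5, or with missing IDs or records, are returned as unresolved for hand inspection.

```python
def head_prior_ids(individuals):
    """Map each 1994 family ID to its Head's (1992 ID, 1993 ID).

    Heads are persons with 1994 sequence number ER33102 == 1; ER33101 is
    the 1994 family ID and ER30733 / ER33001 are the 1992 / 1993 family
    IDs on the individual file.
    """
    return {p["ER33101"]: (p["ER30733"], p["ER33001"])
            for p in individuals if p["ER33102"] == 1}


def head_background_source(fam94, prior_ids, fam93_by_id, fam92_by_id):
    """Return (year, record) for the wave holding the Head's background data.

    fam93_by_id: 1993 early release family records keyed by 1993 family ID
    fam92_by_id: 1992 final release family records keyed by V20302
    Returns (None, None) when the steps do not resolve the case.
    """
    if fam94["ER3917"] == 1:             # already on the 1994 file
        return 1994, fam94
    if fam94["ER3917"] != 5:             # unexpected code: inspect by hand
        return None, None
    id92, id93 = prior_ids.get(fam94["ER2002"], (None, None))
    fam93 = fam93_by_id.get(id93)
    if fam93 is None:
        return None, None
    if fam93["ER1850"] == 1:             # background asked in 1993
        return 1993, fam93
    if fam93["ER1850"] == 5:             # fall back to the 1992 final file
        fam92 = fam92_by_id.get(id92)
        if fam92 is not None:
            return 1992, fam92
    return None, None
```

An analogous Wife/"Wife" version would substitute ER3863 and ER1777 for the two Head indicators and take the 1993 family ID from the Wife's/"Wife's" own individual record.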

A similar procedure can be done for Wives/"Wives" using the 1993 and 1994
indicators (ER1777 and ER3863, respectively) to determine whether the Wife/"Wife"
is the same person as in the prior year.  We
advise using the 1993 family ID from the 1994 Wife's/"Wife's" individual data
record in place of the Head's.  The values to be copied are 1993 ER1778-ER1849
or 1992 V21340-V21387 to the equivalent 1994 variables (ER3864-ER3916).  These
variables must also be checked for direct correspondence between the 1993 and
1994 early release files and the 1992 wave.

Individual-level data on recent PSID data files have consisted of annual
measures and a set of "summary variables" that have appeared at the end of the
individual data record.  In the 1994 preliminary data, most of the annual
measures (e.g., Sequence Number, Relationship to Head, Family Identification
Numbers) are available, while virtually NONE of the "summary variables" are
included.

With a single exception, the individual-level "summary variables" (i.e.,
V31996-V32049) are not included on these files.  The exception is V32000, Sex
of Individual, which is too important to omit.

Variables ER30001 through ER30794 will remain the same for the final release
version (with the prefix change from "ER" to "V"), but a few more variables
must be added to the 1992-1994 individual data, most notably the weights.  The
order of the 1968-1994 early release data is as follows:  1968 through 1992
individual data are arranged as usual by wave; we jump to the summary variable
ER32000, Sex of Individual, and then we include the 1993 individual data in
ER33001-ER33018 and the 1994 individual data in ER33101-ER33118.  For the final
release version, the 1993 and 1994 variables will be moved to follow the
completed 1992 individual data and ER32000 will appear in its usual place among
the summary variables.

Some 1993 and 1994 equivalents of the annual individual-level variables are not
included in this preliminary version.  Variables with this treatment include
individual income components and totals, linking measures for splitoffs, age
generated from birth date (rather than respondent report), reason for
nonresponse, and weights.  These variables will be located near the end of the
yearly data, just as in 1992 and earlier waves.

To create variables from early release data that resemble those on final files
from 1992 and earlier waves, we suggest users consult the 1992 codebooks. 
Descriptions of "edited" and generated variables for 1992 include enough
information to create many variables, for example, annual work hours of Head
and Wife/"Wife" and Head's annual wages.  Some other variables, such as total
family money income, cannot be generated because income components of
individuals other than Head and Wife/"Wife" are not included in the 1993 and
1994 early release data.


B.  Documentation (or Paucity Thereof!) and Codes

This document does not include codebooks.  However, a 1994 questionnaire is
incorporated in the early release package.  It is available on the Internet in
a PDF format suitable for perusal with an Adobe Acrobat viewer.  See our home
page for further information.  (The Acrobat viewer is available free of
charge.)  Use the variable labels in the SAS and SPSS data definition
statements to match variables with the questionnaire.  The questionnaire text
contains codes for most data items.
The codebook from Section II, Part 1 of the 1992 documentation can also be
helpful in deciphering the "preliminary" data on this file.

An index of 1993 and 1994 early release family file variables is included in
the following section.  The index covers ONLY 1993 and 1994 variables; no
attempt has been made to link the early release variables with equivalents from
1992 and earlier waves.

In general, codes follow our traditional structure, although "don't know"
responses are now largely distinguished from other missing data responses.  If
the questionnaire does not indicate otherwise, code 8 (or 98 or 998, etc.)
represents "don't know" and code 9 (or 99 or 999, etc.) represents a refusal or
other missing data.
Inappropriate questions are padded with zeroes.  A few fields contained 
non-numeric characters, and these have also been converted to zeros for the 
early release file.  If a variable contains a code value that is neither 
included in the questionnaire nor one of the zero, eight or nine codes just 
mentioned, assume missing data for that value.  We will clean such cases for 
final release, but time constraints do not permit this sort of cleaning for 
early release.  The inevitable exception:  codes 21 through 24 for month 
variables in event dating questions were not printed in the questionnaire 
but were used throughout the CATI application to indicate mentions of season 
only.  These codes follow:

                     21.  DK month, but season was winter
                     22.  DK month, but season was spring
                     23.  DK month, but season was summer
                     24.  DK month, but season was autumn

For individual data, use the codebook in our 1992 documentation, Section II,
Part 2.  Similar variables for 1993 and 1994 are coded identically to those
from earlier waves.


C.  Problem Variables, Missing Variables and the 1993-1994 Family Index

Some variables included on the 1994 file are known to include bad or completely
missing data.  These will be corrected for the final version of the file, but
in the meantime we want to inform users about them.

The 1994 file includes many series of variables concerning monthly dating of
events during the prior calendar year.  For example, ER2119-ER2130 indicate the
months during which the Head worked on his or her present main job in 1993
(questionnaire question B39).  The "strings" consist of a set of twelve dummy
variables, one for each month.  Essentially, a code value of 0 indicates that
the activity did not occur during this month; a code value of 1 indicates that
it did.  The month of January in each monthly "string" is suspect because it
can contain a value of 1 when the value should be 0.  This implies that incomes
could be miscalculated if the monthly string is used for computation.  In
addition, the series for question E66, months in which a nonworking Wife/"Wife"
was unemployed in 1993, is missing the month of February entirely; there are
only eleven variables in this set.
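One cautious way to use these strings is sketched below in Python. By default the month count skips January, which is only one possible response to the suspect January codes; another is to check January against other information for the case. The second function pads the E66 series to twelve entries and assumes, without confirmation from the file, that the eleven variables run January, March, ..., December. Both function names are hypothetical.

```python
def months_active(codes, trust_january=False):
    """Count months coded 1 in a twelve-item monthly string (January first).

    January is skipped by default because it may be wrongly coded 1, so the
    default count covers February through December only.  Entries of None
    (unknown months) are never counted.
    """
    if len(codes) != 12:
        raise ValueError("expected twelve monthly codes")
    months = codes if trust_january else codes[1:]
    return sum(1 for code in months if code == 1)


def restore_february(eleven_codes):
    """Pad the eleven-item E66 series to twelve months.

    Assumes the eleven variables run January, March, ..., December; the
    missing February becomes None.
    """
    if len(eleven_codes) != 11:
        raise ValueError("expected eleven monthly codes")
    return [eleven_codes[0], None] + list(eleven_codes[1:])
```

For a string such as ER2119-ER2130 coded 1 in January, February, March and December, the default count is 3 and the count with trust_january=True is 4.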

Several variables are included in the 1994 early release file, but PSID staff
has found that code distributions are suspect or all cases contain missing
data.  These are ER2055 (question A36), reason why the family neither owns nor
rents the HU; ER3718 (question G111), a checkpoint for number of dependents;
and ER3924 and ER3926 (questions L14 and L16), education of Head's father and
mother, respectively.

The individual-level variable ER33111, Employment Status, contains zeroes for
every person on the file.  The employment statuses of Head and Wife/"Wife" are
available on the family file (ER2068-ER2071 and ER2562-ER2565, respectively),
but information for other individuals is missing.

Besides the above-mentioned omission of the month of February from the set of
variables for question E66 (series ER2932-ER2942), some other variables are
missing from the 1994 early release file:  question G113, the number of persons
dependent on this family for more than half of their support; and questions
G9a-G9d, whether Head and Wife/"Wife" spent time working at a business and, if
so, whether they reported those work hours.


D.  Files and Format

The early release package for 1994 consists of the two data files mentioned
above, i.e., the 1994 family-level data and the merged individual-level
1968-1994 data.  These are ASCII data files.  We have also included two other
pairs of files with information about variables in the corresponding data
files.  These are SAS and SPSS data definition statements.  The user is
cautioned that neither of these contains missing data specifications as of this
time, although we plan to include missing data information with the final
release versions.


E.  Additional Notes:  Sample Supplements in 1993 and 1994

We added a Latino sample of 2,043 families to the PSID for 1990.  This
sample is described in detail in the 1990 documentation, but briefly the Temple
University Institute for Survey Research selected and interviewed this sample
for the Latino National Political Survey (LNPS).  The Latino addition was made
congruent with our usual ID scheme and unique identifier formats, and these
cases are easily identified at the family and individual levels by the code
values for 1968 ID Number (V20302 for 1992 family data and V30001 for the
individual file): the values for their 1968 IDs are in the range 7001 to 9043.

In 1992 several different kinds of recontacts were attempted.  These are
described in detail in the 1992 family documentation, but briefly, three groups
were followed:  all 1991 nonresponse; a random subset of SRC and Census sample
members who had become nonresponse in 1990 or earlier; and all of Temple
University's Latino sample persons who were not successfully interviewed by us
in 1990.  The successfully recontacted Latino families have 1968 ID Numbers
in the range 9244-9308.
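Because Latino cases are identified only through their 1968 ID numbers, a small helper can tag them. This Python sketch encodes the two ranges given above and says nothing about any other ID range; the function and constant names are hypothetical.

```python
# 1968 ID ranges stated above for the Latino sample.
LATINO_ORIGINAL = range(7001, 9044)   # 1990 Latino families: 7001-9043
LATINO_RECONTACT = range(9244, 9309)  # 1992 recontacted Latino families: 9244-9308


def latino_group(id_1968):
    """Return which Latino group a 1968 ID falls in, or None."""
    if id_1968 in LATINO_ORIGINAL:
        return "1990 Latino sample"
    if id_1968 in LATINO_RECONTACT:
        return "1992 Latino recontact"
    return None
```

The first range holds exactly 2,043 IDs, matching the number of Latino families added for 1990.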

The 1993 and 1994 waves included a change in PSID following rules.  We now
follow all sample persons who leave home, regardless of age.  So, for example,
when a sample male Head leaves his nonsample spouse with their children, we
attempt an interview not only with him but also with her because her household
contains sample members.

Our recontact effort for 1993 included the resurrection of many nonresponse
sample persons who shared a 1968 ID number with families still responding in
1992, similar to the second group selected for 1992 as described above.  But in
contrast to this 1992 group, priority was given to families with connected
individuals under age 18 at the time of nonresponse.  All sample individuals
within such a family were selected for recontact, even if they themselves were
older.

The main thrust of the 1994 recontact effort was to follow some nonsample
ex-spouses of sample members; these ex-spouses had had one or more children with
the sample members, and at least one of those children was expected to be under
age 18 by 1994.  In addition, recontacts were attempted with 1992 and 1993
nonresponse and also with families with no remaining response individuals. 
Some of these latter families had become nonresponse as early as 1969.


III.  A Concluding Note

We close by repeating our warning:  THESE DATA SHOULD BE USED ONLY BY VERY
EXPERIENCED PSID DATA ANALYSTS.  The absence of complete documentation makes it
difficult to determine the precise coding of a number of variables on the file. 
And the absence of weight variables makes it impossible to use these files by
themselves to produce any nationally-representative estimates from either the
original or Latino samples.  We expect that these preliminary versions of the
data will be useful for experienced users who want to pull a handful of
variables from the files so that they can be merged onto analysis files
constructed from prior-wave data.