SIDE
Software for Intake Distribution Estimation (SIDE) can help you obtain estimates of usual nutrient intake distributions, their moments, and their percentiles. The software uses a method developed at Iowa State University and described in Nusser et al. (1996).
With SIDE you can:
- Input SAS© data sets with multiple intake observations on at least some individuals in the sample.
- Adjust dietary intake data for discrete or continuous factors of your choice, such as day of the week or age of the individual, and adjust the data to any day you choose as the standard.
- Accommodate different sampling weights associated with each individual.
- Analyze data collected on consecutive days using an estimate of the correlation among intake days for each different dietary component.
- Estimate the distribution of usual nutrient intake for any number of nutrients and for subpopulations, such as those defined by gender.
- Compute the proportion of all individuals with intake above or below any number of given thresholds for each nutrient, including those not equally spaced.
- Estimate the standard error of each estimated percentile of the distribution for simple random samples.
- Output the data needed to produce plots of estimated densities. (Plots must be produced with another software package.)
What kind of output can you get?
With SIDE, you can control how much output is produced from your analyses. Detailed instructions for choosing what to output appear in the manual. SIDE allows you to print:
- Moments of the observed intake distribution estimated either from the input data or from any of the various adjusted data sets.
- Selected percentiles of the observed intake distribution.
- Statistics and diagnostics for the different steps in the SIDE method for estimating usual intake distributions.
- Moments of the estimated usual intake distribution.
- Proportion of individuals in the sample with observed intake above or below one or several given thresholds and the corresponding standard errors.
SIDE can also produce SAS© data sets containing:
- Data created during the initial smoothing process applied to the observed intakes.
- Data used to construct the semiparametric normality transformations.
- Percentiles and estimated density function values for the distributions of observed and usual intakes.
What equipment and software will you need?
SIDE (Version 1.01) is written in the SAS/IML© language, and it runs under the SAS© system. The estimation of usual intake distributions is performed by a set of 50 SAS/IML© modules residing in executable form in a permanent SAS© storage catalog. The source code for these modules is stored in a file called side.sas and consists of about 5800 lines of code.
These modules are called from a SAS/IML© program that you create. Because the modules are stored in executable form, you must process the side.sas program only once, no matter how many data sets you plan to analyze.
Note: The source code that accompanies the manual has been tested on SAS© versions 6.07, 6.08, 6.09, 6.10, 6.11, and 6.12. No downward compatibility to earlier versions of SAS© is implied, nor should it be inferred.
To run SIDE, you will need:
- A workstation running some version of UNIX, or an IBM PC or compatible machine running Windows
- SAS© version 6.07 or later for UNIX, or Windows SAS© version 6.08 or later for PC
SIDE will not run under SAS© version 6.04 or any earlier version. We have not conducted tests on platforms other than those mentioned.
The user must understand basic SAS© system concepts and operation in order to use SIDE.
What about your data?
Application of the method developed at ISU for estimating usual intake distributions requires that multiple intake days of data be available on at least some individuals in the sample because the method estimates the within-individual variation of intakes. This estimation is not possible when only one observation is available on each individual in the sample.
You must be sure that your data set has the following characteristics:
- All recorded intakes must be nonnegative. If you have transformed your intake data in any way, please verify that transformed intakes are also nonnegative.
- The observed distribution of intakes must be smooth. No single intake value should appear with a frequency much higher than that of the other values. Examples of nonsmooth distributions are the observed distributions of alcohol intake and of intake of individual foods, where it is common to observe many zeros. This version of the software is not designed to handle such distributions.
- At least some individuals in the sample must have multiple intake observations. Two observations on a portion of sampled individuals is the minimum requirement for successful application of the SIDE method.
- The input data set must contain a total of mn observations, where m is the number of individuals in the data set and n is the maximum number of observations present for each individual.
- If intake data were collected on consecutive days and if you want to account for the day-to-day correlation, then you must supply an estimate of the correlation.
- Unequal sampling weights for each individual in the sample may be used. SIDE will produce a weighted analysis of your data.
- If your intake data were collected on different days of the week, during different seasons, or from individuals of different ages, ethnic groups, or income levels, you can partially remove the effect of those variables from your intake data using SIDE's adjustment procedure. You may use continuous or qualitative variables for data adjustment.
- Data may be organized in subpopulations, such as ethnic groups or groups divided by location or income level. SIDE will provide an analysis within the subpopulations that you choose.
Overview of methodology
SIDE was developed to carry out the statistical method described in Nusser, Carriquiry, Dodd, and Fuller (1996). The method consists of several steps, which can be grouped according to the major tasks to be accomplished: preliminary data adjustments, semiparametric transformation to normality, estimation of within- and between-individual variances for intakes, and back transformation into the original scale.
1. Preliminary data adjustments: Preliminary data adjustments include shifting observed intake data by a small amount away from zero, incorporating survey weights by creating an equal weights sample, and correcting for the effect of sample day (first versus all the rest) on the mean and the variance of the distribution of observed intakes.
To shift observed intake data away from zero, the program computes the weighted overall mean intake and adds one ten-thousandth of the weighted mean to each observation in the sample. The value added to each observation is small, but it eliminates all zero observations and avoids potential problems should a log transformation be used.
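The shift described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SIDE code: the function name, the `fraction` parameter, and the sample data are hypothetical, and the sketch assumes one sampling weight per observation.

```python
def shift_away_from_zero(intakes, weights, fraction=1e-4):
    """Add a small fraction of the weighted mean intake to every observation."""
    total_weight = sum(weights)
    weighted_mean = sum(w * y for w, y in zip(weights, intakes)) / total_weight
    shift = fraction * weighted_mean  # one ten-thousandth of the weighted mean
    return [y + shift for y in intakes]


# Hypothetical data: four observed intakes (one of them zero) and their weights.
observed = [0.0, 12.0, 8.0, 20.0]
weights = [1.0, 2.0, 1.0, 1.0]
shifted = shift_away_from_zero(observed, weights)
```

Here the weighted mean is 10.4, so every observation moves up by 0.00104 and the zero intake becomes strictly positive.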
A regression-based adjustment to account for differences among sample days is applied to shifted observations. Since it is well known that regression estimates have better properties when data are normal, the shifted intake data are power transformed prior to estimating the regression adjustment. The best power transformation for each nutrient is found by using a grid search, and choosing the power that minimizes the departure of the transformed data from normality. A regression model is fit to the shifted power transformed data, where the independent variables in the model include adjustment variables such as day of the week, month of the year, interview sequence, and any other survey related variable that you feel should be accounted for prior to analysis. The regression adjustment is a ratio, rather than the usual linear adjustment. This method guarantees that all adjusted intake values are positive. Usually, the first interview day is the standard, in the sense that observed means for later days are adjusted to be equal to the mean of the first day. Proper reorganization of your data set will allow any other day to be used as the standard, should you so desire.
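The grid search for a power transformation can be illustrated with the following Python sketch. The text above does not say how departure from normality is measured, so this sketch assumes one common choice: one minus the correlation between the sorted transformed values and approximate normal scores. The candidate grid, the function names, and the example data are all hypothetical, and the regression and ratio adjustment steps are not shown.

```python
import math
from statistics import NormalDist


def power_transform(values, power):
    """Return values**power, or log(values) when power is 0 (values must be positive)."""
    if power == 0:
        return [math.log(v) for v in values]
    return [v ** power for v in values]


def normal_scores(n):
    """Approximate expected normal order statistics for a sample of size n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_power(values, candidates):
    """Pick the candidate power whose transformed data look most nearly normal."""
    scores = normal_scores(len(values))

    def departure(power):
        transformed = sorted(power_transform(values, power))
        return 1.0 - pearson(transformed, scores)

    return min(candidates, key=departure)


# Hypothetical skewed data: the cube of a shifted set of normal scores,
# so the cube-root power should be selected from the grid.
skewed = [(z + 5.0) ** 3 for z in normal_scores(50)]
chosen = best_power(skewed, [1.0, 1 / 2, 1 / 3, 1 / 4, 0])
```

A finer grid or a different normality criterion would slot into `best_power` without changing the rest of the sketch.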
The regression-based ratio adjustment described above creates a set of dietary intake data with equal means across survey days. To make the variances across days equal as well, a second adjustment is applied that scales the data so that the variance on each sample day is equal to that of the first sample day.
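As a rough illustration of the variance adjustment, the Python sketch below rescales each later day's intakes about that day's mean so that its standard deviation matches the first day's. The exact scaling SIDE uses is not spelled out here, so treat this as one plausible reading; the function name and data are hypothetical. Note that rescaling about the mean, as done here, does not by itself guarantee positive values.

```python
from statistics import mean, stdev


def equalize_variances(intakes_by_day):
    """Rescale each later day about its own mean to match the first day's spread."""
    reference_sd = stdev(intakes_by_day[0])
    adjusted = [list(intakes_by_day[0])]  # the first day is the standard
    for day in intakes_by_day[1:]:
        day_mean = mean(day)
        scale = reference_sd / stdev(day)
        adjusted.append([day_mean + scale * (y - day_mean) for y in day])
    return adjusted


# Hypothetical data: two survey days with equal means but unequal variances.
days = [[10.0, 12.0, 14.0], [8.0, 12.0, 16.0]]
equalized = equalize_variances(days)
```

After the adjustment, the second day keeps its mean of 12 while its standard deviation drops from 4 to 2, the value observed on the first day.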
The ratio-adjusted, variance-adjusted intake data have common mean and variance across survey days. After applying the inverse power transformation to the adjusted data, we create an "equal weights" sample, as described in Nusser, Carriquiry, Dodd, and Fuller (1996). The data set is now ready for the estimation of the usual intake distribution.
2. Semiparametric transformation to normality: Observed intake data (whether adjusted or not) generally have nonnormal distributions. For nutrients such as vitamins and some micronutrients, skewness is quite extreme.
Most statistical procedures rely on the assumption of normality. For example, the method proposed by the National Academy of Sciences (NAS) in 1986 for estimating usual intake distributions assumes that intake data are normal, and proposes that dietary intake data be transformed prior to analysis so that the normality assumption is satisfied. The SIDE procedure transforms adjusted dietary intake data into normality as a part of obtaining estimates of usual intake distributions.
The transformation into normality in SIDE is done in two steps. In the first step, data are transformed so that their distribution is as close to normal as possible, by using a power transformation similar to the one described earlier. However, power transformed data are not necessarily normal. Thus, a second transformation which takes the power transformed intakes into the normal scale is employed. This second step in the transformation is nonparametric and is based on a grafted polynomial model. The power transformation plus the grafted polynomial function make up the semiparametric transformation into normality.
3. Estimation of within- and between-individual variances in intakes: SIDE uses a measurement error model for observed daily intakes, similar to the model proposed by the NAS. The model states that the observed intake for an individual on any day is equal to that individual's usual intake plus a measurement error. The variance of the usual intakes is the between-individual variance. The variance of the measurement errors is the within-individual variance and reflects the day-to-day variation of intakes for an individual.
Estimates for both the within- and the between-individual variances are obtained under the measurement error model, under the assumption of normality. The variances are used to estimate the distribution of usual intakes in the normal scale.
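To make the two variance components concrete, here is a minimal Python sketch using textbook method-of-moments estimates for the simplest case: every individual has the same number of observations, days are independent, and all weights are equal. SIDE's own estimators handle more general situations, so this is an illustration of the model rather than of the program; the function name and data are hypothetical.

```python
from statistics import mean, variance


def variance_components(intakes_by_person):
    """Estimate (between, within) variances from balanced repeated observations."""
    n_days = len(intakes_by_person[0])
    if any(len(person) != n_days for person in intakes_by_person):
        raise ValueError("this sketch assumes the same number of days per person")
    # Within-individual variance: average of each person's day-to-day variance.
    within = mean([variance(person) for person in intakes_by_person])
    # The variance of person means overstates the between-individual variance
    # by within / n_days, so subtract that part (and never go below zero).
    person_means = [mean(person) for person in intakes_by_person]
    between = max(variance(person_means) - within / n_days, 0.0)
    return between, within


# Hypothetical normal-scale intakes: three individuals, two days each.
data = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
between_var, within_var = variance_components(data)
```

For these data the within-individual variance is 2 and the between-individual variance is 16 - 2/2 = 15, so most of the spread in observed intakes is attributed to differences among individuals.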
4. Back-transformation into the original scale: The final step in the methodology is to transform the estimated usual intake distribution from the normal scale into the original scale. Simply "undoing" the transformation into normality by applying the inverse transformation is not enough, because the original transformation is nonlinear. The back transformation must account for the fact that the mean of a nonlinear function of a random variable is not the nonlinear function of the mean. It is therefore based on an approximation to the mean function. This back transformation is called the mean transformation, since it brings the distribution of usual intakes (true individual means) back into the original scale.
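The following Python sketch illustrates why the plain inverse is not enough. It assumes, purely for illustration, that the transformation into normality is a cube root (so the inverse is a cube), and it approximates the mean of the inverse over the within-individual error by averaging over equally probable slices of a normal distribution. This numerical averaging is a stand-in chosen for simplicity, not the approximation SIDE itself uses; all names and values are hypothetical.

```python
from statistics import NormalDist


def mean_back_transform(x, within_sd, inverse, n_nodes=2000):
    """Approximate E[inverse(x + e)] for e ~ Normal(0, within_sd**2)."""
    std_normal = NormalDist()
    total = 0.0
    for i in range(n_nodes):
        # Midpoint quantile of the i-th of n_nodes equally probable error slices.
        e = within_sd * std_normal.inv_cdf((i + 0.5) / n_nodes)
        total += inverse(x + e)
    return total / n_nodes


def cube(t):
    """Inverse of a cube-root transformation (illustrative assumption)."""
    return t ** 3


# Hypothetical individual: usual intake 2.0 in the normal scale,
# within-individual standard deviation 0.5.
naive = cube(2.0)                              # inverse applied to the mean
usual = mean_back_transform(2.0, 0.5, cube)    # mean of the inverse
```

For a cube the exact answer is known, x^3 + 3 x s^2 = 8 + 1.5 = 9.5, so the naive back-transformed value of 8 understates this individual's usual intake in the original scale, while the averaged value comes out close to 9.5.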
The program estimates statistics of interest from the estimated usual intake distribution. For example, estimates for the mean and the variance of the usual intake distribution for a nutrient, for a set of percentiles, or for the proportion of the population below (or above) a given threshold are available.
Nusser, S. M., A. L. Carriquiry, K. W. Dodd, and W. A. Fuller (1996), "A semiparametric transformation approach to estimating usual daily intake distributions," Journal of the American Statistical Association, December 1996.