C-SIDE
Software for Intake Distribution Estimation (C-SIDE) can help you obtain estimates of usual nutrient and food intake distributions, their moments, and their percentiles. The software uses methods developed at Iowa State University and described in Nusser, Carriquiry, Dodd, and Fuller (1996) and Nusser, Fuller, and Guenther (1996).
If you have nutrient intake data (consumed almost daily), with C-SIDE you can:
- Read ASCII text files containing intake data.
- Adjust dietary intake data for discrete or continuous factors of your choice, such as day of the week or age of the individual, selecting any sample day or the grand mean as the standard.
- Incorporate sampling weights associated with each individual.
- Analyze data collected on consecutive days using an estimate of the correlation among intake days for each different dietary component.
- Estimate the distribution of usual intake for any number of nutrients and for subpopulations, such as subpopulations defined by gender.
- Compute the proportion of all individuals with intake above or below any number of given thresholds for each nutrient, including unequally spaced thresholds.
- Estimate the standard error of each estimated percentile of the distributions using linearization variance estimators.
- Estimate the standard error for moments and percentiles of the distribution for data from complex survey designs using a jackknife or balanced repeated replication (BRR) method.
- Output data necessary for producing plots of estimated densities with another software package.
If you have food intake data, with C-SIDE you can:
- Read ASCII text files containing intake data.
- Adjust positive dietary intake data for discrete or continuous factors of your choice, such as day of the week or age of the individual, selecting any sample day or the grand mean as the standard.
- Incorporate sampling weights associated with each individual.
- Estimate the usual intake distribution for consumption days only for any number of foods and for subpopulations, such as subpopulations defined by gender.
- Estimate the distribution of the probability that an individual consumes the food on a day.
- Estimate the distribution of usual intake on all days.
- Estimate the fraction of nonconsumers and consumers of the food in the population.
- Compute the proportion of all individuals with intake above or below any number of given thresholds for each food, including unequally spaced thresholds, using either usual intake distribution.
- Output data necessary for producing plots of estimated densities with another software package.
What kind of output can you get?
C-SIDE will produce:
- Diagnostics for the different steps in the method for estimating usual intake distributions.
- Moments of the estimated usual intake distributions.
- Estimated percentile values and the proportion of individuals in the sample with observed intake above and below given thresholds.
- Adjusted daily intake data in original units.
- Normal scale daily intakes, power transformed daily intakes, and normal scores from adjusted daily intakes.
- A set of representative usual intakes generated from the estimated usual intake distribution.
- Percentiles and estimated density function values for the distribution of usual intakes.
- Estimated usual intakes for each individual in the data set.
What equipment and software will you need?
C-SIDE (Version 1.0) is designed to work on a computer running the X Window System (commonly used on UNIX and Linux systems).
Executable files are available for:
- Linux (Intel) workstations.
- SUN Sparc workstations.
- Microsoft Windows (requires additional user-supplied software that implements an X server, such as Cygwin/X).
If your platform differs from those listed above, we can work with you to get C-SIDE compiled on it.
- If we cannot find a similar platform in-house on which to do the work, we can send you the source code and help you compile and test the software on your equipment.
- You would not have a license for the source code. We would require you to remove it from your system once the program has been successfully compiled.
To recompile C-SIDE for an environment different from those listed above, you will need:
- An ANSI-compliant C compiler.
- The X Window System, Version 11 Release 5 (X11R5) or later, including the files needed to compile a program, such as libraries and header files for Xlib, the X Toolkit Intrinsics, and the Athena widget set.
For Microsoft Windows platforms, you can test whether your chosen X server software will work with C-SIDE by requesting a free sample version of C-SIDE, called pre-SIDE. The pre-SIDE version implements only the preliminary data adjustments step of the ISU method.
What about your data?
The following must be true of your data set:
- Multiple days of intake data must be available on at least some of the individuals in the sample. For foods, there must be multiple positive intakes. Multiple intakes are required because the method estimates the within-individual variation of intakes. This estimation is not possible when only one observation is available on each individual in the sample.
- For food intake data, the number of observations must be the same for all individuals in the data set. For nutrient data, the number of observations may vary among individuals.
- All recorded intakes must be nonnegative. If you have transformed your intake data in any way, verify that the transformed intakes input into the program are also nonnegative.
- Data must be in ASCII text files, with one observation per line. A unique identification variable for each individual must be present. The file must be ordered so that multiple observations on one individual are grouped together. Variables may appear in any order in the data file.
- The observed distribution of nutrient intakes must be smooth. No single intake value should appear with a frequency significantly higher than that of the other values. Food intake data may contain a large number of observed zero intakes, but the observed distribution of positive intakes must be smooth.
- For food intakes, the usual intake for days on which the food is consumed is assumed to be independent of the frequency with which the food is consumed.
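Several of the requirements above (nonnegative intakes, grouped records, repeated observations) can be checked before a file is handed to C-SIDE. The following is a minimal sketch only, not part of C-SIDE: the function name `check_intake_lines` and the whitespace-delimited layout with the identifier in the first column and the intake in the second are assumptions made for illustration.

```python
from collections import Counter


def check_intake_lines(lines, id_col=0, intake_col=1):
    """Return a list of problems found in whitespace-delimited intake records.

    The column positions are assumptions for this sketch; C-SIDE itself
    lets variables appear in any order.
    """
    problems = []
    finished = set()      # ids whose block of records has already ended
    previous_id = None
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        person = fields[id_col]
        intake = float(fields[intake_col])
        if intake < 0:
            problems.append(f"line {number}: negative intake {intake}")
        if person != previous_id:
            if person in finished:
                problems.append(f"line {number}: records for {person} are not grouped")
            if previous_id is not None:
                finished.add(previous_id)
            previous_id = person
        counts[person] += 1
    if not any(n > 1 for n in counts.values()):
        problems.append("no individual has more than one observation")
    return problems
```

An empty list means none of these checks failed. Smoothness of the observed distribution is not tested here.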
C-SIDE is also capable of accommodating data sets with the following characteristics:
- If nutrient intake data were collected on no more than three consecutive days, and if you want to account for the day-to-day correlations, then you must supply an estimate of the correlation. This option is not available for food intakes.
- Sampling weights for each individual in the sample may be used. C-SIDE will produce a weighted analysis of your data.
- If your intake data were collected on different days of the week, during different seasons, or from individuals of different ages, ethnic groups, or income levels, you can partially remove the effect of those variables from your intake data using C-SIDE's adjustment procedure. You may use continuous or qualitative variables for data adjustment.
- Data may be organized into subpopulations, such as those defined by ethnic group, geographic location, or income level. C-SIDE can provide a separate analysis for each subpopulation.
- Replicate weight sets may be used to estimate variances of estimated usual intake quantiles and CDF values with a jackknife or balanced repeated replication (BRR) method.
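To illustrate how replicate weight sets lead to a variance estimate, the sketch below applies the standard textbook forms of the BRR and delete-one-group jackknife estimators to a quantile that has already been re-estimated once per replicate. The function name and the scaling factors are assumptions for this example. The text does not state which variants C-SIDE implements, and stratified jackknife designs use different factors.

```python
def replicate_variance(full_estimate, replicate_estimates, method="brr"):
    """Variance of an estimate from its replicate-weight re-estimates.

    Assumed textbook forms (not confirmed as C-SIDE's own):
      BRR:        (1/R) * sum (theta_r - theta)^2
      jackknife:  ((R-1)/R) * sum (theta_r - theta)^2   (delete-one-group)
    """
    r = len(replicate_estimates)
    squared = sum((e - full_estimate) ** 2 for e in replicate_estimates)
    if method == "brr":
        return squared / r
    if method == "jackknife":
        return (r - 1) / r * squared
    raise ValueError(f"unknown method: {method}")
```

The square root of the returned value is the standard error of the quantile or CDF value.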
Overview of Methodology
C-SIDE was developed to carry out the statistical method for estimating usual nutrient or food intake distributions. The nutrient method is described in Nusser, Carriquiry, Dodd, and Fuller (1996). The food method is described in Nusser, Fuller and Guenther (1996). This section contains a technical overview of the different steps in the program. The nutrient procedure is described first.
Nutrients
The method for estimating usual nutrient intake distributions consists of several steps. The steps can be grouped according to the major tasks to be accomplished: preliminary data adjustments, semiparametric transformation to normality, estimation of within and between individual variances for intakes, and back transformation into the original scale.
1. Preliminary data adjustments:
Preliminary data adjustments include shifting observed intake data by a small amount away from zero, incorporating survey weights by creating an equal weights sample, and correcting for the effect of sample day (first versus all the rest) on the mean and the variance of the distribution of observed intakes.
To shift observed intake data away from zero, the program computes the weighted overall mean intake and adds one ten thousandth of the weighted mean to each observation in the sample. The magnitude of the value added to each observation is small, but it eliminates all zero observations, and avoids potential problems should a log transformation be used.
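The shift described above is simple enough to show directly. This is a minimal sketch, assuming one weight per observation; the function name `shift_from_zero` is hypothetical.

```python
def shift_from_zero(intakes, weights):
    """Add one ten-thousandth of the weighted mean intake to every observation."""
    total_weight = sum(weights)
    weighted_mean = sum(w * y for w, y in zip(weights, intakes)) / total_weight
    shift = weighted_mean / 10000.0
    return [y + shift for y in intakes]
```

With intakes of 0, 10, and 20 and equal weights, the weighted mean is 10, so every observation is increased by 0.001 and no zeros remain.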
A regression-based adjustment to account for differences among sample days is applied to shifted observations. Since it is well known that regression estimates have better properties when data are normal, the shifted intake data are power transformed prior to estimating the regression adjustment. The best power transformation for each nutrient is found by using a grid search, and choosing the power that minimizes the departure of the transformed data from normality. A regression model is fit to the shifted power transformed data, where the independent variables in the model include adjustment variables such as day of the week, month of the year, interview sequence, and any other survey related variable that you feel should be accounted for prior to analysis. The regression adjustment is a ratio, rather than the usual linear adjustment. This method guarantees that all adjusted intake values are positive. Usually, the first interview day is the standard, in the sense that observed means for later days are adjusted to be equal to the mean of the first day. Proper reorganization of your data set will allow any other day to be used as the standard, should you so desire. The grand mean can also be used as the standard.
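The grid search for the power transformation can be sketched as follows. The text does not say how departure from normality is measured or which powers are on the grid, so both are assumptions here: departure is taken as one minus the correlation between the sorted transformed values and Blom normal scores, and the grid is a short illustrative list in which a power of 0 stands for the log transformation.

```python
import math
from statistics import NormalDist


def blom_scores(n):
    """Expected normal order statistics (Blom approximation) for a sample of size n."""
    normal = NormalDist()
    return [normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def power_transform(y, power):
    """Power transformation; power 0 is treated as the natural log."""
    return math.log(y) if power == 0 else y ** power


def correlation(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((z - mean_b) ** 2 for z in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def best_power(values, grid=(1.0, 0.5, 1 / 3, 0.25, 0.2, 0.0)):
    """Pick the grid power whose transformed data look most normal."""
    scores = blom_scores(len(values))

    def departure(power):
        transformed = sorted(power_transform(y, power) for y in values)
        return 1.0 - correlation(transformed, scores)

    return min(grid, key=departure)
```

The values must be strictly positive, which the shift away from zero guarantees.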
The regression-based ratio adjustment described above creates a set of dietary intake data with equal means across survey days. To make the variances across days equal as well, a second adjustment is applied. It scales the data so that the variance on each sample day equals that of the first sample day (or a pooled variance if the grand mean is the reference standard).
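The two adjustments can be illustrated in a simplified setting. The sketch below assumes that sample day is the only adjustment variable, so the fitted regression value for a day is just that day's mean, and that the values passed in are already shifted and power transformed. The function name and the use of the population standard deviation are choices made for this example, not details taken from C-SIDE.

```python
from statistics import mean, pstdev


def equalize_days(by_day, standard_day):
    """Give every sample day the mean and variance of the standard day.

    by_day maps a day label to a list of transformed intakes.
    """
    target_mean = mean(by_day[standard_day])
    target_sd = pstdev(by_day[standard_day])
    adjusted = {}
    for day, values in by_day.items():
        # Ratio adjustment: multiply so the day mean equals the standard mean.
        ratio = target_mean / mean(values)
        scaled = [v * ratio for v in values]
        # Variance adjustment: rescale deviations about the common mean.
        spread = target_sd / pstdev(scaled)
        adjusted[day] = [target_mean + spread * (v - target_mean) for v in scaled]
    return adjusted
```

A ratio keeps positive values positive, which a subtractive mean adjustment would not guarantee.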
The ratio-adjusted, variance-adjusted intake data have common mean and variance across survey days. After applying the inverse power transformation to the adjusted data, we create an "equal weights" sample, as described in Nusser, Carriquiry, Dodd, and Fuller (1996). The data set is now ready for the estimation of the usual intake distribution.
2. Semiparametric transformation to normality:
Observed intake data (whether adjusted or not) generally have nonnormal distributions. For nutrients such as vitamins and some micronutrients, skewness is quite extreme.
Most statistical procedures rely on the assumption of normality. For example, the method proposed by the National Academy of Sciences in 1986 for estimating usual intake distributions assumes that intake data are normal and proposes that dietary intake data be transformed prior to analysis so that the normality assumption is satisfied. The C-SIDE procedure transforms adjusted dietary intake data into normality as a part of obtaining estimates of usual intake distributions.
The transformation into normality in C-SIDE is done in two steps. In the first step, data are transformed so that their distribution is as close to normal as possible, by using a power transformation similar to the one described earlier. However, power transformed data are not necessarily normal. Thus, a second transformation which takes the power transformed intakes into the normal scale is employed. This second step in the transformation is nonparametric and is based on a grafted polynomial model. The power transformation plus the grafted polynomial function make up the semiparametric transformation into normality.
3. Estimation of within and between individual variances in intakes:
C-SIDE uses a measurement error model for observed daily intakes, similar to the model proposed by the NAS. The model states that the observed intake for an individual on any day is equal to that individual's usual intake plus a measurement error. The variance of the usual intakes is the between individual variance. The variance of the measurement errors is the within individual variance, and reflects the day-to-day variation of intakes for an individual.
Estimates for both the within and the between individual variances are obtained under the measurement error model, under the assumption of normality. The variances are used to estimate the distribution of usual intakes in the normal scale.
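In symbols, the model is X_ij = x_i + u_ij, where x_i is the usual intake of individual i in the normal scale and u_ij is the measurement error on day j. As an illustration of how the two variances can be separated, the sketch below uses the classical one-way analysis-of-variance (method of moments) estimators, which allow the number of days to differ among individuals. This is a stand-in chosen for the example. The text does not specify the estimator C-SIDE uses.

```python
def variance_components(intakes_by_person):
    """Return (between_variance, within_variance) from normal-scale intakes.

    intakes_by_person maps an individual's id to a list of daily values.
    At least one individual must have more than one day.
    """
    groups = list(intakes_by_person.values())
    n = len(groups)                           # number of individuals
    total = sum(len(g) for g in groups)       # number of observations
    grand_mean = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]

    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))

    ms_within = ss_within / (total - n)
    ms_between = ss_between / (n - 1)
    # Effective number of days per individual for unbalanced data.
    n0 = (total - sum(len(g) ** 2 for g in groups) / total) / (n - 1)
    between = max(0.0, (ms_between - ms_within) / n0)
    return between, ms_within
```

With a single observation per individual, `total - n` is zero and the within variance cannot be estimated, which is why multiple days of data are required.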
4. Back-transformation into the original scale:
The final step in the methodology is to transform the estimated usual intake distribution from the normal scale into the original scale. Because the original transformation is nonlinear, simply "undoing" it by applying the inverse transformation is not enough: the mean of a nonlinear function of a random variable is not the nonlinear function of the mean. The back transformation adjusts for this and is based on an approximation to the mean function. It is called the mean transformation, since it brings the distribution of usual intakes (true individual means) back into the original scale.
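A small numerical example shows why the plain inverse is not enough. If g is the inverse transformation and the measurement error u is normal with within-individual standard deviation s, the usual intake in the original scale for a normal-scale usual intake x is the mean of g(x + u), not g(x). The sketch below approximates that mean with a three-point Gauss-Hermite rule. The rule and the function name are assumptions for illustration, and the text does not describe the approximation C-SIDE actually uses.

```python
import math


def mean_transform(x, within_sd, inverse):
    """Approximate E[inverse(x + u)] for u ~ N(0, within_sd**2).

    Uses the 3-point Gauss-Hermite rule: nodes 0 and +/- sqrt(3)*sd
    with weights 2/3, 1/6, 1/6 (exact for polynomials up to degree 5).
    """
    step = math.sqrt(3.0) * within_sd
    center = inverse(x)
    tails = inverse(x - step) + inverse(x + step)
    return (2.0 / 3.0) * center + (1.0 / 6.0) * tails
```

For a cube inverse (the inverse of a cube-root power transformation) with x = 2 and s = 0.5, the plain inverse gives 8, while the mean of (x + u)^3 is x^3 + 3 x s^2 = 9.5, which `mean_transform(2.0, 0.5, lambda z: z ** 3)` reproduces up to rounding.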
The program estimates statistics of interest from the estimated usual intake distribution. For example, estimates for the mean and the variance of the usual intake distribution for a nutrient, for a set of percentiles, or for the proportion of the population below (or above) a given threshold are available.
Foods
The food methodology involves three main steps. The first is to obtain an estimate of the distribution of food consumption probabilities (the probability that an individual consumes the food on any given day).
A modified minimum chi-square estimator is used to obtain an estimated probability mass for 51 equally-spaced consumption probability values (0.00, 0.02, 0.04, ..., 0.98, 1.00). A maximum entropy term in the objective function is used to smooth the mass over the positive consumption probabilities. The estimated probability that an individual's consumption probability is zero is an estimate of the fraction of the population that never consumes the food.
In the second step, the method for estimating usual nutrient intake distributions is applied to the positive intakes to estimate a usual intake distribution for consumption days only. This distribution applies only to days on which the food was consumed.
The third step is to estimate the usual intake distribution for all days from the joint distribution of consumption day usual intakes and individual consumption probabilities. The current method assumes that the usual intakes are independent of the frequency with which the food is consumed.
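Under the independence assumption, an individual's usual intake on all days is the consumption probability times the usual intake on consumption days, so the all-days distribution follows from the two estimated pieces. The sketch below computes the all-days CDF at a threshold from a discrete set of probability masses and a consumption-day CDF. The function and argument names are hypothetical, and this shows only the combination step, not C-SIDE's estimators.

```python
# The 51 equally spaced consumption probabilities 0.00, 0.02, ..., 1.00.
PROBABILITY_GRID = [k / 50 for k in range(51)]


def all_days_cdf(threshold, prob_mass, consumption_day_cdf):
    """P(usual intake on all days <= threshold), assuming independence.

    prob_mass maps a consumption probability p to its estimated mass.
    consumption_day_cdf is the CDF of usual intake on consumption days.
    """
    total = 0.0
    for p, mass in prob_mass.items():
        if p == 0.0:
            # Never-consumers have usual intake exactly zero.
            total += mass if threshold >= 0 else 0.0
        else:
            # p * y <= threshold  is the same as  y <= threshold / p.
            total += mass * consumption_day_cdf(threshold / p)
    return total
```

The proportion of the population above a threshold is one minus the returned value.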
References
Nusser, S. M., A. L. Carriquiry, K. W. Dodd, and W. A. Fuller. (1996), "A semiparametric transformation approach to estimating usual daily intake distributions," Journal of the American Statistical Association, December 1996.
Nusser, S. M., W. A. Fuller, and P. G. Guenther. (1996), "Estimating usual dietary intake distributions: Adjusting for measurement error and nonnormality in 24-hour food intake data," in L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N. Schwarz, and D. Trewin (eds.), Survey Measurement and Process Quality. New York: Wiley (in press).