Center for Survey Statistics and Methodology

C-SIDE

Software for Intake Distribution Estimation (C-SIDE) can help you obtain estimates of usual nutrient and food intake distributions, their moments, and their percentiles. The software uses methods developed at Iowa State University and described in Nusser, Carriquiry, Dodd, and Fuller (1996) and Nusser, Fuller, and Guenther (1996).

If you have nutrient intake data (consumed almost daily), with C-SIDE you can:

If you have food intake data (not consumed daily), with C-SIDE you can:

What kind of output can you get?

C-SIDE will produce:
  1. Diagnostics for the different steps in the method for estimating usual intake distributions.
  2. Moments of the estimated usual intake distributions.
  3. Estimated percentile values and the proportion of individuals in the sample with observed intake above and below given thresholds.

C-SIDE can also produce ASCII text data files containing:
  1. Adjusted daily intake data in original units.
  2. Normal scale daily intakes, power transformed daily intakes, and normal scores from adjusted daily intakes.
  3. A set of representative usual intakes generated from the estimated usual intake distribution.
  4. Percentiles and estimated density function values for the distribution of usual intakes.
  5. Estimated usual intakes for each individual in the data set.

What equipment and software will you need?

C-SIDE (Version 1.0) is designed to work on a computer running an X Windows System graphical window manager (commonly used on UNIX and Linux systems).

Executable files are available for:

If you have a platform different from that above, we can work with you to get C-SIDE compiled on that platform.

For Microsoft Windows platforms, you can test whether your chosen X Windows server software will work with C-SIDE by requesting a free sample version of C-SIDE, called pre-SIDE. Only the preliminary data adjustments step of the ISU method is implemented in the pre-SIDE version of the software.

What about your data?

The following must be true of your data set:

Multiple days of intake data must be available on at least some of the individuals in the sample. For foods, there must be multiple positive intakes. Multiple intakes are required because the method estimates the within-individual variation of intakes. This estimation is not possible when only one observation is available on each individual in the sample.

For food intake data, the number of observations for an individual must be the same for all individuals in the data set. For nutrient data, the number of observations may vary among individuals.

All recorded intakes must be nonnegative. If you have transformed your intake data in any way, verify that the transformed intakes input into the program are also nonnegative.

Data must be in ASCII text files, with one observation per line. A unique identification variable for each individual must be present. The file must be ordered so multiple observations on one individual are grouped together. Variables may appear in any order in the data file.
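
The file requirements above can be checked before running an analysis. The following is a minimal sketch, not part of C-SIDE: the function name `check_intake_file`, the whitespace-delimited layout, and the column positions are all assumptions made for illustration.

```python
def check_intake_file(lines, id_col=0, intake_col=1):
    """Check whitespace-delimited intake records against the stated requirements.

    Column positions are hypothetical; variables may appear in any order
    in a real data file. Returns a list of problem descriptions.
    """
    problems = []
    finished_ids = set()   # individuals whose block of lines has already ended
    counts = {}            # number of observations per individual
    previous_id = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        person = fields[id_col]
        intake = float(fields[intake_col])
        if intake < 0:
            problems.append(f"line {line_no}: negative intake {intake}")
        if person != previous_id:
            if person in finished_ids:
                problems.append(f"line {line_no}: records for {person} are not grouped")
            if previous_id is not None:
                finished_ids.add(previous_id)
            previous_id = person
        counts[person] = counts.get(person, 0) + 1
    if not any(n > 1 for n in counts.values()):
        problems.append("no individual has more than one observation")
    return problems


good = ["101 12.5", "101 9.0", "102 3.2"]
bad = ["101 12.5", "102 -1.0", "101 9.0"]
print(check_intake_file(good))
print(check_intake_file(bad))
```

The sketch covers nonnegativity, grouping by individual, and the presence of repeated observations. It does not check smoothness of the observed distribution.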

The observed distribution of nutrient intakes must be smooth. No single intake value should appear with a frequency significantly higher than that of the other values. Food intake data may contain a large number of observed zero intakes, but the observed distribution of positive intakes must be smooth.

For food intakes, the usual intake for days on which the food is consumed is assumed to be independent of the frequency with which the food is consumed.

C-SIDE is also capable of accommodating data sets with the following characteristics:

If nutrient intake data were collected on no more than three consecutive days, and if you want to account for the day-to-day correlations, then you must supply an estimate of the correlation. This option is not available for food intakes.

Sampling weights for each individual in the sample may be used. C-SIDE will produce a weighted analysis of your data.

If your intake data were collected on different days of the week, during different seasons, or from individuals of different ages, ethnic groups, or income levels, you can partially remove the effect of those variables from your intake data using C-SIDE's adjustment procedure. You may use continuous or qualitative variables for data adjustment.

Data may be organized into subpopulations, such as those defined by ethnic groups, geographic location, or income level. C-SIDE can provide a separate analysis for each subpopulation.

Replicate weight sets may be used to estimate variances of estimated usual intake quantiles and CDF values with a jackknife or balanced repeated replication (BRR) method.
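
As an illustration of how replicate estimates turn into a variance, the sketch below uses the textbook delete-one jackknife and basic BRR formulas. The function name and the multipliers are assumptions; the factors appropriate for a stratified design, and the ones C-SIDE actually applies, may differ.

```python
def replicate_variance(full_estimate, replicate_estimates, method="jackknife"):
    """Variance of an estimate (e.g. a usual intake percentile) from replicates.

    full_estimate: estimate computed with the full-sample weights.
    replicate_estimates: the same estimate recomputed with each replicate weight set.
    """
    r = len(replicate_estimates)
    squared = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    if method == "brr":
        return squared / r               # basic BRR: average squared deviation
    if method == "jackknife":
        return (r - 1) / r * squared     # delete-one jackknife multiplier (assumed)
    raise ValueError(f"unknown method: {method}")


print(replicate_variance(10.0, [9.0, 11.0, 10.0, 10.0], method="brr"))
print(replicate_variance(10.0, [9.0, 11.0, 10.0, 10.0], method="jackknife"))
```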


Overview of Methodology

C-SIDE was developed to carry out the statistical method for estimating usual nutrient or food intake distributions. The nutrient method is described in Nusser, Carriquiry, Dodd, and Fuller (1996). The food method is described in Nusser, Fuller and Guenther (1996). This section contains a technical overview of the different steps in the program. The nutrient procedure is described first.

Nutrients

The method for estimating usual nutrient intake distributions consists of several steps. The steps can be grouped according to the major tasks to be accomplished: preliminary data adjustments, semiparametric transformation to normality, estimation of within and between individual variances for intakes, and back transformation into the original scale.

1. Preliminary data adjustments:

Preliminary data adjustments include shifting observed intake data by a small amount away from zero, incorporating survey weights by creating an equal weights sample, and correcting for the effect of sample day (first versus all the rest) on the mean and the variance of the distribution of observed intakes.

To shift observed intake data away from zero, the program computes the weighted overall mean intake and adds one ten-thousandth of the weighted mean to each observation in the sample. The value added to each observation is small, but it eliminates all zero observations and avoids potential problems should a log transformation be used.
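
The shift can be written in a few lines. This is a sketch of the arithmetic described above, with a hypothetical function name and example values:

```python
def shift_away_from_zero(intakes, weights):
    """Add one ten-thousandth of the weighted mean intake to every observation."""
    weighted_mean = sum(w * x for x, w in zip(intakes, weights)) / sum(weights)
    shift = weighted_mean / 10_000
    return [x + shift for x in intakes]


# Weighted mean is (0*1 + 10*1 + 20*2) / 4 = 12.5, so the shift is 0.00125.
shifted = shift_away_from_zero([0.0, 10.0, 20.0], [1.0, 1.0, 2.0])
print(shifted)
```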

A regression-based adjustment to account for differences among sample days is applied to shifted observations. Since it is well known that regression estimates have better properties when data are normal, the shifted intake data are power transformed prior to estimating the regression adjustment. The best power transformation for each nutrient is found by using a grid search, and choosing the power that minimizes the departure of the transformed data from normality. A regression model is fit to the shifted power transformed data, where the independent variables in the model include adjustment variables such as day of the week, month of the year, interview sequence, and any other survey related variable that you feel should be accounted for prior to analysis. The regression adjustment is a ratio, rather than the usual linear adjustment. This method guarantees that all adjusted intake values are positive. Usually, the first interview day is the standard, in the sense that observed means for later days are adjusted to be equal to the mean of the first day. Proper reorganization of your data set will allow any other day to be used as the standard, should you so desire. The grand mean can also be used as the standard.
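
The grid search for a power can be sketched as follows. The grid of candidate powers and the normality criterion used here (one minus the squared correlation between the ordered data and Blom normal scores) are assumptions chosen for illustration; the source does not specify the grid or the criterion that C-SIDE uses. A power of 0 is treated as the log transformation.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Blom approximation to the expected normal order statistics."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def departure_from_normality(values):
    """1 - r^2 between the sorted values and their normal scores (0 = straight line)."""
    ordered = sorted(values)
    scores = normal_scores(len(ordered))
    mean_v = sum(ordered) / len(ordered)
    mean_s = sum(scores) / len(scores)
    cov = sum((v - mean_v) * (s - mean_s) for v, s in zip(ordered, scores))
    var_v = sum((v - mean_v) ** 2 for v in ordered)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return 1.0 - cov * cov / (var_v * var_s)


def power_transform(x, power):
    return math.log(x) if power == 0 else x ** power


def best_power(values, grid=(0, 0.1, 0.2, 0.25, 1 / 3, 0.5, 0.75, 1.0)):
    """Return the power in the (hypothetical) grid that makes the data most nearly normal."""
    return min(
        grid,
        key=lambda p: departure_from_normality([power_transform(x, p) for x in values]),
    )


# Exactly lognormal-shaped data: the log transformation (power 0) should win.
sample = [math.exp(z) for z in normal_scores(50)]
print(best_power(sample))
```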

The regression-based ratio adjustment described above creates a set of dietary intake data with equal means across survey days. To make the variances across days equal as well, a second adjustment is applied that scales the data so that the variance on each sample day equals that of the first sample day (or a pooled variance if the grand mean is the reference standard).
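
The two adjustments can be illustrated in a much simplified form. The sketch below assumes that sample day is the only adjustment variable, replaces the regression with plain day means, and scales deviations from the mean to equalize variances. All of these are simplifications for illustration, not the published procedure.

```python
from statistics import mean, pstdev


def equalize_days(by_day, standard_day=1):
    """by_day maps a day label to the list of positive intakes observed that day."""
    target_mean = mean(by_day[standard_day])
    target_sd = pstdev(by_day[standard_day])
    adjusted = {}
    for day, values in by_day.items():
        # Ratio adjustment: multiplying by a positive ratio keeps intakes positive.
        ratio = target_mean / mean(values)
        ratio_adjusted = [x * ratio for x in values]
        # Variance adjustment: rescale deviations from the common mean.
        # Note: with a scale factor above 1 this simple form could produce
        # negative values; it is shown only to convey the idea.
        scale = target_sd / pstdev(ratio_adjusted)
        adjusted[day] = [target_mean + (x - target_mean) * scale for x in ratio_adjusted]
    return adjusted


days = {1: [10.0, 20.0, 30.0], 2: [20.0, 30.0, 70.0]}
result = equalize_days(days)
print(mean(result[2]), pstdev(result[2]), pstdev(days[1]))
```

After the adjustment, day 2 has the same mean and standard deviation as day 1, the standard day.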

The ratio-adjusted, variance-adjusted intake data have common mean and variance across survey days. After applying the inverse power transformation to the adjusted data, we create an "equal weights" sample, as described in Nusser, Carriquiry, Dodd, and Fuller (1996). The data set is now ready for the estimation of the usual intake distribution.

2. Semiparametric transformation to normality:

Observed intake data (whether adjusted or not) generally have nonnormal distributions. For nutrients such as vitamins and some micronutrients, skewness is quite extreme.

Most statistical procedures rely on the assumption of normality. For example, the method proposed by the National Academy of Sciences in 1986 for estimating usual intake distributions assumes that intake data are normal and proposes that dietary intake data be transformed prior to analysis so that the normality assumption is satisfied. The C-SIDE procedure transforms adjusted dietary intake data to normality as part of obtaining estimates of usual intake distributions.

The transformation into normality in C-SIDE is done in two steps. In the first step, a power transformation similar to the one described earlier brings the distribution of the data as close to normal as possible. However, power transformed data are not necessarily normal. Thus, a second transformation that takes the power transformed intakes into the normal scale is employed. This second step is nonparametric and is based on a grafted polynomial model. The power transformation plus the grafted polynomial function make up the semiparametric transformation into normality.

3. Estimation of within and between individual variances in intakes:

C-SIDE uses a measurement error model for observed daily intakes, similar to the model proposed by the NAS. The model states that the observed intake for an individual on any day is equal to that individual's usual intake plus a measurement error. The variance of the usual intakes is the between individual variance. The variance of the measurement errors is the within individual variance, and reflects the day-to-day variation of intakes for an individual.

Estimates for both the within and the between individual variances are obtained under the measurement error model, under the assumption of normality. The variances are used to estimate the distribution of usual intakes in the normal scale.
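
For a balanced data set (the same number of days for every individual), the two variances can be estimated by the standard one-way analysis of variance method of moments. The sketch below uses that textbook estimator as an assumption; the estimator implemented in C-SIDE, which also accommodates unequal numbers of observations and sampling weights, may differ.

```python
from statistics import mean


def variance_components(intakes_by_person):
    """Return (between, within) variance estimates from normal-scale intakes.

    intakes_by_person: one inner list per individual, all of the same length
    (at least two days each).
    """
    n = len(intakes_by_person)
    d = len(intakes_by_person[0])
    person_means = [mean(days) for days in intakes_by_person]
    grand_mean = mean(person_means)
    ms_within = sum(
        (y - m) ** 2
        for days, m in zip(intakes_by_person, person_means)
        for y in days
    ) / (n * (d - 1))
    ms_between = d * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    within = ms_within
    # E[MS between] = within + d * between, so solve for between (floored at 0).
    between = max((ms_between - ms_within) / d, 0.0)
    return between, within


print(variance_components([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]))
```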

4. Back-transformation into the original scale:

The final step in the methodology is to transform the estimated usual intake distribution from the normal scale into the original scale. Because the original transformation is nonlinear, simply "undoing" the transformation into normality by applying its inverse is not enough: the mean of a nonlinear function of a random variable is not the nonlinear function of the mean. The back transformation adjusts for this and is based on an approximation to the mean function. It is called the mean transformation, since it brings the distribution of usual intakes (true individual means) back into the original scale.
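
A small numerical sketch shows why the plain inverse is not enough. It assumes a cube-root transformation to the normal scale (so the inverse is the cube) and approximates the mean over the within-individual error with equally probable points from the normal distribution. The function name, the default of nine points, and the cube-root example are assumptions for illustration, not the approximation documented for C-SIDE.

```python
from statistics import NormalDist


def mean_back_transform(x, within_variance, inverse, points=9):
    """Approximate E[inverse(x + u)] for u ~ N(0, within_variance).

    x is an individual's usual intake in the normal scale; the result is the
    corresponding usual intake (mean daily intake) in the original scale.
    """
    errors = NormalDist(0.0, within_variance ** 0.5)
    nodes = [errors.inv_cdf((i + 0.5) / points) for i in range(points)]
    return sum(inverse(x + u) for u in nodes) / points


def cube(y):
    return y ** 3


naive = cube(2.0)                                  # inverse applied to the mean: 8.0
adjusted = mean_back_transform(2.0, 0.25, cube)    # mean of the inverse
print(naive, adjusted)
```

For the cube, the exact mean is x^3 + 3x times the within variance, which is 9.5 in this example. The approximation approaches that value as the number of points grows, while the naive inverse stays at 8.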

The program estimates statistics of interest from the estimated usual intake distribution. For example, estimates for the mean and the variance of the usual intake distribution for a nutrient, for a set of percentiles, or for the proportion of the population below (or above) a given threshold are available.
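
Given a set of representative usual intakes, such as the ASCII output listed earlier, these statistics can be reproduced with a few lines of code. This is a sketch with hypothetical names and an unweighted sample, not C-SIDE's own computation:

```python
from statistics import mean, quantiles, variance


def summarize(usual_intakes, threshold):
    """Mean, variance, selected percentiles, and proportion below a threshold."""
    cuts = quantiles(usual_intakes, n=100)  # 99 cut points: 1st to 99th percentile
    below = sum(x < threshold for x in usual_intakes) / len(usual_intakes)
    return {
        "mean": mean(usual_intakes),
        "variance": variance(usual_intakes),
        "p5": cuts[4],
        "median": cuts[49],
        "p95": cuts[94],
        "below": below,
        "at_or_above": 1.0 - below,
    }


report = summarize([float(v) for v in range(1, 101)], threshold=26.0)
print(report)
```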

Foods

The food methodology involves three main steps. The first is to obtain an estimate of the distribution of food consumption probabilities (the probability that an individual consumes the food on any given day).

A modified minimum chi-square estimator is used to obtain an estimated probability mass for 51 equally spaced consumption probability values (0.00, 0.02, 0.04, ..., 0.98, 1.00). A maximum entropy term in the objective function is used to smooth the mass over the positive consumption probabilities. The estimated probability that an individual's consumption probability is zero is an estimate of the fraction of the population that never consumes the food.
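
The sketch below builds the 51-point grid and shows the mixture model that links a probability mass on the grid to the number of consumption days observed per individual. It assumes that, given an individual's consumption probability, days are independent (a binomial count). It does not implement the modified minimum chi-square estimator or the entropy term, whose details are not given here.

```python
from math import comb

# 51 equally spaced consumption probabilities: 0.00, 0.02, ..., 1.00
GRID = [i / 50 for i in range(51)]


def consumption_day_distribution(mass, days):
    """P(k consumption days out of `days`) implied by `mass` on GRID.

    mass: 51 nonnegative values summing to 1, one per grid probability.
    """
    return [
        sum(
            m * comb(days, k) * p ** k * (1 - p) ** (days - k)
            for m, p in zip(mass, GRID)
        )
        for k in range(days + 1)
    ]


# Hypothetical mass: 30% never consume the food, 70% consume it on half of all days.
mass = [0.0] * 51
mass[0] = 0.3
mass[25] = 0.7
print(consumption_day_distribution(mass, days=2))
```

An estimator of the mass would compare these implied proportions with the observed proportions of individuals having 0, 1, ..., `days` consumption days.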

In the second step, the method for estimating usual nutrient intake distributions is applied to the positive intakes to estimate a usual intake distribution for consumption days only. This distribution applies only to days on which the food was consumed.

The third step is to estimate the usual intake distribution for all days from the joint distribution of consumption day usual intakes and individual consumption probabilities. The current method assumes that the usual intakes are independent of the frequency with which the food is consumed.
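
Under the independence assumption, the all-day usual intake is the product of an individual's consumption probability and that individual's consumption-day usual intake, and joint probabilities are products of the marginal ones. A minimal sketch with hypothetical inputs, treating the consumption-day usual intakes as equally likely representative values:

```python
def all_day_usual_intakes(prob_mass, consumption_day_intakes):
    """Discrete distribution of all-day usual intake under independence.

    prob_mass: list of (consumption probability, mass) pairs summing to mass 1.
    consumption_day_intakes: equally likely representative usual intakes
        for consumption days.
    Returns a sorted list of (all-day usual intake, probability) pairs.
    """
    share = 1.0 / len(consumption_day_intakes)
    outcomes = [
        (p * y, mass * share)          # independence: probabilities multiply
        for p, mass in prob_mass
        for y in consumption_day_intakes
    ]
    return sorted(outcomes)


dist = all_day_usual_intakes([(0.0, 0.2), (0.5, 0.8)], [100.0, 200.0])
mean_intake = sum(value * prob for value, prob in dist)
print(dist, mean_intake)
```

In this example the mean all-day usual intake, 60, equals the mean consumption probability (0.4) times the mean consumption-day usual intake (150), as independence implies.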


Nusser, S. M., A. L. Carriquiry, K. W. Dodd, and W. A. Fuller. (1996), "A semiparametric transformation approach to estimating usual daily intake distributions," Journal of the American Statistical Association, December, 1996.

Nusser, S. M., W. A. Fuller, and P. G. Guenther. (1996), "Estimating usual dietary intake distributions: Adjusting for measurement error and nonnormality in 24-hour food intake data," in L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N. Schwarz, and D. Trewin (eds.), Survey Measurement and Process Quality. New York: Wiley (in press).

Iowa State University