Edward L. Korn and Barry I. Graubard.
Scatterplots with Survey Data.
In The American Statistician, vol. 52, no. 1, pp. 58--69, 1998.



We suggest various modifications to make scatterplots more informative when used with data obtained from a sample survey. Aspects of survey data leading to the plot modifications include the sample weights associated with the observations, imputed data for item nonresponse, and large sample sizes. Examples are given using data from the 1988 National Maternal and Infant Health Survey, the second National Health and Nutrition Examination Survey, and the epidemiologic follow-up of the first National Health and Nutrition Examination Survey.


@Article{        korn:1998:SPSD,
  author = 	 {Edward L. Korn and Barry I. Graubard},
  title = 	 {Scatterplots with Survey Data},
  journal = 	 {The American Statistician},
  year = 	 {1998},
  volume = 	 {52},
  number = 	 {1},
  pages = 	 {58--69},
}



Atkinson, A. C. (1985), Plots, Transformations, and Regression, Oxford: Clarendon Press.
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983), Graphical Methods for Data Analysis, Belmont, CA: Wadsworth International Group.
Cleveland, W. S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74, 829-836.
Cleveland, W. S., and McGill, R. (1984), "The Many Faces of a Scatterplot," Journal of the American Statistical Association, 79, 807-822.
Cook, R. D., and Weisberg, S. (1994), An Introduction to Regression Graphics. New York: Wiley.
Guo, S., Roche, A. F., Baumgartner, R. N., Chumlea, W. C., and Ryan, A. S. (1990), "Kernel Regression for Smoothing Percentile Curves: Reference Data for Calf and Subscapular Skinfold Thicknesses in Mexican Americans," American Journal of Clinical Nutrition, 51, 908S-916S.
Härdle, W. (1990), Applied Nonparametric Regression. Cambridge, MA: Cambridge University Press.
Hinkins, S., Oh, H. L., and Scheuren, F. (1994), "Inverse Sampling Design Algorithms," in 1994 Proceedings of the Section on Survey Research Methods, Alexandria, VA: American Statistical Association, pp. 626-631.
Korn, E. L., and Graubard, B. I. (1995), "Analysis of Large Health Surveys: Accounting for the Sample Design," Journal of the Royal Statistical Society, Ser. A., 158, 263-295.
Korn, E. L., Midthune, D., and Graubard, B. I. (1997), "Estimating Interpolated Percentiles from Grouped Data with Large Samples," Journal of Official Statistics, in press.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis with Missing Data, New York: Wiley.
McDowell, A., Engel, A., Massey, J. T., and Maurer, K. (1981), "Plan and Operation of the Second National Health and Nutrition Examination Survey, 1976-80," Vital and Health Statistics, Series 11, No. 15, Washington, DC: National Center for Health Statistics.
Murthy, M. M., and Sethi, V. K. (1965), "Self-Weighting Design at Tabulation Stage," Sankhya, Series B, 27, 201-210.
National Center for Health Statistics (1976), "NCHS Growth Charts, 1976," in Monthly Vital Statistics Report, vol. 25, no. 3, Suppl. (HRA) 76-1120. Rockville, MD: Health Resources Administration.
National Center for Health Statistics, Annest, J. L., and Mahaffey, K. (1984), "Blood Lead Levels for Persons Ages 6 months-74 years, United States, 1976-80," in Vital and Health Statistics, Series 11, No. 223 (DHHS pub. no. PHS 84-1683), Washington, DC: U.S. Government Printing Office.
National Center for Health Statistics, Cohen, B. B., Barbano, H. E., Cox, C. S., et al. (1987), "Plan and Operation of NHANES I Epidemiologic Followup Study, 1982-84," in Vital and Health Statistics, Series 1, No. 22 (DHHS pub. no. PHS 87-1324). Washington, DC: U.S. Government Printing Office.
O'Hara Hines, R. J., and Carter, E. M. (1993), "Improved Added Variable and Partial Residual Plots for the Detection of Influential Observations in Generalized Linear Models" (with discussion), Applied Statistics, 42, 3-20.
Owen, A. B. (1987), Nonparametric Conditional Estimation, Technical Report No. 265, Stanford University, Dept. of Statistics.
Pirkle, J. L., Schwartz, J., Landis, J. R., and Harlan, W. R. (1985), "The Relationship Between Blood Lead Levels and Blood Pressure and Its Cardiovascular Risk Implications," American Journal of Epidemiology, 121, 246-258.
Sanderson, M., Placek, P. J., and Keppel, K. G. (1991), "The 1988 National Maternal and Infant Health Survey: Design, Content, and Data Availability," Birth, 18, 26-32.
SAS (1990), SAS/GRAPH Software: Reference, Version 6, First Edition, Volume 1, Cary, NC: SAS Institute, Inc.
Shah, B. V., Barnwell, B. G., and Bieler, G. S. (1995), SUDAAN User's Manual, Research Triangle Park, NC: Research Triangle Institute.
Stevens, R. G., Jones, D. Y., Micozzi, M. S., and Taylor, P. R. (1988), "Body Iron Stores and the Risk of Cancer," New England Journal of Medicine, 319, 1047-1052.
Stone, C. J. (1977), "Consistent Nonparametric Regression" (with discussion), Annals of Statistics, 5, 595-645.
Wang, X., Zuckerman, B., Coffman, G. A., and Corwin, M. J. (1995), "Familial Aggregation of Low Birth Weight Among Whites and Blacks in the United States," New England Journal of Medicine, 333, 1744-1749.
Woodruff, R. S. (1952), "Confidence Intervals for Medians and Other Position Measures," Journal of the American Statistical Association, 47, 635-646.