Data Analysis Project
Fall 2024
This project gives you an opportunity to apply the SAS programming skills learned in this course to a dataset. You will work individually: find NHANES datasets of interest, read them into SAS, recode variables as necessary, and use appropriate numerical and graphical methods to present descriptive statistics that summarize the distributions of the variables you have chosen. The particular SAS skills and programming you use will depend largely on the dataset and your research interest. You are expected to submit a written report summarizing your work.
The following is a step-by-step outline for starting and completing the project:
Step 1: Explore the NHANES 2017-2018 documentation and answer the following:
Provide the name of the study that generated the data, the years covered, and the geographic coverage of the data.
Who owns the data?
How were the data collected (self-administered, web-based, telephone interview)?
Target population: what population was sampled, and what population is the sample expected to represent?
Sampling: how were respondents selected (simple random sample, stratified sampling, cluster sampling, etc.)?
https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&Cycle=2017-2018
Step 2: Selection of Files
You must use the demographic file.
Pick two or more of the following files:
blood pressure (exam)
body measure (exam)
cholesterol (lab)
alcohol use (ques)
blood pressure & cholesterol (ques)
diabetes (ques)
cardiovascular health (ques)
current health status (ques)
drug use (ques)
mental health - depression screener (ques)
physical activity (ques)
income (ques)
smoking - cig use (ques)
reproductive health (ques)
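NHANES distributes its data files in SAS transport (.XPT) format, so they need to be converted before use. The following is a minimal sketch of one way to do this, not a required approach. It assumes you downloaded DEMO_J.XPT (demographics) and BPX_J.XPT (blood pressure exam) into a folder called C:\nhanes; the folder path and the libref names are hypothetical, and on SAS Studio or SAS OnDemand the path would be a server path instead.

```sas
/* Assumption: C:\nhanes is a hypothetical folder holding the downloaded .XPT files */
libname nh "C:\nhanes";                         /* permanent library for the converted datasets */

/* The XPORT engine lets SAS read transport files */
libname demoxpt xport "C:\nhanes\DEMO_J.XPT";   /* demographics, 2017-2018 */
libname bpxxpt  xport "C:\nhanes\BPX_J.XPT";    /* blood pressure exam, 2017-2018 */

/* Copy every dataset found in each transport file into the permanent library */
proc copy in=demoxpt out=nh;
run;

proc copy in=bpxxpt out=nh;
run;

/* Check what was read in: variable names, types, labels, number of observations */
proc contents data=nh.demo_j;
run;

proc contents data=nh.bpx_j;
run;
```

Reviewing the PROC CONTENTS output alongside the NHANES documentation for each file is a quick way to confirm that the variables you plan to use are present.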
Step 3: Generate a Data Analysis Plan that includes the following components:
Aims: The research questions need to be clearly defined
Methods: The methods section is the main component of the data analysis plan. The methods should include details on:
Data sources (answers from step 1 and 2)
Study population: include a definition and outline the inclusion/exclusion criteria for your specific research question
Study measures: detail definitions and derivations (including categorization used, if any) of study measures including:
Main exposure variables
Outcome variables
Demographic variables
Sequence of planned analyses including:
Data management steps
Planned analysis to show the distributions of each variable both visually and numerically; procedures to be used
Planned graphs to visualize the distribution of each variable involved in your primary research question; procedures to be used
Planned numerical summaries to investigate the research question of interest; procedures to be used
Planned graph to visualize at least one of the relationships investigated in your research question; procedures to be used
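As an illustration of the kinds of procedures a plan might name for the analysis sequence above, here is a small sketch. The dataset and its values are made up purely for demonstration and are not NHANES results; the variable names (sex, age, sbp) are hypothetical stand-ins for whatever measures you choose.

```sas
/* Toy data for illustration only; the values are invented */
data work.toy;
  input seqn sex $ age sbp;
  datalines;
1 M 34 118
2 F 51 132
3 F 27 110
4 M 63 145
5 F 45 124
6 M 39 128
;
run;

/* Continuous variable: numerical summary plus a histogram with a normal curve overlaid */
proc univariate data=work.toy;
  var sbp;
  histogram sbp / normal;
run;

/* Categorical variable: frequency table, counting missing values as a category */
proc freq data=work.toy;
  tables sex / missing;
run;

/* Categorical variable: bar chart of the same distribution */
proc sgplot data=work.toy;
  vbar sex;
run;

/* Relationship of interest: numerical summary of the outcome within each group */
proc means data=work.toy n mean std min max;
  class sex;
  var sbp;
run;

/* Relationship of interest: side-by-side box plots */
proc sgplot data=work.toy;
  vbox sbp / category=sex;
run;
```

In the plan itself, naming the procedure next to each planned summary or graph (for example, "PROC UNIVARIATE with a histogram for systolic blood pressure") is usually enough detail.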
Step 4: Implement Analysis Plan
The SAS program should include:
Comments and annotations throughout
Libname
infile/input statements or proc copy if you had to read the dataset from something other than a SAS file
formats
steps to properly combine the files
recoding of at least two variables using if/then or select/when
at least one SAS function
at least two graph procedures
at least two procedures (proc univariate, proc freq, proc means, etc.)
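The sketch below shows how several of the required pieces listed above (formats, combining files, two recodes, and a SAS function) might fit together. It is one possible pattern, not a template you must follow. The two small datasets are invented stand-ins that borrow NHANES variable names (SEQN, RIAGENDR, RIDAGEYR, BPXSY1, BPXSY2); the derived variables agegrp, mean_sbp, and highsbp, the age cut points, and the 130 threshold are all hypothetical choices made for illustration.

```sas
/* Toy stand-ins for two NHANES files; the values are invented */
data work.demo;
  input SEQN RIAGENDR RIDAGEYR;
  datalines;
1 1 34
2 2 51
3 2 27
4 1 63
;
run;

data work.bpx;
  input SEQN BPXSY1 BPXSY2;
  datalines;
1 118 120
2 132 .
3 110 112
5 140 138
;
run;

/* User-defined formats to label coded values */
proc format;
  value sexf 1 = "Male"     2 = "Female";
  value agef 1 = "Under 40" 2 = "40-59" 3 = "60 or older";
  value ynf  0 = "No"       1 = "Yes";
run;

/* Both files must be sorted by the participant ID before a match-merge */
proc sort data=work.demo; by SEQN; run;
proc sort data=work.bpx;  by SEQN; run;

data work.analysis;
  merge work.demo (in=indemo) work.bpx (in=inbpx);
  by SEQN;
  if indemo and inbpx;               /* keep participants present in both files */

  /* Recode 1 (if/then): age in years to a three-level age group */
  if missing(RIDAGEYR) then agegrp = .;
  else if RIDAGEYR < 40 then agegrp = 1;
  else if RIDAGEYR < 60 then agegrp = 2;
  else agegrp = 3;

  /* SAS function: MEAN averages the non-missing readings */
  mean_sbp = mean(BPXSY1, BPXSY2);

  /* Recode 2 (select/when): indicator for an illustrative cut point of 130 */
  select;
    when (missing(mean_sbp)) highsbp = .;
    when (mean_sbp >= 130)   highsbp = 1;
    otherwise                highsbp = 0;
  end;

  format RIAGENDR sexf. agegrp agef. highsbp ynf.;
  label agegrp   = "Age group"
        mean_sbp = "Mean systolic blood pressure"
        highsbp  = "Mean systolic BP at or above 130";
run;

/* Check the merge and recodes before moving on to the analysis */
proc print data=work.analysis label;
run;

proc freq data=work.analysis;
  tables agegrp*highsbp / missing;
run;
```

Handling missing values explicitly in each recode matters because SAS treats a missing numeric value as smaller than any number, so a bare comparison such as RIDAGEYR < 40 would otherwise place missing ages in the youngest group.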
Step 5: Provide Written Summary
The report will consist of the following sections:
Introduction (1 paragraph): describing the specific research question(s) of interest
Data Source and Population (1-2 paragraphs)
Analysis Results
Summary (3-4 paragraphs)
Describe and highlight the major findings from your analysis. Refer to tables and figures as appropriate.
Describe each variable by itself. What are the distributions of your continuous variables? Do they look normally distributed? What are the distributions of the categorical variables? Are there categories with few or no observations?
Describe the main finding from the research question of interest. Provide an interpretation for this finding.
What to Submit:
Data Analysis Plan (Due Sunday, Nov 3 @ 11:59 pm): The purpose of this intermediate assignment is to keep you on track and to give you feedback on the appropriateness of your chosen dataset and your plans for this project. Submit as an MS Word document of no more than 2 pages, double-spaced. (20 points)
Final Assignment Submission (Due Wednesday, Dec 11 @ 11:59 pm): You will submit three files:
Final Written Report (30 points)
SAS files (50 points)
Final SAS program filename
Log from the Final SAS Program filename
