Tuesday, 14 April 2020

Study of Adolescent Behavior using ANOVA

Introduction



The National Longitudinal Study of Adolescent Health (AddHealth) is a representative school-based survey of adolescents in grades 7-12 in the United States. The Wave 1 survey focuses on factors that may influence adolescents’ health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities. 
Source: Data Analysis Tools by Wesleyan University

NB. This study was carried out as part of the Coursera course Data Analysis Tools, using Python.

Data frame
AID  IMONTH  IDAY  IYEAR  ....  H1GI3  ....  H1DA2  ....


=> To study what influences adolescents’ risk behaviors, we will use the following variables in our experiment:
  • H1GI3 (General Introductory): how old the adolescent was when he or she moved to the current residence. It is a quantitative variable => response (1, 2, 3 ... years)
  • H1DA2 (Daily Activities): how many times the adolescent did hobbies during the last week. It is a categorical variable => response (not at all, 1 or 2 times, more, refused, don’t know...)
NB. You can also try other daily activities in connection with the proposed general information, e.g.:
H1DA5 (Daily Activities): how many times the adolescent played an active sport.

=> The hypotheses are the following:

  • H0: residential stability does not influence how often hobbies are done, and so behavior is normal.
  • H1: residential instability, and in particular not having lived in the same residence since birth, could have a bad influence on habits, e.g. hobbies, and so on risk behavior.

Python code


# -*- coding: utf-8 -*-
"""
Created on Tue Apr 14 13:42:31 2020
@author: Aymen ABID
"""
import numpy
import pandas
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi  # for Tukey's test

# low_memory=False reads the whole file in one pass; the default (True) parses
# it in chunks, which uses less memory on a less powerful machine
mydata = pandas.read_csv('addhealth_pds.csv', low_memory=False)

# convert the variables we will be working with to numeric
mydata['H1DA2'] = pandas.to_numeric(mydata['H1DA2'], errors='coerce')
mydata['H1GI3'] = pandas.to_numeric(mydata['H1GI3'], errors='coerce')

# set missing data: the value 200 is treated as missing
# (the response codes 6 and 8 of H1DA2 are not removed by this step and
# still appear as groups in the results)
mydata['H1GI3'] = mydata['H1GI3'].replace(200, numpy.nan)
mydata['H1DA2'] = mydata['H1DA2'].replace(200, numpy.nan)

# use the ols function to calculate the F-statistic and the associated p-value
model1 = smf.ols(formula='H1GI3 ~ C(H1DA2)', data=mydata)
results1 = model1.fit()
print(results1.summary())

# Tukey's post hoc test
sub = mydata[['H1GI3', 'H1DA2']].dropna()
mc1 = multi.MultiComparison(sub['H1GI3'], sub['H1DA2'])
res1 = mc1.tukeyhsd()
print(res1.summary())
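The results keep the response codes 6 and 8 of H1DA2 as groups of their own. A minimal sketch of how such codes could be set to missing before fitting the model is given here. The lists of codes are assumptions and must be checked against the AddHealth codebook, the helper name recode_missing is hypothetical, and the small data frame is made-up toy data, not AddHealth values.

```python
import numpy
import pandas

# Assumed missing-value codes; verify them in the AddHealth codebook before use.
H1DA2_MISSING = [6, 8]          # assumed: refused, don't know
H1GI3_MISSING = [96, 97, 98]    # assumed: refused, skipped, don't know


def recode_missing(frame, column, codes):
    """Return a copy of frame where the given codes in column become NaN."""
    out = frame.copy()
    out[column] = out[column].replace(codes, numpy.nan)
    return out


# Toy data for illustration only (not taken from addhealth_pds.csv)
toy = pandas.DataFrame({
    'H1GI3': [5, 12, 96, 97, 3, 98],
    'H1DA2': [0, 1, 6, 2, 8, 3],
})

clean = recode_missing(toy, 'H1GI3', H1GI3_MISSING)
clean = recode_missing(clean, 'H1DA2', H1DA2_MISSING)

# Keep only the rows where both variables are observed
sub = clean[['H1GI3', 'H1DA2']].dropna()
print(sub)
```

Applied to mydata before the call to smf.ols, the same two recode_missing calls would leave only the substantive hobby groups in the model. The results shown in this post were produced without this step.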

Results

                            OLS Regression Results
==============================================================================
Dep. Variable:                  H1GI3   R-squared:                       0.062
Model:                            OLS   Adj. R-squared:                  0.061
Method:                 Least Squares   F-statistic:                     85.40
Date:                Tue, 14 Apr 2020   Prob (F-statistic):           3.41e-87
Time:                        15:54:38   Log-Likelihood:                -24043.
No. Observations:                6503   AIC:                         4.810e+04
Df Residuals:                    6497   BIC:                         4.814e+04
Df Model:                           5
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|       [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        10.1272      0.260     39.016      0.000       9.618      10.636
C(H1DA2)[T.1]    -0.9474      0.334     -2.838      0.005      -1.602      -0.293
C(H1DA2)[T.2]    -1.2544      0.366     -3.432      0.001      -1.971      -0.538
C(H1DA2)[T.3]    -1.4301      0.363     -3.939      0.000      -2.142      -0.718
C(H1DA2)[T.6]    85.8728      6.909     12.429      0.000      72.329      99.417
C(H1DA2)[T.8]    68.2728      4.374     15.608      0.000      59.698      76.848
==============================================================================
Omnibus:                     7136.772   Durbin-Watson:                   1.897
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           714073.666
Skew:                           5.548   Prob(JB):                         0.00
Kurtosis:                      53.122   Cond. No.                         63.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Multiple Comparison of Means - Tukey HSD, FWER=0.05
======================================================
group1 group2 meandiff  p-adj    lower    upper reject
------------------------------------------------------
     0      1  -0.9474 0.0518   -1.899   0.0043  False
     0      2  -1.2544 0.0079  -2.2964  -0.2124   True
     0      3  -1.4301 0.0012  -2.4651  -0.3951   True
     0      6  85.8728  0.001  66.1784 105.5672   True
     0      8  68.2728  0.001  55.8038  80.7418   True
     1      2   -0.307    0.9  -1.2538   0.6398  False
     1      3  -0.4828 0.6627  -1.4218   0.4563  False
     1      6  86.8202  0.001  67.1306 106.5097   True
     1      8  69.2202  0.001  56.7588  81.6816   True
     2      3  -0.1757    0.9  -1.2063   0.8548  False
     2      6  87.1272  0.001   67.433 106.8213   True
     2      8  69.5272  0.001  57.0585  81.9958   True
     3      6  87.3029  0.001  67.6091 106.9967   True
     3      8  69.7029  0.001  57.2349   82.171   True
     6      8    -17.6 0.2595 -40.8863   5.6863  False
------------------------------------------------------



Analysis 

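Interpretation of the ANOVA

When examining the association between the age at which the adolescent moved to the current residence (quantitative response) and the number of times the adolescent did hobbies during the last week (categorical explanatory variable), both taken from the questionnaire, an ANalysis Of VAriance (ANOVA) revealed that residential continuity is not independent of doing hobbies recently (last week): the mean age at the move differs between the hobby-frequency groups.
This interpretation rests on the calculated F-statistic, 85.40 (with 5 and 6497 degrees of freedom), and on the p-value for this F-statistic, 3.41e-87, which is far below the threshold of 0.05.

Then,
since p is less than .05, I conclude that the individuals with greater stability did hobbies recently more often than the individuals who have spent less time in the current residence and, in particular, have not lived there since birth. The coefficients of groups 1, 2 and 3 are negative relative to group 0 (-0.9474, -1.2544 and -1.4301), so the adolescents who did hobbies moved to their current residence at a younger age on average.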
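The F-statistic used in this interpretation is the ratio of the between-group variance to the within-group variance. A minimal sketch of that computation on made-up toy data (not AddHealth values) may help in reading the OLS summary. The function name one_way_anova_f is hypothetical, and the code only illustrates the standard one-way ANOVA formula, not the internals of statsmodels.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    # Between-group sum of squares: group means against the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations against their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data for illustration only: three groups of three observations
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The degrees of freedom follow the same rule in the OLS summary: 6 groups give Df Model = 5, and 6503 observations give Df Residuals = 6503 - 6 = 6497.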

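Tukey's post hoc test

The post hoc test identifies which pairs of groups are behind the rejection of the null hypothesis: they are the comparisons where the reject column is True.
Two groups have significantly different means when their comparison is rejected by this test. Here that is the case for group 0 against groups 2 and 3, and for every comparison of groups 0 to 3 with groups 6 and 8 (codes that presumably stand for the "refused" and "don’t know" responses). Groups 1, 2 and 3 do not differ significantly from one another, and group 0 against group 1 falls just short of significance (p-adj = 0.0518).
Then,
the many rejection cases confirm the rejection of H0 and support the influence of residential instability on behavior.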
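To make the reading of the reject column easier to check, here is a minimal sketch that separates the rejected comparisons into those between substantive hobby groups and those involving codes 6 or 8. The (group1, group2, reject) values are copied from the Tukey table in the Results section. Treating 6 and 8 as non-substantive responses (refused, don’t know) is an assumption, and the helper name split_rejections is hypothetical.

```python
# (group1, group2, reject) copied from the Tukey HSD table in the Results section
tukey_rows = [
    (0, 1, False), (0, 2, True), (0, 3, True), (0, 6, True), (0, 8, True),
    (1, 2, False), (1, 3, False), (1, 6, True), (1, 8, True),
    (2, 3, False), (2, 6, True), (2, 8, True),
    (3, 6, True), (3, 8, True),
    (6, 8, False),
]

# Assumption: 6 and 8 are non-substantive response codes (refused, don't know)
NON_SUBSTANTIVE = {6, 8}


def split_rejections(rows, non_substantive):
    """Split rejected pairs by whether they involve a non-substantive code."""
    substantive, other = [], []
    for g1, g2, reject in rows:
        if not reject:
            continue
        if g1 in non_substantive or g2 in non_substantive:
            other.append((g1, g2))
        else:
            substantive.append((g1, g2))
    return substantive, other


substantive_pairs, other_pairs = split_rejections(tukey_rows, NON_SUBSTANTIVE)
print("Rejected, substantive groups only:", substantive_pairs)
print("Rejected, involving code 6 or 8:", other_pairs)
```

Under that assumption, the rejections that speak directly to hobby frequency are the ones in the first list.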

