Introduction
As stated at the source link:
The National Longitudinal Study of Adolescent Health (AddHealth) is a representative school-based survey of adolescents in grades 7-12 in the United States. The Wave 1 survey focuses on factors that may influence adolescents’ health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities.
Source: Data Analysis Tools by Wesleyan University
Note: this study was done as part of the Coursera course Data Analysis Tools (Outils d'analyse des données), using Python.
Data
H1GI2 (General Introductory) :
The question was: Think about the house or apartment building in which you lived in January 1990, when you were {AGE IN JANUARY 1990} years old. Do you still live there?
| Frequency | Code | Response |
| 3046 | 0 | no |
| 3447 | 1 | yes |
| 3 | 6 | refused |
| 8 | 8 | don't know |
=> 4 categories
H1DA2 (Daily Activities):
The question was: How many times did adolescents do hobbies in the past week? It is a categorical variable with the responses: not at all, 1 or 2 times, 3 or 4 times, 5 or more times, refused, don't know.
| Frequency | Code | Response |
| 1416 | 0 | not at all |
| 2163 | 1 | 1 or 2 times |
| 1439 | 2 | 3 or 4 times |
| 1479 | 3 | 5 or more times |
| 2 | 6 | refused |
| 5 | 8 | don't know |
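The two frequency tables above can be checked quickly in pandas. The sketch below is only an illustration: rather than reading addhealth_pds.csv, it rebuilds each variable from the frequencies listed above and counts the categories with `value_counts`. The helper name `from_frequencies` is hypothetical and not part of the original analysis.

```python
import numpy
import pandas

def from_frequencies(freq):
    """Rebuild a coded variable from a {code: frequency} mapping (illustrative helper)."""
    codes = list(freq.keys())
    counts = list(freq.values())
    return pandas.Series(numpy.repeat(codes, counts))

# frequencies copied from the two tables above
h1gi2 = from_frequencies({0: 3046, 1: 3447, 6: 3, 8: 8})
h1da2 = from_frequencies({0: 1416, 1: 2163, 2: 1439, 3: 1479, 6: 2, 8: 5})

# frequency table per response code, sorted by code
print(h1gi2.value_counts().sort_index())
print(h1da2.value_counts().sort_index())

# number of distinct response categories for each variable
n_categories_h1gi2 = h1gi2.nunique()
n_categories_h1da2 = h1da2.nunique()
print(n_categories_h1gi2, n_categories_h1da2)
```

Both variables add up to the same number of respondents, which is a useful sanity check before building a contingency table from them.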
Objective:
Study the relationship between the childhood-home question (whether the adolescent still lives in the January 1990 residence, H1GI2) and daily activities such as hobbies (H1DA2).
Code:
# -*- coding: utf-8 -*-
"""
Created on Sat Apr 18 13:11:16 2020
@author: PC HP
"""
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
mydata = pandas.read_csv('addhealth_pds.csv', low_memory=False)
# convert both variables to numeric; values that cannot be parsed become NaN
mydata['H1GI2'] = pandas.to_numeric(mydata['H1GI2'], errors='coerce')
mydata['H1DA2'] = pandas.to_numeric(mydata['H1DA2'], errors='coerce')
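# SETTING MISSING DATA
# NOTE: code 200 does not occur in either variable (see the frequency tables above),
# so these replacements leave the data unchanged; codes 6 and 8 are kept for the subgroup tests
mydata['H1GI2'] = mydata['H1GI2'].replace(200, numpy.nan)
mydata['H1DA2'] = mydata['H1DA2'].replace(200, numpy.nan)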
# contingency table of observed counts
print('observed counts table:')
ct1=pandas.crosstab(mydata['H1GI2'], mydata['H1DA2'])
print (ct1)
# column percentages
colsum=ct1.sum(axis=0)
colpct=ct1/colsum
print('chi percentages table:')
print(colpct)
# chi-square
print ('chi-square value, p value, expected counts')
cs1= scipy.stats.chi2_contingency(ct1)
print (cs1)
# set variable types
mydata["H1DA2"] = mydata["H1DA2"].astype('category')
# new code for setting variables to numeric:
mydata['H1GI2'] = pandas.to_numeric(mydata['H1GI2'], errors='coerce')
# graph
seaborn.catplot(x="H1DA2", y="H1GI2", data=mydata, kind="bar", ci=None)
plt.xlabel('Number of times adolescents did hobbies last week')
plt.ylabel('Proportion still living in childhood residence')
# post hoc comparisons: repeat the chi-square test on each subset of H1DA2 codes
subgroups = [
    {0: 0},
    {0: 0, 1: 1},
    {0: 0, 1: 1, 2: 2},
    {0: 0, 1: 1, 2: 2, 3: 3},
    {0: 0, 1: 1, 2: 2, 3: 3, 6: 6},
    {0: 0, 1: 1, 2: 2, 3: 3, 8: 8},
    {6: 6, 8: 8},
]
for recode in subgroups:
    # keep only the codes listed in recode; every other code becomes NaN
    mydata['subH1DA2'] = mydata['H1DA2'].map(recode)
    # contingency table of observed counts
    ct2 = pandas.crosstab(mydata['H1GI2'], mydata['subH1DA2'])
    print(ct2)
    # column percentages
    colsum = ct2.sum(axis=0)
    colpct = ct2 / colsum
    print(colpct)
    print('chi-square value, p value, expected counts')
    cs2 = scipy.stats.chi2_contingency(ct2)
    print(cs2)
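The crosstab, column-percentage and chi-square steps in the code can also be wrapped in one small function. The sketch below is a minimal illustration under stated assumptions: `chi_square_report` is a hypothetical helper name, and the table is typed in from the observed counts reported in the Results section instead of being read from addhealth_pds.csv. It also counts the cells whose expected count is below 5, a common rule of thumb for when the chi-square approximation becomes less reliable.

```python
import pandas
import scipy.stats

def chi_square_report(observed):
    """Return column percentages and the chi-square test for a table of observed counts."""
    colpct = observed / observed.sum(axis=0)
    chi2, p, dof, expected = scipy.stats.chi2_contingency(observed)
    return colpct, chi2, p, dof, expected

# observed counts from the Results section (rows: H1GI2, columns: H1DA2)
observed = pandas.DataFrame(
    [[703, 1035, 650, 657, 0, 1],
     [709, 1127, 789, 821, 0, 1],
     [1, 0, 0, 0, 2, 0],
     [3, 1, 0, 1, 0, 3]],
    index=pandas.Index([0, 1, 6, 8], name='H1GI2'),
    columns=pandas.Index([0, 1, 2, 3, 6, 8], name='H1DA2'),
)

colpct, chi2, p, dof, expected = chi_square_report(observed)
print(colpct.round(6))
print(chi2, p, dof)

# rule of thumb: the chi-square approximation is less reliable
# when expected counts fall below 5
small_cells = int((expected < 5).sum())
print('cells with expected count < 5:', small_cells)
```

Most of the small expected counts come from the rare "refused" and "don't know" codes, which is worth keeping in mind when reading the very large chi-square values for subgroups that include codes 6 or 8.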
Results:
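According to the figure and the chi-square tests, the responses in S1 = {x = 0..3} are related (p ≈ 0.024), while for S2 = {x = 6, 8} the relationship holds only with low probability (p ≈ 0.072, not significant at the 0.05 level). The two subgroups are not dependent on each other. This is also shown by the following output: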
- S1:
subH1DA2       0.0       1.0       2.0       3.0
H1GI2
0         0.496469  0.478502  0.451703  0.444219
1         0.500706  0.521036  0.548297  0.555105
6         0.000706  0.000000  0.000000  0.000000
8         0.002119  0.000462  0.000000  0.000676
chi-square value, p value, expected counts
(19.160401671234695, 0.023863055784254898, 9, array([[6.63647837e+02, 1.01375019e+03, 6.74427428e+02, 6.93174542e+02],
       [7.51044482e+02, 1.14725227e+03, 7.63243651e+02, 7.84459597e+02],
       [2.17946745e-01, 3.32922887e-01, 2.21486840e-01, 2.27643528e-01],
       [1.08973372e+00, 1.66461444e+00, 1.10743420e+00, 1.13821764e+00]]))
- S2:
subH1DA2  6.0  8.0
H1GI2
0           0    1
1           0    1
6           2    0
8           0    3
subH1DA2  6.0  8.0
H1GI2
0         0.0  0.2
1         0.0  0.2
6         1.0  0.0
8         0.0  0.6
chi-square value, p value, expected counts
(7.0, 0.07189777249646509, 3, array([[0.28571429, 0.71428571],
       [0.28571429, 0.71428571],
       [0.57142857, 1.42857143],
       [0.85714286, 2.14285714]]))
More details are available in the following results (including post hoc analysis):
observed counts table:
H1DA2    0     1    2    3  6  8
H1GI2
0      703  1035  650  657  0  1
1      709  1127  789  821  0  1
6        1     0    0    0  2  0
8        3     1    0    1  0  3
chi percentages table:
H1DA2         0         1         2         3    6    8
H1GI2
0      0.496469  0.478502  0.451703  0.444219  0.0  0.2
1      0.500706  0.521036  0.548297  0.555105  0.0  0.2
6      0.000706  0.000000  0.000000  0.000000  1.0  0.0
8      0.002119  0.000462  0.000000  0.000676  0.0  0.6
chi-square value, p value, expected counts
(5810.662666395812, 0.0, 15, array([[6.63151292e+02, 1.01299170e+03, 6.73922817e+02, 6.92655904e+02, 9.36654367e-01, 2.34163592e+00],
       [7.50453875e+02, 1.14635009e+03, 7.62643450e+02, 7.83842712e+02, 1.05996310e+00, 2.64990775e+00],
       [6.53136531e-01, 9.97693727e-01, 6.63745387e-01, 6.82195572e-01, 9.22509225e-04, 2.30627306e-03],
       [1.74169742e+00, 2.66051661e+00, 1.76998770e+00, 1.81918819e+00, 2.46002460e-03, 6.15006150e-03]]))
subH1DA2  0.0
H1GI2
0         703
1         709
6           1
8           3
subH1DA2       0.0
H1GI2
0         0.496469
1         0.500706
6         0.000706
8         0.002119
chi-square value, p value, expected counts
(0.0, 1.0, 0, array([[703.],
       [709.],
       [  1.],
       [  3.]]))
subH1DA2  0.0   1.0
H1GI2
0         703  1035
1         709  1127
6           1     0
8           3     1
subH1DA2       0.0       1.0
H1GI2
0         0.496469  0.478502
1         0.500706  0.521036
6         0.000706  0.000000
8         0.002119  0.000462
chi-square value, p value, expected counts
(4.886483669772731, 0.18030059764588033, 3, array([[6.87624476e+02, 1.05037552e+03],
       [7.26397318e+02, 1.10960268e+03],
       [3.95641241e-01, 6.04358759e-01],
       [1.58256496e+00, 2.41743504e+00]]))
subH1DA2  0.0   1.0  2.0
H1GI2
0         703  1035  650
1         709  1127  789
6           1     0    0
8           3     1    0
subH1DA2       0.0       1.0       2.0
H1GI2
0         0.496469  0.478502  0.451703
1         0.500706  0.521036  0.548297
6         0.000706  0.000000  0.000000
8         0.002119  0.000462  0.000000
chi-square value, p value, expected counts
(13.279010760752726, 0.03881309634250806, 6, array([[6.73855719e+02, 1.02934316e+03, 6.84801116e+02],
       [7.40733360e+02, 1.13150159e+03, 7.52765046e+02],
       [2.82184137e-01, 4.31048226e-01, 2.86767637e-01],
       [1.12873655e+00, 1.72419291e+00, 1.14707055e+00]]))
subH1DA2  0.0   1.0  2.0  3.0
H1GI2
0         703  1035  650  657
1         709  1127  789  821
6           1     0    0    0
8           3     1    0    1
subH1DA2       0.0       1.0       2.0       3.0
H1GI2
0         0.496469  0.478502  0.451703  0.444219
1         0.500706  0.521036  0.548297  0.555105
6         0.000706  0.000000  0.000000  0.000000
8         0.002119  0.000462  0.000000  0.000676
chi-square value, p value, expected counts
(19.160401671234695, 0.023863055784254898, 9, array([[6.63647837e+02, 1.01375019e+03, 6.74427428e+02, 6.93174542e+02],
       [7.51044482e+02, 1.14725227e+03, 7.63243651e+02, 7.84459597e+02],
       [2.17946745e-01, 3.32922887e-01, 2.21486840e-01, 2.27643528e-01],
       [1.08973372e+00, 1.66461444e+00, 1.10743420e+00, 1.13821764e+00]]))
subH1DA2  0.0   1.0  2.0  3.0  6.0
H1GI2
0         703  1035  650  657    0
1         709  1127  789  821    0
6           1     0    0    0    2
8           3     1    0    1    0
subH1DA2       0.0       1.0       2.0       3.0  6.0
H1GI2
0         0.496469  0.478502  0.451703  0.444219  0.0
1         0.500706  0.521036  0.548297  0.555105  0.0
6         0.000706  0.000000  0.000000  0.000000  1.0
8         0.002119  0.000462  0.000000  0.000676  0.0
chi-square value, p value, expected counts
(4348.773173724676, 0.0, 12, array([[6.63443607e+02, 1.01343822e+03, 6.74219880e+02, 6.92961225e+02, 9.37067241e-01],
       [7.50813356e+02, 1.14689922e+03, 7.63008771e+02, 7.84218187e+02, 1.06047084e+00],
       [6.53639021e-01, 9.98461302e-01, 6.64256039e-01, 6.82720419e-01, 9.23218957e-04],
       [1.08939837e+00, 1.66410217e+00, 1.10709340e+00, 1.13786736e+00, 1.53869826e-03]]))
subH1DA2  0.0   1.0  2.0  3.0  8.0
H1GI2
0         703  1035  650  657    1
1         709  1127  789  821    1
6           1     0    0    0    0
8           3     1    0    1    3
subH1DA2       0.0       1.0       2.0       3.0  8.0
H1GI2
0         0.496469  0.478502  0.451703  0.444219  0.2
1         0.500706  0.521036  0.548297  0.555105  0.2
6         0.000706  0.000000  0.000000  0.000000  0.0
8         0.002119  0.000462  0.000000  0.000676  0.6
chi-square value, p value, expected counts
(1477.2704083643332, 3.0249713664075e-309, 12, array([[6.63355275e+02, 1.01330329e+03, 6.74130114e+02, 6.92868963e+02, 2.34235620e+00],
       [7.50684712e+02, 1.14670271e+03, 7.62878038e+02, 7.84083820e+02, 2.65072285e+00],
       [2.17779145e-01, 3.32666872e-01, 2.21316518e-01, 2.27468471e-01, 7.68994156e-04],
       [1.74223316e+00, 2.66133497e+00, 1.77053214e+00, 1.81974777e+00, 6.15195325e-03]]))
subH1DA2  6.0  8.0
H1GI2
0           0    1
1           0    1
6           2    0
8           0    3
subH1DA2  6.0  8.0
H1GI2
0         0.0  0.2
1         0.0  0.2
6         1.0  0.0
8         0.0  0.6
chi-square value, p value, expected counts
(7.0, 0.07189777249646509, 3, array([[0.28571429, 0.71428571],
       [0.28571429, 0.71428571],
       [0.57142857, 1.42857143],
       [0.85714286, 2.14285714]]))
So we reject H0: thinking about the childhood home and daily activities such as hobbies are not independent (overall chi-square ≈ 5810.66, p ≈ 0.0). Also, according to the post hoc comparisons, the two response subgroups are largely independent of each other: respondents who gave a substantive answer about their recent hobbies (x = 0 → not at all, ..., 3 → 5 or more times) and those who did not (refused, don't know).
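A common alternative for the post hoc step is to compare the hobby categories two at a time and judge each p value against a Bonferroni-adjusted threshold (0.05 divided by the number of comparisons). The sketch below is not the procedure used in this post. It only illustrates the idea on the observed counts for codes 0 to 3, typed in from the results above, and the variable names are illustrative.

```python
import itertools
import pandas
import scipy.stats

# observed counts for the substantive hobby codes (rows: H1GI2, columns: H1DA2)
observed = pandas.DataFrame(
    [[703, 1035, 650, 657],
     [709, 1127, 789, 821],
     [1, 0, 0, 0],
     [3, 1, 0, 1]],
    index=pandas.Index([0, 1, 6, 8], name='H1GI2'),
    columns=pandas.Index([0, 1, 2, 3], name='H1DA2'),
)

# every unordered pair of hobby categories, and the Bonferroni-adjusted threshold
pairs = list(itertools.combinations(observed.columns, 2))
adjusted_alpha = 0.05 / len(pairs)

results = {}
for a, b in pairs:
    sub = observed[[a, b]]
    # drop rows with no observations in either column;
    # chi2_contingency raises an error when an expected count is zero
    sub = sub.loc[sub.sum(axis=1) > 0]
    chi2, p, dof, expected = scipy.stats.chi2_contingency(sub)
    results[(int(a), int(b))] = p
    verdict = 'significant' if p < adjusted_alpha else 'not significant'
    print(a, b, round(chi2, 3), p, verdict)
```

With four categories there are six pairwise comparisons, so each p value is compared with 0.05 / 6 rather than with 0.05.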
