Introduction
As stated at the linked source:
The National Longitudinal Study of Adolescent Health (AddHealth) is a representative school-based survey of adolescents in grades 7-12 in the United States. The Wave 1 survey focuses on factors that may influence adolescents’ health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities.
Source: Data Analysis Tools, Wesleyan University
N.B. This study was carried out as part of the Coursera course Data Analysis Tools, using Python.
Data
- H1GI3 (General Introductory): How old was the adolescent when they moved to their current residence? It is a quantitative value => response: 1, 2, 3 ... years
- H1GH45 (General Health): How many people do you know who have AIDS? It is a quantitative value => response: range 0 to 98 people
- H1DA2 (Daily Activities): How many times did the adolescent do hobbies? It is a categorical value => response: not at all, 1 or 2 times, more, refused, don't know ...
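Survey files of this kind often store non-responses (refused, don't know) as large numeric codes, which would distort a correlation if treated as real values. The sketch below shows one way to turn such codes into missing values for the two quantitative variables. The code values in NON_RESPONSE and the helper recode_missing are assumptions made for illustration, not taken from the codebook, and should be checked against it before use.

```python
import pandas

# ASSUMPTION: placeholder non-response codes, not verified against the
# AddHealth codebook. Replace them with the documented values.
NON_RESPONSE = {
    'H1GI3': [96, 98],
    'H1GH45': [96, 99],
}

def recode_missing(df, codes=NON_RESPONSE):
    """Return a copy of df in which the listed codes are replaced by NaN."""
    out = df.copy()
    for col, values in codes.items():
        # mask() puts NaN wherever the condition is True
        out[col] = out[col].mask(out[col].isin(values))
    return out

# Tiny made-up frame, only to show the effect
toy = pandas.DataFrame({
    'H1GI3': [3, 96, 10],
    'H1GH45': [0, 2, 99],
})
cleaned = recode_missing(toy)
print(cleaned)
```

H1DA2 is left untouched here because its non-response categories are used later to form the moderator groups.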
Objective
We test whether how often adolescents do hobbies (H1DA2) moderates the association between the age at which they moved to their current residence (H1GI3) and the number of people they know who have AIDS (H1GH45).
N.B. Because the explanatory and response variables are both quantitative, we use the Pearson correlation coefficient (r).
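To make the statistic concrete, here is a minimal pure-Python sketch of the Pearson coefficient r: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function pearson_r and the sample values are illustrative only. It returns r alone, with no p-value, and is not a replacement for scipy.stats.pearsonr.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # sum of cross-deviations (numerator)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # sums of squared deviations (denominator)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # a constant sequence gives a zero denominator and raises ZeroDivisionError
    return cross / math.sqrt(ss_x * ss_y)

# Made-up values, not taken from the AddHealth data
ages = [2, 5, 7, 10, 12]
known = [0, 1, 1, 2, 3]
print(round(pearson_r(ages, known), 3))
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 a weak one.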
Python code
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 20 00:30:09 2020
@author: ABID Aymen
"""
# CORRELATION
import pandas
import numpy
import scipy.stats
import seaborn
import matplotlib.pyplot as plt

mydata = pandas.read_csv('addhealth_pds.csv', low_memory=False)

# convert the three variables to numeric; blank or non-numeric entries become NaN
mydata['H1GI3'] = pandas.to_numeric(mydata['H1GI3'], errors='coerce')
mydata['H1GH45'] = pandas.to_numeric(mydata['H1GH45'], errors='coerce')
mydata['H1DA2'] = pandas.to_numeric(mydata['H1DA2'], errors='coerce')

# data cleaning: drop rows with missing values
# (blank entries were already coerced to NaN above; copy() avoids SettingWithCopyWarning)
data_clean = mydata.dropna().copy()

#%%
# pearson information (r,p) on the whole cleaned sample
print('Association between residence stability and AIDS people knowledge (r,p)')
r, p = scipy.stats.pearsonr(data_clean['H1GI3'], data_clean['H1GH45'])
print([r, p])

# moderator grouping on hobby frequency (H1DA2)
# (the name incomegrp comes from the course template; it does not refer to income)
def incomegrp(row):
    if row['H1DA2'] <= 3:
        return 1  # an exact frequency was given
    elif row['H1DA2'] > 6:
        return 2  # no exact answer
    return numpy.nan  # any other code belongs to neither subgroup

data_clean['incomegrp'] = data_clean.apply(incomegrp, axis=1)
chk1 = data_clean['incomegrp'].value_counts(sort=False, dropna=False)
print(chk1)

sub1 = data_clean[data_clean['incomegrp'] == 1]
sub2 = data_clean[data_clean['incomegrp'] == 2]

# pearson information (r,p) per subgroup (sub1, sub2)
print('Association between residence stability and AIDS people knowledge for exact answer')
print(scipy.stats.pearsonr(sub1['H1GI3'], sub1['H1GH45']))
print(' ')
print('Association between residence stability and AIDS people knowledge for not exact answer')
print(scipy.stats.pearsonr(sub2['H1GI3'], sub2['H1GH45']))

#%%
# graphics: scatterplot with fitted regression line for the "exact answer" subgroup
scat1 = seaborn.regplot(x="H1GI3", y="H1GH45", data=sub1)
plt.xlabel('Age at move to current residence (years)')
plt.ylabel('Number of people known who have AIDS')
plt.title('Scatterplot for the Association Between Age at Move and Number of People Known With AIDS (exact answer)')
plt.show()

#%%
# scatterplot without regression line for the "not exact answer" subgroup
scat2 = seaborn.regplot(x="H1GI3", y="H1GH45", fit_reg=False, data=sub2)
plt.xlabel('Age at move to current residence (years)')
plt.ylabel('Number of people known who have AIDS')
plt.title('Scatterplot for the Association Between Age at Move and Number of People Known With AIDS (not exact answer)')
plt.show()
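The per-subgroup comparison can also be written as a loop over the levels of the moderator, which avoids building each subset by hand. The sketch below uses a small made-up data frame and a hypothetical helper r_by_group. It relies on pandas' Series.corr, whose default method is Pearson, and it returns r only, with no p-value.

```python
import pandas

# Made-up data: the x-y relationship is positive in group 1 and negative in group 2
toy = pandas.DataFrame({
    'group': [1, 1, 1, 1, 2, 2, 2, 2],
    'x': [1, 2, 3, 4, 1, 2, 3, 4],
    'y': [2, 4, 6, 8, 8, 6, 4, 2],
})

def r_by_group(df, group_col, x_col, y_col):
    """Return {group level: Pearson r between x_col and y_col within that level}."""
    return {
        level: sub[x_col].corr(sub[y_col])  # Series.corr defaults to Pearson
        for level, sub in df.groupby(group_col)
    }

result = r_by_group(toy, 'group', 'x', 'y')
print(result)
```

When the correlations differ clearly in size or sign between levels, as they do in this toy example, that pattern is what a moderation effect looks like. By default, groupby leaves out rows whose group value is missing.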
Results
Association between residence stability and AIDS people knowledge (r,p)
[0.15162579814828608, 9.506977908603123e-35]
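One way to read this coefficient is to square it: r squared estimates the share of variance in one variable that is linearly associated with the other. The short sketch below applies that to the value printed above. The variable names are illustrative.

```python
# r as reported in the output above
r = 0.15162579814828608

# coefficient of determination for a simple linear relationship
r_squared = r ** 2
print(f"r^2 = {r_squared:.4f}")
```

This comes to about 2.3% of the variance. The association is statistically significant (p is far below 0.05) but weak.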
