Introduction
As described by the source:
The National Longitudinal Study of Adolescent Health (AddHealth) is a representative school-based survey of adolescents in grades 7-12 in the United States. The Wave 1 survey focuses on factors that may influence adolescents’ health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities.
Source: Data Analysis Tools by Wesleyan University
NB. This study is part of the Coursera course Data Analysis Tools (Outils d'analyse des données) and uses Python.
Data
H1GI3 (General Introductory)
How old was the adolescent when they moved to their current residence?
It's a quantitative value => response: 1, 2, 3 ... years
H1GH45 (General Health)
How many people do you know who have AIDS?
It's a quantitative value => response: range 0 to 98 people
Objective:
Study the correlation between the age at which adolescents moved to their current residence (H1GI3, used here as a measure of residential stability) and the number of people they know who have AIDS (H1GH45).
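To make the objective concrete, here is a minimal sketch of what the Pearson coefficient r measures and how r squared is read as the fraction of shared variability. The helper name pearson_r and the toy values are assumptions made for illustration only; they are not Add Health data.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up toy values (hypothetical, NOT taken from the survey)
age_moved = [1, 3, 5, 8, 12, 15]
known = [0, 1, 0, 2, 1, 3]

r_manual = pearson_r(age_moved, known)
r_scipy, p_value = stats.pearsonr(age_moved, known)

# r squared is the fraction of variability in one variable
# that is accounted for by its linear relation with the other
r_squared = r_manual ** 2
print(r_manual, r_scipy, p_value, r_squared)
```

The hand-written version should agree with scipy.stats.pearsonr up to floating-point error. Note that r can be negative, so the fraction of variability is obtained by squaring r, never by taking its square root.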
Python code:
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 19 17:51:51 2020
@author: Aymen ABID
"""
import pandas
import numpy
import seaborn
import scipy.stats  # the stats submodule must be imported explicitly
import matplotlib.pyplot as plt
import math
mydata = pandas.read_csv('addhealth_pds.csv', low_memory=False)
# Convert both variables to numeric; blanks and other non-numeric entries become NaN
mydata['H1GI3'] = pandas.to_numeric(mydata['H1GI3'], errors='coerce')
mydata['H1GH45'] = pandas.to_numeric(mydata['H1GH45'], errors='coerce')
# Drop only the rows where one of the two studied variables is missing
data_clean = mydata.dropna(subset=['H1GI3', 'H1GH45'])
print('Association between residence stability and number of people known with AIDS (r, p)')
r, p = scipy.stats.pearsonr(data_clean['H1GI3'], data_clean['H1GH45'])
print(r, p)
# The fraction of variability explained is r squared, not the square root of r
print('Fraction of the variability explained (r squared)')
print(r ** 2)
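# Scatter plot with a fitted regression line, using the cleaned data
scat1 = seaborn.regplot(x="H1GI3", y="H1GH45", fit_reg=True, data=data_clean)
plt.xlabel('Age when moved to current residence (years)')
plt.ylabel('Number of people known who have AIDS')
plt.title('Association between age at move to current residence and number of people known with AIDS')
plt.show()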
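Survey files often store answers such as "refused" or "don't know" as high numeric codes, which would distort a correlation if left in as real values. The sketch below shows one possible way to turn such codes into missing values before computing r. The helper drop_coded_missing, the toy rows, and the code list 96 to 99 are all hypothetical; the actual codes for H1GI3 and H1GH45 should be checked against the Add Health codebook.

```python
import pandas as pd


def drop_coded_missing(df, columns, missing_codes):
    """Return df[columns] as numeric data, without rows that are blank,
    non-numeric, or equal to one of the given missing-value codes."""
    subset = df[columns].apply(pd.to_numeric, errors="coerce")
    subset = subset.mask(subset.isin(missing_codes))  # coded answers -> NaN
    return subset.dropna()


# Toy rows (hypothetical, NOT taken from the survey)
toy = pd.DataFrame({
    "H1GI3": ["5", " ", "12", "96", "3"],
    "H1GH45": ["0", "1", "98", "2", "1"],
})

# Placeholder codes: an assumption to verify in the codebook
HYPOTHETICAL_CODES = [96, 97, 98, 99]

clean = drop_coded_missing(toy, ["H1GI3", "H1GH45"], HYPOTHETICAL_CODES)
print(clean)
```

Restricting the cleaning to the two studied columns keeps every respondent who answered both questions, regardless of missing answers elsewhere in the file.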
