Sunday, April 19, 2020

Correlation between current residence stability and knowing people with AIDS

Introduction

As described in the course materials, the National Longitudinal Study of Adolescent Health (AddHealth) is a representative school-based survey of adolescents in grades 7-12 in the United States. The Wave 1 survey focuses on factors that may influence adolescents’ health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities.
Source: Data Analysis Tools, Wesleyan University
Note: this study was carried out in Python as part of the Coursera course Data Analysis Tools.


Data


  • H1GI3 (General Introductory): How old was the adolescent when he or she moved to the current residence?
    Quantitative variable; response in years (1, 2, 3, ...).
  • H1GH45 (General Health): How many people do you know who have AIDS?
    Quantitative variable; response from 0 to 98 people.
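Before correlating, it can help to check that both variables stay within the ranges described above. The following is a minimal sketch, assuming that any value outside those ranges (for example a reserved code for "refused" or "don't know"; the real codes must be taken from the AddHealth codebook) should be treated as missing. The helper `set_out_of_range_to_nan`, the assumed upper bound of 21 years for H1GI3, and the toy values are illustrative assumptions, not part of the original analysis; the 0 to 98 bound for H1GH45 is the range stated above.

```python
import pandas as pd


def set_out_of_range_to_nan(series, low, high):
    """Hypothetical helper: keep values in [low, high] and turn anything else into NaN.

    The bounds are assumptions; check the AddHealth codebook for the real
    valid ranges and nonresponse codes of each variable.
    """
    numeric = pd.to_numeric(series, errors='coerce')  # blanks and text become NaN
    return numeric.where(numeric.between(low, high))  # out-of-range values become NaN


# Toy data standing in for the real file (illustrative values only)
toy = pd.DataFrame({
    'H1GI3': ['3', ' ', '12', '96'],   # '96' plays the role of an assumed reserved code
    'H1GH45': ['0', '2', '998', '1'],  # '998' plays the role of an assumed reserved code
})
toy['H1GI3'] = set_out_of_range_to_nan(toy['H1GI3'], 0, 21)     # assumed age bound
toy['H1GH45'] = set_out_of_range_to_nan(toy['H1GH45'], 0, 98)   # range from the description
print(toy)
```

Rows that become NaN here are then dropped by the same `dropna()` step used before the correlation.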

Objective:

Study of the correlation between residence stability, measured by the adolescent's age when moving to the current residence (H1GI3), and the number of people with AIDS the adolescent knows (H1GH45).
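The correlation used here is Pearson's r: the covariance of the two variables divided by the product of their standard deviations. Its square, r squared, is the fraction of the variability in one variable accounted for by a linear relationship with the other. The sketch below computes both from that definition on made-up numbers (not AddHealth data) and compares the result with `scipy.stats.pearsonr`; the helper name `pearson_r` is hypothetical.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation from its definition: the sum of products of deviations
    divided by the square root of the product of the summed squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Made-up values for illustration only
age_moved = [1, 4, 6, 9, 12, 14]
people_known = [0, 0, 1, 0, 2, 3]

r_manual = pearson_r(age_moved, people_known)
r_scipy, p_scipy = stats.pearsonr(age_moved, people_known)

print(round(r_manual, 6), round(float(r_scipy), 6))
print('r squared:', round(r_manual ** 2, 4))
```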

Python code:

# -*- coding: utf-8 -*-
"""
Created on Sun Apr 19 17:51:51 2020

@author: Aymen ABID
"""
import pandas
import seaborn
import scipy.stats
import matplotlib.pyplot as plt

mydata = pandas.read_csv('addhealth_pds.csv', low_memory=False)

# Convert both variables to numbers; blanks and other non-numeric entries become NaN
mydata['H1GI3'] = pandas.to_numeric(mydata['H1GI3'], errors='coerce')
mydata['H1GH45'] = pandas.to_numeric(mydata['H1GH45'], errors='coerce')

# Keep only rows where both variables are present, so r is computed on complete pairs
data_clean = mydata[['H1GI3', 'H1GH45']].dropna()

print('Association between residence stability and number of people with AIDS known (r, p)')
r, p = scipy.stats.pearsonr(data_clean['H1GI3'], data_clean['H1GH45'])
print([float(r), float(p)])

# The fraction of the variability explained is r squared (not the square root of r)
print('Fraction of the variability (r squared)')
print(round(float(r) ** 2, 4))

# Scatter plot of the cleaned data with the fitted regression line
seaborn.regplot(x='H1GI3', y='H1GH45', fit_reg=True, data=data_clean)
plt.xlabel('Age when moved to current residence (years)')
plt.ylabel('Number of people with AIDS known')
plt.title('Association between residence stability and number of people with AIDS known')
plt.show()


Results:

Association between residence stability and number of people with AIDS known (r, p)
[0.15162579814828608, 9.506977908603123e-35]
Fraction of the variability (r squared)
0.023
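The p-value returned by `scipy.stats.pearsonr` tests whether the population correlation is zero. For the default two-sided test it is equivalent to a t test with n - 2 degrees of freedom, where t = r * sqrt(n - 2) / sqrt(1 - r**2); this is why a weak correlation such as r ≈ 0.15 (about 2.3% of the variability) can still be highly significant in a large sample. The sketch below uses simulated data, not the AddHealth file, to recompute the p-value from r and n and check it against `pearsonr`. The helper name `p_value_from_r` and the simulation settings are hypothetical.

```python
import numpy as np
from scipy import stats


def p_value_from_r(r, n):
    """Hypothetical helper: two-sided p-value for H0 'population correlation = 0',
    using t = r * sqrt(n - 2) / sqrt(1 - r**2) with n - 2 degrees of freedom."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)


# Simulated data with a weak positive association (not AddHealth data)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.15 * x + rng.normal(size=500)

r, p = stats.pearsonr(x, y)
print('r =', round(float(r), 3), 'p from pearsonr =', float(p))
print('p from t formula =', float(p_value_from_r(r, len(x))))
```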



