Problem Statement
- What is the demographic distribution of the surveyed individuals?
- What percentage of the surveyed population had health
insurance?
- What is the distribution of time since the last hospital visit?
- Do the patients have routine health checks, and if so, how often?
- Have the individuals had any cancer screening?
- How do factors such as health insurance, hospital visits, and cancer
screening change over time?
- Investigate the relationship between monthly household income and
access to healthcare, insurance coverage, and health outcomes.
- Examine correlations between variables to identify potential associations or
dependencies.
- Find whether there are disparities in healthcare access or outcomes
based on gender, age, income, or other demographic factors.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## corrplot 0.92 loaded
## # A tibble: 6 × 32
## Location `_Location_latitude` `_Location_longitude` `_Location_altitude`
## <chr> <dbl> <dbl> <dbl>
## 1 -0.2742007 36… -0.274 36.1 1882.
## 2 -0.7158125 37… -0.716 37.1 1362.
## 3 -0.7158157 37… -0.716 37.1 1362.
## 4 -0.7157082 37… -0.716 37.1 1362.
## 5 -0.7157337 37… -0.716 37.1 1362.
## 6 -0.7158041 37… -0.716 37.1 1362.
## # ℹ 28 more variables: `_Location_precision` <dbl>, `Date and Time` <dttm>,
## # Age <chr>, Gender <chr>, `Marital Status` <chr>,
## # `How many children do you have, if any?` <dbl>, `Employment Status` <chr>,
## # `Monthly Household Income` <chr>,
## # `Have you ever had health insurance?` <chr>,
## # `If yes, which insurance cover?` <chr>,
## # `When was the last time you visited a hospital for medical treatment? (In Months)` <dbl>, …
str(Healthcare)
## tibble [6,158 × 32] (S3: tbl_df/tbl/data.frame)
## $ Location : chr [1:6158] "-0.2742007 36.058336 1882.2000732421875 20.0" "-0.7158125 37.1475058 1361.9000244140625 20.0" "-0.7158157 37.1475082 1361.9000244140625 20.0" "-0.7157082 37.14749 1361.9000244140625 20.0" ...
## $ _Location_latitude : num [1:6158] -0.274 -0.716 -0.716 -0.716 -0.716 ...
## $ _Location_longitude : num [1:6158] 36.1 37.1 37.1 37.1 37.1 ...
## $ _Location_altitude : num [1:6158] 1882 1362 1362 1362 1362 ...
## $ _Location_precision : num [1:6158] 20 20 20 20 20 ...
## $ Date and Time : POSIXct[1:6158], format: NA "2023-05-15 13:38:00" ...
## $ Age : chr [1:6158] "41-50" "18-30" "41-50" "18-30" ...
## $ Gender : chr [1:6158] "Female" "Male" "Female" "Male" ...
## $ Marital Status : chr [1:6158] "Married" "Single" "Married" "Single" ...
## $ How many children do you have, if any? : num [1:6158] 2 0 5 NA 7 NA 2 NA 3 2 ...
## $ Employment Status : chr [1:6158] "Self-employed" "Unemployed" "Self-employed" "Self-employed" ...
## $ Monthly Household Income : chr [1:6158] "20001-30000" "Less than 10000" "20001-30000" "10001-20000" ...
## $ Have you ever had health insurance? : chr [1:6158] "Yes" "No" "No" "Yes" ...
## $ If yes, which insurance cover? : chr [1:6158] "Nhif" NA "Nhif" "Nhif" ...
## $ When was the last time you visited a hospital for medical treatment? (In Months) : num [1:6158] 53 8 6 16 13 2 4 24 5 6 ...
## $ Did you have health insurance during your last hospital visit? : chr [1:6158] "No" "No" "Yes" "Yes" ...
## $ Have you ever had a routine check-up with a doctor or healthcare provider? : chr [1:6158] "Yes" "Yes" "No" "No" ...
## $ If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?: chr [1:6158] "2" "1" NA NA ...
## $ Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)? : chr [1:6158] "No" "No" "Yes" "No" ...
## $ If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?: chr [1:6158] "2" NA "4+" NA ...
## $ Your Picture : logi [1:6158] NA NA NA NA NA NA ...
## $ Your Picture_URL : logi [1:6158] NA NA NA NA NA NA ...
## $ _id : num [1:6158] 2.30e+08 2.38e+08 2.38e+08 2.38e+08 2.38e+08 ...
## $ _uuid : chr [1:6158] "aa30304f-84f2-4c1b-b30a-371241f2ff17" "63c461e3-b3ef-47cf-9632-0c912a639f46" "4209a55d-a983-433f-8ce0-bce6cd28d713" "2eba9b13-1706-4faf-b7a7-e45e9dcf48ab" ...
## $ _submission_time : POSIXct[1:6158], format: "2023-04-05 08:44:06" "2023-05-15 10:44:01" ...
## $ _validation_status : logi [1:6158] NA NA NA NA NA NA ...
## $ _notes : logi [1:6158] NA NA NA NA NA NA ...
## $ _status : chr [1:6158] "submitted_via_web" "submitted_via_web" "submitted_via_web" "submitted_via_web" ...
## $ _submitted_by : chr [1:6158] NA "safra_data" "safra_data" "safra_data" ...
## $ __version__ : chr [1:6158] "vJ8gEKnN2pccxThc5jnkz4" "vMrCPR7NLZZJrf4PTsQ8uH" "vMrCPR7NLZZJrf4PTsQ8uH" "vMrCPR7NLZZJrf4PTsQ8uH" ...
## $ _tags : logi [1:6158] NA NA NA NA NA NA ...
## $ _index : num [1:6158] 1 2 3 4 5 6 7 8 9 10 ...
summary(Healthcare)
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6158 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2593 1st Qu.:36.38 1st Qu.:1348.9
## Mode :character Median :-0.7264 Median :36.87 Median :1592.9
## Mean :-0.7378 Mean :36.72 Mean :1536.4
## 3rd Qu.:-0.3781 3rd Qu.:37.15 3rd Qu.:1857.6
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :353 NA's :353 NA's :353
## _Location_precision Date and Time Age
## Min. : 0.000 Min. :2023-05-15 08:35:00.00 Length:6158
## 1st Qu.: 4.100 1st Qu.:2023-06-16 10:18:30.00 Class :character
## Median : 4.820 Median :2023-06-23 10:22:00.00 Mode :character
## Mean : 71.217 Mean :2023-06-23 12:19:03.48
## 3rd Qu.: 7.196 3rd Qu.:2023-06-30 15:33:30.00
## Max. :4900.000 Max. :2023-07-27 12:00:00.00
## NA's :353 NA's :148
## Gender Marital Status How many children do you have, if any?
## Length:6158 Length:6158 Min. : 0.0
## Class :character Class :character 1st Qu.: 1.0
## Mode :character Mode :character Median : 2.0
## Mean : 147.2
## 3rd Qu.: 3.0
## Max. :800159.0
## NA's :625
## Employment Status Monthly Household Income
## Length:6158 Length:6158
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## Have you ever had health insurance? If yes, which insurance cover?
## Length:6158 Length:6158
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## When was the last time you visited a hospital for medical treatment? (In Months)
## Min. : 0.000
## 1st Qu.: 2.000
## Median : 4.000
## Mean : 6.652
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :158
## Did you have health insurance during your last hospital visit?
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## Have you ever had a routine check-up with a doctor or healthcare provider?
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## Your Picture Your Picture_URL _id _uuid
## Mode:logical Mode:logical Min. :230162389 Length:6158
## NA's:6158 NA's:6158 1st Qu.:246958185 Class :character
## Median :248624898 Mode :character
## Mean :248260300
## 3rd Qu.:249981696
## Max. :258479375
##
## _submission_time _validation_status _notes
## Min. :2023-04-05 08:44:06.00 Mode:logical Mode:logical
## 1st Qu.:2023-06-19 10:27:25.75 NA's:6158 NA's:6158
## Median :2023-06-26 09:56:12.50
## Mean :2023-06-25 04:21:26.05
## 3rd Qu.:2023-07-03 13:40:47.00
## Max. :2023-08-07 09:12:14.00
##
## _status _submitted_by __version__ _tags
## Length:6158 Length:6158 Length:6158 Mode:logical
## Class :character Class :character Class :character NA's:6158
## Mode :character Mode :character Mode :character
##
##
##
##
## _index
## Min. : 1
## 1st Qu.:1540
## Median :3080
## Mean :3080
## 3rd Qu.:4619
## Max. :6158
##
dim(Healthcare)
## [1] 6158 32
colnames(Healthcare)
## [1] "Location"
## [2] "_Location_latitude"
## [3] "_Location_longitude"
## [4] "_Location_altitude"
## [5] "_Location_precision"
## [6] "Date and Time"
## [7] "Age"
## [8] "Gender"
## [9] "Marital Status"
## [10] "How many children do you have, if any?"
## [11] "Employment Status"
## [12] "Monthly Household Income"
## [13] "Have you ever had health insurance?"
## [14] "If yes, which insurance cover?"
## [15] "When was the last time you visited a hospital for medical treatment? (In Months)"
## [16] "Did you have health insurance during your last hospital visit?"
## [17] "Have you ever had a routine check-up with a doctor or healthcare provider?"
## [18] "If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?"
## [19] "Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?"
## [20] "If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?"
## [21] "Your Picture"
## [22] "Your Picture_URL"
## [23] "_id"
## [24] "_uuid"
## [25] "_submission_time"
## [26] "_validation_status"
## [27] "_notes"
## [28] "_status"
## [29] "_submitted_by"
## [30] "__version__"
## [31] "_tags"
## [32] "_index"
tail(Healthcare)
## # A tibble: 6 × 32
## Location `_Location_latitude` `_Location_longitude` `_Location_altitude`
## <chr> <dbl> <dbl> <dbl>
## 1 -1.2689389 36… -1.27 36.9 1618.
## 2 -1.2693104 36… -1.27 36.9 1618
## 3 -1.2705219 36… -1.27 36.9 1595.
## 4 -1.2718084 36… -1.27 36.9 1595.
## 5 -1.2730717 36… -1.27 36.9 1595.
## 6 -1.2739374 36… -1.27 36.9 1595.
## # ℹ 28 more variables: `_Location_precision` <dbl>, `Date and Time` <dttm>,
## # Age <chr>, Gender <chr>, `Marital Status` <chr>,
## # `How many children do you have, if any?` <dbl>, `Employment Status` <chr>,
## # `Monthly Household Income` <chr>,
## # `Have you ever had health insurance?` <chr>,
## # `If yes, which insurance cover?` <chr>,
## # `When was the last time you visited a hospital for medical treatment? (In Months)` <dbl>, …
Rename columns
Healthcare <- Healthcare %>%
  rename(
    Children = 'How many children do you have, if any?',
    Insured = 'Have you ever had health insurance?',
    InsuranceName = 'If yes, which insurance cover?',
    Last_hospital_visit_months = 'When was the last time you visited a hospital for medical treatment? (In Months)',
    Insured_last_visit = 'Did you have health insurance during your last hospital visit?',
    routine_checkup = 'Have you ever had a routine check-up with a doctor or healthcare provider?',
    last_check_up_years = 'If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?',
    Cancerscreening = 'Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?',
    Interval_of_screening = 'If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?'
  )
summary(Healthcare)
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6158 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2593 1st Qu.:36.38 1st Qu.:1348.9
## Mode :character Median :-0.7264 Median :36.87 Median :1592.9
## Mean :-0.7378 Mean :36.72 Mean :1536.4
## 3rd Qu.:-0.3781 3rd Qu.:37.15 3rd Qu.:1857.6
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :353 NA's :353 NA's :353
## _Location_precision Date and Time Age
## Min. : 0.000 Min. :2023-05-15 08:35:00.00 Length:6158
## 1st Qu.: 4.100 1st Qu.:2023-06-16 10:18:30.00 Class :character
## Median : 4.820 Median :2023-06-23 10:22:00.00 Mode :character
## Mean : 71.217 Mean :2023-06-23 12:19:03.48
## 3rd Qu.: 7.196 3rd Qu.:2023-06-30 15:33:30.00
## Max. :4900.000 Max. :2023-07-27 12:00:00.00
## NA's :353 NA's :148
## Gender Marital Status Children Employment Status
## Length:6158 Length:6158 Min. : 0.0 Length:6158
## Class :character Class :character 1st Qu.: 1.0 Class :character
## Mode :character Mode :character Median : 2.0 Mode :character
## Mean : 147.2
## 3rd Qu.: 3.0
## Max. :800159.0
## NA's :625
## Monthly Household Income Insured InsuranceName
## Length:6158 Length:6158 Length:6158
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Last_hospital_visit_months Insured_last_visit routine_checkup
## Min. : 0.000 Length:6158 Length:6158
## 1st Qu.: 2.000 Class :character Class :character
## Median : 4.000 Mode :character Mode :character
## Mean : 6.652
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :158
## last_check_up_years Cancerscreening Interval_of_screening Your Picture
## Length:6158 Length:6158 Length:6158 Mode:logical
## Class :character Class :character Class :character NA's:6158
## Mode :character Mode :character Mode :character
##
##
##
##
## Your Picture_URL _id _uuid
## Mode:logical Min. :230162389 Length:6158
## NA's:6158 1st Qu.:246958185 Class :character
## Median :248624898 Mode :character
## Mean :248260300
## 3rd Qu.:249981696
## Max. :258479375
##
## _submission_time _validation_status _notes
## Min. :2023-04-05 08:44:06.00 Mode:logical Mode:logical
## 1st Qu.:2023-06-19 10:27:25.75 NA's:6158 NA's:6158
## Median :2023-06-26 09:56:12.50
## Mean :2023-06-25 04:21:26.05
## 3rd Qu.:2023-07-03 13:40:47.00
## Max. :2023-08-07 09:12:14.00
##
## _status _submitted_by __version__ _tags
## Length:6158 Length:6158 Length:6158 Mode:logical
## Class :character Class :character Class :character NA's:6158
## Mode :character Mode :character Mode :character
##
##
##
##
## _index
## Min. : 1
## 1st Qu.:1540
## Median :3080
## Mean :3080
## 3rd Qu.:4619
## Max. :6158
##
Identify missing values
Healthcare %>% map(~sum(is.na(.)))
## $Location
## [1] 353
##
## $`_Location_latitude`
## [1] 353
##
## $`_Location_longitude`
## [1] 353
##
## $`_Location_altitude`
## [1] 353
##
## $`_Location_precision`
## [1] 353
##
## $`Date and Time`
## [1] 148
##
## $Age
## [1] 18
##
## $Gender
## [1] 17
##
## $`Marital Status`
## [1] 18
##
## $Children
## [1] 625
##
## $`Employment Status`
## [1] 24
##
## $`Monthly Household Income`
## [1] 259
##
## $Insured
## [1] 19
##
## $InsuranceName
## [1] 2519
##
## $Last_hospital_visit_months
## [1] 158
##
## $Insured_last_visit
## [1] 56
##
## $routine_checkup
## [1] 23
##
## $last_check_up_years
## [1] 4382
##
## $Cancerscreening
## [1] 31
##
## $Interval_of_screening
## [1] 4593
##
## $`Your Picture`
## [1] 6158
##
## $`Your Picture_URL`
## [1] 6158
##
## $`_id`
## [1] 0
##
## $`_uuid`
## [1] 0
##
## $`_submission_time`
## [1] 0
##
## $`_validation_status`
## [1] 6158
##
## $`_notes`
## [1] 6158
##
## $`_status`
## [1] 0
##
## $`_submitted_by`
## [1] 1
##
## $`__version__`
## [1] 0
##
## $`_tags`
## [1] 6158
##
## $`_index`
## [1] 0
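The per-column list above is long to scan. A more compact view is a single table of missing counts and percentages, sorted so the most incomplete columns come first. The sketch below uses base R only and a small invented data frame (`toy`) in place of `Healthcare`; the column values are made up for illustration.

```r
# Toy stand-in for the survey data; values are invented for illustration.
toy <- data.frame(
  Age     = c("18-30", NA, "41-50", "31-40"),
  Gender  = c("Male", "Female", NA, NA),
  Insured = c("Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Count NAs per column and express them as a share of all rows.
na_count <- colSums(is.na(toy))
na_share <- round(100 * na_count / nrow(toy), 1)

na_table <- data.frame(
  column  = names(na_count),
  missing = as.integer(na_count),
  percent = as.numeric(na_share),
  stringsAsFactors = FALSE
)

# Sort so the most incomplete columns come first.
na_table <- na_table[order(-na_table$missing), ]
print(na_table)
```

Applied to the real data, the same call would be `colSums(is.na(Healthcare))`. Columns that are entirely missing (for example `_tags` and `_notes` above) would then appear at the top of the table.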
sum(duplicated(Healthcare))
## [1] 0
Healthcare <- Healthcare %>%
  select(-c(`_tags`, `__version__`, `_submitted_by`, `_status`, `Date and Time`,
            `_notes`, `_validation_status`, `_submission_time`, `Your Picture_URL`,
            `_id`, `Your Picture`, `_uuid`, `_index`))
summary(Healthcare)
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6158 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2593 1st Qu.:36.38 1st Qu.:1348.9
## Mode :character Median :-0.7264 Median :36.87 Median :1592.9
## Mean :-0.7378 Mean :36.72 Mean :1536.4
## 3rd Qu.:-0.3781 3rd Qu.:37.15 3rd Qu.:1857.6
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :353 NA's :353 NA's :353
## _Location_precision Age Gender Marital Status
## Min. : 0.000 Length:6158 Length:6158 Length:6158
## 1st Qu.: 4.100 Class :character Class :character Class :character
## Median : 4.820 Mode :character Mode :character Mode :character
## Mean : 71.217
## 3rd Qu.: 7.196
## Max. :4900.000
## NA's :353
## Children Employment Status Monthly Household Income
## Min. : 0.0 Length:6158 Length:6158
## 1st Qu.: 1.0 Class :character Class :character
## Median : 2.0 Mode :character Mode :character
## Mean : 147.2
## 3rd Qu.: 3.0
## Max. :800159.0
## NA's :625
## Insured InsuranceName Last_hospital_visit_months
## Length:6158 Length:6158 Min. : 0.000
## Class :character Class :character 1st Qu.: 2.000
## Mode :character Mode :character Median : 4.000
## Mean : 6.652
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :158
## Insured_last_visit routine_checkup last_check_up_years Cancerscreening
## Length:6158 Length:6158 Length:6158 Length:6158
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Interval_of_screening
## Length:6158
## Class :character
## Mode :character
##
##
##
##
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6158 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2593 1st Qu.:36.38 1st Qu.:1348.9
## Mode :character Median :-0.7264 Median :36.87 Median :1592.9
## Mean :-0.7378 Mean :36.72 Mean :1536.4
## 3rd Qu.:-0.3781 3rd Qu.:37.15 3rd Qu.:1857.6
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :353 NA's :353 NA's :353
## _Location_precision Age Gender Marital Status
## Min. : 0.000 18-30:2183 Female:3034 Divorced: 461
## 1st Qu.: 4.100 31-40:1777 Male :3107 Married :3654
## Median : 4.820 41-50:1167 NA's : 17 Single :2025
## Mean : 71.217 51-60: 618 NA's : 18
## 3rd Qu.: 7.196 60+ : 395
## Max. :4900.000 NA's : 18
## NA's :353
## Children Employment Status Monthly Household Income
## Min. : 0.0 Employed :2177 10001-20000 :1433
## 1st Qu.: 1.0 Self-employed:1836 20001-30000 :1260
## Median : 2.0 Unemployed :2121 30001-40000 : 122
## Mean : 147.2 NA's : 24 40001-50000 : 719
## 3rd Qu.: 3.0 50001+ : 528
## Max. :800159.0 Less than 10000:1837
## NA's :625 NA's : 259
## Insured InsuranceName Last_hospital_visit_months
## Length:6158 Length:6158 Min. : 0.000
## Class :character Class :character 1st Qu.: 2.000
## Mode :character Mode :character Median : 4.000
## Mean : 6.652
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :158
## Insured_last_visit routine_checkup last_check_up_years Cancerscreening
## No :3264 No :4342 Min. :1.000 Mode:logical
## Yes :2838 Yes :1793 1st Qu.:1.000 NA's:6158
## NA's: 56 NA's: 23 Median :1.000
## Mean :1.641
## 3rd Qu.:2.000
## Max. :3.000
## NA's :4636
## Interval_of_screening
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :1.859
## 3rd Qu.:2.000
## Max. :3.000
## NA's :5053
## tibble [6,158 × 19] (S3: tbl_df/tbl/data.frame)
## $ Location : chr [1:6158] "-0.2742007 36.058336 1882.2000732421875 20.0" "-0.7158125 37.1475058 1361.9000244140625 20.0" "-0.7158157 37.1475082 1361.9000244140625 20.0" "-0.7157082 37.14749 1361.9000244140625 20.0" ...
## $ _Location_latitude : num [1:6158] -0.274 -0.716 -0.716 -0.716 -0.716 ...
## $ _Location_longitude : num [1:6158] 36.1 37.1 37.1 37.1 37.1 ...
## $ _Location_altitude : num [1:6158] 1882 1362 1362 1362 1362 ...
## $ _Location_precision : num [1:6158] 20 20 20 20 20 ...
## $ Age : Factor w/ 5 levels "18-30","31-40",..: 3 1 3 1 3 1 2 1 2 2 ...
## $ Gender : Factor w/ 2 levels "Female","Male": 1 2 1 2 2 1 1 2 1 1 ...
## $ Marital Status : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 3 2 3 2 3 2 2 ...
## $ Children : num [1:6158] 2 0 5 2 7 2 2 2 3 2 ...
## $ Employment Status : Factor w/ 3 levels "Employed","Self-employed",..: 2 3 2 2 2 3 2 2 1 1 ...
## $ Monthly Household Income : Factor w/ 6 levels "10001-20000",..: 2 6 2 1 2 1 1 6 2 1 ...
## $ Insured : chr [1:6158] "Yes" "No" "No" "Yes" ...
## $ InsuranceName : chr [1:6158] "Nhif" NA "Nhif" "Nhif" ...
## $ Last_hospital_visit_months: num [1:6158] 53 8 6 16 13 2 4 24 5 6 ...
## $ Insured_last_visit : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 1 2 2 ...
## $ routine_checkup : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
## $ last_check_up_years : num [1:6158] 2 1 NA NA NA NA NA NA NA NA ...
## $ Cancerscreening : logi [1:6158] NA NA NA NA NA NA ...
## $ Interval_of_screening : num [1:6158] 2 NA NA NA NA NA NA NA NA NA ...
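The conversion code is not shown in this output, but two things in the result are worth noting. `Cancerscreening` is now logical with 6158 NAs, which suggests the original Yes/No answers were lost during conversion, and the "NAs introduced by coercion" warnings are what `as.numeric()` emits for non-numeric answers such as "4+". The sketch below shows one way to convert both kinds of column without losing answers. It uses invented toy data, and treating "4+" as 4 is an assumption made for illustration, not the author's choice.

```r
# Toy data with the same kinds of values shown in the str() output above.
toy <- data.frame(
  Cancerscreening       = c("No", "No", "Yes", NA),
  Interval_of_screening = c("2", NA, "4+", "1"),
  stringsAsFactors = FALSE
)

# Yes/No answers: convert to a factor so the answers are kept.
toy$Cancerscreening <- factor(toy$Cancerscreening, levels = c("No", "Yes"))

# Interval answers: "4+" is not a number, so as.numeric() would return NA
# with a coercion warning. Mapping it explicitly keeps the answer
# (treating "4+" as 4 is an assumption made for this sketch).
interval_chr <- toy$Interval_of_screening
interval_chr[!is.na(interval_chr) & interval_chr == "4+"] <- "4"
toy$Interval_of_screening <- as.numeric(interval_chr)

str(toy)
```

After a conversion like this, `summary()` on the screening column would report counts of "No" and "Yes" rather than a block of NAs.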
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 11 rows containing non-finite values (`stat_bin()`).
summary(Healthcare)
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6158 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2593 1st Qu.:36.38 1st Qu.:1348.9
## Mode :character Median :-0.7264 Median :36.87 Median :1592.9
## Mean :-0.7378 Mean :36.72 Mean :1536.4
## 3rd Qu.:-0.3781 3rd Qu.:37.15 3rd Qu.:1857.6
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :353 NA's :353 NA's :353
## _Location_precision Age Gender Marital Status
## Min. : 0.000 18-30:2183 Female:3034 Divorced: 461
## 1st Qu.: 4.100 31-40:1777 Male :3107 Married :3654
## Median : 4.820 41-50:1167 NA's : 17 Single :2025
## Mean : 71.217 51-60: 618 NA's : 18
## 3rd Qu.: 7.196 60+ : 395
## Max. :4900.000 NA's : 18
## NA's :353
## Children Employment Status Monthly Household Income
## Min. : 0.000 Employed :2177 10001-20000 :1433
## 1st Qu.: 1.000 Self-employed:1836 20001-30000 :1260
## Median : 2.000 Unemployed :2121 30001-40000 : 122
## Mean : 2.453 NA's : 24 40001-50000 : 719
## 3rd Qu.: 3.000 50001+ : 528
## Max. :13.000 Less than 10000:1837
## NA's :11 NA's : 259
## Insured InsuranceName Last_hospital_visit_months
## Length:6158 Length:6158 Min. : 0.000
## Class :character Class :character 1st Qu.: 2.000
## Mode :character Mode :character Median : 4.000
## Mean : 6.652
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :158
## Insured_last_visit routine_checkup last_check_up_years Cancerscreening
## No :3264 No :4342 Min. :1.000 Mode:logical
## Yes :2838 Yes :1793 1st Qu.:1.000 NA's:6158
## NA's: 56 NA's: 23 Median :1.000
## Mean :1.641
## 3rd Qu.:2.000
## Max. :3.000
## NA's :4636
## Interval_of_screening
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :1.859
## 3rd Qu.:2.000
## Max. :3.000
## NA's :5053
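Compared with the earlier summaries, `Children` now has a maximum of 13 instead of 800159, a mean of 2.453 instead of 147.2, and 11 NAs instead of 625. The code that did this is not shown. One sequence that would produce a result of this shape is to fill missing answers with the median and then set implausible values to NA. The sketch below illustrates that idea on a toy vector; the cutoff of 20 is an assumption, not a value taken from the analysis.

```r
# Toy vector; the real column is Healthcare$Children.
children <- c(2, 0, 5, NA, 7, NA, 800159, 3)

# Step 1: fill missing answers with the median of the observed values.
# The median is barely affected by the single extreme value, unlike the mean.
med <- median(children, na.rm = TRUE)
children[is.na(children)] <- med

# Step 2: treat implausible counts as missing rather than as real data.
# The cutoff of 20 is an assumption for this sketch.
max_plausible <- 20
children[which(children > max_plausible)] <- NA

summary(children)
```

The same check may be worth applying to `Last_hospital_visit_months`, whose maximum of 2021 looks more like a calendar year than a number of months.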
Marital Status
## # A tibble: 4 × 2
## `Marital Status` Percentage
## <fct> <dbl>
## 1 Married 59.3
## 2 Single 32.9
## 3 Divorced 7.49
## 4 <NA> 0.29
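The percentages above are each category's count divided by the total number of rows, with missing answers kept as their own category (for example, 3654 / 6158 gives 59.3% married). A minimal base R sketch of that calculation on invented toy data:

```r
# Toy answers; the real column is Healthcare$`Marital Status`.
marital <- factor(c("Married", "Single", "Married", NA,
                    "Divorced", "Married", "Single", "Married"))

# Count each category, keeping NA as its own row.
counts <- table(marital, useNA = "ifany")

# Convert counts to percentages of all respondents.
result <- data.frame(
  status     = names(counts),
  Percentage = round(100 * as.numeric(counts) / sum(counts), 2),
  stringsAsFactors = FALSE
)

# Largest share first.
result <- result[order(-result$Percentage), ]
print(result)
```

Keeping the NA row makes the shares add up to 100% of respondents; dropping it would instead give shares of those who answered.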
Insured
Age Categories Insured
Male to Female Insured
Healthcare_cleaned <- Healthcare[!is.na(Healthcare$Gender) & !is.na(Healthcare$Insured), ]
ggplot(Healthcare_cleaned, aes(x = Gender, fill = Insured)) +
  geom_bar() +
  labs(title = "Insured vs. Uninsured by Gender", y = "") +
  theme_minimal() +
  scale_fill_manual(values = c("Yes" = "green", "No" = "purple")) +
  theme(
    legend.position = "right",
    panel.grid.major = element_blank(), # Hide major grid lines
    panel.grid.minor = element_blank(), # Hide minor grid lines
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    plot.title = element_text(hjust = 0.5) # Center the title
  )
Type of insurance
Healthcare_cover <- Healthcare %>%
  group_by(InsuranceName) %>%
  summarise(Count = n())
print(Healthcare_cover)
## # A tibble: 227 × 2
## InsuranceName Count
## <chr> <int>
## 1 "\r\nNHIF" 3
## 2 ", NHIF" 1
## 3 "." 3
## 4 "0" 1
## 5 "1" 1
## 6 "2" 2
## 7 "5" 1
## 8 "800165" 1
## 9 "AAR" 22
## 10 "AAR\r\nBritam" 1
## # ℹ 217 more rows
Clean Insurance
## # A tibble: 135 × 2
## InsuranceName Count
## <chr> <int>
## 1 "" 19
## 2 "." 3
## 3 "0" 1
## 4 "1" 1
## 5 "2" 2
## 6 "5" 1
## 7 "800165" 1
## 8 "Aar" 23
## 9 "Aar Insurance" 2
## 10 "Absa Bank Care" 1
## # ℹ 125 more rows
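The cleaning code is not shown, but the raw values above point to the problems it has to handle: embedded line breaks ("\r\nNHIF"), stray punctuation (", NHIF"), inconsistent case ("Nhif", "NHIF") and entries that are not insurer names at all (".", "0"). The sketch below shows one possible normalisation in base R. The rules, including upper-casing and the "Unknown" label for non-names, are assumptions for illustration.

```r
# A few raw values taken from the table above, plus a missing answer.
raw <- c("\r\nNHIF", ", NHIF", ".", "0", "AAR", "AAR\r\nBritam", "Nhif", NA)

clean <- gsub("[\r\n]+", " ", raw)                   # line breaks -> spaces
clean <- gsub("^[[:punct:][:space:]]+", "", clean)   # strip leading debris
clean <- trimws(clean)

# Upper-case the non-missing entries so "Nhif" and "NHIF" match.
ok <- !is.na(clean)
clean[ok] <- toupper(clean[ok])

# Empty or purely numeric entries carry no insurer name.
clean[ok & grepl("^[0-9]*$", clean)] <- "Unknown"

print(table(clean, useNA = "ifany"))
```

Entries such as "AAR\r\nBritam" still name two insurers after this step, so they need a separate decision: keep the first, or split them into one row per insurer.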
## # A tibble: 30 × 2
## InsuranceName Count
## <chr> <int>
## 1 NA 2851
## 2 NHIF 2810
## 3 Jubilee 149
## 4 Britam 125
## 5 APA Insurance 108
## 6 CIC 56
## 7 Unknown 28
## 8 AAR 26
## 9 Health Cover 19
## 10 DirectLine 18
## # ℹ 20 more rows
## # A tibble: 12 × 3
## InsuranceName Count Percentage
## <chr> <int> <dbl>
## 1 NA 2851 45.6
## 2 NHIF 2810 44.9
## 3 Jubilee 149 2.38
## 4 Britam 125 2.00
## 5 APA Insurance 108 1.73
## 6 CIC 56 0.895
## 7 Others 52 0.831
## 8 Unknown 28 0.448
## 9 AAR 26 0.416
## 10 Health Cover 19 0.304
## 11 DirectLine 18 0.288
## 12 UAP OLD MUTUAL 13 0.208
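The last table has an "Others" row that is absent from the 30-row table before it, so the rarest insurers appear to have been pooled before the percentages were computed. A minimal sketch of that kind of lumping follows. It uses invented toy data, and the threshold `min_count` is a hypothetical parameter, not the cutoff used in the analysis.

```r
# Toy answers, one per respondent (invented counts).
insurer <- c(rep("NHIF", 6), rep("Jubilee", 2), "Britam", "Madison")

# Insurers named fewer than min_count times are pooled into "Others".
min_count <- 2  # assumed threshold for this sketch
counts <- table(insurer)
small <- names(counts)[counts < min_count]
insurer_lumped <- ifelse(insurer %in% small, "Others", insurer)

# Recount and add each group's share of all respondents.
lumped <- sort(table(insurer_lumped), decreasing = TRUE)
result <- data.frame(
  InsuranceName = names(lumped),
  Count         = as.integer(lumped),
  Percentage    = round(100 * as.integer(lumped) / length(insurer), 1),
  stringsAsFactors = FALSE
)
print(result)
```

Pooling keeps the table and any bar chart readable while the percentages still add up to 100.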
Percentage
summary(expanded_dataset)
## Location _Location_latitude _Location_longitude _Location_altitude
## Length:6255 Min. :-4.0519 Min. :34.09 Min. :-201.3
## Class :character 1st Qu.:-1.2610 1st Qu.:36.38 1st Qu.:1348.4
## Mode :character Median :-0.7259 Median :36.87 Median :1592.9
## Mean :-0.7382 Mean :36.71 Mean :1534.6
## 3rd Qu.:-0.3783 3rd Qu.:37.15 3rd Qu.:1857.5
## Max. : 1.8422 Max. :39.69 Max. :2988.5
## NA's :359 NA's :359 NA's :359
## _Location_precision Age Gender Marital Status
## Min. : 0.000 18-30:2200 Female:3079 Divorced: 469
## 1st Qu.: 4.100 31-40:1803 Male :3159 Married :3723
## Median : 4.820 41-50:1194 NA's : 17 Single :2045
## Mean : 71.521 51-60: 635 NA's : 18
## 3rd Qu.: 7.116 60+ : 404
## Max. :4900.000 NA's : 19
## NA's :359
## Children Employment Status Monthly Household Income
## Min. : 0.000 Employed :2241 10001-20000 :1448
## 1st Qu.: 1.000 Self-employed:1862 20001-30000 :1276
## Median : 2.000 Unemployed :2128 30001-40000 : 126
## Mean : 2.464 NA's : 24 40001-50000 : 733
## 3rd Qu.: 3.000 50001+ : 571
## Max. :13.000 Less than 10000:1841
## NA's :11 NA's : 260
## Insured InsuranceName Last_hospital_visit_months
## Length:6255 Length:6255 Min. : 0.000
## Class :character Class :character 1st Qu.: 2.000
## Mode :character Mode :character Median : 4.000
## Mean : 6.663
## 3rd Qu.: 8.000
## Max. :2021.000
## NA's :159
## Insured_last_visit routine_checkup last_check_up_years Cancerscreening
## No :3276 No :4385 Min. :1.000 Mode:logical
## Yes :2923 Yes :1847 1st Qu.:1.000 NA's:6255
## NA's: 56 NA's: 23 Median :1.000
## Mean :1.626
## 3rd Qu.:2.000
## Max. :3.000
## NA's :4683
## Interval_of_screening
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :1.858
## 3rd Qu.:2.000
## Max. :3.000
## NA's :5114
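`expanded_dataset` has 6255 rows against 6158 in `Healthcare`. How it was built is not shown, but the increase would be consistent with splitting respondents who named several insurers (such as "AAR\r\nBritam") into one row per insurer. The sketch below shows that kind of expansion in base R on toy data. The separator pattern and the `id` column are assumptions made for illustration.

```r
# Toy data: respondent 2 names two insurers, respondent 3 gave no answer.
toy <- data.frame(
  id            = 1:3,
  InsuranceName = c("NHIF", "AAR\r\nBritam", NA),
  stringsAsFactors = FALSE
)

# Split each answer on line breaks or commas (assumed separators).
parts <- strsplit(toy$InsuranceName, "[\r\n,]+")

# Repeat each respondent's row once per insurer named.
expanded <- toy[rep(seq_len(nrow(toy)), times = lengths(parts)), ]
expanded$InsuranceName <- trimws(unlist(parts))
rownames(expanded) <- NULL

print(expanded)
```

If the data was expanded this way, respondents with several insurers are counted more than once in every other column, which explains the slightly higher counts in this summary. Percentages of respondents are safer to compute on the unexpanded data.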
Routine CheckUp vs Insured
Employment
Employment Status and Insurance Coverage
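A comparison such as employment status against insurance coverage is easier to read as row percentages than as raw counts, because the groups differ in size. A minimal sketch with a base R cross-tabulation on invented toy data; with the real data the two vectors would be `Healthcare$"Employment Status"` and `Healthcare$Insured`.

```r
# Toy answers (invented for illustration).
employment <- c("Employed", "Employed", "Employed", "Unemployed",
                "Unemployed", "Self-employed", "Self-employed", "Employed")
insured    <- c("Yes", "Yes", "No", "No",
                "No", "Yes", "No", "Yes")

# Counts of insured and uninsured respondents within each employment group.
tab <- table(employment, insured)

# Row percentages: the share insured within each employment group.
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)
```

With `margin = 1` each row sums to 100, so the "Yes" column can be compared directly across employment groups.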