Problem Statement

  1. What is the demographic distribution of the surveyed individuals?
  2. What percentage of the surveyed population had health insurance?
  3. What is the distribution of time since the last hospital visit?
  4. Do the respondents have routine health check-ups, and if so, how often?
  5. Have the individuals had any cancer screening?
  6. How do factors such as health insurance, hospital visits and cancer screening change over time?
  7. Investigate the relationship between monthly household income and access to healthcare, insurance coverage, and health outcomes.
  8. Examine correlations between variables to identify potential associations or dependencies.
  9. Find whether there are disparities in healthcare access or outcomes based on gender, age, income, or other demographic factors.
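For question 2, one possible starting point is the share of each response in the insurance column, with missing answers left out of the denominator. The base R sketch below uses a small made-up vector standing in for the `Insured` column (the name given to "Have you ever had health insurance?" in the renaming step), so its percentages are illustrative only; `response_share()` is a hypothetical helper.

```r
# Percentage of each response in a categorical survey column, ignoring NAs.
# `answers` is a made-up stand-in for Healthcare$Insured.
response_share <- function(x) {
  counts <- table(x, useNA = "no")      # missing answers are not counted
  round(100 * prop.table(counts), 1)    # convert counts to percentages
}

answers <- c("Yes", "No", "Yes", NA, "Yes")
response_share(answers)
```

With the real data the call would be `response_share(Healthcare$Insured)`.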
## corrplot 0.92 loaded
head(Healthcare)
## # A tibble: 6 × 32
##   Location       `_Location_latitude` `_Location_longitude` `_Location_altitude`
##   <chr>                         <dbl>                 <dbl>                <dbl>
## 1 -0.2742007 36…               -0.274                  36.1                1882.
## 2 -0.7158125 37…               -0.716                  37.1                1362.
## 3 -0.7158157 37…               -0.716                  37.1                1362.
## 4 -0.7157082 37…               -0.716                  37.1                1362.
## 5 -0.7157337 37…               -0.716                  37.1                1362.
## 6 -0.7158041 37…               -0.716                  37.1                1362.
## # ℹ 28 more variables: `_Location_precision` <dbl>, `Date and Time` <dttm>,
## #   Age <chr>, Gender <chr>, `Marital Status` <chr>,
## #   `How many children do you have, if any?` <dbl>, `Employment Status` <chr>,
## #   `Monthly Household Income` <chr>,
## #   `Have you ever had health insurance?` <chr>,
## #   `If yes, which insurance cover?` <chr>,
## #   `When was the last time you visited a hospital for medical treatment? (In Months)` <dbl>, …
str(Healthcare)
## tibble [6,158 × 32] (S3: tbl_df/tbl/data.frame)
##  $ Location                                                                                                                  : chr [1:6158] "-0.2742007 36.058336 1882.2000732421875 20.0" "-0.7158125 37.1475058 1361.9000244140625 20.0" "-0.7158157 37.1475082 1361.9000244140625 20.0" "-0.7157082 37.14749 1361.9000244140625 20.0" ...
##  $ _Location_latitude                                                                                                        : num [1:6158] -0.274 -0.716 -0.716 -0.716 -0.716 ...
##  $ _Location_longitude                                                                                                       : num [1:6158] 36.1 37.1 37.1 37.1 37.1 ...
##  $ _Location_altitude                                                                                                        : num [1:6158] 1882 1362 1362 1362 1362 ...
##  $ _Location_precision                                                                                                       : num [1:6158] 20 20 20 20 20 ...
##  $ Date and Time                                                                                                             : POSIXct[1:6158], format: NA "2023-05-15 13:38:00" ...
##  $ Age                                                                                                                       : chr [1:6158] "41-50" "18-30" "41-50" "18-30" ...
##  $ Gender                                                                                                                    : chr [1:6158] "Female" "Male" "Female" "Male" ...
##  $ Marital Status                                                                                                            : chr [1:6158] "Married" "Single" "Married" "Single" ...
##  $ How many children do you have, if any?                                                                                    : num [1:6158] 2 0 5 NA 7 NA 2 NA 3 2 ...
##  $ Employment Status                                                                                                         : chr [1:6158] "Self-employed" "Unemployed" "Self-employed" "Self-employed" ...
##  $ Monthly Household Income                                                                                                  : chr [1:6158] "20001-30000" "Less than 10000" "20001-30000" "10001-20000" ...
##  $ Have you ever had health insurance?                                                                                       : chr [1:6158] "Yes" "No" "No" "Yes" ...
##  $ If yes, which insurance cover?                                                                                            : chr [1:6158] "Nhif" NA "Nhif" "Nhif" ...
##  $ When was the last time you visited a hospital for medical treatment? (In Months)                                          : num [1:6158] 53 8 6 16 13 2 4 24 5 6 ...
##  $ Did you have health insurance during your last hospital visit?                                                            : chr [1:6158] "No" "No" "Yes" "Yes" ...
##  $ Have you ever had a routine check-up with a doctor or healthcare provider?                                                : chr [1:6158] "Yes" "Yes" "No" "No" ...
##  $ If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?: chr [1:6158] "2" "1" NA NA ...
##  $ Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?                                                 : chr [1:6158] "No" "No" "Yes" "No" ...
##  $ If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?: chr [1:6158] "2" NA "4+" NA ...
##  $ Your Picture                                                                                                              : logi [1:6158] NA NA NA NA NA NA ...
##  $ Your Picture_URL                                                                                                          : logi [1:6158] NA NA NA NA NA NA ...
##  $ _id                                                                                                                       : num [1:6158] 2.30e+08 2.38e+08 2.38e+08 2.38e+08 2.38e+08 ...
##  $ _uuid                                                                                                                     : chr [1:6158] "aa30304f-84f2-4c1b-b30a-371241f2ff17" "63c461e3-b3ef-47cf-9632-0c912a639f46" "4209a55d-a983-433f-8ce0-bce6cd28d713" "2eba9b13-1706-4faf-b7a7-e45e9dcf48ab" ...
##  $ _submission_time                                                                                                          : POSIXct[1:6158], format: "2023-04-05 08:44:06" "2023-05-15 10:44:01" ...
##  $ _validation_status                                                                                                        : logi [1:6158] NA NA NA NA NA NA ...
##  $ _notes                                                                                                                    : logi [1:6158] NA NA NA NA NA NA ...
##  $ _status                                                                                                                   : chr [1:6158] "submitted_via_web" "submitted_via_web" "submitted_via_web" "submitted_via_web" ...
##  $ _submitted_by                                                                                                             : chr [1:6158] NA "safra_data" "safra_data" "safra_data" ...
##  $ __version__                                                                                                               : chr [1:6158] "vJ8gEKnN2pccxThc5jnkz4" "vMrCPR7NLZZJrf4PTsQ8uH" "vMrCPR7NLZZJrf4PTsQ8uH" "vMrCPR7NLZZJrf4PTsQ8uH" ...
##  $ _tags                                                                                                                     : logi [1:6158] NA NA NA NA NA NA ...
##  $ _index                                                                                                                    : num [1:6158] 1 2 3 4 5 6 7 8 9 10 ...
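`str()` shows that the two interval questions (routine check-up and cancer screening) are stored as character, with answers such as "2" and "4+". Calling `as.numeric()` on such a column turns "4+" into `NA` and emits "NAs introduced by coercion". A minimal sketch of one alternative follows, assuming that an open-ended "4+" may be recorded as its lower bound of 4; that is a modelling choice to justify, not something the survey defines, and `parse_interval_years()` is a hypothetical helper.

```r
# Convert interval answers such as "1", "2", "4+" to numbers.
# Assumption: an open-ended "N+" answer is recorded as its lower bound N,
# so "4+" becomes 4; any other non-numeric text becomes NA.
parse_interval_years <- function(x) {
  x <- trimws(x)
  x <- sub("\\+$", "", x)          # "4+" -> "4"
  suppressWarnings(as.numeric(x))  # leftover text -> NA without a warning
}

parse_interval_years(c("2", "4+", NA, "1"))
```

Because `suppressWarnings()` hides the coercion warning, comparing `sum(is.na())` before and after conversion shows how many answers were lost. Keeping the answers as an ordered factor avoids choosing a number for "4+" altogether.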
summary(Healthcare)
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6158        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2593    1st Qu.:36.38       1st Qu.:1348.9    
##  Mode  :character   Median :-0.7264    Median :36.87       Median :1592.9    
##                     Mean   :-0.7378    Mean   :36.72       Mean   :1536.4    
##                     3rd Qu.:-0.3781    3rd Qu.:37.15       3rd Qu.:1857.6    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :353        NA's   :353         NA's   :353       
##  _Location_precision Date and Time                        Age           
##  Min.   :   0.000    Min.   :2023-05-15 08:35:00.00   Length:6158       
##  1st Qu.:   4.100    1st Qu.:2023-06-16 10:18:30.00   Class :character  
##  Median :   4.820    Median :2023-06-23 10:22:00.00   Mode  :character  
##  Mean   :  71.217    Mean   :2023-06-23 12:19:03.48                     
##  3rd Qu.:   7.196    3rd Qu.:2023-06-30 15:33:30.00                     
##  Max.   :4900.000    Max.   :2023-07-27 12:00:00.00                     
##  NA's   :353         NA's   :148                                        
##     Gender          Marital Status     How many children do you have, if any?
##  Length:6158        Length:6158        Min.   :     0.0                      
##  Class :character   Class :character   1st Qu.:     1.0                      
##  Mode  :character   Mode  :character   Median :     2.0                      
##                                        Mean   :   147.2                      
##                                        3rd Qu.:     3.0                      
##                                        Max.   :800159.0                      
##                                        NA's   :625                           
##  Employment Status  Monthly Household Income
##  Length:6158        Length:6158             
##  Class :character   Class :character        
##  Mode  :character   Mode  :character        
##                                             
##                                             
##                                             
##                                             
##  Have you ever had health insurance? If yes, which insurance cover?
##  Length:6158                         Length:6158                   
##  Class :character                    Class :character              
##  Mode  :character                    Mode  :character              
##                                                                    
##                                                                    
##                                                                    
##                                                                    
##  When was the last time you visited a hospital for medical treatment? (In Months)
##  Min.   :   0.000                                                                
##  1st Qu.:   2.000                                                                
##  Median :   4.000                                                                
##  Mean   :   6.652                                                                
##  3rd Qu.:   8.000                                                                
##  Max.   :2021.000                                                                
##  NA's   :158                                                                     
##  Did you have health insurance during your last hospital visit?
##  Length:6158                                                   
##  Class :character                                              
##  Mode  :character                                              
##                                                                
##                                                                
##                                                                
##                                                                
##  Have you ever had a routine check-up with a doctor or healthcare provider?
##  Length:6158                                                               
##  Class :character                                                          
##  Mode  :character                                                          
##                                                                            
##                                                                            
##                                                                            
##                                                                            
##  If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?
##  Length:6158                                                                                                               
##  Class :character                                                                                                          
##  Mode  :character                                                                                                          
##                                                                                                                            
##                                                                                                                            
##                                                                                                                            
##                                                                                                                            
##  Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?
##  Length:6158                                                              
##  Class :character                                                         
##  Mode  :character                                                         
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?
##  Length:6158                                                                                                               
##  Class :character                                                                                                          
##  Mode  :character                                                                                                          
##                                                                                                                            
##                                                                                                                            
##                                                                                                                            
##                                                                                                                            
##  Your Picture   Your Picture_URL      _id               _uuid          
##  Mode:logical   Mode:logical     Min.   :230162389   Length:6158       
##  NA's:6158      NA's:6158        1st Qu.:246958185   Class :character  
##                                  Median :248624898   Mode  :character  
##                                  Mean   :248260300                     
##                                  3rd Qu.:249981696                     
##                                  Max.   :258479375                     
##                                                                        
##  _submission_time                 _validation_status  _notes       
##  Min.   :2023-04-05 08:44:06.00   Mode:logical       Mode:logical  
##  1st Qu.:2023-06-19 10:27:25.75   NA's:6158          NA's:6158     
##  Median :2023-06-26 09:56:12.50                                    
##  Mean   :2023-06-25 04:21:26.05                                    
##  3rd Qu.:2023-07-03 13:40:47.00                                    
##  Max.   :2023-08-07 09:12:14.00                                    
##                                                                    
##    _status          _submitted_by      __version__         _tags        
##  Length:6158        Length:6158        Length:6158        Mode:logical  
##  Class :character   Class :character   Class :character   NA's:6158     
##  Mode  :character   Mode  :character   Mode  :character                 
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##      _index    
##  Min.   :   1  
##  1st Qu.:1540  
##  Median :3080  
##  Mean   :3080  
##  3rd Qu.:4619  
##  Max.   :6158  
## 
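The summary exposes implausible values: `How many children do you have, if any?` reaches 800159 (dragging its mean to 147.2 against a median of 2), and the months since the last hospital visit reach 2021, possibly a year typed into a months field. Below is a base R sketch for blanking values outside a plausible range; the bounds of 0 to 20 children and 0 to 240 months are illustrative assumptions, not limits defined by the survey, and `cap_to_na()` is a hypothetical helper.

```r
# Set values outside an assumed plausible range to NA.
# The bounds used below (0-20 children, 0-240 months) are illustrative
# assumptions, not rules taken from the survey.
cap_to_na <- function(x, lower, upper) {
  x[!is.na(x) & (x < lower | x > upper)] <- NA  # existing NAs stay NA
  x
}

children <- c(2, 0, 5, NA, 800159)
months   <- c(53, 8, 2021, 6)
cap_to_na(children, 0, 20)   # 2 0 5 NA NA
cap_to_na(months, 0, 240)    # 53 8 NA 6
```

Setting outliers to `NA` rather than deleting rows keeps the other answers from those respondents available for analysis.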
dim(Healthcare)
## [1] 6158   32
colnames(Healthcare)
##  [1] "Location"                                                                                                                  
##  [2] "_Location_latitude"                                                                                                        
##  [3] "_Location_longitude"                                                                                                       
##  [4] "_Location_altitude"                                                                                                        
##  [5] "_Location_precision"                                                                                                       
##  [6] "Date and Time"                                                                                                             
##  [7] "Age"                                                                                                                       
##  [8] "Gender"                                                                                                                    
##  [9] "Marital Status"                                                                                                            
## [10] "How many children do you have, if any?"                                                                                    
## [11] "Employment Status"                                                                                                         
## [12] "Monthly Household Income"                                                                                                  
## [13] "Have you ever had health insurance?"                                                                                       
## [14] "If yes, which insurance cover?"                                                                                            
## [15] "When was the last time you visited a hospital for medical treatment? (In Months)"                                          
## [16] "Did you have health insurance during your last hospital visit?"                                                            
## [17] "Have you ever had a routine check-up with a doctor or healthcare provider?"                                                
## [18] "If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?"
## [19] "Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?"                                                 
## [20] "If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?"
## [21] "Your Picture"                                                                                                              
## [22] "Your Picture_URL"                                                                                                          
## [23] "_id"                                                                                                                       
## [24] "_uuid"                                                                                                                     
## [25] "_submission_time"                                                                                                          
## [26] "_validation_status"                                                                                                        
## [27] "_notes"                                                                                                                    
## [28] "_status"                                                                                                                   
## [29] "_submitted_by"                                                                                                             
## [30] "__version__"                                                                                                               
## [31] "_tags"                                                                                                                     
## [32] "_index"
tail(Healthcare)
## # A tibble: 6 × 32
##   Location       `_Location_latitude` `_Location_longitude` `_Location_altitude`
##   <chr>                         <dbl>                 <dbl>                <dbl>
## 1 -1.2689389 36…                -1.27                  36.9                1618.
## 2 -1.2693104 36…                -1.27                  36.9                1618 
## 3 -1.2705219 36…                -1.27                  36.9                1595.
## 4 -1.2718084 36…                -1.27                  36.9                1595.
## 5 -1.2730717 36…                -1.27                  36.9                1595.
## 6 -1.2739374 36…                -1.27                  36.9                1595.
## # ℹ 28 more variables: `_Location_precision` <dbl>, `Date and Time` <dttm>,
## #   Age <chr>, Gender <chr>, `Marital Status` <chr>,
## #   `How many children do you have, if any?` <dbl>, `Employment Status` <chr>,
## #   `Monthly Household Income` <chr>,
## #   `Have you ever had health insurance?` <chr>,
## #   `If yes, which insurance cover?` <chr>,
## #   `When was the last time you visited a hospital for medical treatment? (In Months)` <dbl>, …

Rename columns

# Replace the long survey questions with short column names
Healthcare <- Healthcare %>%
  rename(
    Children = `How many children do you have, if any?`,
    Insured = `Have you ever had health insurance?`,
    InsuranceName = `If yes, which insurance cover?`,
    Last_hospital_visit_months = `When was the last time you visited a hospital for medical treatment? (In Months)`,
    Insured_last_visit = `Did you have health insurance during your last hospital visit?`,
    routine_checkup = `Have you ever had a routine check-up with a doctor or healthcare provider?`,
    last_check_up_years = `If you answered yes to the previous question, what time period (in years) do you stay before having your routine check-up?`,
    Cancerscreening = `Have you ever had a cancer screening (e.g. mammogram, colonoscopy, etc.)?`,
    Interval_of_screening = `If you answered yes to the previous question, what time period (in years) do you stay before having your Cancer screening?`
  )
summary(Healthcare)
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6158        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2593    1st Qu.:36.38       1st Qu.:1348.9    
##  Mode  :character   Median :-0.7264    Median :36.87       Median :1592.9    
##                     Mean   :-0.7378    Mean   :36.72       Mean   :1536.4    
##                     3rd Qu.:-0.3781    3rd Qu.:37.15       3rd Qu.:1857.6    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :353        NA's   :353         NA's   :353       
##  _Location_precision Date and Time                        Age           
##  Min.   :   0.000    Min.   :2023-05-15 08:35:00.00   Length:6158       
##  1st Qu.:   4.100    1st Qu.:2023-06-16 10:18:30.00   Class :character  
##  Median :   4.820    Median :2023-06-23 10:22:00.00   Mode  :character  
##  Mean   :  71.217    Mean   :2023-06-23 12:19:03.48                     
##  3rd Qu.:   7.196    3rd Qu.:2023-06-30 15:33:30.00                     
##  Max.   :4900.000    Max.   :2023-07-27 12:00:00.00                     
##  NA's   :353         NA's   :148                                        
##     Gender          Marital Status        Children        Employment Status 
##  Length:6158        Length:6158        Min.   :     0.0   Length:6158       
##  Class :character   Class :character   1st Qu.:     1.0   Class :character  
##  Mode  :character   Mode  :character   Median :     2.0   Mode  :character  
##                                        Mean   :   147.2                     
##                                        3rd Qu.:     3.0                     
##                                        Max.   :800159.0                     
##                                        NA's   :625                          
##  Monthly Household Income   Insured          InsuranceName     
##  Length:6158              Length:6158        Length:6158       
##  Class :character         Class :character   Class :character  
##  Mode  :character         Mode  :character   Mode  :character  
##                                                                
##                                                                
##                                                                
##                                                                
##  Last_hospital_visit_months Insured_last_visit routine_checkup   
##  Min.   :   0.000           Length:6158        Length:6158       
##  1st Qu.:   2.000           Class :character   Class :character  
##  Median :   4.000           Mode  :character   Mode  :character  
##  Mean   :   6.652                                                
##  3rd Qu.:   8.000                                                
##  Max.   :2021.000                                                
##  NA's   :158                                                     
##  last_check_up_years Cancerscreening    Interval_of_screening Your Picture  
##  Length:6158         Length:6158        Length:6158           Mode:logical  
##  Class :character    Class :character   Class :character      NA's:6158     
##  Mode  :character    Mode  :character   Mode  :character                    
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Your Picture_URL      _id               _uuid          
##  Mode:logical     Min.   :230162389   Length:6158       
##  NA's:6158        1st Qu.:246958185   Class :character  
##                   Median :248624898   Mode  :character  
##                   Mean   :248260300                     
##                   3rd Qu.:249981696                     
##                   Max.   :258479375                     
##                                                         
##  _submission_time                 _validation_status  _notes       
##  Min.   :2023-04-05 08:44:06.00   Mode:logical       Mode:logical  
##  1st Qu.:2023-06-19 10:27:25.75   NA's:6158          NA's:6158     
##  Median :2023-06-26 09:56:12.50                                    
##  Mean   :2023-06-25 04:21:26.05                                    
##  3rd Qu.:2023-07-03 13:40:47.00                                    
##  Max.   :2023-08-07 09:12:14.00                                    
##                                                                    
##    _status          _submitted_by      __version__         _tags        
##  Length:6158        Length:6158        Length:6158        Mode:logical  
##  Class :character   Class :character   Class :character   NA's:6158     
##  Mode  :character   Mode  :character   Mode  :character                 
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##      _index    
##  Min.   :   1  
##  1st Qu.:1540  
##  Median :3080  
##  Mean   :3080  
##  3rd Qu.:4619  
##  Max.   :6158  
## 

Identify missing values

# Count missing values in each column (map() comes from purrr)
Healthcare %>% map(~sum(is.na(.)))
## $Location
## [1] 353
## 
## $`_Location_latitude`
## [1] 353
## 
## $`_Location_longitude`
## [1] 353
## 
## $`_Location_altitude`
## [1] 353
## 
## $`_Location_precision`
## [1] 353
## 
## $`Date and Time`
## [1] 148
## 
## $Age
## [1] 18
## 
## $Gender
## [1] 17
## 
## $`Marital Status`
## [1] 18
## 
## $Children
## [1] 625
## 
## $`Employment Status`
## [1] 24
## 
## $`Monthly Household Income`
## [1] 259
## 
## $Insured
## [1] 19
## 
## $InsuranceName
## [1] 2519
## 
## $Last_hospital_visit_months
## [1] 158
## 
## $Insured_last_visit
## [1] 56
## 
## $routine_checkup
## [1] 23
## 
## $last_check_up_years
## [1] 4382
## 
## $Cancerscreening
## [1] 31
## 
## $Interval_of_screening
## [1] 4593
## 
## $`Your Picture`
## [1] 6158
## 
## $`Your Picture_URL`
## [1] 6158
## 
## $`_id`
## [1] 0
## 
## $`_uuid`
## [1] 0
## 
## $`_submission_time`
## [1] 0
## 
## $`_validation_status`
## [1] 6158
## 
## $`_notes`
## [1] 6158
## 
## $`_status`
## [1] 0
## 
## $`_submitted_by`
## [1] 1
## 
## $`__version__`
## [1] 0
## 
## $`_tags`
## [1] 6158
## 
## $`_index`
## [1] 0
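The per-column list above is long; base R's `colSums(is.na())` returns the same counts in one vector, which can be put in a table next to the percentage of the 6158 rows they represent. Five columns (`Your Picture`, `Your Picture_URL`, `_validation_status`, `_notes`, `_tags`) have 6158 missing values, so they are empty in every row. The hypothetical helper `missing_report()` below flags such columns; it is demonstrated on a toy data frame.

```r
# Summarise missingness per column as a count and a percentage of rows,
# and flag columns that contain no data at all.
missing_report <- function(df) {
  n_missing <- unname(colSums(is.na(df)))
  data.frame(
    column      = names(df),
    n_missing   = n_missing,
    pct_missing = round(100 * n_missing / nrow(df), 1),
    all_missing = n_missing == nrow(df)
  )
}

toy <- data.frame(Age     = c("18-30", NA, "41-50"),
                  picture = c(NA, NA, NA),
                  id      = 1:3)
missing_report(toy)
```

On the survey data, `missing_report(Healthcare)` would list the five empty columns with `all_missing` set to `TRUE`.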
# Count rows that exactly repeat an earlier row
sum(duplicated(Healthcare))
## [1] 0
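A result of 0 is likely regardless of the survey answers: `_index` appears to number the rows 1 to 6158 and `_uuid` differs per submission, so no two full rows can match. Repeated submissions are easier to detect after excluding identifier columns. The helper `count_content_duplicates()` below is a hypothetical sketch; which columns count as identifiers is an assumption to adjust for the real data.

```r
# Count duplicate rows while ignoring identifier columns that are unique
# per submission by construction.
count_content_duplicates <- function(df, id_cols) {
  keep <- setdiff(names(df), id_cols)
  sum(duplicated(df[, keep, drop = FALSE]))
}

toy <- data.frame(`_index` = 1:4,
                  Age      = c("18-30", "18-30", "41-50", "18-30"),
                  Gender   = c("Male", "Male", "Female", "Female"),
                  check.names = FALSE)
sum(duplicated(toy))                               # 0: the index makes every row unique
count_content_duplicates(toy, id_cols = "_index")  # 1: rows 1 and 2 match
```

For the survey, the identifier set might be `c("_id", "_uuid", "_index", "_submission_time")`, but whether identical answers from different submissions are true duplicates is a judgement call.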
# Drop form metadata, identifiers, timestamps and the always-empty columns
Healthcare <- Healthcare %>%
  select(-c(`_tags`, `__version__`, `_submitted_by`, `_status`, `Date and Time`,
            `_notes`, `_validation_status`, `_submission_time`, `Your Picture_URL`,
            `_id`, `Your Picture`, `_uuid`, `_index`))
summary(Healthcare)
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6158        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2593    1st Qu.:36.38       1st Qu.:1348.9    
##  Mode  :character   Median :-0.7264    Median :36.87       Median :1592.9    
##                     Mean   :-0.7378    Mean   :36.72       Mean   :1536.4    
##                     3rd Qu.:-0.3781    3rd Qu.:37.15       3rd Qu.:1857.6    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :353        NA's   :353         NA's   :353       
##  _Location_precision     Age               Gender          Marital Status    
##  Min.   :   0.000    Length:6158        Length:6158        Length:6158       
##  1st Qu.:   4.100    Class :character   Class :character   Class :character  
##  Median :   4.820    Mode  :character   Mode  :character   Mode  :character  
##  Mean   :  71.217                                                            
##  3rd Qu.:   7.196                                                            
##  Max.   :4900.000                                                            
##  NA's   :353                                                                 
##     Children        Employment Status  Monthly Household Income
##  Min.   :     0.0   Length:6158        Length:6158             
##  1st Qu.:     1.0   Class :character   Class :character        
##  Median :     2.0   Mode  :character   Mode  :character        
##  Mean   :   147.2                                              
##  3rd Qu.:     3.0                                              
##  Max.   :800159.0                                              
##  NA's   :625                                                   
##    Insured          InsuranceName      Last_hospital_visit_months
##  Length:6158        Length:6158        Min.   :   0.000          
##  Class :character   Class :character   1st Qu.:   2.000          
##  Mode  :character   Mode  :character   Median :   4.000          
##                                        Mean   :   6.652          
##                                        3rd Qu.:   8.000          
##                                        Max.   :2021.000          
##                                        NA's   :158               
##  Insured_last_visit routine_checkup    last_check_up_years Cancerscreening   
##  Length:6158        Length:6158        Length:6158         Length:6158       
##  Class :character   Class :character   Class :character    Class :character  
##  Mode  :character   Mode  :character   Mode  :character    Mode  :character  
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  Interval_of_screening
##  Length:6158          
##  Class :character     
##  Mode  :character     
##                       
##                       
##                       
## 
## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6158        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2593    1st Qu.:36.38       1st Qu.:1348.9    
##  Mode  :character   Median :-0.7264    Median :36.87       Median :1592.9    
##                     Mean   :-0.7378    Mean   :36.72       Mean   :1536.4    
##                     3rd Qu.:-0.3781    3rd Qu.:37.15       3rd Qu.:1857.6    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :353        NA's   :353         NA's   :353       
##  _Location_precision    Age          Gender      Marital Status
##  Min.   :   0.000    18-30:2183   Female:3034   Divorced: 461  
##  1st Qu.:   4.100    31-40:1777   Male  :3107   Married :3654  
##  Median :   4.820    41-50:1167   NA's  :  17   Single  :2025  
##  Mean   :  71.217    51-60: 618                 NA's    :  18  
##  3rd Qu.:   7.196    60+  : 395                                
##  Max.   :4900.000    NA's :  18                                
##  NA's   :353                                                   
##     Children            Employment Status    Monthly Household Income
##  Min.   :     0.0   Employed     :2177    10001-20000    :1433       
##  1st Qu.:     1.0   Self-employed:1836    20001-30000    :1260       
##  Median :     2.0   Unemployed   :2121    30001-40000    : 122       
##  Mean   :   147.2   NA's         :  24    40001-50000    : 719       
##  3rd Qu.:     3.0                         50001+         : 528       
##  Max.   :800159.0                         Less than 10000:1837       
##  NA's   :625                              NA's           : 259       
##    Insured          InsuranceName      Last_hospital_visit_months
##  Length:6158        Length:6158        Min.   :   0.000          
##  Class :character   Class :character   1st Qu.:   2.000          
##  Mode  :character   Mode  :character   Median :   4.000          
##                                        Mean   :   6.652          
##                                        3rd Qu.:   8.000          
##                                        Max.   :2021.000          
##                                        NA's   :158               
##  Insured_last_visit routine_checkup last_check_up_years Cancerscreening
##  No  :3264          No  :4342       Min.   :1.000       Mode:logical   
##  Yes :2838          Yes :1793       1st Qu.:1.000       NA's:6158      
##  NA's:  56          NA's:  23       Median :1.000                      
##                                     Mean   :1.641                      
##                                     3rd Qu.:2.000                      
##                                     Max.   :3.000                      
##                                     NA's   :4636                       
##  Interval_of_screening
##  Min.   :1.000        
##  1st Qu.:1.000        
##  Median :2.000        
##  Mean   :1.859        
##  3rd Qu.:2.000        
##  Max.   :3.000        
##  NA's   :5053
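In this second summary, Cancerscreening has become a logical column in which all 6,158 values are NA. Before conversion it was stored as character, which suggests it held text answers. The conversion code is not shown, so the cause is a guess. One way to get this result is `as.logical()`, which only recognises spellings such as TRUE/true/T and returns NA for "Yes" or "No". The sketch assumes the raw answers are Yes/No text and recodes them into a factor instead. `recode_yes_no()` is a hypothetical helper.

```r
# Hypothetical helper: map free-text Yes/No answers to a factor
recode_yes_no <- function(x) {
  x <- tolower(trimws(x))
  x[!x %in% c("yes", "no")] <- NA   # anything else (including NA) stays missing
  factor(x, levels = c("no", "yes"), labels = c("No", "Yes"))
}

raw <- c("Yes", "No", "yes ", NA, "Maybe")   # hypothetical raw answers

as.logical(raw)      # all NA: as.logical() does not recognise "Yes"/"No"
recode_yes_no(raw)
```

The same check applies to last_check_up_years and Interval_of_screening. The "NAs introduced by coercion" warnings above show that some of their values could not be converted to numbers.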
## tibble [6,158 × 19] (S3: tbl_df/tbl/data.frame)
##  $ Location                  : chr [1:6158] "-0.2742007 36.058336 1882.2000732421875 20.0" "-0.7158125 37.1475058 1361.9000244140625 20.0" "-0.7158157 37.1475082 1361.9000244140625 20.0" "-0.7157082 37.14749 1361.9000244140625 20.0" ...
##  $ _Location_latitude        : num [1:6158] -0.274 -0.716 -0.716 -0.716 -0.716 ...
##  $ _Location_longitude       : num [1:6158] 36.1 37.1 37.1 37.1 37.1 ...
##  $ _Location_altitude        : num [1:6158] 1882 1362 1362 1362 1362 ...
##  $ _Location_precision       : num [1:6158] 20 20 20 20 20 ...
##  $ Age                       : Factor w/ 5 levels "18-30","31-40",..: 3 1 3 1 3 1 2 1 2 2 ...
##  $ Gender                    : Factor w/ 2 levels "Female","Male": 1 2 1 2 2 1 1 2 1 1 ...
##  $ Marital Status            : Factor w/ 3 levels "Divorced","Married",..: 2 3 2 3 2 3 2 3 2 2 ...
##  $ Children                  : num [1:6158] 2 0 5 2 7 2 2 2 3 2 ...
##  $ Employment Status         : Factor w/ 3 levels "Employed","Self-employed",..: 2 3 2 2 2 3 2 2 1 1 ...
##  $ Monthly Household Income  : Factor w/ 6 levels "10001-20000",..: 2 6 2 1 2 1 1 6 2 1 ...
##  $ Insured                   : chr [1:6158] "Yes" "No" "No" "Yes" ...
##  $ InsuranceName             : chr [1:6158] "Nhif" NA "Nhif" "Nhif" ...
##  $ Last_hospital_visit_months: num [1:6158] 53 8 6 16 13 2 4 24 5 6 ...
##  $ Insured_last_visit        : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 1 2 2 ...
##  $ routine_checkup           : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 1 1 1 1 1 ...
##  $ last_check_up_years       : num [1:6158] 2 1 NA NA NA NA NA NA NA NA ...
##  $ Cancerscreening           : logi [1:6158] NA NA NA NA NA NA ...
##  $ Interval_of_screening     : num [1:6158] 2 NA NA NA NA NA NA NA NA NA ...
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 11 rows containing non-finite values (`stat_bin()`).

summary(Healthcare)
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6158        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2593    1st Qu.:36.38       1st Qu.:1348.9    
##  Mode  :character   Median :-0.7264    Median :36.87       Median :1592.9    
##                     Mean   :-0.7378    Mean   :36.72       Mean   :1536.4    
##                     3rd Qu.:-0.3781    3rd Qu.:37.15       3rd Qu.:1857.6    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :353        NA's   :353         NA's   :353       
##  _Location_precision    Age          Gender      Marital Status
##  Min.   :   0.000    18-30:2183   Female:3034   Divorced: 461  
##  1st Qu.:   4.100    31-40:1777   Male  :3107   Married :3654  
##  Median :   4.820    41-50:1167   NA's  :  17   Single  :2025  
##  Mean   :  71.217    51-60: 618                 NA's    :  18  
##  3rd Qu.:   7.196    60+  : 395                                
##  Max.   :4900.000    NA's :  18                                
##  NA's   :353                                                   
##     Children          Employment Status    Monthly Household Income
##  Min.   : 0.000   Employed     :2177    10001-20000    :1433       
##  1st Qu.: 1.000   Self-employed:1836    20001-30000    :1260       
##  Median : 2.000   Unemployed   :2121    30001-40000    : 122       
##  Mean   : 2.453   NA's         :  24    40001-50000    : 719       
##  3rd Qu.: 3.000                         50001+         : 528       
##  Max.   :13.000                         Less than 10000:1837       
##  NA's   :11                             NA's           : 259       
##    Insured          InsuranceName      Last_hospital_visit_months
##  Length:6158        Length:6158        Min.   :   0.000          
##  Class :character   Class :character   1st Qu.:   2.000          
##  Mode  :character   Mode  :character   Median :   4.000          
##                                        Mean   :   6.652          
##                                        3rd Qu.:   8.000          
##                                        Max.   :2021.000          
##                                        NA's   :158               
##  Insured_last_visit routine_checkup last_check_up_years Cancerscreening
##  No  :3264          No  :4342       Min.   :1.000       Mode:logical   
##  Yes :2838          Yes :1793       1st Qu.:1.000       NA's:6158      
##  NA's:  56          NA's:  23       Median :1.000                      
##                                     Mean   :1.641                      
##                                     3rd Qu.:2.000                      
##                                     Max.   :3.000                      
##                                     NA's   :4636                       
##  Interval_of_screening
##  Min.   :1.000        
##  1st Qu.:1.000        
##  Median :2.000        
##  Mean   :1.859        
##  3rd Qu.:2.000        
##  Max.   :3.000        
##  NA's   :5053
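After cleaning, Children ranges from 0 to 13. Last_hospital_visit_months, however, still has a maximum of 2021 months (about 168 years), which looks like a year entered into a months field. Summaries such as the mean are sensitive to values like this. One option is to set values above a plausibility cap to NA before looking at the distribution of time since the last hospital visit. The cap of 120 months and the toy vector below are assumptions, not values from the survey.

```r
# Toy vector: the first values echo str() above; 2021 stands in for a year typed as months
visits <- c(53, 8, 6, 16, 2021, NA, 4)

# Hypothetical plausibility cap (10 years); the real cap should match what the question allows
max_months <- 120
implausible <- !is.na(visits) & visits > max_months

visits_clean <- replace(visits, implausible, NA)
sum(implausible)   # 1
visits_clean       # 53 8 6 16 NA NA 4
```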

Marital Status

## # A tibble: 4 × 2
##   `Marital Status` Percentage
##   <fct>                 <dbl>
## 1 Married               59.3 
## 2 Single                32.9 
## 3 Divorced               7.49
## 4 <NA>                   0.29
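The denominator in this table includes the 18 missing answers: 18 / 6,158 ≈ 0.29%, and 3,654 / 6,158 ≈ 59.3% for Married. Because each share is rounded, the column sums to 99.98 rather than 100. The code behind the table is not shown. Below is a base-R sketch that builds the same kind of table on a hypothetical factor; dplyr's `count()` followed by `mutate()` is the tidyverse equivalent.

```r
# Hypothetical answers, including one missing value
marital <- factor(c("Married", "Single", "Married", NA, "Divorced", "Married"))

# Count each level and keep NA as its own category
counts <- table(marital, useNA = "ifany")

# Percentages of all respondents, largest first
pct <- sort(round(100 * prop.table(counts), 2), decreasing = TRUE)
pct
```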

Insured

Age Categories Insured

Male to Female Insured

# Keep rows with both Gender and Insured recorded
Healthcare_cleaned <- Healthcare[!is.na(Healthcare$Gender) & !is.na(Healthcare$Insured), ]
ggplot(Healthcare_cleaned, aes(x = Gender, fill = Insured)) +
  geom_bar() +
  labs(title = "Insured vs. Uninsured by Gender", y = "Count") +
  theme_minimal() +
  scale_fill_manual(values = c("Yes" = "green", "No" = "purple")) +
  theme(
    legend.position = "right",
    panel.grid.major = element_blank(),  # Hide major grid lines
    panel.grid.minor = element_blank(),  # Hide minor grid lines
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    plot.title = element_text(hjust = 0.5)  # Center the title
  )
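Because the stacked bars show counts, comparing the insured share between men and women means judging segment heights against bars of slightly different totals. The summary above lists 3,107 men and 3,034 women, before rows with missing Insured are dropped. Row percentages make the comparison direct. The data frame below is a toy example.

```r
# Hypothetical respondents; one has a missing gender
survey <- data.frame(
  Gender  = c("Female", "Female", "Male", "Male", "Male", NA),
  Insured = c("Yes", "No", "Yes", "Yes", "No", "Yes")
)

# Cross-tabulate; table() drops the NA gender by default
tab <- table(survey$Gender, survey$Insured)

# Row percentages: share insured within each gender
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
row_pct
```

In ggplot2, `geom_bar(position = "fill")` scales every bar to a height of 1, which shows the same proportions as a chart.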

Type of insurance

Healthcare_cover <- Healthcare %>%
  group_by(InsuranceName) %>%
  summarise(Count = n())
print(Healthcare_cover)
## # A tibble: 227 × 2
##    InsuranceName   Count
##    <chr>           <int>
##  1 "\r\nNHIF"          3
##  2 ", NHIF"            1
##  3 "."                 3
##  4 "0"                 1
##  5 "1"                 1
##  6 "2"                 2
##  7 "5"                 1
##  8 "800165"            1
##  9 "AAR"              22
## 10 "AAR\r\nBritam"     1
## # ℹ 217 more rows
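The raw InsuranceName values mix letter case and carry stray line breaks and punctuation ("\r\nNHIF", ", NHIF"). They also include non-answers such as ".", "0" and "800165". The cleaning code behind the tables below is not shown. The sketch illustrates one way to normalise such strings and then lump rare names into "Others". Both helpers are hypothetical, and the regular expressions, the name mapping and the `min_count` threshold are assumptions. A real mapping would have to be built from all 227 raw values.

```r
# Hypothetical helper: normalise free-text insurer names
normalise_insurer <- function(x) {
  # Turn line breaks, commas and full stops into spaces, then trim and upper-case
  x <- toupper(trimws(gsub("[\r\n,.]+", " ", x)))
  # Blank or purely numeric answers carry no insurer name
  x[which(x == "" | grepl("^[0-9]+$", x))] <- NA
  # Illustrative mapping of spelling variants to one label each
  x[grepl("^NHIF", x)] <- "NHIF"
  x[grepl("^AAR", x)] <- "AAR"
  x[grepl("^JUBILEE", x)] <- "Jubilee"
  x
}

# Hypothetical helper: lump names seen fewer than min_count times into "Others"
lump_rare <- function(x, min_count) {
  counts <- table(x)                        # NA is not counted
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- "Others"
  x
}

raw <- c("\r\nNHIF", ", NHIF", "Nhif", "AAR", "Aar Insurance", ".", "0", "Jubilee ", NA)
cleaned <- normalise_insurer(raw)
cleaned
lump_rare(cleaned, min_count = 2)
```

This sketch reduces an answer naming two insurers, such as "AAR\r\nBritam", to its first match. Such answers would need splitting into separate values first; tidyr's `separate_rows()` is one option.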

Clean Insurance

## # A tibble: 135 × 2
##    InsuranceName    Count
##    <chr>            <int>
##  1 ""                  19
##  2 "."                  3
##  3 "0"                  1
##  4 "1"                  1
##  5 "2"                  2
##  6 "5"                  1
##  7 "800165"             1
##  8 "Aar"               23
##  9 "Aar Insurance"      2
## 10 "Absa Bank Care"     1
## # ℹ 125 more rows
## # A tibble: 30 × 2
##    InsuranceName Count
##    <chr>         <int>
##  1 NA             2851
##  2 NHIF           2810
##  3 Jubilee         149
##  4 Britam          125
##  5 APA Insurance   108
##  6 CIC              56
##  7 Unknown          28
##  8 AAR              26
##  9 Health Cover     19
## 10 DirectLine       18
## # ℹ 20 more rows
## # A tibble: 12 × 3
##    InsuranceName  Count Percentage
##    <chr>          <int>      <dbl>
##  1 NA              2851     45.6  
##  2 NHIF            2810     44.9  
##  3 Jubilee          149      2.38 
##  4 Britam           125      2.00 
##  5 APA Insurance    108      1.73 
##  6 CIC               56      0.895
##  7 Others            52      0.831
##  8 Unknown           28      0.448
##  9 AAR               26      0.416
## 10 Health Cover      19      0.304
## 11 DirectLine        18      0.288
## 12 UAP OLD MUTUAL    13      0.208
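The counts in this 12-row table sum to 6,255. That matches the number of rows in expanded_dataset, summarised below, rather than the 6,158 respondents; for example, 2,851 / 6,255 ≈ 45.6%. The percentages are therefore shares of rows. Suppose the expansion gives a respondent one row per insurer named, an assumption based on multi-insurer answers such as "AAR\r\nBritam". Then a respondent who lists two insurers counts twice, and a per-respondent share needs a respondent identifier. Since `_id` and `_uuid` were dropped earlier, such an identifier would have to be kept or recreated before expanding the data. The toy sketch below contrasts both denominators; `respondent` is a hypothetical column.

```r
# Hypothetical long-format data: one row per insurer named; respondent 2 named two
rows <- data.frame(
  respondent = c(1, 2, 2, 3, 4),
  insurer    = c("NHIF", "NHIF", "AAR", NA, "Jubilee")
)

# Share of rows: the denominator is nrow(rows) = 5
row_share <- round(100 * table(rows$insurer, useNA = "ifany") / nrow(rows), 1)

# Share of respondents: distinct respondents per insurer divided by 4 respondents
n_respondents <- length(unique(rows$respondent))
resp_share <- round(
  100 * tapply(rows$respondent, rows$insurer, function(id) length(unique(id))) / n_respondents,
  1
)

row_share   # AAR 20, Jubilee 20, NHIF 40, NA 20
resp_share  # AAR 25, Jubilee 25, NHIF 50
```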

Percentage

summary(expanded_dataset)
##    Location         _Location_latitude _Location_longitude _Location_altitude
##  Length:6255        Min.   :-4.0519    Min.   :34.09       Min.   :-201.3    
##  Class :character   1st Qu.:-1.2610    1st Qu.:36.38       1st Qu.:1348.4    
##  Mode  :character   Median :-0.7259    Median :36.87       Median :1592.9    
##                     Mean   :-0.7382    Mean   :36.71       Mean   :1534.6    
##                     3rd Qu.:-0.3783    3rd Qu.:37.15       3rd Qu.:1857.5    
##                     Max.   : 1.8422    Max.   :39.69       Max.   :2988.5    
##                     NA's   :359        NA's   :359         NA's   :359       
##  _Location_precision    Age          Gender      Marital Status
##  Min.   :   0.000    18-30:2200   Female:3079   Divorced: 469  
##  1st Qu.:   4.100    31-40:1803   Male  :3159   Married :3723  
##  Median :   4.820    41-50:1194   NA's  :  17   Single  :2045  
##  Mean   :  71.521    51-60: 635                 NA's    :  18  
##  3rd Qu.:   7.116    60+  : 404                                
##  Max.   :4900.000    NA's :  19                                
##  NA's   :359                                                   
##     Children          Employment Status    Monthly Household Income
##  Min.   : 0.000   Employed     :2241    10001-20000    :1448       
##  1st Qu.: 1.000   Self-employed:1862    20001-30000    :1276       
##  Median : 2.000   Unemployed   :2128    30001-40000    : 126       
##  Mean   : 2.464   NA's         :  24    40001-50000    : 733       
##  3rd Qu.: 3.000                         50001+         : 571       
##  Max.   :13.000                         Less than 10000:1841       
##  NA's   :11                             NA's           : 260       
##    Insured          InsuranceName      Last_hospital_visit_months
##  Length:6255        Length:6255        Min.   :   0.000          
##  Class :character   Class :character   1st Qu.:   2.000          
##  Mode  :character   Mode  :character   Median :   4.000          
##                                        Mean   :   6.663          
##                                        3rd Qu.:   8.000          
##                                        Max.   :2021.000          
##                                        NA's   :159               
##  Insured_last_visit routine_checkup last_check_up_years Cancerscreening
##  No  :3276          No  :4385       Min.   :1.000       Mode:logical   
##  Yes :2923          Yes :1847       1st Qu.:1.000       NA's:6255      
##  NA's:  56          NA's:  23       Median :1.000                      
##                                     Mean   :1.626                      
##                                     3rd Qu.:2.000                      
##                                     Max.   :3.000                      
##                                     NA's   :4683                       
##  Interval_of_screening
##  Min.   :1.000        
##  1st Qu.:1.000        
##  Median :2.000        
##  Mean   :1.858        
##  3rd Qu.:2.000        
##  Max.   :3.000        
##  NA's   :5114

Routine Checkup vs Insured

Employment

Employment Status and Insurance Coverage
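The problem statement asks about associations and disparities. For those questions, a cross-tabulation of employment status against insurance status can be paired with a chi-squared test of independence (base R's `chisq.test()`). The counts below are invented for illustration and are not survey results. A small p-value indicates an association only; it does not adjust for age, income or other factors.

```r
# Invented counts for illustration only, not survey results
survey <- data.frame(
  employment = rep(c("Employed", "Self-employed", "Unemployed"), times = c(40, 30, 30)),
  insured    = c(rep(c("Yes", "No"), times = c(30, 10)),
                 rep(c("Yes", "No"), times = c(15, 15)),
                 rep(c("Yes", "No"), times = c(8, 22)))
)

tab <- table(survey$employment, survey$insured)

# Share insured within each employment group
round(100 * prop.table(tab, margin = 1), 1)

# Chi-squared test of independence between the two variables
test <- chisq.test(tab)
test$p.value
```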