Analyzing and Predicting Heart Disease with Machine Learning

Lukas Kristianto
7 min read · Oct 8, 2022


Discover which factors people should keep an eye on for potential cardiovascular diseases (CVDs), based on historical data.

Summary

This study aims to identify which risk factors for cardiovascular diseases (CVDs) deserve the most attention, based on reported CVD cases collected from several sources: the Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog (Heart) data sets. People with cardiovascular disease, or who are at high cardiovascular risk, need early detection and management, and this is where a machine learning model can be of great help.

Introduction

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age.

The most important behavioural risk factors for heart disease and stroke are an unhealthy diet, physical inactivity, tobacco use, and harmful use of alcohol. The effects of behavioural risk factors may show up in individuals as raised blood pressure, raised blood glucose, raised blood lipids, and overweight and obesity. These “intermediate risk factors” can be measured in primary care facilities and indicate an increased risk of heart attack, stroke, heart failure, and other complications.

This article explores heart failure using a dataset that contains 11 features that can be used to predict possible heart disease. The dataset was created by combining datasets that were already available independently but had not been combined before. In it, 5 heart datasets are combined over 11 common features, which makes it the largest heart disease dataset available so far for research purposes.
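The combination step described above can be illustrated with a minimal sketch, assuming each source is a list of row dictionaries with the same keys in every row. The source names, column names, and values below are hypothetical toy data, not the actual Cleveland, Hungarian, Switzerland, Long Beach VA, or Statlog files, and any extra cleaning the dataset's creator may have done is not shown.

```python
# Sketch: stack rows from several sources, keeping only the columns they all share.
# All source names, columns, and values here are hypothetical toy data.

def combine_on_common_features(sources):
    """Return (common_columns, combined_rows) for a dict of {source_name: rows}."""
    # Columns present in every source (assumes each source has at least one row
    # and that all rows within a source share the same keys)
    common = sorted(set.intersection(*(set(rows[0]) for rows in sources.values())))
    combined = []
    for name, rows in sources.items():
        for row in rows:
            record = {col: row[col] for col in common}  # drop source-specific columns
            record["source"] = name  # remember where each row came from
            combined.append(record)
    return common, combined


sources = {
    "source_a": [{"Age": 63, "Sex": "M", "RestingBP": 145, "ExtraA": 1}],
    "source_b": [{"Age": 41, "Sex": "F", "RestingBP": 130, "ExtraB": 0},
                 {"Age": 57, "Sex": "M", "RestingBP": 140, "ExtraB": 1}],
}
common, combined = combine_on_common_features(sources)
print(common)         # ['Age', 'RestingBP', 'Sex']
print(len(combined))  # 3
```

Keeping only the intersection of columns is what makes the result a single table "over common features"; the trade-off is that any feature recorded by only some sources is discarded.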

This article addresses the following questions:

  • Which factors contribute most to the number of heart disease cases in the world?
  • Are there any regular behaviors that could be helpful to detect potential Heart Disease cases going forward?

The analysis is performed in Python. For detailed steps of the analysis, see here.

Data Features

Here are the features considered for this study.

  • Age (The age of the patient, in years)
  • Sex (The patient’s sex, M or F)
  • Chest pain type (Asymptomatic, Typical Angina, Atypical Angina, or Non-Anginal Pain)
  • Resting BP (The patient’s resting blood pressure, in mmHg)
  • Cholesterol (The patient’s serum cholesterol, in mg/dl)
  • Fasting Blood Sugar (1 if the patient’s fasting blood sugar is above 120 mg/dl, 0 otherwise)
  • Resting ECG (The patient’s resting electrocardiogram result: Normal, ST for ST-T wave abnormality, or LVH for left ventricular hypertrophy)
  • Max Heart Rate (The patient’s maximum heart rate achieved, in beats per minute)
  • Exercise Angina (Exercise-induced angina, Y or N)
  • Oldpeak (The numeric measure of ST depression induced by exercise relative to rest)
  • ST Slope (The slope of the peak exercise ST segment: Up, Flat, or Down)

Findings

Which factors contribute most to the number of heart disease cases in the world?

Pair Plot Heart Disease

From the pair plot, the chances of heart disease appear higher for people in the following categories:

  1. Older people
  2. People with higher resting blood pressure
  3. People with higher cholesterol levels
  4. Older people with high Oldpeak values
  5. Older people with higher cholesterol
  6. People with a lower maximum heart rate (MaxHR)

Next, we look at how chest pain type relates to heart disease.

Chest Pain Bar Chart

The chart above shows that being asymptomatic does not mean a person is free of heart disease risk. As expected, people experiencing atypical angina show a low chance of heart disease. Why is that? The term atypical angina describes chest pain that does not fit the typical presentation. Rather than being heart related, atypical angina is most often brought on by respiratory, musculoskeletal, or gastrointestinal conditions.

Non-anginal pain can be mistaken for heart disease because it is usually felt behind the breastbone and resembles heart pain. However, it is usually caused by muscle or bone problems, lung problems, and sometimes stomach problems such as ulcers. That is why people who experienced non-anginal pain have a lower chance of heart disease. Typical angina is the classic presentation of heart-related chest pain, yet in this group the numbers of people with and without heart disease are close.

Fasting Blood Sugar Chart

Too much blood sugar can contribute to a buildup of plaque in your arteries, which can eventually restrict the amount of blood flowing to your vital organs. Heart disease can occur as a result. The chart above shows that, in our dataset, people with a fasting blood sugar above 120 mg/dl (FastingBS = 1) have a high chance of heart disease.

Correlation Heart Disease

The correlation chart above shows that heart disease is positively correlated with Oldpeak and Age, and negatively correlated with MaxHR.
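For reference, a Pearson correlation coefficient like the ones in a correlation chart can be computed as below. When one variable is the binary heart disease label (0 or 1), Pearson's r is the same as the point-biserial correlation. This sketch assumes the chart used Pearson correlation (the default of pandas' `DataFrame.corr()`), which the article does not state, and the values are hypothetical toy numbers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy values: lower MaxHR goes with HeartDisease = 1
max_hr = [172, 160, 150, 125, 118, 110]
heart_disease = [0, 0, 0, 1, 1, 1]
r = pearson(max_hr, heart_disease)
print(round(r, 2))
```

A negative r, as here, is what "negatively correlated with MaxHR" means: as one value goes up, the other tends to go down.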

Are there any regular behaviors that could be helpful to detect potential Heart Disease cases going forward?

The figure below shows that heart disease cases rise between the ages of 45 and 68.

Heart Disease Frequency for Ages
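The counts behind an age frequency chart like this one can be sketched by grouping positive cases into age bands. The 10-year band width, the record layout, and the column names `Age` and `HeartDisease` are assumptions for illustration, and the records are hypothetical.

```python
from collections import Counter

def cases_by_age_band(records, width=10):
    """Count heart disease cases (HeartDisease == 1) per age band of `width` years."""
    counts = Counter()
    for r in records:
        if r["HeartDisease"] == 1:
            start = (r["Age"] // width) * width  # e.g. 47 -> 40
            counts[start] += 1
    # Label each band like "40-49", in ascending age order
    return {f"{s}-{s + width - 1}": counts[s] for s in sorted(counts)}

# Hypothetical toy records, not the real dataset
records = [{"Age": 38, "HeartDisease": 0}, {"Age": 47, "HeartDisease": 1},
           {"Age": 52, "HeartDisease": 1}, {"Age": 58, "HeartDisease": 1},
           {"Age": 63, "HeartDisease": 1}, {"Age": 71, "HeartDisease": 0}]
print(cases_by_age_band(records))  # {'40-49': 1, '50-59': 2, '60-69': 1}
```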

Looking at sex, most heart disease cases in the dataset occur in males, although females can also have heart disease.

Heart Disease Frequency for Genders

Most patients diagnosed with heart disease experienced asymptomatic chest pain, while the fewest had typical angina. In the data, about 79% of patients with asymptomatic chest pain had heart disease, while the remaining 21% did not.

Heart Disease Frequency for Chest Pains
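Percentages such as "79% of asymptomatic patients had heart disease" are per-category rates of the target label. A minimal sketch of that computation follows; it assumes records with a `ChestPainType` column coded as ASY, ATA, NAP, or TA and a 0/1 `HeartDisease` column, which are assumptions about the file layout, and the records themselves are hypothetical.

```python
from collections import defaultdict

def heart_disease_rate_by(records, feature):
    """Return {category: fraction of records in that category with HeartDisease == 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[feature]] += 1
        positives[r[feature]] += r["HeartDisease"]  # label is 0 or 1
    return {cat: positives[cat] / totals[cat] for cat in totals}

# Hypothetical toy records, not the real dataset
records = [
    {"ChestPainType": "ASY", "HeartDisease": 1},
    {"ChestPainType": "ASY", "HeartDisease": 1},
    {"ChestPainType": "ASY", "HeartDisease": 1},
    {"ChestPainType": "ASY", "HeartDisease": 0},
    {"ChestPainType": "ATA", "HeartDisease": 0},
    {"ChestPainType": "ATA", "HeartDisease": 1},
    {"ChestPainType": "NAP", "HeartDisease": 0},
]
print(heart_disease_rate_by(records, "ChestPainType"))
# {'ASY': 0.75, 'ATA': 0.5, 'NAP': 0.0}
```

With pandas, the equivalent would be `df.groupby("ChestPainType")["HeartDisease"].mean()`, assuming a DataFrame `df` with those column names. The same per-category rate underlies the ST slope, fasting blood sugar, and exercise angina percentages in this section.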

Resting ECG is a test that measures the electrical activity of the heart.

Based on the resting ECG results, a normal result does not help much in detecting heart disease, whereas LVH (left ventricular hypertrophy) and ST (ST-T wave abnormality) results can be a sign of heart disease.

Heart Disease Frequency for Resting ECG

An ST-elevation myocardial infarction (STEMI) is a type of heart attack that mainly affects your heart’s lower chambers. STEMIs are named for how they change the appearance of your heart’s electrical activity on an electrocardiogram (ECG).

When the ST slope is flat, about 83% of patients have heart disease. When the ST slope is upsloping, about 80% of patients in our dataset do not have heart disease.

Heart Disease Frequency for ST Slope

According to our dataset, when fasting blood sugar is at or below 120 mg/dl, about 48% of patients were diagnosed with heart disease.

When fasting blood sugar is above 120 mg/dl, about 79% of patients were diagnosed with heart disease.

Heart Disease Frequency for Fasting Blood Sugar

Exercise-induced angina is chest pain brought on by physical exertion and caused by reduced blood flow to the heart.

When exercise angina is present (Y), about 85% of the patients in our data were diagnosed with heart disease. Without exercise angina, only about 35% of the patients were diagnosed with heart disease.

Heart Disease Frequency for Exercise Angina
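As a quick worked comparison using only the two percentages above, patients with exercise angina were diagnosed with heart disease roughly 2.4 times as often as those without. This is a ratio of the rates observed in this dataset, not a causal estimate.

```latex
\frac{P(\text{HeartDisease} = 1 \mid \text{ExerciseAngina} = Y)}
     {P(\text{HeartDisease} = 1 \mid \text{ExerciseAngina} = N)}
\approx \frac{0.85}{0.35} \approx 2.43
```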

Prediction

We used five prediction models to detect whether a patient has heart disease:

  1. Logistic Regression
  2. Decision Tree
  3. K-Nearest Neighbor
  4. SVM
  5. Random Forest
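To illustrate how one of these models classifies a patient, here is a from-scratch K-Nearest Neighbor sketch: it predicts the majority label among the k training points closest to the new patient. This is not the article's implementation (a library class such as scikit-learn's `KNeighborsClassifier` would typically be used), and the feature choice, scaling, k = 3, and data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # distance from each training point to x
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features already scaled to [0, 1]: (Age, MaxHR, Oldpeak)
train_X = [(0.2, 0.9, 0.0), (0.3, 0.8, 0.1), (0.25, 0.85, 0.0),
           (0.8, 0.3, 0.7), (0.7, 0.4, 0.6), (0.9, 0.2, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = heart disease
print(knn_predict(train_X, train_y, (0.75, 0.35, 0.65)))  # 1
```

Because KNN compares raw distances, features on large scales (such as cholesterol in mg/dl) would dominate unless the features are scaled first, which is why the toy values above are already in [0, 1].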

As the table below shows, Random Forest achieves the best accuracy, at 88%. For more details, see here.

+---------------------+----------+-----------+--------+----------+
| Estimators          | Accuracy | Precision | Recall | F1-Score |
+---------------------+----------+-----------+--------+----------+
| Random Forest       | 0.88     | 0.90      | 0.90   | 0.90     |
| K-Nearest Neighbor  | 0.86     | 0.89      | 0.87   | 0.88     |
| SVM                 | 0.86     | 0.89      | 0.87   | 0.88     |
| Logistic Regression | 0.84     | 0.89      | 0.84   | 0.86     |
| Decision Tree       | 0.81     | 0.86      | 0.81   | 0.84     |
+---------------------+----------+-----------+--------+----------+
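The four columns in the table are standard binary classification metrics. A minimal sketch of how they follow from confusion-matrix counts is below; the counts are hypothetical and are not taken from the article's test set.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of patients predicted positive, share that truly have heart disease
    recall = tp / (tp + fn)     # of patients with heart disease, share the model caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only
m = classification_metrics(tp=90, fp=10, fn=10, tn=70)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 0.89, 'precision': 0.9, 'recall': 0.9, 'f1': 0.9}
```

As a consistency check, the F1 column follows from precision and recall: for K-Nearest Neighbor, 2 × 0.89 × 0.87 / (0.89 + 0.87) ≈ 0.88.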

Conclusion

The number of heart failure patients is increasing every day. To address this dangerous situation and reduce the chances of heart failure, there is a need for a system that can generate rules from, or classify, the data using a machine learning approach. This project tested different models and proposed the best one for predicting heart disease, given a dataset containing the various symptoms used in this project. The model could be further developed into a system that helps doctors and heart surgeons diagnose a patient’s risk of heart attack in a timely manner.

Written by Lukas Kristianto

Senior Software Engineer Android and Artificial Intelligence
