Ashwashhere/MedicalMachineLearning

🩺 Medical Machine Learning: Pneumonia Diagnosis

Evaluating the effectiveness of Machine Learning in diagnosing Pneumonia using clinical tabular data and raw chest X-ray images.


Project Overview

Pneumonia is a severe respiratory infection that affects a significant portion of the global population. This project investigates whether machine learning models can assist clinical and non-clinical staff in diagnosing pneumonia efficiently. The investigation is split into two main approaches:

  • Analyzing tabular clinical data using standard classification algorithms and ensembles.
  • Classifying raw chest X-ray images directly using a Support Vector Machine (SVM).

Data Processing Pipeline

To ensure the models were trained on high-quality data, several preprocessing steps were applied to the raw pneumonia_raw.csv dataset:

  • Data Cleaning: Removed duplicate records and filtered out numerical outliers (e.g., negative values for consolidation dimensions).
  • Categorical Encoding: Converted the categorical target feature into a numerical format using label encoding.
  • Feature Selection: Dropped irrelevant identifiers, such as the Patient ID, based on correlation matrix results.
  • Scaling: Applied Standard Scaling via pipelines to ensure all features had a mean of 0 and a standard deviation of 1.
  • Imbalance Handling: Utilized stratified sampling and the class_weight='balanced' hyperparameter to counteract the higher volume of positive pneumonia cases in the dataset.
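The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in for pneumonia_raw.csv, the column names (`patient_id`, `consolidation_width`, `temperature`, `diagnosis`) are hypothetical, and pairing `class_weight='balanced'` with an RBF SVM is an assumption made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for pneumonia_raw.csv; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "patient_id": np.arange(n),
    "consolidation_width": rng.normal(5.0, 2.0, n),
    "temperature": rng.normal(38.0, 1.0, n),
    "diagnosis": rng.choice(["Pneumonia", "Normal"], size=n, p=[0.7, 0.3]),
})
raw.loc[0, "consolidation_width"] = -1.0                  # inject an invalid value
raw = pd.concat([raw, raw.iloc[:10]], ignore_index=True)  # inject duplicate rows


def clean(df):
    """Drop duplicate rows, negative dimensions, and the identifier column."""
    df = df.drop_duplicates()
    df = df[df["consolidation_width"] >= 0]
    df = df.drop(columns=["patient_id"])
    return df.reset_index(drop=True)


data = clean(raw)

# Label-encode the target; LabelEncoder sorts classes, so Normal=0, Pneumonia=1.
encoder = LabelEncoder()
y = encoder.fit_transform(data["diagnosis"])
X = data.drop(columns=["diagnosis"])

# The stratified split keeps the class ratio the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is fitted on training data only.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", gamma=5, class_weight="balanced"),
)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```

Keeping the scaler inside the pipeline means the test partition never influences the scaling statistics, which matters once the same pipeline is reused under cross-validation.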

Tabular Data Classification

Five standalone models and four ensembles were evaluated using K-Fold Cross Validation and Confusion Matrices to determine the best approach for the tabular clinical data.
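As a rough sketch of this evaluation procedure, the snippet below scores one of the listed models (K-Nearest Neighbors with n_neighbors=7) using cross-validation and builds a confusion matrix from the out-of-fold predictions. The data is synthetic, and the choice of 5 stratified folds is an assumption; the fold count used in the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced binary data standing in for the clinical table.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.35, 0.65], random_state=0
)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))

# Assumed setup: 5 shuffled, stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy value per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Each sample is predicted by the model that did not see it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class

print("Mean accuracy:", scores.mean())
print(cm)
```

Reading the confusion matrix alongside accuracy shows whether errors are mostly false negatives (missed pneumonia) or false positives, which accuracy alone hides.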

Model Performance Summary

Model                        | Hyperparameters Tuned                | Accuracy
Support Vector Machine       | kernel='rbf', gamma=5                | 71.6%
Random Forest (Ensemble 1)   | n_estimators=150, max_depth=5        | 67.2%
Voting Ensemble 2            | estimators = DT, KNN, LR             | 65.5%
K-Nearest Neighbors          | n_neighbors=7                        | 64.7%
Final Ensemble (RF, SVM, LR) | Diverse hyperparameter settings      | 64.7%
Ensemble of Ensembles        | estimators = Ensemble 1 & 2          | 64.7%
Logistic Regression          | solver='liblinear', max_iter=150     | 62.9%
Decision Tree                | max_depth=5, min_samples_split=2     | 60.3%
Gaussian Naive Bayes         | var_smoothing=1e-8                   | 59.5%
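To illustrate how the ensembles in the table could be assembled, here is a minimal sketch using scikit-learn's VotingClassifier with the hyperparameters listed above. The data is synthetic, and the use of hard (majority) voting is an assumption; the voting scheme used in the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the clinical table.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Ensemble 1: Random Forest with the tabulated hyperparameters.
ensemble_1 = RandomForestClassifier(n_estimators=150, max_depth=5, random_state=0)

# Ensemble 2: vote over DT, KNN, and LR (scaling applied where it matters).
ensemble_2 = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, min_samples_split=2, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("lr", make_pipeline(
            StandardScaler(),
            LogisticRegression(solver="liblinear", max_iter=150),
        )),
    ],
    voting="hard",  # assumption: majority vote on predicted labels
)

# Ensemble of ensembles: vote over Ensemble 1 and Ensemble 2.
ensemble_of_ensembles = VotingClassifier(
    estimators=[("e1", ensemble_1), ("e2", ensemble_2)],
    voting="hard",
)

for name, clf in [
    ("Ensemble 1", ensemble_1),
    ("Ensemble 2", ensemble_2),
    ("Ensemble of Ensembles", ensemble_of_ensembles),
]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))

preds = ensemble_of_ensembles.predict(X_test)
```

With only two voters, hard voting has to break ties whenever the members disagree, so a two-member ensemble tends to track its members rather than improve on them; soft voting or a third member would be alternatives worth comparing.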

Raw X-Ray Image Classification

Instead of relying solely on clinical measurements, a second classifier was trained directly on chest X-ray images.

  • Image Processing: Images were loaded, resized to 128x128 pixels, converted to grayscale, and flattened into 1D arrays to be compatible with standard machine learning classifiers.
  • Model Used: Support Vector Machine (Linear Kernel)
  • Result: The model achieved an accuracy of 75.0%, with strong precision and recall reported for both the "Pneumonia" and "Normal" classes. This suggests that supervised learning can extract diagnostic patterns directly from pixel data.
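The image pipeline described above can be sketched as follows. This is an illustration under several assumptions: the images are synthetic random arrays rather than real X-rays, the nearest-neighbor `resize_nearest` helper is a hypothetical stand-in for whatever image library the notebook uses, and grayscale conversion is done by averaging the colour channels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def resize_nearest(img, size=128):
    """Nearest-neighbor resize of an (H, W, C) array to (size, size, C)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows][:, cols]


def to_features(img, size=128):
    """Resize, convert to grayscale, scale to [0, 1], and flatten to 1D."""
    resized = resize_nearest(img, size)
    gray = resized.mean(axis=2)  # assumption: simple channel average
    return (gray / 255.0).ravel()


# Synthetic "images" of varying size; class 1 is made slightly brighter
# so that the toy problem has a learnable signal.
rng = np.random.default_rng(0)
images, labels = [], []
for i in range(40):
    label = i % 2
    h, w = rng.integers(130, 200, size=2)
    img = rng.integers(0, 200, size=(h, w, 3)) + 40 * label
    images.append(img.astype(np.uint8))
    labels.append(label)

X = np.stack([to_features(img) for img in images])  # shape: (40, 128 * 128)
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print("Feature matrix shape:", X.shape)
print("Held-out accuracy:", clf.score(X_test, y_test))
```

Flattening discards the spatial layout of the image, so the linear SVM treats each of the 16,384 pixels as an independent feature; that keeps the approach compatible with standard classifiers at the cost of sensitivity to shifts and rotations.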

Conclusions

Machine learning can assist in diagnosing pneumonia. On the clinical tabular data, the standalone Support Vector Machine performed best (71.6%), while the ensemble methods gave consistent results (64.7% to 67.2%) by combining the strengths of diverse algorithms. Furthermore, the 75.0% accuracy achieved on the raw X-ray dataset indicates that a classifier can analyze radiologic imaging directly, without manual clinical measurements.

References

  • Rajaraman, S., et al. (2020). Efficient pneumonia detection in chest X-ray images using deep learning. BMC Medical Imaging.
  • Rajpurkar, P., et al. (2017). CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint.
  • Pan, Z., et al. (2023). Diagnosis and detection of pneumonia using weak-label based on X-ray images. BMC Medical Imaging.

Requirements

To run this notebook, you will need:

  • Python 3
  • Libraries: pandas, numpy, scikit-learn
  • Platform: Optimized for Google Colab.

About

This project is a machine learning investigation focused on predicting the presence of pneumonia based on patient clinical data and X-ray features. The work is structured as a comparative analysis where multiple classification models are built, evaluated, and optimized to find the most accurate diagnostic tool.
