🩺 Medical Machine Learning: Pneumonia Diagnosis

Evaluating the effectiveness of Machine Learning in diagnosing Pneumonia using clinical tabular data and raw chest X-ray images.

Project Overview

Pneumonia is a severe respiratory infection that affects a significant portion of the global population. This project investigates whether machine learning models can assist clinical and non-clinical staff in diagnosing pneumonia efficiently. The investigation is split into two main approaches:

Analyzing tabular clinical data using standard classification algorithms and ensembles.
Classifying raw chest X-ray images directly using a Support Vector Machine (SVM).

Data Processing Pipeline

To ensure the models were trained on high-quality data, several preprocessing steps were applied to the raw pneumonia_raw.csv dataset:

Data Cleaning: Removed duplicate records and filtered out invalid or outlying numerical values (e.g., negative values for consolidation dimensions).
Categorical Encoding: Converted the categorical target feature into a numerical format using label encoding.
Feature Selection: Dropped irrelevant identifiers, such as the Patient ID, based on correlation matrix results.
Scaling: Applied standard scaling inside pipelines so that each feature has a mean of 0 and a standard deviation of 1 on the training data.
Imbalance Handling: Utilized stratified sampling and the class_weight='balanced' hyperparameter to counteract the higher volume of positive pneumonia cases in the dataset.
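
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (patient_id, consolidation_width, temperature, diagnosis) and the synthetic data are hypothetical stand-ins for pneumonia_raw.csv, and Logistic Regression is used only as an example of a classifier that accepts class_weight='balanced'.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for pneumonia_raw.csv; the real column names may differ.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "patient_id": np.arange(n),
    "consolidation_width": rng.uniform(5, 60, n),
    "temperature": rng.uniform(36, 40, n),
    "diagnosis": rng.choice(["Pneumonia", "Normal"], size=n, p=[0.65, 0.35]),
})
raw.loc[0, "consolidation_width"] = -5.0                   # an invalid negative value
raw = pd.concat([raw, raw.iloc[[1]]], ignore_index=True)   # a duplicate record


def clean(df):
    """Drop duplicates, rows with negative dimensions, and the identifier column."""
    df = df.drop_duplicates()
    df = df[df["consolidation_width"] >= 0]
    return df.drop(columns=["patient_id"])


df = clean(raw)

# Label-encode the categorical target (classes are sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(df["diagnosis"])
X = df.drop(columns=["diagnosis"])

# Stratified split keeps the class ratio the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is fitted on training data only.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", max_iter=150, class_weight="balanced"),
)
model.fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))
```

Keeping the scaler inside the pipeline means its mean and standard deviation are never computed from test rows, which avoids leaking information into the evaluation.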

Tabular Data Classification

Five standalone models and four ensembles were evaluated using K-Fold Cross Validation and Confusion Matrices to determine the best approach for the tabular clinical data.
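
As a rough sketch of this evaluation procedure, the snippet below cross-validates one model and builds a confusion matrix from the out-of-fold predictions. The synthetic dataset, the number of folds (5), and the use of a stratified splitter are assumptions for illustration; only the SVM settings (kernel='rbf', gamma=5) come from the summary table.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, mildly imbalanced data standing in for the cleaned clinical table.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.35, 0.65], random_state=0
)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5))

# Assumed setup: 5 shuffled, stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Out-of-fold predictions: every sample is predicted by a model that never saw it.
y_pred = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_pred)  # rows = true class, columns = predicted class

print("Fold accuracies:", np.round(scores, 3))
print("Mean accuracy:", round(scores.mean(), 3))
print("Confusion matrix:")
print(cm)
```

The confusion matrix complements accuracy here because, with more positive than negative cases, a model can score well overall while still missing most of the minority class.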

Model Performance Summary

Model	Hyperparameters Tuned	Accuracy
Support Vector Machine	kernel='rbf', gamma=5	71.6%
Random Forest (Ensemble 1)	n_estimators=150, max_depth=5	67.2%
Voting Ensemble 2	estimators = DT, KNN, LR	65.5%
K-Nearest Neighbors	n_neighbors=7	64.7%
Final Ensemble (RF, SVM, LR)	Diverse hyperparameter settings	64.7%
Ensemble of Ensembles	estimators = Ensemble 1 & 2	64.7%
Logistic Regression	solver='liblinear', max_iter=150	62.9%
Decision Tree	max_depth=5, min_samples_split=2	60.3%
Gaussian Naive Bayes	var_smoothing=1e-8	59.5%
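
For orientation, the ensembles in the table could be assembled along the following lines. The hyperparameters are taken from the table, but the voting mode (hard voting), the scaling of the KNN and Logistic Regression members, the train/test split, and the synthetic data are all assumptions made for this sketch, so the printed accuracies will not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the cleaned clinical table.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Ensemble 1: Random Forest with the tabulated hyperparameters.
ensemble_1 = RandomForestClassifier(n_estimators=150, max_depth=5, random_state=42)

# Ensemble 2: majority vote over Decision Tree, KNN and Logistic Regression.
ensemble_2 = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(max_depth=5, min_samples_split=2, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("lr", make_pipeline(
            StandardScaler(),
            LogisticRegression(solver="liblinear", max_iter=150),
        )),
    ],
    voting="hard",  # assumption: the README does not state hard vs. soft voting
)

# Ensemble of ensembles: a vote over Ensemble 1 and Ensemble 2.
ensemble_of_ensembles = VotingClassifier(
    estimators=[("ensemble_1", ensemble_1), ("ensemble_2", ensemble_2)],
    voting="hard",
)

models = {
    "Random Forest (Ensemble 1)": ensemble_1,
    "Voting Ensemble 2": ensemble_2,
    "Ensemble of Ensembles": ensemble_of_ensembles,
}
accuracies = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    accuracies[name] = estimator.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

One design point worth noting: with only two voters and hard voting, a disagreement is a tie, and scikit-learn resolves ties in favour of the lower class label, so a two-member vote behaves quite differently from a three-member one.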

Raw X-Ray Image Classification

Instead of relying solely on clinical measurements, a classifier was trained to analyze chest X-ray images directly.

Image Processing: Images were loaded, resized to 128x128 pixels, converted to grayscale, and flattened into 1D arrays to be compatible with standard machine learning classifiers.
Model Used: Support Vector Machine (Linear Kernel)
Result: The model achieved an accuracy of 75.0%. It demonstrated strong precision and recall across both the "Pneumonia" and "Normal" classes, indicating that a supervised classifier can extract diagnostic patterns directly from pixel data.
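
The image pipeline described above can be illustrated with the sketch below. It assumes Pillow for image handling, which is not in the README's library list, so the notebook may well use a different loader. The helper names (image_to_vector, fake_xray) and the random "X-rays" are hypothetical and exist only to make the example self-contained.

```python
import numpy as np
from PIL import Image
from sklearn.svm import SVC

IMAGE_SIZE = (128, 128)


def image_to_vector(img):
    """Resize to 128x128, convert to grayscale, and flatten to a 1D array."""
    gray = img.resize(IMAGE_SIZE).convert("L")
    return np.asarray(gray, dtype=np.float32).ravel()


rng = np.random.default_rng(0)


def fake_xray(bright):
    """Hypothetical stand-in for a loaded chest X-ray (random RGB noise)."""
    low, high = (100, 256) if bright else (0, 156)
    pixels = rng.integers(low, high, size=(160, 200, 3), dtype=np.uint8)
    return Image.fromarray(pixels)


# With real data, each image would instead come from Image.open(path).
images = [fake_xray(bright=False) for _ in range(20)]
images += [fake_xray(bright=True) for _ in range(20)]
labels = ["Normal"] * 20 + ["Pneumonia"] * 20

# One row per image, one column per pixel: shape (40, 16384).
X = np.stack([image_to_vector(img) for img in images])

clf = SVC(kernel="linear")
clf.fit(X, labels)
print("Feature matrix shape:", X.shape)
print("Prediction for a new image:", clf.predict([image_to_vector(fake_xray(True))])[0])
```

Flattening discards the spatial layout of the pixels, which is what makes the images usable with a standard classifier at the cost of treating each of the 16,384 pixels as an independent feature.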

Conclusions

Machine learning can assist in diagnosing pneumonia. On the clinical tabular data, the standalone Support Vector Machine performed best (71.6%), while the ensembles, which combine diverse algorithms, reached between 64.7% and 67.2%. Furthermore, the 75.0% accuracy achieved on the raw X-ray dataset indicates that a classifier can analyze radiologic imaging directly, without relying on manual clinical measurements.

Requirements

To run this notebook, you will need:

Python 3
Libraries: pandas, numpy, scikit-learn
Platform: Optimized for Google Colab.
