Classification of Cell Type Using NN on scRNA-seq Data


Single-cell RNA-seq (scRNA-seq) has many applications, and the complexity of the data raises a number of computational challenges. In this project, we will work directly with scRNA-seq data in a real-world application.

Background on the Dataset

Peripheral blood mononuclear cells (PBMCs) are a group of cell types found in our blood, a subset of what we call white blood cells. They include many important parts of our immune system: T cells, B cells, natural killer cells, and so on. These cells are responsible for our innate and adaptive immune responses. When we get infected or vaccinated, they kick into high gear. In different situations, or with different diseases, the mix of cell types shifts, and so does the gene expression within a given cell type. PBMCs can be isolated easily from patient blood samples by spinning the blood in a centrifuge, which separates cell types by density. Because these cells are both easy to obtain and responsive to infection and disease, looking at them can be very informative.

Part I. Autoencoder

Starting from scRNA-seq data from PBMCs, we will look at the cells in a lower-dimensional space. We will first implement an autoencoder to find a latent space representation of our data. Then, we will compare two-dimensional representations of our data obtained with t-SNE, PCA, and the latent space defined by our autoencoder.
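
To make the idea of a latent space concrete, here is a minimal sketch of an autoencoder with a two-dimensional bottleneck, written in plain NumPy. It is not the notebook's implementation: the data are synthetic (a toy cells-by-genes matrix with three made-up "cell types"), and the layer sizes, the tanh activations, the Adam optimizer settings, and all function names (`make_toy_expression`, `train_autoencoder`, `pca_2d`) are assumptions chosen for illustration. A PCA projection computed by SVD is included for comparison.

```python
import numpy as np

# Layers 0 and 2 (the two hidden layers) use tanh; the latent layer (1)
# and the output layer (3) are linear. This layout is an assumption.
TANH_LAYERS = {0, 2}


def make_toy_expression(n_cells=300, n_genes=50, n_types=3, seed=0):
    """Synthetic stand-in for a normalized cells-by-genes matrix (not PBMC data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 2.0, size=(n_types, n_genes))
    labels = rng.integers(0, n_types, size=n_cells)
    X = centers[labels] + rng.normal(0.0, 0.5, size=(n_cells, n_genes))
    return (X - X.mean(axis=0)) / X.std(axis=0), labels


def forward(X, params):
    """Return the activations of every layer; index 2 is the latent code."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        a = acts[-1] @ W + b
        acts.append(np.tanh(a) if i in TANH_LAYERS else a)
    return acts


def train_autoencoder(X, n_hidden=16, n_latent=2, lr=0.01, epochs=1000, seed=0):
    """Minimize mean squared reconstruction error with full-batch Adam."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], n_hidden, n_latent, n_hidden, X.shape[1]]
    params = [[rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n)]
              for m, n in zip(sizes[:-1], sizes[1:])]
    moment1 = [[np.zeros_like(p) for p in layer] for layer in params]
    moment2 = [[np.zeros_like(p) for p in layer] for layer in params]
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for step in range(1, epochs + 1):
        acts = forward(X, params)
        err = acts[-1] - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the gradient of the mean squared error.
        delta = 2.0 * err / err.size
        grads = [None] * len(params)
        for i in reversed(range(len(params))):
            if i in TANH_LAYERS:
                delta = delta * (1.0 - acts[i + 1] ** 2)
            grads[i] = [acts[i].T @ delta, delta.sum(axis=0)]
            delta = delta @ params[i][0].T
        # Adam update for every weight matrix and bias vector.
        for i in range(len(params)):
            for j in range(2):
                g = grads[i][j]
                moment1[i][j] = beta1 * moment1[i][j] + (1 - beta1) * g
                moment2[i][j] = beta2 * moment2[i][j] + (1 - beta2) * g * g
                m_hat = moment1[i][j] / (1 - beta1 ** step)
                v_hat = moment2[i][j] / (1 - beta2 ** step)
                params[i][j] = params[i][j] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses


def pca_2d(X):
    """Project the centered data onto its first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


X, labels = make_toy_expression()
params, losses = train_autoencoder(X)
Z_ae = forward(X, params)[2]   # 2-D latent coordinates from the autoencoder
Z_pca = pca_2d(X)              # 2-D coordinates from PCA
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Both `Z_ae` and `Z_pca` can be drawn as scatter plots colored by `labels` to compare how well each representation separates the groups. A t-SNE embedding for the same comparison could be obtained with, for example, scikit-learn's `TSNE(n_components=2).fit_transform(X)`.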

Final_Project_Part_1_Autoencoder

Part II. Classification

Next, we will classify each cell by cell type, as labeled in the original data. We will try a few different classifiers and compare the accuracy we can achieve with each of them. Specifically, we will use SVC with AdaBoostClassifier and compare it to Random Forest.
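
The comparison described above can be sketched with scikit-learn as follows. This is only one reading of "SVC with AdaBoostClassifier" (an SVC used as the boosted base estimator), and it is not the notebook's code: the data come from `make_classification` rather than the PBMC dataset, and the kernel, the number of estimators, the train/test split, and the random seeds are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for (cells x features, cell-type labels); not PBMC data.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=10,
    n_classes=3, class_sep=1.5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn versions. probability=True makes the SVC
    # expose predict_proba, which the default boosting algorithm of some
    # scikit-learn versions requires.
    "AdaBoost(SVC)": AdaBoostClassifier(
        SVC(kernel="linear", probability=True, random_state=0),
        n_estimators=10,
        random_state=0,
    ),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

Scoring on a held-out split, as done here, keeps the accuracy estimate from being inflated by cells the classifier has already seen. With real data, cross-validation would give a more stable comparison between the two methods.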

Final_Project_Part_2_Classification