Effect of Dropout on RNA Classification by CNN
DOI: https://doi.org/10.55632/pwvas.v94i1.931
Keywords: Machine learning
Abstract
Long RNA sequences can be classified as long non-coding RNA (lncRNA) or protein-coding messenger RNA (mRNA). Automatic classification, based on sequence alone, could benefit biology and medical science. We trained and evaluated a convolutional neural network (CNN) to classify human RNA sequences. The CNN incorporated dropout, a technique that restricts the network to a random portion of its neurons during training. Dropout can reduce overfitting, in which a network relies on irrelevant aspects of the data to “memorize” the training set. We varied the dropout rate during training and measured the accuracy during testing for an RNA classification task. At dropout rates of 0.5, 0.6, 0.7, and 0.8, the CNN test accuracy was 93.15%, 90.50%, 89.95%, and 88.35%, respectively. We conclude that dropout rates above 0.5 did not improve learning. In the future, we hope to measure the effects of other hyperparameters and models for this classification task.
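The dropout mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not say which framework or dropout variant the authors used, so the following is an assumption: a plain-Python version of "inverted" dropout, the variant common in modern libraries, where each unit is zeroed with probability equal to the dropout rate and the surviving units are rescaled. The function name `dropout` and the toy layer output are hypothetical and not taken from the paper.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout (illustrative sketch, not the authors' code).

    Each activation is zeroed with probability `rate`; survivors are
    divided by (1 - rate) so the expected activation is unchanged.
    At test time (training=False) the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Toy layer output: 10,000 units that all emit 1.0 (hypothetical data).
rng = random.Random(0)
layer_output = [1.0] * 10_000

# The four dropout rates reported in the abstract.
for rate in (0.5, 0.6, 0.7, 0.8):
    dropped = dropout(layer_output, rate, rng)
    zero_fraction = sum(1 for v in dropped if v == 0.0) / len(dropped)
    mean_activation = sum(dropped) / len(dropped)
    print(f"rate={rate:.1f}  zeroed={zero_fraction:.3f}  mean={mean_activation:.3f}")
```

In this sketch, a rate of 0.8 leaves only about one unit in five active on each training step, which gives some intuition for why very high rates may leave the network too little capacity to learn from each example.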
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Proceedings of the West Virginia Academy of Science applies the Creative Commons Attribution-NonCommercial (CC BY-NC) license to works we publish. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.