BIGDATA: IA: Collaborative Research:
Parsimonious Anomaly Detection for Sequencing Data

Principal Investigators:
Roummel Marcia, University of California, Merced
Suzanne Sindi, University of California, Merced
Jennifer Erway, Wake Forest University
Supported by NSF Grants 1741490 and 1741264


Abstract. Genomes contain the complete set of instructions for building an organism. Structural variants are rearrangements in the genome, such as insertions and deletions, whose discovery advances the understanding of the evolution and adaptability of species. Recent advances in high-throughput sequencing technologies have led to the collection of vast quantities of genomic data. As a result, fast and robust algorithms are needed to identify structural variants, which are rare and whose detection is prone to noise. This research will contribute fundamentally to optimization methods for large-scale problems in computational genomics. The algorithms will be disseminated publicly for use within and outside the biology, mathematics, and computer science communities. Graduate students will be trained in scientific research and programming through this interdisciplinary research, and the participation of students from under-represented backgrounds will be highly encouraged.

The research objective of this award is to develop computational tools for large-scale data-driven problems arising in computational genomics. These problems are especially difficult to solve because they are high-dimensional and the data are noisy and inexact. This study will take advantage of known relationships in sequenced genomes to improve the accuracy of identifying genomic variants in population studies in which the data have low coverage and multiple related individuals are sequenced. Specifically, the proposed research will (i) explore statistical models for describing the presence of structural variants in genomes, (ii) develop and implement novel sparse optimization methods for genomic structural variant detection, and (iii) validate the methods on existing genomic data sets and use them to make predictions on new data.

Publications. This research grant has thus far resulted in the following articles:

[20] Deep convolutional autoencoders for deblurring and denoising low-resolution images,
       M. Mendez Jimenez, O. DeGuchy, and R. Marcia,
       Accepted to the 2020 International Symposium on Information Theory and Its Applications.

[19] Related inference: A supervised learning approach to detect signal variation in genome data,
       M. Banuelos, O. DeGuchy, S. Sindi, and R. Marcia,
       Accepted to the 2020 European Signal Processing Conference.

[18] Genomic signal processing for variant detection in diploid parent-child trios,
       M. Banuelos, M. Spence, R. Marcia, and S. Sindi,
       Accepted to the 2020 European Signal Processing Conference.

[17] Large-scale quasi-Newton trust-region methods with low-dimensional linear equality constraints,
       J. Brust, R. Marcia, and C. Petra,
       Computational Optimization and Applications, 74:3, pp. 669-701, 2019.
       [doi]

[16] Detecting inherited and novel structural variants in low-coverage parent-child sequencing data,
       M. Spence, M. Banuelos, R. Marcia, and S. Sindi,
       Methods, 173, pp. 61-68, 2020.
       [doi]

[15] Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite
       Hessian Approximations,
       J. Erway, J. Griffin, R. Marcia, and R. Omheni,
       Optimization Methods and Software, 35:3, pp. 460-487, 2020.
       [doi]

[14] Computationally efficient decompositions of oblique projection matrices,
       J. Brust, R. Marcia, and C. Petra,
       SIAM Journal on Matrix Analysis and Applications, 41:2, pp. 852-870, 2020.
       [doi]

[13] Image disambiguation with deep neural networks,
       O. DeGuchy, A. Ho, and R. Marcia,
       Proceedings of the SPIE Applications of Machine Learning in San Diego, CA.
       [doi]

[12] Asynchronous parallel pattern search methods for parameter tuning in sparse signal reconstruction,
       O. DeGuchy and R. Marcia,
       Proceedings of the 2019 SPIE Wavelets and Sparsity XVIII in San Diego, CA.
       [doi]

[11] Predicting novel and inherited variants in parent-child trios,
       M. Spence, M. Banuelos, R. Marcia, and S. Sindi,
       Proceedings of the 2019 IEEE International Symposium on Medical Measurements and Applications in
       Istanbul, Turkey.
       [doi]

[9] Deep neural networks for low-resolution photon-limited imaging,
       O. DeGuchy, F. Santiago, M. Banuelos, and R. Marcia,
       Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing in
       Brighton, UK.
       [doi]

[8] A dense initialization for limited-memory quasi-Newton methods,
       J. Brust, O. Burdakov, J. Erway, and R. Marcia,
       Computational Optimization and Applications, 74:1, pp. 121-142, 2019.
       [doi]

[7] Negative binomial optimization for biomedical structural variant signal reconstruction,
       M. Banuelos, S. Sindi, and R. Marcia,
       Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing in
       Calgary, Canada.
       [doi]

[6] Structural variant prediction in extended pedigrees through sparse negative binomial genome signal recovery,
       M. Banuelos, S. Sindi, and R. Marcia,
       Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology
       Society in Honolulu, HI.
       [doi]

[4] Improving L-BFGS initialization for trust-region methods in deep learning,
       J. Rafati and R. Marcia,
       Proceedings of the 2018 IEEE Conference on Machine Learning and Applications in Orlando, FL.
       [doi]

[3] Detecting novel structural variants in genomes by leveraging parent-child relatedness,
       M. Spence, M. Banuelos, R. Marcia, and S. Sindi,
       Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine in Madrid, Spain.
       [doi]

[2] Trust-Region Minimization Algorithm for Training Responses (TRMinATR): The Rise of Machine
       Learning Techniques,
       J. Rafati, O. DeGuchy, and R. Marcia,
       Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018) in Rome, Italy.
       [doi]

[1] Compact representation of the full Broyden class of quasi-Newton updates,
       O. DeGuchy, J. Erway, and R. Marcia,
       Numerical Linear Algebra with Applications, 25:5, p. e2186, 2018.
       [doi]

Presentations. The results of this grant have been presented at the following conferences, workshops, and seminars:

2019 International Congress on Industrial and Applied Mathematics, Valencia, Spain, July 2019 (link)
2019 IEEE International Symposium on Medical Measurements and Applications, Istanbul, Turkey, June 2019 (link)
2019 Spring Central and Western Joint Sectional Meeting, Honolulu, HI, March 2019 (link)
2019 SIAM Conference on Computational Science and Engineering, Spokane, WA, February 2019 (link)
2018 International Congress of Mathematicians, Rio de Janeiro, Brazil, August 2018 (link)
40th Annual Intl. Conf. of the IEEE Engineering in Medicine and Biology Society, Honolulu, HI, July 2018 (link)
2018 SIAM Annual Meeting, Portland, OR, July 2018 (link)
2018 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Calgary, Alberta, Canada, April 2018 (link)
2017 Optimization Methods and Software Conference, Havana, Cuba, December 2017 (link)

Any opinions, findings and conclusions or recommendations expressed in the publications supported by this grant are those of the author(s) and do not necessarily reflect the views of the NSF.