principal software engineer @ red hat
- 247 1166 (area code == 520)
- Developed, reviewed, and maintained code in the ASF as a PMC member and committer: Apache BigTop (the open source Hadoop distribution) and Apache CTakes (the medical text analytics framework).
- Engineered and maintained large components of the ASF BigTop deployment, integration, and smoke testing frameworks; the BigPetStore application; distributing and containerizing CTakes using Spark; and the HCFS initiative (https://wiki.apache.org/hadoop/HCFS/).
At Red Hat I work at the nexus of various emerging technologies, including PaaS platforms, distributed filesystems, and batch analytics platforms (corresponding buzzwords = kubernetes, glusterfs, ceph, spark, hadoop). Collaborated across multiple teams and communities, building open source as well as internal IP and consulting around the areas of immutable infrastructure, machine learning, data mining, and CI/CD.
- Distributed systems engineer; contributor and maintainer on the Kubernetes platform (mostly golang). Contributed core features for HTTP client performance, high availability tooling, developer automation, and E2E testing suites; led and/or participated in several SIGs with the community.
- Implemented several optimizations and improvements to the Kubernetes scheduler, including scale, performance, and cache optimizations that are only discoverable in large deployments (clusters of 1000s of nodes/pods).
- Mentored and trained several new developers on idiomatic Kubernetes development and community practices, golang tooling, and Kubernetes framework engineering internals.
- Worked in several ASF projects and communities on behalf of Red Hat to increase our interop with open source big data projects, particularly in the Hadoop ecosystem (specifically BigTop and Apache Hadoop). Also wrote open source blueprints for internal evaluation and testing of Flink, Spark, and Hadoop.
- Worked at the intersection of our interests in middleware, scalability, and bigdata, building automation and POCs around containerized I/O between big data frameworks (e.g. Spark) and middleware data abstractions (Teiid/JDV).
- Implemented proofs of concept for Spark and Cassandra on Kubernetes; maintained upstream end-to-end compatibility in the Kubernetes community for bigdata framework validation.
- Built one of the first containerized, generic SparkStreaming blueprint applications (forked and used by 100s of individual developers) and other blueprint applications for bootstrapping bigdata workflows.
- Mentored and trained developers on integration testing for the Hadoop ecosystem, microservice architectures, reproducible deployments with Vagrant, and internal cloud usage. Also spent a large amount of time on development tooling and on engineering higher quality internal application blueprints and deployments.
- Founded a boutique consulting firm, Rudolf Inc. Managed all financial affairs and contracts; presented and delivered technical solutions in the bigdata and web-data spaces to various multi-million dollar companies in Europe and the United States.
- Reported directly to the Head of Research and CTO @peerindex on the BigData pipeline, its status, and the implementation of architecture changes.
- Engineered the Java and Hadoop MapReduce pipeline for the PISAE (Peerindex social action engine) platform. Developed our backend MapReduce Java systems, largely based on ingestion and feature extraction; real-time streaming of tweets from high profile social entities; EMR and Hadoop administration on an as-needed basis for 200+ node clusters.
- Coordinated the launch of our new site, which was driven entirely by a key-value store (DynamoDB) and HDFS data backend with millisecond latency SLAs.
- Implemented algorithms alongside data scientists, engineering directors, and the front-end team to ensure that data quality standards for 100s of millions of outputs were always met using a rock solid data model contract.
- Reverse engineered UML designs for various specifications in accountancy systems for clients.
- Designed custom LISP/Python applications for ensuring transaction completeness in large, distributed MySQL systems.
- Evaluated e-commerce solutions (e.g. Google Checkout, PayPal) with home grown product inventory and management systems (seikelceramics.com).
- Full stack development, rapid software prototyping, informatics consulting, and helping individuals get up to speed with data mining and machine learning related initiatives.
- Bioinformatics consulting with UNLV's Biomedical Sciences department on protein analytics and literature mining software.
- Developer/designer of a highly interactive client side application for a fully integrated molecular visualization platform (VENN), which allows for Jmol based 3D analysis of protein evolutionary conservation (see Publications). Created a cross platform VM for NMR data processing (Vagrant, VirtualBox, and Ubuntu).
- Deployed plug-in based modules for implementation of molecular analysis algorithms in a single computational proteomics framework. Optimized and changed features of the application to suit emergent needs, such as protein domain oriented analyses.
- Data warehousing/modelling of functional minimotifs: designed a Java based Hibernate API and MySQL data repository which integrated proteomic information spanning protein motifs, functional annotations, taxonomy data, sequences, and protein domains.
- Architected an Expert System (MIMOSA) which automatically implemented various text mining and correlation scoring algorithms using ontologies for 5,000,000 publicly available medical abstracts. Developed a plug-in oriented Graphical User Interface using the Java Swing framework which enabled database driven, high throughput annotation of "minimotifs".
- Collaborated on several database aspects of the NIH funded, publicly available, web based Minimotif miner application (http://mnm.engr.uconn.edu). Engineered Java APIs for reading/writing of large, binary FID representations present in vendor-specific NMR spectral data types in support of an open-source translation and conversion API.
- Prototyped a Swing-based workflow building environment for time domain NMR data processing: a custom, visual, 2D graphical application which allowed for on the fly creation of data-processing "actors" with reloadable and persistent state, and of associated workflows which triggered offline data processing tasks.
- Designed and architected an integrated NMR visual data mining platform as part of the Rudolf project, using Clojure to wrap existing Java APIs.
- Presented work at premier scientific conferences (ICBMRS, Protein Folding Symposium, New England Structural Biology, Exp Nuclear Magnetic processing conference).
Proteomics data federation apps; published several (10+) articles in medium/high tier journals on algorithmic and visualization advances in data integration of medical, biological, and genomic data. Computational protein analysis (structure, sequence, and functional analysis); workflow builders for NMR data processing and annotation of peptide sequences with respect to phenotypes and medical conditions.
Created an NMR software integration environment for protein structure calculation and data processing. This led to a PhD and several other adventures at the interface of data visualization, integration, and bioinformatics.
Mathematics (major) & Computer Science (minor).
A list of my publications: My primary research interests during my academic career were in data integration and knowledge extraction, and I was the primary developer and architect for the VENN Software platform for homology titration (Nucleic Acids Research, 2009) and the MIMOSA System for minimotif annotation (BMC Bioinformatics, 2009), which are also listed below. Another interesting project I engineered was an interactive, data-driven protein mining application, which was used to isolate the point in evolution of certain sporulating bacteria (J. Bac, 2011).
Article: A Domain-Driven, Generative Data Model for Big Pet Store http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7034765. RJ Nowling and Jay Vyas.
Article: A Pipeline Software Architecture for NMR Spectrum Data Translation. Heidi J C Ellis, Gerard Weatherby, Ronald J Nowling, Jay Vyas, Matthew Fenwick, Michael R Gryk Computing in Science and Engineering 05/2012; 15(1):76-83. · 1.25 Impact Factor
Conference Paper: An Open-Source Sandbox for Increasing the Accessibility of Functional Programming to the Bioinformatics and Scientific Communities M. Fenwick, C. Sesanker, M.R. Schiller, H.J.C. Ellis, M.L. Hinman, J. Vyas, M.R. Gryk Information Technology: New Generations (ITNG), 2012 Ninth International Conference on; 01/2012
Article: HIVToolbox, an integrated web application for investigating HIV. David Sargeant, Sandeep Deverasetty, Yang Luo, Angel Villahoz Baleta, Stephanie Zobrist, Viraj Rathnayake, Jacqueline C Russo, Jay Vyas, Mark A Muesing, Martin R Schiller PLoS ONE 05/2011; 6(5):e20122. · 3.53 Impact Factor
Conference Paper: CONNJUR Workflow Builder: Open Source software for spectral reconstruction of NMR data Weatherby G, Vyas J, Nowling RJ, Heidi J.C. Ellis, Gryk MR 52nd Experimental Nuclear Magnetic Resonance Conference; 04/2011
Article: Iterative Development of an Application to Support Nuclear Magnetic Resonance Data Analysis of Proteins. Heidi J C Ellis, Ronald J Nowling, Jay Vyas, Timothy O Martyn, Michael R Gryk Proceedings of the ... International Conference on Information Technology: New Generations. International Conference on Information Technology: New Generations. 04/2011;
Article: CONNJUR spectrum translator: an open source application for reformatting NMR spectral data. Ronald J Nowling, Jay Vyas, Gerard Weatherby, Matthew W Fenwick, Heidi J C Ellis, Michael R Gryk Journal of Biomolecular NMR 03/2011; 50(1):83-9. · 3.31 Impact Factor
Article: Extremely variable conservation of γ-type small, acid-soluble proteins from spores of some species in the bacterial order Bacillales. Jay Vyas, Jesse Cox, Barbara Setlow, William H Coleman, Peter Setlow Journal of bacteriology 02/2011; 193(8):1884-92. · 2.69 Impact Factor
Article: SciReader enables reading of medical content with instantaneous definitions. Patrick R Gradie, Megan Litster, Rinu Thomas, Jay Vyas, Martin R Schiller BMC Medical Informatics and Decision Making 01/2011; 11:4. · 1.50 Impact Factor
Article: A computational tool for identifying minimotifs in protein-protein interactions and improving the accuracy of minimotif predictions. Sanguthevar Rajasekaran, Jerlin Camilus Merlin, Vamsi Kundeti, Tian Mi, Aaron Oommen, Jay Vyas, Izua Alaniz, Keith Chung, Farah Chowdhury, Sandeep Deverasatty, Tenisha M Irvey, David Lacambacal, Darlene Lara, Subhasree Panchangam, Viraj Rathnayake, Paula Watts, Martin R Schiller Proteins Structure Function and Bioinformatics 09/2010; 79(1):153-64. · 3.34 Impact Factor
Conference Paper: The CONNJUR Spectrum Translator: Open Source software for converting the format of time-domain NMR data Nowling RJ, Vyas J, Weatherby G, Ellis HJC, Gryk MR XXIVth International Conference on Magnetic Resonance in Biological Systems; 08/2010
Article: Biomolecular NMR data analysis. Michael R Gryk, Jay Vyas, Mark W Maciejewski Progress in Nuclear Magnetic Resonance Spectroscopy 05/2010; 56(4):329-45. · 8.71 Impact Factor
Article: MimoSA: a system for minimotif annotation. Jay Vyas, Ronald J Nowling, Thomas Meusburger, David Sargeant, Krishna Kadaveru, Michael R Gryk, Vamsi Kundeti, Sanguthevar Rajasekaran, Martin R Schiller BMC Bioinformatics 01/2010; 11:328. · 2.67 Impact Factor
Article: A proposed syntax for Minimotif Semantics, version 1. Jay Vyas, Ronald J Nowling, Mark W Maciejewski, Sanguthevar Rajasekaran, Michael R Gryk, Martin R Schiller BMC Genomics 09/2009; 10:360. · 4.04 Impact Factor
Article: VENN, a tool for titrating sequence conservation onto protein structures. Jay Vyas, Michael R Gryk, Martin R Schiller Nucleic Acids Research 09/2009; 37(18):e124. · 8.81 Impact Factor
Article: Minimotif miner 2nd release: a database and web system for motif search. Sanguthevar Rajasekaran, Sudha Balla, Patrick Gradie, Michael R Gryk, Krishna Kadaveru, Vamsi Kundeti, Mark W Maciejewski, Tian Mi, Nicholas Rubino, Jay Vyas, Martin R Schiller Nucleic Acids Research 11/2008; 37(Database issue):D185-90. · 8.81 Impact Factor
Article: Viral infection and human disease--insights from minimotifs. Krishna Kadaveru, Jay Vyas, Martin R Schiller