Summer Intern Presentation II
DATE & TIME: August 30, 2021, 2:00 PM - 3:00 PM CDT
LOCATION: Zoom Meeting
Speaker: Zifan (Fred) Yu (University of Washington)
Title: Fast Algorithms for Elastic-net Matrix Linear Models
Abstract: Our work builds upon Matrix Linear Models, which were established and studied in depth by Dr. Saunak Sen and Jane W. Liang, a previous intern. The models are suitable for high-throughput data in which the response is multivariate, with each column being, for example, a genetic strain and each row an experimental run. They take full advantage of the column covariates (which usually represent environmental conditions) and the row covariates (features associated with the type of response). Matrix Linear Models extend univariate linear models and standard statistical approaches such as the t-test, and they offer both modeling flexibility and computational speed. Previous work developed solutions for estimating the models under Lasso penalization, assuming that effects are rare (i.e., that the coefficients are sparse). To overcome some of the limitations of the Lasso while preserving its strengths, we developed solutions to the models with the Elastic-net, a regularization approach that combines Lasso (L1) and Ridge (L2) penalties. As in the previous work, we adapted proximal algorithms (ISTA, FISTA, and ADMM) to solve the Elastic-net problems. The outcome of our work is an extension of MatrixLMnet, a package for Julia (a relatively young but promising programming language) initially developed by Jane W. Liang and Dr. Saunak Sen. The extension adds Elastic-net solutions with the same full functionality as the existing Lasso solutions.
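To make the Elastic-net extension concrete, here is a minimal NumPy sketch of ISTA for a Matrix Linear Model. It assumes the model Y ≈ X B Zᵀ (X: row covariates, Z: column covariates) and a glmnet-style penalty λ(α‖B‖₁ + (1−α)/2·‖B‖²_F). The function names and this parameterization are illustrative assumptions, not MatrixLMnet's Julia API. The key point is that the Elastic-net proximal step is the Lasso soft-threshold followed by a uniform Ridge shrinkage.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Elementwise soft-thresholding: the proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def elastic_net_prox(v, step, lam, alpha):
    """Proximal operator of step * lam * (alpha*||B||_1 + (1-alpha)/2 * ||B||_F^2).

    Assumed glmnet-style parameterization: soft-threshold for the L1 (Lasso) part,
    then divide by a constant factor for the L2 (Ridge) part.
    """
    return soft_threshold(v, step * lam * alpha) / (1.0 + step * lam * (1.0 - alpha))


def ista_elastic_net_mlm(Y, X, Z, lam, alpha, n_iter=500):
    """ISTA sketch for 0.5*||Y - X B Z^T||_F^2 + lam*(alpha*||B||_1 + (1-alpha)/2*||B||_F^2).

    Y: n x m responses, X: n x p row covariates, Z: m x q column covariates.
    Returns the p x q coefficient matrix B.
    """
    # The gradient's Lipschitz constant is sigma_max(X)^2 * sigma_max(Z)^2,
    # since the Hessian acts as B -> X^T X B Z^T Z.
    lipschitz = np.linalg.norm(X, 2) ** 2 * np.linalg.norm(Z, 2) ** 2
    step = 1.0 / lipschitz
    B = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(n_iter):
        resid = Y - X @ B @ Z.T
        grad = -X.T @ resid @ Z  # gradient of the squared-error loss
        B = elastic_net_prox(B - step * grad, step, lam, alpha)
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 6))  # row covariates (n x p)
    Z = rng.standard_normal((30, 4))   # column covariates (m x q)
    B_true = np.zeros((6, 4))
    B_true[0, 1], B_true[3, 2] = 2.0, -1.5  # a few simulated nonzero effects
    Y = X @ B_true @ Z.T + 0.1 * rng.standard_normal((100, 30))
    print(np.round(ista_elastic_net_mlm(Y, X, Z, lam=5.0, alpha=0.9), 2))
```

Setting α = 1 recovers the Lasso update, and α = 0 gives pure Ridge shrinkage. FISTA adds a momentum (extrapolation) step on top of this same update, and ADMM can reuse the same proximal operator when updating its penalized split variable.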
Speaker: Phillip Winston Miller (University of Memphis)
Title: bigBERD: An R package to generate reproducible analysis
Abstract: bigBERD is an R package that automatically generates code for general analysis plans in a piecewise and robust manner. Nearly all standard analysis plans contain the same basic steps: visualization, summarization of variables, univariate testing, and some type of regression analysis. Much time is spent copying, modifying, or developing new code to analyze methodologically similar problems. My package aims to circumvent this issue by generating customized code for whatever dataset is at hand. bigBERD operates in a stepwise manner. First, the user generates a Report directory tree to facilitate good data-handling practices. Then the package reads the clean dataset and creates a short summary file, which contains useful information for the user and forms the basis of automated decision making. After these prerequisites are generated, the user can call one of four document-generating functions: exploratory analysis, univariate analysis, multiple regression analysis, or time-to-event analysis. Each of these generates a .Rmd document containing code customized to the task and dataset at hand. All code can be run immediately after generation or modified as the user sees fit. The exploratory document contains code for univariate plots of each variable, bivariate plots, and a summary table. The univariate analysis function writes code to perform univariate analysis on all variables, stratified by a user-defined condition of interest. It also generates code for simple linear/logistic/multinomial/ordinal regression on an outcome of interest, with accompanying tables and plots. The multiple regression analysis function generates code to build a main-effects model and a stepwise AIC-reduced model, with summary tables and plots. The time-to-event analysis function generates Kaplan-Meier (KM) plots with log-rank tests for categorical variables and a main-effects Cox proportional hazards model, with a summary table.
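The idea of letting a summary file drive code generation can be illustrated with a small Python sketch. It assumes a hypothetical per-variable record (`VariableSummary`) that tags each variable as continuous, binary, nominal, or ordinal. It then maps the outcome's type to an R model call (`lm`, `glm` with a binomial family, `nnet::multinom`, or `MASS::polr`) and emits one R Markdown chunk per predictor. None of these Python names come from bigBERD. This is an illustration of the general approach, not the package's implementation.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # R Markdown chunk delimiter, built here to keep this listing readable


@dataclass
class VariableSummary:
    """Hypothetical stand-in for one row of a dataset summary file."""
    name: str
    kind: str  # assumed categories: "continuous", "binary", "nominal", "ordinal"


# Hypothetical lookup from outcome type to a simple-regression R call template.
R_MODEL_TEMPLATES = {
    "continuous": "lm({outcome} ~ {predictor}, data = df)",
    "binary": "glm({outcome} ~ {predictor}, family = binomial, data = df)",
    "nominal": "nnet::multinom({outcome} ~ {predictor}, data = df)",
    "ordinal": "MASS::polr({outcome} ~ {predictor}, data = df, Hess = TRUE)",
}


def univariate_regression_chunks(outcome, variables):
    """Return one R Markdown chunk per predictor, choosing the model from the outcome type."""
    if outcome.kind not in R_MODEL_TEMPLATES:
        raise ValueError(f"unsupported outcome type: {outcome.kind}")
    template = R_MODEL_TEMPLATES[outcome.kind]
    chunks = []
    for var in variables:
        if var.name == outcome.name:
            continue  # do not regress the outcome on itself
        call = template.format(outcome=outcome.name, predictor=var.name)
        chunks.append(f"{FENCE}{{r}}\nfit_{var.name} <- {call}\nsummary(fit_{var.name})\n{FENCE}")
    return chunks


if __name__ == "__main__":
    died = VariableSummary("died", "binary")
    summary = [VariableSummary("age", "continuous"), VariableSummary("sex", "binary"), died]
    print("\n\n".join(univariate_regression_chunks(died, summary)))
```

Keeping the model choice in a lookup table means that supporting a new outcome type only requires adding a template, not editing the generation loop.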