Advanced tools for data science
Goals
An advanced R workshop providing training in predictive modeling, cluster analysis, time series analysis and text mining. This workshop is intended for participants who are familiar with the R language and have some modeling experience in R or in other systems such as SAS or Stata.
Agenda
Day 1: Statistics and predictive modeling
- Basic concepts in predictive modeling (prediction vs interpretability, bias-variance tradeoff, collinearity, creating dummy variables, training vs testing datasets/data splitting, parameter tuning)
- An introduction to the caret package
- Regression (LM, GAM, SVM, random forest, boosting)
- Regularization/shrinkage (ridge, LASSO, elastic net) and dimension reduction (PLS)
- Classification (logistic, linear discriminant analysis, random forest and boosting)
Day 2: Time series analysis, cluster analysis, text mining, topic analysis and market basket analysis
- Time series analysis (working with ts objects, the zoo package, forecasting with exponential smoothing, forecasting with ARIMA)
- Cluster analysis (K-means, hierarchical clustering)
- Principal components analysis
- Text mining (focus on the tm package, corpus processing, word frequency, word associations)
- Topic analysis (Latent Dirichlet Allocation using the topicmodels package)
- Market basket analysis (focus on the apriori function and the arules package)
Workshop Instructor
Zev Ross is president of ZevRoss Spatial Analysis, a company focused on data science, statistics and data visualization. He is an RStudio-recommended trainer and consultant and has used R on a daily basis for nearly 15 years, conducting data analysis and statistics for a wide range of clients, including some of the world’s largest public health agencies and Fortune 500 companies. Zev has authored or co-authored more than 40 scientific research papers and maintains a popular data science blog.
Zev has been teaching R workshops for 7 years, and his clients include the New York City Department of Health, Freddie Mac, Zurich Insurance, UCLA, Columbia University, Agriculture Canada and many others. These on-site workshops have earned an average of 4.6 out of 5 stars in hundreds of anonymous reviews, and 99% of reviewers gave the workshops 4 or 5 stars.
Workshop Format
The workshop involves a mix of slides, collaborative hands-on material and exercises, and can be held at your institution. Participants use their own laptops and will be given access to RStudio Server for the course. To fit busy schedules, the workshop can be held on weekdays or weekends. The workshop is designed for up to 15 participants (but let us know if you have a smaller or larger group).