Advanced tools for data science
An advanced R workshop that provides training in predictive modeling, cluster analysis, time series analysis and text mining. This workshop is intended for participants who are familiar with the R language and have some modeling experience in R or in other systems such as SAS or Stata.
Day 1: Statistics and predictive modeling
- Basic concepts in predictive modeling (prediction vs interpretability, bias-variance tradeoff, collinearity, creating dummy variables, training vs testing datasets/data splitting, parameter tuning)
- An introduction to the
- Regression (LM, GAM, SVM, random forest, boosting)
- Regularization/shrinkage (ridge, LASSO, elastic net) and dimension reduction (PLS)
- Classification (logistic, linear discriminant analysis, random forest and boosting)
Day 2: Time series analysis, cluster analysis, text mining, topic analysis and market basket analysis
- Time series analysis (working with the zoo package, forecasting with exponential smoothing, forecasting with ARIMA)
- Cluster analysis (k-means, hierarchical clustering)
- Principal components analysis
- Text mining (focus on the tm package, corpus processing, word frequency, word associations)
- Topic analysis (Latent Dirichlet Allocation using the
- Market basket analysis (focus on the
Zev Ross is president of ZevRoss Spatial Analysis, a company focused on data science, statistics and data visualization. He is an RStudio recommended trainer and consultant and has used R on a daily basis for nearly 15 years conducting data analysis and statistics for a wide range of clients including some of the world’s largest public health agencies and Fortune 500 companies. Zev has authored or co-authored more than 40 scientific research papers and maintains a popular data science blog.
Zev has been teaching R workshops for 7 years, and clients include the New York City Department of Health, Freddie Mac, Zurich Insurance, UCLA, Columbia University, Agriculture Canada and many others. These on-site workshops have earned an average of 4.6 stars out of 5 in hundreds of anonymous reviews, and 99% of reviewers gave the workshops 4 or 5 stars.
The workshop involves a mix of slides, collaborative hands-on material and exercises, and can be held at your institution. Participants use their own laptops and will be given access to RStudio Server for the course. To fit busy schedules, the workshop can be held on weekdays or weekends. The workshop is designed for up to 15 participants (but let us know if you have a smaller or larger group).