# Resources

**Places for learning R and Data Science:**

Swirl: Learn R in R. It sounds hard but it really isn’t.

DataCamp: A little heavy on hand-holding, but great for beginners.

Coursera R programming course

Udemy R programming course

Introduction to R (book) by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau

Rafael Irizarry Teaching Materials: Harvard statistics professor’s amazing collection of teaching materials.

BookDown: A collection of free, open-source books written by some of the top people in the field. These are especially worth checking out:

- R for data science by Hadley Wickham and Garrett Grolemund
- Hands-on programming with R by Garrett Grolemund
- R programming for data science by Roger Peng
- Introduction to Data Science by Rafael Irizarry
- R Markdown definitive guide by Yihui Xie, J. J. Allaire, Garrett Grolemund
- R Markdown cookbook by Yihui Xie, Christophe Dervieux, Emily Riederer
- Data Science Live book by Pablo Casas (for understanding common issues that arise in data analysis and machine learning)
- And many, many more in their archives!

The Big Book of R: A large collection of all things R. Some of the books I listed above also appear here. It is a nice compendium summarizing many great resources for learning R.

Gaston Sanchez, UC Berkeley: So many R tutorials and vignettes that will blow your mind.

Statistical tools for high-throughput data analysis (STHDA): Maintained by Alboukadel Kassambara (PhD in Bioinformatics and Cancer Biology), who authored several helpful R packages including `ggpubr`, `survminer`, `ggcorrplot`, and `factoextra`.

useR! Machine Learning Tutorial: Tutorial from the R user conference 2016 focusing on using machine learning algorithms in R.

**Fantastic datasets and where to find them:**

- Kaggle: Community-curated datasets from all sorts of disciplines
- Harvard Dataverse: Harvard-managed database containing ~100K datasets from various sources.
- Our World in Data: Numbers of the World
- UCI Machine Learning Data Repository: “A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms”

**Websites that give you a helping hand when you are stuck:**

- Stack Overflow: For coding problems
- Cross Validated: For questions about statistics and whatnot
- Biostars: Bioinformatics forum with contributors from across the globe
- R-bloggers: A collection of R blogs from across the globe. You never know what gems you’ll discover here. Many thanks to Tal Galili for creating and maintaining the platform!

**Places for understanding statistics and machine learning better:**

StatQuest: A great way of learning statistics and machine learning concepts without getting into heavy mathematics.

Introduction to Statistical Learning: Perfect for understanding how statistics and machine learning work, and it involves minimal maths.

Elements of Statistical Learning: The big brother of Introduction to Statistical Learning above, for a more detailed dive into the concepts.

Setosa.io: A blog for visually explaining things. Great for understanding things like principal component analysis.

**For learning more about Git/GitHub:**

- Happy Git with R: All the good things together: R, RStudio, and Git
- Git Docs Tutorial
- Git Book