Resources

Places for learning R and Data Science:

Fantastic datasets and where to find them:

  • Kaggle: Community-curated datasets from all sorts of disciplines
  • Harvard Dataverse: Harvard-managed database containing ~100K datasets from various sources.
  • Our World in Data: Numbers of the World
  • UCI Machine Learning Data Repository: “A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms”

Websites that give you a helping hand when you are stuck:

  • Stack Overflow: For coding problems
  • Cross Validated: For questions about statistics and whatnot
  • Biostars: A bioinformatics forum with contributions from people across the globe
  • R-bloggers: A collection of R blogs from across the globe. You never know what gems you’ll discover here. Many thanks to Tal Galili for creating and maintaining the platform!

Places for understanding statistics and machine learning better:

  • StatQuest: A great way of learning statistics and machine learning concepts without getting into heavy mathematics.

  • Introduction to Statistical Learning: Perfect for understanding how statistics and machine learning work, with minimal maths.

  • Elements of Statistical Learning: The big brother of the Introduction to Statistical Learning course above, for a more detailed dive into the concepts.

  • Setosa.io: A blog that explains concepts visually. Great for understanding things like principal component analysis.

For learning more about Git/GitHub: