Daily Schedule Part 2 (Actual — Kept Retrospectively)
See also Daily Schedule - Part 1
Part 2: Data Science Foundations (using Joel Grus, Data Science from Scratch, 2nd Edition)
Part 2 uses Grus and lasts for the remaining four weeks of Term 6
Week 4 — Yet Another Review of Python — Some Vector and Matrix Algebra — Statistics and Probability
- June 4 — Chapters 1-3: Another excellent review of Python and Matplotlib, which will help systematize your understanding of the language features you were using in Pasha’s book — The assignment is to work through the review in these three chapters, but to stop using Jupyter or JupyterLab entirely, and instead get everything working in PyCharm Professional Edition (free for students) or VS Code (though I have zero experience with the latter) — When Grus says (at the beginning of Chapter 2) that you should not be tampering with your base Python environment, he is completely correct, so learn how to make a venv that you could call grus or dsfs and then switch to it, if you didn’t already do that for working through Pasha (see the venv sketch after this list)
- June 7 — Chapters 4-6: Linear Algebra (wherein Grus introduces his Vector and Matrix implementations, which could have been classes, or could have leveraged NumPy, but for which he craftily used type aliases, because that was the simplest way to implement them from scratch; see the type-alias sketch after this list), Statistics, and Probability — if you took last fall’s Bayesian Statistics class, the math in Chapters 5 and 6 will be review
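For the June 4 environment setup, here is a minimal sketch, assuming you call the environment dsfs (grus works just as well). From a terminal the usual commands are python -m venv dsfs followed by source dsfs/bin/activate, but the standard-library venv module can also create it:

```python
# Minimal sketch: create a "dsfs" virtual environment with the stdlib.
# (From a shell you would more commonly run `python -m venv dsfs` and
# then `source dsfs/bin/activate` on macOS/Linux.)
import venv

venv.create("dsfs", with_pip=True)  # makes ./dsfs with its own pip
```

Once it exists, point PyCharm’s project interpreter at the new environment so packages you install land there rather than in your base Python.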
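And for the June 7 reading, a sketch in the spirit of Grus’s Chapter 4 type aliases (the book’s code may differ in small details):

```python
from typing import List

# Grus's trick: a Vector is a type alias for a plain list of floats,
# neither a class nor a numpy array, which keeps it truly "from scratch".
Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """v_1 * w_1 + ... + v_n * w_n"""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

assert dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == 32.0
```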
Week 5 — Optimization (aka Minimization and Maximization) — Working with Data
- June 11 — Chapters 7 and 8: Hypothesis and Inference and Gradient Descent (see the gradient-descent sketch after this list) — Make a local repo from the magic hexijin.github.io GitHub repo, put an index.md file in it, and then push to origin main — The only remaining step to having your own home page is to enable GitHub Pages in that repo — For more advanced reading, Grus recommends this Overview of Gradient Descent by Sebastian Ruder
- June 15 — Chapters 9 and 10: Getting Data and Working with Data — including rescaling data sets by subtracting the mean and dividing by the standard deviation (see the rescaling sketch after this list), plus a load of utilities for principal component analysis that Grus introduces somewhat too rapidly at the end of Chapter 10
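For the June 11 reading, a minimal gradient-descent sketch; gradient_step and sum_of_squares_gradient are illustrative names, not necessarily the book’s:

```python
from typing import List

Vector = List[float]

def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move step_size times the gradient away from v."""
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

def sum_of_squares_gradient(v: Vector) -> Vector:
    """Gradient of f(v) = sum(v_i ** 2), which is minimized at the origin."""
    return [2 * v_i for v_i in v]

v = [4.0, -3.0, 2.0]
for _ in range(1000):
    grad = sum_of_squares_gradient(v)
    v = gradient_step(v, grad, -0.01)  # negative step size: descend

assert all(abs(v_i) < 1e-6 for v_i in v)  # we end up near the minimum
```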
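And for the June 15 reading, a one-dimensional sketch of the rescaling idea (Grus’s Chapter 10 version works column by column on a whole data matrix):

```python
import statistics
from typing import List

def rescale(xs: List[float]) -> List[float]:
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(xs)
    stdev = statistics.stdev(xs)
    return [(x - mean) / stdev for x in xs]

heights_cm = [160.0, 170.0, 175.0, 180.0, 190.0]
rescaled = rescale(heights_cm)
assert abs(statistics.mean(rescaled)) < 1e-12         # mean is now 0
assert abs(statistics.stdev(rescaled) - 1.0) < 1e-12  # stdev is now 1
```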
Week 6 — Machine Learning — Linear Regression
- June 19 — Chapters 11 and 13: Machine Learning and Naive Bayes — you may need to pick up some material from Chapter 12 on k-Nearest Neighbors, which we are otherwise skipping (see the Naive Bayes sketch after this list)
- June 21 — Chapters 14 and 15: Simple Linear Regression and Multiple Regression — In Chapter 15, Grus squeezes in a digression on The Bootstrap, a computational approach not just to estimating parameters but to estimating the uncertainties in those parameters (see the bootstrap sketch after this list)
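For the June 19 reading, a compressed sketch of the Chapter 13 idea; the train and spam_probability helpers and the smoothing constant k are illustrative, not Grus’s exact NaiveBayesClassifier:

```python
import math
from collections import Counter
from typing import List, Tuple

def train(messages: List[Tuple[str, bool]], k: float = 0.5):
    """Count how many spam/ham messages each token appears in (k smooths)."""
    spam_counts: Counter = Counter()
    ham_counts: Counter = Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        tokens = set(text.lower().split())
        if is_spam:
            n_spam += 1
            spam_counts.update(tokens)
        else:
            n_ham += 1
            ham_counts.update(tokens)
    return spam_counts, ham_counts, n_spam, n_ham, k

def spam_probability(model, text: str) -> float:
    """P(spam | tokens): naively multiply per-token likelihoods, in log space."""
    spam_counts, ham_counts, n_spam, n_ham, k = model
    log_spam = log_ham = 0.0
    for token in set(text.lower().split()):
        log_spam += math.log((spam_counts[token] + k) / (n_spam + 2 * k))
        log_ham += math.log((ham_counts[token] + k) / (n_ham + 2 * k))
    return math.exp(log_spam) / (math.exp(log_spam) + math.exp(log_ham))

model = train([("win money now", True), ("lunch tomorrow?", False)])
assert spam_probability(model, "win money") > 0.5
```

The "naive" part is the independence assumption: each token’s likelihood is multiplied in as if tokens occurred independently given the class.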
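And for the June 21 digression, a bootstrap sketch close in spirit to the Chapter 15 helpers (names from memory, so treat this as a sketch rather than the book’s code):

```python
import random
import statistics
from typing import Callable, List, TypeVar

X = TypeVar("X")
Stat = TypeVar("Stat")

def bootstrap_sample(data: List[X]) -> List[X]:
    """Randomly sample len(data) elements with replacement."""
    return [random.choice(data) for _ in data]

def bootstrap_statistic(data: List[X],
                        stats_fn: Callable[[List[X]], Stat],
                        num_samples: int) -> List[Stat]:
    """Evaluate stats_fn on num_samples bootstrap resamples of data."""
    return [stats_fn(bootstrap_sample(data)) for _ in range(num_samples)]

data = [1.2, 2.4, 2.9, 3.1, 3.8, 4.4, 5.0]
medians = bootstrap_statistic(data, statistics.median, 1000)
# The spread of the resampled medians estimates the uncertainty in the
# median, not just its value, which is the point of the digression.
print(statistics.stdev(medians))
```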
In the interest of getting to Neural Networks and Deep Learning in our final week, we are skipping Chapter 12 (on k-Nearest Neighbors), Chapter 16 (on Logistic Regression), and Chapter 17 (on Decision Trees)
Week 7 — Neural Networks — Deep Learning
- June 23 — Chapter 18: Neural Networks (see the feed-forward sketch after this list)
- June 25 (no meeting, but do the live coding session) — Get a feel for how a real pro codes, including type-hinting, systematic adherence to style choices, and code testing, by building the code in PyCharm as Grus builds a deep learning library in VS Code — pause the live coding session whenever you need to catch up with him, and fix the style errors that PyCharm’s linter catches but mypy misses — Grus’s live coding session is effectively a blindingly fast introduction to the material of Chapters 18 and 19 (a toy example of the typed, tested style appears after this list)
- June 26 (final meeting) — Chapter 19: Deep Learning — only up to and including the section titled “Softmaxes and Cross-Entropy” (see the softmax sketch after this list)
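For the June 23 reading, a sketch of the feed-forward computation, including the hand-tuned XOR network Grus uses in Chapter 18 (helper names may differ slightly from the book):

```python
import math
from typing import List

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights: Vector, inputs: Vector) -> float:
    """weights includes the bias weight; inputs includes a trailing 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

def feed_forward(network: List[List[Vector]], input_vector: Vector) -> List[Vector]:
    """Push input_vector through each layer, returning every layer's outputs."""
    outputs: List[Vector] = []
    for layer in network:
        input_with_bias = input_vector + [1.0]  # append the bias input
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        outputs.append(output)
        input_vector = output                   # feed into the next layer
    return outputs

# A hand-tuned two-layer network that computes XOR.
xor_network = [
    [[20.0, 20.0, -30.0],   # hidden neuron 1: an AND gate
     [20.0, 20.0, -10.0]],  # hidden neuron 2: an OR gate
    [[-60.0, 60.0, -30.0]], # output neuron: OR but not AND
]
for x in [0.0, 1.0]:
    for y in [0.0, 1.0]:
        print(x, y, round(feed_forward(xor_network, [x, y])[-1][0]))
```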
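For the June 25 session, a toy example of the style being modeled: a precisely type-hinted function that mypy can check, plus a test that pytest can discover (mean and test_mean are illustrative, not taken from the session):

```python
from typing import List

def mean(xs: List[float]) -> float:
    """Average of a non-empty list of floats."""
    assert xs, "mean requires at least one value"
    return sum(xs) / len(xs)

def test_mean() -> None:  # run with `pytest`, or call it directly
    assert mean([1.0, 2.0, 3.0]) == 2.0
```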
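And for the June 26 reading, a sketch of the two ideas the stopping point is named after (function names are illustrative; the book’s versions operate on its Tensor type):

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    """Exponentiate and normalize; subtracting max(xs) avoids overflow."""
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(predicted: Vector, actual: Vector) -> float:
    """-sum(actual_i * log(predicted_i)); actual is typically one-hot."""
    return -sum(a * math.log(p) for p, a in zip(predicted, actual) if a > 0)

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-12
print(cross_entropy(probs, [1.0, 0.0, 0.0]))  # small when class 0 is favored
```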
See also Looking Beyond