# Introduction and Motivation
Machine learning focuses on designing algorithms that automatically extract meaningful information from data. A concise and widely accepted definition of machine learning comes from Tom Mitchell (1997):
> “A computer program is said to learn from experience $E$ with respect to
> some class of tasks $T$ and performance measure $P$, if its performance
> at tasks in $T$, as measured by $P$, improves with experience $E$.”
## Finding Words for Intuitions
<div class="definition">
**Machine learning** is the study and development of algorithms that improve automatically through experience and data, without being explicitly programmed for each task.
</div>
Machine learning is a field that combines **data**, **models**, and **learning methods** to identify patterns and make predictions or decisions — ideally generalizing well to new, unseen situations. Data is the foundation — machine learning aims to discover useful patterns from data without relying heavily on domain expertise.
<div class="definition">**Data** are pieces of information collected to describe, measure, or analyze phenomena.
</div>
In practice, data is represented numerically, often as **vectors**, $\mathbf{x} = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_N \end{bmatrix}$. Models describe how data is generated or how inputs map to outputs.
<div class="definition">A **model** is a learned representation that maps inputs to outputs based on patterns found in data.
</div>
A model *learns* when its performance improves after processing data. Good models generalize to new, unseen data.
<div class="definition">**Learning** is the process of using data to automatically improve a model’s ability to perform a task.
</div>
The goal is not just to fit the training data, but to perform well on new examples.
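To make these ideas concrete, here is a minimal sketch in Python with NumPy (the data are invented for illustration): observations are represented as vectors stacked into a data matrix, a simple linear model is learned from training data, and generalization is checked on held-out examples.

```python
import numpy as np

# Each observation is a vector x; stacking 20 of them gives a data matrix X.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))          # 20 observations, 1 feature
y = 3.0 * X[:, 0] + rng.normal(0, 0.5, 20)    # noisy linear relationship

# Split into training data (used for learning) and test data (unseen).
X_train, y_train = X[:15], y[:15]
X_test, y_test = X[15:], y[15:]

# "Learning": choose the weight w that best fits the training data
# (ordinary least squares).
w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

# Generalization: a good model also predicts well on the unseen test data.
train_error = np.mean((X_train @ w - y_train) ** 2)
test_error = np.mean((X_test @ w - y_test) ** 2)
```

Here the small test error (comparable to the training error) indicates that the model has captured the underlying pattern rather than merely memorizing the training set.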
Formally, you can think of an algorithm as a mapping from inputs to outputs, where each step is precise, unambiguous, and executable by a computer.
<div class="definition">
An **algorithm** is a finite sequence of well-defined instructions or steps designed to solve a specific problem or perform a computation.
</div>
In the context of machine learning, an algorithm provides a systematic procedure for processing data — either to make predictions (as in a predictive algorithm) or to adjust model parameters (as in a training algorithm). In this way, machine learning involves two overlapping meanings of “algorithm”:
1. A **predictor** that makes predictions based on data.
2. A **training procedure** that updates the predictor’s parameters to improve future performance.
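These two senses of “algorithm” can be sketched as two functions (a hypothetical minimal example, assuming a one-parameter linear model trained by gradient descent):

```python
# Predictor: maps an input x to an output, given the current parameter w.
def predict(w, x):
    return w * x

# Training procedure: repeatedly adjusts w so that predictions on the
# training data improve (gradient descent on the mean squared error).
def train(data, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Experience E: input-output pairs generated by y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(data)          # performance at task T improves with experience E
print(predict(w, 4.0))   # approximately 8.0
```

The predictor alone makes no reference to learning; it is the training procedure that uses data (Mitchell’s experience $E$) to improve the predictor’s performance.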
Understanding the **mathematical foundations** behind data, models, and learning helps us build, interpret, and improve machine learning systems — and recognize their assumptions and limitations.
## Two Ways to Read This Book
There are two main strategies for learning the mathematics that underpins machine learning.
The **bottom-up approach** builds understanding from fundamental mathematical concepts toward more advanced ideas. This method provides a solid conceptual foundation, ensuring that each new idea rests on well-understood principles. However, for many learners, this approach can feel slow or disconnected from practical motivation, since the relevance of abstract concepts may not be immediately clear.
In contrast, the **top-down approach** begins with real-world problems and drills down to the mathematics required to solve them. This goal-driven strategy keeps motivation high and helps learners understand why each concept matters. The drawback, however, is that the underlying mathematical ideas can remain fragile—readers may learn to use tools effectively without fully grasping their theoretical basis.
Mathematics for Machine Learning is designed to support both approaches — foundational (Part I) and applied (Part II) — so readers can move between mathematics and machine learning freely.
This book is designed to support readers working through the textbook Mathematics for Machine Learning. It takes a more foundational approach, aiming to fill in any gaps a reader might have; in particular, we provide additional examples presented in a less theoretical way. Whether you read top-down or bottom-up, this book will support your learning.
---
### Part I: Mathematical Foundations
Part I develops the mathematical tools that support all major ML methods — the four pillars of machine learning:
1. **Regression**
2. **Dimensionality Reduction**
3. **Density Estimation**
4. **Classification**
<p align="center">
<img src="Figure1.1MML.png" alt="The Foundations and Four Pillars of Machine Learning" width="400">
</p>
It covers:
- **Linear Algebra (Ch. 2):** Vectors, matrices, and their relationships.
- **Analytic Geometry (Ch. 3):** Similarity and distance between vectors.
- **Matrix Decomposition (Ch. 4):** Interpreting and simplifying data.
- **Vector Calculus (Ch. 5):** Gradients and differentiation.
- **Probability Theory (Ch. 6):** Quantifying uncertainty and noise.
- **Optimization (Ch. 7):** Finding parameters that maximize performance.
---
### Part II: Machine Learning Applications
Part II applies the math from Part I to the four pillars:
- **Ch. 8 — Foundations of ML:** Data, models, and learning; designing robust experiments.
- **Ch. 9 — Regression:** Predicting continuous outcomes using linear and Bayesian approaches.
- **Ch. 10 — Dimensionality Reduction:** Compressing high-dimensional data (e.g., PCA).
- **Ch. 11 — Density Estimation:** Modeling data distributions (e.g., Gaussian mixtures).
- **Ch. 12 — Classification:** Assigning discrete labels (e.g., support vector machines).
---
### Learning Path
Readers are encouraged to mix **bottom-up** and **top-down** learning:
- Build foundational skills when needed.
- Explore applications that connect math to real machine learning systems.
This modular structure makes the book suitable for both **mathematical learners** and **practitioners** aiming to deepen their theoretical understanding.
## Exercises and Feedback
While Mathematics for Machine Learning provides some examples and exercises, this book is built to support readers who want to practice particular skills or build their knowledge in a particular area. We have added a number of exercises, examples, and videos to aid your understanding of the material.
### Exercises {.unnumbered .unlisted}
<div class="exercise">
Discuss the ideas of data, models, and learning. How are they related?
<div style="text-align: right;">
[Solution](https://youtu.be/YSlIPuSSvaI)
</div>
</div>
<div class="exercise">
In machine learning, how are data typically represented?
<div style="text-align: right;">
[Solution](https://youtu.be/-39cITqPMdk)
</div>
</div>
<div class="exercise">
What is meant by learning in the context of a model?
<div style="text-align: right;">
[Solution](https://youtu.be/4HR3Kkb2Ieg)
</div>
</div>