AI often feels like magic, but beneath the surface, it’s just a combination of mathematical foundations and code. In this post, we’ll build a neural network from scratch in Rust — no ML libraries, just math and well-structured code.
The Matrix Structure
To handle the math, we first define our Matrix structure. We use a flat Vec to represent the matrix data, which is more cache-friendly than a vector of vectors. From this base struct we implement the core operations we’ll need throughout the network: dot products, addition, and subtraction.
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f64>,
}
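Because the data lives in one flat, row-major buffer, the element at (row, col) sits at index row * cols + col. A couple of small helpers make that concrete (zeros and at are illustrative names of my own, not from the original code):

impl Matrix {
    // Create a rows x cols matrix filled with zeros
    pub fn zeros(rows: usize, cols: usize) -> Matrix {
        Matrix { rows, cols, data: vec![0.0; rows * cols] }
    }

    // Fetch the element at (row, col) from the flat, row-major buffer
    pub fn at(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }
}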
impl Matrix {
    pub fn dot_multiply(&self, other: &Matrix) -> Matrix {
        // ... implementation of matrix multiplication
    }
}
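The elided body is the standard triple loop over rows, columns, and the shared inner dimension. One way to fill it in, reusing the zeros helper sketched above:

impl Matrix {
    pub fn dot_multiply(&self, other: &Matrix) -> Matrix {
        // The inner dimensions must agree: (m x n) * (n x p) = (m x p)
        assert_eq!(self.cols, other.rows, "dimension mismatch in dot_multiply");
        let mut result = Matrix::zeros(self.rows, other.cols);
        for i in 0..self.rows {
            for j in 0..other.cols {
                let mut sum = 0.0;
                for k in 0..self.cols {
                    sum += self.data[i * self.cols + k] * other.data[k * other.cols + j];
                }
                result.data[i * other.cols + j] = sum;
            }
        }
        result
    }
}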
The Network Architecture

Our Network struct uses the Matrix type to manage the weights and biases between layers. It also holds the learning rate and the chosen activation function. The feed_forward method transforms input data through each layer in sequence, producing the final prediction.
pub struct Network {
    pub layers: Vec<usize>,
    pub weights: Vec<Matrix>,
    pub biases: Vec<Matrix>,
    pub activation: Activation,
    pub learning_rate: f64,
}
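The Network::new constructor called later in main has to allocate one weight matrix and one bias vector per transition between consecutive layers. A minimal sketch, assuming a hypothetical Matrix::random helper that fills a matrix with small random values:

impl Network {
    pub fn new(layers: Vec<usize>, activation: Activation, learning_rate: f64) -> Network {
        let mut weights = Vec::new();
        let mut biases = Vec::new();
        for i in 0..layers.len() - 1 {
            // A (next layer x this layer) weight matrix, plus a bias
            // column vector with one entry per neuron in the next layer
            weights.push(Matrix::random(layers[i + 1], layers[i])); // hypothetical helper
            biases.push(Matrix::random(layers[i + 1], 1));
        }
        Network { layers, weights, biases, activation, learning_rate }
    }
}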
impl Network {
    pub fn feed_forward(&mut self, inputs: Matrix) -> Matrix {
        // Compute the output by passing data through the layers,
        // using our dot_multiply implementation
    }
}
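Filled in, feed_forward might look like the sketch below. It assumes the inputs are column vectors, that Matrix has an element-wise map helper, and that Activation exposes its function as a field (all three are assumptions of mine, the last one sketched further down):

impl Network {
    pub fn feed_forward(&mut self, inputs: Matrix) -> Matrix {
        let mut current = inputs;
        // Each layer transition computes activation(weights * x + biases)
        for i in 0..self.layers.len() - 1 {
            current = self.weights[i]
                .dot_multiply(&current)
                .add(&self.biases[i])
                .map(self.activation.function); // hypothetical element-wise helper
        }
        current
    }
}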
Backpropagation

The most critical part of learning is backpropagation. After a forward pass we compute the error between the prediction and the target, then propagate that error backwards through each layer. At each step we calculate the gradient and apply it to the weights and biases via gradient descent, nudging them in the direction that reduces the error.
pub fn back_propagate(&mut self, outputs: Matrix, targets: Matrix) {
    // The error is the gap between the target and the network's prediction
    let errors = targets.subtract(&outputs);

    // Iterate through the layers in reverse
    for i in (0..self.layers.len() - 1).rev() {
        // Compute this layer's gradients from the error and the
        // activation derivative, then apply gradient descent
        self.weights[i] = self.weights[i].add(&gradients);
    }
}

The key fields driving this process are the layer topology (e.g. [2, 3, 1]), the weight matrices representing connection strengths, the bias vectors that shift each activation, and the activation function itself, which introduces the non-linearity the network needs to learn complex patterns.
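The post never shows the Activation type itself. One plausible shape, consistent with the Activation::SIGMOID constant used in main below, is a pair of function pointers; the exact layout and field names here are my guess:

#[derive(Clone, Copy)]
pub struct Activation {
    pub function: fn(f64) -> f64,   // used in the forward pass
    pub derivative: fn(f64) -> f64, // used during backpropagation
}

impl Activation {
    // Sigmoid squashes any real input into (0, 1). The derivative is
    // written in terms of the sigmoid's output y, i.e. y * (1 - y),
    // which is convenient when the forward pass stores activated values.
    pub const SIGMOID: Activation = Activation {
        function: |x| 1.0 / (1.0 + (-x).exp()),
        derivative: |y| y * (1.0 - y),
    };
}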
Training the Model
Finally, we loop through our training data over several epochs. On each pass the network runs a forward prediction, measures its error, and backpropagates to improve. After 10,000 epochs the weights have converged and the network reliably predicts the correct outputs.
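The train method that main calls isn't shown in the post. A plausible sketch, assuming inputs and targets arrive as vectors of column-vector Matrix values and that Matrix derives Clone:

impl Network {
    pub fn train(&mut self, inputs: Vec<Matrix>, targets: Vec<Matrix>, epochs: usize) {
        for _ in 0..epochs {
            // One forward pass and one backward pass per training example
            for (input, target) in inputs.iter().zip(targets.iter()) {
                let outputs = self.feed_forward(input.clone());
                self.back_propagate(outputs, target.clone());
            }
        }
    }
}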
fn main() {
    // A 2-input, 3-hidden, 1-output network with sigmoid activation
    // and a learning rate of 0.5
    let mut network = Network::new(vec![2, 3, 1], Activation::SIGMOID, 0.5);

    // Train the network over 10,000 epochs
    network.train(inputs, targets, 10000);

    println!("Training complete!");
}

Final Result

Watch the full step-by-step video tutorial below: