AI often feels like magic, but beneath the surface, it’s just a combination of mathematical foundations and code. In this post, we’ll build a neural network from scratch in Rust — no ML libraries, just math and well-structured code.
The Matrix Structure
To handle the math, we first define our Matrix structure. We use a flat Vec to represent the matrix data, which is more cache-friendly than a vector of vectors. From this base struct we implement the core operations we’ll need throughout the network: dot products, addition, and subtraction.
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f64>,
}
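Because the data lives in one flat, row-major buffer, the element at (row, col) sits at index row * cols + col. A couple of small helpers make that concrete (zeros and at are illustrative names of my own, not from the original code):

impl Matrix {
    // Create a rows x cols matrix filled with zeros
    pub fn zeros(rows: usize, cols: usize) -> Matrix {
        Matrix { rows, cols, data: vec![0.0; rows * cols] }
    }

    // Fetch the element at (row, col) from the flat, row-major buffer
    pub fn at(&self, row: usize, col: usize) -> f64 {
        self.data[row * self.cols + col]
    }
}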
impl Matrix {
    pub fn dot_multiply(&self, other: &Matrix) -> Matrix {
        // ... implementation of matrix multiplication
    }
}
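The elided body is the standard triple loop over rows, columns, and the shared inner dimension. One way to fill it in, reusing the zeros helper sketched above:

impl Matrix {
    pub fn dot_multiply(&self, other: &Matrix) -> Matrix {
        // The inner dimensions must agree: (m x n) * (n x p) = (m x p)
        assert_eq!(self.cols, other.rows, "dimension mismatch in dot_multiply");
        let mut result = Matrix::zeros(self.rows, other.cols);
        for i in 0..self.rows {
            for j in 0..other.cols {
                let mut sum = 0.0;
                for k in 0..self.cols {
                    sum += self.data[i * self.cols + k] * other.data[k * other.cols + j];
                }
                result.data[i * other.cols + j] = sum;
            }
        }
        result
    }
}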
The Network Architecture

Our Network struct uses the Matrix type to manage the weights and biases between layers. It also holds the learning rate and the chosen activation function. The feed_forward method transforms input data through each layer in sequence, producing the final prediction.
pub struct Network {
    pub layers: Vec<usize>,
    pub weights: Vec<Matrix>,
    pub biases: Vec<Matrix>,
    pub activation: Activation,
    pub learning_rate: f64,
}
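The Network::new constructor called later in main has to allocate one weight matrix and one bias vector per transition between consecutive layers. A minimal sketch, assuming a hypothetical Matrix::random helper that fills a matrix with small random values:

impl Network {
    pub fn new(layers: Vec<usize>, activation: Activation, learning_rate: f64) -> Network {
        let mut weights = Vec::new();
        let mut biases = Vec::new();
        for i in 0..layers.len() - 1 {
            // A (next layer x this layer) weight matrix, plus a bias
            // column vector with one entry per neuron in the next layer
            weights.push(Matrix::random(layers[i + 1], layers[i])); // hypothetical helper
            biases.push(Matrix::random(layers[i + 1], 1));
        }
        Network { layers, weights, biases, activation, learning_rate }
    }
}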
impl Network {
    pub fn feed_forward(&mut self, inputs: Matrix) -> Matrix {
        // Compute the output by passing data through the layers,
        // using our dot_multiply implementation
    }
}
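Filled in, feed_forward might look like the sketch below. It assumes the inputs are column vectors, that Matrix has an element-wise map helper, and that Activation exposes its function as a field (all three are assumptions of mine, the last one sketched further down):

impl Network {
    pub fn feed_forward(&mut self, inputs: Matrix) -> Matrix {
        let mut current = inputs;
        // Each layer transition computes activation(weights * x + biases)
        for i in 0..self.layers.len() - 1 {
            current = self.weights[i]
                .dot_multiply(&current)
                .add(&self.biases[i])
                .map(self.activation.function); // hypothetical element-wise helper
        }
        current
    }
}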
Backpropagation

The most critical part of learning is backpropagation. After a forward pass we compute the error between the prediction and the target, then propagate that error backwards through each layer. At each step we calculate the gradient and apply it to the weights and biases via gradient descent, nudging them in the direction that reduces the error.
pub fn back_propagate(&mut self, outputs: Matrix, targets: Matrix) {
    // The error is the gap between the target and the network's prediction
    let errors = targets.subtract(&outputs);

    // Iterate through the layers in reverse
    for i in (0..self.layers.len() - 1).rev() {
        // Compute this layer's gradients from the error and the
        // activation derivative, then apply gradient descent
        self.weights[i] = self.weights[i].add(&gradients);
    }
}

The key fields driving this process are the layer topology (e.g. [2, 3, 1]), the weight matrices representing connection strengths, the bias vectors that shift each activation, and the activation function itself, which introduces the non-linearity the network needs to learn complex patterns.
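The post never shows the Activation type itself. One plausible shape, consistent with the Activation::SIGMOID constant used in main below, is a pair of function pointers; the exact layout and field names here are my guess:

#[derive(Clone, Copy)]
pub struct Activation {
    pub function: fn(f64) -> f64,   // used in the forward pass
    pub derivative: fn(f64) -> f64, // used during backpropagation
}

impl Activation {
    // Sigmoid squashes any real input into (0, 1). The derivative is
    // written in terms of the sigmoid's output y, i.e. y * (1 - y),
    // which is convenient when the forward pass stores activated values.
    pub const SIGMOID: Activation = Activation {
        function: |x| 1.0 / (1.0 + (-x).exp()),
        derivative: |y| y * (1.0 - y),
    };
}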
Training the Model
Finally, we loop through our training data over several epochs. On each pass the network runs a forward prediction, measures its error, and backpropagates to improve. After 10,000 epochs the weights have converged and the network reliably predicts the correct outputs.
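The train method that main calls isn't shown in the post. A plausible sketch, assuming inputs and targets arrive as vectors of column-vector Matrix values and that Matrix derives Clone:

impl Network {
    pub fn train(&mut self, inputs: Vec<Matrix>, targets: Vec<Matrix>, epochs: usize) {
        for _ in 0..epochs {
            // One forward pass and one backward pass per training example
            for (input, target) in inputs.iter().zip(targets.iter()) {
                let outputs = self.feed_forward(input.clone());
                self.back_propagate(outputs, target.clone());
            }
        }
    }
}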
fn main() {
    // A 2-input, 3-hidden, 1-output network with sigmoid activation
    // and a learning rate of 0.5
    let mut network = Network::new(vec![2, 3, 1], Activation::SIGMOID, 0.5);

    // Train the network over 10,000 epochs
    network.train(inputs, targets, 10000);

    println!("Training complete!");
}

Final Result

Watch the full step-by-step video tutorial below: