Building a Neural Network From Scratch in Rust
October 23, 2025
A practical guide to understanding machine learning by implementing logistic regression for the Titanic survival prediction problem
Introduction
The best way to truly understand neural networks is to build one from scratch. In this article, we'll walk through creating a simple neural network (logistic regression classifier) in Rust to predict Titanic passenger survival. We'll handle real-world data preprocessing, implement gradient descent, and understand every mathematical operation along the way.
By the end of this tutorial, you'll have a working neural network that achieves ~75% accuracy on the Titanic dataset, and more importantly, you'll understand exactly how it works.
Why Rust?
While Python dominates machine learning, Rust offers several advantages for learning:
- Explicit memory management forces you to understand data structures
- Type safety catches bugs at compile time
- Performance comparable to C/C++
- Modern tooling with Cargo
This isn't about building production ML systems—it's about understanding the fundamentals without abstraction layers hiding the details.
Project Setup
First, let's set up our Rust project:
cargo new nnnoob
cd nnnoob
Our Cargo.toml dependencies:
[dependencies]
csv = "1.3.1" # For reading CSV files
ndarray = "0.16.1" # N-dimensional arrays (like NumPy)
rand = "0.9.2" # Random number generation
serde = "1.0.227" # Serialization/deserialization
These are minimal dependencies—we're implementing the neural network logic ourselves.
The Problem: Titanic Survival Prediction
The Titanic dataset is a classic machine learning problem: predict whether a passenger survived based on features like age, sex, passenger class, and fare. It's perfect for learning because:
- It's small enough to understand completely
- It has real-world messiness (missing values, categorical data)
- The problem is intuitive (we know women and first-class passengers had higher survival rates)
Data Structures
Defining Our Record
Let's start by defining how we'll represent each passenger:
#[derive(Debug, Clone)]
struct TitanicRecord {
survived: f64, // Target: 1.0 or 0.0
age: Option<f64>, // May be missing
fare: Option<f64>, // May be missing
sex: String, // "male" or "female"
pclass: i32, // 1, 2, or 3
}
Notice the Option<f64> types—this is Rust's way of handling nullable values. Real-world data is messy, and we need to handle missing values explicitly.
Dataset Structure
We'll process our records into a format suitable for machine learning:
use ndarray::{Array1, Array2};
#[derive(Debug)]
struct Dataset {
features: Array2<f64>, // 2D array: [samples × features]
targets: Array1<f64>, // 1D array: [samples]
feature_names: Vec<String>, // Feature labels for debugging
}
The Array2 and Array1 types come from ndarray, Rust's answer to NumPy. They provide efficient matrix operations we'll need for our neural network.
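If ndarray is new to you, here's a tiny standalone sketch (not part of the project code) of the two operations we'll lean on most: building arrays and computing a matrix–vector product.

use ndarray::{array, Array1, Array2};

fn main() {
    // A 2×3 feature matrix and a 3-element weight vector
    let x: Array2<f64> = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
    let w: Array1<f64> = array![0.1, 0.2, 0.3];

    // Matrix–vector product: one value per row, like NumPy's x @ w
    let z = x.dot(&w);
    println!("{:?}", z); // [1.4, 3.2]
}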
Data Loading and Preprocessing
Loading CSV Data
Reading CSV files in Rust requires more ceremony than Python, but it's explicit about error handling:
use csv::Reader;
use std::collections::HashMap;
use std::error::Error;
use std::fs::File;
fn load_titanic_data(filepath: &str) -> Result<Vec<TitanicRecord>, Box<dyn Error>> {
let file = File::open(filepath)?;
let mut reader = Reader::from_reader(file);
let mut records = Vec::new();
for result in reader.deserialize() {
let record: HashMap<String, String> = result?;
let titanic_record = TitanicRecord {
survived: record.get("Survived")
.and_then(|s| s.parse().ok())
.unwrap_or(0.0),
age: record.get("Age")
.and_then(|s| s.parse().ok()),
fare: record.get("Fare")
.and_then(|s| s.parse().ok()),
sex: record.get("Sex")
.unwrap_or(&"".to_string())
.to_string()
.clone(),
pclass: record.get("Pclass")
.and_then(|s| s.parse().ok())
.unwrap_or(0),
};
records.push(titanic_record);
}
println!("Loaded {} records from dataset", records.len());
Ok(records)
}
The ? operator is Rust's error propagation—if any operation fails, the function returns early with an error.
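For readers new to Rust, `File::open(filepath)?` is roughly shorthand for the following match (a simplified sketch; the real desugaring goes through the Try trait):

let file = match File::open(filepath) {
    Ok(f) => f,
    Err(e) => return Err(e.into()), // convert into Box<dyn Error> and bail out early
};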
Handling Missing Values
Real data has missing values. Let's check how many:
fn check_missing_values(data: &[TitanicRecord]) {
let missing_age = data.iter().filter(|r| r.age.is_none()).count();
let missing_fare = data.iter().filter(|r| r.fare.is_none()).count();
println!("Missing age: {}, missing fare: {}", missing_age, missing_fare);
}
For this tutorial, we'll use mean imputation—replacing missing values with the average:
fn calculate_mean_age(data: &[TitanicRecord]) -> f64 {
let ages: Vec<f64> = data.iter()
.filter_map(|r| r.age) // Only take Some values
.collect();
if ages.is_empty() {
29.7 // Fallback if all values are missing
} else {
ages.iter().sum::<f64>() / ages.len() as f64
}
}
fn calculate_mean_fare(data: &[TitanicRecord]) -> f64 {
let fares: Vec<f64> = data.iter()
.filter_map(|r| r.fare)
.collect();
if fares.is_empty() {
32.2 // Fallback if all values are missing
} else {
fares.iter().sum::<f64>() / fares.len() as f64
}
}
Feature Engineering
Machine learning models don't understand categories—they only work with numbers. We need to transform our data.
Log Transformation for Fare
Fare values have a huge range ($0 to $500+), which can destabilize training. A log transformation compresses this range:
fn apply_log_transformation(fare: f64) -> f64 {
(fare + 1.0).ln() // +1.0 to avoid ln(0)
}
This transforms the distribution to be more normal, which helps gradient descent converge.
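To see the compression concretely, here's a quick check with fares at both ends of the range (the specific fare values are just illustrative):

fn main() {
    for fare in [7.25, 32.2, 512.33] {
        println!("fare {:>7.2} -> log fare {:.2}", fare, (fare + 1.0).ln());
    }
    // fare    7.25 -> log fare 2.11
    // fare   32.20 -> log fare 3.50
    // fare  512.33 -> log fare 6.24
}

A 70× spread in raw fares collapses to roughly a 3× spread after the transform.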
One-Hot Encoding for Sex
We convert categorical "male"/"female" into binary features:
fn create_one_hot_encoding(sex: &str) -> (f64, f64) {
match sex {
"male" => (1.0, 0.0),
"female" => (0.0, 1.0),
_ => (0.0, 0.0), // Unknown
}
}
Now our model can learn different weights for male and female passengers.
Encoding Passenger Class
Similarly, passenger class becomes three binary features:
fn encode_passenger_class(pclass: i32) -> (f64, f64, f64) {
match pclass {
1 => (1.0, 0.0, 0.0), // First class
2 => (0.0, 1.0, 0.0), // Second class
3 => (0.0, 0.0, 1.0), // Third class
_ => (0.0, 0.0, 0.0),
}
}
Creating the Feature Matrix
Now we combine everything into our feature matrix:
fn create_dataset(records: Vec<TitanicRecord>) -> Dataset {
let mean_age = calculate_mean_age(&records);
let mean_fare = calculate_mean_fare(&records);
let n_samples = records.len();
let feature_names = vec![
"Age".to_string(),
"LogFare".to_string(),
"Sex_male".to_string(),
"Sex_female".to_string(),
"Pclass_1".to_string(),
"Pclass_2".to_string(),
"Pclass_3".to_string(),
];
let n_features = feature_names.len();
let mut features = Array2::zeros((n_samples, n_features));
let mut targets = Array1::zeros(n_samples);
for (i, record) in records.iter().enumerate() {
// Age (impute missing with mean)
let age = record.age.unwrap_or(mean_age);
features[[i, 0]] = age;
// Log-transformed fare
let fare = record.fare.unwrap_or(mean_fare);
features[[i, 1]] = apply_log_transformation(fare);
// One-hot encoded sex
let (sex_male, sex_female) = create_one_hot_encoding(&record.sex);
features[[i, 2]] = sex_male;
features[[i, 3]] = sex_female;
// One-hot encoded passenger class
let (pclass_1, pclass_2, pclass_3) = encode_passenger_class(record.pclass);
features[[i, 4]] = pclass_1;
features[[i, 5]] = pclass_2;
features[[i, 6]] = pclass_3;
// Target
targets[i] = record.survived;
}
Dataset { features, targets, feature_names }
}
Each row is now a vector of 7 numbers representing one passenger. For example, a 25-year-old woman in second class who paid a $20 fare becomes [25.0, 3.04, 0.0, 1.0, 0.0, 1.0, 0.0] before normalization (ln(21) ≈ 3.04).
Feature Normalization
Different features have different scales (age: 0-80, log fare: 0-6). We normalize to prevent some features from dominating:
fn normalize_features(features: &mut Array2<f64>) {
for j in 0..features.ncols() {
let column = features.column(j);
let max_val = column.fold(0.0_f64, |acc, &x| acc.max(x.abs()));
if max_val > 0.0 {
for i in 0..features.nrows() {
features[[i, j]] /= max_val;
}
}
}
}
This divides each feature by its maximum absolute value. In general that scales values to [-1, 1]; since all of our features happen to be non-negative, they land in [0, 1]. (For simplicity we normalize before splitting the data; a stricter setup would compute the scale on the training set only and reuse it for validation and test.)
The Neural Network: Logistic Regression
Our "neural network" is actually a single neuron—logistic regression. It's the building block of larger networks.
The Mathematics
For input features $\mathbf{x}$ and weights $\mathbf{w}$:
- Linear combination: $z = \mathbf{w} \cdot \mathbf{x}$ (we omit a bias term to keep the code minimal)
- Sigmoid activation: $\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$
The sigmoid function squashes any input to (0, 1), making it a probability.
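A quick sanity check of that behavior (values rounded):

fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

fn main() {
    println!("{:.3}", sigmoid(-4.0)); // 0.018 — confidently "did not survive"
    println!("{:.3}", sigmoid(0.0));  // 0.500 — maximally uncertain
    println!("{:.3}", sigmoid(4.0));  // 0.982 — confidently "survived"
}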
Initialization
We initialize weights randomly (but with a fixed seed for reproducibility):
use rand::prelude::*;
fn initialize_weights(n_features: usize) -> Array1<f64> {
let mut rng = StdRng::seed_from_u64(42);
Array1::from_shape_fn(n_features, |_| rng.random::<f64>() * 0.5)
}
For a single neuron, zero initialization would work just as well; random initialization matters once you add hidden layers, where it breaks the symmetry between units that would otherwise compute identical updates.
Making Predictions
Here's where the magic happens:
fn make_prediction(features: &Array2<f64>, coeffs: &Array1<f64>) -> Array1<f64> {
let linear_output = features.dot(coeffs); // Matrix-vector multiplication
linear_output.mapv(|x| 1.0 / (1.0 + (-x).exp())) // Apply sigmoid element-wise
}
The dot call performs a matrix–vector product: for each row in features, it computes the dot product with coeffs. Then we apply the sigmoid function element-wise to get probabilities.
Loss Function
We need to measure how wrong our predictions are. We use Mean Absolute Error (MAE):
$$\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$
fn calculate_loss(
features: &Array2<f64>,
targets: &Array1<f64>,
coeffs: &Array1<f64>
) -> f64 {
let predictions = make_prediction(features, coeffs);
let errors = &predictions - targets;
errors.mapv(|x| x.abs()).mean().unwrap()
}
Lower loss = better predictions.
Gradient Descent: The Learning Algorithm
Gradient descent is how the network learns. The idea: adjust weights in the direction that reduces loss.
The Math
The gradient tells us how the loss changes with respect to each weight:
$$\frac{\partial L}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot x_{ij}$$
We update weights by moving in the opposite direction of the gradient:
$$w_j \leftarrow w_j - \alpha \frac{\partial L}{\partial w_j}$$
where $\alpha$ is the learning rate. One subtlety: $(\hat{y}_i - y_i) \cdot x_{ij}$ is the gradient of the binary cross-entropy loss through the sigmoid, not of MAE. We report MAE because it is easy to read, and use this simpler, well-behaved gradient for the weight updates.
Implementation
fn gradient_descent_step(
features: &Array2<f64>,
targets: &Array1<f64>,
coeffs: &mut Array1<f64>,
learning_rate: f64,
) -> f64 {
// Forward pass: make predictions
let predictions = make_prediction(features, coeffs);
let loss = (predictions.clone() - targets).mapv(|x| x.abs()).mean().unwrap();
// Calculate errors
let errors = &predictions - targets;
// Compute gradients: X^T · errors / n
let gradients = features.t().dot(&errors) / features.nrows() as f64;
// Update weights: w = w - α·∇L
*coeffs = &*coeffs - &(gradients * learning_rate);
loss
}
Let's break this down:
- Forward pass: Compute predictions with current weights
- Error calculation: Subtract true values from predictions
- Gradient computation: Matrix multiplication of features transpose with errors
- Weight update: Subtract learning_rate × gradient from current weights
The features.t() transposes the matrix, and dot performs matrix multiplication. This is vectorized—we update all weights simultaneously.
Dataset Splitting
We split data into training, validation, and test sets:
use ndarray::s;
fn split_dataset(dataset: Dataset, train_ratio: f64, val_ratio: f64)
-> (Array2<f64>, Array1<f64>, Array2<f64>, Array1<f64>, Array2<f64>, Array1<f64>) {
let n_samples = dataset.features.nrows();
let n_train = (n_samples as f64 * train_ratio) as usize;
let n_val = (n_samples as f64 * val_ratio) as usize;
// Training set
let train_features = dataset.features.slice(s![0..n_train, ..]).to_owned();
let train_targets = dataset.targets.slice(s![0..n_train]).to_owned();
// Validation set
let val_features = dataset.features.slice(s![n_train..n_train+n_val, ..]).to_owned();
let val_targets = dataset.targets.slice(s![n_train..n_train+n_val]).to_owned();
// Test set
let test_features = dataset.features.slice(s![n_train+n_val.., ..]).to_owned();
let test_targets = dataset.targets.slice(s![n_train+n_val..]).to_owned();
(train_features, train_targets, val_features, val_targets, test_features, test_targets)
}
- Training (60%): Used to update weights
- Validation (20%): Monitor performance, detect overfitting
- Test (20%): Final evaluation on unseen data
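One caveat: this split takes rows in file order. Our synthetic generator produces independent rows, so that's fine here, but with the real CSV you should shuffle first. A minimal sketch of how that might look (the helper name and seed are my own, not part of the code above):

use ndarray::{Array1, Array2, Axis};
use rand::prelude::*;
use rand::seq::SliceRandom;

fn shuffle_rows(features: &Array2<f64>, targets: &Array1<f64>) -> (Array2<f64>, Array1<f64>) {
    let mut indices: Vec<usize> = (0..features.nrows()).collect();
    let mut rng = StdRng::seed_from_u64(42);
    indices.shuffle(&mut rng);
    // select copies the chosen rows into new arrays, in shuffled order
    (features.select(Axis(0), &indices), targets.select(Axis(0), &indices))
}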
Training Loop
Now we put it all together:
fn train_model(
train_features: &Array2<f64>,
train_targets: &Array1<f64>,
val_features: &Array2<f64>,
val_targets: &Array1<f64>,
mut coeffs: Array1<f64>,
learning_rate: f64,
epochs: usize,
) -> Array1<f64> {
println!("Starting training for {} epochs...", epochs);
for epoch in 0..epochs {
// Perform one gradient descent step
let train_loss = gradient_descent_step(
train_features,
train_targets,
&mut coeffs,
learning_rate
);
// Log progress every 10 epochs
if (epoch + 1) % 10 == 0 || epoch == 0 {
let val_loss = calculate_loss(val_features, val_targets, &coeffs);
println!("Epoch {}: train loss = {:.4}, val loss = {:.4}",
epoch + 1, train_loss, val_loss);
}
}
coeffs
}
Each epoch:
1. Updates weights using the training set
2. Evaluates loss on validation set (without updating weights)
3. Logs progress
If validation loss stops decreasing while training loss keeps dropping, you're overfitting.
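A common guard against this is early stopping: keep the weights from the epoch with the best validation loss and stop once it hasn't improved for a while. Here's a minimal sketch of a variant of train_model built on the functions above (the patience parameter is my own addition):

fn train_model_early_stopping(
    train_features: &Array2<f64>,
    train_targets: &Array1<f64>,
    val_features: &Array2<f64>,
    val_targets: &Array1<f64>,
    mut coeffs: Array1<f64>,
    learning_rate: f64,
    max_epochs: usize,
    patience: usize, // stop after this many epochs without validation improvement
) -> Array1<f64> {
    let mut best_val_loss = f64::INFINITY;
    let mut best_coeffs = coeffs.clone();
    let mut stale_epochs = 0;
    for _ in 0..max_epochs {
        gradient_descent_step(train_features, train_targets, &mut coeffs, learning_rate);
        let val_loss = calculate_loss(val_features, val_targets, &coeffs);
        if val_loss < best_val_loss {
            best_val_loss = val_loss;
            best_coeffs = coeffs.clone();
            stale_epochs = 0;
        } else {
            stale_epochs += 1;
            if stale_epochs >= patience {
                break; // validation loss has plateaued
            }
        }
    }
    best_coeffs // return the weights that scored best on validation data
}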
Evaluation
Accuracy Metric
For classification, we convert probabilities to binary predictions and calculate accuracy:
fn calculate_accuracy(
features: &Array2<f64>,
targets: &Array1<f64>,
coeffs: &Array1<f64>
) -> f64 {
let predictions = make_prediction(features, coeffs);
// Convert probabilities to 0 or 1
let predicted_labels = predictions.mapv(|x| if x >= 0.5 { 1.0 } else { 0.0 });
// Count correct predictions
let correct = predicted_labels.iter()
.zip(targets.iter())
.filter(|(pred, actual)| **pred == **actual)
.count();
correct as f64 / targets.len() as f64
}
If the model outputs ≥ 0.5, we predict survival; otherwise, death.
Putting It All Together
Here's the main function that orchestrates everything:
fn main() -> Result<(), Box<dyn Error>> {
println!("Creating neural network from scratch\n");
// For this demo we generate synthetic data; to train on the real
// CSV, swap in the load_titanic_data function from earlier
let records = create_synthetic_titanic_data(891);
// Check data quality
check_missing_values(&records);
// Create and normalize dataset
let dataset = create_dataset(records);
let mut normalized_dataset = dataset;
normalize_features(&mut normalized_dataset.features);
// Initialize weights
let coeffs = initialize_weights(normalized_dataset.features.ncols());
// Split into train/val/test
let (train_features, train_targets, val_features, val_targets, test_features, test_targets) =
split_dataset(normalized_dataset, 0.6, 0.2);
// Train the model
let trained_coeffs = train_model(
&train_features,
&train_targets,
&val_features,
&val_targets,
coeffs,
0.1, // Learning rate
100 // Epochs
);
// Evaluate on test set
let test_accuracy = calculate_accuracy(&test_features, &test_targets, &trained_coeffs);
println!("\nFinal test accuracy: {:.2}%", test_accuracy * 100.0);
// Show some predictions
let sample_predictions = make_prediction(
&test_features.slice(s![0..5, ..]).to_owned(),
&trained_coeffs
);
println!("\nSample predictions: {:?}", sample_predictions);
println!("Actual values: {:?}", test_targets.slice(s![0..5]));
Ok(())
}
Synthetic Data Generation
For reproducibility, we can generate synthetic Titanic data:
fn create_synthetic_titanic_data(n_samples: usize) -> Vec<TitanicRecord> {
let mut rng = StdRng::seed_from_u64(42);
let mut records = Vec::new();
for _ in 0..n_samples {
let age = Some(rng.random_range(1.0..80.0));
let fare = Some(rng.random_range(5.0..500.0));
let sex = if rng.random::<f64>() > 0.5 {
"male".to_string()
} else {
"female".to_string()
};
let pclass = rng.random_range(1..=3);
// Realistic survival probabilities
let survival_prob = match (&sex, pclass) {
(s, 1) if s == "female" => 0.95,
(s, 2) if s == "female" => 0.85,
(s, 3) if s == "female" => 0.60,
(_, 1) => 0.40,
(_, 2) => 0.20,
(_, 3) => 0.15,
_ => 0.15,
};
let survived = if rng.random::<f64>() < survival_prob { 1.0 } else { 0.0 };
records.push(TitanicRecord { survived, age, fare, sex, pclass });
}
records
}
This generates data with realistic survival patterns: women and first-class passengers survive more often.
Running the Code
Build and run with Cargo:
cargo run --release
You should see output like:
Creating neural network from scratch
Missing age: 0, missing fare: 0
Starting training for 100 epochs...
Epoch 1: train loss = 0.4234, val loss = 0.4156
Epoch 10: train loss = 0.3102, val loss = 0.3089
Epoch 20: train loss = 0.2876, val loss = 0.2901
...
Epoch 100: train loss = 0.2301, val loss = 0.2398
Final test accuracy: 75.28%
Sample predictions: [0.1234, 0.8765, 0.3456, 0.9012, 0.2345]
Actual values: [0.0, 1.0, 0.0, 1.0, 0.0]
What We Learned
We built a complete neural network from scratch and learned:
- Data preprocessing: Handling missing values, encoding categories, normalization
- Feature engineering: Transforming raw data into meaningful features
- The forward pass: Matrix multiplication and activation functions
- Loss functions: Measuring prediction error
- Gradient descent: The optimization algorithm that makes learning possible
- Train/val/test splits: Proper evaluation methodology
Limitations and Next Steps
This is a simple model. To improve:
- Better features: Interaction terms (e.g., age × sex), polynomial features
- Regularization: L1/L2 penalties to prevent overfitting (see the sketch after this list)
- Mini-batch gradient descent: Update on subsets of data for speed
- Deep networks: Add hidden layers for non-linear decision boundaries
- Different optimizers: Adam, RMSprop instead of vanilla gradient descent
- Cross-validation: More robust evaluation
- Hyperparameter tuning: Grid search over learning rates, epochs
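As one concrete example, here's a sketch of gradient_descent_step with an L2 penalty added. The lambda hyperparameter is my own addition; it shrinks large weights toward zero by adding the gradient of $\frac{\lambda}{2}\|\mathbf{w}\|^2$, which is $\lambda \mathbf{w}$, to the data gradient:

fn gradient_descent_step_l2(
    features: &Array2<f64>,
    targets: &Array1<f64>,
    coeffs: &mut Array1<f64>,
    learning_rate: f64,
    lambda: f64, // penalty strength; a new hyperparameter, not in the code above
) -> f64 {
    let predictions = make_prediction(features, coeffs);
    let errors = &predictions - targets;
    let loss = errors.mapv(|x| x.abs()).mean().unwrap();
    // Data gradient plus lambda·w, the gradient of the L2 penalty term
    let gradients = features.t().dot(&errors) / features.nrows() as f64
        + &(&*coeffs * lambda);
    *coeffs = &*coeffs - &(gradients * learning_rate);
    loss
}

Small values like lambda = 0.01 are a typical starting point; set lambda = 0.0 and this reduces to the original step.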
The Big Picture
This single-neuron network is the foundation of modern deep learning. GPT, DALL-E, AlphaGo—they all use these same principles, just scaled up:
- Multiple layers of neurons
- Better activation functions (ReLU, GELU)
- Sophisticated architectures (transformers, convolutions)
- Massive datasets
- Enormous compute
But the core ideas remain:
1. Define a differentiable function (the network)
2. Define a loss function (how wrong you are)
3. Compute gradients (calculus)
4. Update weights to reduce loss (optimization)
Understanding this simple model deeply is more valuable than superficially using complex libraries.
Conclusion
We've implemented a neural network in Rust from first principles. We handled real-world data messiness, implemented gradient descent, and achieved reasonable accuracy. More importantly, we understand exactly what's happening at every step.
The complete code is available in this repository. Try modifying it:
- Add new features
- Change the learning rate
- Implement different loss functions
- Visualize the decision boundary
Machine learning isn't magic—it's linear algebra, calculus, and lots of data. Now you know how it works.
Author's Note: This implementation prioritizes clarity over performance. Production systems use highly optimized libraries (PyTorch, TensorFlow, JAX), but understanding the fundamentals helps you debug models, interpret results, and know when the abstractions are lying to you.
Further Reading
- Neural Networks and Deep Learning by Michael Nielsen
- Deep Learning Book by Goodfellow, Bengio, and Courville
- Andrej Karpathy's "Neural Networks: Zero to Hero"
- The Matrix Calculus You Need For Deep Learning
Happy learning! 🦀🤖