Comparing the Traditional Perceptron and Hashtron Classifiers
The Traditional Perceptron
The perceptron is one of the oldest binary classification algorithms, inspired by biological neurons. Here’s how it works:
Summary
- Input: Receives a feature vector x = [x₁, x₂, …, xₘ] of size m.
- Weights: Maintains learnable weights w = [w₁, w₂, …, wₘ] of size m.
- Activation: Computes the output using a step function:
$${f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{otherwise} \end{cases}}$$
where:
$${z = w · x + b}$$
(b is a bias term).
- Learning: Updates the weights through error correction:
$${w ← w + α(y - ŷ)x}$$
where α is the learning rate, y is the true label, and ŷ is the prediction.
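To make this concrete, here is a minimal perceptron sketch in Go (the type and field names are invented for this example, not taken from any library). It implements the step activation and the error-correction rule above and trains on the linearly separable AND function.

```go
package main

import "fmt"

// Perceptron holds m learnable weights, a bias, and a learning rate α.
type Perceptron struct {
	Weights []float64
	Bias    float64
	Alpha   float64
}

// Predict computes ŷ = step(w·x + b).
func (p *Perceptron) Predict(x []float64) int {
	z := p.Bias
	for i, w := range p.Weights {
		z += w * x[i]
	}
	if z >= 0 {
		return 1
	}
	return 0
}

// Train applies the update w ← w + α(y - ŷ)x (and the same for the bias).
func (p *Perceptron) Train(x []float64, y int) {
	err := float64(y - p.Predict(x))
	for i := range p.Weights {
		p.Weights[i] += p.Alpha * err * x[i]
	}
	p.Bias += p.Alpha * err
}

func main() {
	// Learn logical AND, a linearly separable problem.
	p := &Perceptron{Weights: make([]float64, 2), Alpha: 0.1}
	data := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	labels := []int{0, 0, 0, 1}
	for epoch := 0; epoch < 20; epoch++ {
		for i, x := range data {
			p.Train(x, labels[i])
		}
	}
	for i, x := range data {
		fmt.Println(x, "->", p.Predict(x), "expected", labels[i])
	}
}
```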
Hashtron’s Hash Function
Before discussing Hashtron, let’s examine its core hash operation conceptually:
$${\text{hash}(\text{input}, \text{salt}, \text{modulo}) = \text{mix}(\text{input}, \text{salt}) \mod \text{modulo}}$$
The deterministic hash function maps any integer input to a hash digest in the range [0, modulo-1]. For more details on how this works, see the previous article.
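A minimal Go sketch of this operation is shown below. The mix function here is only a stand-in integer mixer chosen for illustration; it is not the actual Hashtron mixing function, which is covered in the previous article.

```go
package main

import "fmt"

// mix is a stand-in bit mixer (NOT the real Hashtron mix function);
// any avalanching integer mixer illustrates the same idea.
func mix(input, salt uint32) uint32 {
	x := input ^ salt
	x ^= x >> 16
	x *= 0x7feb352d
	x ^= x >> 15
	x *= 0x846ca68b
	x ^= x >> 16
	return x
}

// hash maps any integer input deterministically into [0, modulo-1].
func hash(input, salt, modulo uint32) uint32 {
	return mix(input, salt) % modulo
}

func main() {
	fmt.Println(hash(42, 7, 10)) // always the same value in [0, 9]
}
```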
Introducing Hashtron
The Hashtron classifier takes a radically different approach from classical neural networks. Instead of learned weights, it uses a program defined as follows:
Hashtron Program Structure
An array of n tuples containing (salt, modulo) values:
$${Program = [(s₀, m₀), (s₁, m₁), ..., (sₙ₋₁, mₙ₋₁)]}$$
Typically:
- Salts sᵢ are arbitrary integers produced by machine learning (to be described in a subsequent article).
- Moduli mᵢ are monotonically decreasing and depend on the learning rate used during training.
- The final modulo mₙ₋₁ = 2, which forces a binary output.
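In code, this program structure can be sketched as a slice of (salt, modulo) pairs; the type and field names below are illustrative rather than taken from any particular implementation.

```go
package hashtron

// Tuple is one (salt, modulo) pair of a Hashtron program.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Program is an ordered list of n tuples; the last modulo is 2,
// so the final hash collapses to a binary output.
type Program []Tuple
```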
Classification Process
- Initialize the state: x₀ = input command.
- Iteratively apply the hash operation:
$${xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ)}$$
- Final output: xₙ ∈ {0, 1} (when the final modulo mₙ₋₁ is 2).
Example inference chain:
$${Command → hash(Command, s₀, m₀) = x₁}$$
$${x₁ → hash(x₁, s₁, m₁) = ... → xₙ = BinaryOutput}$$
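A sketch of the whole inference chain in Go follows, reusing the stand-in mixer from above (redeclared so the snippet compiles on its own); the example program's salts and moduli are made up purely for illustration.

```go
package main

import "fmt"

// Tuple is one (salt, modulo) step of a Hashtron program.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Stand-in mixer and hash, as in the earlier sketch (not the real ones).
func mix(input, salt uint32) uint32 {
	x := input ^ salt
	x ^= x >> 16
	x *= 0x7feb352d
	x ^= x >> 15
	x *= 0x846ca68b
	x ^= x >> 16
	return x
}

func hash(input, salt, modulo uint32) uint32 {
	return mix(input, salt) % modulo
}

// Classify runs the chain xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ); with the final
// modulo equal to 2, the result is 0 or 1.
func Classify(program []Tuple, command uint32) uint32 {
	x := command
	for _, t := range program {
		x = hash(x, t.Salt, t.Modulo)
	}
	return x
}

func main() {
	// Made-up program: moduli decrease monotonically, last one is 2.
	program := []Tuple{{12345, 100}, {12347, 10}, {12340, 2}}
	fmt.Println(Classify(program, 42)) // prints 0 or 1
}
```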
Hashtron Program Serialization
To optimize disk storage, programs are encoded using delta compression:
| Tuple Position | Salt Storage | Modulo Storage |
|---|---|---|
| First (i=0) | Stored directly: s₀ | Stored directly: m₀ |
| Subsequent (i>0) | $${Δsaltᵢ = sᵢ₋₁ ⊕ sᵢ}$$ | $${Δmodᵢ = mᵢ₋₁ - mᵢ}$$ |
Example Reconstruction
Reconstructing the second tuple:
$${salt₁ = salt₀ ⊕ storedSalt₁}$$
$${mod₁ = mod₀ - storedMod₁}$$
Reconstructing the i-th tuple:
$${saltᵢ = saltᵢ₋₁ ⊕ storedSaltᵢ}$$
$${modᵢ = modᵢ₋₁ - storedModᵢ}$$
This delta encoding significantly reduces storage requirements compared to storing raw values.
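A Go sketch of the decoding (and, for completeness, the encoding) is below, using the same illustrative Tuple type; the deltas in the stored form are exactly the Δsalt and Δmod values from the table above.

```go
package main

import "fmt"

// Tuple is one (salt, modulo) pair, as in the earlier sketches.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Encode stores the first tuple directly and every later tuple as
// (saltᵢ₋₁ XOR saltᵢ, modᵢ₋₁ - modᵢ).
func Encode(program []Tuple) []Tuple {
	stored := make([]Tuple, len(program))
	for i, t := range program {
		if i == 0 {
			stored[i] = t
			continue
		}
		stored[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo,
		}
	}
	return stored
}

// Decode reverses the deltas: saltᵢ = saltᵢ₋₁ XOR storedSaltᵢ and
// modᵢ = modᵢ₋₁ - storedModᵢ.
func Decode(stored []Tuple) []Tuple {
	program := make([]Tuple, len(stored))
	for i, t := range stored {
		if i == 0 {
			program[i] = t
			continue
		}
		program[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo,
		}
	}
	return program
}

func main() {
	program := []Tuple{{12345, 100}, {12347, 10}, {12340, 2}}
	stored := Encode(program)
	fmt.Println(stored)         // [{12345 100} {2 90} {15 8}]: small deltas after the first tuple
	fmt.Println(Decode(stored)) // round-trips back to the original program
}
```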
Why XOR encoding for salts?
The learning algorithm searches for the next salt near the previous salt as an optimization, because consecutive salts were observed to be similar integers. It brute-forces a nonce, starting from nonce 0, such that nonce ⊕ previous salt does not cause any collision between the true set and the false set.
That is why the XORs of nearby salts tend to be small integers, which reduces the storage requirement.
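For example, take two hypothetical consecutive salts that differ only in their low bits:
$${s₀ = 8000, s₁ = 8003 ⇒ Δsalt₁ = s₀ ⊕ s₁ = 3}$$
Storing the delta 3 takes only a couple of bits, while the raw salt 8003 needs thirteen.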
Implementation Reference
Hashtron inference has been implemented in multiple programming languages. See working examples at:
Rosetta Code - Hashtron Inference
Key Differences
| Aspect | Perceptron | Hashtron |
|---|---|---|
| Learning | Weight updates | Program configuration |
| Space Complexity | $${O(m)}$$ for weights | $${O(n)}$$ for hash chain |
| Explainability | Weight interpretation | Opaque hash chain |
| Execution | Matrix multiplication | Hash chain evaluation |
| Arithmetic | Floating point | Integer |
| Problem domain | Differentiable | Arbitrary |
| Disadvantages | Requires linear separability | Novel, less-studied algorithm |
The two classifiers embody different philosophies of machine learning: parametric versus algorithmic.