Comparing the Traditional Perceptron and Hashtron Classifiers for Machine Learning

Neurlang

2025/01/30

Comparing the Traditional Perceptron and Hashtron Classifiers

The Traditional Perceptron

The perceptron is one of the oldest binary classification algorithms, inspired by biological neurons. Here’s how it works:

Summary

  1. Input: Receives a feature vector x = [x₁, x₂, …, xₘ] of size m

  2. Weights: Maintains learnable weights w = [w₁, w₂, …, wₘ] of size m

  3. Activation: Computes output using a step function:

    $${f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{otherwise} \end{cases}}$$

    where: $${z = w · x + b}$$ (b is a bias term).

  4. Learning: Updates weights through error correction:

    $${w ← w + α(y - ŷ)x}$$

    where α is the learning rate, y is the true label, and ŷ is the prediction.
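As a concrete illustration of the steps above, here is a minimal perceptron in Go; the function names and the AND-gate training data are just for this sketch, not part of any particular library:

```go
package main

import "fmt"

// predict applies the step activation to z = w·x + b.
func predict(w []float64, b float64, x []float64) float64 {
	z := b
	for i := range w {
		z += w[i] * x[i]
	}
	if z >= 0 {
		return 1
	}
	return 0
}

// train runs the perceptron update rule w ← w + α(y − ŷ)x over the data
// for a fixed number of epochs.
func train(xs [][]float64, ys []float64, alpha float64, epochs int) ([]float64, float64) {
	w := make([]float64, len(xs[0]))
	b := 0.0
	for e := 0; e < epochs; e++ {
		for i, x := range xs {
			err := ys[i] - predict(w, b, x)
			for j := range w {
				w[j] += alpha * err * x[j]
			}
			b += alpha * err
		}
	}
	return w, b
}

func main() {
	// Logical AND: linearly separable, so the perceptron converges.
	xs := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	ys := []float64{0, 0, 0, 1}
	w, b := train(xs, ys, 1.0, 20)
	for _, x := range xs {
		fmt.Println(x, "->", predict(w, b, x))
	}
}
```

Because AND is linearly separable, the weights settle after a handful of epochs and the loop at the end prints the correct labels.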


Hashtron’s Hash Function

Before discussing Hashtron, let’s examine its core hash operation conceptually:

\[\text{hash}(\text{input}, \text{salt}, \text{modulo}) = \text{mix}(\text{input}, \text{salt}) \mod \text{modulo}\]

The deterministic hash function maps any integer input to a hash digest in the range [0, modulo-1]. For more details on how this works, see the previous article.
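Sketched in Go, the operation looks like the snippet below. The mix function here is only a placeholder integer mixer and the package name is illustrative; the actual mixing function used by Hashtron is the one described in the previous article.

```go
// Package hashtron is an illustrative name for this sketch.
package hashtron

// mix is a placeholder xor–multiply–shift mixer, used here only for
// illustration; the real Hashtron mixing function is described in the
// previous article.
func mix(input, salt uint32) uint32 {
	x := input ^ salt
	x ^= x >> 16
	x *= 0x45d9f3b
	x ^= x >> 16
	return x
}

// hash maps any integer input to a digest in the range [0, modulo-1].
func hash(input, salt, modulo uint32) uint32 {
	return mix(input, salt) % modulo
}
```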


Introducing Hashtron

The Hashtron classifier takes a radically different approach compared to classical neural networks. Instead of learned weights, it uses a program defined as:

Hashtron Program Structure

An array of n tuples containing (salt, modulo) values:

$${Program = [(s₀, m₀), (s₁, m₁), ..., (sₙ₋₁, mₙ₋₁)]}$$

Typically, the final modulo mₙ₋₁ is 2, so the last hash in the chain produces a binary output.

Classification Process

  1. Initialize state: x₀ = input command.

  2. Iteratively apply hash operations:

    $${xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ)}$$

  3. Final output: xₙ ∈ {0,1} (when final modulo (mₙ₋₁) is 2).

Example inference chain:

$${Command → hash(Command, s₀, m₀) = x₁}$$ $${x₁ → hash(x₁, s₁, m₁) = ... → xₙ = BinaryOutput}$$
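In code, inference is just a loop over the program tuples. The sketch below continues the illustrative package from the hash snippet above and reuses its hash function; the Tuple and Infer names are assumptions, not the identifiers of the reference implementation.

```go
// Tuple is one (salt, modulo) pair of a Hashtron program.
type Tuple struct {
	Salt, Modulo uint32
}

// Infer runs the hash chain: x₀ = command, xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ).
// When the final modulo is 2, the returned value is a binary class label.
func Infer(program []Tuple, command uint32) uint32 {
	x := command
	for _, t := range program {
		x = hash(x, t.Salt, t.Modulo)
	}
	return x
}
```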


Hashtron Program Serialization

To optimize disk storage, programs are encoded using delta compression:

| Tuple Position | Salt Storage | Modulo Storage |
|---|---|---|
| First (i = 0) | Stored directly: s₀ | Stored directly: m₀ |
| Subsequent (i > 0) | $${Δsalt = sᵢ₋₁ ⊕ sᵢ}$$ | $${Δmod = mᵢ₋₁ - mᵢ}$$ |

Example Reconstruction:

Reconstructing second tuple:

$${salt₁ = salt₀ ⊕ storedSalt₁}$$ $${mod₁ = mod₀ - storedMod₁}$$

Reconstructing the i-th tuple (i > 0):

$${saltᵢ = saltᵢ₋₁ ⊕ storedSaltᵢ}$$ $${modᵢ = modᵢ₋₁ - storedModᵢ}$$

This delta encoding significantly reduces storage requirements compared to storing raw values.
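A sketch of the encoder and decoder, continuing with the illustrative Tuple type from the inference snippet (the real on-disk byte layout may differ):

```go
// Encode stores the first tuple directly and every later tuple as a delta:
// Δsaltᵢ = sᵢ₋₁ ⊕ sᵢ, Δmodᵢ = mᵢ₋₁ − mᵢ.
func Encode(program []Tuple) []Tuple {
	stored := make([]Tuple, len(program))
	for i, t := range program {
		if i == 0 {
			stored[i] = t
			continue
		}
		stored[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo, // unsigned wraparound is reversed by Decode
		}
	}
	return stored
}

// Decode reverses Encode: saltᵢ = saltᵢ₋₁ ⊕ storedSaltᵢ, modᵢ = modᵢ₋₁ − storedModᵢ.
func Decode(stored []Tuple) []Tuple {
	program := make([]Tuple, len(stored))
	for i, t := range stored {
		if i == 0 {
			program[i] = t
			continue
		}
		program[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo,
		}
	}
	return program
}
```

Unsigned wraparound in the modulo subtraction is harmless because Decode applies the exact inverse operation.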

Why XOR encoding for salts?

The learning algorithm searches for the “next” salt near the previous salt as an optimization, because consecutive salts were observed to be similar integers. Concretely, it brute-forces a nonce starting from 0 and accepts the first salt nonce ⊕ previous salt that causes no collision between the true set and the false set.

That is why the XORs of consecutive salts tend to be small integers, which reduces the storage requirement.
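In sketch form, that search could look like the function below. The collision check is deliberately simplified: it only requires that no true-set input and no false-set input hash to the same value at this stage, which is a stand-in for the actual criterion of the training algorithm, not the reference training code.

```go
// nextSalt brute-forces nonces starting from 0 and returns the first salt
// (nonce ⊕ previous salt) under which no true-set input and false-set input
// hash to the same value. Simplified illustration of the idea only.
func nextSalt(prevSalt, modulo uint32, trueSet, falseSet []uint32) uint32 {
	for nonce := uint32(0); ; nonce++ { // may search long if the sets are hard to separate
		salt := nonce ^ prevSalt
		hitByTrue := make(map[uint32]bool) // hash values produced by the true set
		for _, v := range trueSet {
			hitByTrue[hash(v, salt, modulo)] = true
		}
		collides := false
		for _, v := range falseSet {
			if hitByTrue[hash(v, salt, modulo)] {
				collides = true
				break
			}
		}
		if !collides {
			return salt
		}
	}
}
```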


Implementation Reference

Hashtron inference has been implemented in multiple programming languages. See working examples at:
Rosetta Code - Hashtron Inference


Key Differences

| Aspect | Perceptron | Hashtron |
|---|---|---|
| Learning | Weight updates | Program configuration |
| Space Complexity | $${O(m)}$$ for weights | $${O(n)}$$ for hash chain |
| Explainability | Weight interpretation | Opaque hash chain |
| Execution | Matrix multiplication | Hash chain evaluation |
| Arithmetic | Floating point | Integer |
| Problem domain | Differentiable | Arbitrary |
| Disadvantages | Linear separability | Novel algorithm |

The two classifiers embody different philosophies of machine learning: parametric vs. algorithmic.