Comparing the Traditional Perceptron and Hashtron Classifiers
The Traditional Perceptron
The perceptron is one of the oldest binary classification algorithms, inspired by biological neurons. Here’s how it works:
Summary
- Input: Receives a feature vector x = [x₁, x₂, …, xₘ] of size m.
- Weights: Maintains learnable weights w = [w₁, w₂, …, wₘ] of size m.
- Activation: Computes the output using a step function:
$${f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{otherwise} \end{cases}}$$
where:
$${z = w · x + b}$$
(b is a bias term).
- Learning: Updates the weights through error correction:
$${w ← w + α(y - ŷ)x}$$
where α is the learning rate, y is the true label, and ŷ is the prediction.
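To make this concrete, here is a minimal perceptron sketch in Go (the type and field names are invented for this example, not taken from any library). It implements the step activation and the error-correction rule above and trains on the linearly separable AND function.

```go
package main

import "fmt"

// Perceptron holds m learnable weights, a bias, and a learning rate α.
type Perceptron struct {
	Weights []float64
	Bias    float64
	Alpha   float64
}

// Predict computes ŷ = step(w·x + b).
func (p *Perceptron) Predict(x []float64) int {
	z := p.Bias
	for i, w := range p.Weights {
		z += w * x[i]
	}
	if z >= 0 {
		return 1
	}
	return 0
}

// Train applies the update w ← w + α(y - ŷ)x (and the same for the bias).
func (p *Perceptron) Train(x []float64, y int) {
	err := float64(y - p.Predict(x))
	for i := range p.Weights {
		p.Weights[i] += p.Alpha * err * x[i]
	}
	p.Bias += p.Alpha * err
}

func main() {
	// Learn logical AND, a linearly separable problem.
	p := &Perceptron{Weights: make([]float64, 2), Alpha: 0.1}
	data := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	labels := []int{0, 0, 0, 1}
	for epoch := 0; epoch < 20; epoch++ {
		for i, x := range data {
			p.Train(x, labels[i])
		}
	}
	for i, x := range data {
		fmt.Println(x, "->", p.Predict(x), "expected", labels[i])
	}
}
```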
Hashtron’s Hash Function
Before discussing Hashtron, let’s examine its core hash operation conceptually:
$${\text{hash}(\text{input}, \text{salt}, \text{modulo}) = \text{mix}(\text{input}, \text{salt}) \mod \text{modulo}}$$
The deterministic hash function maps any integer input to a hash digest in the range [0, modulo-1]. For more details on how this works, see the previous article.
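A minimal Go sketch of this operation is shown below. The mix function here is only a stand-in integer mixer chosen for illustration; it is not the actual Hashtron mixing function, which is covered in the previous article.

```go
package main

import "fmt"

// mix is a stand-in bit mixer (NOT the real Hashtron mix function);
// any avalanching integer mixer illustrates the same idea.
func mix(input, salt uint32) uint32 {
	x := input ^ salt
	x ^= x >> 16
	x *= 0x7feb352d
	x ^= x >> 15
	x *= 0x846ca68b
	x ^= x >> 16
	return x
}

// hash maps any integer input deterministically into [0, modulo-1].
func hash(input, salt, modulo uint32) uint32 {
	return mix(input, salt) % modulo
}

func main() {
	fmt.Println(hash(42, 7, 10)) // always the same value in [0, 9]
}
```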
Introducing Hashtron
The Hashtron classifier takes a radically different approach from classical neural networks. Instead of learned weights, it uses a program defined as follows:
Hashtron Program Structure
An array of n tuples containing (salt, modulo) values:
$${Program = [(s₀, m₀), (s₁, m₁), ..., (sₙ₋₁, mₙ₋₁)]}$$
Typically:
- Salts sᵢ are arbitrary integers produced by machine learning (to be described in a subsequent article).
- Moduli mᵢ are monotonically decreasing and depend on the learning rate used during training.
- The final modulo mₙ₋₁ = 2, which forces a binary output.
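In code, this program structure can be sketched as a slice of (salt, modulo) pairs; the type and field names below are illustrative rather than taken from any particular implementation.

```go
package hashtron

// Tuple is one (salt, modulo) pair of a Hashtron program.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Program is an ordered list of n tuples; the last modulo is 2,
// so the final hash collapses to a binary output.
type Program []Tuple
```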
Classification Process
- Initialize the state: x₀ = input command.
- Iteratively apply the hash operation:
$${xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ)}$$
- Final output: xₙ ∈ {0, 1} (when the final modulo mₙ₋₁ is 2).
Example inference chain:
$${Command → hash(Command, s₀, m₀) = x₁}$$
$${x₁ → hash(x₁, s₁, m₁) = ... → xₙ = BinaryOutput}$$
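A sketch of the whole inference chain in Go follows, reusing the stand-in mixer from above (redeclared so the snippet compiles on its own); the example program's salts and moduli are made up purely for illustration.

```go
package main

import "fmt"

// Tuple is one (salt, modulo) step of a Hashtron program.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Stand-in mixer and hash, as in the earlier sketch (not the real ones).
func mix(input, salt uint32) uint32 {
	x := input ^ salt
	x ^= x >> 16
	x *= 0x7feb352d
	x ^= x >> 15
	x *= 0x846ca68b
	x ^= x >> 16
	return x
}

func hash(input, salt, modulo uint32) uint32 {
	return mix(input, salt) % modulo
}

// Classify runs the chain xᵢ₊₁ = hash(xᵢ, sᵢ, mᵢ); with the final
// modulo equal to 2, the result is 0 or 1.
func Classify(program []Tuple, command uint32) uint32 {
	x := command
	for _, t := range program {
		x = hash(x, t.Salt, t.Modulo)
	}
	return x
}

func main() {
	// Made-up program: moduli decrease monotonically, last one is 2.
	program := []Tuple{{12345, 100}, {12347, 10}, {12340, 2}}
	fmt.Println(Classify(program, 42)) // prints 0 or 1
}
```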
Hashtron Program Serialization
To optimize disk storage, programs are encoded using delta compression:
| Tuple Position | Salt Storage | Modulo Storage |
|---|---|---|
| First (i=0) | Stored directly: s₀ | Stored directly: m₀ |
| Subsequent (i>0) | $${Δsaltᵢ = sᵢ₋₁ ⊕ sᵢ}$$ | $${Δmodᵢ = mᵢ₋₁ - mᵢ}$$ |
Example Reconstruction
Reconstructing the second tuple:
$${salt₁ = salt₀ ⊕ storedSalt₁}$$
$${mod₁ = mod₀ - storedMod₁}$$
Reconstructing the i-th tuple:
$${saltᵢ = saltᵢ₋₁ ⊕ storedSaltᵢ}$$
$${modᵢ = modᵢ₋₁ - storedModᵢ}$$
This delta encoding significantly reduces storage requirements compared to storing raw values.
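A Go sketch of the decoding (and, for completeness, the encoding) is below, using the same illustrative Tuple type; the deltas in the stored form are exactly the Δsalt and Δmod values from the table above.

```go
package main

import "fmt"

// Tuple is one (salt, modulo) pair, as in the earlier sketches.
type Tuple struct {
	Salt   uint32
	Modulo uint32
}

// Encode stores the first tuple directly and every later tuple as
// (saltᵢ₋₁ XOR saltᵢ, modᵢ₋₁ - modᵢ).
func Encode(program []Tuple) []Tuple {
	stored := make([]Tuple, len(program))
	for i, t := range program {
		if i == 0 {
			stored[i] = t
			continue
		}
		stored[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo,
		}
	}
	return stored
}

// Decode reverses the deltas: saltᵢ = saltᵢ₋₁ XOR storedSaltᵢ and
// modᵢ = modᵢ₋₁ - storedModᵢ.
func Decode(stored []Tuple) []Tuple {
	program := make([]Tuple, len(stored))
	for i, t := range stored {
		if i == 0 {
			program[i] = t
			continue
		}
		program[i] = Tuple{
			Salt:   program[i-1].Salt ^ t.Salt,
			Modulo: program[i-1].Modulo - t.Modulo,
		}
	}
	return program
}

func main() {
	program := []Tuple{{12345, 100}, {12347, 10}, {12340, 2}}
	stored := Encode(program)
	fmt.Println(stored)         // [{12345 100} {2 90} {15 8}]: small deltas after the first tuple
	fmt.Println(Decode(stored)) // round-trips back to the original program
}
```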
Why XOR encoding for salts?
The learning algorithm searches for the next salt near the previous salt as an optimization, because consecutive salts were observed to be similar integers. It brute-forces a nonce, starting from nonce 0, such that nonce ⊕ previous salt does not cause any collision between the true set and the false set.
That is why the XORs of nearby salts tend to be small integers, which reduces the storage requirement.
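For example, take two hypothetical consecutive salts that differ only in their low bits:
$${s₀ = 8000, s₁ = 8003 ⇒ Δsalt₁ = s₀ ⊕ s₁ = 3}$$
Storing the delta 3 takes only a couple of bits, while the raw salt 8003 needs thirteen.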
Implementation Reference
Hashtron inference has been implemented in multiple programming languages. See working examples at:
Rosetta Code - Hashtron Inference
Key Differences
| Aspect | Perceptron | Hashtron |
|---|---|---|
| Learning | Weight updates | Program configuration |
| Space Complexity | $${O(m)}$$ for weights | $${O(n)}$$ for hash chain |
| Explainability | Weight interpretation | Opaque hash chain |
| Execution | Matrix multiplication | Hash chain evaluation |
| Arithmetic | Floating point | Integer |
| Problem domain | Differentiable | Arbitrary |
| Disadvantages | Requires linear separability | Novel, less-studied algorithm |
The two classifiers embody different philosophies of machine learning: parametric versus algorithmic.