When Less is More: Revolutionizing Language AI with Boolean Logic
What if we told you that a transformer with just 145 neurons could understand and process 140 different languages? In the world of large language models where billions of parameters are the norm, this might sound impossible. Yet, through the innovative Goruut project, we’ve achieved exactly that using a revolutionary approach called “weightless transformers.”
Traditional transformers rely on massive weight matrices and complex floating-point operations. Our weightless transformer, however, operates entirely on boolean logic and hash-based neurons called “hashtrons.” This fundamental shift in architecture allows us to achieve remarkable multilingual capabilities with minimal computational overhead.
The Two Transformer Variants in Production
As part of the Goruut phonetic-to-IPA translator, we have deployed two distinct weightless transformer architectures:
- Self-Attention Phonetic Transformer: A weightless transformer based on non-masked cross-attention, currently in production use for all supported languages. This model handles the core phonetic-to-IPA translation task for out-of-vocabulary words.
- Cross-Attention Homograph Disambiguator: A specialized transformer using non-masked cross-attention, initially developed for English and Hebrew homograph disambiguation, with plans to extend support to most languages.
Both architectures share the same fundamental principle: they process information through boolean operations rather than traditional matrix multiplications, making them incredibly efficient while maintaining high accuracy across diverse linguistic contexts.
The Architecture: Rethinking Attention Mechanisms
Token Processing and the Query-Key-Value Paradigm
Our weightless transformer maintains the familiar Query-Key-Value (QKV) structure but implements it through a radically different approach. The architecture features 8 front layer slots, each accepting 3 tokens (query, key, value), resulting in a total of 24 tokens being processed simultaneously by the front layer.
This design choice reflects a key insight: rather than processing sequences of arbitrary length, we can achieve remarkable results by focusing on fixed-size windows with intelligent wraparound handling. When the input exceeds 24 tokens, the system employs a wraparound mechanism, ensuring that longer sequences are still processed effectively without the computational overhead of traditional attention mechanisms.
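To make the wraparound concrete, here is a minimal Go sketch of folding a longer sequence into the fixed 24-slot front layer. The `uint32` token type and the XOR folding rule are illustrative assumptions, not the exact combining scheme used in Goruut.

```go
package main

import "fmt"

const (
	slots        = 8               // front layer slots
	perSlot      = 3               // query, key, value per slot
	windowTokens = slots * perSlot // 24 tokens processed at once
)

// packWindow folds a token sequence of arbitrary length into the fixed
// 24-token window. Position i wraps around to i mod 24; overlapping tokens
// are combined with XOR purely for illustration.
func packWindow(tokens []uint32) [windowTokens]uint32 {
	var window [windowTokens]uint32
	for i, tok := range tokens {
		window[i%windowTokens] ^= tok
	}
	return window
}

func main() {
	// 30 tokens: the last 6 wrap back onto positions 0..5.
	input := make([]uint32, 30)
	for i := range input {
		input[i] = uint32(i + 1)
	}
	fmt.Println(packWindow(input))
}
```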
Key architectural decisions:
- No positional encoding: Unlike traditional transformers, our system doesn’t rely on positional embeddings. The hashtron neurons inherently capture positional relationships through their boolean operations.
- Fixed window size: The 24-token limit isn’t a constraint—it’s a feature that enables consistent, predictable processing across all languages.
- Boolean-first design: Every operation is designed around boolean logic, eliminating the need for floating-point arithmetic.
The Hashtron Revolution
At the heart of our architecture lies the hashtron—a novel type of neuron that operates entirely on hash-based boolean logic. Each of the 24 input tokens is fed into a hashtron neuron, which yields a boolean output. This seemingly simple transformation is where the magic happens.
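The following toy Go sketch shows the idea of a hash-based boolean neuron. The per-neuron seed, the FNV-1a hash, and the single-bit readout are assumptions made for illustration; the production hashtron in Goruut learns its own hash-based mapping, so treat this only as a picture of the shape of the computation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// Hashtron is a toy hash-based neuron: it mixes an input token with a
// per-neuron seed and thresholds one bit of the hash to produce a boolean.
type Hashtron struct {
	Seed uint32
}

// Fire maps a token to a boolean output, deterministically.
func (h Hashtron) Fire(token uint32) bool {
	var buf [8]byte
	binary.LittleEndian.PutUint32(buf[:4], h.Seed)
	binary.LittleEndian.PutUint32(buf[4:], token)
	hasher := fnv.New32a()
	hasher.Write(buf[:])
	return hasher.Sum32()&1 == 1 // read out the lowest bit
}

func main() {
	// One hashtron per token position in the 24-token front layer.
	layer := make([]Hashtron, 24)
	for i := range layer {
		layer[i] = Hashtron{Seed: uint32(i)}
	}
	fmt.Println(layer[0].Fire(42), layer[1].Fire(42))
}
```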
The 24 boolean outputs from the first layer form the foundation of our attention mechanism. Rather than computing attention weights through softmax operations over continuous values, we create discrete attention patterns through boolean matrices. This approach offers several advantages:
- Deterministic behavior: Boolean operations are predictable and reproducible
- Memory efficiency: Storing booleans requires minimal memory compared to floating-point weights
- Speed: Boolean operations are computationally faster than floating-point arithmetic
- Interpretability: The attention patterns are directly visible as boolean matrices
Architecture of the Network
```mermaid
graph TD
subgraph "24 Token Input Layer (Vertical Processing)"
T0["Token 0 - Q"]
T1["Token 1 - K"]
T2["Token 2 - V"]
T3[...]
T4["Token n"]
T5[...]
T23["Token 23 - V"]
end
subgraph "Layer 1"
L1H0[Hashtron 0]
L1H1[Hashtron 1]
L1H2[Hashtron 2]
L1H3[...]
L1H4[Hashtron n]
L1H5[...]
L1H23[Hashtron 23]
L1H0 --> L1B0{bool}
L1H1 --> L1B1{bool}
L1H2 --> L1B2{bool}
L1H3 --> L1B3{...}
L1H4 --> L1B4{bool}
L1H5 --> L1B5{...}
L1H23 --> L1B23{bool}
end
subgraph "8×24 Attention Matrix"
%% Row 1
AM1_1["●"]
AM1_2["○"]
AM1_3["●"]
AM1_4["○"]
AM1_5["●"]
%% Row 2
AM2_1["○"]
AM2_2["●"]
AM2_3["○"]
AM2_4["●"]
AM2_5["○"]
%% Row 3
AM3_1["●"]
AM3_2["○"]
AM3_3["●"]
AM3_4["○"]
AM3_5["●"]
%% Row 4
AM4_1["○"]
AM4_2["●"]
AM4_3["○"]
AM4_4["●"]
AM4_5["○"]
%% Row 5
AM5_1["●"]
AM5_2["○"]
AM5_3["●"]
AM5_4["○"]
AM5_5["●"]
%% Row 6
AM6_1["○"]
AM6_2["●"]
AM6_3["○"]
AM6_4["●"]
AM6_5["○"]
%% Row 7
AM7_1["●"]
AM7_2["○"]
AM7_3["●"]
AM7_4["○"]
AM7_5["●"]
%% Row 8
AM8_1["○"]
AM8_2["●"]
AM8_3["○"]
AM8_4["●"]
AM8_5["○"]
%% Row-to-row connections
AM1_1 --> AM2_1
AM1_2 --> AM2_2
AM1_3 --> AM2_3
AM1_4 --> AM2_4
AM1_5 --> AM2_5
AM2_1 --> AM3_1
AM2_2 --> AM3_2
AM2_3 --> AM3_3
AM2_4 --> AM3_4
AM2_5 --> AM3_5
AM3_1 --> AM4_1
AM3_2 --> AM4_2
AM3_3 --> AM4_3
AM3_4 --> AM4_4
AM3_5 --> AM4_5
AM4_1 --> AM5_1
AM4_2 --> AM5_2
AM4_3 --> AM5_3
AM4_4 --> AM5_4
AM4_5 --> AM5_5
AM5_1 --> AM6_1
AM5_2 --> AM6_2
AM5_3 --> AM6_3
AM5_4 --> AM6_4
AM5_5 --> AM6_5
AM6_1 --> AM7_1
AM6_2 --> AM7_2
AM6_3 --> AM7_3
AM6_4 --> AM7_4
AM6_5 --> AM7_5
AM7_1 --> AM8_1
AM7_2 --> AM8_2
AM7_3 --> AM8_3
AM7_4 --> AM8_4
AM7_5 --> AM8_5
end
subgraph "Agreements Column Summation"
CS0[∑ col0]
CS1[∑ col1]
CS2[∑ col2]
CS3[...]
CS4[∑ col n]
CS5[...]
CS23[∑ col23]
CS0 --> I0[small int]
CS1 --> I1[small int]
CS2 --> I2[small int]
CS3 --> I3[...]
CS4 --> I4[small int]
CS5 --> I5[...]
CS23 --> I23[small int]
end
subgraph "Stochastic Layer (Hashtron)"
SL0[Hashtron 0]
SL1[Hashtron 1]
SL2[Hashtron 2]
SL3[...]
SL4[Hashtron n]
SL5[...]
SL23[Hashtron 23]
SL0 --> SLB0{bool}
SL1 --> SLB1{bool}
SL2 --> SLB2{bool}
SL3 --> SLB3{...}
SL4 --> SLB4{bool}
SL5 --> SLB5{...}
SL23 --> SLB23{bool}
end
subgraph "Repeat 8x"
R1["Layer 1"]
R2["Layer 2"]
R3["Layer 3"]
R4["..."]
R5["Layer 8"]
R6["Attention → Sum"]
R7["→ Hashtron"]
end
subgraph "Final Output Layer"
FO0[bool 0]
FO1[bool 1]
FO2[bool 2]
FO3[...]
FO4[bool n]
FO5[...]
FO23[bool 23]
FSum[∑ all booleans] --> Total[Integer total]
Total --> FH[Final Hashtron]
FH --> Answer{"Boolean Answer<br/>Solution to problem"}
end
%% Token connections
T0 --> L1H0
T1 --> L1H1
T2 --> L1H2
T4 --> L1H4
T23 --> L1H23
%% Attention matrix formation
L1B0 --> AM1_1
L1B1 --> AM1_2
L1B2 --> AM1_3
L1B4 --> AM1_4
L1B23 --> AM1_5
%% Column sums from matrix (from last row)
AM8_1 --> CS0
AM8_2 --> CS1
AM8_3 --> CS2
AM8_4 --> CS4
AM8_5 --> CS23
%% Stochastic layer connections
I0 --> SL0
I1 --> SL1
I2 --> SL2
I4 --> SL4
I23 --> SL23
%% Final outputs
SLB0 --> FO0
SLB1 --> FO1
SLB2 --> FO2
SLB4 --> FO4
SLB23 --> FO23
%% Repeat layer connections (simplified)
SLB0 --> R1
SLB1 --> R1
SLB2 --> R1
SLB23 --> R1
R7 --> FO0
R7 --> FO1
R7 --> FO2
R7 --> FO23
%% Final summation
FO0 --> FSum
FO1 --> FSum
FO2 --> FSum
FO4 --> FSum
FO23 --> FSum
style T0 fill:#e1f5fe
style T1 fill:#e1f5fe
style T2 fill:#e1f5fe
style T4 fill:#e1f5fe
style T23 fill:#e1f5fe
style L1H0 fill:#f3e5f5
style L1H1 fill:#f3e5f5
style L1H2 fill:#f3e5f5
style L1H4 fill:#f3e5f5
style L1H23 fill:#f3e5f5
%% Attention Matrix Styling - Row 1
style AM1_1 fill:#fff3e0
style AM1_2 fill:#fff3e0
style AM1_3 fill:#fff3e0
style AM1_4 fill:#fff3e0
style AM1_5 fill:#fff3e0
%% Attention Matrix Styling - Row 2
style AM2_1 fill:#fff3e0
style AM2_2 fill:#fff3e0
style AM2_3 fill:#fff3e0
style AM2_4 fill:#fff3e0
style AM2_5 fill:#fff3e0
%% Attention Matrix Styling - Row 3
style AM3_1 fill:#fff3e0
style AM3_2 fill:#fff3e0
style AM3_3 fill:#fff3e0
style AM3_4 fill:#fff3e0
style AM3_5 fill:#fff3e0
%% Attention Matrix Styling - Row 4
style AM4_1 fill:#fff3e0
style AM4_2 fill:#fff3e0
style AM4_3 fill:#fff3e0
style AM4_4 fill:#fff3e0
style AM4_5 fill:#fff3e0
%% Attention Matrix Styling - Row 5
style AM5_1 fill:#fff3e0
style AM5_2 fill:#fff3e0
style AM5_3 fill:#fff3e0
style AM5_4 fill:#fff3e0
style AM5_5 fill:#fff3e0
%% Attention Matrix Styling - Row 6
style AM6_1 fill:#fff3e0
style AM6_2 fill:#fff3e0
style AM6_3 fill:#fff3e0
style AM6_4 fill:#fff3e0
style AM6_5 fill:#fff3e0
%% Attention Matrix Styling - Row 7
style AM7_1 fill:#fff3e0
style AM7_2 fill:#fff3e0
style AM7_3 fill:#fff3e0
style AM7_4 fill:#fff3e0
style AM7_5 fill:#fff3e0
%% Attention Matrix Styling - Row 8
style AM8_1 fill:#fff3e0
style AM8_2 fill:#fff3e0
style AM8_3 fill:#fff3e0
style AM8_4 fill:#fff3e0
style AM8_5 fill:#fff3e0
style CS0 fill:#e8f5e8
style CS1 fill:#e8f5e8
style CS2 fill:#e8f5e8
style CS4 fill:#e8f5e8
style CS23 fill:#e8f5e8
style SL0 fill:#f3e5f5
style SL1 fill:#f3e5f5
style SL2 fill:#f3e5f5
style SL4 fill:#f3e5f5
style SL23 fill:#f3e5f5
style R1 fill:#f5f5f5
style R2 fill:#f5f5f5
style R3 fill:#f5f5f5
style R4 fill:#f5f5f5
style R5 fill:#f5f5f5
style R6 fill:#f5f5f5
style R7 fill:#f5f5f5
style FO0 fill:#fce4ec
style FO1 fill:#fce4ec
style FO2 fill:#fce4ec
style FO4 fill:#fce4ec
style FO23 fill:#fce4ec
style FSum fill:#e8f5e8
style FH fill:#f3e5f5
style Answer fill:#ffebee
```
Understanding the Flow: From Tokens to Decisions
The diagram above illustrates the complete data flow through our weightless transformer. Let’s walk through each stage:
Stage 1: Token Input and Initial Processing
The journey begins with 24 tokens arranged in Query-Key-Value triplets. Each token is processed by a dedicated hashtron in Layer 1, converting the input into boolean representations. This binary transformation is crucial—it’s where continuous linguistic features become discrete, processable patterns.
Stage 2: The Boolean Attention Matrix
The 24 boolean outputs form an 8×24 attention matrix—the heart of our attention mechanism. Unlike traditional transformers that compute attention weights through complex mathematical operations, our system creates attention patterns through boolean logic. Each cell in this matrix represents a discrete attention decision: either a token pair is relevant (●) or it isn’t (○).
The vertical flow through the 8 rows represents the depth of attention processing. Each row refines the attention pattern, allowing the system to capture increasingly complex relationships between tokens.
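Continuing the Go sketch above (and reusing the assumed `Hashtron` type), the boolean attention matrix could be built roughly as follows. The rule for folding the previous row's decision back into the hash input is an assumption for illustration, not the exact refinement rule used in production.

```go
// buildAttention sketches the 8×24 boolean attention matrix. Row 0 fires on
// the raw window tokens; each later row re-hashes a token together with the
// previous row's decision for that column, so the pattern is refined from
// top to bottom.
func buildAttention(window [24]uint32, neurons [8][24]Hashtron) (m [8][24]bool) {
	for row := 0; row < 8; row++ {
		for col := 0; col < 24; col++ {
			input := window[col]
			if row > 0 && m[row-1][col] {
				input ^= 0x9E3779B9 // fold the previous decision back in
			}
			m[row][col] = neurons[row][col].Fire(input)
		}
	}
	return m
}
```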
Stage 3: Column Summation and Aggregation
Each column in the attention matrix is summed, producing small integers that represent the “agreement” level for each token position. These integers capture how many layers found a particular token position relevant, providing a natural weighting mechanism without floating-point operations.
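A sketch of the summation step, again reusing the types assumed above: each column's eight booleans collapse into a count between 0 and 8.

```go
// columnSums counts, for each of the 24 token positions, how many of the
// 8 attention rows voted "relevant". The result is a small integer in 0..8
// per column, the agreement level that feeds the next hashtron layer.
func columnSums(m [8][24]bool) (sums [24]uint8) {
	for col := 0; col < 24; col++ {
		for row := 0; row < 8; row++ {
			if m[row][col] {
				sums[col]++
			}
		}
	}
	return sums
}
```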
Stage 4: Stochastic Processing
The integer sums feed into another layer of hashtron neurons—the stochastic layer. This layer introduces controlled randomness into the decision-making process, helping the system generalize across different linguistic contexts while maintaining deterministic core behavior.
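In the same sketch, the stochastic layer simply feeds each agreement count through its own hashtron; any additional randomization applied during training is not shown here.

```go
// stochasticLayer maps the 24 agreement counts back into 24 booleans by
// firing one hashtron per column. The counts are small (0..8), so this is
// cheap: one hash per column, no floating point anywhere.
func stochasticLayer(sums [24]uint8, neurons [24]Hashtron) (bits [24]bool) {
	for i, s := range sums {
		bits[i] = neurons[i].Fire(uint32(s))
	}
	return bits
}
```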
Stage 5: Iterative Refinement
The attention-summation-hashtron pattern repeats 8 times, allowing the system to iteratively refine its understanding of the input. Each iteration can capture different aspects of the linguistic relationships, from local syntactic patterns to broader semantic connections.
Stage 6: Final Decision
The final 24 boolean outputs are summed and fed to a single hashtron neuron that produces the ultimate boolean answer. This binary output can represent various linguistic decisions: phonetic classifications, homograph disambiguations, or other language processing tasks.
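Putting the sketched stages together, an end-to-end pass might look like the following. The `Layer` bundling, the way decisions are carried between rounds, and the final integer readout are assumptions about the wiring, not the production code.

```go
// Layer bundles the hashtrons used by one round of the pipeline.
type Layer struct {
	Attention  [8][24]Hashtron // one hashtron per attention-matrix cell
	Stochastic [24]Hashtron    // one hashtron per column sum
}

// forward chains the sketched stages: eight rounds of attention matrix,
// column summation and stochastic layer (Stages 2 to 5), then a final
// summation and a single hashtron that emits the boolean answer (Stage 6).
func forward(window [24]uint32, layers [8]Layer, final Hashtron) bool {
	state := window
	var bits [24]bool
	for _, layer := range layers { // repeat 8x
		m := buildAttention(state, layer.Attention)
		sums := columnSums(m)
		bits = stochasticLayer(sums, layer.Stochastic)
		for i, b := range bits {
			if b {
				state[i] ^= 0x85EBCA6B // carry this round's decisions forward
			}
		}
	}
	var total uint32
	for _, b := range bits {
		if b {
			total++ // sum the final 24 booleans
		}
	}
	return final.Fire(total) // the answer to the binary task
}
```

The whole pass uses only hashing, XOR, and small-integer counting, which is where the memory and speed advantages described below come from.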
Why This Architecture Works
The success of our weightless transformer stems from several key insights:
Linguistic Discreteness: Natural language, despite its apparent complexity, often involves discrete decisions. Our boolean approach aligns naturally with this reality.
Efficiency Through Simplicity: By eliminating floating-point operations, we achieve remarkable computational efficiency without sacrificing capability.
Scalable Attention: The boolean attention matrix scales linearly rather than quadratically, making it practical for real-world applications.
Cross-Linguistic Generalization: The hash-based approach naturally handles the diversity of linguistic features across 140 languages without requiring language-specific modifications.
Performance and Real-World Impact
The true test of any language model lies in its real-world performance. Our 145-neuron weightless transformer has been extensively evaluated across 140 languages, with results that demonstrate both the power and the practical limitations of this approach.
Comprehensive Multilingual Evaluation
The table below reports word-level and character-level success rates (the complements of Word Error Rate and Character Error Rate) across more than 50 languages and regional variants, representing different language families, writing systems, and phonological complexities. These metrics were computed on standardized test corpora, providing a fair comparison across linguistic contexts.
| language model | corpus language (ISO 639-1) | word success rate | char success rate | word success rate (nostress) | char success rate (nostress) |
|---|---|---|---|---|---|
| albanian | sq | 53% | 91% | 53% | 91% |
| arabic | ar | 63% | 93% | 43% | 89% |
| armenian | hy | 46% | 95% | 46% | 95% |
| azerbaijani | az | 27% | 85% | 27% | 85% |
| bengali | bn | 42% | 94% | 42% | 94% |
| bulgarian | bg | 69% | 93% | 35% | 88% |
| catalan | ca | 28% | 83% | 31% | 83% |
| chinese/mandarin | zh | 9% | 83% | 8% | 83% |
| czech | cs | 64% | 87% | 55% | 86% |
| danish | da | 53% | 84% | 53% | 84% |
| dutch | nl | 73% | 91% | 31% | 84% |
| english | en | 81% | 93% | 28% | 77% |
| english/american | en | 84% | 92% | 31% | 78% |
| english/british | en | 84% | 93% | 33% | 79% |
| estonian | et | 42% | 91% | 43% | 91% |
| farsi | fa | 63% | 94% | 52% | 92% |
| finnish | fi | 40% | 90% | 58% | 95% |
| french | fr | 44% | 86% | 13% | 77% |
| georgian | ka | 86% | 99% | 86% | 99% |
| german | de | 63% | 85% | 10% | 75% |
| greek | el | 58% | 94% | 26% | 88% |
| hebrew3 | he | 83% | 97% | 5% | 86% |
| hebrew2 | he | 12% | 82% | 2% | 82% |
| hindi | hi | 73% | 97% | 73% | 97% |
| hungarian | hu | 65% | 91% | 60% | 90% |
| icelandic | is | 69% | 91% | 64% | 90% |
| indonesian | id | 79% | 95% | 51% | 88% |
| italian | it | 59% | 91% | 38% | 90% |
| japanese | ja | 12% | 85% | 12% | 85% |
| kazakh | kk | 38% | 91% | 27% | 90% |
| korean | ko | 60% | 93% | 60% | 93% |
| latvian | lv | 43% | 91% | 42% | 91% |
| lithuanian | lt | 41% | 90% | 41% | 90% |
| macedonian | mk | 64% | 96% | 51% | 94% |
| malay/latin | ms | 100% | 1% | 100% | 1% |
| malayalam | ml | 96% | 64% | 95% | 64% |
| marathi | mr | 73% | 96% | 72% | 96% |
| nepali | ne | 48% | 94% | 48% | 94% |
| norwegian | no | 67% | 89% | 48% | 82% |
| polish | pl | 57% | 84% | 57% | 85% |
| portuguese | pt | 58% | 95% | 26% | 79% |
| romanian | ro | 73% | 93% | 58% | 89% |
| russian | ru | 82% | 95% | 11% | 88% |
| serbian | sr | 88% | 98% | 88% | 98% |
| slovak | sk | 65% | 92% | 64% | 92% |
| slovenian | sl | 51% | 91% | 39% | 87% |
| spanish | es | 79% | 93% | 24% | 78% |
| swedish | sv | 71% | 92% | 18% | 85% |
| tamil | ta | 58% | 96% | 58% | 96% |
| thai | th | 23% | 88% | 23% | 88% |
| turkish | tr | 72% | 92% | 44% | 87% |
| ukrainian | uk | 67% | 91% | 53% | 90% |
| urdu | ur | 65% | 94% | 61% | 94% |
| vietnamese/central | vi | 73% | 96% | 72% | 96% |
| vietnamese/northern | vi | 85% | 97% | 85% | 97% |
| vietnamese/southern | vi | 78% | 96% | 77% | 96% |
Key Performance Insights
Standout Performers: Several languages achieve exceptional accuracy, with Serbian (88% word success), Georgian (86%), and Northern Vietnamese (85%) leading the pack. These results show that the weightless transformer can reach very high accuracy in certain linguistic contexts.
Logographic Language Challenges: Chinese/Mandarin and Japanese show notably lower word success rates (9% and 12% respectively). This is expected given the fundamental differences in how logographic writing systems map to phonetic representations. Importantly, for these languages the reported rates reflect sentence-level rather than individual-word accuracy, making direct comparison with alphabetic languages less straightforward.
Character-Level Robustness: Even in challenging cases, character-level accuracy remains consistently high across most languages, typically exceeding 80%. This suggests that while complete word accuracy may be difficult to achieve, the system maintains strong phonetic approximations.
Stress Sensitivity: The “nostress” columns reveal how the lack of stress marking affects performance. Languages like English show dramatic differences (81% vs. 28% word success), highlighting the system’s sensitivity to prosodic features.
Computational Efficiency
Beyond accuracy, the weightless transformer delivers remarkable computational efficiency:
- Memory footprint: 145 neurons require minimal storage compared to billion-parameter models
- Processing speed: Boolean operations execute orders of magnitude faster than floating-point matrix multiplications
- Energy consumption: The architecture is particularly well-suited for edge deployment and mobile applications
- Scalability: Linear scaling with input size rather than quadratic attention complexity
Real-World Deployment Impact
This architecture has enabled practical applications that would be impossible with traditional transformers:
- Mobile Integration: The entire model runs efficiently on smartphones without requiring cloud connectivity
- Embedded Systems: IoT devices can now perform sophisticated phonetic processing locally
- Low-Resource Languages: The system provides reasonable performance even for languages with limited training data
- Real-Time Processing: The boolean operations enable real-time phonetic translation in interactive applications
The implications extend beyond just efficiency. This architecture opens new possibilities for deploying sophisticated language models in resource-constrained environments, from mobile devices to embedded systems, making advanced NLP accessible in contexts where traditional transformers would be impractical.