Inside the 145 Neuron Transformer Which Speaks 140 Languages

Neurlang

2026/01/01

When Less is More: Revolutionizing Language AI with Boolean Logic

What if we told you that a transformer with just 145 neurons could understand and process 140 different languages? In the world of large language models where billions of parameters are the norm, this might sound impossible. Yet, through the innovative Goruut project, we’ve achieved exactly that using a revolutionary approach called “weightless transformers.”

Traditional transformers rely on massive weight matrices and complex floating-point operations. Our weightless transformer, however, operates entirely on boolean logic and hash-based neurons called “hashtrons.” This fundamental shift in architecture allows us to achieve remarkable multilingual capabilities with minimal computational overhead.

The Two Transformer Variants in Production

As part of the Goruut phonetic to IPA translator, we have deployed two distinct weightless transformer architectures:

  1. Self-Attention Phonetic Transformer: A weightless transformer based on non-masked cross-attention, currently in production use for all supported languages. This model handles the core phonetic-to-IPA translation task (for out of vocabulary words).

  2. Cross-Attention Homograph Disambiguator: A specialized transformer using non-masked cross-attention, initially developed for English and Hebrew homograph disambiguation, with plans to extend support to most languages.

Both architectures share the same fundamental principle: they process information through boolean operations rather than traditional matrix multiplications, making them incredibly efficient while maintaining high accuracy across diverse linguistic contexts.

The Architecture: Rethinking Attention Mechanisms

Token Processing and the Query-Key-Value Paradigm

Our weightless transformer maintains the familiar Query-Key-Value (QKV) structure but implements it through a radically different approach. The architecture features 8 front layer slots, each accepting 3 tokens (query, key, value), resulting in a total of 24 tokens being processed simultaneously by the front layer.

This design choice reflects a key insight: rather than processing sequences of arbitrary length, we can achieve remarkable results by focusing on fixed-size windows with intelligent wraparound handling. When the input exceeds 24 tokens, the system employs a wraparound mechanism, ensuring that longer sequences are still processed effectively without the computational overhead of traditional attention mechanisms.

Key architectural decisions:

The Hashtron Revolution

At the heart of our architecture lies the hashtron—a novel type of neuron that operates entirely on hash-based boolean logic. Each of the 24 input tokens is fed into a hashtron neuron, which yields a boolean output. This seemingly simple transformation is where the magic happens.

The 24 boolean outputs from the first layer form the foundation of our attention mechanism. Rather than computing attention weights through softmax operations over continuous values, we create discrete attention patterns through boolean matrices. This approach offers several advantages:

Architecture of the network

graph TD
    subgraph "24 Token Input Layer (Vertical Processing)"
        T0["Token 0 - Q"]
        T1["Token 1 - K"]
        T2["Token 2 - V"]
        T3[...]
        T4["Token n"]
        T5[...]
        T23["Token 23 - V"]
    end
    
    subgraph "Layer 1"
        L1H0[Hashtron 0]
        L1H1[Hashtron 1]
        L1H2[Hashtron 2]
        L1H3[...]
        L1H4[Hashtron n]
        L1H5[...]
        L1H23[Hashtron 23]
        
        L1H0 --> L1B0{bool}
        L1H1 --> L1B1{bool}
        L1H2 --> L1B2{bool}
        L1H3 --> L1B3{...}
        L1H4 --> L1B4{bool}
        L1H5 --> L1B5{...}
        L1H23 --> L1B23{bool}
    end
    
    subgraph "8×24 Attention Matrix"
        %% Row 1
        AM1_1["●"] 
        AM1_2["○"] 
        AM1_3["●"] 
        AM1_4["○"] 
        AM1_5["●"] 

        
        %% Row 2
        AM2_1["○"] 
        AM2_2["●"] 
        AM2_3["○"] 
        AM2_4["●"] 
        AM2_5["○"] 

        
        %% Row 3
        AM3_1["●"] 
        AM3_2["○"] 
        AM3_3["●"] 
        AM3_4["○"] 
        AM3_5["●"] 

        
        %% Row 4
        AM4_1["○"] 
        AM4_2["●"] 
        AM4_3["○"] 
        AM4_4["●"] 
        AM4_5["○"] 

        
        %% Row 5
        AM5_1["●"] 
        AM5_2["○"] 
        AM5_3["●"] 
        AM5_4["○"] 
        AM5_5["●"] 

        
        %% Row 6
        AM6_1["○"] 
        AM6_2["●"] 
        AM6_3["○"] 
        AM6_4["●"] 
        AM6_5["○"] 

        
        %% Row 7
        AM7_1["●"] 
        AM7_2["○"] 
        AM7_3["●"] 
        AM7_4["○"] 
        AM7_5["●"] 

        
        %% Row 8
        AM8_1["○"] 
        AM8_2["●"] 
        AM8_3["○"] 
        AM8_4["●"] 
        AM8_5["○"] 

        
        %% Row-to-row connections
        AM1_1 --> AM2_1
        AM1_2 --> AM2_2
        AM1_3 --> AM2_3
        AM1_4 --> AM2_4
        AM1_5 --> AM2_5

        
        AM2_1 --> AM3_1
        AM2_2 --> AM3_2
        AM2_3 --> AM3_3
        AM2_4 --> AM3_4
        AM2_5 --> AM3_5

        
        AM3_1 --> AM4_1
        AM3_2 --> AM4_2
        AM3_3 --> AM4_3
        AM3_4 --> AM4_4
        AM3_5 --> AM4_5

        
        AM4_1 --> AM5_1
        AM4_2 --> AM5_2
        AM4_3 --> AM5_3
        AM4_4 --> AM5_4
        AM4_5 --> AM5_5

        
        AM5_1 --> AM6_1
        AM5_2 --> AM6_2
        AM5_3 --> AM6_3
        AM5_4 --> AM6_4
        AM5_5 --> AM6_5

        
        AM6_1 --> AM7_1
        AM6_2 --> AM7_2
        AM6_3 --> AM7_3
        AM6_4 --> AM7_4
        AM6_5 --> AM7_5

        
        AM7_1 --> AM8_1
        AM7_2 --> AM8_2
        AM7_3 --> AM8_3
        AM7_4 --> AM8_4
        AM7_5 --> AM8_5

    end
    
    subgraph "Agreements Column Summation"
        CS0[∑ col0]
        CS1[∑ col1]
        CS2[∑ col2]
        CS3[...]
        CS4[∑ col n]
        CS5[...]
        CS23[∑ col23]
        
        CS0 --> I0[small int]
        CS1 --> I1[small int]
        CS2 --> I2[small int]
        CS3 --> I3[...]
        CS4 --> I4[small int]
        CS5 --> I5[...]
        CS23 --> I23[small int]
    end
    
    subgraph "Stochastic Layer (Hashtron)"
        SL0[Hashtron 0]
        SL1[Hashtron 1]
        SL2[Hashtron 2]
        SL3[...]
        SL4[Hashtron n]
        SL5[...]
        SL23[Hashtron 23]
        
        SL0 --> SLB0{bool}
        SL1 --> SLB1{bool}
        SL2 --> SLB2{bool}
        SL3 --> SLB3{...}
        SL4 --> SLB4{bool}
        SL5 --> SLB5{...}
        SL23 --> SLB23{bool}
    end
    
    subgraph "Repeat 8x"
        R1["Layer 1"]
        R2["Layer 2"] 
        R3["Layer 3"] 
        R4["..."] 
        R5["Layer 8"] 
        R6["Attention → Sum"]
        R7["→ Hashtron"]
    end
    
    subgraph "Final Output Layer"
        FO0[bool 0]
        FO1[bool 1]
        FO2[bool 2]
        FO3[...]
        FO4[bool n]
        FO5[...]
        FO23[bool 23]
        
        FSum[∑ all booleans] --> Total[Integer total]
        Total --> FH[Final Hashtron]
        FH --> Answer{"Boolean Answer<br/>Solution to problem"}
    end
    
    %% Token connections
    T0 --> L1H0
    T1 --> L1H1
    T2 --> L1H2
    T4 --> L1H4
    T23 --> L1H23
    
    %% Attention matrix formation
    L1B0 --> AM1_1
    L1B1 --> AM1_2
    L1B2 --> AM1_3
    L1B4 --> AM1_4
    L1B23 --> AM1_5
    
    %% Column sums from matrix (from last row)
    AM8_1 --> CS0
    AM8_2 --> CS1
    AM8_3 --> CS2
    AM8_4 --> CS4
    AM8_5 --> CS23
    
    %% Stochastic layer connections
    I0 --> SL0
    I1 --> SL1
    I2 --> SL2
    I4 --> SL4
    I23 --> SL23
    
    %% Final outputs
    SLB0 --> FO0
    SLB1 --> FO1
    SLB2 --> FO2
    SLB4 --> FO4
    SLB23 --> FO23
    
    %% Repeat layer connections (simplified)
    SLB0 --> R1
    SLB1 --> R1
    SLB2 --> R1
    SLB23 --> R1
    R7 --> FO0
    R7 --> FO1
    R7 --> FO2
    R7 --> FO23
    
    %% Final summation
    FO0 --> FSum
    FO1 --> FSum
    FO2 --> FSum
    FO4 --> FSum
    FO23 --> FSum

    style T0 fill:#e1f5fe
    style T1 fill:#e1f5fe
    style T2 fill:#e1f5fe
    style T4 fill:#e1f5fe
    style T23 fill:#e1f5fe
    style L1H0 fill:#f3e5f5
    style L1H1 fill:#f3e5f5
    style L1H2 fill:#f3e5f5
    style L1H4 fill:#f3e5f5
    style L1H23 fill:#f3e5f5
    %% Attention Matrix Styling - Row 1
    style AM1_1 fill:#fff3e0
    style AM1_2 fill:#fff3e0
    style AM1_3 fill:#fff3e0
    style AM1_4 fill:#fff3e0
    style AM1_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 2
    style AM2_1 fill:#fff3e0
    style AM2_2 fill:#fff3e0
    style AM2_3 fill:#fff3e0
    style AM2_4 fill:#fff3e0
    style AM2_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 3
    style AM3_1 fill:#fff3e0
    style AM3_2 fill:#fff3e0
    style AM3_3 fill:#fff3e0
    style AM3_4 fill:#fff3e0
    style AM3_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 4
    style AM4_1 fill:#fff3e0
    style AM4_2 fill:#fff3e0
    style AM4_3 fill:#fff3e0
    style AM4_4 fill:#fff3e0
    style AM4_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 5
    style AM5_1 fill:#fff3e0
    style AM5_2 fill:#fff3e0
    style AM5_3 fill:#fff3e0
    style AM5_4 fill:#fff3e0
    style AM5_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 6
    style AM6_1 fill:#fff3e0
    style AM6_2 fill:#fff3e0
    style AM6_3 fill:#fff3e0
    style AM6_4 fill:#fff3e0
    style AM6_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 7
    style AM7_1 fill:#fff3e0
    style AM7_2 fill:#fff3e0
    style AM7_3 fill:#fff3e0
    style AM7_4 fill:#fff3e0
    style AM7_5 fill:#fff3e0
    
    %% Attention Matrix Styling - Row 8
    style AM8_1 fill:#fff3e0
    style AM8_2 fill:#fff3e0
    style AM8_3 fill:#fff3e0
    style AM8_4 fill:#fff3e0
    style AM8_5 fill:#fff3e0
    style CS0 fill:#e8f5e8
    style CS1 fill:#e8f5e8
    style CS2 fill:#e8f5e8
    style CS4 fill:#e8f5e8
    style CS23 fill:#e8f5e8
    style SL0 fill:#f3e5f5
    style SL1 fill:#f3e5f5
    style SL2 fill:#f3e5f5
    style SL4 fill:#f3e5f5
    style SL23 fill:#f3e5f5
    style R1 fill:#f5f5f5
    style R2 fill:#f5f5f5
    style R3 fill:#f5f5f5
    style R4 fill:#f5f5f5
    style R5 fill:#f5f5f5
    style R6 fill:#f5f5f5
    style R7 fill:#f5f5f5
    style FO0 fill:#fce4ec
    style FO1 fill:#fce4ec
    style FO2 fill:#fce4ec
    style FO4 fill:#fce4ec
    style FO23 fill:#fce4ec
    style FSum fill:#e8f5e8
    style FH fill:#f3e5f5
    style Answer fill:#ffebee

Understanding the Flow: From Tokens to Decisions

The diagram above illustrates the complete data flow through our weightless transformer. Let’s walk through each stage:

Stage 1: Token Input and Initial Processing

The journey begins with 24 tokens arranged in Query-Key-Value triplets. Each token is processed by a dedicated hashtron in Layer 1, converting the input into boolean representations. This binary transformation is crucial—it’s where continuous linguistic features become discrete, processable patterns.

Stage 2: The Boolean Attention Matrix

The 24 boolean outputs form an 8×24 attention matrix—the heart of our attention mechanism. Unlike traditional transformers that compute attention weights through complex mathematical operations, our system creates attention patterns through boolean logic. Each cell in this matrix represents a discrete attention decision: either a token pair is relevant (●) or it isn’t (○).

The vertical flow through the 8 rows represents the depth of attention processing. Each row refines the attention pattern, allowing the system to capture increasingly complex relationships between tokens.

Stage 3: Column Summation and Aggregation

Each column in the attention matrix is summed, producing small integers that represent the “agreement” level for each token position. These integers capture how many layers found a particular token position relevant, providing a natural weighting mechanism without floating-point operations.

Stage 4: Stochastic Processing

The integer sums feed into another layer of hashtron neurons—the stochastic layer. This layer introduces controlled randomness into the decision-making process, helping the system generalize across different linguistic contexts while maintaining deterministic core behavior.

Stage 5: Iterative Refinement

The attention-summation-hashtron pattern repeats 8 times, allowing the system to iteratively refine its understanding of the input. Each iteration can capture different aspects of the linguistic relationships, from local syntactic patterns to broader semantic connections.

Stage 6: Final Decision

The final 24 boolean outputs are summed and fed to a single hashtron neuron that produces the ultimate boolean answer. This binary output can represent various linguistic decisions: phonetic classifications, homograph disambiguations, or other language processing tasks.

Why This Architecture Works

The success of our weightless transformer stems from several key insights:

Linguistic Discreteness: Natural language, despite its apparent complexity, often involves discrete decisions. Our boolean approach aligns naturally with this reality.

Efficiency Through Simplicity: By eliminating floating-point operations, we achieve remarkable computational efficiency without sacrificing capability.

Scalable Attention: The boolean attention matrix scales linearly rather than quadratically, making it practical for real-world applications.

Cross-Linguistic Generalization: The hash-based approach naturally handles the diversity of linguistic features across 140 languages without requiring language-specific modifications.

Performance and Real-World Impact

The true test of any language model lies in its real-world performance. Our 145-neuron weightless transformer has been extensively evaluated across 140 languages, with results that demonstrate both the power and the practical limitations of this approach.

Comprehensive Multilingual Evaluation

The table below presents Word Error Rate (WER) and Character Error Rate (CER) measurements across a diverse set of 53 languages, representing different language families, writing systems, and phonological complexities. These metrics were computed on standardized test corpora, providing a fair comparison across linguistic contexts.

language model corpus lang_iso word success rate char success rate word success rate (nostress) char success rate (nostress)
albanian sq 53% 91% 53% 91%
arabic ar 63% 93% 43% 89%
armenian hy 46% 95% 46% 95%
azerbaijani az 27% 85% 27% 85%
bengali bn 42% 94% 42% 94%
bulgarian bg 69% 93% 35% 88%
catalan ca 28% 83% 31% 83%
chinese/mandarin zh 9% 83% 8% 83%
czech cs 64% 87% 55% 86%
danish da 53% 84% 53% 84%
dutch nl 73% 91% 31% 84%
english en 81% 93% 28% 77%
english/american en 84% 92% 31% 78%
english/british en 84% 93% 33% 79%
estonian et 42% 91% 43% 91%
farsi fa 63% 94% 52% 92%
finnish fi 40% 90% 58% 95%
french fr 44% 86% 13% 77%
georgian ka 86% 99% 86% 99%
german de 63% 85% 10% 75%
greek el 58% 94% 26% 88%
hebew3 he 83% 97% 5% 86%
hebrew2 he 12% 82% 2% 82%
hindi hi 73% 97% 73% 97%
hungarian hu 65% 91% 60% 90%
icelandic is 69% 91% 64% 90%
indonesian id 79% 95% 51% 88%
italian it 59% 91% 38% 90%
japanese ja 12% 85% 12% 85%
kazakh kk 38% 91% 27% 90%
korean ko 60% 93% 60% 93%
latvian lv 43% 91% 42% 91%
lithuanian lt 41% 90% 41% 90%
macedonian mk 64% 96% 51% 94%
malay/latin ms 100% 1% 100% 1%
malayalam ml 96% 64% 95% 64%
marathi mr 73% 96% 72% 96%
nepali ne 48% 94% 48% 94%
norwegian no 67% 89% 48% 82%
polish pl 57% 84% 57% 85%
portuguese pt 58% 95% 26% 79%
romanian ro 73% 93% 58% 89%
russian ru 82% 95% 11% 88%
serbian sr 88% 98% 88% 98%
slovak sk 65% 92% 64% 92%
slovenian sl 51% 91% 39% 87%
spanish es 79% 93% 24% 78%
swedish sv 71% 92% 18% 85%
tamil ta 58% 96% 58% 96%
thai th 23% 88% 23% 88%
turkish tr 72% 92% 44% 87%
ukrainian uk 67% 91% 53% 90%
urdu ur 65% 94% 61% 94%
vietnamese/central vi 73% 96% 72% 96%
vietnamese/northern vi 85% 97% 85% 97%
vietnamese/southern vi 78% 96% 77% 96%

Key Performance Insights

Standout Performers: Several languages achieve exceptional accuracy, with Serbian (88% word success), Georgian (86%), and Vietnamese Northern (85%) leading the pack. These results demonstrate that the weightless transformer can achieve near-human performance for certain linguistic contexts.

Logographic Language Challenges: Chinese/Mandarin and Japanese show notably lower word success rates (9% and 12% respectively). This is expected given the fundamental differences in how logographic writing systems map to phonetic representations. Importantly, for these languages, the error rates represent sentence-level errors rather than individual word errors, making direct comparison with alphabetic languages less straightforward.

Character-Level Robustness: Even in challenging cases, character-level accuracy remains consistently high across most languages, typically exceeding 80%. This suggests that while complete word accuracy may be difficult to achieve, the system maintains strong phonetic approximations.

Stress Sensitivity: The “nostress” columns reveal how lack of stress marking affects performance. Languages like English show dramatic differences (84% vs 28% word success), highlighting the system’s sensitivity to prosodic features.

Computational Efficiency

Beyond accuracy, the weightless transformer delivers remarkable computational efficiency:

Real-World Deployment Impact

This architecture has enabled practical applications that would be impossible with traditional transformers:

The implications extend beyond just efficiency. This architecture opens new possibilities for deploying sophisticated language models in resource-constrained environments, from mobile devices to embedded systems, making advanced NLP accessible in contexts where traditional transformers would be impractical.