Common Voice is a great resource for natural language processing enthusiasts. Today I ran a pass over it with Goruut 0.4.0, which uncovered several outliers:

| Language | Outlier | 
|---|---|
| ar | Meaning: Someone in a difficult situation will do anything to get out of it. | 
| bn | Google Play | 
| ca | " | 
| es | וְתִסְמְכֵנוּ לְשָלוֹם. | 
| fa | amir šāyān | 
| ja | a | 
| ja | A | 
| ja | fgtyht | 
| ja | gfvrv | 
| ja | Give us back our time. | 
| ja | green | 
| ja | hello | 
| ja | I don’t know | 
| ja | If you love you, please love me. | 
| ja | I saw him today | 
| ja | I wanna fly so high Yeah, I know my wings are dried | 
| ja | jdjdjd | 
| ja | ksskskksksks | 
| ja | tyyjnybt | 
| ja | v | 
| ja | Windows | 
| ja | X軸Y軸 | 
| ja | You must be teasing us. | 
| ja | 歴史的修正主義者De Ste。 | 
| ka | ???????? ???????? ????? ?????? ??????. | 
| mn | ?????? | 
| ru | ?????? ???? ?????, ??? ????????? ???? ??? ?????? | 
| ru | Firefox | 
| sr | Sreća je u malim stvarima | 
| ur | ﺍﺱ ﭘﺮ ﺟﻮﺍﮨﺮ ﻻﻝ ﻧﮩﺮﻭ ﻧﮯ ﻣﺴﮑﺮﺍ ﮐﺮ ﮐﮩﺎ ﺗﮭﺎ | 
| ur | ﭨﮭﻮﻧﺲ ﺩﯾﺘﮯ ﺗﮭﮯ | 
| ur | ﮈﺍﮐﯿﮧ ﭘﮭﺮ ﭘﺮﯾﺸﺎﻥ ﮨﻮ ﮔﯿﺎ | 
| ur | ﮐﯿﺌﺮﭨﯿﮑﺮ ﺣﮑﻮﻣﺖ ﺑﮭﯽ ﻗﻮﻣﯽ ﺣﮑﻮﻣﺖ ﺑﻦ ﺳﮑﺘﯽ ﮨﮯ | 
| ur | ﻟﻮﮔﻮﮞ ﻧﮯ ﺣﯿﺮﺕ ﺳﮯ ﭘﻮﭼﮭﺎ | 

A number of English texts appear in non-English datasets. I have no idea what a Hebrew phrase is doing in the Spanish dataset. The word “Firefox” should probably be in Cyrillic, and the same goes for “Sreća je u malim stvarima”. As for the Urdu outliers, I have no idea why these came out, but I am listing them just in case.
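
For anyone who wants to look for more rows like these, here is a minimal sketch of a script check in Python. It is not how Goruut found the outliers. The `EXPECTED_PREFIXES` mapping and the two helper names are my own assumptions for illustration, and treating `sr` as Cyrillic-only just follows the guess above. The second helper tests whether NFKC normalization changes the text. The Urdu rows appear to use Arabic presentation-form code points (for example the lam-alef ligature), which might be why they were flagged, but that is only a guess.

```python
import unicodedata

# Assumed mapping from Common Voice locale to Unicode character-name prefixes.
# Illustrative only; it is not taken from Goruut or Common Voice.
EXPECTED_PREFIXES = {
    "es": ("LATIN",),
    "ja": ("CJK", "HIRAGANA", "KATAKANA"),
    "ka": ("GEORGIAN",),
    "mn": ("CYRILLIC",),
    "ru": ("CYRILLIC",),
    "sr": ("CYRILLIC",),  # assumption: the sr dataset is meant to be Cyrillic
}


def script_ratio(locale: str, sentence: str) -> float:
    """Fraction of letters whose Unicode name starts with an expected prefix."""
    prefixes = EXPECTED_PREFIXES[locale]
    letters = [ch for ch in sentence if ch.isalpha()]
    if not letters:
        return 0.0  # e.g. a row made only of "?" contains no letters at all
    hits = sum(unicodedata.name(ch, "").startswith(prefixes) for ch in letters)
    return hits / len(letters)


def has_compat_forms(sentence: str) -> bool:
    """True if NFKC normalization would change the text.

    This catches Arabic presentation forms, but also other compatibility
    characters such as full-width letters, so treat it as a hint only.
    """
    return unicodedata.normalize("NFKC", sentence) != sentence


if __name__ == "__main__":
    samples = [("ru", "Firefox"), ("ja", "X\u8ef8Y\u8ef8"), ("mn", "??????")]
    for locale, text in samples:
        print(locale, repr(text), script_ratio(locale, text))
```

A low ratio only says that a row deserves a look. Mixed rows such as “X軸Y軸” land somewhere in the middle, so any threshold is a judgment call.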