#DeepSpeech in < 5 mins

open source sequence to sequence
speech recognition for everyone

 

Kathy Reid @KathyReid

PhD Candidate @3AInstitute, Australian National University & Open Source Voice Specialist @Mozilla

Canberra Python User Group, 4th March 2021

2021

7.85 billion
people

Source: https://www.worldometers.info/world-population/

7139
languages

Source: https://www.ethnologue.com/guides/how-many-languages

≈200

languages have speech to text support

Source: Internal Mozilla research

135 languages

55 of which are variants of
English, Spanish or Arabic

Source: Internal Mozilla research

so?

what about DeepSpeech?

inference or training

seq2seq

does not use phonemes

Vowels Monopthongs Dipthongs

sheep
ɪ
ship
ʊ
good

shoot
ɪə
here

wait
e
bed
ə
teacher
ɜː
bird
ɔː
door
ʊə
tourist
ɔɪ
boy
əʊ
show
æ
cat
ʌ
up
ɑː
far
ɒ
on

hair

my

cow
Consonants p
pea
b
boat
d
tea
t
dog

cheese

June
k
car
g
go
f
fly
v
video
θ
think
ð
this
s
see
z
zoo
ʃ
shall
ʒ
television
m
man
n
now
ŋ
sing
h
hat
l
love
r
red
w
wet
j
yes

alphabet.txt

contains the characters to train for

ⴰⵣⵓⵍ

ng’ombe

Cześć

Helô

≈201

languages have speech to text support

Thank you