5 Minutes With: Amelia Kelly, SoapBox Labs

January 19, 2022

Technology Magazine spoke to Amelia Kelly, VP of Speech Technology at SoapBox Labs, about her career and how the company has evolved

Tell us about your current role and responsibilities

I joined SoapBox in 2015 as one of the first employees. SoapBox founder, Dr. Patricia Scanlon, hired us to implement her vision – to create accurate and scalable speech recognition for kids’ voices. Fast-forward 7 years and I’m now VP of Speech Technology, overseeing a team of computational linguists, speech engineers and scientists.

What were some career highlights before your current role?

After finishing my PhD at Trinity College Dublin, I went to Silicon Valley to work at a start-up company. It was there that I really cut my teeth in the practical aspects of speech recognition, NLP and intent classification. I also found that working in a startup environment offered just the right amount of risk and excitement, where decisions I made effected immediate change.

It’s an environment I would revisit later in my career when joining SoapBox, but first I would spend some time at IBM Watson. I believe my experience in Watson was crucial as it allowed me not only to contribute to the very first iterations of what is now the Watson AI assistant, but also to gain invaluable customer experience in the US. As a solutions engineer I gained insight into how a product goes from conception to being incorporated into some of the most widely used banking, insurance and health systems in the world.

More recently, I was awarded a Fulbright Tech Impact scholarship and in March 2021 I spent a few months in Boulder, Colorado collaborating with a university group at the Institute of Cognitive Studies on how they could use child speech recognition to improve learning outcomes for children in math and science classes.

What is SoapBox Labs and why does it exist?

SoapBox Labs is the foremost speech engine for children’s voices in the world. It was built from the ground up to cater to 2-12 year old kids’ voices of every accent and dialect, and to ensure that every child’s voice is heard in the digital world.

Kids’ voices and behaviors are unique. Speech tech modeled primarily on adult voices - as found in mainstream off-the-shelf offerings - does not work well for kids, and the younger the child, the poorer the experience. Kids deserve to be heard and to have joyful digital experiences using their voices. SoapBox Labs powers those experiences.

How do you see your role evolving over time?

When I began working at SoapBox my primary role, as the speech engineer, was to write the code to build, from scratch, the models and speech systems. Over time, as the team grew, I became more involved in the big-picture planning, creating “grand visions” of what we wanted to build, and managing the growing number of speech engineers who were implementing it. As the company grew I also became more customer focused, and one of the real pleasures of my job today is speaking with companies all over the world about the engine we have built, why it is so accurate and trustworthy, and how, through our engine and our full-stack services offering, we can help clients to maximize the potential of voice technology to deliver immersive and fun voice-enabled learning and play experiences for children. Over time I hope to continue focusing on strategy and supporting my team with the knowledge and experience I have gained over the course of my career.

What initially drew you to work for SoapBox?

For me the real draw was the opportunity to build an entire speech engine from scratch for a sector of the community that is largely overlooked in voice technology – children. Adult voice tech doesn’t perform well on child speech, and for too long, we assumed that adult voice technology, though ill-fitting, was good enough. It’s largely because of this attitude that we’ve missed out on years of formative learning and play experiences for kids.

At SoapBox we have reimagined every stage of the technology building process, from the privacy considerations (child speech data obviously requires a complete rethink of how data is stored and handled) to devising new and innovative ways of modeling the speech and language patterns. We believe that kids should not be relegated to using adult speech technology hand-me-downs that they’re expected to “grow into.” Voice technology should be accurate and appropriate for children’s development, and respect the challenges and requirements that are particular to their unique stage of life. I could tell from the outset that SoapBox founder and then-CEO Dr. Patricia Scanlon had a unique vision in this regard and this ultimately influenced my decision to join her team.

Can you highlight a couple of achievements you're most proud of since you joined?

Helping to build the SoapBox engine from the ground up has been the highlight of my professional career to date, but what I’m most proud of is how consistently and accurately it works across all sectors of society. Kids should be served by technology that treats them all equally, and works equally accurately regardless of the child’s accent, dialect, gender, age or socioeconomic status. The Florida Center for Reading Research (FCRR) recently completed an independent study of the accuracy of speech systems with groups of Black, Latinx and white children from different socioeconomic backgrounds. Their results showed that SoapBox performed equally well for each group, and demonstrated no bias towards or against any particular cohort. I’m very proud of that independent validation from FCRR.

I’m also really proud of the Fluency feature of the SoapBox engine. One of the things that makes speech technology for kids different is that not only does it have to work in the situations in which kids normally find themselves (e.g. reading a passage of text aloud in a noisy classroom), it also has to measure and feed back information that is pertinent to the users, be they the parent, educator, school board, content creator or the child themselves. For example, our client Amplify takes the data points returned from the SoapBox engine and uses them to populate a teacher dashboard, so the teacher can see, for each child who has practised their reading, how many reading errors they made and how well they pronounced certain words, down to the level of the individual speech sound. Amplify’s independent testing validated that the SoapBox engine performed at 96% accuracy when compared to human assessors. This is a result that I’m really proud of as it crystallises SoapBox’s place as a trustworthy tool that teachers and educators can employ to help their students on their path to literacy.

Another thing I’m very proud of is how we have designed the SoapBox engine using a privacy-by-design approach to our technology and our data, which ensures that children’s voice data is always protected, never sold, never shared and never reused for commercial purposes.

What trends are you seeing in the voice tech industry right now that are having the most impact?

We’re seeing more and more education companies starting to add voice technology to their literacy tools in particular, to help kids practice their reading and to help teachers do more regular assessments using the granular data from our voice engine to individualise instruction and intervene at an earlier stage where needed. Voice is a powerful feature of any learning solution for kids at any and all stages of the reading journey. We’re also seeing it being used more and more often in the area of special education where it can support dyslexia screening, support kids with dyspraxia, become an integral feature of speech therapy solutions and so much more.

Based on the feedback from our own clients, 2022 will be a breakout year for voice in education. In 2022, it will become a mainstream tool and have a hugely positive impact on the literacy crisis.

Voice democratizes technology experiences for kids, enabling them to control technology even if they are pre-literate or lack the dexterity to use a hand-held device. This is as true in the games and media spaces as it is in education. Think digital and mobile games, interactive TV experiences and the metaverse! We’re seeing clients developing immersive voice-first experiences for kids with low-footprint low-latency speech systems created by SoapBox and deployed on next-gen chips. Voice will be the primary interface kids will leverage for these experiences and for 2-12 year old voices, those experiences will be powered by SoapBox Labs.

What motivates and drives you each day in your role?

I’m motivated by the knowledge that I am creating something useful that will have a positive impact on society. I believe that all artificial intelligence professionals working with emerging technologies bear a burden of responsibility to create systems that are fair and impartial in their accuracy levels regardless of the demographic of the user.

My team shares this view and works tirelessly to ensure that our speech engine shows no systematic bias towards or against users from different backgrounds or different accents and dialects. Every day I'm inspired by the expertise, enthusiasm and creativity of my co-workers, which motivates me to work my hardest to provide a nurturing environment where our top-class linguists, engineers and scientists can thrive.
