Busuu · 2024 · iOS

AI Speaking practice

My experience designing an end-to-end speaking practice feature that leveraged generative AI to solve for speaking anxiety.

Chapter 1

Context

Busuu is a language learning app designed to help users develop practical communication skills through interactive, self-paced lessons.

In 2023, our research uncovered a recurring theme: fear of speaking. A lack of confidence, compounded by real-time speaking anxiety, was stopping learners from practicing the very skill they needed most.

To understand the problem space further, we sent out a survey and conducted 10+ interviews to dig deeper into the role of emerging tech in language learning.

Chapter 2

Research

What are the teachers' goals?
  1. Progress

Teachers want to see students improve and make progress in their target language.

  2. Engagement

When students are engaged, they book more lessons or take them more frequently, and so make progress.

  3. Students’ own motivation

When students self-studied between lessons and showed willingness to grow, teachers felt fulfilled in their profession.

However, their goals were hindered by 3 main pain points. Our most significant discovery was around discoverability: 4 out of 8 teachers did not know where to find the lesson summary feature.

16%

Churned users cited lack of speaking practice as the reason for leaving the app

6 out of 7

Active users cited speaking practice in the app as their most desired feature

From a survey of 500+ participants

Opportunity

How might we help success-seeking learners build speaking confidence using the benefits of AI?

Chapter 5

Ideating solutions

We kicked off a 3-day design sprint with the goal of designing a low-pressure, AI-powered speaking practice feature that felt uniquely Busuu. Through rapid exploration, we identified three key focus areas for our solution:

  • A user-friendly listen-and-repeat exercise to practice pronunciation.

  • A sense of human connection, even in an asynchronous environment.

  • An AI-powered core, with our use case centered on helping learners improve through instant, personalized feedback.

Chapter 1

Overcoming challenges

We had a promising solution on the table, but bringing it to life with emerging LLM tech came with its own set of challenges.

Challenge 1

Defining our competitive edge

To stand out from competitors, we wanted to build on our strength -- high-quality videos with real people -- and design a more immersive speaking experience. This meant making a bold decision to break from standard patterns in the app and go with a seamless interaction that integrated with the video background.

Challenge 2

Building trust through transparency

With privacy becoming an increasingly important concern, we aimed to prioritize transparency from the very first interaction with the feature. Since this was our first initiative involving voice recording, data retention, and model training, we made sure to explain each of these clearly and secure users' explicit consent before they engaged with the feature.

Challenge 3

Finding the "right" type of feedback

We quickly realized there were multiple ways to present pronunciation feedback: overall scores, IPA transcriptions, and breakdowns by areas like fluency and pronunciation. To align the experience with users’ mental models, we ran both moderated and unmoderated usability tests and found that clarity -- no scores or IPA -- was the preferred option.

Percentages are too abstract… what does 85% accurate really means? - Konstantin

Chapter 4

Launch outcomes

The team launched the Speaking practice feature as an A/B experiment. Within two months, analysis showed a 3.59% uplift in conversion, marking the feature as a success. Even after infrastructure costs from OpenAI and Azure, it remained profitable, which pointed to its sustainability.

Pronunciation was always a limited skill because of fear, and lack of feedback.. so I feel that the new [speaking practice] feature helps me speak more naturally. - Paloma

Chapter 5

Learnings & iterations

From post-launch interviews and retros, we identified 3 key learnings:

  • LLM quality isn’t one-size-fits-all: with 13 interface languages on Busuu, we saw very inconsistent AI feedback quality across languages. We're now working closely with the localization team to roll out the feature gradually as language models continue to improve.

  • Users want more pronunciation help: one of the strongest product opportunities came from user interviews, where learners expressed a desire to review and revisit difficult sounds. This insight opens the door to future improvements, like a dedicated pronunciation review tool to help learners target their weak areas.

  • Privacy considerations need to lead, not follow: real-time voice data and retention come with real responsibilities. If I could revisit this process, I’d bring in legal and security partners much earlier to build a smoother, more scalable privacy pipeline from the start.