Text-to-Speech feature with OpenAI and LWC

Introduction: Text-to-Speech Technology

Text-to-Speech (TTS) technology has become a valuable tool for enhancing user interactions in various applications. In this blog post, we will explore how to implement a Text-to-Speech feature in Salesforce using Lightning Web Components (LWC) and the OpenAI speech API. We'll walk through the LWC code in detail and discuss potential use cases within Salesforce.

Understanding the Text-to-Speech Lightning Web Component

Let’s dive into the code and explore its inner workings.

<template>
    <lightning-card title="Text to Speech Conversion">
        <div class="input-container">
            <lightning-textarea label="Enter Text" value={inputText} onchange={handleInputChange}></lightning-textarea>
        </div>
        <div class="button-container">
            <lightning-button-icon
                icon-name="utility:listen"
                variant="brand"
                onclick={convertToSpeech}
                alternative-text="Convert to Speech"
                disabled={isConverting}
            ></lightning-button-icon>
            <lightning-spinner if:true={isConverting} alternative-text="Loading"></lightning-spinner>
            <audio if:true={audioUrl} controls oncanplaythrough={startPlaying}>
                <source src={audioUrl} type="audio/mpeg" />
            </audio>
        </div>
    </lightning-card>
</template>

This Lightning Web Component consists of a user interface designed to convert text input to speech. Here’s a breakdown:

Input Text Area: Uses lightning-textarea to capture the user’s input.
Convert Button: Utilizes lightning-button-icon to trigger the text-to-speech conversion process. It is disabled (disabled={isConverting}) when the conversion is in progress.
Loading Spinner: Displays a spinner (lightning-spinner) while the text-to-speech conversion is ongoing.
Audio Player: Employs the <audio> HTML element to play the generated audio. The oncanplaythrough event triggers the startPlaying method, enabling automatic playback.

Now, let’s examine the JavaScript code:

import { LightningElement } from 'lwc';

export default class TextToSpeech extends LightningElement {
    inputText = '';
    audioUrl;
    isConverting = false;

    handleInputChange(event) {
        this.inputText = event.target.value;
    }

    convertToSpeech() {
        if (!this.inputText.trim()) {
            console.error('Input text is empty or contains only whitespace.');
            return;
        }

        this.callOpenAIAPI();
    }

    async callOpenAIAPI() {
        const apiUrl = 'https://api.openai.com/v1/audio/speech';
        const apiKey = 'OPEN_AI_KEY'; // Replace with your actual OpenAI API key
        this.showHideSpinner(true);
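        // Release the previous clip's object URL before clearing it
        if (this.audioUrl) {
            URL.revokeObjectURL(this.audioUrl);
        }
        this.audioUrl = '';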

        try {
            const response = await fetch(apiUrl, {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                    'Authorization': `Bearer ${apiKey}`,
                },
                body: JSON.stringify({
                    model: 'tts-1',
                    input: this.inputText,
                    voice: 'alloy',
                }),
            });

            if (response.ok) {
                // Wrap the returned audio in an object URL the <audio> element can play
                const blob = await response.blob();
                this.audioUrl = URL.createObjectURL(blob);
            } else {
                console.error('Failed to convert text to speech:', response.status, response.statusText);
            }
        } catch (error) {
            console.error('Error during text-to-speech conversion:', error);
        } finally {
            // Hide the spinner whether the request succeeded or failed
            this.showHideSpinner(false);
        }
    }

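    startPlaying() {
        const audioElement = this.template.querySelector('audio');
        if (audioElement) {
            // play() returns a promise that rejects if the browser blocks playback
            audioElement.play().catch((error) => {
                console.error('Audio playback was blocked or failed:', error);
            });
        }
    }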

    showHideSpinner(bool) {
        this.isConverting = bool;
    }
}

This JavaScript code orchestrates the text-to-speech conversion process. Key functions include:

handleInputChange: Captures and updates user input.
convertToSpeech: Initiates the text-to-speech conversion by calling callOpenAIAPI if input is valid.
callOpenAIAPI: Makes a POST request to the OpenAI API, handling success and error scenarios. It updates audioUrl for audio playback and manages the loading spinner.
startPlaying: Programmatically starts audio playback.
showHideSpinner: Toggles the loading spinner based on the conversion state.
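
The request body that callOpenAIAPI sends can also be built and validated in one place before any network call is made. The following is a minimal sketch and not part of the original component: the helper name buildSpeechRequest and its options object are hypothetical, and the voice list and 4096-character input limit reflect the OpenAI documentation as of this writing, so check the current API reference before relying on them.

```javascript
// Hypothetical helper (not in the original component): builds the JSON body
// for POST https://api.openai.com/v1/audio/speech and validates it up front.

// Assumption: voices and input limit as documented when this was written.
const SUPPORTED_VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];
const MAX_INPUT_LENGTH = 4096;

function buildSpeechRequest(text, { model = 'tts-1', voice = 'alloy' } = {}) {
    const input = (text || '').trim();
    if (!input) {
        throw new Error('Input text is empty or contains only whitespace.');
    }
    if (input.length > MAX_INPUT_LENGTH) {
        throw new Error(`Input text exceeds ${MAX_INPUT_LENGTH} characters.`);
    }
    if (!SUPPORTED_VOICES.includes(voice)) {
        throw new Error(`Unsupported voice: ${voice}`);
    }
    // Same three fields the component sends in its fetch() body
    return { model, input, voice };
}
```

Inside callOpenAIAPI, the body could then become JSON.stringify(buildSpeechRequest(this.inputText)). Because that expression sits inside the existing try block, any validation error would be caught and logged by the existing catch.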

Because the component calls the OpenAI API directly from JavaScript, the endpoint (https://api.openai.com) must be added to the Trusted URLs of your Salesforce org, with the connect-src directive selected, since that directive governs fetch requests. Salesforce enforces a Content Security Policy (CSP) that blocks requests from Lightning components to any domain that has not been explicitly approved, which limits what a cross-site scripting attack could reach. Without this entry, the browser blocks the callout and the conversion fails.
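
A blocked or rejected callout is easier to diagnose when the failure is turned into a readable message instead of a bare console entry. The sketch below is an illustration under stated assumptions rather than part of the original component: describeSpeechFailure is a hypothetical helper, and the status-code hints follow general HTTP semantics and OpenAI's documented error codes.

```javascript
// Hypothetical helper: maps a failed callout to a hint for the user or admin.
// Pass the HTTP status for a non-OK response, or the caught error when
// fetch() itself rejected (for example, because CSP blocked the request).
function describeSpeechFailure({ status, error } = {}) {
    if (error) {
        // fetch() rejects with a TypeError when the request never completes,
        // which is what happens when the endpoint is not on the trusted list.
        return error instanceof TypeError
            ? 'Request blocked or network unavailable. Check that https://api.openai.com is on the trusted list.'
            : `Unexpected error: ${error.message}`;
    }
    switch (status) {
        case 401:
            return 'Authentication failed. Check the API key.';
        case 429:
            return 'Rate limit or quota exceeded. Retry later or review the account plan.';
        default:
            return status >= 500
                ? 'The speech service is temporarily unavailable.'
                : `Request failed with status ${status}.`;
    }
}
```

In the component, the else branch could call describeSpeechFailure({ status: response.status }) and the catch block describeSpeechFailure({ error }), then show the result in a toast or an inline message.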

Avoid hardcoding the OpenAI API key in the component, as the example above does for brevity. Anything in client-side JavaScript is visible to every user who loads the page. Instead, make the API call from a server-side Apex method that authenticates through a Named Credential, so the key never reaches the browser.
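
If the call moves to Apex, the audio has to travel back to the component in a form Apex can return, and a Base64-encoded string is one common choice. The following sketch covers only the client-side decoding step under that assumption. The Apex method (named synthesizeSpeech in the comment) is hypothetical and not shown, and Base64 as the transport format is likewise an assumption.

```javascript
// Assumption: a hypothetical Apex method returns the MP3 as a Base64 string,
// e.g. EncodingUtil.base64Encode(response.getBodyAsBlob()) on the server.

// Decode a Base64 string into raw bytes.
function base64ToBytes(base64) {
    const binary = atob(base64); // yields one character per byte
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
        bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
}

// Wrap the bytes in a Blob and return an object URL for the <audio> element,
// mirroring what the original component does with response.blob().
function bytesToAudioUrl(bytes, mimeType = 'audio/mpeg') {
    return URL.createObjectURL(new Blob([bytes], { type: mimeType }));
}

// In the component (sketch only; synthesizeSpeech is hypothetical):
//   const base64Audio = await synthesizeSpeech({ text: this.inputText });
//   this.audioUrl = bytesToAudioUrl(base64ToBytes(base64Audio));
```

With this approach the rest of the component stays unchanged: audioUrl is still an object URL, so the template and startPlaying work as before.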

Use Cases in Salesforce

Accessibility Features:
- Assist users with visual impairments by providing spoken content.
Voice-Enabled User Interfaces:
- Enhance user interactions by allowing voice-enabled commands and responses.
Call Logging and Summaries:
- Convert call notes into audio summaries for quick review by sales representatives.
Training Materials:
- Improve training resources by offering audio versions of written content.
Voice Alerts and Notifications:
- Deliver important updates, deadlines, or changes via spoken messages.
Multilingual Support:
- Support users with diverse language preferences by providing text-to-speech in multiple languages.

The Importance of Text-to-Speech Technology

Text-to-Speech (TTS) technology plays a pivotal role in modern applications, contributing significantly to accessibility, user engagement, and the creation of dynamic user interfaces. Here’s a closer look at the importance of Text-to-Speech and some key research data supporting its significance:

1. Accessibility Enhancement:

Research Insight: According to the World Health Organization (WHO), over 2 billion people globally have some form of visual impairment. TTS technology bridges accessibility gaps by converting text content into spoken words, enabling individuals with visual impairments to consume information effectively.

2. User Engagement and Experience:

Research Insight: A study conducted by Nielsen Norman Group found that users tend to scan rather than read web content, with an average page visit lasting only about 10-20 seconds. TTS can enhance engagement by providing an audio alternative, allowing users to multitask or consume content in a more passive manner.

3. Multilingual Support:

Research Insight: In a survey by Common Sense Advisory, 75% of consumers expressed a preference for purchasing products in their native language. TTS technology facilitates the delivery of content in multiple languages, catering to diverse user preferences and expanding the global reach of applications.

4. Learning and Training Resources:

Research Insight: Educational research by Mayer and Moreno suggests that the combination of visual and auditory information enhances learning retention. TTS can be a valuable asset in educational materials, providing audio versions of textbooks, training modules, and other learning resources.

5. Time Efficiency:

Research Insight: The figures cited for this point, attributed to the National Center for Biotechnology Information (NCBI), put the average speaking rate at 125-150 words per minute and the typical reading rate at 200-250 words per minute, which means listening at normal speed is slower than reading, not faster. The time TTS saves comes from letting users absorb information while their eyes and hands are busy with another task.
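
The rates quoted above can be turned into a quick estimate of how long a passage takes to hear versus read. This is a back-of-the-envelope sketch: the function name is hypothetical, and the rates used are simply the upper bounds of the ranges cited in this section, not measured values.

```javascript
// Estimate how many seconds a text takes at a given words-per-minute rate.
function estimateSeconds(text, wordsPerMinute) {
    const words = text.trim().split(/\s+/).filter(Boolean).length;
    return Math.round((words / wordsPerMinute) * 60);
}

// Example: a 300-word call summary.
const summary = Array(300).fill('word').join(' ');
const listening = estimateSeconds(summary, 150); // speech at 150 wpm
const reading = estimateSeconds(summary, 250);   // reading at 250 wpm
```

For a 300-word summary this gives about 120 seconds of listening against 72 seconds of reading, so audio is most useful when the user cannot look at the screen.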

6. Voice-Enabled Interfaces:

Research Insight: Gartner predicted that by 2023, 25% of employee interactions with applications would be voice-enabled. TTS is a key component in the development of voice-activated interfaces, providing a natural and intuitive way for users to interact with applications.

7. Inclusive Design:

Research Insight: The Web Content Accessibility Guidelines (WCAG) are organized around four principles: content should be perceivable, operable, understandable, and robust for a diverse audience. TTS aligns with these inclusive design principles, helping make digital experiences accessible to users with different abilities and preferences.

Text-to-Speech technology is a transformative force, making digital content accessible, engaging, and efficient. As applications continue to evolve, the integration of TTS, especially with advanced capabilities from platforms like OpenAI, will play a crucial role in shaping the future of user experiences.

Conclusion

In this blog post, we explored the implementation of a Text-to-Speech feature in Salesforce using Lightning Web Components. We walked through the structure of the LWC code, identified potential use cases within Salesforce, looked at the importance of TTS technology, and outlined best practices for making secure and maintainable API calls.

Implementing features like Text-to-Speech not only enhances accessibility but also opens up new possibilities for creating dynamic and engaging user experiences within the Salesforce platform. As organizations continue to leverage the power of Salesforce, integrating innovative features like Text-to-Speech can contribute to improved user productivity and satisfaction.

About the blog

SFDCLessons is a blog where you can find Salesforce tutorials and tips written to help beginners and experienced developers alike. We also share our experience and knowledge of Salesforce best practices, troubleshooting, and optimization.

This Post Has 2 Comments

Rohan Danwade, December 12, 2023

The component is not working even after replacing the placeholder with the actual OpenAI API key.

Arun Kumar, December 13, 2023
  
  Hi Rohan,
  
  Which error message are you encountering? Make sure that you’ve included the OpenAI URL (https://api.openai.com) in the trusted sites within Salesforce. It’s essential to add the URLs to the trusted list when making callouts from JavaScript code.
  
  The post has been updated to include instructions on adding trusted sites.
  
  https://help.salesforce.com/s/articleView?id=sf.security_trusted_urls_manage.htm&type=5
  
  Thanks
  