ChatGPT help: text to speech. See the last pages for the actual code.
I want to execute the code below. The code is saved as C:\Users\04758W744\Downloads\tts.py and the credentials are at C:\Users\04758W744\Downloads\credentials. Modify the tts.py code to access these file locations, and correct the code if required. What steps are required to execute the code?
import os
import json
from google.cloud import texttospeech

# Load your Google Cloud project credentials
with open('credentials.json') as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(
    'credentials.json'
)

# Telugu text to be converted
text = "నీకు నేను ఎలా ఉన్నాను"

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open('output.mp3', 'wb') as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
To modify and execute the tts.py code using the credentials from the specified location, follow these steps:
Install the client library: Make sure the google-cloud-texttospeech library is installed in your Python environment. The Google Cloud SDK is a separate tool and is not needed to install this library.
bash
pip install google-cloud-texttospeech
Modify tts.py Code: Update the code to use the correct file paths for the credentials and to save the MP3 file.
python
import os
import json
from google.cloud import texttospeech

# Set the file paths
credentials_path = r'C:\Users\04758W744\Downloads\credentials.json'
output_path = r'C:\Users\04758W744\Downloads\output.mp3'

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Telugu text to be converted
text = "నీకు నేను ఎలా ఉన్నాను"

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open(output_path, 'wb') as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
Execute the Code:
Open a command prompt or terminal.
Navigate to the directory where tts.py is saved.
bash
cd C:\Users\04758W744\Downloads
Run the script:
bash
python tts.py
Ensure that the google-cloud-texttospeech library is installed in the environment where you are running the script.
These steps will help you execute the tts.py script correctly with the provided file locations.
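Before running tts.py it can help to confirm that the credentials file is really a service-account key. The following is a minimal sketch, not part of the original answer: the helper name check_credentials_file is hypothetical, and the list of required keys is an assumption based on the usual layout of a service-account key file.

```python
import json
from pathlib import Path

# Keys assumed to be present in a service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_credentials_file(path):
    """Return a list of problems found with the key file (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"file not found: {p}"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every expected key that is absent.
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - data.keys())]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

Calling check_credentials_file(r'C:\Users\04758W744\Downloads\credentials.json') and printing the result shows whether the path or the file contents are the problem before any request is sent.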
To install the Google Cloud SDK on your machine, follow these steps:
1. Download the SDK Installer
Windows:
- Go to the Google Cloud SDK download page.
- Under "Installing the latest Cloud SDK version", find the section for Windows.
- Download the installer for Windows (google-cloud-sdk-<VERSION>-windows-x86_64-bundled-python.zip).
2. Run the Installer
- Unzip the downloaded file.
- Run the google-cloud-sdk\install.bat file.
- Follow the prompts to complete the installation.
3. Initialize the SDK
Open a command prompt (cmd) or PowerShell.
Run the following command to initialize the SDK:
bash
gcloud init
Follow the prompts to log in to your Google account and configure the SDK. You will be asked to choose a project and set default configurations.
4. Install the google-cloud-texttospeech Library
Open a command prompt or PowerShell and run the following command to install the google-cloud-texttospeech library:
bash
pip install google-cloud-texttospeech
5. Verify the Installation
You can verify that the SDK is installed and working by running the following command:
bash
gcloud --version
This command should display the installed version of the Google Cloud SDK and other components.
Summary
- Download the SDK Installer: From the Google Cloud SDK download page for Windows.
- Run the Installer: Unzip and run google-cloud-sdk\install.bat.
- Initialize the SDK: Run gcloud init and follow the prompts.
- Install the Library: Run pip install google-cloud-texttospeech.
- Verify the Installation: Run gcloud --version.
These steps set up the Google Cloud SDK on your machine. The google-cloud-texttospeech library itself is installed by pip (step 4) and can then be used in your Python scripts with a service-account key file.
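The gcloud --version command only verifies the SDK. To check that the Python library from step 4 is importable in the interpreter that will run the script, a small sketch like the following can be used. The helper name is_importable is hypothetical; the module path google.cloud.texttospeech is the one the scripts in this post import.

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


print("google-cloud-texttospeech available:",
      is_importable("google.cloud.texttospeech"))
```

If this prints False, pip most likely installed the library into a different Python environment than the one running the script.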
To execute this code online without installing any software on your local machine, you can use Google Colab, which provides a cloud-based Jupyter notebook environment. Here are the steps:
Step 1: Open Google Colab
- Go to Google Colab.
Step 2: Create a New Notebook
- Click on "New Notebook" to create a new Jupyter notebook.
Step 3: Upload Your Credentials File
- Click on the folder icon on the left sidebar to open the file browser.
- Click the upload button (an icon of a paper with an upward arrow) and upload your credentials.json file.
Step 4: Write and Execute the Code
- Copy and paste the following code into a cell in the notebook. Make sure to adjust the file path for the uploaded credentials.json file.
python
!pip install google-cloud-texttospeech

import json
from google.cloud import texttospeech

# Load your Google Cloud project credentials
with open('/content/credentials.json') as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json('/content/credentials.json')

# Telugu text to be converted
text = "నీకు నేను ఎలా ఉన్నాను"

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open('/content/output.mp3', 'wb') as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
- Run the cell by pressing Shift + Enter. This will install the required library and execute the code.
Step 5: Download the Output File
- After the code runs successfully, you will see the message "Audio content written to file 'output.mp3'".
- To download the output.mp3 file, click on the file icon on the left sidebar to open the file browser.
- Right-click on the output.mp3 file and select "Download".
This method allows you to execute the code online without installing any software on your local machine.
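As an alternative to right-clicking in the file browser, the download can be triggered from code. This is a minimal sketch, assuming the notebook runs in Colab, where the google.colab.files module is available; the helper name download_from_colab is hypothetical, and outside Colab it only prints a note.

```python
def download_from_colab(path):
    """Trigger a browser download of `path` when running inside Colab."""
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        print("Not running in Colab; use the file browser instead.")
        return False
    files.download(path)
    return True


download_from_colab('/content/output.mp3')
```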
To ensure that the credentials file is correctly located and accessible in Google Colab, follow these steps carefully:
Step 1: Open Google Colab
- Go to Google Colab.
Step 2: Create a New Notebook
- Click on "New Notebook" to create a new Jupyter notebook.
Step 3: Upload Your Credentials File
- Click on the folder icon on the left sidebar to open the file browser.
- Click the upload button (an icon of a paper with an upward arrow) and upload your credentials.json file.
Step 4: Verify File Upload
- After uploading, ensure that the file credentials.json is listed under /content in the file browser on the left sidebar.
Step 5: Write and Execute the Code
- Copy and paste the following code into a cell in the notebook. This code installs the necessary library and executes the text-to-speech conversion.
python
# Install the necessary library
!pip install google-cloud-texttospeech

import json
import os
from google.cloud import texttospeech

# Verify the file exists
if os.path.exists('/content/credentials.json'):
    print("Credentials file found.")
else:
    print("Credentials file not found.")

# Load your Google Cloud project credentials
with open('/content/credentials.json') as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json('/content/credentials.json')

# Telugu text to be converted
text = "నీకు నేను ఎలా ఉన్నాను"

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open('/content/output.mp3', 'wb') as out:
    out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
- Run the cell by pressing Shift + Enter. This will install the required library and execute the code.
Step 6: Download the Output File
- After the code runs successfully, you will see the message "Audio content written to file 'output.mp3'".
- To download the output.mp3 file, click on the file icon on the left sidebar to open the file browser.
- Right-click on the output.mp3 file and select "Download".
Summary
- Open Google Colab and create a new notebook.
- Upload the credentials.json file to the Colab environment.
- Verify the file upload to ensure it is in the correct location.
- Copy and run the provided code, making sure it references /content/credentials.json.
- Download the output file from the Colab environment.
This method ensures that the credentials file is correctly located and accessible in the Colab environment.
Great to hear it worked! To upload a text file containing the Telugu text ("నీకు నేను ఎలా ఉన్నాను") and use it in the code, follow these steps:
Step 1: Upload the Text File
- Create a text file named text_input.txt containing the text "నీకు నేను ఎలా ఉన్నాను".
- Upload the text_input.txt file to the Colab environment using the file upload feature.
Step 2: Modify and Run the Code
Here's the modified code to read the text from the uploaded file and use it in the text-to-speech conversion:
python
# Install the necessary library
!pip install google-cloud-texttospeech

import json
import os
from google.cloud import texttospeech

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the files exist
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Load the text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open(output_file_path, 'wb') as out:
    out.write(response.audio_content)
print(f'Audio content written to file "{output_file_path}"')
Summary
- Upload text_input.txt: Ensure it is in the Colab environment.
- Modify and run the code: The code now reads text from the text_input.txt file and performs text-to-speech conversion.
This approach will allow you to dynamically load text from a file and convert it to speech.
Here’s the modified code that reads the text from a file and saves the output MP3 file, incorporating the structure you provided:
python
!pip install google-cloud-texttospeech

import json
import os
from google.cloud import texttospeech

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open(output_file_path, 'wb') as out:
    out.write(response.audio_content)
print(f'Audio content written to file "{output_file_path}"')
Summary of Changes
- Read Text from File: Loads the text from text_input.txt instead of using a hardcoded string.
- Update Paths: Uses /content/credentials.json and /content/text_input.txt for file locations in Google Colab.
Upload the text_input.txt file containing your Telugu text to the Colab environment, and then run this script to generate the MP3 file.
input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.125.95:443 {grpc_message:"Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.", grpc_status:3, created_time:"2024-07-30T16:18:46.044708684+00:00"}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
76 return callable_(*args, **kwargs)
77 except grpc.RpcError as exc:
---> 78 raise exceptions.from_grpc_error(exc) from exc
79
80 return error_remapped_callable
InvalidArgument: 400 Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.
The error means that the text you are trying to synthesize exceeds the size limit for a single request in the Google Cloud Text-to-Speech API. The limit is 5000 bytes for the text or SSML.
Steps to Handle Large Text
Check the Length of Your Text: Ensure that your text does not exceed the limit. The sample text "నీకు నేను ఎలా ఉన్నాను" is well below the limit (under 100 bytes in UTF-8), so the error points to a much longer text in the file being synthesized.
Split Large Text: If you are working with a large amount of text, split it into smaller chunks and process each chunk separately.
Long Audio API: For text exceeding the limit, you should consider using the Long Audio API for handling larger texts.
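The limit in the error message is counted in bytes, not characters, and Telugu characters take 3 bytes each in UTF-8. The sketch below shows one way to measure this; the helper name utf8_size and the constant LIMIT_BYTES are hypothetical names introduced for illustration.

```python
LIMIT_BYTES = 5000  # per-request limit quoted in the error message


def utf8_size(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


sample = "న" * 2000  # 2000 Telugu characters
print("characters:", len(sample))       # 2000
print("bytes:", utf8_size(sample))      # 6000
print("within limit:", utf8_size(sample) <= LIMIT_BYTES)  # False
```

A text of 2000 Telugu characters therefore already exceeds the limit even though its character count is well below 5000.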
Modified Code to Handle Large Text
Here's an updated version of the code to handle potential issues with large texts. It includes error handling for large text sizes and processes text in chunks if necessary.
python
!pip install google-cloud-texttospeech

import json
import os
from google.cloud import texttospeech

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Function to split text into chunks
def split_text(text, max_length=5000):
    """Split text into chunks of at most max_length characters."""
    chunks = []
    while len(text) > max_length:
        # Prefer to break at the last space before the limit
        split_point = text.rfind(' ', 0, max_length)
        if split_point == -1:
            split_point = max_length
        chunks.append(text[:split_point])
        text = text[split_point:].strip()
    chunks.append(text)
    return chunks

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech in chunks
def synthesize_speech_in_chunks(text_chunks):
    audio_contents = []
    for chunk in text_chunks:
        synthesis_input = texttospeech.SynthesisInput(text=chunk)
        response = client.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )
        audio_contents.append(response.audio_content)
    return b''.join(audio_contents)

# Process text and save audio
text_chunks = split_text(text)
audio_content = synthesize_speech_in_chunks(text_chunks)

# Save the MP3 file
with open(output_file_path, 'wb') as out:
    out.write(audio_content)
print(f'Audio content written to file "{output_file_path}"')
Summary
- Error Handling: The code now includes a function to split large texts into smaller chunks.
- Text Splitting: If the text is too large, it is split into manageable chunks and processed sequentially.
- Combine Audio: The audio contents from each chunk are combined into a single MP3 file.
This approach should help manage large texts and avoid the size limit issue.
input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.132.95:443 {created_time:"2024-07-30T16:20:58.344603795+00:00", grpc_status:3, grpc_message:"Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis."}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
76 return callable_(*args, **kwargs)
77 except grpc.RpcError as exc:
---> 78 raise exceptions.from_grpc_error(exc) from exc
79
80 return error_remapped_callable
InvalidArgument: 400 Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.
It seems like the text might still be too long or there's a problem with the text encoding. Let's address this step-by-step:
1. Verify Text Length
Double-check the length of the text in text_input.txt. Although "నీకు నేను ఎలా ఉన్నాను" is quite short, ensure there are no hidden characters or encoding issues.
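One way to look for hidden characters is to scan the text for Unicode control and format characters, such as a byte order mark or zero-width spaces. This is a minimal sketch using only the standard library; the helper name find_hidden_chars is hypothetical, and note that some format characters (for example zero-width joiners) can be legitimate in Telugu text.

```python
import unicodedata


def find_hidden_chars(text):
    """Return (index, code point, name) for control/format characters."""
    hits = []
    for i, ch in enumerate(text):
        if ch in "\n\r\t":
            continue  # ordinary whitespace is expected
        # Cc = control characters, Cf = invisible format characters
        if unicodedata.category(ch) in ("Cc", "Cf"):
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


print(find_hidden_chars("\ufeffనీకు"))
# [(0, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')]
```

If a byte order mark shows up at index 0, opening the file with encoding='utf-8-sig' instead of 'utf-8' strips it while reading.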
2. Simplify and Test
To rule out issues, try running the code with a simple, short text directly in the script to confirm that the API and code are functioning correctly. This will help isolate whether the issue is with the text file or the overall setup.
Simplified Code Example
python
!pip install google-cloud-texttospeech

import json
import os
from google.cloud import texttospeech

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Test with a simple, short text
test_text = "Hello, this is a test."

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Synthesize the speech
synthesis_input = texttospeech.SynthesisInput(text=test_text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Save the MP3 file
with open(output_file_path, 'wb') as out:
    out.write(response.audio_content)
print(f'Audio content written to file "{output_file_path}"')
3. Check Text Encoding
Ensure that the text file is properly encoded. UTF-8 is generally recommended for text files containing non-ASCII characters. You can re-save the file in UTF-8 encoding using a text editor or directly in Colab:
python
# Re-read and re-save the text file with UTF-8 encoding
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Save it back to ensure proper encoding
with open(text_file_path, 'w', encoding='utf-8') as file:
    file.write(text)
4. Use the Long Audio API
If the text is indeed too long, consider breaking it into smaller chunks. If you are still having issues, try the Long Audio API, documented at https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.
Summary
- Test with Simplified Code: Verify the API works with short, simple text.
- Check Text Encoding: Ensure the file is properly encoded in UTF-8.
- Use Long Audio API: For very large texts, refer to the Long Audio API documentation.
These steps should help you troubleshoot and resolve the issue with the text-to-speech conversion.
To handle large texts by splitting them into smaller chunks and appending the audio files, you can modify the code as follows:
- Split the text into manageable chunks.
- Synthesize speech for each chunk.
- Append the audio files together into a single MP3 file.
Here’s how you can modify the code to achieve this:
python
!pip install google-cloud-texttospeech pydub

import json
import os
from google.cloud import texttospeech
from pydub import AudioSegment

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Function to split text into chunks
def split_text(text, max_length=5000):
    """Split text into chunks of at most max_length characters."""
    chunks = []
    while len(text) > max_length:
        # Prefer to break at the last space before the limit
        split_point = text.rfind(' ', 0, max_length)
        if split_point == -1:
            split_point = max_length
        chunks.append(text[:split_point])
        text = text[split_point:].strip()
    chunks.append(text)
    return chunks

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Function to synthesize speech for each chunk
def synthesize_speech(text_chunk):
    synthesis_input = texttospeech.SynthesisInput(text=text_chunk)
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    return response.audio_content

# Function to save audio content to a temporary file
def save_temp_audio(audio_content, temp_file_path):
    with open(temp_file_path, 'wb') as out:
        out.write(audio_content)

# Process text and save one temporary MP3 per chunk
text_chunks = split_text(text)
temp_files = []
for i, chunk in enumerate(text_chunks):
    temp_file_path = f'/content/temp_chunk_{i}.mp3'
    temp_files.append(temp_file_path)
    audio_content = synthesize_speech(chunk)
    save_temp_audio(audio_content, temp_file_path)

# Combine audio files
combined = AudioSegment.empty()
for temp_file in temp_files:
    audio_segment = AudioSegment.from_mp3(temp_file)
    combined += audio_segment

# Export the combined audio to the final output file
combined.export(output_file_path, format='mp3')
print(f'Audio content written to file "{output_file_path}"')

# Clean up temporary files
for temp_file in temp_files:
    os.remove(temp_file)
Summary
- Install Required Libraries: pydub for audio manipulation and google-cloud-texttospeech for text-to-speech.
- Split Text: Divide the text into chunks if it's too large.
- Synthesize Speech: Generate audio for each chunk and save it as a temporary MP3 file.
- Combine Audio Files: Use pydub to concatenate all temporary MP3 files into a single MP3 file.
- Cleanup: Remove temporary files after combining them.
This approach ensures that even if the text is large, it gets processed in manageable chunks, and all generated audio is appended to create a single output file.
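pydub relies on an external ffmpeg (or avconv) executable to decode and encode MP3 files, so the combine step fails if neither is installed. The following is a small sketch for checking this up front; the helper name find_mp3_decoder is hypothetical.

```python
import shutil


def find_mp3_decoder():
    """Return the path of ffmpeg or avconv if one is on PATH, else None."""
    for tool in ("ffmpeg", "avconv"):
        path = shutil.which(tool)
        if path:
            return path
    return None


decoder = find_mp3_decoder()
print("MP3 decoder:", decoder if decoder else "not found; install ffmpeg first")
```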
The error indicates that the enable_time_pointing field is not recognized in the AudioConfig class. This field is not required for your current purpose. Let's simplify the code by removing this field.
Here's the revised code without the enable_time_pointing field:
python
!pip install google-cloud-texttospeech pydub

import json
import os
from google.cloud import texttospeech
from pydub import AudioSegment

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Function to split text into chunks
def split_text(text, max_length=5000):
    """Split text into chunks of at most max_length characters."""
    chunks = []
    while len(text) > max_length:
        # Prefer to break at the last space before the limit
        split_point = text.rfind(' ', 0, max_length)
        if split_point == -1:
            split_point = max_length
        chunks.append(text[:split_point])
        text = text[split_point:].strip()
    chunks.append(text)
    return chunks

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Function to synthesize speech for each chunk
def synthesize_speech(text_chunk):
    synthesis_input = texttospeech.SynthesisInput(text=text_chunk)
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    return response.audio_content

# Function to save audio content to a temporary file
def save_temp_audio(audio_content, temp_file_path):
    with open(temp_file_path, 'wb') as out:
        out.write(audio_content)

# Process text and save one temporary MP3 per chunk
text_chunks = split_text(text)
temp_files = []
for i, chunk in enumerate(text_chunks):
    temp_file_path = f'/content/temp_chunk_{i}.mp3'
    temp_files.append(temp_file_path)
    audio_content = synthesize_speech(chunk)
    save_temp_audio(audio_content, temp_file_path)

# Combine audio files
combined = AudioSegment.empty()
for temp_file in temp_files:
    audio_segment = AudioSegment.from_mp3(temp_file)
    combined += audio_segment

# Export the combined audio to the final output file
combined.export(output_file_path, format='mp3')
print(f'Audio content written to file "{output_file_path}"')

# Clean up temporary files
for temp_file in temp_files:
    os.remove(temp_file)
Steps
- Install Required Libraries: Ensure pydub and google-cloud-texttospeech are installed.
- Verify File Paths: Check if the credentials and text files exist.
- Load Credentials: Load Google Cloud credentials from the specified path.
- Load Text: Read the text from the specified file.
- Split Text: Divide the text into chunks to handle large texts.
- Configure TTS: Set up the voice and audio configurations for the Text-to-Speech API.
- Synthesize Speech: Generate audio for each text chunk.
- Save Temporary Audio Files: Save the audio content to temporary MP3 files.
- Combine Audio Files: Concatenate all temporary MP3 files into a single MP3 file using pydub.
- Cleanup: Remove temporary files after combining them.
This code should handle large texts by processing them in chunks and appending the audio files correctly.
input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:173.194.206.95:443 {grpc_message:"Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.", grpc_status:3, created_time:"2024-07-30T16:28:42.538432672+00:00"}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
76 return callable_(*args, **kwargs)
77 except grpc.RpcError as exc:
---> 78 raise exceptions.from_grpc_error(exc) from exc
79
80 return error_remapped_callable
InvalidArgument: 400 Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.
The error indicates that the text or SSML input exceeds the 5000-byte limit, so each chunk of text sent to the Google Cloud Text-to-Speech API must stay within this limit. Because the limit is in bytes, and characters differ in byte length (a Telugu letter takes 3 bytes in UTF-8, an ASCII letter takes 1), the text splitting method has to count bytes rather than characters.
Let's modify the split_text function to handle the byte length more accurately and ensure that each chunk is within the 5000-byte limit.
Here’s the updated code:
!pip install google-cloud-texttospeech pydub
import json
from google.cloud import texttospeech
from pydub import AudioSegment
import os

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Function to split text into chunks based on byte size
def split_text(text, max_bytes=5000):
    """Split text into chunks of max_bytes bytes."""
    chunks = []
    current_chunk = ""
    current_chunk_bytes = 0
    for char in text:
        char_bytes = len(char.encode('utf-8'))
        if current_chunk_bytes + char_bytes > max_bytes:
            chunks.append(current_chunk)
            current_chunk = char
            current_chunk_bytes = char_bytes
        else:
            current_chunk += char
            current_chunk_bytes += char_bytes
    if current_chunk:
        chunks.append(current_chunk)
    return chunks

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Function to synthesize speech for each chunk
def synthesize_speech(text_chunk):
    synthesis_input = texttospeech.SynthesisInput(text=text_chunk)
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    return response.audio_content

# Function to save audio content to a temporary file
def save_temp_audio(audio_content, temp_file_path):
    with open(temp_file_path, 'wb') as out:
        out.write(audio_content)

# Process text and combine audio files
text_chunks = split_text(text)
temp_files = []
for i, chunk in enumerate(text_chunks):
    temp_file_path = f'/content/temp_chunk_{i}.mp3'
    temp_files.append(temp_file_path)
    audio_content = synthesize_speech(chunk)
    save_temp_audio(audio_content, temp_file_path)

# Combine audio files
combined = AudioSegment.empty()
for temp_file in temp_files:
    audio_segment = AudioSegment.from_mp3(temp_file)
    combined += audio_segment

# Export the combined audio to the final output file
combined.export(output_file_path, format='mp3')
print(f'Audio content written to file "{output_file_path}"')

# Clean up temporary files
for temp_file in temp_files:
    os.remove(temp_file)
Explanation:
- Text Splitting by Byte Length: The split_text function now ensures that each chunk is within the 5000-byte limit. Before adding a character, it checks whether that character's UTF-8 byte length would push the current chunk over the limit. If it would, the current chunk is added to the list of chunks and a new chunk is started with that character.
- Synthesize and Combine: Each chunk is processed individually, and the resulting audio files are combined into a single MP3 file using pydub.
This approach ensures that the text is split properly and avoids the INVALID_ARGUMENT error due to exceeding the byte limit.
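Splitting one character at a time keeps every chunk under the byte limit, but a chunk can end in the middle of a word, which may be audible at the join. As a rough sketch of an alternative (the name split_text_by_bytes and the whitespace-based rule are assumptions for illustration, not part of the code above), the text can be packed word by word, falling back to single characters only for a word that is itself larger than the limit:

```python
import re


def _pieces(text, max_bytes):
    """Yield words with their trailing whitespace; oversized words are yielded per character."""
    for piece in re.findall(r"\S+\s*|\s+", text):
        if len(piece.encode("utf-8")) <= max_bytes:
            yield piece
        else:
            # A single "word" larger than the limit: fall back to characters.
            yield from piece


def split_text_by_bytes(text, max_bytes=5000):
    """Split text into chunks of at most max_bytes UTF-8 bytes, breaking at whitespace."""
    if max_bytes < 4:
        # One UTF-8 encoded character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    current = []
    current_bytes = 0
    for piece in _pieces(text, max_bytes):
        size = len(piece.encode("utf-8"))
        # Close the current chunk before it would exceed the limit.
        if current and current_bytes + size > max_bytes:
            chunks.append("".join(current))
            current = []
            current_bytes = 0
        current.append(piece)
        current_bytes += size
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    sample = "తీర్పు అనగా తీర్చబడునది అని అర్థము. " * 10
    for chunk in split_text_by_bytes(sample, max_bytes=200):
        print(len(chunk.encode("utf-8")), "bytes")
```

Joining the returned chunks gives back the original text unchanged, so this function could stand in for split_text without touching the rest of the script.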
We can modify the code to append audio data directly in memory rather than generating multiple files. This way, we can concatenate the audio chunks as they are generated and then save the final combined audio to a single MP3 file.
Here's the updated code to append the audio chunks in memory:
!pip install google-cloud-texttospeech pydub
import json
from google.cloud import texttospeech
from pydub import AudioSegment
import os
from io import BytesIO

# Paths to the files
credentials_path = '/content/credentials.json'
text_file_path = '/content/text_input.txt'
output_file_path = '/content/output.mp3'

# Verify the file existence
if os.path.exists(credentials_path):
    print("Credentials file found.")
else:
    print("Credentials file not found.")
if os.path.exists(text_file_path):
    print("Text file found.")
else:
    print("Text file not found.")

# Load your Google Cloud project credentials
with open(credentials_path) as f:
    credentials = json.load(f)

# Configure the Text-to-Speech client
client = texttospeech.TextToSpeechClient.from_service_account_json(credentials_path)

# Load Telugu text from the file
with open(text_file_path, 'r', encoding='utf-8') as file:
    text = file.read().strip()

# Function to split text into chunks based on byte size
def split_text(text, max_bytes=5000):
    """Split text into chunks of max_bytes bytes."""
    chunks = []
    current_chunk = ""
    current_chunk_bytes = 0
    for char in text:
        char_bytes = len(char.encode('utf-8'))
        if current_chunk_bytes + char_bytes > max_bytes:
            chunks.append(current_chunk)
            current_chunk = char
            current_chunk_bytes = char_bytes
        else:
            current_chunk += char
            current_chunk_bytes += char_bytes
    if current_chunk:
        chunks.append(current_chunk)
    return chunks

# Configure voice and speed
voice = texttospeech.VoiceSelectionParams(
    language_code='te-IN',  # Telugu (India)
    name='te-IN-Standard-A',  # Female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=1.0  # Normal speed, 1.0 is default
)

# Function to synthesize speech for each chunk
def synthesize_speech(text_chunk):
    synthesis_input = texttospeech.SynthesisInput(text=text_chunk)
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    return response.audio_content

# Process text and combine audio files in memory
text_chunks = split_text(text)
combined = AudioSegment.empty()
for chunk in text_chunks:
    audio_content = synthesize_speech(chunk)
    audio_segment = AudioSegment.from_file(BytesIO(audio_content), format="mp3")
    combined += audio_segment

# Export the combined audio to the final output file
combined.export(output_file_path, format='mp3')
print(f'Audio content written to file "{output_file_path}"')
Explanation:
- Text Splitting by Byte Length: The split_text function ensures that each chunk is within the 5000-byte limit.
- Synthesize and Combine in Memory: The audio chunks are generated and directly appended to an AudioSegment object in memory.
- Final Export: The combined audio is saved to a single MP3 file.
This approach avoids writing multiple temporary files and directly appends the audio chunks in memory, resulting in a single output file.
Sure, here is a Java implementation that achieves the same functionality as the Python code, using Google Cloud Text-to-Speech. This implementation reads text from a file, chunks it to stay within the 5000-byte limit, synthesizes each chunk, and combines the resulting audio files into one MP3 file.
Java Code:
- First, you need to set up your Java project and include the Google Cloud Text-to-Speech dependencies. Add the following dependencies to your pom.xml if you are using Maven:
<dependencies>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-texttospeech</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-io</artifactId>
        <version>1.3.2</version>
    </dependency>
</dependencies>
- Create a Java class for the text-to-speech synthesis:
import com.google.api.gax.core.FixedCredentialsProvider;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.texttospeech.v1.*;
import com.google.protobuf.ByteString;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TextToSpeechSynthesis {
    public static void main(String[] args) throws Exception {
        String credentialsPath = "path/to/credentials.json";
        String textFilePath = "path/to/text_input.txt";
        String outputFilePath = "path/to/output.mp3";

        // Load the service account credentials explicitly. System.setProperty sets a
        // JVM system property, not the GOOGLE_APPLICATION_CREDENTIALS environment
        // variable, so the client library would not find the file that way.
        GoogleCredentials credentials;
        try (InputStream in = new FileInputStream(credentialsPath)) {
            credentials = GoogleCredentials.fromStream(in);
        }
        TextToSpeechSettings settings = TextToSpeechSettings.newBuilder()
                .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
                .build();

        // Read text from file
        String text = new String(Files.readAllBytes(Paths.get(textFilePath)), StandardCharsets.UTF_8);

        // Split text into chunks based on byte size
        List<String> textChunks = splitText(text, 5000);

        // Synthesize speech for each chunk
        List<ByteString> audioContents = new ArrayList<>();
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(settings)) {
            for (String chunk : textChunks) {
                ByteString audioContent = synthesizeSpeech(textToSpeechClient, chunk);
                audioContents.add(audioContent);
            }
        }

        // Combine audio chunks and save to output file
        combineAudioFiles(audioContents, outputFilePath);
        System.out.println("Audio content written to file \"" + outputFilePath + "\"");
    }

    private static List<String> splitText(String text, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();
        int currentChunkBytes = 0;
        // Iterate by code point so surrogate pairs are never split, and measure
        // each character in UTF-8 rather than the platform default charset.
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            int charBytes = ch.getBytes(StandardCharsets.UTF_8).length;
            if (currentChunkBytes + charBytes > maxBytes) {
                chunks.add(currentChunk.toString());
                currentChunk = new StringBuilder(ch);
                currentChunkBytes = charBytes;
            } else {
                currentChunk.append(ch);
                currentChunkBytes += charBytes;
            }
            i += Character.charCount(codePoint);
        }
        if (currentChunk.length() > 0) {
            chunks.add(currentChunk.toString());
        }
        return chunks;
    }

    private static ByteString synthesizeSpeech(TextToSpeechClient textToSpeechClient, String text) {
        SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();
        VoiceSelectionParams voice = VoiceSelectionParams.newBuilder()
                .setLanguageCode("te-IN")
                .setName("te-IN-Standard-A")
                .setSsmlGender(SsmlVoiceGender.FEMALE)
                .build();
        AudioConfig audioConfig = AudioConfig.newBuilder()
                .setAudioEncoding(AudioEncoding.MP3)
                .setSpeakingRate(1.0)
                .build();
        SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
        return response.getAudioContent();
    }

    private static void combineAudioFiles(List<ByteString> audioContents, String outputFilePath) throws IOException {
        // Append the raw MP3 bytes of each chunk in order; nothing is re-encoded.
        try (FileOutputStream out = new FileOutputStream(outputFilePath)) {
            for (ByteString audioContent : audioContents) {
                audioContent.writeTo(out);
            }
        }
    }
}
Steps to Execute the Code:
Set Up Project:
- Create a new Java project.
- Add the Google Cloud Text-to-Speech and Commons IO dependencies to your project.
Prepare Credentials:
- Download your Google Cloud credentials JSON file and place it in an accessible location.
- Update the credentialsPath variable in the code with the path to your credentials file.
Prepare Text File:
- Create a text file with the Telugu text you want to convert to speech.
- Update the textFilePath variable in the code with the path to your text file.
Run the Code:
- Compile and run the Java class.
- The synthesized speech will be saved as an MP3 file at the location specified by the outputFilePath variable.
This Java code will perform the same steps as the Python code: it reads the text from a file, splits it into manageable chunks, synthesizes speech for each chunk, and combines the resulting audio files into one MP3 file.
Here's the updated Java code with the hard-coded Telugu text, which directly splits the text into manageable chunks, synthesizes speech for each chunk, and combines the resulting audio files into one MP3 file.
Java Code:
import com.google.cloud.texttospeech.v1.*;
import com.google.protobuf.ByteString;
import org.apache.commons.io.IOUtils;
import javax.sound.sampled.*;
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class TextToSpeechSynthesis {
    public static void main(String[] args) throws Exception {
        String credentialsPath = "path/to/credentials.json";
        String outputFilePath = "path/to/output.mp3";
        // Hard-coded Telugu text
String text = "తీర్పు అనగా తీర్చబడునది అని అర్థము. ఇంకా వివరముగా చెప్పుకొంటే ఒక విషయములో తప్పు ఒప్పులను లెక్కించి, ఏది తప్పో, ఏది ఒప్పో వివరించడమే కాకుండా, ఒప్పుకు అక్షనూ తప్పుకు శిక్షనూ తీర్చి చెప్పు దానిని తీర్పు అంటాము. ఉదాహరణకు న్యాయస్థానమును (కోర్టును) అందరూ చూచియే ఉంటారు. న్యాయస్థానములో (కోర్టులో) ఒక న్యాయమూర్తి (జడ్జి) ఉంటాడు. జడ్జి ముందర ముద్దాయి ఉంటాడు. ముద్దాయి చేసిన పనిలో మంచి చెడులను ముద్దాయి చేతగానీ, ముద్దాయి ఎన్నుకోబడిన న్యాయవాది (అడ్వకేట్) చేతగానీ వివరించబడుతుంది. పూర్తి విచారణ జరిగిన తర్వాత న్యాయమూర్తి (జడ్జి) తనకు తెలిసిన చట్టము ప్రకారము అందులోని తప్పు ఒప్పులను గ్రహించి తీర్పు చెప్పును. బయట కనబడు కోర్టులు మనుషులు తయారు చేసుకొన్నవే, వీటిని ఉదాహరణకు మాత్రము చెప్పాము.\n" +
"\n" +
"వాస్తవానికి మనకు కనిపించని న్యాయస్థానము ఒకటి కలదు. అది అతిపెద్ద న్యాయస్థానము. ఆ న్యాయస్థానములో కుల మత భేదములు లేకుండా అన్ని మతములవారు విచారించబడుదురు. విశేషమేమంటే న్యాయస్థానమూ కనిపించదు, తీర్పు చెప్పు న్యాయమూర్తీ కనిపించడు. కనిపించని న్యాయస్థానములో, కనిపించని దేవుడే న్యాయమూర్తిగా ఉండును. ఏ తీర్పయినా దేవుని ద్వారానే వచ్చును. అందువలన దానిని దేవుని తీర్పు అనవచ్చును. దానినే కొందరు “ది జడ్జిమెంట్ ఆఫ్ గాడ్” అంటారు. కనిపించని న్యాయస్థానములో ఎలా విచారణ జరుగుతుందో, ఎలా తీర్పు చేయబడుతుందో ఎవరికీ తెలియదు. తీర్పులోని శిక్ష ఏమిటి అను విషయము కూడా ఎవరికీ తెలియదు. విచారణగానీ, తీర్పుగానీ, శిక్షగానీ లేక అక్ష గానీ ఎవరికీ తెలియవు. అయితే తీర్పు చేయబడిన శిక్ష మాత్రము తప్పనిసరిగా ఎవరైనా అనుభవించవలసిందే. బయటి కోర్టులో విచారణ, తీర్పు, శిక్ష ముందే తెలియును. ఒక ముద్దాయికి ఉరిశిక్ష విధిస్తే, ఆ ముద్దాయికి అది ముందే తెలిసిపోవును. అలాగే జైలు శిక్షగానీ ముందే తెలియును. అయితే కనిపించని లోపలి కోర్టులో విధించిన శిక్ష అమలు జరుగబోవుటకు ఒక్క క్షణము ముందు కూడా తెలియదు.";
        // Load the service account credentials explicitly. System.setProperty sets a
        // JVM system property, not the GOOGLE_APPLICATION_CREDENTIALS environment
        // variable, so the client library would not find the file that way.
        com.google.auth.oauth2.GoogleCredentials credentials;
        try (InputStream in = new FileInputStream(credentialsPath)) {
            credentials = com.google.auth.oauth2.GoogleCredentials.fromStream(in);
        }
        TextToSpeechSettings settings = TextToSpeechSettings.newBuilder()
                .setCredentialsProvider(com.google.api.gax.core.FixedCredentialsProvider.create(credentials))
                .build();

        // Split text into chunks based on byte size
        List<String> textChunks = splitText(text, 5000);

        // Synthesize speech for each chunk
        List<ByteString> audioContents = new ArrayList<>();
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(settings)) {
            for (String chunk : textChunks) {
                ByteString audioContent = synthesizeSpeech(textToSpeechClient, chunk);
                audioContents.add(audioContent);
            }
        }

        // Combine audio chunks and save to output file
        combineAudioFiles(audioContents, outputFilePath);
        System.out.println("Audio content written to file \"" + outputFilePath + "\"");
    }

    private static List<String> splitText(String text, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder currentChunk = new StringBuilder();
        int currentChunkBytes = 0;
        // Iterate by code point so surrogate pairs are never split, and measure
        // each character in UTF-8 rather than the platform default charset.
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            int charBytes = ch.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
            if (currentChunkBytes + charBytes > maxBytes) {
                chunks.add(currentChunk.toString());
                currentChunk = new StringBuilder(ch);
                currentChunkBytes = charBytes;
            } else {
                currentChunk.append(ch);
                currentChunkBytes += charBytes;
            }
            i += Character.charCount(codePoint);
        }
        if (currentChunk.length() > 0) {
            chunks.add(currentChunk.toString());
        }
        return chunks;
    }

    private static ByteString synthesizeSpeech(TextToSpeechClient textToSpeechClient, String text) {
        SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();
        VoiceSelectionParams voice = VoiceSelectionParams.newBuilder()
                .setLanguageCode("te-IN")
                .setName("te-IN-Standard-A")
                .setSsmlGender(SsmlVoiceGender.FEMALE)
                .build();
        AudioConfig audioConfig = AudioConfig.newBuilder()
                .setAudioEncoding(AudioEncoding.MP3)
                .setSpeakingRate(1.0)
                .build();
        SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
        return response.getAudioContent();
    }

    private static void combineAudioFiles(List<ByteString> audioContents, String outputFilePath) throws IOException {
        // Append the raw MP3 bytes of each chunk in order; nothing is re-encoded.
        try (FileOutputStream out = new FileOutputStream(outputFilePath)) {
            for (ByteString audioContent : audioContents) {
                audioContent.writeTo(out);
            }
        }
    }
}
Steps to Execute the Code:
Set Up Project:
- Create a new Java project.
- Add the Google Cloud Text-to-Speech and Commons IO dependencies to your project.
Prepare Credentials:
- Download your Google Cloud credentials JSON file and place it in an accessible location.
- Update the credentialsPath variable in the code with the path to your credentials file.
Run the Code:
- Compile and run the Java class.
- The synthesized speech will be saved as an MP3 file at the location specified by the outputFilePath variable.
This Java code will read the hard-coded text, split it into manageable chunks, synthesize speech for each chunk, and combine the resulting audio files into one MP3 file.
You can find other available voices by listing them using the Google Cloud Text-to-Speech API. Here's how you can list the available voices using the API:
Java Code to List Available Voices:
import com.google.cloud.texttospeech.v1.ListVoicesRequest;
import com.google.cloud.texttospeech.v1.ListVoicesResponse;
import com.google.cloud.texttospeech.v1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1.Voice;
import java.io.IOException;

public class ListVoices {
    public static void main(String[] args) throws IOException {
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
            ListVoicesRequest request = ListVoicesRequest.getDefaultInstance();
            ListVoicesResponse response = textToSpeechClient.listVoices(request);
            for (Voice voice : response.getVoicesList()) {
                System.out.println("Name: " + voice.getName());
                System.out.println("SSML Gender: " + voice.getSsmlGender());
                System.out.println("Natural Sample Rate Hertz: " + voice.getNaturalSampleRateHertz());
                System.out.println("Language Codes: " + voice.getLanguageCodesList());
                System.out.println();
            }
        }
    }
}
Steps to Execute the Code to List Voices:
Set Up Project:
- Create a new Java project.
- Add the Google Cloud Text-to-Speech dependency to your project.
Prepare Credentials:
- Download your Google Cloud credentials JSON file and place it in an accessible location.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to your credentials file.
Run the Code:
- Compile and run the Java class.
- The available voices will be printed to the console.
Example Output:
The output will list all available voices along with their language codes, SSML gender, and natural sample rate. For example:
Name: te-IN-Standard-A
SSML Gender: FEMALE
Natural Sample Rate Hertz: 240
Google Text-to-Speech Options
Available Voices and Languages
Google Cloud Text-to-Speech offers a wide range of voices and languages. You can list the available voices using the Google Cloud API, which provides details about the supported languages, gender options (male, female, and neutral), and voice names.
Sample Voice Configuration
Here is an example of a JSON request body that sets the voice options for the Google Cloud Text-to-Speech API. It selects Telugu (India) with te-IN-Standard-A, a female voice, and leaves the speaking rate and pitch at their defaults. JSON does not allow comments, so these notes are kept out of the body itself:
{
  "input": {
    "text": "Your text here."
  },
  "voice": {
    "languageCode": "te-IN",
    "name": "te-IN-Standard-A",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3",
    "speakingRate": 1.0,
    "pitch": 0
  }
}
Available Voices
To get a list of all available voices, you can use the following curl command:
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  "https://texttospeech.googleapis.com/v1/voices"
This will return a list of all available voices, including their names, language codes, and gender.
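The full list is long, so it usually helps to filter it down to one language. The sketch below assumes the response has the shape returned by the voices endpoint (a top-level "voices" array whose entries carry "languageCodes", "name" and "ssmlGender"); the function name voices_for_language and the two-entry sample payload are illustrative only, not real API output:

```python
import json


def voices_for_language(payload, language_code):
    """Return sorted (name, gender) pairs for voices that support language_code.

    payload may be the raw JSON text of the response or an already parsed dict.
    """
    data = json.loads(payload) if isinstance(payload, str) else payload
    matches = []
    for voice in data.get("voices", []):
        if language_code in voice.get("languageCodes", []):
            matches.append((voice.get("name"), voice.get("ssmlGender")))
    # Sort by voice name; a missing name sorts first.
    return sorted(matches, key=lambda pair: pair[0] or "")


if __name__ == "__main__":
    # Hand-written sample in the assumed response shape, not real API output.
    sample = {
        "voices": [
            {"languageCodes": ["en-US"], "name": "en-US-Wavenet-D", "ssmlGender": "MALE"},
            {"languageCodes": ["te-IN"], "name": "te-IN-Standard-A", "ssmlGender": "FEMALE"},
        ]
    }
    for name, gender in voices_for_language(sample, "te-IN"):
        print(name, gender)
```

The output of the curl command can be saved to a file and its contents passed to voices_for_language as a string.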
Voice Speed and Gender Options
You can customize the voice speed (speaking rate) and pitch according to your needs:
- Speaking Rate: Adjusts the speed at which the text is spoken. The default value is 1.0, and it can range from 0.25 (slower) to 4.0 (faster).
- Pitch: Adjusts the pitch of the voice. The default value is 0.0, and it can range from -20.0 to 20.0.
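To catch out-of-range values before a request is sent, the limits above can be checked while the request body is assembled. This is a minimal sketch: the helper name build_request is made up for illustration, and the ranges are the ones quoted in the list above.

```python
import json

# Ranges as stated in the list above.
SPEAKING_RATE_RANGE = (0.25, 4.0)
PITCH_RANGE = (-20.0, 20.0)


def build_request(text, language_code, voice_name, gender,
                  speaking_rate=1.0, pitch=0.0):
    """Build a synthesis request body as a dict, rejecting out-of-range settings."""
    low, high = SPEAKING_RATE_RANGE
    if not low <= speaking_rate <= high:
        raise ValueError(f"speaking_rate must be between {low} and {high}")
    low, high = PITCH_RANGE
    if not low <= pitch <= high:
        raise ValueError(f"pitch must be between {low} and {high}")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            "name": voice_name,
            "ssmlGender": gender,
        },
        "audioConfig": {
            "audioEncoding": "MP3",
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    }


if __name__ == "__main__":
    body = build_request("Your text here.", "te-IN", "te-IN-Standard-A", "FEMALE")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Serializing the returned dict with json.dumps gives a body with the same fields as the JSON examples in this section.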
Example Configuration for Different Voices
Here is how you can specify a different configuration, in this case en-US-Wavenet-D (a male voice) with a slightly slower speaking rate and a slightly lower pitch:
{
  "input": {
    "text": "Your text here."
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Wavenet-D",
    "ssmlGender": "MALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3",
    "speakingRate": 0.9,
    "pitch": -2.0
  }
}
References
For detailed information on configuring and using the Google Text-to-Speech API, you can refer to the official Google Cloud Text-to-Speech documentation.
--------
credentials.json file, save under
{
"type": "service_account",
"project_id": "visiolink-tts-431014",
"private_key_id": "<redacted>",
"private_key": "<redacted>",
"client_email": "visiolink-tts-test@visiolink-tts-431014.iam.gserviceaccount.com",
"client_id": "100989289806907241509",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/visiolink-tts-test%40visiolink-tts-431014.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}
