Speech recognition converts spoken audio into text. Combined with a tool that pulls the audio track out of a video file, it can also be used to extract text from a video with Python. In this article, I will walk you through how to extract text from videos using Python.
SpeechRecognition is a Python library for performing speech recognition with support for several engines and APIs, including Google’s Web Speech API, while moviepy lets you cut, read, and write all the most common audio and video formats. Moreover, moviepy supports various file formats: .ogv, .mp4, .mpeg, .avi, .mov.
Extract Text From Videos using Python
In this section, I will take you through how to extract text from a video using Python. The first step is to download a video. After downloading the video, you need to install two Python libraries:
- SpeechRecognition:
pip install SpeechRecognition
- moviepy:
pip install moviepy
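Before moving on, it can be worth confirming that both packages are importable. The sketch below is one way to do that using only the standard library; the helper name missing_packages is hypothetical and not part of either library. Note that the SpeechRecognition package is imported under the module name speech_recognition.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# SpeechRecognition installs as the module "speech_recognition".
missing = missing_packages(["speech_recognition", "moviepy"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("Both libraries are installed.")
```

If a name shows up as missing, rerunning the matching pip command in the same environment that runs the script is usually the first thing to try.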
After installing these two libraries, you can start coding. Here is the complete Python program to convert a video into text:
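import os
import speech_recognition as sr
# moviepy.editor is the moviepy 1.x import path (assumption: newer 2.x releases may no longer provide it).
import moviepy.editor as mp
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip
# The output folders must exist before any files are written into them.
os.makedirs("chunks", exist_ok=True)
os.makedirs("converted", exist_ok=True)
num_seconds_video = 52 * 60  # length of the video in seconds, hard-coded for a 52-minute video
print("The video is {} seconds".format(num_seconds_video))
l = list(range(0, num_seconds_video + 1, 60))  # chunk boundaries: [0, 60, 120, ...]
diz = {}  # maps 'chunk1', 'chunk2', ... to the recognized text
for i in range(len(l) - 1):
    # Cut a one-minute clip; every clip after the first starts 2 seconds early,
    # so it overlaps the end of the previous clip.
    ffmpeg_extract_subclip("videorl.mp4", l[i] - 2 * (l[i] != 0), l[i + 1],
                           targetname="chunks/cut{}.mp4".format(i + 1))
    # Save the clip's audio track as a WAV file.
    clip = mp.VideoFileClip(r"chunks/cut{}.mp4".format(i + 1))
    clip.audio.write_audiofile(r"converted/converted{}.wav".format(i + 1))
    r = sr.Recognizer()
    audio = sr.AudioFile("converted/converted{}.wav".format(i + 1))
    with audio as source:
        r.adjust_for_ambient_noise(source)
        audio_file = r.record(source)
    # Send the audio to Google's Web Speech API (requires an internet connection).
    try:
        result = r.recognize_google(audio_file)
    except sr.UnknownValueError:
        result = ""  # no intelligible speech was found in this chunk
    diz['chunk{}'.format(i + 1)] = result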
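To make the slicing arithmetic in the loop easier to follow, here is a small sketch that reproduces only the start and end times passed to ffmpeg_extract_subclip, without touching any video file. The function name chunk_bounds and its parameters chunk_len and overlap are hypothetical names introduced for illustration; the defaults mirror the 60-second chunks and 2-second overlap used above.

```python
def chunk_bounds(total_seconds, chunk_len=60, overlap=2):
    """Return (start, end) pairs in seconds, mirroring the loop's slicing arithmetic."""
    edges = list(range(0, total_seconds + 1, chunk_len))
    bounds = []
    for i in range(len(edges) - 1):
        # Every chunk except the first starts `overlap` seconds early.
        start = edges[i] - overlap * (edges[i] != 0)
        bounds.append((start, edges[i + 1]))
    return bounds


print(chunk_bounds(180))  # [(0, 60), (58, 120), (118, 180)]
```

One thing this makes visible: when the video length is not a multiple of the chunk length, the trailing remainder is not covered. For example, chunk_bounds(150) yields only two chunks and stops at 120 seconds.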
After running the code above, write all the text that has been extracted from the video to a text file:
# Collect the transcripts in chunk order and join them, one chunk per line.
l_chunks = [diz['chunk{}'.format(i + 1)] for i in range(len(diz))]
text = '\n'.join(l_chunks)
with open('recognized.txt', mode='w') as file:
    file.write("Recognized Speech:")
    file.write("\n")
    file.write(text)
print("Finally ready!")
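The list comprehension above assumes that every key from chunk1 up to chunkN is present in the dictionary. If a chunk were ever skipped, for instance after changing the loop to leave out chunks that fail, that lookup would raise a KeyError. A minimal sketch of a more tolerant alternative is shown below; it sorts whatever keys exist by their numeric suffix. The function name join_chunks is hypothetical and the example dictionary is made up for illustration.

```python
def join_chunks(chunks):
    """Join transcripts stored under keys like 'chunk1', 'chunk2', ... in numeric order."""
    # Sort by the number after the "chunk" prefix so that chunk10 comes after chunk2.
    ordered_keys = sorted(chunks, key=lambda k: int(k[len("chunk"):]))
    return "\n".join(chunks[k] for k in ordered_keys)


# Illustrative data only; chunk order in the dict does not matter.
example = {"chunk1": "hello", "chunk3": "again", "chunk2": "world"}
print(join_chunks(example))
```

The result of join_chunks(diz) could then be written to recognized.txt in place of the text variable.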