Thursday, June 8, 2023

How to Extract Text From Videos using Python

Speech recognition is the task of converting spoken audio into text. Because a video carries an audio track, the same technique can be used to extract the spoken text from a video with Python. In this article, I will walk you through how to extract text from videos using Python.

SpeechRecognition is a Python library for performing speech recognition, with support for Google's API, while moviepy lets you cut, read, and write the most common audio and video formats. Moreover, moviepy supports various file formats, including .ogv, .mp4, .mpeg, .avi, and .mov.

Extract Text From Videos using Python

In this section, I will take you through how to extract text from a video using Python. The first step is to download a video; the program below expects it to be saved as videorl.mp4 in the working directory. After downloading the video, you need to install two Python libraries:

  1. SpeechRecognition: pip install SpeechRecognition 
  2. moviepy: pip install moviepy
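
Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name missing_modules is my own and is not part of either package.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names used by the program in this article. Note that the
# SpeechRecognition package is imported as speech_recognition.
REQUIRED = ["speech_recognition", "moviepy"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both libraries are installed.")
```

If a name is reported as not installed, run the matching pip command from the list above in the same environment.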

After installing these two libraries, you can start coding. Here is the complete Python program that converts the speech in a video into text:

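import os
import speech_recognition as sr
import moviepy.editor as mp  # moviepy 1.x import path; moviepy 2.x no longer has moviepy.editor
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

# Length of the input video in seconds (here, a 52-minute video).
num_seconds_video = 52 * 60
print("The video is {} seconds".format(num_seconds_video))
# Chunk boundaries, one every 60 seconds: 0, 60, 120, ...
l = list(range(0, num_seconds_video + 1, 60))

# The output folders must exist before anything is written into them.
os.makedirs("chunks", exist_ok=True)
os.makedirs("converted", exist_ok=True)

diz = {}  # maps 'chunk1', 'chunk2', ... to the recognized text
for i in range(len(l) - 1):
    # Every chunk except the first starts 2 seconds early, so that words
    # spoken across a boundary are not lost.
    start = l[i] - 2 * (l[i] != 0)
    ffmpeg_extract_subclip("videorl.mp4", start, l[i + 1], targetname="chunks/cut{}.mp4".format(i + 1))
    # Save the audio track of the chunk as a WAV file.
    clip = mp.VideoFileClip("chunks/cut{}.mp4".format(i + 1))
    clip.audio.write_audiofile("converted/converted{}.wav".format(i + 1))
    r = sr.Recognizer()
    audio = sr.AudioFile("converted/converted{}.wav".format(i + 1))
    with audio as source:
        # Calibration reads the first second of the chunk; record() takes the rest.
        r.adjust_for_ambient_noise(source)
        audio_file = r.record(source)
    try:
        # Sends the audio to Google's speech API, so it needs internet access.
        result = r.recognize_google(audio_file)
    except sr.UnknownValueError:
        # Raised when no speech could be recognized in this chunk.
        result = ""
    diz['chunk{}'.format(i + 1)] = result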
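
The chunking arithmetic in the loop above is easier to follow, and to test, when it is pulled out into a small function. The following is only a sketch: the function name chunk_windows and its parameters are my own, and unlike the program above it also returns a shorter final chunk when the video length is not a multiple of the chunk length.

```python
def chunk_windows(total_seconds, chunk_seconds=60, overlap_seconds=2):
    """Return (start, end) pairs in seconds that cover the whole video.

    Every window except the first starts overlap_seconds early, which
    mirrors the l[i]-2*(l[i]!=0) expression in the program above.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap_seconds must be in [0, chunk_seconds)")

    windows = []
    for begin in range(0, total_seconds, chunk_seconds):
        start = max(0, begin - overlap_seconds)          # never before 0
        end = min(begin + chunk_seconds, total_seconds)  # never past the end
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Print the windows for a 150-second video.
    for number, (start, end) in enumerate(chunk_windows(150), start=1):
        print("chunk{}: {}s to {}s".format(number, start, end))
```

For a 150-second video this gives the windows (0, 60), (58, 120), and (118, 150). Each pair can be passed to ffmpeg_extract_subclip as the start and end time of one chunk.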

Once the code above has run, the dictionary diz holds the recognized text of every chunk. The last step is to join the chunks and write them to a text file:

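# Collect the chunk texts in order: chunk1, chunk2, ...
l_chunks = [diz['chunk{}'.format(i + 1)] for i in range(len(diz))]
text = '\n'.join(l_chunks)

# Write the transcript as UTF-8 so that non-ASCII characters are kept.
with open('recognized.txt', mode='w', encoding='utf-8') as file:
    file.write("Recognized Speech:")
    file.write("\n")
    file.write(text)

print("Finally ready!")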
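
The list comprehension above assumes that every key from chunk1 to chunkN is present in the dictionary. If some chunks are skipped or come back empty, a slightly more forgiving way to build the transcript is to sort the keys by their number. This is a sketch under that assumption; join_chunks is a hypothetical helper of my own, not part of SpeechRecognition or moviepy.

```python
def join_chunks(chunks, prefix="chunk"):
    """Join chunk texts in numeric order, skipping empty ones.

    Keys are expected to look like 'chunk1', 'chunk2', ... Sorting by the
    number matters because plain string sorting puts 'chunk10' before 'chunk2'.
    """
    def chunk_number(key):
        return int(key[len(prefix):])

    ordered_keys = sorted(chunks, key=chunk_number)
    return "\n".join(chunks[key] for key in ordered_keys if chunks[key])


if __name__ == "__main__":
    example = {"chunk2": "second minute", "chunk1": "first minute", "chunk3": ""}
    print(join_chunks(example))
```

The result can be written to recognized.txt in the same way as the text variable above.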
