Summarize Podcasts with Indexify: Leveraging Whisper, BART, and FastRecursiveTextSplitter (YouTube, www.youtube.com)
n/a Hello, everyone. So today, we'll see how you can achieve summarization tasks from audio files at scale with the help of Indexify. So to get started, we'll go to the getting started guide. And, if you are new, you just need to download the indexify binary. However, I have already downloaded it, so I'll move to the second step.
n/a I need to start the indexify service. So in my terminal, I just paste this command. And within seconds, you'll have your indexify server up and running. Right? Now we'll jump to our projects.
n/a Yep. So this is a test upload file that I've created in Python. So what this does is it starts with an extraction policy. Here we have the Whisper ASR extraction policy. This transcribes the audio. Then we have a second policy. We are naming that summarization and getting the content source from audio transcription.
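To make the chaining concrete, here is a minimal sketch in plain Python. It does not call the Indexify client. The `Policy` class, the `chain_order` helper, and the policy and extractor labels are hypothetical names chosen for illustration. It only models the idea that a policy with a content source runs on the output of the policy it names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Illustrative stand-in for an extraction policy (not the Indexify API)."""
    name: str                             # policy name, e.g. as listed in the UI
    extractor: str                        # illustrative label, not a hub identifier
    content_source: Optional[str] = None  # None means "raw uploaded files"


def chain_order(policies):
    """Return policy names so that each one follows its content source.

    Assumes a simple linear chain: at most one policy per content source.
    """
    by_source = {p.content_source: p for p in policies}
    order, source = [], None
    # The length check guards against a policy that names itself as its source.
    while source in by_source and len(order) < len(policies):
        policy = by_source[source]
        order.append(policy.name)
        source = policy.name
    return order


policies = [
    Policy("summarization", "summarization", content_source="audiotranscription"),
    Policy("audiotranscription", "whisper-asr"),
]
print(chain_order(policies))
```

The transcription policy has no content source, so it receives the uploaded audio. The summarization policy names the transcription policy as its source, so it only ever sees transcripts.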
n/a So in a way, we are chaining these two extractors to work together. Right? We also have the directory-monitoring code that basically monitors the directory for any file change. And whenever a file change is observed, it adds that file to the Indexify server. Right?
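The directory monitoring described here could be sketched with a simple polling loop using only the standard library. This is an assumption about the approach, not the script shown in the video. `scan_new_files`, `monitor`, and the `upload` callback are hypothetical names, and `upload` stands in for whatever call sends the file to the Indexify server.

```python
import os
import time


def scan_new_files(directory, seen, suffix=".mp3"):
    """Return paths of matching files not yet in `seen`, and record them."""
    new_paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(suffix) and os.path.isfile(path) and path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


def monitor(directory, upload, interval=1.0, max_polls=None):
    """Poll `directory` and call `upload(path)` once for each new file.

    `upload` is a placeholder for the real upload call (assumption).
    `max_polls=None` keeps polling until the process is interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in scan_new_files(directory, seen):
            print(f"New MP3 file detected: {path}")
            upload(path)
        polls += 1
        time.sleep(interval)
```

Polling is the simplest option and has no dependencies. A file-system event library would react faster but adds an extra package.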
n/a So I just pasted the path of this folder since we'll be monitoring this folder here. We'll click save. So we'll move ahead with downloading the Whisper ASR extractor first. It's available in the hub. I can just download it like this.
n/a Also, we'll be downloading the summarization extractor. Yep. Both downloads are complete and here you can see clear instructions on how you can start these extractors. We'll copy this. Paste.
n/a Remove the unnecessary whitespace. Right. So the Whisper ASR extractor is up and running. We'll do the same for the summarization extractor. Remove the unnecessary whitespace.
n/a Note that our summarization extractor uses our new chunking algorithm, which chunks faster and produces more coherent chunks in order to give far better quality summaries. So as we can see, both these extractors are up and running. I'll just start my Python test upload script. I guess that is it, it has started. So, now, I have this All-In podcast MP3 file.
n/a Just drop this in the MP3 files folder. Move, and you can see "new MP3 file detected". Now let's go to our UI to check it out. Localhost UI. You can see we have the audio transcription and summarization chain.
n/a In audio transcription, you have a 1 hour 20 minute audio of the podcast. In summarization, you have this entire transcript as input from the Whisper ASR extractor. And if you go to search, you can just go to this, and here you go. Yes. Yes.
n/a Summarize. And it has summarized from 15,245 words to just 745 words in a matter of a few seconds. You can see the summary ends pretty well, and there is no abrupt ending for the chunks. So that's it. Hope you like this video.
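The point about chunks not ending abruptly can be illustrated with a generic recursive splitter. This is a sketch of the general technique only. It is not the FastRecursiveTextSplitter named in the title, whose internals are not shown in the video. The function name, the separator list, and the word limit are all assumptions. The idea is to split on paragraph breaks first, fall back to sentence boundaries, and only then to single words, so that chunks tend to end where a sentence ends.

```python
import re

# Coarse to fine: paragraph breaks, sentence boundaries, single words.
SEPARATORS = (r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+")


def split_recursive(text, max_words=200, separators=SEPARATORS):
    """Split `text` into chunks of at most `max_words` whitespace-separated words."""
    text = text.strip()
    if not text:
        return []
    words = text.split()
    if len(words) <= max_words:
        return [text]
    if not separators:
        # No separators left: fall back to a hard cut every max_words words.
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

    pattern, finer = separators[0], separators[1:]
    chunks, current, count = [], [], 0
    for piece in re.split(pattern, text):
        n = len(piece.split())
        if n == 0:
            continue
        if n > max_words:
            # Piece is too large on its own: flush, then split it more finely.
            if current:
                chunks.append(" ".join(current))
                current, count = [], 0
            chunks.extend(split_recursive(piece, max_words, finer))
        elif count + n > max_words:
            # Adding this piece would overflow: close the chunk at a boundary.
            chunks.append(" ".join(current))
            current, count = [piece], n
        else:
            current.append(piece)
            count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `split_recursive("One two three. Four five six. Seven eight nine ten.", max_words=6)` closes the first chunk after the second sentence instead of cutting the third sentence in half.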