Gradio

How it Works:

Step 1. 📥 Provide a video URL.

Step 2. 🔄 Process Video: The system downloads the video and its captions/subtitles from YouTube, or generates captions using Whisper. It then loads the video into the video player for preview, extracts frames from it, and passes the captions and frames to the RAG pipeline, which stores them in a LanceDB database. The pipeline uses a pre-trained BridgeTower model to generate embeddings for pairs of captions and their related images.

Step 3. 🤖 Analyze video content through:

AI-powered Q&A - Ask questions about the video content. The system uses the Meta/LLaMA model to analyze the captions and images and provide detailed answers.

Step 4. 📊 Results are displayed in the response section along with the related images.
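The pairing and retrieval idea behind Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the app's actual code. It assumes each caption is paired with the frame at its midpoint timestamp (the description above does not say how frames are sampled). A plain in-memory list stands in for LanceDB, and hand-written toy vectors stand in for BridgeTower embeddings. The names `Segment`, `pair_captions_with_frames`, `cosine`, and `top_k` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One caption paired with the timestamp of the frame chosen for it."""
    start: float       # caption start time in seconds
    end: float         # caption end time in seconds
    text: str          # caption text
    frame_time: float  # timestamp of the frame to extract, in seconds


def pair_captions_with_frames(captions):
    """Pair each (start, end, text) caption with its midpoint frame time.

    Midpoint sampling is an assumption made for this sketch.
    """
    return [Segment(s, e, t, (s + e) / 2.0) for s, e, t in captions]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, rows, k=2):
    """Return the k segments whose vectors are most similar to query_vec.

    rows is a list of (vector, Segment) tuples standing in for a vector store.
    """
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[0]), reverse=True)
    return [segment for _, segment in ranked[:k]]


# Toy captions; real ones would come from YouTube or a speech-to-text model.
captions = [
    (0.0, 4.0, "Intro to the topic"),
    (4.0, 10.0, "Demo of the tool"),
    (10.0, 12.0, "Closing remarks"),
]
segments = pair_captions_with_frames(captions)

# Toy 3-dimensional vectors; a real system would embed each caption/frame pair.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rows = list(zip(vectors, segments))

# Toy query vector standing in for the embedded user question.
hits = top_k([0.1, 0.9, 0.2], rows, k=2)
for hit in hits:
    print(f"{hit.frame_time:.1f}s: {hit.text}")
```

In a real pipeline, the retrieved captions and their frames would then be passed to the language model as context for answering the question.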

Note: Initial processing takes several minutes. Please be patient and monitor the logs for progress updates.