Vaibhav Gurunathan

StudyStream

StudyStream was my hackathon submission for MHacks 2024.

The frontend accepts input either as a file or as copied/pasted text, along with a desired length for the video. It sends the text and input type to the backend, which does all the heavy lifting:

1. It splits the text into sections, dedicating one to each concept based on the requested video length.
2. For each concept, it finds the best image. It scrapes Google Images and can compare as many candidates as you want: it converts each image to a text description with a computer vision model, puts the descriptions in a vector database, and queries the database for the closest match to the concept.
3. With the right image and description in hand, it generates narration audio via text-to-speech and creates bullet points that concisely communicate the material.
4. It combines the image and text, stitches them with the audio into a clip for that chunk, and finally joins all the chunks together into your final video.
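The chunking step could look something like this. This is a minimal sketch, not the project's actual code; the function name and the `seconds_per_concept` parameter are my own assumptions about how the video length maps to concept sections:

```python
# Sketch of the text-chunking step: size the number of concept sections
# from the requested video length, then split the text evenly.
# `seconds_per_concept` is an assumed tuning knob, not from the project.

def split_into_concepts(text: str, video_seconds: int,
                        seconds_per_concept: int = 30) -> list[str]:
    """Split input text into one section per concept, sized so the
    requested video length is covered."""
    n_sections = max(1, video_seconds // seconds_per_concept)
    words = text.split()
    per_section = -(-len(words) // n_sections)  # ceiling division
    return [
        " ".join(words[i:i + per_section])
        for i in range(0, len(words), per_section)
    ]
```

A real version would likely split on sentence or topic boundaries rather than raw word counts, but the length-to-section mapping is the same idea.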
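The image-selection step (caption each scraped image, store the captions, query for the closest match) is at heart a nearest-neighbor search over text. Here is a stand-in using bag-of-words vectors and cosine similarity instead of a real embedding model or vector database; all names are illustrative:

```python
# Stand-in for the vector-DB retrieval step: pick the image whose
# vision-model caption is most similar to the concept text.
# Bag-of-words + cosine similarity replaces the real embeddings here.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_image(captions: dict[str, str], concept_text: str) -> str:
    """Return the image whose caption best matches the concept text.
    `captions` maps image path -> caption from the vision model."""
    query_vec = Counter(concept_text.lower().split())
    return max(
        captions,
        key=lambda img: cosine(Counter(captions[img].lower().split()),
                               query_vec),
    )
```

With dense embeddings and a vector DB, the cosine computation and the `max` scan are what the database's similarity query does for you at scale.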
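The bullet-point step condenses each section into a few short lines for the slide. A toy sketch of the shape of that output, assuming sentence-based summarization (the real project may well use a language model for this):

```python
# Sketch of the bullet-point step: one short bullet per sentence.
# A crude stand-in for whatever summarization the backend actually uses.
import re

def make_bullets(section: str, max_words: int = 8) -> list[str]:
    """Turn a concept section into short on-screen bullet points."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", section)
                 if s.strip()]
    return ["- " + " ".join(s.rstrip(".!?").split()[:max_words])
            for s in sentences]
```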

Here is a short demo video of the product:

https://www.youtube.com/watch?v=7_QqKZyKKbo

I will edit this to make it look much better when I get the chance.