yt

Utilities for Content Creation From YouTube

YouTube Chapter Creation

Automate the tedious work of writing video chapters and a description.


yt_chapters

 yt_chapters (link)

Generate a YouTube summary and chapters from a public video.

Here is the output for Antoine Chaffin’s talk on late interaction:

chp = yt_chapters("https://youtu.be/1x3k0V2IITo")
print(chp)
In this presentation, Antoine Chaffin explains the inherent limitations of single-vector search, such as information loss from pooling and poor out-of-domain performance, and introduces late interaction (multi-vector) models as a superior solution. He demonstrates how these models excel in long-context and reasoning-intensive tasks and presents the PyLate library to make training and evaluating these powerful models more accessible.

00:00 - Introduction
00:32 - About Me
01:40 - Dense (Single) Vector Search Explained
03:08 - Single Vector Models: The Go-To for RAG
03:55 - Performance Evaluation & MTEB Leaderboard
04:17 - The BEIR Benchmark & Goodhart's Law
05:36 - Limitations Beyond Benchmarks: The Long Context Problem
06:33 - Limitations Beyond Benchmarks: Reasoning-Intensive Retrieval
07:50 - The Role of BM25
08:24 - Pooling: The Intrinsic Flaw of Dense Models
11:32 - Replacing Pooling with Late Interaction
12:17 - Why Not Just Use a Bigger Single Vector?
13:51 - Late Interaction: A Simple, Yet Effective, Difference
16:48 - Interpretability: A Nice Little Bonus
17:42 - Why Are People Still Using Dense Models?
18:43 - PyLate: Extending Sentence Transformers for Multi-Vector Models
21:28 - Training is Cool, Show Me the Evals
22:49 - What Are the Future Avenues?
24:36 - Conclusion & QR Codes
25:52 - Q&A: Latency of Late Interaction vs. Dense Vector Models
31:00 - Q&A: Does Fine-Tuning Close the Performance Gap?
33:20 - Q&A: How Easy is it to Fine-Tune with PyLate?
34:22 - Q&A: Common Mistakes When Moving from Single to Multi-Vector?
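The chapter list above uses a simple `MM:SS - Title` layout, so it can be checked before you paste it into a video description. The sketch below is a hypothetical helper, not part of this library. It assumes that layout (plus an optional hour field for long videos) and checks the result against YouTube's chapter rules: the first chapter starts at 0:00, there are at least three chapters in ascending order, and each chapter lasts at least 10 seconds. The last chapter's length cannot be checked without the video's duration.

```python
import re

# Hypothetical helpers: they assume the "MM:SS - Title" layout printed by
# yt_chapters above, with an optional leading hour field ("H:MM:SS - Title").
CHAPTER_RE = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2}) - (.+)$")

SAMPLE = """A short summary paragraph.

00:00 - Introduction
00:32 - About Me
01:40 - Dense (Single) Vector Search Explained
"""


def parse_chapters(text):
    """Return (seconds, title) tuples for every chapter line in text."""
    chapters = []
    for line in text.splitlines():
        m = CHAPTER_RE.match(line.strip())
        if m:  # summary lines and blank lines are skipped
            hours, minutes, seconds, title = m.groups()
            total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((total, title))
    return chapters


def check_chapters(chapters, min_len=10):
    """Return a list of problems found against YouTube's chapter rules."""
    problems = []
    if len(chapters) < 3:
        problems.append("need at least 3 chapters")
    if chapters and chapters[0][0] != 0:
        problems.append("first chapter must start at 00:00")
    # Compare each chapter with the next one; the last chapter is not checked.
    for (start, title), (nxt, _) in zip(chapters, chapters[1:]):
        if nxt <= start:
            problems.append(f"'{title}' is out of order")
        elif nxt - start < min_len:
            problems.append(f"'{title}' is shorter than {min_len}s")
    return problems


chapters = parse_chapters(SAMPLE)
print(chapters)
print(check_chapters(chapters))
```

In practice you would pass the string returned by `yt_chapters` to `parse_chapters` instead of `SAMPLE`.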

Fetch YouTube Transcript

Fetch the transcript of a public YouTube video.


transcribe

 transcribe (url, seconds_only=False)

Download the transcript of a YouTube video.

t = transcribe("https://youtu.be/1x3k0V2IITo")
print(t[:500])
[00:00:00] Hello everyone, my name is Chapan and I
[00:00:02] am a research engineer at Leighton and
[00:00:05] today I will detail some of the limits
[00:00:08] of single vector search that have been
[00:00:10] highlighted by recent usages and
[00:00:13] evaluations and then I will introduce
[00:00:16] multi vector models also known as late
[00:00:18] interaction models and how they can
[00:00:21] overcome this and to finish I will
[00:00:24] briefly present the pilot library that
[00:00:26] al
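Each transcript line starts with a bracketed `[HH:MM:SS]` timestamp, which makes the text easy to post-process. For example, you can group it into fixed-length windows before sending it to a summarizer. The sketch below assumes the default output format shown above. `parse_transcript` and `chunk_transcript` are hypothetical names, not functions provided by this module.

```python
import re

# Assumed line format, taken from the default transcribe() output above:
# "[HH:MM:SS] spoken text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] (.*)$")

SAMPLE = """[00:00:05] today I will detail some of the limits
[00:00:08] of single vector search that have been
[00:00:10] highlighted by recent usages and"""


def parse_transcript(text):
    """Return (seconds, text) tuples from '[HH:MM:SS] text' lines."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines without a timestamp (e.g. a truncated tail) are skipped
            h, mnt, s, words = m.groups()
            rows.append((int(h) * 3600 + int(mnt) * 60 + int(s), words))
    return rows


def chunk_transcript(rows, window=60):
    """Group rows into consecutive `window`-second buckets and join their text."""
    chunks = {}  # dicts keep insertion order, so buckets stay in time order
    for start, words in rows:
        key = start // window * window  # start of the bucket this line falls in
        chunks.setdefault(key, []).append(words)
    return [(k, " ".join(v)) for k, v in chunks.items()]


print(chunk_transcript(parse_transcript(SAMPLE), window=10))
```

Choosing a larger `window` trades timestamp precision for fewer, longer chunks.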