Split a PDF file into individual slide images using poppler’s pdftoppm. Requires poppler-utils to be installed (brew install poppler on macOS or apt-get install poppler-utils on Ubuntu).
For example, you can split the NewFrontiersInIR.pdf file into individual slide images:
# Split NewFrontiersInIR.pdf into individual slides
output_folder = "slides_output"
image_files = pdf2imgs("NewFrontiersInIR.pdf", output_dir=output_folder)

# Show number of slides created
print(f"Created {len(image_files)} slide images in {output_folder}/")
Created 65 slide images in slides_output/
!rm -rf slides_output/
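For readers curious what a pdftoppm-based splitter involves, here is a minimal sketch. It is not the library's pdf2imgs implementation: the function names, the "slide" prefix, and the 150 DPI default are all assumptions chosen for illustration. Only the pdftoppm flags -png (PNG output) and -r (resolution in DPI) come from poppler itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path, output_dir, prefix="slide", dpi=150):
    """Build the pdftoppm command that renders every page to a PNG.

    The prefix and dpi defaults are illustrative assumptions.
    """
    out_prefix = Path(output_dir) / prefix
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def pdf_to_slide_images(pdf_path, output_dir="slides_output", prefix="slide", dpi=150):
    """Hypothetical stand-in for pdf2imgs: return the sorted list of PNG paths."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install poppler-utils first")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdftoppm exits with an error
    subprocess.run(build_pdftoppm_cmd(pdf_path, output_dir, prefix, dpi), check=True)
    # pdftoppm writes <prefix>-<page>.png with zero-padded page numbers,
    # so a plain lexicographic sort keeps the files in page order
    return sorted(Path(output_dir).glob(f"{prefix}-*.png"))
```

Keeping the command construction in its own function makes it possible to inspect or log the exact pdftoppm invocation without having poppler installed.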
Gather Context From Webpages
I often want to gather context from a set of web pages.
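One way to picture this step is the sketch below, which fetches each URL and wraps the text in numbered example tags like those in this section's output. It is a simplified assumption, not the library's own helper: the names fetch_text, wrap_examples, and gather_context are hypothetical, and the sketch returns the raw response body without converting HTML to markdown or adding the Title and URL Source headers.

```python
from urllib.request import Request, urlopen


def fetch_text(url, timeout=30):
    """Download a URL and decode the body as UTF-8 text (no HTML conversion)."""
    req = Request(url, headers={"User-Agent": "context-gatherer-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def wrap_examples(docs):
    """Wrap each document in numbered <example-N> tags inside <examples>."""
    parts = ["<examples>"]
    for i, doc in enumerate(docs, start=1):
        parts += [f"<example-{i}>", doc.strip(), f"</example-{i}>"]
    parts.append("</examples>")
    return "\n".join(parts)


def gather_context(urls):
    """Fetch every URL and return a single tagged context string."""
    return wrap_examples([fetch_text(u) for u in urls])
```

Separating the fetching from the wrapping means the tagging logic can be reused with text from any source, such as local files or cached pages.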
<examples>
<example-1>
Title:
URL Source: https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p1-intro.md
Markdown Content:
---
title: "P1: I don't use RAG, I just retrieve documents"
description: "Ben Clavié's introduction to advanced retrieval techniques"
image: p1-images/slide_12.png
date: 2025-06-25
---
As part of our [LLM Evals course](https://bit.ly/evals-ai){target="_blank"}, I hosted [Benjamin Clavié](https://ben.clavie.eu/){target="_blank"} to kick
Here is a one-sentence summary for each slide:
1. This slide introduces the presentation "New Frontiers in IR: Instruction Following and Reasoning" by Orion Weller from Johns Hopkins Whiting School of Engineering.
2. This slide shows a "Message ChatGPT" interface with a prominent "Search" button,