Split a PDF file into individual slide images using poppler’s pdftoppm. Requires poppler-utils to be installed (brew install poppler on macOS, or apt-get install poppler-utils on Ubuntu).
For example, you can split the NewFrontiersInIR.pdf file into individual slide images:
# Split NewFrontiersInIR.pdf into individual slides
output_folder = "slides_output"
image_files = pdf2imgs("NewFrontiersInIR.pdf", output_dir=output_folder)

# Show number of slides created
print(f"Created {len(image_files)} slide images in {output_folder}/")
Created 65 slide images in slides_output/
!rm -rf slides_output/
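For readers curious what a pdftoppm-based split might look like under the hood, here is a minimal sketch. It is not the library's implementation: the helper names build_pdftoppm_cmd and split_pdf_sketch, the "slide" file prefix, and the 150 DPI default are all assumptions made for illustration. It relies only on pdftoppm's -png and -r options and on its convention of writing files named prefix-page.png.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path, output_dir, prefix="slide", dpi=150):
    """Build the pdftoppm argument list (hypothetical helper, not the library's code)."""
    return [
        "pdftoppm",
        "-png",                          # write one PNG per page
        "-r", str(dpi),                  # rendering resolution in DPI (assumed default)
        str(pdf_path),
        str(Path(output_dir) / prefix),  # output root: files become <prefix>-<page>.png
    ]


def split_pdf_sketch(pdf_path, output_dir, prefix="slide", dpi=150):
    """Render each PDF page to a PNG and return the image paths in sorted order."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdftoppm exits with an error
    subprocess.run(build_pdftoppm_cmd(pdf_path, out, prefix, dpi), check=True)
    return sorted(out.glob(f"{prefix}-*.png"))
```

Building the command in a separate function keeps the part that needs poppler installed (the subprocess call) apart from the part that can be inspected or tested anywhere.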
Gather Context From Webpages
I often want to gather context from a set of web pages.
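As a rough illustration of the idea, the sketch below fetches a list of URLs with the Python standard library and joins their visible text into one context string. It is an assumption-laden stand-in, not this library's function: the names html_to_text and gather_pages_sketch, the "## url" section header, and the 10-second timeout are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def gather_pages_sketch(urls, timeout=10):
    """Fetch each URL and concatenate its text under a per-page header (hypothetical format)."""
    sections = []
    for url in urls:
        with urlopen(url, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        sections.append(f"## {url}\n{html_to_text(html)}")
    return "\n\n".join(sections)
```

Labeling each section with its URL keeps the source of every passage visible when the combined text is later handed to a model.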
Here is a one-sentence summary for each slide:
1. This is a title slide introducing "New Frontiers in IR: Instruction Following and Reasoning" by Orion Weller from Johns Hopkins.
2. This slide shows a screenshot of a ChatGPT-like interface with a "Search" button, implying a new way of interacting
Annotated Posts From Talk
Create comprehensive annotated blog posts from talks, supporting both YouTube videos and local MP4 files.
# Example with a local MP4 file
post = generate_annotated_talk_post(
    slide_path='context_rot/context_rot.pdf',
    video_source='context_rot/context_rot.mp4',  # Can also be a YouTube URL
    image_dir='context_rot/context_rot_imgs',
    transcript_path='context_rot/transcript.txt'
)