Writing Utils

Utilities that help with writing.

PDF to Images

Split a PDF file into one image per slide. Requires poppler-utils to be installed (brew install poppler on macOS, or apt-get install poppler-utils on Ubuntu).



pdf2imgs

 pdf2imgs (pdf_path, output_dir='.', prefix='slide')

Split a PDF file into individual slide images using poppler’s pdftoppm.

For example, you can split the NewFrontiersInIR.pdf file into individual slide images:

# Split NewFrontiersInIR.pdf into individual slides
output_folder = "slides_output"
image_files = pdf2imgs("NewFrontiersInIR.pdf", output_dir=output_folder)

# Show number of slides created
print(f"Created {len(image_files)} slide images in {output_folder}/")
Created 65 slide images in slides_output/
!rm -rf slides_output/
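The following is a minimal sketch of how a function like pdf2imgs could wrap poppler's pdftoppm. It is not the library's actual implementation: the names build_pdftoppm_cmd and pdf2imgs_sketch are hypothetical, and the choice of PNG output is an assumption.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path, output_dir=".", prefix="slide"):
    """Build the pdftoppm argument list (hypothetical helper).

    pdftoppm takes an output "root" and appends -<page>.png to it,
    so the images end up as <output_dir>/<prefix>-<page>.png.
    """
    out_root = Path(output_dir) / prefix
    return ["pdftoppm", "-png", str(pdf_path), str(out_root)]


def pdf2imgs_sketch(pdf_path, output_dir=".", prefix="slide"):
    """Run pdftoppm and return the generated image paths, sorted by filename.

    Assumption: PNG output at pdftoppm's default resolution.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdftoppm exits with an error
    subprocess.run(build_pdftoppm_cmd(pdf_path, output_dir, prefix), check=True)
    return sorted(Path(output_dir).glob(f"{prefix}-*.png"))
```

Keeping the command construction separate from the subprocess call makes the flags easy to inspect without poppler installed.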

Gather Context From Webpages

I often want to gather context from a set of web pages.



gather_urls

 gather_urls (urls, tag='example')

Gather contents from URLs.
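As a rough illustration of the wrapping step, the sketch below places each fetched document inside a numbered tag under a pluralized root tag, which is the shape the printed output suggests. The names wrap_examples and gather_urls_sketch are hypothetical, and the fetch function is passed in as an argument so the sketch makes no claim about how the real function downloads pages.

```python
def wrap_examples(contents, tag="example"):
    """Wrap each document in <tag-i>...</tag-i> inside a <tags> root element."""
    parts = [f"<{tag}s>"]
    for i, content in enumerate(contents, start=1):
        parts.append(f"<{tag}-{i}>\n{content}\n</{tag}-{i}>")
    parts.append(f"</{tag}s>")
    return "\n".join(parts)


def gather_urls_sketch(urls, fetch, tag="example"):
    """Fetch every URL with the supplied callable and wrap the results.

    `fetch` is an assumed parameter (any function mapping a URL to text);
    the real gather_urls signature does not take it.
    """
    return wrap_examples([fetch(u) for u in urls], tag=tag)
```

Numbered tags let a prompt refer to individual examples by name.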



jina_get

 jina_get (url)

Get a website as md with Jina.
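One way to fetch a page as markdown is the Jina Reader endpoint, which is called by prefixing the target URL with https://r.jina.ai/. The sketch below assumes that endpoint and uses only the standard library; jina_reader_url and jina_get_sketch are hypothetical names, and the real jina_get may use a different HTTP client, headers, or an API key.

```python
import urllib.request

# Assumption: the public Jina Reader endpoint, used without an API key
JINA_READER_PREFIX = "https://r.jina.ai/"


def jina_reader_url(url):
    """Return the Reader URL for a target page (the target is appended as-is)."""
    return JINA_READER_PREFIX + url


def jina_get_sketch(url, timeout=30):
    """Fetch the page through Jina Reader and return the response body as text."""
    with urllib.request.urlopen(jina_reader_url(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Because the target URL is appended verbatim, a malformed URL is passed straight through to the reader, which then reports the upstream error in its response.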

For example, this is the context I might use for annotated posts:

_annotated_post_content = gather_urls(_annotated_post_urls)
print(_annotated_post_content[:500])
<examples>
<example-1>
Title: 

URL Source: https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/evals/inspect.qmdhttps:/raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p1-intro.md

Warning: Target URL returned error 404: Not Found

Markdown Content:
404: Not Found

</example-0>
<example-2>
Title: 

URL Source: https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p2-evals.md

Markdown Content:
---
title: 


outline_slides

 outline_slides (slide_path)
_o = outline_slides('NewFrontiersInIR.pdf')
print(_o[:300])
Here is a one-sentence summary for each slide:

1.  This is a title slide introducing "New Frontiers in IR: Instruction Following and Reasoning" by Orion Weller from Johns Hopkins.
2.  This slide shows a screenshot of a ChatGPT-like interface with a "Search" button, implying a new way of interacting

Annotated Posts From Talk

Create comprehensive annotated blog posts from talks, supporting both YouTube videos and local MP4 files.



generate_annotated_talk_post

 generate_annotated_talk_post (slide_path, video_source, image_dir,
                               transcript_path=None, example_urls=[
                               'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/evals/inspect.qmdhttps://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p1-intro.md',
                               'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p2-evals.md',
                               'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p3_reasoning.qmd',
                               'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p4_late_interaction.qmd',
                               'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p5_map.qmd'])

Assemble the prompt for the annotated post.

| Parameter | Type | Default | Details |
|---|---|---|---|
| slide_path | | | |
| video_source | | | YouTube link or local MP4 path |
| image_dir | | | |
| transcript_path | NoneType | None | |
| example_urls | L | ['https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/evals/inspect.qmdhttps://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p1-intro.md', 'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p2-evals.md', 'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p3_reasoning.qmd', 'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p4_late_interaction.qmd', 'https://raw.githubusercontent.com/hamelsmu/hamel-site/refs/heads/master/notes/llm/rag/p5_map.qmd'] | |
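Since video_source accepts either a YouTube link or a local MP4 path, the function has to tell the two apart at some point. The sketch below shows one possible check. The name classify_video_source, the host list, and the returned labels are all assumptions made for illustration, not the library's actual logic.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of hostnames treated as YouTube links
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_video_source(video_source):
    """Return 'youtube' for a YouTube URL or 'local_mp4' for a path ending in .mp4."""
    source = str(video_source)
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in YOUTUBE_HOSTS:
        return "youtube"
    if Path(source).suffix.lower() == ".mp4":
        return "local_mp4"
    raise ValueError(f"Unsupported video source: {source!r}")
```

Raising on anything else surfaces a mistyped path or an unsupported site before any slow processing starts.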

Example Post

!open context_rot/
# video_source accepts a YouTube link or a local MP4 path
post = generate_annotated_talk_post(slide_path='context_rot/context_rot.pdf',
                                    video_source='context_rot/context_rot.mp4',  # Local MP4 here; a YouTube link also works
                                    image_dir='context_rot/context_rot_imgs',
                                    transcript_path='context_rot/transcript.txt')
from pathlib import Path
Path('context_rot/context_rot.qmd').write_text(post)
14573

Example with local MP4 video

lesson6 = generate_annotated_talk_post(
    slide_path='eval_course_examples/lesson6.pdf',
    video_source='eval_course_examples/lesson6.mp4',  # Local MP4 file
    image_dir='eval_course_examples/lesson6_images',
    transcript_path='eval_course_examples/lesson6_transcript.txt'  # Optional
)
Path('eval_course_examples/lesson_6.qmd').write_text(lesson6)
20369
!cursor eval_course_examples
!open eval_course_examples/