
Beyond Text: Integrating Images and Videos into NotebookLM

"Doesn't NotebookLM only work with text?" Many people think so and give up on using image or video materials with it. But if you first convert images and videos into text, you can bring multimedia content into NotebookLM's knowledge base. This post covers how to extract text from images with OCR, turn video audio into transcripts, and use multimodal AI tools as a preprocessing step.


Table of Contents

  1. Current Multimedia Support in NotebookLM
  2. Converting Image Sources to Text
  3. Integrating Video Sources into NotebookLM
  4. Connecting with Gemini's Multimodal Capabilities
  5. Multimedia Analysis Scenarios for the Classroom

Current Multimedia Support in NotebookLM

Officially Supported Sources

NotebookLM can directly process the following source types:

  • PDFs (with a text layer)
  • Google Docs / Slides
  • Webpage URLs
  • YouTube video links (caption-based)
  • Directly entered text

Limitations with Images and Video

Pure image files (PNG, JPG) and videos without captions cannot be added as sources directly. PDFs that contain only images also cannot be read as text. Getting around these limitations is the core of this post.

The Basic Principle of the Workaround

The key idea is simple: convert images and videos to text first, then upload to NotebookLM. A variety of tools are used in this preprocessing step.
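As a rough illustration of that principle, the sketch below sorts a file into "add directly" or "convert first" based on the source types listed above. The function name, the route labels, and the video extensions are assumptions made for this example, not anything NotebookLM itself exposes.

```python
from pathlib import Path

# Extensions used for illustration only; the post names PNG and JPG,
# the video extensions are assumed examples.
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
VIDEO_EXTS = {".mp4", ".mov"}


def preprocessing_route(filename: str, has_text_layer: bool = True) -> str:
    """Return a hypothetical label for how a file should be prepared."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        # Pure images need OCR or a multimodal description first.
        return "ocr-or-gemini"
    if ext in VIDEO_EXTS:
        # Local video files need a transcript first.
        return "transcribe"
    if ext == ".pdf" and not has_text_layer:
        # Image-only PDFs need OCR before they can be read as text.
        return "ocr"
    return "direct"


if __name__ == "__main__":
    for name in ["chart.png", "lesson.mp4", "notes.pdf"]:
        print(name, "->", preprocessing_route(name))
```

Whether a PDF has a text layer is passed in by hand here; a quick check is to try selecting text in the PDF with your cursor.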


Converting Image Sources to Text

Processing Scanned Images with OCR

Lesson materials or textbooks scanned as image PDFs require OCR (Optical Character Recognition) to extract the text.

Free tools:

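  • Adobe Acrobat: Open the PDF and use the "Recognize Text" feature (note: OCR is generally part of the paid Acrobat product or its trial rather than the free Reader, so check which version you have)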
  • iLovePDF (web-based): Offers free PDF OCR functionality
  • Google Drive: Upload an image file or scanned PDF to Google Drive, then open it with Google Docs for automatic OCR processing

Step-by-step method (using Google Drive):

  1. Upload the scanned PDF to Google Drive
  2. Right-click the file → "Open with" → "Google Docs"
  3. Once the text is extracted in Docs, review and save it
  4. Add that Google Docs file as a NotebookLM source
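The review in step 3 can be partly automated if you copy the extracted text out of Docs first. The sketch below is one possible cleanup pass, assuming the OCR output has typical scan artifacts such as words hyphenated across line breaks and hard line wraps inside paragraphs. `clean_ocr_text` is a hypothetical helper, not part of Google Drive or NotebookLM.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy common OCR artifacts before pasting text back into a document."""
    # Rejoin words split across a line break: "recog-\nnition" -> "recognition".
    # Caveat: this also joins genuinely hyphenated words that wrap at a hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single newlines inside a paragraph into spaces,
    # leaving blank lines (paragraph breaks) untouched.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = "Optical char-\nacter recog-\nnition\nworks.\n\nNext  para."
    print(clean_ocr_text(sample))
```

A pass like this does not replace proofreading. Misread characters and numbers still need a human check against the scan.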

Generating Text Descriptions of Charts and Graphs

To analyze statistical graphs or educational data visualizations, first use the image analysis feature in Gemini or ChatGPT to turn the visual data into text.

Steps:

  1. Upload the image to Gemini
  2. Ask: "Convert the data in this graph into a text table. Include all values and labels."
  3. Save the generated text in Google Docs
  4. Add it as a NotebookLM source
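If the model returns its table in pipe-delimited (Markdown-style) form, which is common but not guaranteed, a small script can turn it into CSV before you save it. This makes it easier to check the values against the original graph in a spreadsheet. The function names and the sample values below are made up for illustration.

```python
import csv
import io


def markdown_table_to_rows(table_text: str) -> list[list[str]]:
    """Parse a pipe-delimited table into a list of rows (header first)."""
    rows = []
    for line in table_text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any prose around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values, not real data.
    sample = "| Year | Score |\n|---|---|\n| 2021 | 10 |\n| 2022 | 12 |"
    print(rows_to_csv(markdown_table_to_rows(sample)))
```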

Analyzing Infographics

The same method works when analyzing infographics from government agencies or research institutions.

  1. Upload the infographic image to Gemini
  2. Ask: "Organize all the text and key content in this infographic in a structured format."
  3. Add the converted text as a NotebookLM source

Integrating Video Sources into NotebookLM

Using YouTube Captions (the Easiest Method)

YouTube videos can be added to NotebookLM as a source using just the link, as long as they have captions.

  • Korean-captioned educational content, EBS lectures, TED videos with Korean subtitles
  • English videos with Korean captions are also recognized
  • Auto-generated captions work but may have lower accuracy

How to check caption quality: Open the video on YouTube and click the caption button (CC) to preview the captions beforehand.

Using Tools to Extract Video Transcripts

Videos without captions require a separate tool to produce a transcript from the audio.

Recommended tools:

  • Tactiq (Chrome extension): Extracts captions in real time from YouTube, Zoom, and Meet sessions
  • Otter.ai: Converts English video audio to text (free tier: 300 minutes/month)
  • Clova Note (Naver): Specialized for Korean speech-to-text conversion

Steps:

  1. Convert video audio to text using Clova Note or Otter.ai
  2. Organize the converted text in Google Docs
  3. Add it as a NotebookLM source
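Some transcription tools export captions in the SubRip (SRT) format, with cue numbers and timestamps mixed into the text. Whether your tool and plan offer this export is an assumption to verify. If they do, a sketch like the following strips the timing lines so that only the spoken text goes into Google Docs. `srt_to_text` is a hypothetical helper.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the spoken text from SRT-formatted captions."""
    spoken = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the cue number, then the timing line, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMESTAMP.match(lines[0]):
            lines = lines[1:]
        spoken.extend(lines)
    return " ".join(spoken)


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nWelcome to class.\n\n"
        "2\n00:00:03,500 --> 00:00:06,000\nToday we cover\nfractions.\n"
    )
    print(srt_to_text(sample))
```

If you want to keep timestamps so you can jump back to the video later, skip this step and paste the captions as they are.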

Using Recorded Training and Meeting Videos

Recorded teacher professional development sessions or staff meetings can be integrated into NotebookLM the same way. Once training content is converted to text, you can quickly search for specific information whenever you need it later.


Connecting with Gemini's Multimodal Capabilities

Gemini's Role: Multimedia-to-Text Converter

Gemini is a multimodal AI that can directly analyze images, videos, and audio files. The most efficient approach is to use Gemini as a preprocessing tool for NotebookLM.

Workflow:

Multimedia source → Gemini (analyze and convert to text) → Save in Google Docs → Add as NotebookLM source

Example: Analyzing a Classroom Video

When analyzing a video of your own teaching:

  1. Upload the video to Gemini (Gemini Advanced required)
  2. Ask: "Classify the types of questions the teacher asks in this lesson video and summarize the student response patterns."
  3. Export the analysis results to Google Docs
  4. Add them as a NotebookLM source and compare with other lesson records

Multimedia Analysis Scenarios for the Classroom

Scenario 1: Comprehensive Analysis of Educational YouTube Channels

Integrate and analyze 10 videos from EBS or education-related YouTube channels in NotebookLM.

  1. Add each video's YouTube URL as a source
  2. Ask: "What teaching methodology is most commonly emphasized across these videos?"
  3. Ask: "Pull out one idea from each video that I can apply directly in my lessons."
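When you collect ten links by hand, the same video often appears twice in different URL forms (a full watch link, a short youtu.be link, a link with a timestamp). The sketch below normalizes a list of links and drops duplicates before you add them as sources. It assumes only the two URL shapes shown, the helper names are hypothetical, and the video IDs in the example are placeholders.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from a watch or youtu.be link, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def unique_video_urls(urls: list[str]) -> list[str]:
    """Return one canonical watch URL per distinct video, in input order."""
    seen = set()
    result = []
    for url in urls:
        vid = video_id(url)
        if vid is None or vid in seen:
            continue  # skip non-YouTube links and repeats
        seen.add(vid)
        result.append(f"https://www.youtube.com/watch?v={vid}")
    return result


if __name__ == "__main__":
    links = [
        "https://www.youtube.com/watch?v=AAAAAAAAAAA&t=30s",
        "https://youtu.be/AAAAAAAAAAA",
        "https://www.youtube.com/watch?v=BBBBBBBBBBB",
    ]
    print(unique_video_urls(links))
```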

Scenario 2: Systematic Management of Educational Material Images

Systematically organize policy infographics from education agencies and academic achievement analysis graphs.

  1. Convert the images to text using Gemini
  2. Save them with clear titles like "Korea PISA 2022 Results Graph"
  3. Upload them to NotebookLM for year-by-year and category-by-category comparative analysis

Scenario 3: Professional Development Video Archive

Build a database of training content by converting professional development videos to text.

  1. Convert the audio from training videos using Clova Note
  2. Organize the content by topic and upload to NotebookLM
  3. Search: "Summarize all content related to school violence prevention from the training sessions I attended this year."
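Organizing transcripts by topic (step 2) can start with a simple keyword pass before you tidy things up by hand. The following is a minimal sketch, assuming you supply your own topic names and keywords. `group_by_topic` and the sample keywords are illustrative, and plain keyword matching will miss paragraphs that discuss a topic without using those words.

```python
def group_by_topic(
    paragraphs: list[str], topic_keywords: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Bucket transcript paragraphs under every topic whose keyword appears."""
    grouped = {topic: [] for topic in topic_keywords}
    grouped["uncategorized"] = []
    for para in paragraphs:
        lowered = para.lower()
        matched = [
            topic
            for topic, keywords in topic_keywords.items()
            if any(keyword.lower() in lowered for keyword in keywords)
        ]
        for topic in matched:
            grouped[topic].append(para)
        if not matched:
            # Keep unmatched text so nothing is silently dropped.
            grouped["uncategorized"].append(para)
    return grouped


if __name__ == "__main__":
    transcript = [
        "Reporting procedures for school violence cases.",
        "Using tablets in math class.",
    ]
    topics = {
        "school violence prevention": ["violence", "bullying"],
        "edtech": ["tablet"],
    }
    for topic, paras in group_by_topic(transcript, topics).items():
        print(topic, len(paras))
```

Saving each topic as its own Google Doc with a clear title gives NotebookLM cleaner sources to cite when you run a search like the one in step 3.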

Integrating multimedia sources into NotebookLM may feel somewhat cumbersome at first. But once you have built the system, you can consolidate knowledge from text, video, and images into a single library. What makes this approach particularly powerful is the ability to cross-analyze materials in completely different formats within the same context.

Is there multimedia material you would like to integrate into NotebookLM? Which format — video, image, or audio — do you find most limiting? Let us know in the comments and we can explore solutions together.


