Loading Data

Before you can start indexing your documents, you need to load them into memory. A reader is a module that loads data from a file into a Document object.
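To make the idea concrete, here is a minimal sketch of what a reader does: it takes the raw bytes of a file and returns document-like objects. SketchDocument and SketchTextReader are hypothetical names used only for illustration and are not classes exported by the library. The loadDataAsContent method name mirrors the one used in the custom reader example later on this page.

```typescript
// Hypothetical stand-in for a Document: just text plus metadata.
interface SketchDocument {
  text: string;
  metadata: Record<string, string>;
}

// Hypothetical reader: turns the raw bytes of one file into documents.
class SketchTextReader {
  loadDataAsContent(fileContent: Uint8Array, fileName: string): SketchDocument[] {
    // Decode the bytes as UTF-8 and wrap them in a single document.
    const text = new TextDecoder("utf-8").decode(fileContent);
    return [{ text, metadata: { file_name: fileName } }];
  }
}

const sketchBytes = new TextEncoder().encode("hello readers");
const sketchDocs = new SketchTextReader().loadDataAsContent(sketchBytes, "notes.txt");
console.log(sketchDocs[0].text); // hello readers
```

The real readers follow the same shape but return the library's Document type and handle format-specific parsing.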

To use the readers, install the @llamaindex/readers package with your package manager.

We offer readers for different file formats, each importable from its own subpath:

import { CSVReader } from '@llamaindex/readers/csv';
import { DocxReader } from '@llamaindex/readers/docx';
import { HTMLReader } from '@llamaindex/readers/html';
import { ImageReader } from '@llamaindex/readers/image';
import { JSONReader } from '@llamaindex/readers/json';
import { MarkdownReader } from '@llamaindex/readers/markdown';
import { ObsidianReader } from '@llamaindex/readers/obsidian';
import { PDFReader } from '@llamaindex/readers/pdf';
import { TextFileReader } from '@llamaindex/readers/text';

SimpleDirectoryReader

LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.

It is a simple reader that reads all files from a directory and its subdirectories and delegates the actual reading to the reader specified in the fileExtToReader map.

import { SimpleDirectoryReader } from "@llamaindex/readers/directory";

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("../data");

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});

Currently, the following readers are mapped to specific file types:

TextFileReader: .txt
PDFReader: .pdf
CSVReader: .csv
MarkdownReader: .md
DocxReader: .docx
HTMLReader: .htm, .html
ImageReader: .jpg, .jpeg, .png, .gif

You can change which reader is used in three different ways:

overrideReader overrides the reader for all file types, including unsupported ones.
fileExtToReader maps a reader to a specific file type. Use it to replace the reader for an existing file type or to add support for a new one.
defaultReader sets a fallback reader for files with unsupported extensions. By default it is TextFileReader.
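The way these three options interact can be sketched as a small lookup function. This is an illustration of the precedence implied by the descriptions above (overrideReader first, then fileExtToReader, then defaultReader), not the library's actual implementation. SketchOptions and pickReader are hypothetical names, readers are represented by plain strings to keep the sketch self-contained, and lowercasing the extension is an assumption.

```typescript
// Hypothetical option bag mirroring the three option names above.
interface SketchOptions {
  overrideReader?: string;
  fileExtToReader?: Record<string, string>;
  defaultReader?: string;
}

// Assumed precedence: overrideReader, then fileExtToReader, then defaultReader.
function pickReader(fileName: string, options: SketchOptions): string | undefined {
  if (options.overrideReader !== undefined) {
    return options.overrideReader;
  }
  // Extract the extension without the leading dot (assumption: case-insensitive).
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const mapped = options.fileExtToReader?.[ext];
  return mapped ?? options.defaultReader;
}

const sketchOptions: SketchOptions = {
  fileExtToReader: { pdf: "PDFReader", md: "MarkdownReader" },
  defaultReader: "TextFileReader",
};

console.log(pickReader("report.pdf", sketchOptions)); // PDFReader
console.log(pickReader("archive.zip", sketchOptions)); // TextFileReader
console.log(pickReader("report.pdf", { ...sketchOptions, overrideReader: "CustomReader" })); // CustomReader
```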

SimpleDirectoryReader supports up to 9 concurrent requests. Use the numWorkers option to set the number of concurrent requests. The default is 1, which means files are read sequentially.
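For readers unfamiliar with bounded concurrency, the following sketch shows the general technique of processing a list of files with a fixed number of workers. It is a generic illustration under stated assumptions, not the code SimpleDirectoryReader runs internally. mapWithWorkers is a hypothetical helper, and the simulated task stands in for reading a file.

```typescript
// Run `task` over `items` with at most `numWorkers` tasks in flight.
async function mapWithWorkers<T, R>(
  items: T[],
  numWorkers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next item until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(numWorkers, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Simulated usage: four "files", two workers, tracking peak concurrency.
const sketchFiles = ["a.txt", "b.txt", "c.txt", "d.txt"];
let inFlight = 0;
let peak = 0;
const loaded = mapWithWorkers(sketchFiles, 2, async (file) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for file I/O
  inFlight--;
  return file.toUpperCase();
});
loaded.then((names) => console.log(names, "peak concurrency:", peak));
```

With a worker count of 1 the same helper degenerates to a sequential loop, which matches the default behavior described above.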

Example

import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "@llamaindex/readers/directory";
import { TextFileReader } from "@llamaindex/readers/text";
import type { Document, Metadata } from "llamaindex";
import { FileReader } from "llamaindex";

class ZipReader extends FileReader {
  loadDataAsContent(fileContent: Uint8Array): Promise<Document<Metadata>[]> {
    throw new Error("Implement me");
  }
}

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  defaultReader: new TextFileReader(),
  fileExtToReader: {
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});

Tips when using in non-Node.js environments

When you use @llamaindex/readers in a non-Node.js environment (such as Vercel Edge or Cloudflare Workers), some classes are not exported from the top-level entry file.

The reason is that some classes (e.g. PDFReader) are only compatible with the Node.js runtime because they use Node.js-specific APIs such as fs, child_process, and crypto.

If you need any of those classes, import them directly through their file path in the package instead.

Because PDFReader does not work in the Edge runtime, here's how to use SimpleDirectoryReader with LlamaParseReader to load PDFs:

import { SimpleDirectoryReader } from "@llamaindex/readers/directory";
import { LlamaParseReader } from "@llamaindex/cloud";

export const DATA_DIR = "./data";

export async function getDocuments() {
  const reader = new SimpleDirectoryReader();
  // Load PDFs using LlamaParseReader
  return await reader.loadData({
    directoryPath: DATA_DIR,
    fileExtToReader: {
      pdf: new LlamaParseReader({ resultType: "markdown" }),
    },
  });
}

Note: Reader classes have to be added explicitly to the fileExtToReader map in the Edge version of the SimpleDirectoryReader.

You'll find a complete example with LlamaIndexTS here: https://github.com/run-llama/create_llama_projects/tree/main/nextjs-edge-llamaparse

Load file natively using Node.js Customization Hooks

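We provide a helper utility that lets you import a data file directly in a Node.js script. Register it with the --import flag when you run the script:

node --import @llamaindex/readers/node ./script.js

Inside script.js, import the file like a module:

import csv from './path/to/data.csv';

const text = csv.getText();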

API Reference

SimpleDirectoryReader