Document Ingestion

Upload and process your documents with state-of-the-art AI. Supports PDFs, DOCX, XLSX, PPTX, HTML, CSV, images (PNG/JPG/GIF/BMP/TIFF/WEBP), and audio files (MP3/WAV/M4A/MP4/WebM).

New to the platform? Check out our comprehensive documentation to learn how to get API keys and optimize your ingestion pipeline.

1. Select Files

Upload documents, images, or audio

Upload Your Documents

Upload documents, images, or audio files to create your vector database

2. Configure

Customize processing options

Pinecone Configuration

Index name: lowercase letters, numbers, and hyphens only
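The naming rule above (lowercase letters, numbers, and hyphens only) can be checked before anything is sent to Pinecone. The following is a minimal sketch; `is_valid_index_name` is a hypothetical helper, and it enforces only the character rule stated here. Pinecone may apply further limits, such as a maximum length, that this check does not cover.

```python
import re

# Assumption: only the character rule from the form hint is enforced.
# Any additional service-side limits (e.g. length) are not checked here.
_INDEX_NAME_RE = re.compile(r"[a-z0-9-]+")


def is_valid_index_name(name: str) -> bool:
    """Return True if name is non-empty and uses only a-z, 0-9, and '-'."""
    return _INDEX_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-docs-2024", "My_Docs", ""):
        print(repr(candidate), is_valid_index_name(candidate))
```

Validating on the client side gives immediate feedback in the form instead of a failed request later in the pipeline.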

Organize data by namespace (e.g., "public", "internal", "confidential")

OpenAI API Key (Required)

The enhanced pipeline uses text-embedding-3-large (3072 dimensions) for embeddings and GPT-4 for responses

💡 Pro Tip: The API-powered pipeline splits text into token-aware chunks of about 1000 characters, uses GPT-4o Vision for images and Whisper for audio, and handles all embeddings and AI generation automatically.
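To illustrate what chunks of roughly 1000 characters look like, here is a simplified sketch. It is not the platform's chunker: `chunk_text` is a hypothetical function, and whitespace-separated words stand in for tokens so the example needs no tokenizer library. A real token-aware splitter would likely measure boundaries with the embedding model's tokenizer instead.

```python
def chunk_text(text: str, target_chars: int = 1000) -> list[str]:
    """Greedily pack whole words into chunks of at most target_chars.

    Words are never split, so a single word longer than target_chars
    becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for word in text.split():
        # The extra 1 is the joining space when the chunk already has words.
        added = len(word) + (1 if current else 0)
        if current and current_len + added > target_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "word " * 500
    parts = chunk_text(sample)
    print(len(parts), [len(p) for p in parts])
```

Splitting on word boundaries keeps each chunk readable and avoids cutting a term in half, at the cost of chunks being slightly shorter than the target.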

Enhanced Pipeline Options

⚠️ Advanced Options

WARNING: This will delete the existing index and all its data before ingesting.

⚡ API-Powered Pipeline: All processing uses OpenAI APIs (GPT-4o Vision for images, Whisper for audio, text-embedding-3-large for text). Hybrid search with sparse vectors is included.

🚀 API-Powered Pipeline Summary

  • Index: Not set
  • Namespace: default
  • Embeddings: OpenAI text-embedding-3-large (3072 dimensions)
  • Chunk size: ~1000 characters (token-aware)
  • Images: ✅ GPT-4o Vision API (always enabled)
  • Audio: ✅ Whisper API transcription (always enabled)
  • Hybrid search: ✅ Dense + sparse vectors (BM25-style)
  • Architecture: 100% API-based, zero GPU dependencies
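The "Dense + sparse vectors (BM25-style)" entry in the summary above can be pictured with a small sketch of the sparse side. Everything here is an assumption made for illustration: `sparse_vector` is a hypothetical helper, terms are hashed to integer indices with CRC32, the weights use only BM25's term-frequency saturation (no corpus-level IDF), and `avg_len` is a made-up average document length. The output uses the indices/values shape that Pinecone accepts for sparse values.

```python
import re
import zlib
from collections import Counter


def sparse_vector(text: str, k1: float = 1.2, b: float = 0.75,
                  avg_len: float = 200.0) -> dict:
    """Build a BM25-style sparse vector from one chunk of text.

    Assumptions: no IDF term (no corpus statistics are available here)
    and avg_len is an illustrative constant, not a measured value.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Length normalisation term from the BM25 formula.
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    weights: dict[int, float] = {}
    for term, tf in counts.items():
        # CRC32 gives a stable 32-bit index; built-in hash() varies per run.
        idx = zlib.crc32(term.encode("utf-8"))
        weights[idx] = weights.get(idx, 0.0) + tf * (k1 + 1) / (tf + norm)
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}


if __name__ == "__main__":
    print(sparse_vector("vector search vector database"))
```

At query time, a hybrid search would score documents using both this sparse vector and the dense embedding, so exact keyword matches and semantic similarity both contribute.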

Select files above to continue

Multimodal Support

Process PDFs, Word docs, images (PNG/JPG), and audio files (MP3/WAV) with GPT-4o Vision and Whisper

API-Powered Reliability

100% OpenAI API processing ensures consistent, reliable results without local model complexity

Production Ready

Retry logic, error handling, parallel processing, and comprehensive progress tracking
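As an illustration of the retry logic mentioned above, a common pattern is to retry a failing API call with exponentially growing delays. This is a generic sketch, not the platform's actual implementation; `with_retries` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponential backoff.

    The delay doubles after each failure: base_delay, 2x, 4x, and so on.
    The last failure is re-raised so callers still see the real error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky() -> str:
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(with_retries(flaky, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.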