Document Lineage

What this page is for

Browse documents within a collection, grouped by source, with quality aggregation and lineage metadata.

Endpoint

  • GET /api/collections/:collectionId/documents

How it works

Every chunk stores its source document metadata in the doc.* payload fields. The documents endpoint aggregates chunks by source_id to provide a document-level view of your collection, including chunk counts and average quality scores.
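The aggregation described above can be sketched in a few lines. This is only an illustration of grouping chunks by source, not the service's actual implementation. The payload keys used here (source_id, title, quality_score) are assumed stand-ins for the doc.* fields, whose exact names are not listed on this page.

```python
from collections import defaultdict
from statistics import mean


def summarize_documents(chunks):
    """Group chunk payloads by source ID and compute per-document stats."""
    groups = defaultdict(list)
    for chunk in chunks:
        # "source_id" is an assumed payload key standing in for the doc.* field.
        groups[chunk["source_id"]].append(chunk)

    summaries = []
    for source_id, members in groups.items():
        # Only chunks that carry a score contribute to the average.
        scores = [c["quality_score"] for c in members
                  if c.get("quality_score") is not None]
        summaries.append({
            "sourceId": source_id,
            "title": members[0].get("title"),
            "chunkCount": len(members),
            "avgQualityScore": mean(scores) if scores else None,
        })
    return summaries


# Hypothetical chunk payloads, for illustration only.
sample_chunks = [
    {"source_id": "a1b2c3d4", "title": "API Authentication Guide", "quality_score": 80},
    {"source_id": "a1b2c3d4", "title": "API Authentication Guide", "quality_score": 91},
    {"source_id": "ffff0000", "title": "Release Notes", "quality_score": None},
]
summaries = summarize_documents(sample_chunks)
```

How the real endpoint treats chunks without a quality score is not documented here. The sketch leaves them out of the average and reports None when no chunk has a score.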

Document metadata

Each document summary includes:

Field            Description
sourceId         Unique identifier (MD5 of source URL)
title            Document title
sourceType       web, manual, or file
sourceUrl        Original source URL
filename         Original filename (file uploads only)
mimeType         File MIME type (file uploads only)
chunkCount       Number of chunks from this document
avgQualityScore  Average quality score across chunks
ingestDate       When the document was first ingested
lastModifiedAt   Most recent modification timestamp
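Since sourceId is described as an MD5 of the source URL, a client could in principle compute a matching identifier locally. The sketch below assumes the URL is hashed as a UTF-8 string with no normalization and that the hex digest is used as the ID. Neither detail is confirmed on this page, and source_id_for is a hypothetical helper name.

```python
import hashlib


def source_id_for(url):
    """Return the MD5 hex digest of a source URL (assumed ID scheme)."""
    # Assumption: the URL is hashed exactly as given, encoded as UTF-8.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


source_id = source_id_for("https://example.com/api-auth-guide")
```

A full MD5 hex digest is 32 characters long, and the same URL always produces the same digest, which is what makes it usable as a stable grouping key.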

Example

curl "http://localhost:3000/api/collections/<collectionId>/documents" \
  -H "X-User-ID: user-1"
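The same request can be made from Python using only the standard library. This is a minimal sketch: the host, port, path, and X-User-ID header come from the curl example, while the helper names and the collection ID "my-collection" are placeholders chosen for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:3000"  # host and port taken from the curl example


def build_documents_request(collection_id, user_id):
    """Build the GET request for a collection's document listing."""
    url = f"{BASE_URL}/api/collections/{quote(collection_id, safe='')}/documents"
    return urllib.request.Request(url, headers={"X-User-ID": user_id})


def fetch_documents(collection_id, user_id):
    """Send the request and decode the JSON body (needs a running server)."""
    request = build_documents_request(collection_id, user_id)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "my-collection" is a hypothetical collection ID.
request = build_documents_request("my-collection", "user-1")
```

Building the request separately from sending it makes it possible to inspect the URL and headers without a server running.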

Response

{
  "documents": [
    {
      "sourceId": "a1b2c3d4",
      "title": "API Authentication Guide",
      "sourceType": "web",
      "sourceUrl": "https://example.com/api-auth-guide",
      "filename": null,
      "mimeType": null,
      "chunkCount": 8,
      "avgQualityScore": 85.5,
      "ingestDate": "2025-01-10T09:00:00.000Z",
      "lastModifiedAt": "2025-01-15T14:30:00.000Z"
    }
  ],
  "total": 1
}

Verify

  • Response lists all documents in the collection grouped by source.
  • chunkCount matches the actual number of chunks per document.
  • avgQualityScore reflects quality scores set via the collection editor.
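Some of these checks can be automated against a decoded response body. The sketch below is one possible approach, not an official test: check_documents_response is a hypothetical helper, and the expected chunk counts must come from your own records, since this page does not describe an endpoint for counting chunks.

```python
ALLOWED_SOURCE_TYPES = {"web", "manual", "file"}


def check_documents_response(body, expected_chunk_counts=None):
    """Return a list of human-readable problems found in a response body."""
    problems = []
    seen = set()
    for doc in body["documents"]:
        source_id = doc["sourceId"]
        # Grouping by source means each sourceId should appear only once.
        if source_id in seen:
            problems.append(f"{source_id}: listed more than once")
        seen.add(source_id)
        if doc["sourceType"] not in ALLOWED_SOURCE_TYPES:
            problems.append(f"{source_id}: unexpected sourceType {doc['sourceType']!r}")
        # filename and mimeType are documented as file-upload-only fields.
        if doc["sourceType"] != "file" and (doc.get("filename") or doc.get("mimeType")):
            problems.append(f"{source_id}: file fields set on a non-file document")
        if expected_chunk_counts and source_id in expected_chunk_counts:
            expected = expected_chunk_counts[source_id]
            if doc["chunkCount"] != expected:
                problems.append(
                    f"{source_id}: chunkCount {doc['chunkCount']} != expected {expected}"
                )
    return problems


# Response body copied from the example above.
sample = {
    "documents": [{
        "sourceId": "a1b2c3d4", "title": "API Authentication Guide",
        "sourceType": "web", "sourceUrl": "https://example.com/api-auth-guide",
        "filename": None, "mimeType": None, "chunkCount": 8,
        "avgQualityScore": 85.5,
        "ingestDate": "2025-01-10T09:00:00.000Z",
        "lastModifiedAt": "2025-01-15T14:30:00.000Z",
    }],
    "total": 1,
}
problems = check_documents_response(sample, {"a1b2c3d4": 8})
```

An empty list means none of the checks failed. The avgQualityScore value is not range-checked because the score scale is not stated on this page.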

Next steps