Sources

Every text, dictionary, parser, and model the project depends on, with its license and the role it plays. The canonical, machine-parseable copy is ATTRIBUTION.md in the repository; this page mirrors it for readers.

Sanskrit primary sources

GRETIL — Göttingen Register of Electronic Texts in Indian Languages

gretil.sub.uni-goettingen.de

License: per-file, mostly CC-BY 4.0; some files carry stricter terms parsed from the header at ingestion.

Role: primary source for the bulk of the canon. Per-text source revision is recorded in texts.source_revision.

Muktabodha Indological Research Institute (MIRI)

muktabodha.org · library: muktalib7.com

License: written redistribution permission pending. No Muktabodha-derived text is published in our dataset or upstreamed to Ambuda until permission lands. For reader display we operate under Muktabodha's scholarly-use terms.

Role: primary source for Kashmir Shaivism / Trika / Kaula texts not covered elsewhere.
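
The publication hold described above could be enforced with a simple source filter at export time. This is a minimal sketch, not the project's actual pipeline: the Text record, the source keys, and the PENDING_PERMISSION set are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumption: source keys whose redistribution permission is still pending.
# Only Muktabodha is listed here, matching the license note above.
PENDING_PERMISSION = frozenset({"muktabodha"})


@dataclass(frozen=True)
class Text:
    slug: str    # hypothetical text identifier
    source: str  # hypothetical source key, e.g. "gretil" or "muktabodha"


def publishable(texts):
    """Return only the texts whose source is not awaiting permission."""
    return [t for t in texts if t.source not in PENDING_PERMISSION]
```

Keeping the blocked sources in one set means that lifting the hold, once permission arrives, is a one-line change rather than a per-text edit.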

sanskritdocuments.org

sanskritdocuments.org

License: non-commercial scholarly use; redistribution requires per-text permission.

Role: fallback source for texts not covered by GRETIL or Muktabodha. Per-text volunteer transcriber credits preserved in texts.attribution_html.

Wikisource (Sanskrit)

sa.wikisource.org

License: CC-BY-SA 4.0 + GFDL.

Role: license-clean fallback source.

SARIT — South Asian Resources for Indic Texts

sarit.indology.info

License: CC-BY-SA per text, with TEI-XML markup preserved upstream.

Role: reference TEI structure and selected texts.

Public-domain English translations

These are passed to the LLM-as-judge as reference signals only, not as ground truth. See methodology. Every translator below either died before 1956, or the cited works were published before 1930 and are therefore unambiguously in the US public domain as of 2026.

John Woodroffe (Arthur Avalon)

The Great Liberation (Mahānirvāṇa Tantra, 1913); Principles of Tantra (1914–1916); Śakti and Śākta (1918); The Garland of Letters (1922); Karpūrādi Stotra.

Ralph T. H. Griffith

Hymns of the Rigveda (1889–1896); White Yajurveda (1899); Sāmaveda (1893); Atharvaveda (1895–1896).

George Thibaut

Brahma Sūtra with Śaṅkara Bhāṣya (Sacred Books of the East 34, 38; 1890–1896).

Max Müller

Principal Upaniṣads (Sacred Books of the East 1, 15; 1879, 1884).

W. D. Whitney

Atharvaveda (1905).

J. H. Woods

Yoga Sūtras of Patañjali (Harvard Oriental Series 17, 1914).
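
The eligibility rule stated at the top of this section can be written as a small predicate. This is a sketch under assumptions: the function name and signature are hypothetical, and the two thresholds are taken directly from the rule as worded on this page rather than from any legal source.

```python
from typing import Optional

DEATH_YEAR_CUTOFF = 1956        # translator must have died before this year
PUBLICATION_YEAR_CUTOFF = 1930  # or the work must have been published before this year


def is_reference_eligible(
    death_year: Optional[int] = None,
    publication_year: Optional[int] = None,
) -> bool:
    """Apply the page's rule: either condition alone is sufficient.

    Unknown values (None) never satisfy a condition, so a work with
    no recorded dates is treated as not eligible.
    """
    died_early = death_year is not None and death_year < DEATH_YEAR_CUTOFF
    published_early = (
        publication_year is not None
        and publication_year < PUBLICATION_YEAR_CUTOFF
    )
    return died_early or published_early
```

For example, a work published in 1913 passes on publication year alone, while an undated work passes only if the translator's death year is known and falls before the cutoff.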

Computational resources

Vidyut

github.com/ambuda-org/vidyut · Rust toolkit from the Ambuda project.

License: MIT / Apache-2.0.

Role: Sanskrit morphological segmentation — lemmas, case, number, gender, sandhi splits.

DCS — Digital Corpus of Sanskrit

sanskrit-linguistics.org/dcs

License: CC-BY 3.0. Cite Hellwig, Oliver (2010–present).

Role: lemmatised, POS-tagged corpus used where coverage exists.

Cologne C-SALT Sanskrit Dictionaries

cceh.github.io/c-salt_sanskrit_data

License: CC-BY-SA.

Role: Monier-Williams, Apte, and 20+ other Sanskrit dictionaries via REST and GraphQL. Provides per-lemma glosses fed to the translator and the judge.

Skrutable

github.com/tylergneill/skrutable

License: MIT.

Role: meter identification per verse (anuṣṭubh, triṣṭubh, etc.) feeding the meter tag in verses.meter.

Sanscript.js

github.com/indic-transliteration/sanscript.js

License: MIT.

Role: client-side script conversion between Devanāgarī, IAST, SLP1, and other Indic scripts.

Aksharamukha

aksharamukha.com

License: MIT / AGPL per component.

Role: server-side script conversion (~120 scripts) for build-time transliterations.

Translation models

Anthropic Claude Sonnet

Used for translation generation. The exact model and version are recorded per row in translations.model and translations.model_version; the prompt hash is recorded in translations.prompt_version.
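
As an illustration of the per-row provenance described above, the sketch below derives a prompt hash and bundles it with the model fields. The hashing scheme (SHA-256 over the newline-normalised prompt, truncated to 12 hex characters), the TranslationRow type, and the helper names are assumptions for illustration; the page does not specify how the hash is actually computed.

```python
import hashlib
from dataclasses import dataclass


def prompt_version(prompt_template: str) -> str:
    """Return a short, stable hash of a prompt template.

    Assumption: line endings are normalised and outer whitespace is
    stripped first, so cosmetic differences do not change the hash.
    """
    normalized = prompt_template.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class TranslationRow:
    # The last three field names mirror the columns named on this page.
    verse_id: str  # hypothetical key linking back to the verse
    text: str
    model: str
    model_version: str
    prompt_version: str


def make_row(verse_id, text, model, model_version, prompt_template):
    """Build a row that carries the model identity and the prompt hash."""
    return TranslationRow(
        verse_id=verse_id,
        text=text,
        model=model,
        model_version=model_version,
        prompt_version=prompt_version(prompt_template),
    )
```

Storing a hash rather than the full prompt keeps rows small while still letting any two translations be checked for having been produced from the same prompt.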

OpenAI text-embedding-3-large

Used to embed verses and glosses for the semantic-search index (search workstream, V1.x).

Last revised: 2026-05-31 · Source: ATTRIBUTION.md.