Three steps to start your analysis
Drop a CSV, Excel, XML, or PDF file on the upload page, then choose the text columns (tabular files) or the pages (PDFs) to analyze.
Find your corpora in the library. Open them to browse sentences, lexicon, and analyses.
Concordances, co-occurrences, n-grams, patterns, keywords, named entities... Everything is accessible from the document page.
Supported file types and import process
CSV: comma- or semicolon-separated text files. Column preview with content examples. Choose one or more text columns to analyze.
Excel: same column selection interface as CSV. All sheets are read.
XML: Frantext export or structured XML. Columns are extracted automatically from tags. Compatible with standard Frantext exports.
PDF: built-in page browser with a 2-page spread (1 page on mobile). Select a start page, then either an end page or a maximum number of sentences.
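
As an illustration of the CSV step, here is a minimal sketch of how a comma- or semicolon-separated file could be recognized and previewed column by column. The function names `guess_delimiter` and `preview_columns` are hypothetical, and the heuristic (keep the delimiter that splits the header row into the most columns) is an assumption, not the platform's documented behavior.

```python
import csv
import io


def guess_delimiter(sample: str, candidates: str = ",;") -> str:
    """Return the candidate that splits the header row into the most columns.

    Ties go to the first candidate (the comma). This is an assumed heuristic.
    """
    lines = sample.splitlines()
    header = lines[0] if lines else ""
    best, best_cols = candidates[0], 0
    for delim in candidates:
        row = next(csv.reader(io.StringIO(header), delimiter=delim), [])
        if len(row) > best_cols:
            best, best_cols = delim, len(row)
    return best


def preview_columns(sample: str, n_rows: int = 3) -> dict[str, list[str]]:
    """Map each column name to its first n_rows values, for a column preview."""
    reader = csv.DictReader(io.StringIO(sample), delimiter=guess_delimiter(sample))
    preview = {name: [] for name in (reader.fieldnames or [])}
    for i, row in enumerate(reader):
        if i >= n_rows:
            break
        for name in preview:
            preview[name].append(row.get(name) or "")
    return preview


sample = "id;text;author\n1;Bonjour, monde;Hugo\n2;Salut;Zola\n"
print(guess_delimiter(sample))
print(preview_columns(sample)["text"])
```

Counting columns on the header row alone keeps the sketch deterministic; a comma inside a text cell, as in the sample above, does not affect the choice.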
Import process
Import is a two-step process. First, drop your file: the platform detects the type and shows configuration options (columns for tabular files, pages for PDFs). Then start processing: the NLP engine analyzes each sentence with two specialized models (fr_dep_news_trf for grammar, fr_core_news_md for named entities). Processing runs in the background, and you can track its progress in real time.
If you close the page during processing, the document appears in the library with a "Resume" button to reopen the configuration.
PDF options
| Option | Description |
|---|---|
| Start page | First page to analyze (required) |
| End page | Last page to analyze. If set, max sentences is ignored |
| Max sentences | Maximum sentences to extract (default: 1,000, max: 20,000). Ignored if end page is set |
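
The precedence between these options can be sketched as follows. This is a minimal illustration under stated assumptions: the names `PdfRange` and `resolve_pdf_options` are hypothetical, pages are assumed to be numbered from 1, and an out-of-range sentence limit is rejected here, whereas the platform may clamp it instead.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_SENTENCES = 1_000   # default from the table above
HARD_MAX_SENTENCES = 20_000     # upper bound from the table above


@dataclass(frozen=True)
class PdfRange:
    start_page: int
    end_page: Optional[int]
    max_sentences: Optional[int]


def resolve_pdf_options(
    start_page: int,
    end_page: Optional[int] = None,
    max_sentences: Optional[int] = None,
) -> PdfRange:
    """Apply the rule 'end page wins over max sentences'."""
    if start_page < 1:
        raise ValueError("start_page must be 1 or greater (assumed 1-indexed)")
    if end_page is not None:
        if end_page < start_page:
            raise ValueError("end_page must not precede start_page")
        # An end page is set, so any sentence limit is ignored.
        return PdfRange(start_page, end_page, None)
    limit = DEFAULT_MAX_SENTENCES if max_sentences is None else max_sentences
    if not 1 <= limit <= HARD_MAX_SENTENCES:
        raise ValueError(f"max_sentences must be between 1 and {HARD_MAX_SENTENCES}")
    return PdfRange(start_page, None, limit)


print(resolve_pdf_options(3))
print(resolve_pdf_options(3, end_page=10, max_sentences=50))
```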
Generated linguistic annotations
| Annotation | Description |
|---|---|
| Lemma | Canonical form of the word (e.g. "eating" -> "eat") |
| POS | Part-of-speech tag (17 Universal Dependencies tags: NOUN, VERB, ADJ, ADV, DET, etc.) |
| Dependency | Syntactic relation to head word (nsubj, obj, det, ROOT, etc.) |
| Morphology | Morphological features (Gender, Number, Tense, Mood, Person, etc.) |
| Entity | Named entity type: PER (person), LOC (location), ORG (organization), MISC (miscellaneous) |
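
To make the table concrete, here is one possible way to represent a single annotated token. The `TokenAnnotation` class and `parse_morph` helper are hypothetical, and the sketch assumes morphological features are serialized in the Universal Dependencies style (`Feature=Value` pairs joined by `|`), which may differ from how the platform stores them.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_morph(feats: str) -> dict[str, str]:
    """Parse a UD-style feature string such as 'Gender=Masc|Number=Sing'.

    An empty string or '_' (the UD placeholder for 'no features') gives {}.
    """
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


@dataclass
class TokenAnnotation:
    form: str                      # surface form as it appears in the text
    lemma: str                     # canonical form
    pos: str                       # one of the 17 Universal Dependencies tags
    dep: str                       # syntactic relation to the head word
    morph: dict[str, str] = field(default_factory=dict)
    entity: Optional[str] = None   # PER, LOC, ORG, MISC, or None


# Illustrative values for the French verb form "mangeait" (not platform output).
token = TokenAnnotation(
    form="mangeait",
    lemma="manger",
    pos="VERB",
    dep="ROOT",
    morph=parse_morph("Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin"),
)
print(token.lemma, token.morph["Tense"])
```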
Browse and explore your text
Find exactly what you're looking for in your corpus
Advanced linguistic analysis tools
Classify and annotate occurrences in the concordancer
Save and find your searches
Refine your results with filters and sorting
Download your data in one click
Every page offers a CSV export button (download icon). Files are UTF-8 encoded with BOM for direct opening in Excel without accent issues. Headers are translated based on the interface language (French or English). Active filters (POS, morphology, metadata) are respected in all exports.
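
The BOM detail matters because Excel relies on it to recognize UTF-8. A minimal sketch of such an export, using Python's standard `utf-8-sig` codec, is shown below. The `export_csv` function and the French header labels in `HEADERS` are hypothetical, chosen only to illustrate translated headers.

```python
import csv
import io

# Hypothetical header translations for the sentence export.
HEADERS = {
    "en": ["#", "Sentence"],
    "fr": ["#", "Phrase"],
}


def export_csv(rows, lang: str = "en") -> bytes:
    """Serialize rows as CSV bytes, UTF-8 encoded with a leading BOM."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(HEADERS[lang])
    writer.writerows(rows)
    # "utf-8-sig" prepends the byte order mark EF BB BF to the encoded text.
    return buffer.getvalue().encode("utf-8-sig")


data = export_csv([(1, "Il était une fois.")], lang="fr")
print(data[:3])
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so the same file works in both Excel and scripts.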
Full sentence list with numbers. Columns: #, Sentence
All lemmas with POS, frequency, frequency per million words, and most common form. Columns: Lemma, POS, Frequency, Per million, Example
Each occurrence with context. Columns: #, Left context, Word, Right context, POS + annotation columns at their position
All detected entities. Columns: Entity, Type, Frequency, Per million
Matching sentences. Columns: Left context, Pattern, Right context, Sentence, Distance
Corpus-specific words. Columns: Lemma, POS, Corpus freq, Corpus per million, Reference per million, Keyness (G2), Effect size, Direction
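
For readers who want to check the "Per million" and "Keyness (G2)" columns by hand, here is a minimal sketch using the standard log-likelihood formula for comparing two corpora. The platform's exact computation is not documented here, so treat this as an assumption. The function names and the `+`/`-` encoding of the direction are hypothetical, and the effect size column is not covered.

```python
import math


def per_million(freq: int, total: int) -> float:
    """Relative frequency, normalized to one million tokens."""
    return freq * 1_000_000 / total


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:         # a zero count contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def direction(a: int, b: int, c: int, d: int) -> str:
    """'+' if the word is relatively more frequent in the corpus, else '-'."""
    return "+" if per_million(a, c) > per_million(b, d) else "-"


# Made-up counts: 120 hits in 50,000 tokens vs 300 hits in 1,000,000 tokens.
print(per_million(120, 50_000), per_million(300, 1_000_000))
print(round(log_likelihood(120, 300, 50_000, 1_000_000), 2))
print(direction(120, 300, 50_000, 1_000_000))
```

A G2 of 0 means the word is equally frequent in both corpora; larger values indicate a stronger difference, and the direction tells which corpus the word favors.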
Navigation, languages, and display
Tips to get the most out of the platform