Documentation

Complete platform guide. See also the FAQ.

Quick start

Three steps to start your analysis

1. Upload

Drop a CSV, Excel, XML, or PDF file on the upload page, then choose the columns that contain the text (tabular files) or the pages to analyze (PDFs).

2. Explore

Find your corpora in the library. Open one to browse its sentences, lexicon, and analyses.

3. Analyze

Concordances, co-occurrences, n-grams, patterns, keywords, named entities, and more: every analysis is accessible from the document page.

Formats & import

Supported file types and import process

CSV

Comma or semicolon-separated text files. Column preview with content examples. Choose one or more text columns to analyze.
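The platform's own delimiter detection is not documented. As a minimal sketch, assuming nothing beyond Python's standard library, `csv.Sniffer` can choose between comma and semicolon and the selected text columns can then be joined per row. The function name `read_text_columns` is hypothetical.

```python
import csv
import io

def read_text_columns(raw: str, columns: list[str]) -> list[str]:
    """Detect a comma or semicolon delimiter, then join the chosen text columns per row."""
    # Only "," and ";" are considered as candidate delimiters.
    dialect = csv.Sniffer().sniff(raw[:2048], delimiters=",;")
    reader = csv.DictReader(io.StringIO(raw), dialect=dialect)
    texts = []
    for row in reader:
        # Keep non-empty cells from the selected columns, in the order given.
        parts = [row[c].strip() for c in columns if row.get(c)]
        if parts:
            texts.append(" ".join(parts))
    return texts

sample = "id;title;body\n1;Le chat;Il dort.\n2;Le chien;Il court.\n"
print(read_text_columns(sample, ["title", "body"]))
```

Selecting several columns simply concatenates their cells, which mirrors the "one or more text columns" option; how the platform actually combines them may differ.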

XLSX / XLS

Excel files. Same column selection interface as CSV. All sheets are read.

XML

Frantext export or structured XML. Columns are extracted automatically from tags. Compatible with standard Frantext exports.
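The exact Frantext export schema is not described here, so the tag names below (`corpus`, `extrait`, `auteur`, `date`, `texte`) are invented for illustration. This sketch assumes a simple layout where each child of the root is one record and each of its sub-elements becomes a column, which is one plausible reading of "columns are extracted automatically from tags".

```python
import xml.etree.ElementTree as ET

def extract_columns(xml_text: str) -> tuple[list[str], list[dict[str, str]]]:
    """Treat each child of the root as a record and each of its sub-elements as a column."""
    root = ET.fromstring(xml_text)
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    for record in root:
        row: dict[str, str] = {}
        for field in record:
            if field.tag not in columns:
                columns.append(field.tag)  # keep columns in first-seen order
            # itertext() also collects text nested inside inline markup such as <i>
            row[field.tag] = "".join(field.itertext()).strip()
        rows.append(row)
    return columns, rows

# Hypothetical toy sample, not a real Frantext export.
sample = """<corpus>
  <extrait><auteur>Dupont</auteur><date>1862</date><texte>Il <i>était</i> une fois.</texte></extrait>
  <extrait><auteur>Martin</auteur><date>1885</date><texte>Le jour se leva.</texte></extrait>
</corpus>"""
columns, rows = extract_columns(sample)
print(columns)           # ['auteur', 'date', 'texte']
print(rows[0]["texte"])  # Il était une fois.
```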

PDF

PDF documents. Built-in page browser with 2-page spread (1 page on mobile). Select start page, end page, or maximum number of sentences.

Import process

Import is a two-step process. First, drop your file: the platform detects the type and shows configuration options (columns for tabular files, pages for PDFs). Then start processing: the NLP engine analyzes each sentence with two specialized models (fr_dep_news_trf for grammar, fr_core_news_md for named entities). Processing runs in the background, and you can track its progress in real time.

If you close the page during processing, the document appears in the library with a "Resume" button to reopen the configuration.

PDF options

Option | Description
Start page | First page to analyze (required)
End page | Last page to analyze. If set, max sentences is ignored
Max sentences | Maximum number of sentences to extract (default: 1,000; max: 20,000). Ignored if end page is set
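The precedence rules in the table can be sketched as a small validation function. This is an illustration under assumptions, not the platform's code: the names `PdfRange` and `resolve_pdf_options` are hypothetical, and rejecting (rather than clamping) an out-of-range sentence limit is a guess.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MAX_SENTENCES = 1_000
MAX_SENTENCES_LIMIT = 20_000

@dataclass(frozen=True)
class PdfRange:
    start_page: int
    end_page: Optional[int]       # None: extraction is bounded by max_sentences instead
    max_sentences: Optional[int]  # None: extraction is bounded by end_page instead

def resolve_pdf_options(start_page: int,
                        end_page: Optional[int] = None,
                        max_sentences: Optional[int] = None) -> PdfRange:
    """Apply the documented precedence: an end page overrides the sentence cap."""
    if start_page < 1:
        raise ValueError("start page is required and must be at least 1")
    if end_page is not None:
        if end_page < start_page:
            raise ValueError("end page cannot precede start page")
        return PdfRange(start_page, end_page, None)  # max_sentences is ignored
    limit = DEFAULT_MAX_SENTENCES if max_sentences is None else max_sentences
    if not 1 <= limit <= MAX_SENTENCES_LIMIT:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError(f"max sentences must be between 1 and {MAX_SENTENCES_LIMIT}")
    return PdfRange(start_page, None, limit)

print(resolve_pdf_options(5))  # PdfRange(start_page=5, end_page=None, max_sentences=1000)
print(resolve_pdf_options(5, end_page=12, max_sentences=500))
```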

Generated linguistic annotations

Annotation | Description
Lemma | Canonical form of the word (e.g. "eating" -> "eat")
POS | Part-of-speech tag (one of the 17 Universal Dependencies tags: NOUN, VERB, ADJ, ADV, DET, etc.)
Dependency | Syntactic relation to the head word (nsubj, obj, det, ROOT, etc.)
Morphology | Morphological features (Gender, Number, Tense, Mood, Person, etc.)
Entity | Named entity type: PER (person), LOC (location), ORG (organization), MISC (miscellaneous)
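To make the annotation layers concrete, here is a sketch of one annotated token as a data structure, plus a parser for the Universal Dependencies feature string format (`Feat=Value|Feat=Value`). The `Token` class, the `matches` filter, and the annotation values in the example sentence are hand-written and hypothetical; they are not output from the platform's models.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Token:
    text: str
    lemma: str
    pos: str                                   # UD tag such as "NOUN" or "VERB"
    dep: str                                   # relation to the head word, e.g. "nsubj"
    morph: dict[str, str] = field(default_factory=dict)
    entity: Optional[str] = None               # "PER", "LOC", "ORG", "MISC", or None

def parse_morph(feats: str) -> dict[str, str]:
    """Parse a UD feature string like 'Gender=Masc|Number=Sing' into a dict."""
    if not feats or feats == "_":  # "_" marks an empty field in CoNLL-U
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))

def matches(tok: Token, pos: set[str], **feats: str) -> bool:
    """Combine a multi-POS filter with morphology constraints."""
    return tok.pos in pos and all(tok.morph.get(k) == v for k, v in feats.items())

sentence = [
    Token("Marie", "Marie", "PROPN", "nsubj", parse_morph("Gender=Fem|Number=Sing"), "PER"),
    Token("mangeait", "manger", "VERB", "ROOT",
          parse_morph("Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin")),
    Token("une", "un", "DET", "det", parse_morph("Definite=Ind|Gender=Fem|Number=Sing|PronType=Art")),
    Token("pomme", "pomme", "NOUN", "obj", parse_morph("Gender=Fem|Number=Sing")),
]
print([t.text for t in sentence if matches(t, {"NOUN", "VERB"}, Number="Sing")])
```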

Explore corpus

Browse and explore your text

Analysis

Advanced linguistic analysis tools

Manual annotations

Classify and annotate occurrences in the concordancer

Saved searches

Save and find your searches

Filters & sorting

Refine your results with filters and sorting

CSV Export

Download your data in one click

Every page offers a CSV export button (download icon). Files are UTF-8 encoded with BOM for direct opening in Excel without accent issues. Headers are translated based on the interface language (French or English). Active filters (POS, morphology, metadata) are respected in all exports.
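A UTF-8 file with a BOM can be produced in Python with the standard `utf-8-sig` codec, which prepends the bytes `EF BB BF`. The sketch below assumes a lexicon-style export; the `HEADERS` labels (especially the French ones) and the `export_csv` name are hypothetical and may not match the platform's actual headers.

```python
import csv
import io

# Hypothetical header labels; the platform's real labels may differ.
HEADERS = {
    "en": ["Lemma", "POS", "Frequency"],
    "fr": ["Lemme", "POS", "Fréquence"],
}

def export_csv(rows: list[list[object]], lang: str = "en") -> bytes:
    buf = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buf)
    writer.writerow(HEADERS[lang])
    writer.writerows(rows)
    # "utf-8-sig" prepends the UTF-8 BOM so Excel recognizes the encoding.
    return buf.getvalue().encode("utf-8-sig")

data = export_csv([["être", "AUX", 42], ["élève", "NOUN", 7]], lang="fr")
print(data[:3])  # b'\xef\xbb\xbf'
```

Without the BOM, Excel may open the file in a legacy encoding and display accented characters such as "é" incorrectly.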

Sentences

Full sentence list with numbers. Columns: #, Sentence

Lexicon

All lemmas with POS, frequency, per-million, and most common form. Columns: Lemma, POS, Frequency, Per million, Example
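The per-million column normalizes a raw count by corpus size: per million = frequency / total tokens × 1,000,000. For example, 42 occurrences in a 120,000-token corpus give 42 / 120,000 × 1,000,000 = 350 per million. The sketch below builds lexicon rows from (form, lemma, POS) triples; the function name is hypothetical, and counting every token (punctuation included) toward the total is an assumption.

```python
from collections import Counter

def lexicon(tokens: list[tuple[str, str, str]]) -> list[tuple[str, str, int, float, str]]:
    """tokens are (form, lemma, pos) triples; returns (lemma, pos, freq, per million, most common form)."""
    total = len(tokens)  # assumption: every token counts toward the corpus size
    freq = Counter((lemma, pos) for _, lemma, pos in tokens)
    forms: dict[tuple[str, str], Counter] = {}
    for form, lemma, pos in tokens:
        forms.setdefault((lemma, pos), Counter())[form] += 1
    rows = []
    for (lemma, pos), n in freq.most_common():
        per_million = n / total * 1_000_000
        example = forms[(lemma, pos)].most_common(1)[0][0]  # most frequent surface form
        rows.append((lemma, pos, n, round(per_million, 2), example))
    return rows

tokens = [("chats", "chat", "NOUN"), ("chat", "chat", "NOUN"),
          ("chats", "chat", "NOUN"), ("dort", "dormir", "VERB")]
for row in lexicon(tokens):
    print(row)
# ('chat', 'NOUN', 3, 750000.0, 'chats')
# ('dormir', 'VERB', 1, 250000.0, 'dort')
```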

Concordances

Each occurrence with its context. Columns: #, Left context, Word, Right context, POS, plus any annotation columns in the position they occupy in the concordancer
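A concordance (KWIC, keyword in context) line is the matched word with a fixed window of tokens on each side. This minimal sketch matches on surface forms with a hypothetical `kwic` function; the platform presumably also matches on lemmas and POS, which this example does not attempt.

```python
def kwic(tokens: list[str], target: str, window: int = 5) -> list[tuple[int, str, str, str]]:
    """Return (#, left context, word, right context) for every case-insensitive match of target."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((len(rows) + 1, left, tok, right))
    return rows

tokens = "le chat dort et le chat mange".split()
lines = kwic(tokens, "chat", window=2)
for n, left, word, right in lines:
    print(f"{n:>2} {left:>12} [{word}] {right}")

# Sorting on the right context groups recurring continuations together.
by_right = sorted(lines, key=lambda r: r[3].lower())
```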

Entities

All detected entities. Columns: Entity, Type, Frequency, Per million

Patterns

Matching sentences. Columns: Left context, Pattern, Right context, Sentence, Distance

Keywords

Corpus-specific words. Columns: Lemma, POS, Corpus freq, Corpus per million, Reference per million, Keyness (G2), Effect size, Direction
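Keyness (G2) is commonly computed as a log-likelihood statistic comparing a word's count in the corpus (a out of c tokens) with its count in the reference (b out of d tokens). The sketch below implements that standard formula. The document does not say which effect size it reports, so `log_ratio` (binary log of the ratio of relative frequencies, with 0.5 smoothing) is an assumed stand-in, and all function names are hypothetical.

```python
import math

def g2_keyness(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for a word seen a times in c corpus tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # 0 * log(0) is taken as 0
            total += observed * math.log(observed / expected)
    return 2 * total

def log_ratio(a: int, b: int, c: int, d: int, smoothing: float = 0.5) -> float:
    """Assumed effect size: log2 of the ratio of relative frequencies, smoothed to avoid division by zero."""
    return math.log2(((a + smoothing) / c) / ((b + smoothing) / d))

def direction(a: int, b: int, c: int, d: int) -> str:
    """Overrepresented or underrepresented relative to the reference."""
    return "over" if a / c > b / d else "under"

# A word seen 50 times in 1,000 corpus tokens vs 100 times in 10,000 reference tokens.
print(round(g2_keyness(50, 100, 1_000, 10_000), 1), direction(50, 100, 1_000, 10_000))
```

If the reference is only available as a per-million frequency, its raw count can be recovered as b = per_million × d / 1,000,000. A G2 above 3.84 corresponds to p < 0.05 with one degree of freedom.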

Interface

Navigation, languages, and display

Best practices

Tips to get the most out of the platform

  1. Start with a small corpus (a few hundred sentences) to validate your pipeline before importing the full file.
  2. For PDFs, adjust the start page to skip title pages and tables of contents.
  3. Use multi-POS filtering to compare nouns and verbs within the same corpus.
  4. The concordance view is ideal for observing collocations and recurring syntactic patterns. Sort by KWIC to spot them.
  5. Use n-grams to find recurring expressions, then pattern search for more complex sequences.
  6. Keywords show what makes your corpus distinctive compared to general French. Compare overrepresented and underrepresented words.
  7. If your file has author or date columns, metadata filters let you compare sub-corpora without re-importing.
  8. Add annotation columns in the concordancer (gear icon) to classify occurrences by your own research criteria.
  9. For co-occurrences, start with T-score (frequent associations), then refine with G2 (reliable even for low frequencies).
  10. All CSV exports respect your active filters and are available in French and English. Annotation columns are included.
  11. Save your important searches (clock icon) to find them quickly. History automatically keeps your last 20 searches.
  12. Copy any page address to share your exact view with a colleague. All filters are stored in the URL.
  13. NLP processing uses two specialized models: a transformer for grammar (POS, dependencies, morphology) and a statistical model for named entities.
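The co-occurrence advice above contrasts two association measures. As a sketch using common textbook definitions (the platform's exact formulas, including any window-size correction, are not documented and are omitted here), both can be computed from a node's frequency f1, a collocate's frequency f2, their co-occurrence count o11, and corpus size n. Function names are hypothetical.

```python
import math

def t_score(o11: int, f1: int, f2: int, n: int) -> float:
    """(observed - expected) / sqrt(observed); grows roughly with sqrt(o11), favoring frequent pairs."""
    expected = f1 * f2 / n
    return (o11 - expected) / math.sqrt(o11)

def g2_contingency(o11: int, f1: int, f2: int, n: int) -> float:
    """Log-likelihood over the 2x2 table of node present/absent x collocate present/absent."""
    observed = [[o11, f1 - o11], [f2 - o11, n - f1 - f2 + o11]]
    row_totals = [f1, n - f1]
    col_totals = [f2, n - f2]
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            if o > 0:  # 0 * log(0) is taken as 0
                total += o * math.log(o / e)
    return 2 * total

# Node and collocate each occur 100 times in 1,000 tokens; independence predicts 10 co-occurrences.
print(t_score(10, 100, 100, 1_000), round(g2_contingency(10, 100, 100, 1_000), 6))
print(round(t_score(30, 100, 100, 1_000), 2), round(g2_contingency(30, 100, 100, 1_000), 1))
```

Because G2 uses all four cells of the table, it stays informative when o11 is small, whereas T-score mostly rewards pairs that co-occur often.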