Question #1
SA7BY® is a French corpus analysis platform. Upload your texts, explore word-level linguistic annotations, browse sentences in context, and run semantic analysis, all from a single interface.
Question #2
Researchers, professors, doctoral students, and advanced students working in corpus linguistics, textometry, and digital humanities.
Question #3
CSV, Excel (.xlsx), XML (Frantext export), and PDF. For tabular files, choose the columns containing the text. For PDFs, select the first and last page to analyze, or rely on the maximum sentences setting to limit how much text is processed.
Question #4
After import, the text is split into sentences and automatically analyzed: lemmatization, part-of-speech tagging, dependency parsing. You can track progress in real time. Processing uses a high-accuracy French transformer model.
Question #5
The concordance view shows every occurrence of a word with its left and right context. You can also open a reading mode to see surrounding sentences and better understand how the word is used in the text.
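As an illustration of what a concordance (keyword-in-context) view computes, here is a minimal Python sketch. It is not the platform's implementation: the function name `kwic`, the whitespace tokenization, and the `window` parameter are assumptions made for the example.

```python
def kwic(tokens, keyword, window=5):
    """Return one (left, match, right) tuple per occurrence of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `window` tokens on each side of the match
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

# Hypothetical sample sentence, split on whitespace for simplicity
tokens = "Le chat dort sur le canapé et le chien regarde le chat".split()
for left, match, right in kwic(tokens, "chat", window=3):
    print(f"{left:>20} | {match} | {right}")
```

A real concordancer would work on the tokens produced by the analysis step rather than on a whitespace split.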
Question #6
Yes. The POS filter supports multi-select: choose, for example, Nouns + Verbs to see only those categories in the lexicon or concordances. You can also refine by sub-category: verb tenses, gender, number, determiner type (definite, possessive, demonstrative), and more.
Question #7
Enter a French word and the platform combines two sources, WOLF and FastText, to reveal synonyms, hypernyms, hyponyms, and nearest neighbors in an interactive graph.
Question #8
It depends on the corpus size. A 100-page PDF typically takes 2 to 5 minutes. You can browse the platform freely while processing runs in the background.
Question #9
Yes. The analysis workspace is protected by a PIN code. Public pages (home, FAQ, documentation) remain accessible without login.
Question #10
Yes. Your corpora and analyses are private and accessible only from your session. Nothing is shared or made public.
Question #11
The Co-occurrences page shows words that frequently appear with a given word. You can choose from five statistical measures (MI, T-score, Z-score, Dice, Frequency) and adjust the search distance. An info box explains each measure.
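For readers who want to see how such measures relate to raw counts, the sketch below uses the textbook definitions common in corpus linguistics. The platform's exact formulas are not documented here, so treat this as an assumption: the function name, the parameter names, and the simple expected-frequency estimate (which ignores the search distance) are all illustrative.

```python
import math

def association_scores(f_node, f_collocate, f_pair, corpus_size):
    """Textbook association measures from four counts.

    f_node, f_collocate: corpus frequencies of the two words
    f_pair: number of times they co-occur (must be > 0)
    corpus_size: total number of tokens
    """
    # Co-occurrences expected if the two words were independent
    expected = f_node * f_collocate / corpus_size
    return {
        "frequency": f_pair,
        "mi": math.log2(f_pair / expected),
        "t_score": (f_pair - expected) / math.sqrt(f_pair),
        "z_score": (f_pair - expected) / math.sqrt(expected),
        "dice": 2 * f_pair / (f_node + f_collocate),
    }

# Hypothetical counts, for illustration only
scores = association_scores(f_node=100, f_collocate=50, f_pair=20, corpus_size=10_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

As a rule of thumb, MI tends to favor rare but exclusive pairings, while T-score favors frequent ones, which is why it helps to compare several measures.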
Question #12
N-grams reveal recurring word sequences in your corpus (from 2 to 5 words), for example 'il y a' or 'c'est un'. You can switch between canonical form (lemma) and the form as written.
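Counting n-grams amounts to sliding a window of n tokens over the text. The following is a minimal sketch with Python's standard library; the function name `ngram_counts` and the sample token lists are hypothetical, and the lemma list is written by hand rather than produced by a lemmatizer.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# The same passage as written forms and as (hand-written) lemmas
forms = ["il", "y", "a", "un", "chat", "et", "il", "y", "avait", "un", "chien"]
lemmas = ["il", "y", "avoir", "un", "chat", "et", "il", "y", "avoir", "un", "chien"]

print(ngram_counts(forms, 3)[("il", "y", "a")])
print(ngram_counts(lemmas, 3)[("il", "y", "avoir")])
```

Counting on lemmas groups 'il y a' and 'il y avait' under one sequence, whereas counting on written forms keeps them apart.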
Question #13
Enter two words and find all sentences where they appear near each other. You can set the maximum distance, enforce an order (A before B or the reverse), and filter by grammatical category for each word.
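The logic of such a proximity search can be sketched as follows. This is an illustrative Python example under stated assumptions, not the platform's code: the function name `near`, its parameters, and the token-based distance are hypothetical, and the grammatical-category filter is left out for brevity.

```python
def near(tokens, word_a, word_b, max_distance=5, ordered=False):
    """Return True if word_a and word_b occur within max_distance tokens.

    With ordered=True, word_a must appear before word_b.
    """
    pos_a = [i for i, t in enumerate(tokens) if t.lower() == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t.lower() == word_b.lower()]
    for a in pos_a:
        for b in pos_b:
            gap = b - a
            if ordered and gap <= 0:
                continue  # word_b is not after word_a
            if 0 < abs(gap) <= max_distance:
                return True
    return False

# Hypothetical sample sentence
sentence = "le chat noir mange la souris".split()
print(near(sentence, "chat", "souris", max_distance=4))
print(near(sentence, "souris", "chat", max_distance=4, ordered=True))
```

Applying `near` to every sentence of a corpus and keeping those that return `True` gives the list of matching sentences.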
Question #14
If your file contains columns like author, date, or genre, you can use them to filter your corpus. Filters appear automatically on every analysis page. Select one or more authors, a period, or a genre, and all statistics recalculate instantly. Your filters are saved when you navigate between pages.
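Conceptually, metadata filtering keeps only the rows whose columns match the selected values before any statistic is computed. A minimal sketch, assuming hypothetical column names (`author`, `year`, `genre`) and a hypothetical `filter_rows` helper:

```python
def filter_rows(rows, authors=None, year_range=None, genres=None):
    """Keep rows matching every filter that is set; None means 'no filter'."""
    kept = []
    for row in rows:
        if authors and row["author"] not in authors:
            continue
        if year_range and not (year_range[0] <= row["year"] <= year_range[1]):
            continue
        if genres and row["genre"] not in genres:
            continue
        kept.append(row)
    return kept

# Hypothetical corpus rows with metadata columns
corpus = [
    {"author": "Hugo", "year": 1862, "genre": "roman", "text": "..."},
    {"author": "Zola", "year": 1885, "genre": "roman", "text": "..."},
    {"author": "Hugo", "year": 1830, "genre": "théâtre", "text": "..."},
]
subset = filter_rows(corpus, authors={"Hugo"}, year_range=(1850, 1900))
print(len(subset))
```

Any frequency, co-occurrence, or n-gram count would then be computed on `subset` instead of the full corpus.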
Question #15
Yes. In the concordance table, click the gear icon to add your own annotation columns (free text or dropdown). Annotate each occurrence, then export everything as CSV with your annotations.
Question #16
Yes. Every page has a download button that exports data as CSV, compatible with Excel. If filters are active, only the filtered data is exported. Annotation columns are included in the concordance export.
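To give an idea of what a concordance export with annotation columns might look like, here is a sketch using Python's standard `csv` module. The column names (`left`, `match`, `right`, `sense`) and the `export_concordance` helper are assumptions for illustration; the actual export layout may differ.

```python
import csv
import io

def export_concordance(rows, annotation_columns):
    """Serialize concordance rows plus user annotation columns to CSV text."""
    fieldnames = ["left", "match", "right", *annotation_columns]
    buffer = io.StringIO(newline="")
    # restval fills annotation cells that the user left empty
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical rows: the second occurrence has not been annotated yet
rows = [
    {"left": "Le", "match": "chat", "right": "dort", "sense": "animal"},
    {"left": "regarde le", "match": "chat", "right": ""},
]
print(export_concordance(rows, annotation_columns=["sense"]))
```

When writing such a file to disk for Excel, opening it with `encoding="utf-8-sig"` generally helps Excel detect UTF-8 and display accented characters correctly.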
Question #17
The word cloud is a visualization in the Lexicon that displays the most frequent words. The bigger a word, the more frequent it is. Colors match grammatical categories. Click a word to see its concordances.
Question #18
Log in, upload a file on the Upload page, then explore the lexicon and sentences in the Library. For deeper analysis, check co-occurrences, n-grams, or run a semantic search.