FAQ

Everything you need to get started. See also the Documentation.
  • Question

    #1

    What is SA7BY® used for?

    SA7BY® is a French corpus analysis platform. Upload your texts, explore word-level linguistic annotations, browse sentences in context, and run semantic analysis, all from a single interface.

  • Question

    #2

    Who is SA7BY® for?

    Researchers, professors, doctoral students, and advanced students working in corpus linguistics, textometry, and digital humanities.

  • Question

    #3

    Which file formats can I import?

    CSV, Excel (.xlsx), XML (Frantext export), and PDF. For tabular files, choose the columns that contain the text. For PDFs, select the first and last pages to analyze, or let the maximum-sentences setting limit how much text is processed.

  • Question

    #4

    How does corpus processing work?

    After import, the text is split into sentences and automatically analyzed: lemmatization, part-of-speech tagging, dependency parsing. You can track progress in real time. Processing uses a high-accuracy French transformer model.

  • Question

    #5

    What is the concordance view?

    The concordance view shows every occurrence of a word with its left and right context. You can also open a reading mode to see surrounding sentences and better understand how the word is used in the text.
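
    To make the idea concrete, here is a minimal keyword-in-context sketch. It is not SA7BY®'s implementation: the function name `kwic`, the regex tokenizer, and the `width` parameter are assumptions chosen for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, match, right) tuples for each occurrence of keyword.

    Illustrative only: tokenizes on word characters and lowercases everything.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])    # up to `width` words before
            right = " ".join(tokens[i + 1:i + 1 + width])   # up to `width` words after
            rows.append((left, tok, right))
    return rows

sample = "Le chat dort. Le chien regarde le chat noir."
for left, hit, right in kwic(sample, "chat"):
    print(f"{left:>20} [{hit}] {right}")
```

    Each row pairs one occurrence with its left and right context, which is the layout a concordance table presents.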

  • Question

    #6

    Can I filter by multiple word categories?

    Yes. The part-of-speech (POS) filter supports multi-select: for example, choose Nouns + Verbs to see only those categories in the lexicon or concordances. You can also refine by sub-category: verb tense, gender, number, determiner type (definite, possessive, demonstrative), and more.

  • Question

    #7

    How does semantic analysis work?

    Enter a French word and the platform combines two sources, WOLF and FastText, to reveal synonyms, hypernyms, hyponyms, and nearest neighbors in an interactive graph.

  • Question

    #8

    How long does processing take?

    It depends on the corpus size. A 100-page PDF typically takes 2 to 5 minutes. You can browse the platform freely while processing runs in the background.

  • Question

    #9

    Do I need to log in?

    Yes. The analysis workspace is protected by a PIN code. Public pages (home, FAQ, documentation) remain accessible without login.

  • Question

    #10

    Is my data secure?

    Yes. Your corpora and analyses are private and accessible only from your own session. Nothing is shared or made public.

  • Question

    #11

    What is the Co-occurrences page?

    The Co-occurrences page shows words that frequently appear with a given word. You can choose from five statistical measures (MI, T-score, Z-score, Dice, Frequency) and adjust the search distance. An info box explains each measure.
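
    For readers who want to see how such measures are usually computed, the sketch below uses common textbook definitions. The exact formulas SA7BY® applies are not stated here, so treat these as assumptions; the function `association_scores` and its parameter names are hypothetical.

```python
import math

def association_scores(observed, f_node, f_coll, corpus_size):
    """Textbook association measures for a (node, collocate) pair.

    observed    : co-occurrences of the two words within the search distance
    f_node      : corpus frequency of the node word
    f_coll      : corpus frequency of the collocate
    corpus_size : total number of tokens
    Assumes observed > 0 and a simple expected frequency with no window-size correction.
    """
    expected = f_node * f_coll / corpus_size
    return {
        "frequency": observed,
        "mi": math.log2(observed / expected),
        "t_score": (observed - expected) / math.sqrt(observed),
        "z_score": (observed - expected) / math.sqrt(expected),
        "dice": 2 * observed / (f_node + f_coll),
    }

scores = association_scores(observed=16, f_node=100, f_coll=160, corpus_size=4000)
for name, value in scores.items():
    print(f"{name:>10}: {value:.3f}")
```

    MI tends to favor rare but exclusive pairs, while T-score favors frequent ones, so comparing several measures gives a fuller picture than relying on one.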

  • Question

    #12

    What are N-grams used for?

    N-grams reveal recurring sequences of 2 to 5 words in your corpus, for example 'il y a' or 'c'est un'. You can switch between the canonical form (lemma) and the form as written.
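
    As a rough illustration of n-gram counting (not the platform's own code), the sketch below slides a window over a token list. The helper name `ngram_counts` and the whitespace tokenization are assumptions.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Surface forms as written; passing lemmas instead would give the canonical view.
tokens = "il y a un chat et il y a un chien".split()
trigrams = ngram_counts(tokens, 3)

for gram, count in trigrams.most_common(2):
    print(" ".join(gram), count)
```

    Feeding the same function a list of lemmas rather than written forms groups inflected variants together, which mirrors the lemma/form switch described above.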

  • Question

    #13

    How does proximity search work?

    Enter two words and find all sentences where they appear near each other. You can set the maximum distance, enforce an order (A before B or the reverse), and filter by grammatical category for each word.
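
    The logic behind a proximity search can be sketched as follows. This is a simplified illustration under stated assumptions: the function `near`, its parameters, and the regex tokenizer are hypothetical, distance is counted in tokens, and the grammatical-category filter is left out.

```python
import re

def near(sentence, word_a, word_b, max_dist=3, ordered=False):
    """Return True if word_a and word_b occur within max_dist tokens of each other.

    With ordered=True, word_a must come before word_b.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    for i in pos_a:
        for j in pos_b:
            gap = j - i  # positive when word_a precedes word_b
            if ordered and gap <= 0:
                continue
            if 0 < abs(gap) <= max_dist:
                return True
    return False

sentence = "Le chat noir dort sur le canapé."
print(near(sentence, "chat", "dort"))                 # within 3 tokens
print(near(sentence, "dort", "chat", ordered=True))   # wrong order
print(near(sentence, "chat", "canapé", max_dist=5))   # wider window
```

    Running the check on every sentence of a corpus and keeping those that return True yields the list of matching sentences.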

  • Question

    #14

    How do I filter by author or date?

    If your file contains columns like author, date, or genre, you can use them to filter your corpus. Filters appear automatically on every analysis page. Select one or more authors, a period, or a genre, and all statistics recalculate instantly. Your filters are saved when you navigate between pages.

  • Question

    #15

    Can I annotate word occurrences?

    Yes. In the concordance table, click the gear icon to add your own annotation columns (free text or dropdown). Annotate each occurrence, then export everything as CSV with your annotations.

  • Question

    #16

    Can I export results?

    Yes. Every page has a download button that exports data as CSV, compatible with Excel. If filters are active, only filtered data is exported. Annotation columns are included in the concordance export.

  • Question

    #17

    What is the word cloud?

    The word cloud is a visualization in the Lexicon that displays the most frequent words. The bigger a word, the more frequent it is. Colors match grammatical categories. Click a word to see its concordances.

  • Question

    #18

    Where should I start?

    Log in, upload a file on the Upload page, then explore the lexicon and sentences in the Library. For deeper analysis, check co-occurrences, n-grams, or run a semantic search.