TEI-CORPO

Conversion tool from ELAN, CLAN, Transcriber and Praat files to TEI files and back

Java library and Swing user interface

Conversions can be made at this address without using the command-line interface: http://ct3.ortolang.fr/teiconvert/

The Java conversion tool (formats TEI_CORPO, CLAN, ELAN, Transcriber, Praat) can be downloaded here: teicorpo.jar

(Note: since version 1.40 the jar file is named teicorpo.jar; it was previously named conversion.jar.)

Warning: Java (version 8 or later) must be installed on your computer before running the commands: Download Java

The source code is available at https://github.com/christopheparisse/teicorpo. The GitHub repository contains only the source of the project, not the compiled jar file.

Using the command line conversion tool

The tool can be run from the command line. The jar file contains several subprograms. The main commands are grouped together in a general command called TeiCorpo. Other, more specific commands can be used to run part-of-speech tagging or to edit TEI files. The same general set of parameters applies to all commands, although some parameters are specific to a single command. The general command has the following form:

java -cp teicorpo.jar fr.ortolang.teicorpo.TeiCorpo -from input-format -to output-format input_files ... -o output [parameters]
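When the converter is driven from another Java program, the general command can be assembled as an argument list. The sketch below is a hypothetical helper (TeiCorpoCommand is not a class of teicorpo.jar) that only reproduces the argument order shown above; the strings input-format and output-format are placeholders to be replaced by one of the accepted format names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of teicorpo.jar: builds the argument list
// of the general TeiCorpo command in the order shown above.
final class TeiCorpoCommand {
    static List<String> build(String jar, String from, String to,
                              List<String> inputs, String output,
                              List<String> extraParams) {
        List<String> cmd = new ArrayList<>();
        cmd.addAll(Arrays.asList("java", "-cp", jar, "fr.ortolang.teicorpo.TeiCorpo"));
        // -from and -to are optional: when null, the format comes from the file extension
        if (from != null) { cmd.add("-from"); cmd.add(from); }
        if (to != null) { cmd.add("-to"); cmd.add(to); }
        // Any number of input files or directories
        cmd.addAll(inputs);
        // At most one -o; when null, output names are derived from the input names
        if (output != null) { cmd.add("-o"); cmd.add(output); }
        // Remaining parameters are passed through unchanged
        cmd.addAll(extraParams);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("teicorpo.jar", "input-format", "output-format",
                Arrays.asList("a.eaf", "b.eaf"), "out", Collections.<String>emptyList());
        // Prints the command line that would be run
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to java.lang.ProcessBuilder to run the conversion. Extra parameters are appended verbatim, so any parameter described in this documentation can be added through extraParams.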

All commands use the same input and output parameters:

The number of input files to be converted is not limited, but only one output parameter can be set. If -o is not set, or if there is more than one input file, the name of each output file is derived from the name of the corresponding input file. If no output directory is specified, the output files are written to the same directory as the input files. The input and output parameters can be directory names. If the input parameter is a directory, all files in its subtree are converted and placed at the corresponding locations in the output directory tree.
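The naming rules above can be sketched for the case where a whole directory is converted. This is only an illustration under assumptions: the exact output file name is decided by TeiCorpo, and both the class OutputPaths and its naming rule (replace the last extension by a new suffix, here the assumed suffix .tei_corpo.xml) are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the output naming rules; the real names are chosen by TeiCorpo.
final class OutputPaths {
    // Assumed rule: replace the last extension of the file name by newSuffix.
    static String rename(String fileName, String newSuffix) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return base + newSuffix;
    }

    // inputRoot: directory given as input; inputFile: a file inside its subtree;
    // outputRoot: directory given with -o, or null when -o is not set.
    static Path derive(Path inputRoot, Path inputFile, Path outputRoot, String newSuffix) {
        String newName = rename(inputFile.getFileName().toString(), newSuffix);
        if (outputRoot == null) {
            // No output directory: write next to the input file
            return inputFile.resolveSibling(newName);
        }
        // Mirror the input subtree under the output directory
        Path relative = inputRoot.relativize(inputFile);
        return outputRoot.resolve(relative).resolveSibling(newName);
    }

    public static void main(String[] args) {
        Path out = derive(Paths.get("corpus"), Paths.get("corpus", "sub", "a.eaf"),
                Paths.get("out"), ".tei_corpo.xml");
        System.out.println(out); // out/sub/a.tei_corpo.xml on Unix-like systems
    }
}
```

Using relativize and resolve keeps the sub-directory structure of the input tree, which is what "placed accordingly in the output file tree" requires.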

Using -from and -to takes precedence over the information provided by file extensions. These options (-from and -to) can take the following arguments (each one corresponds to the default format used by the corresponding tool):
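The precedence rule described above can be illustrated as follows: an explicit format always wins, and the file extension is only a fallback. In this sketch the class FormatResolver and its enum are hypothetical, and the enum constants are not necessarily the values accepted by -from and -to. The extensions .eaf, .cha, .trs and .TextGrid are the usual ELAN, CLAN, Transcriber and Praat ones, while .tei_corpo.xml for TEI_CORPO files is an assumption.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of "explicit option first, file extension second".
final class FormatResolver {
    enum Format { TEI_CORPO, CLAN, ELAN, TRANSCRIBER, PRAAT }

    // Guess the format from the file name (case-insensitive suffix match).
    static Optional<Format> fromExtension(String fileName) {
        String n = fileName.toLowerCase(Locale.ROOT);
        if (n.endsWith(".tei_corpo.xml")) return Optional.of(Format.TEI_CORPO); // assumed suffix
        if (n.endsWith(".eaf")) return Optional.of(Format.ELAN);
        if (n.endsWith(".cha")) return Optional.of(Format.CLAN);
        if (n.endsWith(".trs")) return Optional.of(Format.TRANSCRIBER);
        if (n.endsWith(".textgrid")) return Optional.of(Format.PRAAT);
        return Optional.empty(); // unknown extension: the format must be given explicitly
    }

    // An explicit -from/-to value, when given, takes precedence over the extension.
    static Optional<Format> resolve(Format explicit, String fileName) {
        return explicit != null ? Optional.of(explicit) : fromExtension(fileName);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "rec.eaf"));        // Optional[ELAN]
        System.out.println(resolve(Format.CLAN, "rec.eaf")); // Optional[CLAN]
    }
}
```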

The -to option can also take the following arguments:

Other parameters that apply to all commands:

Other parameters for exports to TXM and Lexico

Other parameters for exports to text

Parameter for import from text

Conversion from Praat can use some specific parameters

Other commands (all are part of the TeiCorpo command):

Other commands to edit TEI files automatically

Using TreeTagger to tag a TEI file with parts of speech

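        # Location of the TreeTagger installation, passed to TeiTreeTagger through the environment
        TREE_TAGGER=/projets/syntax
        export TREE_TAGGER
        # "$1" is the first argument of this script (the file to tag), quoted in case it contains spaces
        java -cp /projets/plceforlibraries/teicorpo.jar fr.ortolang.teicorpo.TeiTreeTagger -syntaxformat conll -model perceo_oral/spoken-french.par -rawline "$1"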
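The same call can also be prepared from Java, for instance when tagging is one step of a larger Java pipeline. The sketch below (TreeTaggerLauncher is a hypothetical class) passes the same arguments, in the same order, as the shell script and sets TREE_TAGGER in the environment of the child process instead of exporting it. The paths are those of the example and must be adapted to the local installation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher (not part of teicorpo.jar) reproducing the shell script above.
final class TreeTaggerLauncher {
    // Prepares, but does not start, the TeiTreeTagger call.
    static ProcessBuilder prepare(String treeTaggerDir, String jar, String teiFile) {
        // Same arguments, in the same order, as in the shell script
        List<String> cmd = Arrays.asList(
                "java", "-cp", jar, "fr.ortolang.teicorpo.TeiTreeTagger",
                "-syntaxformat", "conll",
                "-model", "perceo_oral/spoken-french.par",
                "-rawline", teiFile);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Equivalent of "export TREE_TAGGER=..." for the child process only
        pb.environment().put("TREE_TAGGER", treeTaggerDir);
        // Share this process's standard input, output and error with the tagger
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: TreeTaggerLauncher <file>");
            System.exit(2);
        }
        ProcessBuilder pb = prepare("/projets/syntax",
                "/projets/plceforlibraries/teicorpo.jar", args[0]);
        int status = pb.start().waitFor();
        System.out.println("exit status: " + status);
    }
}
```

Setting the variable on the ProcessBuilder rather than in the calling shell keeps it local to the tagging process.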

Stanford Natural Language Processing (SNLP)

The Stanford parser, part-of-speech tagger and other tools can be called to process the content of a TEI file. As with TreeTagger, the results can be produced in three formats.

Version history