Linguateca's infrastructure for Portuguese and how it allows the detailed study of language varieties

Abstract

In this paper I briefly present Linguateca, an infrastructure project for Portuguese which is ten years old, and show how it provides several possibilities for studying grammatical and semantic differences between varieties of the language. After a short history of Portuguese corpus linguistics, presenting the main projects in the area, I discuss in some detail the AC/DC project (Santos & Bick, 2000) and the Floresta Sintáctica treebank (Afonso et al., 2002; Freitas et al., 2008; Bick, 2004), and sketch some ideas for parallel corpora as started in CorTrad (Tagnin et al., 2009). I use three kinds of examples: those related to known differences between varieties, in both grammar and lexis; those related to diachronic differences, in that respect describing in detail Silva's (2008, in press) model of Quantitative Lexicology and Variational Linguistics in CONDIVport; and those that are in a way corpus-driven and for which novel functionalities of AC/DC have been devised, namely the comparison of two search expressions and the pattern database.

Category

Academic lecture

Language

English

Author(s)

Diana Santos

Institution(s)

SINTEF Digital / Sustainable Communication Technologies

Presented at

Workshop on research infrastructure for linguistic variation

Location

Oslo

Date

17-18 September 2009

Organizer

University of Oslo

Year

2009

External resources

https://www.linguateca.pt/diana/download/acetsantosrilivs2009.pdf

