Research areas


Computational Linguistics
Linguistics, Sociolinguistics and History of language
Lexicography and Philology
Language teaching





Computational Linguistics


Fields of research: Corpus linguistics and tagging, information retrieval and text analysis, natural language processing, parsing and generation, morphological analysers, transliteration tools

Ongoing works:

  • ArMExLeR: Arabic Meaning Extraction through Lexical Resources. The project proposes a general-purpose data mining model for Arabic texts. It chains existing public-domain and published lexical resources (Stanford Parser, WordNet, Arabic WordNet, SUMO, AraMorph, A Frequency Dictionary of Arabic) into a pipeline that extracts a weakly hierarchised, single-predicate-level representation of meaning. Such a model could have a high impact on the computational analysis of Arabic, since no comparable tool exists for the language, and building it is challenging because of the language's specific features. In particular, the writing system is mostly consonant-based and does not always mark vowels explicitly. This is crucial for corpus analysis, because the same consonantal ductus may be read in several ways.


  • The SALAH Project: Segmentation and Linguistic Analysis of ḥadīṯ Arabic Texts. SALAH is a proposed model for the unsupervised segmentation and linguistic analysis of Arabic texts of the Prophetic tradition (ḥadīṯs). The model automatically segments each text unit into a transmitter chain (isnād) and a text content (matn), then analyses each segment through a separate pipeline: a set of regular expressions chunks the transmitter chain into a graph labeled with the relations between transmitters, while a tailored, augmented version of the AraMorph morphological analyzer (RAM) annotates the text content lexically and morphologically. The final output of the system is a graph of relations among transmitters and a lemmatized text corpus, both in XML format; these can in turn feed the automatic generation of concordances of the texts with variable-sized windows. The results can serve a variety of purposes, including retrieving information from ḥadīṯ texts, verifying the relations between transmitters, finding variant readings, and supplying lexical information to specialized dictionaries.
    More info and contacts: See the pages of Giuliano Lancioni and Marco Boella.


  • Categorial grammar for information retrieval in Arabic: this project aims to explore how well some grammar models, such as Combinatory Categorial Grammar, lend themselves to computational use, in order to design information retrieval tools for Arabic texts.
    More info and contacts: See the pages of Giuliano Lancioni and Marco Boella.


  • Computational analysis of alchemical corpora from the work of Jabir Ibn Hayyan
    More info and contacts: see the page of Ilaria Cicola 
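
The segmentation step described for the SALAH project above can be illustrated with a minimal sketch. It is not the SALAH implementation: the transmission markers, the cue that opens the matn, the placeholder transmitter names, the "COMPILER" node and all function names are assumptions made for illustration, and the text is in simplified transliteration rather than Arabic script. The sketch splits a text unit into isnād and matn, turns the isnād into labeled edges between transmitters, and builds concordance lines with a variable-sized window.

```python
import re

# Toy transmission markers in simplified transliteration (hypothetical list,
# not the patterns actually used by SALAH).
MARKERS = ["haddathana", "akhbarana", "an"]
MARKER_RE = re.compile(r"\b(" + "|".join(MARKERS) + r")\b")
# Hypothetical cue that closes the transmitter chain and opens the content.
MATN_CUE = " qala: "


def segment(hadith):
    """Split a text unit into (isnad, matn) at the first matn cue."""
    isnad, _, matn = hadith.partition(MATN_CUE)
    return isnad.strip(), matn.strip()


def transmitter_edges(isnad):
    """Return (receiver, relation, source) edges along the chain."""
    # With one capturing group, re.split yields
    # [prefix, marker1, name1, marker2, name2, ...].
    parts = MARKER_RE.split(isnad)
    edges = []
    receiver = "COMPILER"  # placeholder node for the collector of the text
    for i in range(1, len(parts), 2):
        relation, source = parts[i], parts[i + 1].strip()
        edges.append((receiver, relation, source))
        receiver = source
    return edges


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented sample unit with placeholder names and content.
sample = "haddathana Zayd an Amr an Bakr qala: the sample text follows the sample chain"
isnad, matn = segment(sample)
edges = transmitter_edges(isnad)
kwic = concordance(matn.split(), "sample", window=1)
```

A real system would need patterns for Arabic script, multi-word markers and names that themselves contain marker-like words; the sketch only shows how the two outputs named above (a transmitter graph and windowed concordances) relate to the segmented text.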




Linguistics, Sociolinguistics and History of language



Ongoing works:

  • Rhetorical functions and loci of diglossic code-switching in Arabic: the project deals with diglossic code-switching in spoken Arabic, especially in Christian religious discourse. Its main aim is to describe the inherent rhetorical value, the rhetorical functions and the loci of diglossic code-switching in the spoken language.
    More info and contacts: see the page of Marco Hamam 
  • Word use in Arabic comics, with a focus on semantic fields and lexical structures: the project studies the main features of Arabic comics as a basis for tagging.
    More info and contacts: see the page of Milena Di Canio 



Lexicography and Philology


Ongoing works:

  • Tagging models of Classical Arabic medical texts: the project aims to define a reasonably complete tagset, compliant with the Text Encoding Initiative standards, to tag all relevant information in Classical Arabic medical texts.
    More info and contacts: see the page of Francesca Romana Romani 
  • Defining a wordlist of Arabic medical terms: the project aims to compile a wordlist of currently used medical technical terms together with definitions and English (and Italian) translations.
    More info and contacts: see the page of Francesca Romana Romani 
  • Ghafiqi Project (launched by McGill’s Institute of Islamic Studies and The Osler Library): the aim of the project is to produce a critical edition of the Arabic text, with translation and commentary, of Kitāb 'l-'adwiya 'l-mufrada by al-Ġāfiqī.
    More info and contacts: see the page of Eleonora di Vincenzo 
  • Critical edition of Sirāǧ fī ‘ilm al-falak by 'Abū Zayd al-'Akhdarī (16th c.).
    More info and contacts: see the page of Eleonora di Vincenzo





Language teaching


Ongoing works:

  • Teaching ESA? This project aims to develop a learning model of spoken varieties that could be useful for Arabic learners who already have some knowledge of Modern Standard Arabic (MSA).
    More info and contacts: see the page of Anjela Al-Raies







