course aims in Estonian
Kursuse eesmärk on anda ülevaade ja praktiline kogemus tekstikaeve meetoditest. Kursus käsitleb järgnevaid tekstikaevega seotud alamvaldkondi: loomuliku keele masintöötlus, infootsing, informatsiooni ekstraheerimine (sh süntagmaatiliste ja paradigmaatiliste seoste eraldamine, võtmesõnade ja -fraaside eraldamine), tekstide klasterdamine (sh teemade-põhine klasterdamine), klassifitseerimine ja tekstikokkuvõtted.
course aims in English
The aim of the course is to provide an overview of, and practical experience with, text mining methods. The course covers the following text mining subfields: natural language processing, information retrieval, information extraction (incl. extraction of syntagmatic and paradigmatic relations and of keywords and key phrases), text clustering (incl. topic-based clustering), text classification and text summarization.
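The syllabus does not say which keyword extraction technique the course uses. As a minimal illustrative sketch, assuming plain TF-IDF weighting and only the Python standard library, keyword extraction can be approached as below. The function names `tokenize` and `tfidf_keywords` are hypothetical and are not taken from the course materials.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters (Unicode-aware, so Estonian õ, ä, ö, ü are kept).
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Hypothetical helper: return the top_k TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Score = relative term frequency * log(N / df); a term found in every document scores 0.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


docs = ["text mining text clustering", "text classification", "topic clustering"]
print(tfidf_keywords(docs, top_k=2))
```

Terms that are frequent in one document but rare across the corpus rank highest. In practice, stop-word removal and other preprocessing usually come first.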
learning outcomes in Estonian
Kursuse läbinud üliõpilane:
- teab erinevaid teksti analüüsi liike;
- teab ja oskab rakendada tekstieeltöötluse operatsioone;
- oskab esmatasandil valida sobivat meetodit praktiliste ülesannete lahendamiseks;
- oskab kohandada ülesande lähteinfot sobivaks valitud meetodi kasutamiseks;
- oskab rakendada lihtsamaid tekstikaeve meetodeid struktureerimata tekstidel;
- omab esmast kogemust meetodi rakendamisest saadud tulemuse tõlgendamisel;
- oskab hinnata meetodi efektiivsust vastavalt valitud eesmärgile.
learning outcomes in English
On completion of the course, the student:
- knows different types of text analysis;
- knows and is able to apply text preprocessing operations;
- has a basic ability to choose a suitable method for solving practical tasks;
- is able to adapt the input data of a task to suit the chosen method;
- is able to apply basic text mining methods to unstructured texts;
- has initial experience in interpreting the results obtained by applying a method;
- is able to evaluate the effectiveness of a method with respect to the chosen goal.
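The syllabus lists text preprocessing as a core skill but does not specify which operations are taught. A minimal sketch, assuming three common operations (lowercasing, tokenization and stop-word removal) and using only the Python standard library, could look like this. The `STOP_WORDS` set and the `preprocess` function are hypothetical illustrations, not course material. Stemming and lemmatization, also common preprocessing steps, are not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def preprocess(text: str) -> list[str]:
    """Hypothetical helper: lowercase, tokenize, drop stop words and 1-letter tokens."""
    lowered = text.lower()
    # Tokenize by keeping runs of letters only (digits and punctuation are dropped).
    tokens = re.findall(r"[^\W\d_]+", lowered)
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


print(preprocess("The aim of the course is to provide an overview of text mining!"))
```

A real pipeline would typically use a fuller, language-specific stop-word list, for example an Estonian one for Estonian corpora.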
brief description of the course in Estonian
Tekstikaeve on kiiresti arenev uurimisvaldkond. Ligi 80% loodavatest andmetest on esitatud struktureerimata tekstina ja selle maht kasvab pidevalt. Tarvis on tööriistu (meetodeid), mis aitavad struktureerimata tekstist leida asjakohast, uudset ja/või huvipakkuvat infot. Kursuse eesmärgiks on anda ülevaade põhilistest tekstikaeve meetoditest. Õppetegevus keskendub kõige levinumatele teksti uurimise meetoditele, mis on akadeemilises kogukonnas tunnustust leidnud, ning vaadeldakse ka mõningaid perspektiivikamaid lähenemisi. Lisaks antakse esmased praktilised oskused kõnealuses valdkonnas ning luuakse eeldused jätkukursustele, mis keskenduvad konkreetsete tekstianalüüsi meetodite käsitlemisele. Kursuse läbimiseks on tarvilik keskmisel tasemel programmeerimisoskus. Praktilistes harjutustes kasutatavaks programmeerimiskeeleks on Python. Kursuse läbimiseks tuleb lahendada tunnis antud harjutusülesandeid ja rakendada õpitud meetodeid tudengi enda valitud tekstikorpusel.
brief description of the course in English
Text mining is a rapidly growing field of research. Nearly 80% of newly generated data is unstructured text, and its volume keeps growing. Tools (methods) are needed to help find relevant, novel and/or interesting information in unstructured text. The course aims to give an overview of the main text mining methods. Teaching focuses on the most common text analysis methods recognized by the academic community and also looks at some of the more promising approaches. In addition, the course provides initial practical skills in this field and lays the groundwork for follow-up courses that focus on specific text analysis methods. Intermediate programming skills are required to complete the course. The programming language used in the practical exercises is Python. To pass the course, students must solve the exercises given in class and apply the methods learned to a text corpus of their own choice.
type of assessment in Estonian
Eksam koosneb kahest osast: teoreetilisest ja praktilisest. Eksami teoreetiline osa seisneb suulises vastamises kursusel käsitletud suvaliste tekstikaeve meetodite liigituse, sisu ja praktilise rakendusvaldkonna kohta. Praktilise osa punktid saadakse iseseisvate ülesannete lahenduste kaitsmisel. Praktiliste tööde esitamine on kohustuslik.
type of assessment in English
The exam consists of two parts: theoretical and practical. The theoretical part is an oral examination on the classification, content and practical areas of application of arbitrarily chosen text mining methods covered in the course. Points for the practical part are awarded for defending the solutions to the independent assignments. Submitting the practical work is mandatory.
independent study in Estonian
2 × 16 tundi loenguid + 2 × 16 tundi praktikume + 92 tundi iseseisvat tööd = 156 tundi. 92 tundi iseseisvat tööd sisaldab kolme iseseisvat kodutööd ja teooria õppimist.
independent study in English
2 × 16 h of lectures + 2 × 16 h of practical sessions + 92 h of independent work = 156 h. The 92 h of independent work include three independent homework assignments and studying the theory.
study literature
1. Dipanjan Sarkar (2016). „Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data“
2. Gary Miner, John Elder IV, Thomas Hill, Robert Nisbet, Dursun Delen, Andrew Fast (2012). „Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications“
study forms and load
daytime study: weekly hours
4.0
session-based study workload (in a semester):
practices
2.0
practices
16.0