Monday, November 20, 2006
A Cleaner City: Courtesy "Clean Kerala" Initiative?
Tuesday, November 14, 2006
Gathering more Knowledge is nothing but inflating Ego and narrowing one's Vision, says Amritanandamayi !!!!!!
AMMA
Wednesday, November 01, 2006
Malayalam Machine Translation: How Long Should We Wait?
It's a sad fact that we don't have any ongoing efforts towards building an English-Malayalam machine translation system (to the best of my knowledge, and I believe I have done enough Googling to confirm the assertion). Such a system could play a big role in bridging the language gap and in enabling the common Keralite to keep abreast of recent technical advancements by providing him an interface to the Web in his own language.
This post implicitly shows how far we lag behind other Indian states in MT, and provides a set of links that could ease the literature survey part (and possibly other parts too) of an effort to build a machine translation system for Malayalam.
Disclaimer: I am no expert in machine translation, or even in the broader area of language technologies, but I would like to see an English-to-Malayalam machine translation system in the near future.
Possible Impacts and Application Areas of an English to Malayalam Machine Translation System
- An English-to-Malayalam machine translation system embedded in an email client would enable conversion of English mails to Malayalam, which could then be read out to the user by a text-to-speech system (there are efforts to build text-to-speech systems at various organizations in Kerala, the most notable being those at C-DIT, Thiruvananthapuram)
- A browser plugin would enable automatic conversion of the displayed web page to Malayalam. This would open up the English content on the web (which is obviously fairly large) to almost all Keralites (as we have a high literacy rate, we could assume that almost everybody would be able to read Malayalam)
Workshops/Conferences/Associations/Research Centers/Resources on Related Topics (Not comprehensive) - The more useful links are starred
- **A good overview of the various machine translation efforts in India appears as a ppt at http://www.au-kbc.org/dfki/igws/Machine_Translation.ppt (I would say that going through this one gives a good picture of the state of the art)
- Language Technologies Research Center, IIIT Hyderabad http://ltrc.iiit.net/showfile.php?filename=research/
- Modeling and Shallow Parsing of Indian Languages, Workshop in 2006 at IIT Bombay http://www.cfilt.iitb.ac.in/~mspil-06/ A handful of papers on Malayalam appeared there. See http://www.cfilt.iitb.ac.in/~mspil-06/id25.htm
- Natural Language Processing Association, India http://nlpai.iiit.ac.in/
- **Shakti-MT Kit: A tool for rapidly producing machine translation toolkits in Indian Languages, http://shakti.iiit.net/ (This system has already been used by a Chennai group to build an MT system from English to their Language)
- R.M.K. Sinha, 'A Sanskrit based Word-expert model for machine translation among Indian languages', http://ieeexplore.ieee.org/iel5/8421/26537/01182306.pdf
- Technology Development for Indian Languages - Department of IT, Government of India has a page on Indian Language Processing Resources at http://tdil.mit.gov.in/corpora/ach-corpora.htm
- C-DIT, Thiruvananthapuram has a Computational Linguistics Group who have built a Machine Translation System for the Hindi-Malayalam pair http://www.cdit.org/computionallinguistic.htm
- Prof. R.M.K. Sinha has been leading the machine translation effort at IIT Kanpur. A brief history of IIT Kanpur's research in this area appears at http://www.cse.iitk.ac.in/users/langtech/hist.htm This includes details about the early-90s Anglabharathi system
- Prof. Pushpak Bhattacharyya has been leading the efforts at IIT Bombay. His homepage is at http://www.cse.iitb.ac.in/~pb/
- State and Role of Machine Translation in India - Article http://www.bcs-mt.org.uk/mtreview/11/mtr-11-10.htm
- Machine Translation set for Quantum Leap in India - Article http://www.cse.iitb.ac.in/~pb/indtrend2.htm
- Gyannidhi: A parallel corpus for Indian Languages http://www.cdacnoida.in/technicalpapers/PaperNepal.pdf
- Indian Language Corpora from the Central Institute of Indian Languages - http://www.ciilcorpora.net/
- Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - NLP Group at the Stanford University - http://nlp.stanford.edu/links/statnlp.html
- STRANS 2001/2 - Symposium on Translation Support Systems - http://www.cse.iitk.ac.in/users/langtech/strans2002/index2002.htm An anaphora resolution system for Malayalam is described in one of the papers ("Vasisht" - An anaphora resolution system for Malayalam and Hindi, Sobha L. and B.N. Patnaik, M.G. University, Kottayam)
- ICON - International Conference on Natural Language Processing is a yearly event hosted in India. The ICON 2007 home page is at http://www.iiit.net/icon2007/
- IJCAI 2007 Workshop on Cross-Lingual Information Access http://www.iiit.ac.in/CLIA2007
- IJCAI 2007 Workshop on Shallow Parsing in South Asian Languages http://shiva.iiit.ac.in/SPSAL2007/
As far as I understand, there are two possible approaches to machine translation:
- Rule-Based: This involves using knowledge about the two languages to come up with a set of rules for translation. It may involve (shallow) parsing to some extent as well. The quality of the translation is limited by the quality of the language knowledge
- Statistical: This is the more recent and popular method. It learns translations from aligned parallel corpora (i.e., for an A-B pair, it needs documents in A and the corresponding documents in B). Such corpora can be hard to obtain, but the approach may be more extendable to similar language pairs than the rule-based approach. A good resource (including tutorials for download) appears at http://www.statmt.org/
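To make the statistical approach a little more concrete, here is a minimal sketch of how word translation probabilities can be estimated from a sentence-aligned parallel corpus. It uses an expectation-maximization loop in the style of IBM Model 1 (a swapped-in textbook technique, without the usual NULL word, and not something taken from any of the systems linked above). The three-sentence corpus, the romanized Malayalam tokens, and the function names are all my own assumptions for illustration; a real system would need a far larger corpus and proper Malayalam script handling.

```python
from collections import defaultdict

# Toy sentence-aligned corpus: (English tokens, romanized Malayalam tokens).
# Illustrative data only, not taken from any real corpus.
TOY_CORPUS = [
    ("big house".split(), "valiya veedu".split()),
    ("big book".split(), "valiya pusthakam".split()),
    ("small book".split(), "cheriya pusthakam".split()),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1 style, no NULL word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform distribution; keys are (target_word, source_word).
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    for word in ["big", "small", "house", "book"]:
        print(word, "->", best_translation(table, word))
```

The point of the sketch is that nobody tells the program which word translates to which; it works that out only from which words keep appearing together in aligned sentences. That is also why the availability of a good parallel corpus matters so much for this approach.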
The information posted above is limited by my knowledge of the subject (which is pretty low, since I have never worked on language technologies). Still, I hope that this post serves as a useful resource that will aid efforts to develop Malayalam machine translation systems (at least in the initial stages).
Some expertise in this area (in the Malayalam context) rests with the Computational Linguistics Group at C-DIT Thiruvananthapuram. In fact, I believe that any effort in this direction has to be co-ordinated with the efforts at organizations like the ones below to get visibility
- C-DIT Thiruvananthapuram http://www.cdit.org/
- OSSICS http://www.ossics.com/
If any of the readers know of any efforts in this direction, kindly feel free to add the links to them in comments to this post.