ഈ മനോഹര തീരം: November 2006

A scientist gathers more knowledge, thereby narrowing his vision and inflating his ego. Well, I am sure you are wondering who would be so stupid as to vomit such nonsense; it was none other than the revered Amritanandamayi of Kerala. At least that's what the note pasted proudly in front of the "Mata Amritanandamayi Muth" shop at the Nedumbassery Airport Shopping Arcade says.

Yes, now what you want is proof of that. And that's exactly what I have in the picture below (taken using my phone cam). And for those who don't want to enlarge the image and read it, I paste the text below as well.

A person who always thinks intellectually cannot understand the feelings of the heart, the meaning of meditation and love. Love is the force, the power and inspiration behind every word and every action. Love is behind all scientific experimentation and invention - behind work. But that love is limited to a narrow channel. It is directed only to the scientific field in which he works. It doesn't embrace all creations. No work can be performed without concentration. Concentration is nothing but the stillness of mind. Stillness of the mind comes only as a result of love. A Rishi is a real lover because he has dived into his own self, the very core of life and love. A Rishi is a real scientist. He experiments in the inner laboratory of his own being. A scientist keeps on adding more to his existing ego. He gathers more and more knowledge and more and more information which is nothing but the act of inflating the ego. However a Rishi is completely empty. He becomes like a corpse in the river. He lets the river of life carry him anywhere it likes. The scientist is externally full, full of knowledge about the world. The Rishi is internally full, full of experience of oneness with the supreme absolute. The scientist sees many; the Rishi sees one. The scientist is only a part of existence, while the Rishi is the whole of existence. While the scientist burdens himself with facts and figures, the Rishi becomes empty so that all knowledge can pass through him but cannot affect his experience of oneness. While the scientist limits and narrows his vision, the Rishi expands and embraces the whole universe.
AMMA

So, this is what she believes. She believes that being empty is great. And she believes that scientists are narrow-minded creatures who are filled with ego and interested in nothing but inflating it. The message from "Amma" is loud and clear: gather no more knowledge, because it inflates your ego and makes you narrow-minded; do nothing and keep an empty mind.

Machine Translation is an active area of research, especially in the Indian context. Machine translation technologies convert text from one language to another; Google Translate, for instance, is one such system.

It's a sad fact that we don't have any ongoing efforts towards building an English-Malayalam Machine Translation system (to the best of my knowledge, and I believe that I have done enough Googling to confirm the assertion). Such a system could play a big role in bridging the gap and enabling the common Keralite to keep abreast of recent technical advancements by providing him an interface to the Web in his own language.

This post is meant to implicitly show how far we lag behind the other states in India regarding MT, and to provide a set of links that could ease the literature survey part (and possibly other parts too) of an effort to build a machine translation system for Malayalam.

Disclaimer: I am no expert in machine translation or even the broader area of language technologies, but I am one who would like to see an English to Malayalam machine translation system in the near future.

Possible Impacts and Application Areas of an English to Malayalam Machine Translation System

An English to Malayalam machine translation system embedded in an email client would enable conversion of English mails to Malayalam, which could then be read out to the user by a text-to-speech system (various organizations in Kerala are working on text-to-speech conversion systems, the most notable effort being the one at C-DIT, Thiruvananthapuram).
A browser plugin would enable automatic conversion of the displayed web page to Malayalam. This would open up the English content on the web (which, as is obvious, is fairly large) to almost all Keralites (as we have a high literacy rate, we could assume that almost everybody would be able to read Malayalam).

Workshops/Conferences/Associations/Research Centers/Resources on Related Topics (Not comprehensive) - The more useful links are starred

**A good overview of the various Machine Translation efforts in India appears as a ppt at http://www.au-kbc.org/dfki/igws/Machine_Translation.ppt (I would say that going through this one would give a good idea of the state of the art)
Language Technologies Research Center, IIIT Hyderabad http://ltrc.iiit.net/showfile.php?filename=research/
Modeling and Shallow Parsing of Indian Languages, Workshop in 2006 at IIT Bombay http://www.cfilt.iitb.ac.in/~mspil-06/ A handful of papers on Malayalam appeared there. See http://www.cfilt.iitb.ac.in/~mspil-06/id25.htm
Natural Language Processing Association, India http://nlpai.iiit.ac.in/
**Shakti-MT Kit: A tool for rapidly producing machine translation toolkits in Indian Languages, http://shakti.iiit.net/ (This system has already been used by a Chennai group to build an MT system from English to their Language)
R.M.K. Sinha, `A Sanskrit based Word-expert model for machine translation among Indian languages'. http://ieeexplore.ieee.org/iel5/8421/26537/01182306.pdf
Technology Development for Indian Languages - Department of IT, Government of India has a page on Indian Language Processing Resources at http://tdil.mit.gov.in/corpora/ach-corpora.htm
C-DIT, Thiruvananthapuram has a Computational Linguistics Group who have built a Machine Translation System for the Hindi-Malayalam pair http://www.cdit.org/computionallinguistic.htm
Prof. RMK Sinha has been leading the effort at IIT Kanpur. A brief history of IIT Kanpur research on machine translation appears at http://www.cse.iitk.ac.in/users/langtech/hist.htm This includes details about the early 90s Anglabharathi System
Prof. Pushpak Bhattacharya has been leading the efforts at IIT Bombay. His homepage is at http://www.cse.iitb.ac.in/~pb/
State and Role of Machine Translation in India - Article http://www.bcs-mt.org.uk/mtreview/11/mtr-11-10.htm
Machine Translation set for Quantum Leap in India - Article http://www.cse.iitb.ac.in/~pb/indtrend2.htm
Gyannidhi: A parallel corpus for Indian Languages http://www.cdacnoida.in/technicalpapers/PaperNepal.pdf
Indian Language Corpora from the Central Institute of Indian Languages - http://www.ciilcorpora.net/
Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - NLP Group at the Stanford University - http://nlp.stanford.edu/links/statnlp.html
STRANS 2001/2 - Symposium on Translation Support Systems - http://www.cse.iitk.ac.in/users/langtech/strans2002/index2002.htm An anaphora resolution system for malayalam is described in one of the papers ("Vasisht"-An anaphora resolution system for Malayalam and Hindi , Sobha L. and B.N.Patnaik, M.G.University Kottayam )
ICON - International Conference on Natural Language Processing is a yearly event hosted in India. The ICON 2007 home page is at http://www.iiit.net/icon2007/
IJCAI 2007 Workshop on Cross-Lingual Information Access http://www.iiit.ac.in/CLIA2007
IJCAI 2007 Workshop on Shallow Parsing in South Asian Languages http://shiva.iiit.ac.in/SPSAL2007/

As I understand it, there are two broad approaches to Machine Translation:

Rule-Based: It involves using knowledge about the two languages to come up with a set of rules for translation. This may involve (shallow) parsing to some extent as well. The quality of the translation is limited by the quality of the language knowledge encoded in the rules.
Statistical: This is the more recent and popular method. It relies on aligned parallel corpora (i.e., for an A-B pair, it needs documents in A and the corresponding documents in B), but may be more extendable to similar language pairs than the Rule-Based approach. A good resource (including tutorials for download) appears at http://www.statmt.org/
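
To make the statistical idea a little more concrete, here is a minimal sketch of how word translation probabilities can be learned from nothing but sentence-aligned text. It uses the classic IBM Model 1 technique with EM, simplified by leaving out the NULL word. The three-sentence English-German toy corpus and the names train_ibm_model1, best_translation and TOY_CORPUS are my own illustrative assumptions; a real English-Malayalam system would need a large parallel corpus and a full toolkit, so treat this only as an illustration of the principle.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (an assumption for illustration only):
# each entry is (source sentence, target sentence), already tokenized.
TOY_CORPUS = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e), the probability that source word e translates
    to target word f, using EM as in IBM Model 1 (without a NULL word)."""
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}

    # Start from a uniform distribution over target words.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e being aligned to f
        total = defaultdict(float)  # expected count of e being aligned at all

        # E-step: spread each target word over the source words of its sentence.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-normalize the expected counts into probabilities.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e]

    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(f | source_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    for word in ["the", "house", "book", "a"]:
        print(word, "->", best_translation(table, word))
```

No dictionary or grammar rule is supplied anywhere: the pairing of words falls out of which words keep co-occurring across the aligned sentences. That is also why the quality of such a system depends so heavily on the size and quality of the parallel corpus.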

The information posted above is limited by my knowledge of the subject (which is pretty low, since I have never worked on language technologies). But I hope that this post serves as a useful resource that will aid efforts in the development of Malayalam machine translation systems (at least in the initial stages).

Some expertise in this area (in the Malayalam context) rests with the Computational Linguistics Group at C-DIT Thiruvananthapuram. In fact, I believe that any effort in this direction has to be co-ordinated with the efforts at organizations like the ones below to get visibility:

C-DIT Thiruvananthapuram http://www.cdit.org/
OSSICS http://www.ossics.com/

If any of the readers know of any efforts in this direction, kindly feel free to add the links to them in comments to this post.

Monday, November 20, 2006

A Cleaner City: Courtesy "Clean Kerala" Initiative?

Tuesday, November 14, 2006

Gathering more Knowledge is nothing but inflating Ego and narrowing one's Vision, says Amritanandamayi !!!!!!

Wednesday, November 01, 2006

Malayalam Machine Translation: How Long Should We Wait?
