Wednesday, November 01, 2006

Malayalam Machine Translation: How Long Should We Wait?

Machine Translation is an active area of research, especially in the Indian context. Machine translation technologies convert text from one language to another. For instance, Google Translate is one such system.

It's a sad fact that we don't have any ongoing efforts towards building an English-Malayalam Machine Translation system (to the best of my knowledge, and I believe that I have done enough Googling to confirm the assertion). Such a system could play a big role in bridging the gap and in enabling the common Keralite to keep abreast of recent technical advancements by providing him an interface to the Web in his own language.

This post is meant to implicitly show how much we lag behind the other states in India regarding MT, and to provide a set of links that could ease the literature survey part (and possibly other parts too) of an effort to build a machine translation system for Malayalam.

Disclaimer: I am no expert in machine translation or even the broader area of language technologies, but I am one who would like to see an English to Malayalam machine translation system in the near future.

Possible Impacts and Application Areas of an English to Malayalam Machine Translation System

  1. An English to Malayalam machine translation system embedded in an email client would enable conversion of English mails to Malayalam, which could then be read out to the user by a text-to-speech system (there are efforts on building text-to-speech conversion systems at various organizations in Kerala, the most notable being those at C-DIT, Thiruvananthapuram).
  2. A browser plugin would enable automatic conversion of the displayed web page to Malayalam. This would open up the English content on the web (which, as is obvious, is fairly large) to almost all Keralites (as we have a high literacy rate, we could assume that almost everybody would be able to read Malayalam).

Workshops/Conferences/Associations/Research Centers/Resources on Related Topics (Not comprehensive) - The more useful links are starred

  1. **A good overview of the various machine translation efforts in India appears as a ppt at http://www.au-kbc.org/dfki/igws/Machine_Translation.ppt (I would say that going through this one would give a good overview of the state of the art)
  2. Language Technologies Research Center, IIIT Hyderabad http://ltrc.iiit.net/showfile.php?filename=research/
  3. Modeling and Shallow Parsing of Indian Languages, Workshop in 2006 at IIT Bombay http://www.cfilt.iitb.ac.in/~mspil-06/ A handful of papers on Malayalam appeared there. See http://www.cfilt.iitb.ac.in/~mspil-06/id25.htm
  4. Natural Language Processing Association, India http://nlpai.iiit.ac.in/
  5. **Shakti-MT Kit: A tool for rapidly producing machine translation toolkits in Indian Languages, http://shakti.iiit.net/ (This system has already been used by a Chennai group to build an MT system from English to their Language)
  6. R.M.K. Sinha, "A Sanskrit based Word-expert model for machine translation among Indian languages", http://ieeexplore.ieee.org/iel5/8421/26537/01182306.pdf
  7. Technology Development for Indian Languages - Department of IT, Government of India has a page on Indian Language Processing Resources at http://tdil.mit.gov.in/corpora/ach-corpora.htm
  8. C-DIT, Thiruvananthapuram has a Computational Linguistics Group who have built a Machine Translation System for the Hindi-Malayalam pair http://www.cdit.org/computionallinguistic.htm
  9. Prof. R.M.K. Sinha has been leading the machine translation effort at IIT Kanpur. A brief history of IIT Kanpur's research in this area appears at http://www.cse.iitk.ac.in/users/langtech/hist.htm This includes details about the early-90s AnglaBharati system
  10. Prof. Pushpak Bhattacharyya has been leading the efforts at IIT Bombay. His homepage is at http://www.cse.iitb.ac.in/~pb/
  11. State and Role of Machine Translation in India - Article http://www.bcs-mt.org.uk/mtreview/11/mtr-11-10.htm
  12. Machine Translation set for Quantum Leap in India - Article http://www.cse.iitb.ac.in/~pb/indtrend2.htm
  13. Gyannidhi: A parallel corpus for Indian Languages http://www.cdacnoida.in/technicalpapers/PaperNepal.pdf
  14. Indian Language Corpora from the Central Institute of Indian Languages - http://www.ciilcorpora.net/
  15. Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - NLP Group at Stanford University - http://nlp.stanford.edu/links/statnlp.html
  16. STRANS 2001/2 - Symposium on Translation Support Systems - http://www.cse.iitk.ac.in/users/langtech/strans2002/index2002.htm An anaphora resolution system for Malayalam is described in one of the papers ("Vasisht" - An anaphora resolution system for Malayalam and Hindi, Sobha L. and B.N. Patnaik, M.G. University Kottayam)
  17. ICON - International Conference on Natural Language Processing is a yearly event hosted in India. The ICON 2007 home is at http://www.iiit.net/icon2007/
  18. IJCAI 2007 Workshop on Cross-Lingual Information Access http://www.iiit.ac.in/CLIA2007
  19. IJCAI 2007 Workshop on Shallow Parsing in South Asian Languages http://shiva.iiit.ac.in/SPSAL2007/

As far as I understand, there are two possible approaches to Machine Translation:

  • Rule-Based: It involves using knowledge about the two languages to come up with a set of rules for translation. This may involve (shallow) parsing to some extent as well. The quality of the translation is limited by the quality of the language knowledge.
  • Statistical: This is the more recent and popular method. It requires aligned parallel corpora (i.e., for an A-B pair, it would need documents in A and the corresponding documents in B), but may be more extendable to similar language pairs than the Rule-Based approach. A good resource (including tutorials for download) appears at http://www.statmt.org/
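To make the statistical approach a little more concrete, here is a minimal sketch (not a real system) of how word translation probabilities can be learned from an aligned parallel corpus. It uses IBM Model 1, one classic word-alignment model trained with expectation-maximization; that choice is my assumption for illustration and is not something the links above prescribe. The three-sentence corpus of English and romanized Malayalam phrases is a toy made up for this example, and the names train_ibm_model1, best_translation and TOY_CORPUS are hypothetical.

```python
from collections import defaultdict

# Toy aligned corpus (illustrative only): English tokens paired with
# romanized Malayalam tokens for the same phrase.
TOY_CORPUS = [
    ("this house".split(), "ee veedu".split()),
    ("this book".split(), "ee pusthakam".split()),
    ("a book".split(), "oru pusthakam".split()),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(f|e), the probability that source word e translates to
    target word f, using EM as in IBM Model 1 (simplified: no NULL word)."""
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}

    # Start from a uniform distribution over target words.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all

        # E-step: distribute each target word over the source words of its
        # sentence, in proportion to the current t(f|e).
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-normalize the expected counts into probabilities.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e]

    return t


def best_translation(t, e):
    """Return the target word with the highest t(f|e) for source word e."""
    candidates = {f for (f, src_word) in t if src_word == e}
    return max(candidates, key=lambda f: t[(f, e)])


model = train_ibm_model1(TOY_CORPUS)
for word in ["this", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

On this toy corpus the model ends up pairing "house" with "veedu" even though the two only ever appear together with "this"/"ee": the other sentences explain "ee" better as a translation of "this", which is exactly the kind of evidence a parallel corpus provides. A real system would need far more data, plus a language model and a decoder on top.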

The information posted above is limited by my knowledge of the subject (which is pretty low, since I have never worked on language technologies). Still, I hope that this post serves as a useful resource for efforts to develop Malayalam machine translation systems (at least in the initial stages).

Some expertise in this area (in the Malayalam context) rests with the Computational Linguistics Group at C-DIT Thiruvananthapuram. In fact, I believe that any effort in this direction has to be co-ordinated with the efforts at organizations like the following to get visibility:

  1. C-DIT Thiruvananthapuram http://www.cdit.org/
  2. OSSICS http://www.ossics.com/

If any of the readers know of any efforts in this direction, kindly feel free to add the links to them in comments to this post.

10 comments:

yetanother.softwarejunk said...

Good to see this link.

I am not sure how far I can travel. Let me think about it.

Thanks for your comment too.

-YaSJ.

Sobha said...

There are many issues in this area. There is no good department in Kerala University or centers who can take such an initiative. The way to an English-Malayalam Machine Translation system is very far. If things work out well you can see some MT from Tamil to Malayalam and back (not a generic system) in 24 months' time. This too not from Kerala but from Tanjavore and Chennai.

deepak said...

Sobha, could you please post the links about the same? In fact, there are ongoing efforts on machine translation in Kerala also. There could be enough synergy between such groups working towards the same end.

Anonymous said...

We in the CUSAT Department of Computer Applications are stepping into this area. We are searching for information on Malayalam linguistics and related areas.
Kannan Balakrishnan

deepak said...

I had a chance to give a light tutorial intro to machine translation at a meetup of techies in Kerala - Barcamp Kerala 8 at Thiruvalla. I have uploaded the slides and a handout doc at http://sites.google.com/site/deepakp7/barcamps

It's very much an example-oriented presentation, so it should be easy to follow.

Nisk said...

Deepak, there are some ways by which we can translate English to Malayalam.
One of the ways is SMT (Statistical Machine Translation); another is Rule-Based Translation, which is a little hard.
http://nlp.amrita.edu:8080/Eng2Mal/
Visit this site and see the miracle.

Anonymous said...

We, the students from Govt. College of Engineering Kannur, built a Statistical Machine Translation System which converts English to Malayalam and Malayalam to English. It works fine.

ammu said...

Does anybody know about any corpus for Malayalam to English translation?
