LebGeeks

A community for technology geeks in Lebanon.

#1 June 17 2019

hishjwnz
Member

Method to romanize Arabic texts

Hey fellas,

Are you aware of a convenient way to romanize/latinize Arabic words (e.g. مرحبا => Marhaba) across a full text? In other words, I'm looking for a tool that does the exact opposite of Yamli.
I've come across lexilogos.com and mylanguages.org, but after a few tests, the converted text is not really usable, as the results are very different from our Latin way of writing Arabic (as in texting). A failure using mylanguages.org is illustrated below:
Original text:

"كل الكتب المنزلة
كل الرسل المرسلة
قالوا الحب هوي الدوا
وحلوا كل المسألة"

is converted to

"kl alktb almnzlh
kl alrsl almrslh
qalwa alhb hwy aldwa
whlwa kl almsalh"

I need this tool to transform Arabic song lyrics into Latin script and add chords to them for use in jamming sessions with friends. Adding chords to Arabic text is very messy, mainly because the script runs right to left.

Thanks!

#2 June 18 2019

chosen2k
Member

Re: Method to romanize Arabic texts

https://translate.google.com

Converts it to

kl alkutub almunzila
kli alrusul almursila
qaluu alhabu hawia alduwaa
wahuluu kla almas'ala

#3 June 18 2019

rolf
Member

Re: Method to romanize Arabic texts

The problem is that, for example, كل can be kul or kil depending on the grammar, because in Arabic you have to guess the harakat (the short-vowel marks, which are usually not written), so you need something that is a bit aware of the grammar if you want to do it properly.

My guess is that your solution (if one does not already exist) will consist of 2 parts:
1. Analyze the text and add the harakat
2. Transliterate the text including the harakat
Then you should get easy-to-read output.
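
To illustrate step 2 only, here is a minimal Python sketch. It assumes the text already carries its harakat; step 1 is not shown and would need a separate diacritization tool. The letter table is an ad hoc assumption rather than any standard scheme, and the function name `transliterate` is made up for this example.

```python
# Ad hoc letter table (an assumption for illustration, not a standard scheme).
LETTERS = {
    "ا": "a", "أ": "a", "إ": "i", "ى": "a", "ء": "'", "ؤ": "'", "ئ": "'",
    "ب": "b", "ت": "t", "ث": "th", "ج": "j", "ح": "h", "خ": "kh",
    "د": "d", "ذ": "dh", "ر": "r", "ز": "z", "س": "s", "ش": "sh",
    "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "ع": "3", "غ": "gh",
    "ف": "f", "ق": "q", "ك": "k", "ل": "l", "م": "m", "ن": "n",
    "ه": "h", "ة": "h", "و": "w", "ي": "y",
}
# Short-vowel marks: fatha, damma, kasra, and sukun (no vowel).
HARAKAT = {"\u064E": "a", "\u064F": "u", "\u0650": "i", "\u0652": ""}
SHADDA = "\u0651"  # marks a doubled consonant


def transliterate(text):
    """Map each letter and vowel mark to Latin; anything else passes through."""
    out = []
    last_letter = None  # index in `out` of the most recent Arabic letter
    for ch in text:
        if ch == SHADDA:
            # Double the last letter, whether the vowel mark came before or after.
            if last_letter is not None:
                out.insert(last_letter + 1, out[last_letter])
        elif ch in HARAKAT:
            out.append(HARAKAT[ch])
        elif ch in LETTERS:
            out.append(LETTERS[ch])
            last_letter = len(out) - 1
        else:
            out.append(ch)
            last_letter = None
    return "".join(out)


print(transliterate("ك\u064Fت\u064Fب"))  # with harakat: kutub
print(transliterate("كتب"))              # without harakat: ktb
```

Without the vowel marks the output collapses to bare consonants, which is the same failure as the "kl alktb" result in the first post.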

#4 June 18 2019

hishjwnz
Member

Re: Method to romanize Arabic texts

chosen2k wrote:

https://translate.google.com

Converts it to

kl alkutub almunzila
kli alrusul almursila
qaluu alhabu hawia alduwaa
wahuluu kla almas'ala

Wasn't aware that's an available option on Gtranslate, so thanks! It's a bit better, but still not readily usable without some extensive editing.

rolf wrote:

The problem is that, for example, كل can be kul or kil depending on the grammar, because in Arabic you have to guess the harakat (the short-vowel marks, which are usually not written), so you need something that is a bit aware of the grammar if you want to do it properly.

My guess is that your solution (if one does not already exist) will consist of 2 parts:
1. Analyze the text and add the harakat
2. Transliterate the text including the harakat
Then you should get easy-to-read output.

You're right. Too much work. It would probably be easier to just retype the songs myself.

#5 June 18 2019

rolf
Member

Re: Method to romanize Arabic texts

Well if you type "كل الكتب المنزلة" in google translate and hit the speaker button, it gets read out properly. So this functionality exists - the question is how to get access to it - which is your original question.
Sorry if I am not being very useful, I just thought I would give some insight as to why it is not so straightforward and easy to find, for what it is worth.

Last edited by rolf (June 18 2019)

#6 June 18 2019

Joe
Member

Re: Method to romanize Arabic texts

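A couple of years ago I wrote this script to do exactly this.
A couple of issues make it difficult to use:

    It's an emacs module.

    It transliterates to the Buckwalter transliteration, which is not the easiest to read.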

Buckwalter Arabic is actually pretty strict in how to romanize arabic text, so you avoid any ambiguity. If you work with it for long enough, your eyes will learn to read it pretty well, so the readability problem kinda goes away after a while.
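
For anyone curious what that looks like in practice, here is a minimal Python sketch of a one-to-one Buckwalter mapping. It is not the emacs module itself, just an illustration; the names `BUCKWALTER` and `to_buckwalter` are hypothetical, and only the common letters and vowel marks are covered.

```python
# Common part of the Buckwalter table: one ASCII character per Arabic character.
BUCKWALTER = {
    "ء": "'", "آ": "|", "أ": ">", "ؤ": "&", "إ": "<", "ئ": "}",
    "ا": "A", "ب": "b", "ة": "p", "ت": "t", "ث": "v", "ج": "j",
    "ح": "H", "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z",
    "س": "s", "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "E", "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "و": "w", "ى": "Y", "ي": "y",
    # Vowel marks: tanween (F, N, K), fatha, damma, kasra, shadda, sukun.
    "\u064B": "F", "\u064C": "N", "\u064D": "K",
    "\u064E": "a", "\u064F": "u", "\u0650": "i",
    "\u0651": "~", "\u0652": "o",
}
_TABLE = str.maketrans(BUCKWALTER)


def to_buckwalter(text):
    """Replace each mapped character; spaces and unknown characters are kept."""
    return text.translate(_TABLE)


print(to_buckwalter("كل الكتب المنزلة"))  # kl Alktb Almnzlp
```

Because every character maps to exactly one symbol, the result is unambiguous and reversible, but unvowelled input still produces unvowelled output.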

Now for the emacs portion of the script, you're going to have to make do with it. Or, if you're interested, I wouldn't mind porting the script to Python if that makes it easier to use.

Let me know.

#7 June 18 2019

hishjwnz
Member

Re: Method to romanize Arabic texts

rolf wrote:

Well if you type "كل الكتب المنزلة" in google translate and hit the speaker button, it gets read out properly. So this functionality exists - the question is how to get access to it - which is your original question.
Sorry if I am not being very useful, I just thought I would give some insight as to why it is not so straightforward and easy to find, for what it is worth.

Actually, the way it reads it is very far from what I'm aiming for, which is the typical Lebanese dialect (e.g. Kil el kutob l munzaleh). So my guess is that it's a lost cause: very difficult to achieve.

Joe wrote:

A couple of years ago I wrote this script to do exactly this.
Couple of issues that make it difficult to use:

    It's an emacs module.

    It transliterates to the Buckwalter transliteration, which is not the easiest to read.

Buckwalter Arabic is actually pretty strict in how to romanize arabic text, so you avoid any ambiguity. If you work with it for long enough, your eyes will learn to read it pretty well, so the readability problem kinda goes away after a while.

If I understood correctly, this script replaces Arabic letters with their counterparts from the Buckwalter transliteration table. However, I don't think plain replacement will solve the issue: I was hoping for a more accurate representation, since I won't be the only one reading the end result and new people might join every time (so readability will remain a problem).
Thanks for sharing it though. I'm sure others might find it useful in a different application.

#8 June 19 2019

rolf
Member

Re: Method to romanize Arabic texts

hishjwnz wrote:
rolf wrote:

Well if you type "كل الكتب المنزلة" in google translate and hit the speaker button, it gets read out properly. So this functionality exists - the question is how to get access to it - which is your original question.
Sorry if I am not being very useful, I just thought I would give some insight as to why it is not so straightforward and easy to find, for what it is worth.

Actually, the way it reads it is very far from what I'm aiming for, which is the typical Lebanese dialect (e.g. Kil el kutob l munzaleh). So my guess is that it's a lost cause: very difficult to achieve.

Yes, I would not bet on finding software that can understand the Lebanese dialect.
Unless you have huge volumes of text, I think it would be best to just do it manually.
If a sentence repeats itself many times, you can use search and replace.
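
If you'd rather script the search and replace, here is a rough Python sketch. It assumes you build the phrase table by hand; the names `PHRASES` and `replace_phrases` are made up for the example, and the single entry reuses the romanization given earlier in the thread.

```python
# Hand-made table: Arabic phrase -> the romanization you typed once yourself.
PHRASES = {
    "كل الكتب المنزلة": "Kil el kutob l munzaleh",
}


def replace_phrases(text, phrases):
    """Replace every occurrence of each known phrase; leave the rest untouched."""
    # Longest phrases first, so a short entry never clobbers part of a longer one.
    for arabic in sorted(phrases, key=len, reverse=True):
        text = text.replace(arabic, phrases[arabic])
    return text


lyrics = "كل الكتب المنزلة\nكل الرسل المرسلة\nكل الكتب المنزلة"
print(replace_phrases(lyrics, PHRASES))
```

Lines that are not in the table stay in Arabic, so you can see at a glance what still has to be typed by hand.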
