default charset and alt-charset

Davor Spasoski-2
Following up on our conversation regarding the encodings, I noticed one behaviour of Kannel that, in my opinion, conflicts with the alt-charset=utf8 directive.

The SMSC supports UTF-8 with DCS=0 and it all works fine as long as I'm using characters from the GSM 7-bit alphabet (URL-encoded in UTF-8). There is no need to set charset and dcs with smsbox.
But when I use a character beyond the GSM 7-bit alphabet, it is no longer automatically converted to UTF-8 by bearerbox but to "?", unless I set coding=2 and charset=utf-8.
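
To make the two request variants concrete, here is a minimal sketch of how a content provider might build the sendsms URL with and without coding=2 and charset=utf-8. The host, port, credentials and the helper name build_sendsms_url are assumptions for illustration only; only the 'text', 'coding' and 'charset' variables come from this thread.

```python
from urllib.parse import urlencode

# Hypothetical smsbox endpoint and credentials, for illustration only.
SENDSMS_URL = "http://localhost:13013/cgi-bin/sendsms"


def build_sendsms_url(to, text, force_unicode=False):
    """Build a sendsms GET URL; the text is URL-encoded as UTF-8."""
    params = {
        "username": "tester",  # assumed account name
        "password": "secret",  # assumed password
        "to": to,
        "text": text,
    }
    if force_unicode:
        # The workaround described above: mark the message as UCS-2
        # and declare that the submitted text is UTF-8.
        params["coding"] = "2"
        params["charset"] = "utf-8"
    return SENDSMS_URL + "?" + urlencode(params, encoding="utf-8")


# Plain GSM 7-bit text: no coding or charset needed.
plain_url = build_sendsms_url("1234567890", "Hello")

# Smart quotes are outside GSM 7-bit, so the sender has to opt in.
smart_url = build_sendsms_url("1234567890", "\u201cHello\u201d", force_unicode=True)
```

The point of the complaint is that the sender has to know in advance which of the two calls to make.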

What's the use of UTF-8 as the SMSC alphabet if the content provider still needs to watch the content of the message and set coding=2 and charset=utf-8 when necessary? Shouldn't Kannel in this case:
- assume all input is UTF-8 unless noted otherwise (dcs, charset)
- forward the text unchanged to the SMSC, while preserving DCS=0

Similarly, even for an SMSC whose default alphabet is GSM 7-bit, wouldn't it be better to:
- assume all input is UTF-8 unless noted otherwise with charset
- transcode to UCS-2 with DCS=8 if the input text (properly encoded in UTF-8) contains anything outside the GSM 7-bit alphabet.
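
The decision proposed in the list above can be sketched as follows. This is not Kannel code; it only illustrates the check a gateway or a content provider could run. The names detect_dcs and ucs2_payload are hypothetical, and the character table is the GSM 03.38 default alphabet plus its extension table typed out by hand.

```python
# GSM 03.38 default alphabet (without the ESC code itself).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reachable through the ESC extension table.
GSM7_EXTENSION = "^{}\\[~]|€\f"
GSM7_CHARS = frozenset(GSM7_BASIC + GSM7_EXTENSION)


def detect_dcs(text):
    """Return 0 if the text fits the GSM 7-bit alphabet, else 8 (UCS-2)."""
    if all(ch in GSM7_CHARS for ch in text):
        return 0
    return 8


def ucs2_payload(text):
    """Encode the text for DCS=8.

    Assumption: UTF-16BE is used as a stand-in for UCS-2; the two differ
    for characters outside the Basic Multilingual Plane.
    """
    return text.encode("utf-16-be")


message = "\u201cHello\u201d"  # smart quotes, not in GSM 7-bit
dcs = detect_dcs(message)
payload = ucs2_payload(message) if dcs == 8 else message
```

With a check like this the sender would not need to inspect the text; the price is that one stray character silently switches the message to UCS-2 and shortens the available length per SMS.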

If this is possible, please let me know how. If not, please consider adding this "auto-detect" option; I think it would be much more user-friendly for content providers. Otherwise, smart quotes or similar characters result in garbage on screen because the sender forgot to inspect the text and set coding=2 and charset=utf-8.

Davor Spasoski
VAS Manager /  Online and VAS Development

-----Original Message-----
From: Davor Spasoski
Sent: 07 April 2017 16:58
To: 'Stipe Tolj' <[hidden email]>
Cc: [hidden email]
Subject: RE: Encodings

Hi Stipe,

Thank you for your reply. I apologize for cross-posting. This is really useful information.
I made some tests with a few versions from 1.4.4 to SVN and the behaviour is consistent. Kill me if I'm wrong, but I remember that long ago, with older versions and browsers, I was able to URL-encode the GSM characters with their hex values and get them properly on the handset, usually using alt-dcs=1 on our Comverse SMSC. My mistake in my tests is that I was doing the same now, but the browser

The revelation for me is the alt-charset setting, which was not clear to me from the user guide.

One more question: alt-addr-charset is there to prevent the PDU from breaking if 0x00 is in the address. But how come 0x00 in the short_message does not break it with GSM 7-bit?

Thanks a lot again!

Davor Spasoski

-----Original Message-----
From: devel [mailto:[hidden email]] On Behalf Of Stipe Tolj
Sent: 06 April 2017 11:18
Cc: [hidden email]
Subject: Re: Encodings

Am 02.04.17 21:57, schrieb Davor Spasoski:
> Dear kannel users&developers,

Hi Davor,

please don't cross-post to several mailing lists; we consider this spamming.

Your question is more related to internals, so devel@ should be the right place to ask.

> Can someone give precise information what happens encoding wise from
> smsbox to SMSC. I understand that as of 1.4.1:
> Smsbox is expecting utf-8 by default

Correct, the sendsms HTTP interface assumes UTF-8 encoded input (unless otherwise indicated via the 'coding' and 'charset' HTTP GET variables).

> Communication smsbox <-> bearerbox is only via utf-8

IF the message is considered to be textual (coding=0), yes, UTF-8 is the internal encoding.

IF coding=1 is indicated then it's raw byte stream, with no encoding implicated.

IF coding=2 then the internal encoding will be left as UCS-2.

> Bearerbox <-> SMSC is supposed to be ISO-8859-1

Nope, that's latin1. Depending on the SMSC type, different upstream encodings are used as the default.

For example, for SMPP the default encoding (aka data coding scheme, DCS 0x00) is GSM 03.38.

> But then we have alt-dcs and alt-addr-charset that are supposed to
> enable GSM-7 alphabet between SMSC and bearerbox, but although
> documented, they both don’t seem to work from 1.4.2 onwards. There is
> a slight difference when I add alt-charset=GSM, but it certainly is
> not sending GSM. (I get a lot of question marks until I get to 0x28
> character)

The config 'alt-charset' in the SMPP config groups defines which default alphabet the SMSC assumes for its DCS 0x00 encoding.

Keep in mind that 'alt-charset' relies on the iconv() library, which does NOT include GSM 03.38. So there is no 'alt-charset' value that selects GSM 03.38, and none is required since it is the default. The directive can only switch DCS 0x00 to one of the other encodings.
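
To illustrate why the assumed default alphabet matters, the rough sketch below shows the bytes the same character would occupy in a DCS 0x00 payload under three alphabets. The function name dcs0_bytes is hypothetical, and the GSM 03.38 mapping is a hand-typed excerpt of the default alphabet (only a few code points), not a complete codec.

```python
# Hand-typed excerpt of the GSM 03.38 default alphabet, for illustration.
GSM0338_EXCERPT = {"@": 0x00, "£": 0x01, "$": 0x02, "è": 0x04, "é": 0x05}


def dcs0_bytes(text, alphabet):
    """Return the bytes for `text` under the alphabet the SMSC assumes."""
    if alphabet == "GSM":
        # Unpacked GSM 03.38 values, one septet per byte.
        return bytes(GSM0338_EXCERPT[ch] for ch in text)
    # Everything else goes through a regular charset conversion.
    return text.encode(alphabet)


gsm = dcs0_bytes("é", "GSM")         # one byte, value 0x05
latin1 = dcs0_bytes("é", "latin-1")  # one byte, value 0xE9
utf8 = dcs0_bytes("é", "utf-8")      # two bytes, 0xC3 0xA9
at_sign = dcs0_bytes("@", "GSM")     # 0x00 is a printable character here
```

If the gateway and the SMSC disagree on which of these applies to DCS 0x00, the handset shows the wrong character or a "?".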

> What if I have specific SMSC that is using GSM-7 or even something
> more weird like Escaped ISO-8859-1 that combines ISO and GSM 7-bit.
> Is SMSC – bearerbox in UTF-8 possible?

yes, 'alt-charset = UTF-8' would simply send the payload as UTF-8 encoded text. AFAIR, the HTTP SMSC types do this.

Best Regards,
Stipe Tolj

Düsseldorf, NRW, Germany

Kannel Foundation        system architecture  

[hidden email]                  [hidden email]

