[PATCH] alt-charset handling in HTTP SMSC module

Stipe Tolj
Hi all,

here is a small issue that I resolved some days ago for a client that uses the
HTTP SMSC towards their own HTTP API (via the generic type).

In the abstraction-layer call httpsmsc_send() we handle the conversion to an
alternative character encoding, based on the value of 'alt-charset' in the
corresponding 'group = smsc' context. So far so good.

The point is: the function ASSUMES that all MTs carry our internal encoding
(UTF-8) in the msg->sms.msgdata payload. This is NOT the case if the smsbox
connection passed coding=2; then we have msg->sms.coding = 2, indicating that
the msgdata is UCS-2 and NOT UTF-8. That's why we need to handle both cases
here. The patch does this, and also resets msg->sms.coding to DC_UNDEF so that
the specific API functions don't signal a wrongly assumed encoding.

Please review and vote for commitment, should be pretty obvious.
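
As a rough illustration of the two cases described above (this is not the attached patch, which is C code in gw/smsc/smsc_http.c), here is a minimal Python sketch. The function name, the DC_UNDEF value of -1, and the use of Python's utf-16-be codec as a stand-in for UCS-2 are assumptions made for the example; the coding values 0, 1 and 2 are the ones used in this thread.

```python
# Coding values as used in this thread; DC_UNDEF = -1 is an assumption
# made only for this sketch.
DC_UNDEF = -1
DC_7BIT = 0   # msgdata holds UTF-8 text
DC_8BIT = 1   # msgdata holds binary data
DC_UCS2 = 2   # msgdata holds UCS-2 text


def reencode_for_alt_charset(msgdata: bytes, coding: int, alt_charset: str):
    """Return (payload, coding) re-encoded to alt_charset where that makes sense.

    Hypothetical helper: it picks the source encoding from the coding field
    instead of assuming UTF-8, and leaves binary payloads untouched.
    """
    if coding == DC_7BIT:
        source_charset = "utf-8"
    elif coding == DC_UCS2:
        # UCS-2 is treated as big-endian UTF-16 here (assumption).
        source_charset = "utf-16-be"
    else:
        # Binary data (or anything unknown) is passed through as is.
        return msgdata, coding

    text = msgdata.decode(source_charset)
    # After re-encoding, the old coding value no longer describes the payload.
    return text.encode(alt_charset), DC_UNDEF


if __name__ == "__main__":
    for coding, payload in ((DC_7BIT, "Köln".encode("utf-8")),
                            (DC_UCS2, "Köln".encode("utf-16-be"))):
        print(coding, reencode_for_alt_charset(payload, coding, "iso-8859-1"))
```

With both inputs the upstream would see the same ISO-8859-1 bytes, which is the behaviour the patch aims for.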

Stipe

--
-------------------------------------------------------------------
Kölner Landstrasse 419
40589 Düsseldorf, NRW, Germany

tolj.org system architecture      Kannel Software Foundation (KSF)
http://www.tolj.org/              http://www.kannel.org/

mailto:st_{at}_tolj.org           mailto:stolj_{at}_kannel.org
-------------------------------------------------------------------

Attachment: smsc_http-alt-charset.diff (2K)

Re: [PATCH] alt-charset handling in HTTP SMSC module

amalysh
Hi Stipe,

as far as I know, we have always handled coding=2 as binary data and never decoded or re-encoded it. Why do you want to do that now?
IMHO you have to check for coding == DC_7BIT and, if that is not the case, send the payload as is without any recoding.

Thanks,
Alexander Malysh


Re: [PATCH] alt-charset handling in HTTP SMSC module

Stipe Tolj
On 09.03.2011 00:45, Stipe Tolj wrote:

> Please review and vote for commitment, should be pretty obvious.
committed to svn trunk:

2011-03-11  Stipe Tolj  <stolj at kannel.org>
    * gw/smsc/smsc_http.c: ensure we handle 'alt-charset' correctly, in case
      we get DC_UCS2 coding in the payload.
      [Message-Id: <[hidden email]>]

Stipe

Re: [PATCH] alt-charset handling in HTTP SMSC module

Stipe Tolj
On 11.03.2011 18:45, Stipe Tolj wrote:

> committed to svn trunk:
>
> 2011-03-11  Stipe Tolj  <stolj at kannel.org>
>     * gw/smsc/smsc_http.c: ensure we handle 'alt-charset' correctly, in case
>       we get DC_UCS2 coding in the payload.
>       [Message-Id: <[hidden email]>]

Hi list,

I have reverted this patch due to Alex's veto. Alex contends that we should NOT
re-encode if coding=[1|2], meaning only msg payloads with coding=0 should be
re-encoded.

I don't actually see that. Looking into the gw/smsbox.c code, we see that there
are 3 ways a msg struct can be passed to bearerbox:

a) msg->sms.coding == 0 (aka DC_7BIT), .msgdata is UTF-8 encoded
b) msg->sms.coding == 1 (aka DC_8BIT), .msgdata has binary data
c) msg->sms.coding == 2 (aka DC_UCS2), .msgdata is UCS-2 encoded
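
To make cases a) and c) concrete, here is a small Python sketch that shows the same text as it would sit in .msgdata under each coding, and what happens when the UCS-2 bytes are read as if they were UTF-8. The helper names are made up for the example, and Python's utf-16-be codec is used as a stand-in for UCS-2 (an assumption).

```python
text = "Köln"

# Case a): coding == 0, payload is UTF-8.
utf8_payload = text.encode("utf-8")
# Case c): coding == 2, payload is UCS-2 (big-endian UTF-16 assumed here).
ucs2_payload = text.encode("utf-16-be")


def hexdump(payload: bytes) -> str:
    """Render a payload as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in payload)


def read_as_utf8(payload: bytes):
    """Decode a payload as UTF-8, returning None if it is not valid UTF-8."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print("coding=0:", hexdump(utf8_payload))
    print("coding=2:", hexdump(ucs2_payload))
    print("coding=2 read as UTF-8:", read_as_utf8(ucs2_payload))
```

The two byte sequences differ for the same text, so code that always assumes UTF-8 cannot handle a coding=2 payload correctly.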

OK, let's assume this call to the sendsms HTTP interface:

  http://...&coding=2&text=<url-encoded UCS data>

This is a legal way to inject an MT message, and it results in a msg passed to
bearerbox that is NOT re-encoded at this stage.

Now, if this hits smsc_http and we have an 'alt-charset' set, which means
the user wants a re-encoding to a specific charset, then the OLD code in the
smsc_http module won't work, because it treats the UCS-2 payload as UTF-8.

AFAIK, Alex argues that anything coming in with coding=2 should be left untouched.
Well, this then ASSUMES that a UCS-2 payload can ONLY be injected this way:

  http://...&coding=0&text=<url-encoded UCS data>&charset=UCS2

to ensure smsbox re-encodes the UCS2 data to UTF-8 internally.
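
For illustration, the query-string part of the two request forms can be built like this in Python. Only the coding, text and charset parameters from the URLs above are shown; the host and the other parameters are elided in this mail and stay elided here. The sample text and the utf-16-be codec as a stand-in for UCS-2 are assumptions.

```python
from urllib.parse import urlencode

text = "Köln"
ucs2_bytes = text.encode("utf-16-be")  # UCS-2 approximated as UTF-16BE (assumption)

# Form 1: coding=2, the text parameter already carries UCS-2 bytes.
query_coding2 = urlencode({
    "coding": 2,
    "text": ucs2_bytes,
})

# Form 2: coding=0 plus charset=UCS2, asking smsbox to convert to UTF-8 internally.
query_coding0 = urlencode({
    "coding": 0,
    "text": ucs2_bytes,
    "charset": "UCS2",
})

if __name__ == "__main__":
    print(query_coding2)
    print(query_coding0)
```

Both forms carry identical bytes in the text parameter; they differ only in what they tell smsbox to do with them.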

If so, why the heck do we expose coding=2 at the sendsms HTTP interface at all?

Comments please.

Stipe

Re: [PATCH] alt-charset handling in HTTP SMSC module

Stipe Tolj
On 12.03.2011 16:52, Stipe Tolj wrote:

> Comments please.
OK, digging a bit more, I reviewed how our smsc_smpp code does things in this
regard.

On the MT side, in msg_to_pdu(), a payload with coding == DC_UCS2 is not
re-encoded in any case, meaning we send the UCS-2 payload in the .short_message
field.

On the MO side, in pdu_to_msg(), we catch data_coding == 0x08 (UCS-2) in a case
statement and don't re-encode either. We set coding = DC_UCS2 there.

Ergo: IF we expect the MT user to pass a UCS-2 message the way I mentioned
above, so that it is re-encoded to UTF-8 internally, then we MUST assume the
same for the MO side, which we don't do.

That's why I wanted to handle coding == DC_UCS2 in smsc_http, to be able to
re-encode that too for an alt-charset.
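
The pass-through behaviour on the SMPP side can be sketched roughly like this in Python. This is not the smsc_smpp code: the function names and the dict standing in for a PDU are hypothetical, only the UCS-2 case is covered, and the MT side setting data_coding to 0x08 is an assumption made to keep the round trip symmetric.

```python
DC_UCS2 = 2           # internal coding value used in this thread
SMPP_DC_UCS2 = 0x08   # SMPP data_coding value for UCS-2, as mentioned above


def mt_to_pdu_fields(msgdata: bytes, coding: int) -> dict:
    """MT direction: copy a UCS-2 payload into short_message unchanged."""
    if coding != DC_UCS2:
        raise NotImplementedError("sketch covers only the UCS-2 case")
    return {"data_coding": SMPP_DC_UCS2, "short_message": msgdata}


def mo_from_pdu_fields(pdu: dict):
    """MO direction: keep the UCS-2 payload as is and mark it as DC_UCS2."""
    if pdu["data_coding"] != SMPP_DC_UCS2:
        raise NotImplementedError("sketch covers only the UCS-2 case")
    return pdu["short_message"], DC_UCS2


if __name__ == "__main__":
    payload = "Köln".encode("utf-16-be")  # UCS-2 approximated as UTF-16BE
    pdu = mt_to_pdu_fields(payload, DC_UCS2)
    print(pdu)
    print(mo_from_pdu_fields(pdu))
```

In both directions the bytes are never converted to or from UTF-8, which is the asymmetry with the coding=0 path that I am pointing at.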

Stipe

RE: [PATCH] alt-charset handling in HTTP SMSC module

Rene Kluwen
According to what you just wrote, UCS-2 data doesn't get re-encoded in either
direction in the smsc_smpp case.
What's the problem?

By the same token, in the HTTP case I think re-encoding can be left up to
the sender, just like in the SMPP case.

== Rene


Re: [PATCH] alt-charset handling in HTTP SMSC module

amalysh
100% agree with Rene...

Alex

RE: [PATCH] alt-charset handling in HTTP SMSC module

Rene Kluwen
I think the feature in itself ("an sich", for those who speak German) is nice to
have.
But in that case, all SMSC drivers should handle it in the same way.

== Rene

Re: [PATCH] alt-charset handling in HTTP SMSC module

amalysh
Hi,

I don't think there is any way to handle this uniformly, except by not touching
UCS2 and binary data. There are even some HTTP interfaces where you have to send UCS2
exactly as is and must not re-encode anything.

Thanks,
Alexander Malysh

Re: [PATCH] alt-charset handling in HTTP SMSC module

Stipe Tolj-2
On 09.03.2011 00:45, Stipe Tolj wrote:

> Please review and vote for commitment, should be pretty obvious.

I have reconsidered this patchset, as we came across the problem one more time.

We NEED a way (by setting the alt-charset) to ensure that we CAN define
a single encoding towards the upstream. Otherwise any HTTP API that has
no way to indicate the encoding gets UTF-8 (for coding=0) and UCS-2
(for coding=2) as payload, which messes things up semantically.

Stipe

--
Best Regards,
Stipe Tolj

-------------------------------------------------------------------
Düsseldorf, NRW, Germany

Kannel Foundation                 tolj.org system architecture
http://www.kannel.org/            http://www.tolj.org/

stolj at kannel.org               st at tolj.org
-------------------------------------------------------------------
