SQLite and pike strings in bindings

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

SQLite and pike strings in bindings

Marc Dirix-2
Is there a reason that pike strings are converted to "blob" when used in
bindings for an SQLite query?
It seems k->val.u.string->size_shift always resolves to being 0,
although I'm not sure where size_shift stands for, I've deduced from
modules/_Roxen_roxen.c, line 318, that it should stand for 8-bit
strings, correct?

But if it's an 8-bit string, why doesn't it get converted to TEXT?
What does size_shift=0,1 or 2 mean?

 > object db = Sql.Sql("sqlite://test.db");
 > db->query("create table abc (bla TEXT);");
(1) Result: 0
 > db->query("insert into abc (bla) values (:bar);",([":bar":"bar"]));
(2) Result: 0
 > db->query("select typeof(bla) from abc;");
(3) Result: ({ /* 1 element */
                 ([ /* 1 element */
                   "typeof(bla)": "blob"
                 ])
             })

I've found in sqlite.cmod:

       case T_STRING:
       {
         struct pike_string *s = k->val.u.string;
         switch(s->size_shift) {
         case 0:
           ERR( sqlite3_bind_blob(stmt, idx, s->str, s->len,
                                  SQLITE_STATIC),
                db);
           break;
         case 1:
         case 2:
           ref_push_string(s);
           f_string_to_utf8(1);
           s = Pike_sp[-1].u.string;
           ERR( sqlite3_bind_text(stmt, idx, s->str, s->len,
                                  SQLITE_TRANSIENT),
                db);
           pop_stack();
           break;

/Marc


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

SQLite and pike strings in bindings

Leif Stensson, Lysator @ Pike  importmöte för mailinglistan
Can't tell you why it's done the way it is, but I can tell you what
the shift is for.

The shift indicates how much room each character in a string
uses. Thus shift 0 => 1 byte/character, shift 1 => 2 bytes, shift 3 =>
4 bytes/character.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Marc Dirix-2
In reply to this post by Marc Dirix-2

Op 18-04-12 19:30, [hidden email] schreef:
> The shift indicates how much room each character in a string
> uses. Thus shift 0 =>  1 byte/character, shift 1 =>  2 bytes, shift 3 =>
> 4 bytes/character.
I suspect that the original implementor has chosen BLOB as an easy way
to store both binary data as well as strings contained in pike-strings.
But forgot to consider the mixed results one later get's when using
selected with in-query strings.

On the SQLite website [1] TEXT is defined as being string which is
encodable in UTF-8, UTF-16BE or UTF-16LE.
In my understanding 1 byte/character should be encodable in those three,
so I propose changing this to always use
TEXT instead of blob when the pike string is nonbinary (8bit).

The Mysql module has the functions listed below, are they enough to
determine if a pike string contains binary data ?

If these checks are valid dectection I think they should be reused the
SQLite module.


#define CHECK_8BIT_NONBINARY_STRING(FUNC, ARG) do {                     \
     if (sp[ARG-1-args].type != T_STRING ||                              \
         sp[ARG-1-args].u.string->size_shift ||                          \
         string_has_null(sp[ARG-1-args].u.string))                       \
       SIMPLE_BAD_ARG_ERROR (FUNC, ARG, "string (nonbinary 8bit)");      \
   } while (0)

#define CHECK_8BIT_STRING(FUNC, ARG) do {                               \
     if (sp[ARG-1-args].type != T_STRING ||                              \
         sp[ARG-1-args].u.string->size_shift)                            \
       SIMPLE_BAD_ARG_ERROR (FUNC, ARG, "string (8bit)");        \
   } while (0)

/Marc


Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

H. William Welliver III
I've run into similar problems; I have a note typed up at home that
discusses this (and bindings in general). I'll try to forward those
thoughts on when I get home.

I think in general, that it would be nice to have some examples of how
bindings are supposed to work, as I found some odd behavior that was
unexpected (for me, at least).

Bill

On Thu, 19 Apr 2012, Marc Dirix wrote:

>
> Op 18-04-12 19:30, [hidden email] schreef:
>> The shift indicates how much room each character in a string
>> uses. Thus shift 0 =>  1 byte/character, shift 1 =>  2 bytes, shift 3 =>
>> 4 bytes/character.
> I suspect that the original implementor has chosen BLOB as an easy way to
> store both binary data as well as strings contained in pike-strings. But
> forgot to consider the mixed results one later get's when using selected with
> in-query strings.
>
> On the SQLite website [1] TEXT is defined as being string which is encodable
> in UTF-8, UTF-16BE or UTF-16LE.
> In my understanding 1 byte/character should be encodable in those three, so I
> propose changing this to always use
> TEXT instead of blob when the pike string is nonbinary (8bit).
>
> The Mysql module has the functions listed below, are they enough to determine
> if a pike string contains binary data ?
>
> If these checks are valid dectection I think they should be reused the SQLite
> module.
>
>
> #define CHECK_8BIT_NONBINARY_STRING(FUNC, ARG) do {                     \
>    if (sp[ARG-1-args].type != T_STRING ||                              \
>        sp[ARG-1-args].u.string->size_shift ||                          \
>        string_has_null(sp[ARG-1-args].u.string))                       \
>      SIMPLE_BAD_ARG_ERROR (FUNC, ARG, "string (nonbinary 8bit)");      \
>  } while (0)
>
> #define CHECK_8BIT_STRING(FUNC, ARG) do {                               \
>    if (sp[ARG-1-args].type != T_STRING ||                              \
>        sp[ARG-1-args].u.string->size_shift)                            \
>      SIMPLE_BAD_ARG_ERROR (FUNC, ARG, "string (8bit)");        \
>  } while (0)
>
> /Marc
>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Stephen R. van den Berg
In reply to this post by Marc Dirix-2
Marc Dirix wrote:
>The Mysql module has the functions listed below, are they enough to
>determine if a pike string contains binary data ?

I'm wondering, why don't I run into this problem in the pgsql driver?
Does the SQLite driver do something fundamentally different with respect
to handling data than PostgreSQL?
--
Stephen.

Safe sex is in your hands.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Marc Dirix-2
> I'm wondering, why don't I run into this problem in the pgsql driver?
> Does the SQLite driver do something fundamentally different with respect
> to handling data than PostgreSQL?

PostgreSQL datatypes are determined during table creation, and fixed
thereafter.
So if you define a column as TEXT, or VARCHAR(n) you can only insert
these values having these types (or a typeconversion takes place).
I don't now the ins and outs of the PSQL module, but it probably lets
the database decide wether an string is data (BYTEA) or not

In SQLite this is not fixed. If you define a column to be TEXT you can
just as well insert other datatypes, it simple changes
the datatype for the record. The SQLite glue code in the module defines
whether a string is data or not, using sqlite3_bind_blob and
sqlit3_bind_text functions.

The main problem I find is the mixed behaviour with inserted string. If
my string is 1-byte, the SQLite module inserts it as "BLOB"
in the SQLite database and therefor I can't select it using a prepared
SELECT, (e.g. "SELECT * from abc WHERE bla='foo'"), whereas for 2 or
4-byte strings this works.

Using bindings for both SELECT and INSERT works as expected.

/Marc

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Stephen R. van den Berg
Marc Dirix wrote:
>The main problem I find is the mixed behaviour with inserted string.
>If my string is 1-byte, the SQLite module inserts it as "BLOB"
>in the SQLite database and therefor I can't select it using a
>prepared SELECT, (e.g. "SELECT * from abc WHERE bla='foo'"), whereas
>for 2 or 4-byte strings this works.

What would be the (if any) proper syntax/quoting to make the above SELECT
work with the way the data is in the current DB?
--
Stephen.

Safe sex is in your hands.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Marc Dirix-2
> What would be the (if any) proper syntax/quoting to make the above SELECT
> work with the way the data is in the current DB?

I think none in this case. Because the column is defined as TEXT, the
column has TEXT affinity, meaning that the
column can store NULL, TEXT or BLOB. However if given "unknown" type (in
SQL usually quoted with ''), or integer etc, they are automatically
converted to the affined type: TEXT. The conversion takes place before
the comparison.

A cast will work:
"SELECT * from abc WHERE bla=CAST('foo' AS BLOB);"

/Marc

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Henrik Grubbström-2
In reply to this post by Marc Dirix-2
On Thu, 19 Apr 2012, Marc Dirix wrote:

> Op 18-04-12 19:30, [hidden email] schreef:
>> The shift indicates how much room each character in a string
>> uses. Thus shift 0 =>  1 byte/character, shift 1 =>  2 bytes, shift 3 =>
>> 4 bytes/character.
> I suspect that the original implementor has chosen BLOB as an easy way to
> store both binary data as well as strings contained in pike-strings. But
> forgot to consider the mixed results one later get's when using selected with
> in-query strings.
>
> On the SQLite website [1] TEXT is defined as being string which is encodable
> in UTF-8, UTF-16BE or UTF-16LE.
> In my understanding 1 byte/character should be encodable in those three, so I
> propose changing this to always use
> TEXT instead of blob when the pike string is nonbinary (8bit).
If I remember correctly, the Oracle glue uses strings wrapped in multisets
to indicate BLOBS in bindings. This should be doable in the SQLite case as
well.

--
Henrik Grubbström [hidden email]
Roxen Internet Software AB
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

H. William Welliver III-2
This old conversation got re-opened recently…

I’ve committed a fix for the SQLite binding in 8.1 and 8.0 that stores string values as unicode strings. This should cause the binding to store data in a way that works properly with bound and direct queries. I believe the commit to 8.0 was warranted because any reasonable attempt to use the SQLite bindings would have failed without using extremely contorted logic.

Additionally, I’ve added support for using single-value multisets to store values as binary (BLOB) values, as found in the Oracle glue. I haven’t committed tests, but will do so in the next day or two.

Please let me know if there are any questions or concerns about this change.

Bill

> On May 4, 2012, at 11:11 AM, Henrik Grubbström <[hidden email]> wrote:
>
> On Thu, 19 Apr 2012, Marc Dirix wrote:
>
>> Op 18-04-12 19:30, [hidden email] schreef:
>>> The shift indicates how much room each character in a string
>>> uses. Thus shift 0 =>  1 byte/character, shift 1 =>  2 bytes, shift 3 =>
>>> 4 bytes/character.
>> I suspect that the original implementor has chosen BLOB as an easy way to store both binary data as well as strings contained in pike-strings. But forgot to consider the mixed results one later get's when using selected with in-query strings.
>>
>> On the SQLite website [1] TEXT is defined as being string which is encodable in UTF-8, UTF-16BE or UTF-16LE.
>> In my understanding 1 byte/character should be encodable in those three, so I propose changing this to always use
>> TEXT instead of blob when the pike string is nonbinary (8bit).
>
> If I remember correctly, the Oracle glue uses strings wrapped in multisets to indicate BLOBS in bindings. This should be doable in the SQLite case as well.
>
> --
> Henrik Grubbström [hidden email]
> Roxen Internet Software AB

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: SQLite and pike strings in bindings

Stephen R. van den Berg
H. William Welliver III wrote:
>Additionally, I???ve added support for using single-value multisets to store values as binary (BLOB) values, as found in the Oracle glue.

I just now added support for this to the pgsql driver as well, including support
to use Stdio.Buffer() objects without converting them to strings first.
--
Stephen.

Loading...