make_static_string()

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

make_static_string()

Stephen R. van den Berg
Is it allowed/supported to put a string created with make_static_string()
into an svalue and then give it to say the write() method of an arbitrary
Stdio object?
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Arne Goedeke
No that is not allowed. make_static_string() is an internal API to
create string with static storage (e.g. compile time constant strings).
If you want to pass this string to pike code you need to use
make_shared_static_string() which creates a "normal" Pike string.

What use-case do you have?

Arne

On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:

> Is it allowed/supported to put a string created with make_static_string()
> into an svalue and then give it to say the write() method of an arbitrary
> Stdio object?
> --
> Stephen.
>
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Stephen R. van den Berg
>No that is not allowed. make_static_string() is an internal API to
>create string with static storage (e.g. compile time constant strings).
>If you want to pass this string to pike code you need to use
>make_shared_static_string() which creates a "normal" Pike string.

>What use-case do you have?

Simple, actually.  I'm doing some Shuffler fixing again, and noticed
that I have to create very temporary strings out of the various
source objects and then pass them to a write() method.
Those strings are not hashed in most cases and it would be a waste to
make them a shared string only to remove them again from the shared
string pool a few microseconds later.

I.e. this is the code I am contemplating:

        struct pike_string *s =
         make_static_string(iov[i].iov_base, iov[i].iov_len, 0);
        push_string(s);
        apply (t->file_obj, "write", 1);
        pop_stack();

It seems to be a valid use case, since I provide the iov array
and the source of the strings.  The source will stay put while
the write method runs, and I'll take care of freeing the source
myself after the pop_stack().
So, yes, the only thing I need is some window dressing so that the string
can be accessed from the Pike stack, and supposedly can be freed there, which
should not touch the strings itself.
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Tobias S. Josefowitz
On Mon, Mar 15, 2021 at 5:05 PM Stephen R. van den Berg <[hidden email]> wrote:
>
> It seems to be a valid use case, since I provide the iov array
> and the source of the strings.  The source will stay put while
> the write method runs, and I'll take care of freeing the source
> myself after the pop_stack().
> So, yes, the only thing I need is some window dressing so that the string
> can be accessed from the Pike stack, and supposedly can be freed there, which
> should not touch the strings itself.

This would need so much more than just window-dressing. You're asking
for trouble. The GC might decide to run, I wouldn't necessarily expect
that to work out - but then maybe it does, another thread might
request a backtrace of your thread and non-hashed strings might leak
into the system like that, or your write call might fail (say OOM),
leaving your very own thread with a backtrace leaking the non-hashed
string into the system.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Stephen R. van den Berg
Tobias S. Josefowitz wrote:
>for trouble. The GC might decide to run, I wouldn't necessarily expect
>that to work out - but then maybe it does, another thread might

I'd think that wouldn't be a problem, since all refs are accounted for.

>request a backtrace of your thread and non-hashed strings might leak

I'm not quite sure what happens if someone copies that string for
a backtrace.  For this to work, the string should then be copied into
a new shared string.

>into the system like that, or your write call might fail (say OOM),
>leaving your very own thread with a backtrace leaking the non-hashed
>string into the system.

That has been taken care of.  My non-hashed strings are accounted for and
will be cleaned up properly, even in the midst of exceptions.
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Tobias S. Josefowitz
On Mon, Mar 15, 2021 at 5:26 PM Stephen R. van den Berg <[hidden email]> wrote:
>
> I'd think that wouldn't be a problem, since all refs are accounted for.

I'd love to share your optimism, but the GC also detects leaks and
such stuffs, calls Pike level code (destroy()/_destruct()) which would
all sit on top of your write() in a backtrace requested by those
LFUNs).

>
> >request a backtrace of your thread and non-hashed strings might leak
>
> I'm not quite sure what happens if someone copies that string for
> a backtrace.  For this to work, the string should then be copied into
> a new shared string.

Yeah except absolutely no part of Pike or user code is aware of this
requirement, and so naturally nothing does it, and that's why your
non-hashed string would leak into the system this way and cause all
sorts of mayhem later.

>
> >into the system like that, or your write call might fail (say OOM),
> >leaving your very own thread with a backtrace leaking the non-hashed
> >string into the system.
>
> That has been taken care of.  My non-hashed strings are accounted for and
> will be cleaned up properly, even in the midst of exceptions.

Nothing is taken care of, they will leak to whoever catches the
exception and cause all sorts of mayhem as Pike is not equipped to
encounter non-hashed+non-linked strings in the wild in arbitrary,
pike-code-reachable svalues.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Tobias S. Josefowitz
On Mon, Mar 15, 2021 at 5:31 PM Tobias S. Josefowitz
<[hidden email]> wrote:
>
> Nothing is taken care of, they will leak to whoever catches the
> exception and cause all sorts of mayhem as Pike is not equipped to
> encounter non-hashed+non-linked strings in the wild in arbitrary,
> pike-code-reachable svalues.

Actually, maybe you're the one catching. In which case, nevermind, it
would then indeed be taken care of in that very case, given that you
make sure to not "leak" the original error/backtrace.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Arne Goedeke
In reply to this post by Stephen R. van den Berg
Maybe a better strategy would be to introduce a new type of buffer
object (supported by the get_memory_object_* APIs) which can be used
to (cheaply) carry some arbitrary binary data. That is in principle
already supported by Stdio.File()->write().

You could also use Stdio.Buffer() for that today, but you would want to
make sure to keep the buffer objects for longer than just one read and
write pair.

Arne



On Mon, 15 Mar 2021, Stephen R. van den Berg wrote:

> Tobias S. Josefowitz wrote:
>> for trouble. The GC might decide to run, I wouldn't necessarily expect
>> that to work out - but then maybe it does, another thread might
>
> I'd think that wouldn't be a problem, since all refs are accounted for.
>
>> request a backtrace of your thread and non-hashed strings might leak
>
> I'm not quite sure what happens if someone copies that string for
> a backtrace.  For this to work, the string should then be copied into
> a new shared string.
>
>> into the system like that, or your write call might fail (say OOM),
>> leaving your very own thread with a backtrace leaking the non-hashed
>> string into the system.
>
> That has been taken care of.  My non-hashed strings are accounted for and
> will be cleaned up properly, even in the midst of exceptions.
> --
> Stephen.
>
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Stephen R. van den Berg
Arne Goedeke wrote:
>Maybe a better strategy would be to introduce a new type of buffer
>object (supported by the get_memory_object_* APIs) which can be used
>to (cheaply) carry some arbitrary binary data. That is in principle
>already supported by Stdio.File()->write().

Well, three things come to mind:

a. The Shuffler was broken so much that any practical use of it was
   impossible until I fixed it in 8.1 around a year ago, so that means
   that any backward compatibility with its API could be disregarded.

b. Using a Stdio.Buffer or memory object with the write method
   on the shuffler target will break the shuffler on targets that
   only support writing strings.

c. I don't think there is a clever way to detect if the target
   supports Stdio.Buffer/memory objects without triggering an
   exception (which probably is too much of a performance hit).
   So it would require changing the API to require support
   to write Stdio.Buffer/memory objects for shuffler targets.

Thoughts?
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Martin Nilsson (Coppermist) @ Pike (-) developers forum
Rather than change the API you could extend it to allow the capability
to be queried more cheaply perhaps?  If the target does not implement
the extended API you'd just fall back to using strings.
Reply | Threaded
Open this post in threaded view
|

Re: make_static_string()

Stephen R. van den Berg
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
>Rather than change the API you could extend it to allow the capability
>to be queried more cheaply perhaps?  If the target does not implement
>the extended API you'd just fall back to using strings.

Well, refactoring shuffler code is complicated enough as it is.

So, I took the easy way out, and simply reverted to plain shared strings again
for the degenerate case.
The highest gain for the shuffler is by ending up on as-few-as-possible
writev() calls anyway, so if it has to fallback to the write method
of an object, using the shuffler is questionable in the first place.
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Stdio.Buffer()->__set_on_write() broken

Arne Goedeke
In reply to this post by Stephen R. van den Berg
The Stdio.Buffer output callback is required even for buffers which are
not malloced. Now, the following test case

Stdio.Buffer b = Stdio.Buffer("foo");
b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") });
b->add("bar");

does not work anymore. I believe this is requird by the Stdio.File buffer
mode.

You also removed the __output variable which means that the GC cannot
discover cycles anymore.

I think it would be best to simply revert that change.

Arne
Reply | Threaded
Open this post in threaded view
|

Re: Stdio.Buffer()->__set_on_write() broken

Stephen R. van den Berg
Arne Goedeke wrote:
>The Stdio.Buffer output callback is required even for buffers which are
>not malloced. Now, the following test case

>Stdio.Buffer b = Stdio.Buffer("foo");
>b->__set_on_write(lambda(mixed ... args) { werror("New Data added.\n") });
>b->add("bar");

>does not work anymore. I believe this is requird by the Stdio.File buffer
>mode.

Indeed.  Forgot about this one.

>You also removed the __output variable which means that the GC cannot
>discover cycles anymore.

Can you (or anyone) explain how the GC uses this variable to detect
those?
I'm guessing that the GC only works with svalues, not with plain objects?
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: Stdio.Buffer()->__set_on_write() broken

Tobias S. Josefowitz
On Sat, Mar 20, 2021 at 7:40 PM Stephen R. van den Berg <[hidden email]> wrote:
>
> Can you (or anyone) explain how the GC uses this variable to detect
> those?
> I'm guessing that the GC only works with svalues, not with plain objects?

That is precisely why it needs to be mapped, otherwise the GC cycle
checks can't see it. If it is mapped, it will (presumably) be picked
up by real_gc_cycle_check_object(), among possibly other places. You
removed the map, the GC won't see it now, and that's a regression.

By the way, do you have any examples for when

commit 7def00d27ac7a31be123aaf7b63299d612d5802c
Author: Stephen R. van den Berg <[hidden email]>
Date:   Mon Mar 15 00:27:52 2021 +0100

    Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.

would cause any

"generic data corruption bugs"? As far as I could see having a quick
glance, all changes to the data and/or allocation should be guarded by
io_add_space() (and thus result in io_add_space_do_something()
throwing an error via io_ensure_unlocked()), thus even if ->len and
->offset are set to zero, no changes to data or allocation should
occur. Maybe you actually did notice corruption in Shuffler instead?
Reply | Threaded
Open this post in threaded view
|

Re: Stdio.Buffer()->__set_on_write() broken

Stephen R. van den Berg
Tobias S. Josefowitz wrote:
>By the way, do you have any examples for when

>commit 7def00d27ac7a31be123aaf7b63299d612d5802c
>    Stdio.Buffer: Do not optimise the buffer to empty if there are subbuffers.

>"generic data corruption bugs"? As far as I could see having a quick
>glance, all changes to the data and/or allocation should be guarded by
>io_add_space() (and thus result in io_add_space_do_something()
>throwing an error via io_ensure_unlocked()), thus even if ->len and
>->offset are set to zero, no changes to data or allocation should
>occur. Maybe you actually did notice corruption in Shuffler instead?

Well, retracing my steps I can probably say the following:
- Yes, you are correct, that would prevent most issues.
- The reason I actually noticed corruption was because I extended the
  code first to allow appending (only) to buffers that are large enough
  already, even if they have subbuffers.

The code was/is not robust to begin with though.
In io_add_space() the code was:

    if( io->len == io->offset )
      io->offset = io->len = 0;
    if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated &&
        (!bytes || io->len+bytes > io->len))
      return io->buffer+io->len;

Checking for locked in the second if, and not in the first seems sloppy at
best; defensive programming prescribes more robust checks here.
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: Stdio.Buffer()->__set_on_write() broken

Tobias S. Josefowitz
On Sat, Mar 20, 2021 at 10:00 PM Stephen R. van den Berg <[hidden email]> wrote:

>
> The code was/is not robust to begin with though.
> In io_add_space() the code was:
>
>     if( io->len == io->offset )
>       io->offset = io->len = 0;
>     if( !force && io->malloced && !io->locked && io->len+bytes < io->allocated &&
>         (!bytes || io->len+bytes > io->len))
>       return io->buffer+io->len;
>
> Checking for locked in the second if, and not in the first seems sloppy at
> best; defensive programming prescribes more robust checks here.

I don't mind that change, I just wanted to be sure there was no
corruption observed that could not actually be explained with this
code.