Maximum string size?

Chris Angelico
Okay, I know this is sloppy and inefficient code, but bear with me :)

I'm building up an animated GIF by constructing frames in PPM format
internally, and then doing this:

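array(string) animation = allocate(frame_count);
// populate array with data
// animation * "" joins every frame into one string; this is the
// stdin string that grows to 2GB.
Process.run(({"ffmpeg", "-y", "-f", "image2pipe", "-i", "-", filename + ".gif"}),
    (["stdin": animation * ""]));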

Or rather I was, until I found that a 2GB string sometimes terminates
Pike (or throws an exception, not sure) with the message "String too
long". This happened consistently after rendering frame 331, which is
the point at which the big stdin string hits 2GB. However, I have
managed to generate larger strings in testing. I've now changed to
piping the data using a secondary thread, avoiding the creation of the
temporary string, but am curious as to what this limit actually is.
And it's not a RAM size limit; "/usr/bin/time -v" shows the max RSS to
have been no more than a few gig, and I've easily exceeded that on
other occasions.
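A minimal Pike sketch of the same idea, assuming a Unix-like system and Process.create_process(): each frame is written straight to the child's stdin, so the joined string never exists. Unlike the secondary-thread version mentioned above, this writes from the calling thread. stream_frames() is a hypothetical helper, and asking Stdio.File()->pipe() for a bidirectional IPC pipe is this sketch's own choice, not necessarily what Process.run() does internally.

```pike
// Sketch: stream frames into a child's stdin one at a time instead of
// first building one huge string with frames * "".
// stream_frames() is a hypothetical helper, not part of Pike itself.
int stream_frames(array(string) cmd, array(string) frames)
{
  Stdio.File to_child = Stdio.File();
  // Assumption: a bidirectional IPC pipe, so either end can serve as stdin.
  Stdio.File child_end =
    to_child->pipe(Stdio.PROP_IPC | Stdio.PROP_BIDIRECTIONAL);
  if (!child_end) error("Could not create pipe.\n");
  to_child->set_blocking();
  child_end->set_blocking();  // the child inherits this end's mode

  // stdout/stderr are inherited, so no output pipe can fill up unread
  // while this thread is busy writing.
  Process.create_process proc =
    Process.create_process(cmd, (["stdin": child_end]));
  child_end->close();  // the child holds its own copy of this end

  foreach (frames; int i; string frame) {
    // Each write is a single frame, far below the 2GB mark.
    int written = to_child->write(frame);
    if (written != sizeof(frame))
      error("Short write on frame %d: %d of %d bytes.\n",
            i, written, sizeof(frame));
  }
  to_child->close();    // EOF for the child
  return proc->wait();  // the child's exit status
}
```

In the program above this would be called as stream_frames(({"ffmpeg", "-y", "-f", "image2pipe", "-i", "-", filename + ".gif"}), animation). The frames still sit in the array, but the 2GB joined copy is never built. Writing from the calling thread is only safe because ffmpeg's output is not captured; if stdout or stderr were also piped back, a second thread or a non-blocking loop would be needed to avoid a deadlock.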

The trouble is that this is perfectly consistent when I do the full
job, but doesn't seem to reproduce when I try to shortcut the
process, and the full job takes hours. (I just made a mistake and
threw away six hours of processing, and it was about a third of the
way through.) Where should I look to find the cause of this?

ChrisA

Re: Maximum string size?

Arne Goedeke
Hi,

Internally the string size is represented by a ptrdiff_t, so in
principle you should be able to handle more than 2GB (at least on 64-bit
systems). However, there are some APIs which do not handle large strings
correctly; e.g. Stdio.read_file() on a file which is larger than INT_MAX
will only return the first INT_MAX bytes.

Probably what you are seeing here is an API that is somehow using
'int' internally, or something similar. In my opinion, all those cases
are bugs and should be fixed, so if you could find out what specifically
went wrong in your case, that would be useful.
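To make the INT_MAX boundary concrete, the sketch below uses Pike arithmetic to stand in for C and shows what a length looks like after being squeezed into a 32-bit signed int, the kind of bug described above. as_c_int32() is a hypothetical helper, and it assumes two's-complement wraparound, which common C compilers do in practice, although out-of-range conversions to a signed type are implementation-defined in C.

```pike
// Hypothetical helper: what a length n looks like after being stored in a
// 32-bit signed C int, assuming two's-complement wraparound.
int as_c_int32(int n)
{
  n &= 0xffffffff;        // keep only the low 32 bits
  if (n >= 0x80000000)    // top bit set: reads back as negative
    n -= 0x100000000;
  return n;
}
```

With INT_MAX = 2147483647 (2^31 - 1), as_c_int32(2147483647) comes back unchanged, while as_c_int32(2147483648), exactly 2GB, comes back as -2147483648. A string at or past 2GB therefore has a negative length as far as such an API is concerned, which is one way it could end up truncating or rejecting the string.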

On 07/10/17 16:48, Chris Angelico wrote:

Re: Maximum string size?

Chris Angelico
On Tue, Jul 11, 2017 at 2:13 AM, Arne Goedeke wrote:

Hmm, okay. For reference, here's the entire program:

https://github.com/Rosuav/RollingShutter

I'll try to create a minimal test case that uses Process.run() and see
if I can make it fail. It's possible there's a limit inside
Shuffler.Shuffler, which Process.run uses internally to send stdin to
the subprocess.
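One shape such a minimal test case could take, as a Pike sketch assuming a Unix-like system with wc on the PATH: push a string of a chosen size through Process.run() and let the child report how many bytes actually arrived. bytes_delivered() is a hypothetical name. Building the payload by string repetition is itself a large-string operation, so a failure at that step would point somewhere other than Process.run().

```pike
// Hypothetical probe: send n bytes through Process.run()'s "stdin"
// modifier and return the byte count the child process reports.
int bytes_delivered(int n)
{
  string payload = "\0" * n;   // n NUL bytes in a single string
  mapping res = Process.run(({"wc", "-c"}), (["stdin": payload]));
  // wc -c prints the count, possibly padded with spaces.
  return (int)String.trim_all_whites(res->stdout);
}
```

Calling it from main() with n = (1 << 31) + 1, one byte past 2GB, and comparing the result with n would show whether Process.run() delivers the whole string, truncates it, or throws.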

ChrisA
