Optimised map


Martin Karlgren
Hi,

Having realised the benefits of functional programming, I’ve been quite annoyed by the reputation function calls have for being expensive in Pike. I decided to look into f_map and could see how much seemingly unnecessary work it does when it calls apply_svalue once for each entry in the array. It should be possible to reuse the pike_frame if it’s about to be thrown away after the function call (if it has other refs, on the other hand, it can’t be reused – it’s probably being used as a scope by some other frame).

I’ve pushed my optimised variant in marty/optimised_map – it seems to work quite well and provides a major speedup. In fact, it’s a bit faster than the corresponding foreach variant. I haven’t verified correctness in various corner cases, and some input on whether it’s correct to do the things init_frame_reuse_context does only once before multiple function calls would be nice too. The *_reuse_context stuff in interpret.c should be applicable wherever the same svalue is applied repeatedly with the same number of arguments (I haven’t really looked for uses outside of f_map).
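
For illustration, the reuse check might look roughly like this self-contained toy C sketch (the toy_frame struct, toy_map and every other name here are made up for the example and are not the real Pike internals):

/* Toy model of the reuse idea: keep one heap-allocated call frame around
 * and only allocate a fresh one when something else still holds a
 * reference to it after the call (e.g. it was captured as a scope). */
#include <stdio.h>
#include <stdlib.h>

struct toy_frame {
  int refs;   /* number of holders; the loop below owns one ref */
  int arg;    /* call state stored in the frame */
};

static struct toy_frame *alloc_frame(void) {
  struct toy_frame *f = malloc(sizeof *f);
  f->refs = 1;
  return f;
}

static int apply_with_frame(struct toy_frame *f, int (*fun)(int)) {
  return fun(f->arg);
}

static void toy_map(const int *in, int *out, size_t n, int (*fun)(int)) {
  struct toy_frame *frame = alloc_frame();
  for (size_t i = 0; i < n; i++) {
    frame->arg = in[i];
    out[i] = apply_with_frame(frame, fun);
    if (frame->refs != 1) {   /* someone else kept the frame alive:   */
      frame->refs--;          /* hand it over and allocate a new one  */
      frame = alloc_frame();
    }
    /* refs == 1: nobody else needs it, reuse it for the next element */
  }
  if (--frame->refs == 0)
    free(frame);
}

static int add2(int i) { return i + 2; }

int main(void) {
  int in[] = { 1, 2, 3 }, out[3];
  toy_map(in, out, 3, add2);
  printf("%d %d %d\n", out[0], out[1], out[2]);  /* 3 4 5 */
  return 0;
}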

What do you all think? Good idea or did I overlook something?

Without optimisation:
map: 1.660797
array index: 1.335115
array append: 1.17917

With optimisation:
map: 0.877659
array index: 1.351158
array append: 1.189812

Test program:

int main()
{
  array base = allocate(10000000, 1);

  float gmap = gauge {
      array res = map (base, lambda(int i) { return i + 2; });
    };

  float garrayindex = gauge {
      array res = allocate(sizeof(base));
      foreach (base; int idx; int i) {
        res[idx] = i + 2;
      }
    };

  float garrayappend = gauge {
      array res = ({});
      foreach (base, int i) {
        res += ({ i + 2 });
      }
    };

  werror ("map: %O\n", gmap);
  werror ("array index: %O\n", garrayindex);
  werror ("array append: %O\n", garrayappend);
}

/Marty


Re: Optimised map

Chris Angelico
On Wed, Apr 6, 2016 at 8:12 AM, Martin Karlgren <[hidden email]> wrote:
> Having realised the benefits of functional programming, I’ve been quite annoyed by the rumour of how expensive function calls are in Pike. I decided to look into f_map and could see how much seemingly unnecessary work it does when it calls apply_svalue once for each entry in the array – it should be possible to reuse the pike_frame if it’s about to be thrown away after the function call (if it has other refs on the other hand, it can’t be reused – it’s probably used as a scope in some other frame).
>

Crucially, you're looking at optimizing this code.

  float gmap = gauge {
      array res = map (base, lambda(int i) { return i + 2; });
    };

For comparison, Python handles this with a list comprehension, replacing this:

res = list(map(lambda i: i + 2, base))

with this:

res = [i + 2 for i in base]

This compiles to a single operation that loops over the base and
constructs a new list.

I think your optimization is a good one to do (although I can't speak
to Pike's internals to prove correctness), but what I'd like to see is
a way to not need the lambda function at all. Automap syntax is great
for simple calls like this, so I added it to your timing test:

  float gautomap = gauge {
      array res = base[*] + 2;
    };

Without the optimization:

$ bin/pike ../mapspeed.pike
automap: 0.230016378
map: 0.327757117
array index: 0.256829669
array append: 0.339447457

With optimization:

$ bin/pike ../mapspeed.pike
automap: 0.256595633
map: 0.189304826
array index: 0.24930893
array append: 0.314897908

Interestingly, your optimization actually comes out faster than
automap. I'm not surprised it's faster than array append, and even the
array index one (it's quicker to do the whole job in C), but automap
would normally be my go-to answer for stuff like this.

So yeah, I'd support this 100%, unless it can be proven to be
incorrect in some way.

ChrisA

Optimised map

Martin Nilsson (Coppermist) @ Pike (-) developers forum
In reply to this post by Martin Karlgren
I threw your branch on the Pike-experimental Coverity stream. You
might want to have a look at CID 1358354:


*** CID 1358354:  Null pointer dereferences  (FORWARD_NULL)
/home/covscan/pike/pike-git/src/builtin.cmod: 4280 in low_automap()
4274             push_svalue(ITEM(tmpargs[e].u.array)+x);
4275           }else{
4276             push_svalue(tmpargs+e);
4277           }
4278         }
4279    
>>>     CID 1358354:  Null pointer dereferences  (FORWARD_NULL)
>>>     Comparing "reuse_ctx" to null implies that "reuse_ctx" might be null.
4280         if(reuse_ctx != NULL)
4281           apply_svalue_reuse_context (reuse_ctx);
4282         else
4283           low_automap(d+1,depth,fun,real_args,args);
4284         stack_pop_to_no_free (ITEM(ret) + x);
4285         types |= 1 << TYPEOF(ITEM(ret)[x]);

Re: Optimised map

Martin Karlgren
Well, that was a stupid mistake. Thanks.
/Marty



Re: Optimised map

Arne Goedeke
In reply to this post by Martin Karlgren
I looked into this exact problem a while ago. One thing I noticed, which might be relevant for your code (which I have not looked at too closely yet), is that in the case of tail recursion, frames can be replaced. This might be something your code needs to handle (for instance when calling a recursive function with map).

When I attempted to improve the performance of function calls (and in particular map/automap/filter/...), I started by refactoring the API for handling frames and doing function calls. I came up with something like frame_prepare(), frame_execute(), frame_return(), frame_pop(), etc.

I kept all the apply variants as fallbacks, which simply call the right combination of these new functions. I then implemented a map optimization similar to yours. At some point I mentioned this work on the list (or maybe just in person), but I never pushed it anywhere. I also never got around to finishing it, but it is something I would still like to find some spare time for. I will push my branch now; maybe it is of interest.
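
As a rough, self-contained toy sketch of how such an API could compose (the function names are the ones mentioned above; the signatures, the toy_frame struct and everything else are assumptions made up for the example):

#include <stdio.h>
#include <stdlib.h>

/* Toy frame holding the state of one call. */
struct toy_frame {
  int (*fun)(int);
  int arg;
  int retval;
};

/* The refactored primitives: set up, run, fetch the result, tear down. */
static struct toy_frame *frame_prepare(int (*fun)(int), int arg) {
  struct toy_frame *f = malloc(sizeof *f);
  f->fun = fun;
  f->arg = arg;
  return f;
}

static void frame_execute(struct toy_frame *f) { f->retval = f->fun(f->arg); }
static int  frame_return(struct toy_frame *f)  { return f->retval; }
static void frame_pop(struct toy_frame *f)     { free(f); }

/* The old apply style survives as a fallback composed of the new pieces. */
static int apply(int (*fun)(int), int arg) {
  struct toy_frame *f = frame_prepare(fun, arg);
  frame_execute(f);
  int ret = frame_return(f);
  frame_pop(f);
  return ret;
}

static int add2(int i) { return i + 2; }

int main(void) {
  printf("%d\n", apply(add2, 40));  /* 42 */
  return 0;
}

With the call split up like this, an optimised map would presumably prepare the frame once and only repeat the execute/return steps per element.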

Arne



Re: Optimised map

Martin Karlgren
Ah, very interesting. Now that you mention it, I remember seeing something about this on the list, but apparently it didn’t come back to me before I implemented something on my own… ;) Your refactoring looks much more appealing, though, so I’ll definitely vote for your branch. What’s left to be sorted out?

Another idea I had, which I don’t know whether it’s feasible or not, was to populate a frame cache in the parent frame when a function is likely to be called repeatedly (e.g. inside foreach) – but let’s sort out the basic stuff first...

(I have very little spare time because of kids, but every once in a while I’ll make a “frustration hack” ;) )

/Marty



Re: Optimised map

Arne Goedeke
If you feel like frustration hacking on that branch, you are very welcome!

My plan would be something like this:

1) Rebase to current 8.1 (will do that now)
2) Fix the current state. I do not remember what the problem really was,
but I do recall that it does not pass the testsuite.
3) Cleanup
4) Optimize
5) New features. With a better API it might be feasible to implement continuations and possibly other things. It's worth a discussion. By the way, when is the next Pike conference?

Some details:
3)
- As I mentioned in my previous email, the tail recursion optimization
creates a new frame and then replaces the previous one on the stack.
This could be improved by resetting the current frame, so that execution
resumes at the beginning of the function. I suspect that it works the
way it does currently because the previous API makes it hard to do the
call _without_ setting up a new frame.
- That new branch introduces frame->type, which did not previously
exist. It makes struct pike_frame even bigger than it already is. Not
all members of the frame struct are used for all frame types, so it
would make sense to place some of them inside a union.

4)
- Use the new API in more places like filter and automap.
- I also really like your idea of caching the last frame. I was thinking about something similar, except that I did not think of keeping it in the frame but of caching it globally instead. Also, in many situations it is actually not enough to cache only one frame; instead we need two or more:

        foreach (foo; mixed key; mixed val) bar();

During one iteration, this calls both next() in the iterator of foo and bar(). It's probably a good idea to experiment with that (a toy sketch of the multi-slot idea follows below).
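
A self-contained toy C sketch of why one cached slot is not enough here (the two-slot cache, get_frame and all other names are invented for the example; they are not Pike internals):

#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS 2   /* with 1 slot, next() and bar() evict each other */

struct toy_frame { int (*fun)(int); };

static struct toy_frame *cache[CACHE_SLOTS];
static int allocations;

/* Return a cached frame for fun, allocating one on a miss. */
static struct toy_frame *get_frame(int (*fun)(int)) {
  static int victim;
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (cache[i] && cache[i]->fun == fun)
      return cache[i];                       /* hit: reuse the frame */
  int slot = victim++ % CACHE_SLOTS;         /* miss: evict round-robin */
  free(cache[slot]);
  cache[slot] = malloc(sizeof *cache[slot]);
  cache[slot]->fun = fun;
  allocations++;
  return cache[slot];
}

static int iterator_next(int i) { return i + 1; }  /* stand-in for next() */
static int bar(int i)           { return i; }      /* stand-in for bar()  */

int main(void) {
  /* One "foreach" iteration calls both functions; with two slots only
   * the first iteration allocates, with one slot every call would. */
  for (int key = 0; key < 1000; ) {
    key = get_frame(iterator_next)->fun(key);
    (void)get_frame(bar)->fun(key);
  }
  printf("frame allocations: %d\n", allocations);  /* 2 */
  return 0;
}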


Arne



Re: Optimised map

Arne Goedeke
I have been able to fix a couple of bugs in the branch by now. There are some left, and I could use some help understanding why they occur. One is triggered by DHT.test in the Bittorrent library. When running that test on the 'faster_calls' branch with debug enabled, the following fatal occurs:

------
/home/el/code/rw/pike/src/gc.c:3864: GC fatal:
GC destructed parent prematurely.
**Block: 0x7f0d52594180  Type: object  Refs: 5
**Got gc marker at 0x753e010: flags=0x20001e refs=4 weak=2 xrefs=0
saved=4 frame=0x61b1d80 [data=0x7f0d52594180 rf_flags=0x95 prev=(nil)
next=0x61b1d48 cycle_id=0x61b1d80 cycle_piece=0xffffffffffffffff
link_top/last_cycle_piece=0xffffffffffffffff]
**Parent identifier: 20
**Program id: 66099
**Object variables:
** rtt: mixed     name: state                 off:   56  value: 1
** rtt: mixed     name: ping_fails            off:   72  value: 0
** rtt: mixed     name: last_ping             off:   88  value: 1461419197
** rtt: mapping   name: pings_inflight        off:  104  value: mapping
of size 0
** rtt: mixed     name: last_response         off:  112  value: 1461419362
** rtt: mixed     name: check_node_callout    off:  128  value: array of
size 1
**  === In inherit "Node", program 66100:
**  rtt: string    name: node_id               off:   16  value:
"O\366\355\207%n\277j4fI\323\231;\332|9Vu\30"
**  rtt: string    name: address               off:   24  value: "127.0.0.1"
**  rtt: string    name: token                 off:   32  value:
"\370\373S\252""5\336\322\25\312[\205\201qR3h]\235z\311"
**  rtt: mixed     name: port                  off:   40  value: 7010
**Describing program 0x20fd070 of object:
**Got gc marker at 0x76c8cb0: flags=0x0000e refs=15106 weak=0 xrefs=0
saved=15106 frame=(nil)
**Program id: 66099, flags: 308f, parent id: 66097
**Location:
/home/el/local/pike/8.1.4/lib/modules/Protocols.pmod/Bittorrent.pmod/DHT.pike:601
**Object got a parent.
*******************
Pike was in GC stage 400 when this fatal occurred.
Backtrace at time of fatal:

Subprocess died of signal SIGABRT.
------

So apparently the parent object was destructed before its child. Does anyone have any idea which parts of the function call code in interpret.c might be incorrect? Is the PARENT_CLONE case wrong?

Thanks

Arne


Re: Optimised map

Arne Goedeke
This fatal actually also happens in plain Pike 8.1, so I guess it's unrelated to the new branch. The fatal happens during cleanup on exit; shouldn't that make sure that things are cleaned up in the correct order? Or should the check be disabled on the final gc run, when all objects are freed?

arne



Re: Optimised map

Martin Karlgren
In reply to this post by Arne Goedeke
Hi,

Great work!

A thought on the API – could frame_init perhaps be removed from the “public” API and have frame_setup_from_* perform that internally instead (and return the new frame)? That way a frame cache could be introduced transparently later on, by having frame_setup_from_* perform cache lookups based on the object/function offset etc. instead of creating a new frame.
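
A minimal toy sketch of that shape (frame_init and frame_setup_from_* are the names from the thread; the rest – the toy_frame struct, the cache placeholder – is made up for the example):

#include <stdlib.h>

struct toy_frame {
  void *obj;        /* stand-ins for the object / function offset the */
  int fun_offset;   /* real setup would be keyed on                   */
};

/* No longer part of the public API: only the setup functions call it. */
static struct toy_frame *frame_init(void) {
  return calloc(1, sizeof(struct toy_frame));
}

/* Callers only ever see this: it hands back a ready frame, so a cache
 * lookup keyed on (obj, fun_offset) could later be slotted in here
 * without touching any caller. */
struct toy_frame *frame_setup_from_object(void *obj, int fun_offset) {
  struct toy_frame *f = /* cache_lookup(obj, fun_offset) ?: */ frame_init();
  f->obj = obj;
  f->fun_offset = fun_offset;
  return f;
}

int main(void) {
  int dummy_obj;
  struct toy_frame *f = frame_setup_from_object(&dummy_obj, 7);
  free(f);
  return 0;
}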

Yes, caching multiple frame entries sounds desirable. If the cache is placed/referenced by the parent frame, even a multi-level cache should be possible:

void fie(int i) { }
void bar(int i) { fie(i + 1); }
void foo() { foreach(enumerate(1000000), int i) bar(i); }

Here, the F_FOREACH handler could set a PIKE_FRAME_DO_CACHE flag in the current frame, which would enable caching of the “bar” call. But the flag could also be propagated to the child frame, so that the “fie” call is cached in the frame handling the execution of “bar”. When execution leaves the scope that initiated the caching, the caches should be cleaned up recursively. I’m not sure whether this would work from a practical standpoint, but it might be worth considering. Thoughts?
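
A toy sketch of that propagation idea (PIKE_FRAME_DO_CACHE is the flag name suggested above; the parent/child structs, the cleanup walk and everything else are invented for the illustration):

#include <stdio.h>
#include <stdlib.h>

#define TOY_FRAME_DO_CACHE 1

struct toy_frame {
  unsigned flags;
  struct toy_frame *parent;
  struct toy_frame *cached_child;   /* one cached callee frame, if any */
};

/* Create a frame for a call made from parent, inheriting the caching flag
 * and reusing the parent's cached frame when there is one. */
static struct toy_frame *enter_call(struct toy_frame *parent) {
  if (parent && (parent->flags & TOY_FRAME_DO_CACHE)) {
    if (!parent->cached_child) {
      parent->cached_child = calloc(1, sizeof(struct toy_frame));
      parent->cached_child->parent = parent;
      parent->cached_child->flags = TOY_FRAME_DO_CACHE;  /* propagate */
    }
    return parent->cached_child;
  }
  struct toy_frame *f = calloc(1, sizeof(struct toy_frame));
  f->parent = parent;
  return f;
}

/* When the scope that initiated the caching is left, throw the cached
 * frames away recursively. */
static void drop_cache(struct toy_frame *f) {
  if (f->cached_child) {
    drop_cache(f->cached_child);
    free(f->cached_child);
    f->cached_child = NULL;
  }
}

int main(void) {
  struct toy_frame foo = { TOY_FRAME_DO_CACHE, NULL, NULL };  /* foreach scope */
  for (int i = 0; i < 1000000; i++) {
    struct toy_frame *bar = enter_call(&foo);   /* cached after 1st iteration */
    struct toy_frame *fie = enter_call(bar);    /* cached in bar's frame */
    (void)fie;
  }
  printf("cached: bar=%p fie=%p\n",
         (void *)foo.cached_child, (void *)foo.cached_child->cached_child);
  drop_cache(&foo);
  return 0;
}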

/Marty



Re: Optimised map

Arne Goedeke
Hi Martin,

Sorry for the late response. Some thoughts:

1) I think the current API is not perfect. But aside from the implicit caching idea, there are some places where caching happens explicitly. In those cases it would still be necessary to have an API which uses a given frame and does the setup without allocation. Those cases also fall into different categories: ones where it is known that the same function will be called (f_map, f_filter, etc.) and ones where it could be a different function (tail recursion, etc.). In the latter, the current frame can only be reused when calling into Pike code, which adds another complication.
2) I think the pike_frame struct should be smaller. My patch currently also adds another pointer, which is not really needed; I did this mostly for clarity and was planning to clean it up later. It might also help to separate the different frame types into several structs and place them in a union. This would also take care of places in the code where uninitialized values are read from the frames, mainly in the backtrace handling code. The different pointers into the stack could also be implemented as offsets from save_sp (a toy sketch of such a layout follows below).
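
A toy sketch of that layout idea (every name and field here is invented for illustration; it is not the actual pike_frame definition):

#include <stdio.h>
#include <stdint.h>

struct toy_svalue;                     /* stand-in for struct svalue */

enum toy_frame_type { FRAME_PIKE_FUNCTION, FRAME_C_FUNCTION };

struct toy_frame {
  /* Members every frame type needs. */
  uint8_t type;                        /* toy_frame_type */
  int refs;
  struct toy_svalue *save_sp;          /* stack pointer saved on entry */

  /* Stack positions stored as offsets from save_sp instead of as
   * separate full pointers. */
  uint16_t locals_offset;
  uint16_t args_offset;

  /* Type-specific members share storage instead of bloating every frame. */
  union {
    struct { const void *pc; int num_locals; } pike;   /* Pike-level frames */
    struct { void (*fun)(int); }               cfun;   /* C function frames */
  } u;
};

int main(void) {
  /* The union only costs as much as its largest member. */
  printf("sizeof(struct toy_frame) = %zu bytes\n", sizeof(struct toy_frame));
  return 0;
}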

Maybe a good idea would be to first add the caching using a new API. Also, I think the interpreter struct might be a good place to do the caching. This avoids increasing the size of frames and also makes the number of cached frames a bit more predictable.

I would also be glad to have some feedback from people who are more
familiar with the interpreter.

Arne



Re: Optimised map

Martin Karlgren
In reply to this post by Arne Goedeke
Hi Arne,

I'm curious what you think it would take to merge the faster_calls branch into 8.1. Do you know of any outstanding issues, or is it mostly a matter of rebasing it onto the 8.1 head and running the testsuite?

Thanks,
Marty



Re: Optimised map

Arne Goedeke
Hi Marty,

I have recently looked into the branch again. I am not sure if there
were any bugs when we last left it. The main issue I currently see
with the code (and that is my fault) is that it allocates a frame for
efun calls. This does not happen in Pike currently, and I suppose we want
to keep it that way. The current API in the new branch unfortunately
requires this for keeping the information about the frame type.

I recently tried to rebase that branch onto current Pike 8.1 in order to attempt another refactoring to address that particular problem. However, that did not go very well in the end because I encountered huge merge conflicts and gave up. I am a bit undecided at the moment about how we should go forward with this.

My current feeling is that the new API is broken and should be replaced
by something which:

1) optimizes lookup and frame allocation for map/filter/automap
2) does not allocate a frame for efun calls
3) allows easy integration of frame caching
...
n) continuations?

I wrote down some notes the last time I looked at this, but it's all at a very early stage right now.

On the other hand, maybe what we want to achieve for 8.1 is only the
first part. Maybe we can come up with some kind of intermediate version
which does that?

Arne
