compilation handler for multi-tenant applications: update

Bill Welliver-2
Hi all,

A few months ago, I started a conversation about what I'll call multi-tenant
applications: that is, a single pike process running multiple applications
(perhaps including copies of a given application) with isolated program
and module spaces. The idea is to provide a similar capability to that
provided by Java's ClassLoader API.

That is, I'd like to be able to do the following:

                                pike
                +-----------------|-----------------+
                |                                   |
          thread a1...an                     thread b1 ... bn
     compilation handler a                compilation handler b
  loads classes for app1 instance 1  loads classes for app1 instance
       from location a,b                    from location a,c (perhaps)
      uses module foo.bar                  uses module foo.bar

            foo.bar              !=             foo.bar


            though it would perhaps be acceptable if

      object_program(foo.bar)    ==        object_program(foo.bar)

        applications would not be aware of each other (a condition
         of the multi-tenant contract) and thus object identity
                   would not need to be maintained.

I've been dabbling with the approach suggested at the Pike conference,
which was to use a compilation handler to provide this functionality. I've
come tantalizingly close, being able to use an overridden master to provide
versions of compile_string() and friends that automatically select the
desired compilation handler based on various criteria (such as threads in
an application). While a bit clunky, it seems to allow me to control
visibility of identifiers in a given application or thread, but there does
seem to be a limitation that for me is fatal: the programs and module
objects are cached by the master, and therefore any two applications are
not truly isolated: they share a common set of modules (and
join/dirnodes) and (though less problematic) precompiled programs. Because
programs and modules seem to be indexed by filename, I've played around
with adding a unique identifier in order to split the cache on a
per-handler basis, but this hasn't worked either (programs seem to be
loaded as desired, but modules are still problematic).
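
A minimal sketch of the handler idea described above, not the code used in the
post: it assumes the compiler asks a handler's resolv() for identifiers it
cannot find locally, and that compile_string() accepts the handler as its
third argument. TenantHandler, tenant_name and the file name "app.pike" are
hypothetical names chosen for illustration. The same source is compiled twice
and each compilation sees its own value for the identifier.

```pike
// Sketch only: TenantHandler and the symbol names are hypothetical.
class TenantHandler
{
  // Identifiers that should resolve differently for this tenant.
  protected mapping(string:mixed) tenant_symbols;

  protected void create(mapping(string:mixed) symbols)
  {
    tenant_symbols = symbols;
  }

  // Consulted for identifiers the compiler cannot find locally.
  mixed resolv(string identifier, string|void filename, object|void handler)
  {
    if (has_index(tenant_symbols, identifier))
      return tenant_symbols[identifier];
    // Anything else goes through the normal master resolver.
    return master()->resolv(identifier, filename);
  }
}

int main()
{
  string source = "string whoami() { return tenant_name; }";

  // Same source, two handlers: each compilation gets its own tenant_name.
  program app_a = compile_string(source, "app.pike",
                                 TenantHandler((["tenant_name": "tenant a"])));
  program app_b = compile_string(source, "app.pike",
                                 TenantHandler((["tenant_name": "tenant b"])));

  write("%s / %s\n", app_a()->whoami(), app_b()->whoami());
  return 0;
}
```

Note that this only controls what an identifier resolves to at compile time;
it does nothing about the master's programs and module caches, which is the
limitation the post goes on to describe.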

I've attempted to add storage of these caches in the compilation handler,
but it results in extremely odd failures (for example, a given class will
be cached as a zero value in the programs mapping, which means the program
won't be found, even if it's on disk).

As a brute force attempt to prove that the idea can work, I'm thinking
about short circuiting the auto-reload functionality so that it always
reloads a given class from disk. I'm not sure that this will actually
prove beneficial, as modules would still be persistent.

As always, any thoughts or suggestions would be welcome.

Bill


compilation handler for multi-tenant applications: update

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
Isn't it necessary for you to implement something similar to the
compat master scheme? I.e. not only have separate compilation
handlers, but also separate master objects?

Looks like the compat master scheme uses subtyped object pointers,
i.e. Pike_N_M_master::xxx, which I doubt would work for you, but you
could instead have a global mapping somewhere where you keep track of
the master objects for your "tenants".

Note also that some caches in the real master should be possible to
keep global. E.g. fc, because it uses paths, and the objects mapping,
because it's indexed on the program instances which are different when
there is a real difference. The programs mapping also uses paths, but
a problem there is the special "/master" entry, so it'd require some
sort of wrapper object with `[], `[]= etc.
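
The wrapper idea in the last paragraph could look roughly like the sketch
below. It is an illustration under assumptions, not code from the master:
ProgramsView and its member names are hypothetical, and strings stand in for
real program values. Lookups and stores for the special "/master" key go to a
cache shared by all tenants, while every other path stays in a per-tenant
mapping.

```pike
// Hypothetical per-tenant view of a programs cache.
class ProgramsView
{
  protected mapping(string:mixed) common_cache;        // shared, holds "/master"
  protected mapping(string:mixed) tenant_cache = ([]); // private to this tenant

  protected void create(mapping(string:mixed) common)
  {
    common_cache = common;
  }

  // Index operator: view[path]
  mixed `[](string path)
  {
    if (path == "/master") return common_cache[path];
    return tenant_cache[path];
  }

  // Index assignment operator: view[path] = prog
  mixed `[]=(string path, mixed prog)
  {
    if (path == "/master") return common_cache[path] = prog;
    return tenant_cache[path] = prog;
  }
}

int main()
{
  mapping(string:mixed) common = (["/master": "master-program"]);
  ProgramsView a = ProgramsView(common);
  ProgramsView b = ProgramsView(common);

  a["/app/foo.pike"] = "program compiled for tenant a";

  // Both views see the shared entry; only a sees its own program.
  write("%O %O\n", a["/master"], b["/master"]);
  write("%O %O\n", a["/app/foo.pike"], b["/app/foo.pike"]);
  return 0;
}
```

A real replacement for the master's mapping would probably also need
_indices(), _values() and m_delete support; those are left out here.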

Re: compilation handler for multi-tenant applications: update

Bill Welliver-2
Yes, I think it will have to be something a bit more involved.

My original plan was to override all of the functions that are in play
here (all of the methods defined in CompilationHandler, plus a number of
others) so that they use the appropriate data.

However, as I think about this, perhaps the answer is even simpler:

The problem isn't strictly the methods themselves, it's more a matter of
making them use the right set of data. Therefore, would it not be just as
effective to implement getters/setters on the appropriate datasources:

class ResolutionEnvironment
{
   array pike_module_path = ({});
   array pike_include_path = ({});
   mapping objects;
   // etc...
}

// one environment per registered thread, plus a "default" fallback entry
mapping(Thread.Thread|string:ResolutionEnvironment) _multitenant_threads =
(["default": ResolutionEnvironment()]);

// getter: called whenever pike_module_path is read
array `->pike_module_path()
{
   ResolutionEnvironment x;
   // do we have a special environment for this thread?
   if(x = _multitenant_threads[Thread.this_thread()])
   {
     return x->pike_module_path;
   }
   // otherwise return the global environment
   else return _multitenant_threads["default"]->pike_module_path;
}

My understanding is that the getter/setters operate at a lower level than
standard `->(), so it's impossible to avoid them being called, which is
desirable in this case.

Of course, this is all in addition to the necessary machinery to register
a given configuration with one or more threads.
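
The registration machinery mentioned above might be sketched as follows. This
is an assumption-laden illustration rather than the code that was actually
written: register_thread(), unregister_thread() and current_environment() are
hypothetical helper names, and the paths are made up.

```pike
class ResolutionEnvironment
{
  array(string) pike_module_path = ({});
  array(string) pike_include_path = ({});
}

// Keyed by thread object, plus the "default" fallback entry.
mapping(object|string:ResolutionEnvironment) environments =
  (["default": ResolutionEnvironment()]);

// Attach an environment to a thread.
void register_thread(object thread, ResolutionEnvironment env)
{
  environments[thread] = env;
}

// Detach it again so the thread falls back to the default environment.
void unregister_thread(object thread)
{
  m_delete(environments, thread);
}

// Environment for the calling thread, or the default one.
ResolutionEnvironment current_environment()
{
  return environments[Thread.this_thread()] || environments["default"];
}

int main()
{
  ResolutionEnvironment app1 = ResolutionEnvironment();
  app1->pike_module_path = ({ "/opt/app1/modules" });

  register_thread(Thread.this_thread(), app1);
  write("%O\n", current_environment()->pike_module_path);

  unregister_thread(Thread.this_thread());
  write("%O\n", current_environment()->pike_module_path);
  return 0;
}
```

One design point left open here is cleanup: a plain mapping keeps finished
threads alive, so a longer-lived version would likely unregister threads
explicitly or use weak indices.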

Bill


On Sun, 12 Feb 2012, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:

> Isn't it necessary for you to implement something similar to the
> compat master scheme? I.e. not only have separate compilation
> handlers, but also separate master objects?
>
> Looks like the compat master scheme uses subtyped object pointers,
> i.e. Pike_N_M_master::xxx, which I doubt would work for you, but you
> could instead have a global mapping somewhere where you keep track of
> the master objects for your "tenants".
>
> Note also that some caches in the real master should be possible to
> keep global. E.g. fc, because it uses paths, and the objects mapping,
> because it's indexed on the program instances which are different when
> there is a real difference. The programs mapping also uses paths, but
> a problem there is the special "/master" entry, so it'd require some
> sort of wrapper object with `[], `[]= etc.
>

Re: compilation handler for multi-tenant applications: update

Bill Welliver-2
It turns out that using `->symbol and some other minor magic can be used to solve all of the problems I've been concerned about. I've run some simple tests that show two different threads with independent module/program paths that seem to be (almost) completely isolated from each other. The only code they share with each other is the standard set of static modules (which is an acceptable situation that could also be changed).

I'll need to test this quite a bit, but the initial results are encouraging.

Bill

On Mar 14, 2012, at 1:01 PM, Bill Welliver wrote:

> Yes, I think it will have to be something a bit more involved.
>
> My original plan was to override all of the functions that are in play here (all of the methods defined in CompilationHandler, plus a number of others) so that they use the appropriate data.
>
> However, as I think about this, perhaps the answer is even simpler:
>
> The problem isn't strictly the methods themselves, it's more a matter of making them use the right set of data. Therefore, would it not be just as effective to implement getters/setters on the appropriate datasources:
>
> class ResolutionEnvironment
> {
>  array pike_module_path = ({});
>  array pike_include_path = ({});
>  mapping objects;
>  // etc...
> }
>
> mapping(Thread.Thread|string:ResolutionEnvironment) _multitenant_threads = (["default": ResolutionEnvironment()]);
>
> array `->pike_module_path()
> {
>  ResolutionEnvironment x;
>  // do we have a special environment for this thread?
>  if(x = _multitenant_threads[Thread.this_thread()])
>  {
>    return x->pike_module_path;
>  }
>  // otherwise return the global environment
>  else return _multitenant_threads["default"]->pike_module_path;
> }
>
> My understanding is that the getter/setters operate at a lower level than standard `->(), so it's impossible to avoid them being called, which is desirable in this case.
>
> Of course, this is all in addition to the necessary machinery to register a given configuration with one or more threads.
>
> Bill

Re: compilation handler for multi-tenant applications: update

Bill Welliver-2
For those following along, I may have been premature in my declaration of
total victory, as the solution only seems to work with pike 7.9 (where it
seems to work with very few issues). When used with 7.8, I get lots of
errors like so:

/usr/local/pike/7.8.352/lib/modules/Protocols.pmod/HTTP.pmod/module.pmod:161:Got
placeholder object when indexing module HTTP with 'Query'. (Resolver
problem.)

The problem appears to be the programs and objects mappings in the master.
If I replace either, these problems happen. If I use the existing mappings
from the current master (moved before using replace_master()), things work
properly. If I do a shallow copy, things fail, so I'm inclined to believe
that something's holding on to those mappings, perhaps in dirnode().
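
To illustrate why a shallow copy would behave differently from reusing the
original mappings, here is a small sketch with plain mappings. It makes no
claim about what the master or dirnode() actually do; the variable names and
paths are hypothetical, and strings stand in for programs. A component that
kept a reference to the original mapping never sees entries added to a copy.

```pike
int main()
{
  mapping(string:mixed) programs = (["/a.pike": "prog-a"]);

  // Some other component keeps a reference to the same mapping.
  mapping(string:mixed) held = programs;

  // Shallow copy: a new mapping with the same entries.
  mapping(string:mixed) copied = programs + ([]);

  // Entries added to the copy are invisible through the held reference.
  copied["/b.pike"] = "prog-b";
  write("via held reference, /b.pike: %O\n", held["/b.pike"]);

  // Entries added to the original are visible through it.
  programs["/c.pike"] = "prog-c";
  write("via held reference, /c.pike: %O\n", held["/c.pike"]);
  return 0;
}
```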

Note that at this point, the error occurs before trying to use multiple
compile environments; so it doesn't seem like it could be a matter of
disjoint data for the resolver.

I haven't compared the differences between the two masters, but I know
that changes were made to the resolver, correct?

Anyone care to venture a guess as to the source of the problem?

Bill

On Thu, 15 Mar 2012, H. William Welliver III wrote:

> It turns out that using `->symbol and some other minor magic can be
> used to solve all of the problems I've been concerned about. I've run
> some simple tests that show two different threads with independent
> module/program paths that seem to be (almost) completely isolated from
> each other. The only code they share with each other is the standard set
> of static modules (which is an acceptable situation that could also be
> changed).

Re: compilation handler for multi-tenant applications: update

Stephen R. van den Berg
In reply to this post by Bill Welliver-2
Bill Welliver wrote:
>multi-tenant applications: that is, a single pike process running
>multiple applications (perhaps including copies of a given
>application) with isolated program and module spaces. The idea is to
>provide a similar capability to that provided by Java's ClassLoader
>API.

Interesting as such, but, what would be the real benefit of this approach
as opposed to simply starting multiple instances of Pike?
Or is this geared towards an embedded solution where starting another
Pike is difficult and/or impossible?
--
Stephen.

Being able to try has no purpose if failing is not an option.

Re: compilation handler for multi-tenant applications: update

Bill Welliver-2
Hi Stephen,

Welcome back.

Good question. Obviously starting a new process is always going to be the
most flexible/straightforward approach. That's the approach I've been
using for a while now; however, once you have 5-10 pike instances running
in a resource-constrained environment, elbow room begins to become a
problem: the overhead of a running pike instance becomes important.

So, benefits might include:

- Minimize overhead of each additional pike interpreter
- Enable easier solutions to embedded problems (as you suggest)
- Enable multiple applications to share a single port without having the
     (perhaps simply different) complexity of running a reverse proxy

There may be other "benefits" that I'm not thinking of, but those are the
major benefits for me.

Best,

Bill

On Wed, 28 Mar 2012, Stephen R. van den Berg wrote:

> Bill Welliver wrote:
>> multi-tenant applications: that is, a single pike process running
>> multiple applications (perhaps including copies of a given
>> application) with isolated program and module spaces. The idea is to
>> provide a similar capability to that provided by Java's ClassLoader
>> API.
>
> Interesting as such, but, what would be the real benefit of this approach
> as opposed to simply starting multiple instances of Pike?
> Or is this geared towards an embedded solution where starting another
> Pike is difficult and/or impossible?
> --
> Stephen.
>
> Being able to try has no purpose if failing is not an option.
>

Re: compilation handler for multi-tenant applications: update

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
In reply to this post by Bill Welliver-2
> The problem isn't strictly the methods themselves, it's more a
> matter of making them use the right set of data. Therefore, would it
> not be just as effective to implement getters/setters on the
> appropriate datasources:
/.../

That could be a simpler alternative, I guess. I think it'd be slightly
slower though.

> My understanding is that the getter/setters operate at a lower level than
> standard `->(), so it's impossible to avoid them being called, which is
> desirable in this case.

Yes. That's why I prefer the syntax `foo and `foo=, besides it being
shorter. I actually think it's somewhat unfortunate that the `->foo
and `->foo= syntax wasn't removed from the start, because it just
adds confusion.
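
For readers unfamiliar with the shorter spelling, a minimal sketch of a
getter/setter pair written as `foo and `foo= follows. It assumes a Pike
version that accepts this syntax; the Paths class, its members and the paths
are hypothetical. The counter only demonstrates that a plain read of the
symbol goes through the getter.

```pike
class Paths
{
  protected array(string) stored = ({ "/opt/app1/modules" });
  protected int reads;

  // Getter: called whenever module_path is read.
  array(string) `module_path()
  {
    reads++;
    return stored;
  }

  // Setter: called whenever module_path is assigned.
  void `module_path=(array(string) value)
  {
    stored = value;
  }

  int read_count() { return reads; }
}

int main()
{
  Paths p = Paths();
  p->module_path = ({ "/opt/app2/modules" });
  write("%O (getter calls: %d)\n", p->module_path, p->read_count());
  return 0;
}
```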

Re: compilation handler for multi-tenant applications: update

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum
In reply to this post by Bill Welliver-2
/.../
> /usr/local/pike/7.8.352/lib/modules/Protocols.pmod/HTTP.pmod/module.pmod:161:Got
> placeholder object when indexing module HTTP with 'Query'. (Resolver
> problem.)
>
> The problem appears to be the programs and objects mappings in the master.
> If I replace either, these problems happen. If I use the existing mappings
> from the current master (moved before using replace_master()), things work
> properly. If I do a shallow copy, things fail, so I'm inclined to believe
> that something's holding on to those mappings, perhaps in dirnode().

It could also be that the old master is still called in some cases.

> I haven't compared the differences between the two masters, but I know
> that changes were made to the resolver, correct?

Spontaneously I thought there would be, but there are actually fairly
few commits to the master in 7.9 only:

  > git log --oneline 7.8..7.9 lib/master.pike.in
  eac07315 Fixed compat resolver fallback order.
  fac36c39 Runtime: Changed backtrace representation for event handlers.
  9eaaf898 Updated copyright.
  c5d9a9a6 Removed $Id$.
  080e3aa0 Added support for dynamic compile-time macros.
  0e22d157 Added RECUR_COMPILE_DEBUG to attempt to help debugging recursive resolver issues.
  91ac5642 Ensure _master_file_name is set even without -m.
  fea47d91 Fixed unbalanced use of INC/DEC_RESOLV_MSG_DEPTH() in dirnode()->low_ind()
  a63ecdbd Instantiate the fallback codecs instead of using the master directly.
  548e838f Improved unregister() to find stuff in joinnodes a bit better.
  3032b456 Deprecating pike.ida.liu.se for pike.lysator.liu.se.
  eb6f0eef master: Restored lost comment.
  1a03d133 Clean up some create():s
  0e26166a Added compatibility mode for Pike 7.8.
  6dd09dad Fixed the codec to handle the Val module values in a good way.
  7729dbc4 Improved master compatibility with Pike 7.6.
  130970d5 Improved describe_function for top level functions in modules.
  e15f7f13 Added callbacks to allow overlaying masters to read precompiled code from other sources.
  9dcabf1e Give dirnodes and joinnodes real names to improve sprintf output.

I haven't dug around, but I see nothing obvious there that could have
a bearing on this problem.