Bleeding-edge Pike broadly operable - code review appreciated

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
I just pushed a few changes to master that get the build working (on
64-bit Debian Linux, no other platforms tested). They're mostly small
changes and I hope they won't break anything. This actually gets "make
doc" happy again after a good while, so it would be really awesome if
I haven't just broken something unrelated :)

There's another, more significant, change that happened in the past
few months. It's not a simple fix. The default behaviour of
Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP,
changed from "only look up A records" to "look up AAAA and A records,
and return the first AAAA if found, otherwise the first A".
Unfortunately everything is built to assume that there's at most one
IP address for that host name.

The ultimate solution would be to have an asynchronous "establish
connection" call, which will do an async DNS lookup, then attempt to
connect to each IP in turn until one succeeds. That would require some
other changes, and I'm not sure how it would interact with HTTP
keep-alive, for instance. As a lesser solution, would it be reasonable
to have a method like Protocols.DNS.prefer_ipv4() to invert the return
order of DNS records?

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Tobias S. Josefowitz
On Wed, Jun 9, 2021 at 3:48 PM Chris Angelico <[hidden email]> wrote:
>
> There's another, more significant, change that happened in the past
> few months. It's not a simple fix. The default behaviour of
> Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP,
> changed from "only look up A records" to "look up AAAA and A records,
> and return the first AAAA if found, otherwise the first A".
> Unfortunately everything is built to assume that there's at most one
> IP address for that host name.

Sounds like a good enough reason to revert that change to me...
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
On Thu, Jun 10, 2021 at 1:41 AM Tobias S. Josefowitz
<[hidden email]> wrote:

>
> On Wed, Jun 9, 2021 at 3:48 PM Chris Angelico <[hidden email]> wrote:
> >
> > There's another, more significant, change that happened in the past
> > few months. It's not a simple fix. The default behaviour of
> > Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP,
> > changed from "only look up A records" to "look up AAAA and A records,
> > and return the first AAAA if found, otherwise the first A".
> > Unfortunately everything is built to assume that there's at most one
> > IP address for that host name.
>
> Sounds like a good enough reason to revert that change to me...

Needn't revert completely. If it's decided that the preference should
continue to be A, then it's just a matter of reversing the order of
returned values - a simple matter of switching two numbers.

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Stephen R. van den Berg
Chris Angelico wrote:
>> > few months. It's not a simple fix. The default behaviour of
>> > Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP,
>> > changed from "only look up A records" to "look up AAAA and A records,
>> > and return the first AAAA if found, otherwise the first A".

In accordance to the rule that if a connection over IPv6 is possible,
it should be preferred.

>> > Unfortunately everything is built to assume that there's at most one
>> > IP address for that host name.

I'd say, the way forward should be changing the connecting functions to
iterate through the IP addresses and find the first that actually connects.

>Needn't revert completely. If it's decided that the preference should
>continue to be A, then it's just a matter of reversing the order of
>returned values - a simple matter of switching two numbers.

That would be an acceptable workaround until the connection code is fixed,
I think.
--
Stephen.
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Tobias S. Josefowitz
On Wed, Jun 9, 2021 at 10:49 PM Stephen R. van den Berg <[hidden email]> wrote:

>
> I'd say, the way forward should be changing the connecting functions to
> iterate through the IP addresses and find the first that actually connects.
>
> >Needn't revert completely. If it's decided that the preference should
> >continue to be A, then it's just a matter of reversing the order of
> >returned values - a simple matter of switching two numbers.
>
> That would be an acceptable workaround until the connection code is fixed,
> I think.

Maybe I am the one missing something here, but even if "the connection
code" were to learn to try to establish connections to all IPv4+IPv6
addresses resolved for a hostname, it could not do that on top of
Protocols.DNS.async_host_to_ip, as that returns a single result.
Changing it to result in something other than a single result will
break every single user of Protocols.DNS.async_host_to_ip.

As Protocols.DNS.async_host_to_ip is thus constrained to providing a
single response, changing it to respond with an IPv6 IP preferred at
this stage of IPv6 deployment is breaking more than it fixes. I would
assume there are still much more IPv4-only than IPv6 hosts in the
world, and the amount of IPv6-only hosts will be severely limited - at
least some 6to4 translation should normally be available. Responding
with IPv6 addresses first, esp. without applying any heuristics
whatsoever to determine if a system might be IPv6-enabled, can thus
only be seen as a severe regression.

Responding with an IPv6-address if no IPv4-address can be resolved
might not be a regression, but is neither terribly consistent or
elegant, so it might be best to not do that either.

All in all, I think the solution - beyond restoring useful and
expected behaviour of Protocols.DNS.async_host_to_ip - would be to
introduce a more suitable API - if necessary - and to start using that
instead.
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
On Thu, Jun 10, 2021 at 8:21 AM Tobias S. Josefowitz
<[hidden email]> wrote:

>
> On Wed, Jun 9, 2021 at 10:49 PM Stephen R. van den Berg <[hidden email]> wrote:
> >
> > I'd say, the way forward should be changing the connecting functions to
> > iterate through the IP addresses and find the first that actually connects.
> >
> > >Needn't revert completely. If it's decided that the preference should
> > >continue to be A, then it's just a matter of reversing the order of
> > >returned values - a simple matter of switching two numbers.
> >
> > That would be an acceptable workaround until the connection code is fixed,
> > I think.
>
> Maybe I am the one missing something here, but even if "the connection
> code" were to learn to try to establish connections to all IPv4+IPv6
> addresses resolved for a hostname, it could not do that on top of
> Protocols.DNS.async_host_to_ip, as that returns a single result.

True, but if you call Protocols.HTTP.get_url(), the way that it does
its DNS lookups is an implementation detail.

> Changing it to result in something other than a single result will
> break every single user of Protocols.DNS.async_host_to_ip.
>
> As Protocols.DNS.async_host_to_ip is thus constrained to providing a
> single response, changing it to respond with an IPv6 IP preferred at
> this stage of IPv6 deployment is breaking more than it fixes. I would
> assume there are still much more IPv4-only than IPv6 hosts in the
> world, and the amount of IPv6-only hosts will be severely limited - at
> least some 6to4 translation should normally be available. Responding
> with IPv6 addresses first, esp. without applying any heuristics
> whatsoever to determine if a system might be IPv6-enabled, can thus
> only be seen as a severe regression.
>
> Responding with an IPv6-address if no IPv4-address can be resolved
> might not be a regression, but is neither terribly consistent or
> elegant, so it might be best to not do that either.
>
> All in all, I think the solution - beyond restoring useful and
> expected behaviour of Protocols.DNS.async_host_to_ip - would be to
> introduce a more suitable API - if necessary - and to start using that
> instead.

Okay. Here's a proposal:

1) Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is
preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will
result in the current behaviour, calling prefer_ipv6(0) will return to
previous behaviour. This just changes the order of responses, nothing
else.

2) Create host_to_ips() which returns (or in the case of a promise,
yields) an array with all of the results. By definition, host_to_ip()
== host_to_ips()[0] or "".

3) Change Protocols.HTTP.Query() to use async_host_to_ips(). This will
change its hostname_cache to carry arrays instead of strings - is that
used externally? I found one reference in Protocols.HTTP.Session and
one in Web.Crawler, but they're just synchronizing caches, and they
don't care what's actually stored in it.

4) Possibly change Protocols.HTTP.dns_lookup to use Protocols.DNS for
consistency?? Currently that one uses gethostbyname, so there's a
notable behavioural difference between sync_request and thread_request
(which use dns_lookup and gethostbyname) and async_request (which uses
dns_lookup_async and Protocols.DNS). Otherwise, just tweak dns_lookup
to store an array in the cache.

5) Add another phase to asynchronous HTTP connection after obtaining
DNS results: it maintains an array of IPs and attempts to connect to
them sequentially. The success callback will be the same, the failure
callback will attempt the next IP.

I'm part-way inclined to start with #4, since there's a weird
inconsistency there already. If you mix and match
sync_request/thread_request and async_request, the behaviour may
depend on which one populates the cache.

Is this worth working on?

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Martin Nilsson (Coppermist) @ Pike (-) developers forum
>
>Okay. Here's a proposal:
>
>1) Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is
>preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will
>result in the current behaviour, calling prefer_ipv6(0) will return to
>previous behaviour. This just changes the order of responses, nothing
>else.

No, add new methods for new behavior. The old function continues to
return IPv4 and the new one does IPv6+IPv4. Or have some new parameter
to control preference on a per call basis. We don't want global states
in modules.
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
In reply to this post by Chris Angelico
On Thu, Jun 10, 2021 at 11:49 AM Martin Nilsson (Coppermist) @ Pike
(-) developers forum <[hidden email]> wrote:

>
> >
> >Okay. Here's a proposal:
> >
> >1) Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is
> >preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will
> >result in the current behaviour, calling prefer_ipv6(0) will return to
> >previous behaviour. This just changes the order of responses, nothing
> >else.
>
> No, add new methods for new behavior. The old function continues to
> return IPv4 and the new one does IPv6+IPv4. Or have some new parameter
> to control preference on a per call basis. We don't want global states
> in modules.

Trouble with that is that it'd need to be carried through as part of
the API for everything that uses it. So Protocols.HTTP.Query would
need to have something to control which one it prefers, and
Protocols.HTTP.get_url would need to have something to set that
parameter, etc.

But maybe that won't matter, if the other parts of the plan are worth doing.

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
In reply to this post by Stephen R. van den Berg
On Thu, Jun 10, 2021 at 6:49 AM Stephen R. van den Berg <[hidden email]> wrote:

>
> Chris Angelico wrote:
> >> > few months. It's not a simple fix. The default behaviour of
> >> > Protocols.DNS.async_host_to_ip, which is used by Protocols.HTTP,
> >> > changed from "only look up A records" to "look up AAAA and A records,
> >> > and return the first AAAA if found, otherwise the first A".
>
> In accordance to the rule that if a connection over IPv6 is possible,
> it should be preferred.
>
> >> > Unfortunately everything is built to assume that there's at most one
> >> > IP address for that host name.
>
> I'd say, the way forward should be changing the connecting functions to
> iterate through the IP addresses and find the first that actually connects.
>

Branch: rosuav/http-multi-connect

New APIs in Protocols.DNS.async_client - host_to_ips in both callback
and Promise variants.

Change of behaviour in Protocols.HTTP.Query and the high level
functions that call on it (eg Protocols.HTTP.get_url_data and
do_async_method): All forms will now look up the name with potentially
multiple results, and will attempt connections on IPv6 before IPv4
(previously, synchronous requests would prefer IPv4 unless
unavailable, and async would prefer IPv6 unless unavailable).

Not merged into master as yet, but I will be using this on my own system.

Code review would be appreciated.

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Tobias S. Josefowitz
On Fri, Jun 11, 2021 at 1:31 PM Chris Angelico <[hidden email]> wrote:

>
> Branch: rosuav/http-multi-connect
>
> New APIs in Protocols.DNS.async_client - host_to_ips in both callback
> and Promise variants.
>
> Change of behaviour in Protocols.HTTP.Query and the high level
> functions that call on it (eg Protocols.HTTP.get_url_data and
> do_async_method): All forms will now look up the name with potentially
> multiple results, and will attempt connections on IPv6 before IPv4
> (previously, synchronous requests would prefer IPv4 unless
> unavailable, and async would prefer IPv6 unless unavailable).
>
> Not merged into master as yet, but I will be using this on my own system.
>
> Code review would be appreciated.

This looks good to me, but we still need to undo the host_to_ip()
changes of course. I also wonder if implementers of Happy Eyeballs[1]
really are just too careful, or if we ultimately would want to
implement Happy Eyeballs, too; using a "get me a connection to TARGET
please" type API.

[1] https://en.wikipedia.org/wiki/Happy_Eyeballs
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
On Thu, Jun 17, 2021 at 6:57 AM Tobias S. Josefowitz
<[hidden email]> wrote:

>
> On Fri, Jun 11, 2021 at 1:31 PM Chris Angelico <[hidden email]> wrote:
> >
> > Branch: rosuav/http-multi-connect
> >
> > New APIs in Protocols.DNS.async_client - host_to_ips in both callback
> > and Promise variants.
> >
> > Change of behaviour in Protocols.HTTP.Query and the high level
> > functions that call on it (eg Protocols.HTTP.get_url_data and
> > do_async_method): All forms will now look up the name with potentially
> > multiple results, and will attempt connections on IPv6 before IPv4
> > (previously, synchronous requests would prefer IPv4 unless
> > unavailable, and async would prefer IPv6 unless unavailable).
> >
> > Not merged into master as yet, but I will be using this on my own system.
> >
> > Code review would be appreciated.
>
> This looks good to me, but we still need to undo the host_to_ip()
> changes of course. I also wonder if implementers of Happy Eyeballs[1]
> really are just too careful, or if we ultimately would want to
> implement Happy Eyeballs, too; using a "get me a connection to TARGET
> please" type API.
>
> [1] https://en.wikipedia.org/wiki/Happy_Eyeballs
>

Should host_to_ip be put completely back how it was (IPv4 only), or
should it return IPv4 if available, IPv6 else?

Happy Eyeballs, if implemented, should probably be an option - it's a
tradeoff between faster connections and more load. Might be worth
actually pushing that one down in the stack a bit, maybe a Stdio or
Protocols function that establishes a connection and returns it?

ChrisA
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Tobias S. Josefowitz
On Wed, Jun 16, 2021 at 11:08 PM Chris Angelico <[hidden email]> wrote:
>
> Should host_to_ip be put completely back how it was (IPv4 only), or
> should it return IPv4 if available, IPv6 else?

In my opinion returning IPv6 if no IPv4 can be found would neither be
something that even new users (i.e. people calling the method after
that behaviour would have been introduced) would expect, or for that
matter, desire. I suppose the proposed utility in behaving like this
is in the case where some code blindly passing on the result to e.g.
connect() or similar would magically and/or accidentally simply keep
working with IPv6-only hostnames. I would think that that is not worth
the confusion, surprise and wholly unexpected system states (in user
programs never expecting this to happen).

> Happy Eyeballs, if implemented, should probably be an option - it's a
> tradeoff between faster connections and more load. Might be worth
> actually pushing that one down in the stack a bit, maybe a Stdio or
> Protocols function that establishes a connection and returns it?

I would really like that.
Reply | Threaded
Open this post in threaded view
|

Re: Bleeding-edge Pike broadly operable - code review appreciated

Chris Angelico
On Thu, Jun 17, 2021 at 7:49 AM Tobias S. Josefowitz
<[hidden email]> wrote:

>
> On Wed, Jun 16, 2021 at 11:08 PM Chris Angelico <[hidden email]> wrote:
> >
> > Should host_to_ip be put completely back how it was (IPv4 only), or
> > should it return IPv4 if available, IPv6 else?
>
> In my opinion returning IPv6 if no IPv4 can be found would neither be
> something that even new users (i.e. people calling the method after
> that behaviour would have been introduced) would expect, or for that
> matter, desire. I suppose the proposed utility in behaving like this
> is in the case where some code blindly passing on the result to e.g.
> connect() or similar would magically and/or accidentally simply keep
> working with IPv6-only hostnames. I would think that that is not worth
> the confusion, surprise and wholly unexpected system states (in user
> programs never expecting this to happen).

Fair enough. The change is a very simple one. My recommendation is:
Use host_to_ips (in the plural) for most purposes.

I think this is ready for master branch now. Last call for comments or
objections.

> > Happy Eyeballs, if implemented, should probably be an option - it's a
> > tradeoff between faster connections and more load. Might be worth
> > actually pushing that one down in the stack a bit, maybe a Stdio or
> > Protocols function that establishes a connection and returns it?
>
> I would really like that.

Shouldn't be too hard. I'll just do the asynchronous version, but if a
threaded version is wanted, that might be of value too.

That'll be a separate project, but I'll see about creating
Stdio.establish_connection(name_or_ips) to do this.

ChrisA