Thread: Communication trouble DAS - Node Agent

Replies: 11 - Last Post: Nov 25, 2009 2:04 PM by: gjwiley
gjwiley

Posts: 8
Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 5:19 PM

Behavior:
Windows NA binds to a Linux DAS but cannot be monitored nor managed by the DAS. The NA shows up in the DAS admin console and when the NA is not running, it shows a status of stopped. When the NA is started, the DAS admin console shows an empty cell in the status column. When the NA is stopped again, the status returns to stopped on the console.

Investigations:
1) The logs are unhelpful on both DAS and NA. There is an entry on the DAS that says the NA could not be notified. But that is not repeated for subsequent starts and stops of the NA.

2) Running a TCP monitor on both the DAS and NA shows that the NA and DAS are communicating over the NA JMX port. There is a suspicious packet, though, transmitted from the NA to the DAS. It contains a string something like 'UnicastRef2 10.97.20.23'. Now, the DAS and NA communicate on a different net, '10.97.30.0/24', and the net that contains 10.97.20.23 is only accessible by the NA, not the DAS. Thus, I suspect that this is the trouble. 10.97.20.23 is an interface on the NA but I never use it in any configuration of the NA and it is not mentioned in any of the .properties files.

3) By accident, I discovered that the 10.97.20.23 address IS contained in a serialized Java object file in the node agent config directory, but only while the node agent is running. The file, 'admch', is created at NA start and deleted at NA stop. The file looks to me like a plain serialization of an RMI socket factory stub. The address appears to be attached to an object of type UnicastRef2. I checked similar files on working Linux NAs and in those cases the address associated with UnicastRef2 designates the correct interface for NA-DAS communication.
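
For anyone who wants to repeat the check in item 3 without a hex editor, here is a minimal sketch that scans any binary file for dotted-quad text. The class and method names (AddressScan, findAddresses) are made up for illustration, and the approach assumes the address is stored as plain ASCII inside the serialized data, which is what the observation above suggests.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GlassFish: lists IPv4-looking strings in a file.
class AddressScan {
    // Matches dotted-quad text such as 10.97.20.23; octet ranges are not validated.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("(?<![0-9.])(?:[0-9]{1,3}\\.){3}[0-9]{1,3}(?![0-9.])");

    static List<String> findAddresses(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data decodes without loss.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        Matcher matcher = DOTTED_QUAD.matcher(text);
        while (matcher.find()) {
            if (!found.contains(matcher.group())) {
                found.add(matcher.group());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the file to inspect, for example a copy of admch.
        for (String address : findAddresses(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(address);
        }
    }
}
```

Run it against a copy of the file taken while the NA is up, since the file is deleted at NA stop.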

Question:
How can I resolve this? The Windows NA seems to be pulling an arbitrary address from those available on the host and handing that to the DAS. I do not know what else to investigate. Web searches for UnicastRef2 produce little, and I cannot find anyone else with a similar problem.

BTW, I have no idea if that address is actually the cause of the behavior, I just suspect it strongly. If it is the problem, then I don't believe this is an issue with the Windows/Linux mix itself. I think the selection of that address by the NA is independent of whether or not it is binding to a Linux DAS.
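
One way to test the suspicion on a multi-homed host is to print what the JVM itself considers the local address. The sketch below is only a diagnostic under assumptions: the class name LocalAddressReport is invented, and it relies on the documented behavior that RMI falls back to the local host's IP address when java.rmi.server.hostname is not set. Comparing the getLocalHost() line against the interface list shows which interface an unconfigured JVM is likely to advertise.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic, not part of GlassFish.
class LocalAddressReport {
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        // null here means RMI will pick a default address on its own.
        lines.add("java.rmi.server.hostname = " + System.getProperty("java.rmi.server.hostname"));
        try {
            lines.add("InetAddress.getLocalHost() = " + InetAddress.getLocalHost().getHostAddress());
        } catch (UnknownHostException e) {
            lines.add("InetAddress.getLocalHost() failed: " + e.getMessage());
        }
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics != null) {
                for (NetworkInterface nic : Collections.list(nics)) {
                    for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                        lines.add(nic.getName() + " -> " + address.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            lines.add("Could not enumerate interfaces: " + e.getMessage());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```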

Many TIA,

-=greg

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 5:41 PM   in response to: gjwiley

Sorry, I neglected to mention that the version on all hosts is 2.1 b60. -g

ne110415

Posts: 11
Re: Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 6:16 PM   in response to: gjwiley

admch is used for NA to collocated instances (on the same machine as NA is on). So for the time being you can set aside that part.

What seems to be broken is the DAS to NA communication. (Is NA to DAS fine? For example, create an instance for that NA and restart the NA. If the NA becomes aware of the instance on restart, then NA to DAS is fine.)

The DAS tries to contact the NA based on the client-hostname property in that NA's config in the DAS's domain.xml. This property is the address "published" by the NA. The das.properties file in the NA's config dir is used by the NA to look up the DAS, and the nodeagent.properties file in the NA is used to configure the JMX connector server in the NA. Typically, the hostname in nodeagent.properties is what you should see in client-hostname for that NA in the DAS's domain.xml.
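
To confirm from the DAS host that the published name resolves and that the NA's JMX port accepts connections, a plain TCP check is enough as a first step. This is a minimal sketch; PortCheck and canConnect are illustrative names, and the host and port are whatever your client-hostname and agent port happen to be. It only proves TCP reachability and says nothing about the address carried inside the RMI stub.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of GlassFish.
class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // An unresolvable host name or a refused connection both end up as IOException.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // args[0]: host name as published in client-hostname, args[1]: NA JMX port.
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 3000));
    }
}
```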

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 8:49 PM   in response to: ne110415

Thank you.

> admch is used for NA to collocated instances (on the
> same machine as NA is on). So for the time being you
> can set aside that part.

Good to know. I will ignore that file.

> What seems to be broken is the DAS to NA
> communication. (Is NA to DAS fine? For example
> create an instance for that NA and restart NA. If NA
> on restart becomes aware of the instance then NA to
> DAS is fine)

Created an instance on the NA. It shows up as stopped
on the DAS console. Restarted the NA, startup OK. DAS
still shows instance status stopped and node agent
status is still blank.

> DAS tries to contact NA based on client-hostname
> property in that NA's config in DAS's domain.xml.
> This property is the "published" address by the NA.

That property is correct. Verified that NA is reachable
by that name from the DAS.

> The das.properties in NA's config dir is used by NA
> to lookup DAS. And nodeagent.properties files in NA
> is used in configuration of jmx connector server in
> NA.

These also are correct.

> Typically hostname in nodeagent.properties is what
> you should see in client-hostname for that NA in the
> DAS's domain.xml

Yes, they are identical.

Putting a pair of frame monitors on the interfaces shows
that communication is completing without error whether
initiated by the DAS or the NA. I've monitored TCP
conversations initiated from each side and see no
trouble.

I also monitored the default route of the DAS. Lo and
behold! the DAS is trying to send to 10.97.20.23--the
address sent by the NA in some of the traffic.

Now why would the DAS use that address?
I never used it in any configuration and it is not
mentioned in any configuration file on either the
NA or the DAS.

As I mentioned before, the packet from the NA to
the DAS that includes that address is sent during
a DAS-initiated conversation with the NA jmx port.
Oh, and the DAS initiates that conversation when
the 'node agents' link is clicked on the console.
The address has the character string 'UnicastRef2'
just before it in the transmission.

Thank you again for your help.

-=greg

ne110415

Posts: 11
Re: Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 10:09 PM   in response to: gjwiley

Let's see what options are available:

1. Can you try creating your NA with your expected IP address using the --agent* options in create-node-agent:
./asadmin create-node-agent --asdas
Usage: create-node-agent [--terse=false] [--echo=false] [--interactive=true] [--host DAS_host(Default localhost)] [--port 4848|4849] [--user DAS_user] [--passwordfile file_name] [--agentdir nodeagent_path] [--agentport port_number] [--savemasterpassword=false] [--secure=true] [--agentproperties (name=value)[:name=value]*] [nodeagent_name]

2. Or else you can add the java.rmi.server.hostname property to the NA and DAS VMs to force the RMI stubs of their JMX connector servers to carry the IP you want. For the DAS you can set it in domain.xml. For the NA, use the command above (or you can try setting it in processLauncher.xml; beware that it is not a published interface).

3. Lastly, if that 10.97.20.23 interface is not used anywhere else, maybe you can remove it from the NA host?
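
To illustrate what option 2 changes, here is a minimal standalone sketch, not GlassFish code. It sets java.rmi.server.hostname, exports a throwaway remote object, and returns the stub's string form, which in current JDKs includes the advertised endpoint. The names StubHostDemo, Ping and describeStub are invented, 10.97.30.5 is a made-up address on the 10.97.30.0/24 net, and the exact toString() format of a stub is an implementation detail rather than a documented contract.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical demo class, not part of GlassFish.
class StubHostDemo {
    // Throwaway remote interface used only to obtain a stub.
    public interface Ping extends Remote {
        String ping() throws RemoteException;
    }

    static class PingImpl implements Ping {
        @Override
        public String ping() {
            return "pong";
        }
    }

    // Exports an object with the given advertised host and returns the stub's description.
    static String describeStub(String advertisedHost) {
        // This is the host that clients will be told to connect back to.
        System.setProperty("java.rmi.server.hostname", advertisedHost);
        PingImpl impl = new PingImpl();
        try {
            // Port 0 asks for an anonymous port; the listener binds locally,
            // the advertised host is only written into the stub.
            Remote stub = UnicastRemoteObject.exportObject(impl, 0);
            String description = stub.toString();
            // Unexport so the JVM can exit normally.
            UnicastRemoteObject.unexportObject(impl, true);
            return description;
        } catch (RemoteException e) {
            throw new IllegalStateException("RMI export failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumed example address; pass the real inter-appserv address as args[0].
        String host = args.length > 0 ? args[0] : "10.97.30.5";
        System.out.println(describeStub(host));
    }
}
```

The point of the demo is that the address inside the stub comes from the JVM that exports the object, which is why the property has to be set on the NA side and not on the DAS.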

ne110415

Posts: 11
Re: Communication trouble DAS - Node Agent
Posted: Nov 24, 2009 10:12 PM   in response to: ne110415

For option 2: I think it's sufficient to set that property only for NA. Your NA can connect to the DAS. So let's keep DAS config as is.

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 8:58 AM   in response to: ne110415

Thanks again.

> 1. Can you try creating your NA with your expected IP
> address using the --agent* options in
> create-node-agent:
> ./asadmin create-node-agent --asdas
> Usage: create-node-agent [--terse=false]
> [--echo=false] [--interactive=true] [--host
> DAS_host(Default localhost)] [--port 4848|4849]
> [--user DAS_user] [--passwordfile file_name]
> [--agentdir nodeagent_path] [--agentport port_number]
> [--savemasterpassword=false] [--secure=true]
> [--agentproperties (name=value)[:name=value]*]
> [nodeagent_name]

From what you mentioned, it seems that the NA is telling the DAS, via JMX, that its (the NA's) RMI address is 10.97.20.23. Is my understanding correct? Then, as you suggest, I need to tell the NA that its RMI address is something else. But I don't see how to do that via the create-node-agent command. According to the help text, the only agentproperties supported are listenaddress and remoteclientaddress. I have tried setting each of those. Setting remoteclientaddress to the NA's address doesn't fix the problem. Setting listenaddress to the NA's address prevents the NA from starting (a binding exception; searching the net suggests that others have seen this problem too). I don't think listenaddress matters to this problem anyway, as by default the NA binds to all addresses. Setting remoteclientaddress only seems to configure the JMX address sent to the DAS at NA registration, and JMX from DAS to NA is already working correctly.


> 2. Or else you can add java.rmi.server.hostname
> property to NA and DAS VMs to force the RMI stubs of
> their JMX connector servers to have an IP you desire.
> For DAS you can set it in domain.xml. For NA in the
> command above (or you can try setting that in the
> processLauncher.xml -- beware that it is not a
> published interface)

I agree with what you wrote in the followup. It should be configured on the NA as it is the NA that reports the RMI address to the DAS on each DAS-initiated query for it. But, I can't figure out how to set it on the NA.

> 3. Lastly, if that 10.97.20.23 interface is not used
> anywhere else, maybe you can get rid of that from NA
> host?

Ah, pragmatism. :)

Unfortunately, not an option. The Windows boxes form a nexus of legacy services--the whole reason why we are attempting to integrate them into the otherwise Linux-based appserv infrastructure in the first place.

-=greg

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 9:44 AM   in response to: gjwiley

> > command above (or you can try setting that in the
> > processLauncher.xml -- beware that it is not a
> > published interface)

Some additional research tells me that this may be the only option. I cannot find any documentation that describes setting JVM options on the NA itself (other than for the purpose of syncing the instance stores). I will try this just to see if it fixes the behavior although I will probably not suggest that we push such a fix into production.

What seems to be needed is a configuration hook for either general NA system properties at the JVM level or, at a higher level, configuring an NA's JMX response to RMI address queries. I hope one of these hooks already exists but research suggests that neither does.


-=greg

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 10:15 AM   in response to: gjwiley

> > > command above (or you can try setting that in the
> > > processLauncher.xml -- beware that it is not a
> > > published interface)
>
> Some additional research tells me that this may be
> the only option. I cannot find any documentation that
> describes setting JVM options on the NA itself (other
> than for the purpose of syncing the instance stores).
> I will try this just to see if it fixes the behavior
> although I will probably not suggest that we push
> such a fix into production.
>

This solved the problem. In ./lib/processLauncher.xml I added a sysproperty element to the s1as8-nodeagent process element with a key of java.rmi.server.hostname and value of the inter-appserv communication interface address.

So, the question now is: is there a supported configuration hook to do this? If not, I will file a feature request.

-g

ne110415

Posts: 11
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 10:45 AM   in response to: gjwiley

The NA's config is not in domain.xml, and the NA's lifecycle imposes some limitations. Its VM has to start and then pull down (sync) the domain.xml from the DAS. Even if the DAS held a config for the NA, it might be too late to configure Java system properties because the VM is already running. In any case, there is at least one more way: the startserv script in glassfish/nodeagents/<your-na-name>/agent/bin/startserv. That applies when you start the NA using that script.

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 12:42 PM   in response to: ne110415

I will explore initialization scripts as an option but it would be a tradeoff. An operator who needs to bring the NA up manually will need to know that asadmin is not applicable for starting a node agent on the Windows hosts. Ops will love that.

Windows does have a graphical service console so we might integrate with that and disable CLI appserv operations on Win. But then we are still making unsupported changes to the GF installation so we might as well modify the launcher config.

But no worries, I'll figure out some kind of recommendation that makes sense.

Thank you again for all of your help.

-=greg

gjwiley

Posts: 8
Re: Communication trouble DAS - Node Agent
Posted: Nov 25, 2009 2:04 PM   in response to: gjwiley

An additional wrinkle if you're following along at home:

The java.rmi.server.hostname property also needs to be set in the JVM config of each instance. I've tested hardcoding the value in a cluster config and it works. I suppose it will also work with per-instance substitution, but I have not tested that yet.



