|
Replies:
44
-
Last Post:
Apr 10, 2009 8:05 AM
by: rwillie6
|
|
|
|
|
|
|
Debugging Glassfish CPU usage
Posted:
Mar 23, 2009 5:53 PM
|
|
|
I'm trying to determine the cause of elevated CPU and Interrupt / Context-switch activity on our application server that occurs at non-peak traffic times and at times causes the application server to become non-responsive. Munin graphs of the CPU usage and Interrupts / Context-Switches are attached.
These spikes are odd because they occur during non-peak times and do not resolve themselves. So far, the only solution I have found is to restart glassfish, which usually makes the problem go away for 8-24 hours. Sometimes, these spikes do not affect load times, but sometimes the spike is more severe and the application server becomes non-responsive.
I have confirmed during these times that there are no other processes running that are using a significant amount of CPU; i.e. the CPU spikes are glassfish CPU spikes. IO activity is no higher than usual during there periods. Munin memory graphs showed no differences between problematic and non-problematic times.
The app is J2EE using JPA + Hibernate, no EJBs.
I'm looking for ideas / advice on tracking down the root cause. Thanks!
System: - x-large high-cpu Amazon EC2 instance (7.5GB of memory, 8 cpus, 20 amazon "compute units") - ubuntu 8.04, 64-bit - dedicated to glassfish
Notable JVM options: <jvm-options>-server</jvm-options> <jvm-options>-Xmx2500m</jvm-options> <jvm-options>-Xms2500m</jvm-options> <jvm-options>-Xmn1000m</jvm-options> <jvm-options>-Xss128k</jvm-options> <jvm-options>-XX:+AggressiveOpts</jvm-options> <jvm-options>-XX:+AggressiveHeap</jvm-options> <jvm-options>-XX:+DisableExplicitGC</jvm-options> <jvm-options>-XX:+UseParallelGC</jvm-options> <jvm-options>-XX:+UseParallelOldGC</jvm-options> <jvm-options>-XX:ParallelGCThreads=10</jvm-options> <jvm-options>-Dcom.sun.enterprise.server.ss.ASQuickStartup=false</jvm-options> <jvm-options>-XX:MaxPermSize=192m</jvm-options> <jvm-options>-XX:NewRatio=2</jvm-options>
Notable Glassfish Settings: <http-listener acceptor-threads="5" address="0.0.0.0" blocking-enabled="false" default-virtual-server="server" enabled="true" family="inet" id="http-listener-1" port="80" security-enabled="false" server-name="" xpowered-by="false"><property name="compression" value="on"/></http-listener> <request-processing header-buffer-length-in-bytes="8192" initial-thread-count="50" request-timeout-in-seconds="60" thread-count="250" thread-increment="10"/> <keep-alive max-connections="256" thread-count="5" timeout-in-seconds="30"/> <connection-pool max-pending-count="4096" queue-size-in-bytes="4096" receive-buffer-size-in-bytes="4096" send-buffer-size-in-bytes="8192"/> <http-file-cache file-caching-enabled="true" file-transmission-enabled="false" globally-enabled="true" hash-init-size="0" max-age-in-seconds="60" max-files-count="1024" medium-file-size-limit-in-bytes="537600" medium-file-space-in-bytes="10485760" small-file-size-limit-in-bytes="2048" small-file-space-in-bytes="1048576"/>
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 23, 2009 7:32 PM
in response to: rwillie6
|
|
|
Thanks for all the data. One thing I'd like to suggest is getting the Java Thread Dump (send a kill -3 to JVM process) in order to get the same (it goes to domain/logs/jvm.log), when you experience the CPU spikes.
The number of Grizzly threads seem to be slightly higher than usual. With 8 cores, your max number of Grizzly threads go to 250, which is rather high. That's just an observation.
It looks to me that you can consider sending "-d64" instead of "-server".
Also, what is the average time per HTTP request that is spent in the database tier?
Some other data like pfiles would help too. But a number of stack dumps are required at a minimum.
Regards, Kedar
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 3:26 AM
in response to: km
|
|
|
|
|
Hi Kedar,
I've taken a couple of thread dumps with jstack, are those the same as would be created with a kill -3? If so, see attached.
I've knocked the threads down to 200, would that still be rather high? Also, I noticed an error in the jvm-options above; all are correct except we're running -Xmx3000m and -Xms3000m, instead of 2500 as I previously listed.
Also, I thought that -d64 only applied to Solaris. If I'm incorrect, what are the benefits of -d64 over -server?
As for average time in the database tier, we don't calculate that, but from watching mytop (filtering inactive) while a spike was occurring I would guesstimate it at or under 1s.
I'm unfamiliar with pfiles, could you explain a bit more there?
Thanks!
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 3:07 AM
in response to: rwillie6
|
|
|
Salut,
glassfish@javadesktop.org wrote: > I'm trying to determine the cause of elevated CPU and Interrupt / Context-switch activity on our application server that occurs at non-peak traffic times and at times causes the application server to become non-responsive. Munin graphs of the CPU usage and Interrupts / Context-Switches are attached. > > These spikes are odd because they occur during non-peak times and do not resolve themselves. So far, the only solution I have found is to restart glassfish, which usually makes the problem go away for 8-24 hours. Sometimes, these spikes do not affect load times, but sometimes the spike is more severe and the application server becomes non-responsive. > > I have confirmed during these times that there are no other processes running that are using a significant amount of CPU; i.e. the CPU spikes are glassfish CPU spikes. IO activity is no higher than usual during there periods. Munin memory graphs showed no differences between problematic and non-problematic times. > > The app is J2EE using JPA + Hibernate, no EJBs.
any web related call? It looks like a possible JDK issue IMO (which JDK are you using?). When it peak, can you grab a thread dump? I think the NIO Selector might get into a spinning state...we are currently testing a fix for JDK 7 b51 and if that work it will be backported to 6 urX. Before I share a patch, I just want to make sure this is the issue.
Thanks
-- Jeanfrancois
> > I'm looking for ideas / advice on tracking down the root cause. Thanks! > > System: > - x-large high-cpu Amazon EC2 instance (7.5GB of memory, 8 cpus, 20 amazon "compute units") > - ubuntu 8.04, 64-bit > - dedicated to glassfish > > Notable JVM options: > <jvm-options>-server</jvm-options> > <jvm-options>-Xmx2500m</jvm-options> > <jvm-options>-Xms2500m</jvm-options> > <jvm-options>-Xmn1000m</jvm-options> > <jvm-options>-Xss128k</jvm-options> > <jvm-options>-XX:+AggressiveOpts</jvm-options> > <jvm-options>-XX:+AggressiveHeap</jvm-options> > <jvm-options>-XX:+DisableExplicitGC</jvm-options> > <jvm-options>-XX:+UseParallelGC</jvm-options> > <jvm-options>-XX:+UseParallelOldGC</jvm-options> > <jvm-options>-XX:ParallelGCThreads=10</jvm-options> > <jvm-options>-Dcom.sun.enterprise.server.ss.ASQuickStartup=false</jvm-options> > <jvm-options>-XX:MaxPermSize=192m</jvm-options> > <jvm-options>-XX:NewRatio=2</jvm-options> > > Notable Glassfish Settings: > <http-listener acceptor-threads="5" address="0.0.0.0" blocking-enabled="false" default-virtual-server="server" enabled="true" family="inet" id="http-listener-1" port="80" security-enabled="false" server-name="" xpowered-by="false"><property name="compression" value="on"/></http-listener> > <request-processing header-buffer-length-in-bytes="8192" initial-thread-count="50" request-timeout-in-seconds="60" thread-count="250" thread-increment="10"/> > <keep-alive max-connections="256" thread-count="5" timeout-in-seconds="30"/> > <connection-pool max-pending-count="4096" queue-size-in-bytes="4096" receive-buffer-size-in-bytes="4096" send-buffer-size-in-bytes="8192"/> > <http-file-cache file-caching-enabled="true" file-transmission-enabled="false" globally-enabled="true" hash-init-size="0" max-age-in-seconds="60" max-files-count="1024" medium-file-size-limit-in-bytes="537600" medium-file-space-in-bytes="10485760" small-file-size-limit-in-bytes="2048" small-file-space-in-bytes="1048576"/> > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=338592 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 3:37 AM
in response to: Jeanfrancois Ar...
|
|
|
Hi Jeanfrancois,
Could you clarify what you mean by "any web related call?"? In case it helps, we have anywhere from 500 - 900 users using the site during the "off-peak" times I mentioned before, compared to our "peak" times which range can reach 1600 concurrent users.
The Java version on the server is 1.6.0_07-b06. Though, it should be noted that our team runs Apple computers, so the development environment Java version is 1.5.0_16-b06-284. Even though the server is 64-bit, we are running the 32-bit jvm in mixed mode on the recommendation of Scott Oaks, who advised me when configuring glassfish last fall.
For thread dumps, see my reply to Kedar, above. I attached two jstack dumps, one from yesterday and one that just occurred. I have another from 2 days ago, but it was too large a file size to attach.
Thanks!
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 3:46 AM
in response to: rwillie6
|
|
|
Salut,
glassfish@javadesktop.org wrote: > Hi Jeanfrancois, > > Could you clarify what you mean by "any web related call?"? In case it helps, we have anywhere from 500 - 900 users using the site during the "off-peak" times I mentioned before, compared to our "peak" times which range can reach 1600 concurrent users. > > The Java version on the server is 1.6.0_07-b06. Though, it should be noted that our team runs Apple computers, so the development environment Java version is 1.5.0_16-b06-284. Even though the server is 64-bit, we are running the 32-bit jvm in mixed mode on the recommendation of Scott Oaks, who advised me when configuring glassfish last fall. > > For thread dumps, see my reply to Kedar, above. I attached two jstack dumps, one from yesterday and one that just occurred. I have another from 2 days ago, but it was too large a file size to attach. >
I've missed those. The issue might be
> "httpWorkerThread-80-197" daemon prio=10 tid=0x1d0a5000 nid=0xe0e waiting for monitor entry [0x04f72000..0x04f72dc0] > java.lang.Thread.State: BLOCKED (on object monitor) > at com.sun.enterprise.resource.AbstractResourcePool.transactionCompleted(AbstractResourcePool.java:1176) > - waiting to lock <0x3bf7c598> (a com.sun.enterprise.resource.SJSASResourcePool) > at com.sun.enterprise.resource.PoolManagerImpl.transactionCompleted(PoolManagerImpl.java:393) > at com.sun.enterprise.resource.PoolManagerImpl$SynchronizationListener.afterCompletion(PoolManagerImpl.java:540) > at com.sun.enterprise.distributedtx.J2EETransaction.commit(J2EETransaction.java:491) > at com.sun.enterprise.distributedtx.J2EETransactionManagerOpt.commit(J2EETransactionManagerOpt.java:371) > at com.sun.enterprise.distributedtx.UserTransactionImpl.commit(UserTransactionImpl.java:197) > at org.j2free.jpa.Controller.endTransaction(Controller.java:158) > at org.j2free.servlet.filter.InvokerFilter.doFilter(InvokerFilter.java:151) > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) > at org.j2free.servlet.filter.ErrorLogFilter.doFilter(ErrorLogFilter.java:103) > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:288) > at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271) > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202) > at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) > at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) > at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
but I also see in the second log:
> "httpWorkerThread-80-229" daemon prio=10 tid=0x08eccc00 nid=0x5833 runnable [0x35b59000..0x35b5b140] > java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:129) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) > at java.io.BufferedInputStream.read(BufferedInputStream.java:317) > - locked <0xbad6e370> (a java.io.BufferedInputStream) > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2452) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2906) > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2895) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3438) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554) > - locked <0xbad6c648> (a java.lang.Object) > at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761) > at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1912) > - locked <0xbad6c648> (a java.lang.Object) > at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:186) > at org.hibernate.loader.Loader.getResultSet(Loader.java:1787) > at org.hibernate.loader.Loader.doQuery(Loader.java:674) > at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236) > at org.hibernate.loader.Loader.doList(Loader.java:2213) > at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104) > at org.hibernate.loader.Loader.list(Loader.java:2099) > at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:378) > at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:338) > at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:172) > at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1121) > at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79) > at org.hibernate.ejb.QueryImpl.getSingleResult(QueryImpl.java:81) > at org.j2free.jpa.Controller.query(Controller.java:653) > at org.j2free.jpa.Controller.query(Controller.java:661) > at com.tuneshero.servlet.ajax.FlexRequests.doGet(FlexRequests.java:309) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:718) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:831)
Jagadish, do you know if this could be the issue or not? But for sure all Grizzly WorkerThread may gets locked by such call.
A+
-- Jeanfrancois
> Thanks! > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=338656 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 4:01 AM
in response to: Jeanfrancois Ar...
|
|
|
@rwillie6 : can you post your connection pool configuration ?
Also, you can set connection-validation to "table" instead of "auto-commit" and let us know the results. http://blogs.sun.com/JagadishPrasath/entry/connection_validation_in_glassfish_jdbc
Thanks, -Jagadish
On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: > Salut, > > glassfish@javadesktop.org wrote: > > Hi Jeanfrancois, > > > > Could you clarify what you mean by "any web related call?"? In case it helps, we have anywhere from 500 - 900 users using the site during the "off-peak" times I mentioned before, compared to our "peak" times which range can reach 1600 concurrent users. > > > > The Java version on the server is 1.6.0_07-b06. Though, it should be noted that our team runs Apple computers, so the development environment Java version is 1.5.0_16-b06-284. Even though the server is 64-bit, we are running the 32-bit jvm in mixed mode on the recommendation of Scott Oaks, who advised me when configuring glassfish last fall. > > > > For thread dumps, see my reply to Kedar, above. I attached two jstack dumps, one from yesterday and one that just occurred. I have another from 2 days ago, but it was too large a file size to attach. > > > > I've missed those. The issue might be > > > "httpWorkerThread-80-197" daemon prio=10 tid=0x1d0a5000 nid=0xe0e waiting for monitor entry [0x04f72000..0x04f72dc0] > > java.lang.Thread.State: BLOCKED (on object monitor) > > at com.sun.enterprise.resource.AbstractResourcePool.transactionCompleted(AbstractResourcePool.java:1176) > > - waiting to lock <0x3bf7c598> (a com.sun.enterprise.resource.SJSASResourcePool) > > at com.sun.enterprise.resource.PoolManagerImpl.transactionCompleted(PoolManagerImpl.java:393) > > at com.sun.enterprise.resource.PoolManagerImpl$SynchronizationListener.afterCompletion(PoolManagerImpl.java:540) > > at com.sun.enterprise.distributedtx.J2EETransaction.commit(J2EETransaction.java:491) > > at com.sun.enterprise.distributedtx.J2EETransactionManagerOpt.commit(J2EETransactionManagerOpt.java:371) > > at com.sun.enterprise.distributedtx.UserTransactionImpl.commit(UserTransactionImpl.java:197) > > at org.j2free.jpa.Controller.endTransaction(Controller.java:158) > > at org.j2free.servlet.filter.InvokerFilter.doFilter(InvokerFilter.java:151) > > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) > > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) > > at org.j2free.servlet.filter.ErrorLogFilter.doFilter(ErrorLogFilter.java:103) > > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) > > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:198) > > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:288) > > at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:271) > > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:202) > > at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:632) > > at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:577) > > at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94) > > but I also see in the second log: > > > "httpWorkerThread-80-229" daemon prio=10 tid=0x08eccc00 nid=0x5833 runnable [0x35b59000..0x35b5b140] > > java.lang.Thread.State: RUNNABLE > > at java.net.SocketInputStream.socketRead0(Native Method) > > at java.net.SocketInputStream.read(SocketInputStream.java:129) > > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > > at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) > > at java.io.BufferedInputStream.read(BufferedInputStream.java:317) > > - locked <0xbad6e370> (a java.io.BufferedInputStream) > > at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2452) > > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2906) > > at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2895) > > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3438) > > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1951) > > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2101) > > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2554) > > - locked <0xbad6c648> (a java.lang.Object) > > at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1761) > > at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1912) > > - locked <0xbad6c648> (a java.lang.Object) > > at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:186) > > at org.hibernate.loader.Loader.getResultSet(Loader.java:1787) > > at org.hibernate.loader.Loader.doQuery(Loader.java:674) > > at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236) > > at org.hibernate.loader.Loader.doList(Loader.java:2213) > > at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104) > > at org.hibernate.loader.Loader.list(Loader.java:2099) > > at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:378) > > at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:338) > > at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:172) > > at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1121) > > at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79) > > at org.hibernate.ejb.QueryImpl.getSingleResult(QueryImpl.java:81) > > at org.j2free.jpa.Controller.query(Controller.java:653) > > at org.j2free.jpa.Controller.query(Controller.java:661) > > at com.tuneshero.servlet.ajax.FlexRequests.doGet(FlexRequests.java:309) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:718) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:831) > > Jagadish, do you know if this could be the issue or not? But for sure > all Grizzly WorkerThread may gets locked by such call. > > A+ > > -- Jeanfrancois > > > > Thanks! > > [Message sent by forum member 'rwillie6' (rwillie6)] > > > > http://forums.java.net/jive/thread.jspa?messageID=338656 > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > > For additional commands, e-mail: users-help@glassfish.dev.java.net > >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 11:44 AM
in response to: Jagadish Prasat...
|
|
|
I'm not sure which connection pool you were referring to, so both are included below. I changed auto-commit to table after pasting the config below and will report any results.
<connection-pool max-pending-count="4096" queue-size-in-bytes="4096" receive-buffer-size-in-bytes="4096" send-buffer-size-in-bytes="8192"/>
And the JDBC pool: <jdbc-connection-pool allow-non-component-callers="false" associate-with-thread="false" connection-creation-retry-attempts="0" connection-creation-retry-interval-in-seconds="10" connection-leak-reclaim="false" connection-leak-timeout-in-seconds="0" connection-validation-method="auto-commit" datasource-classname="com.mysql.jdbc.jdbc2.optional.MysqlDataSource" fail-all-connections="false" idle-timeout-in-seconds="300" is-connection-validation-required="true" is-isolation-level-guaranteed="false" lazy-connection-association="false" lazy-connection-enlistment="false" match-connections="false" max-connection-usage-count="300" max-pool-size="400" max-wait-time-in-millis="60000" name="fathom-pool2" non-transactional-connections="false" pool-resize-quantity="10" res-type="javax.sql.DataSource" statement-timeout-in-seconds="55" steady-pool-size="40" validate-atmost-once-period-in-seconds="10" wrap-jdbc-objects="false"> <property name="SelfDestructOnPingMaxOperations" value="0"/> <property name="UseUltraDevWorkAround" value="false"/> <property name="UseStreamLengthsInPrepStmts" value="true"/> <property name="TcpRcvBuf" value="0"/> <property name="UltraDevHack" value="false"/> <property name="InitialTimeout" value="2"/> <property name="UseNanosForElapsedTime" value="false"/> <property name="MetadataCacheSize" value="50"/> <property name="UseColumnNamesInFindColumn" value="false"/> <property name="RequireSSL" value="false"/> <property name="LogXaCommands" value="false"/> <property name="CacheCallableStatements" value="false"/> <property name="NoDatetimeStringSync" value="false"/> <property name="EmulateUnsupportedPstmts" value="true"/> <property name="TreatUtilDateAsTimestamp" value="true"/> <property name="EmulateLocators" value="false"/> <property name="ClientInfoProvider" value="com.mysql.jdbc.JDBC4CommentClientInfoProvider"/> <property name="IsInteractiveClient" value="false"/> <property name="IncludeInnodbStatusInDeadlockExceptions" value="false"/> <property name="ProcessEscapeCodesForPrepStmts" value="true"/> <property name="GatherPerformanceMetrics" value="false"/> <property name="UseOnlyServerErrorMessages" value="true"/> <property name="HoldResultsOpenOverStatementClose" value="false"/> <property name="LoadBalanceBlacklistTimeout" value="0"/> <property name="UseServerPreparedStmts" value="false"/> <property name="UseTimezone" value="false"/> <property name="BlobsAreStrings" value="false"/> <property name="IgnoreNonTxTables" value="false"/> <property name="UseSSL" value="false"/> <property name="AllowLoadLocalInfile" value="true"/> <property name="SocketFactoryClassName" value="com.mysql.jdbc.StandardSocketFactory"/> <property name="SelfDestructOnPingSecondsLifetime" value="0"/> <property name="AutoClosePStmtStreams" value="false"/> <property name="NullCatalogMeansCurrent" value="true"/> <property name="UseInformationSchema" value="false"/> <property name="DynamicCalendars" value="false"/> <property name="CreateDatabaseIfNotExist" value="false"/> <property name="UseServerPrepStmts" value="false"/> <property name="Pedantic" value="false"/> <property name="UseSSPSCompatibleTimezoneShift" value="false"/> <property name="ReconnectAtTxEnd" value="false"/> <property name="Url" value="jdbc:mysql://:3306/"/> <property name="UseJvmCharsetConverters" value="false"/> <property name="FunctionsNeverReturnBlobs" value="false"/> <property name="NoAccessToProcedureBodies" value="false"/> <property name="AllowNanAndInf" value="false"/> <property name="CachePreparedStatements" value="false"/> <property name="ExplainSlowQueries" value="false"/> <property name="CacheResultSetMetadata" value="false"/> <property name="TcpKeepAlive" value="true"/> <property name="PreparedStatementCacheSqlLimit" value="256"/> <property name="SlowQueryThresholdNanos" value="0"/> <property name="LoginTimeout" value="0"/> <property name="UseCompression" value="false"/> <property name="AutoSlowLog" value="true"/> <property name="TcpSndBuf" value="0"/> <property name="TcpTrafficClass" value="0"/> <property name="RoundRobinLoadBalance" value="false"/> <property name="CallableStatementCacheSize" value="100"/> <property name="InteractiveClient" value="false"/> <property name="NoTimezoneConversionForTimeType" value="false"/> <property name="UseHostsInPrivileges" value="true"/> <property name="UseSqlStateCodes" value="true"/> <property name="GatherPerfMetrics" value="false"/> <property name="ReportMetricsIntervalMillis" value="30000"/> <property name="UseFastDateParsing" value="true"/> <property name="LoggerClassName" value="com.mysql.jdbc.log.StandardLogger"/> <property name="DontTrackOpenResources" value="false"/> <property name="UseOldUTF8Behavior" value="false"/> <property name="OverrideSupportsIntegrityEnhancementFacility" value="false"/> <property name="RollbackOnPooledClose" value="true"/> <property name="GenerateSimpleParameterMetadata" value="false"/> <property name="SecondsBeforeRetryMaster" value="30"/> <property name="ResultSetSizeThreshold" value="100"/> <property name="TransformedBitIsBoolean" value="false"/> <property name="SocketTimeout" value="0"/> <property name="Paranoid" value="false"/> <property name="FailOverReadOnly" value="true"/> <property name="CapitalizeTypeNames" value="true"/> <property name="VerifyServerCertificate" value="true"/> <property name="TinyInt1isBit" value="true"/> <property name="UseDynamicCharsetInfo" value="true"/> <property name="UseBlobToStoreUTF8OutsideBMP" value="false"/> <property name="EnableQueryTimeouts" value="true"/> <property name="PopulateInsertRowWithDefaultValues" value="false"/> <property name="UseDirectRowUnpack" value="true"/> <property name="PadCharsWithSpace" value="false"/> <property name="UseOldAliasMetadataBehavior" value="false"/> <property name="PrepStmtCacheSqlLimit" value="256"/> <property name="MaxReconnects" value="3"/> <property name="AutoDeserialize" value="false"/> <property name="JdbcCompliantTruncationForReads" value="true"/> <property name="CompensateOnDuplicateKeyUpdateCounts" value="false"/> <property name="PinGlobalTxToPhysicalConnection" value="false"/> <property name="ProfileSQL" value="false"/> <property name="RewriteBatchedStatements" value="false"/> <property name="UseUnicode" value="true"/> <property name="Logger" value="com.mysql.jdbc.log.StandardLogger"/> <property name="DumpMetadataOnColumnNotFound" value="false"/> <property name="MaxQuerySizeToLog" value="2048"/> <property name="LocatorFetchBufferSize" value="1048576"/> <property name="CallableStmtCacheSize" value="100"/> <property name="ClobberStreamingResults" value="false"/> <property name="TcpNoDelay" value="true"/> <property name="LoadBalanceStrategy" value="random"/> <property name="StrictFloatingPoint" value="false"/> <property name="AutoReconnectForPools" value="false"/> <property name="CacheCallableStmts" value="false"/> <property name="MaxRows" value="-1"/> <property name="UseLocalTransactionState" value="false"/> <property name="UseLegacyDatetimeCode" value="true"/> <property name="BlobSendChunkSize" value="1048576"/> <property name="NetTimeoutForStreamingResults" value="600"/> <property name="UseGmtMillisForDatetimes" value="false"/> <property name="AllowMultiQueries" value="false"/> <property name="JdbcCompliantTruncation" value="true"/> <property name="AutoGenerateTestcaseScript" value="false"/> <property name="EnablePacketDebug" value="false"/> <property name="ContinueBatchOnError" value="true"/> <property name="TraceProtocol" value="false"/> <property name="UseCursorFetch" value="false"/> <property name="UseAffectedRows" value="false"/> <property name="EmptyStringsConvertToZero" value="true"/> <property name="PreparedStatementCacheSize" value="25"/> <property name="StrictUpdates" value="true"/> <property name="DumpQueriesOnException" value="false"/> <property name="PacketDebugBufferSize" value="20"/> <property name="AllowUrlInLocalInfile" value="false"/> <property name="UseUsageAdvisor" value="false"/> <property name="QueriesBeforeRetryMaster" value="50"/> <property name="SocketFactory" value="com.mysql.jdbc.StandardSocketFactory"/> <property name="DefaultFetchSize" value="0"/> <property name="ProfilerEventHandler" value="com.mysql.jdbc.profiler.LoggingProfilerEventHandler"/> <property name="ConnectTimeout" value="0"/> <property name="YearIsDateType" value="true"/> <property name="RunningCTS13" value="false"/> <property name="UseJDBCCompliantTimezoneShift" value="false"/> <property name="UseFastIntParsing" value="true"/> <property name="ProfileSql" value="false"/> <property name="RelaxAutoCommit" value="false"/> <property name="NullNamePatternMatchesAll" value="true"/> <property name="RetriesAllDown" value="120"/> <property name="CachePrepStmts" value="true"/> <property name="UseReadAheadInput" value="false"/> <property name="SlowQueryThresholdMillis" value="30000"/> <property name="LogSlowQueries" value="true"/> <property name="MaintainTimeStats" value="false"/> <property name="PrepStmtCacheSize" value="512"/> <property name="ZeroDateTimeBehavior" value="convertToNull"/> <property name="AlwaysSendSetIsolation" value="false"/> <property name="UseUnbufferedInput" value="false"/> <property name="CacheServerConfiguration" value="true"/> <property name="ElideSetAutoCommits" value="true"/> <property name="UseLocalSessionState" value="true"/> <property name="DatabaseName" value="XXX"/> <property name="PortNumber" value="XXX"/> <property name="Port" value="XXX"/> <property name="User" value="XXX"/> <property name="URL" value="XXX"/> <property name="Password" value=""/> </jdbc-connection-pool>
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 9:52 PM
in response to: rwillie6
|
|
|
Hi Ryan,
Most of your pool settings look fine. You can switch back to "auto-commit" mode of validation.
Regarding the following thread dump : http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338842-8159/jstack_dump_2009-03-24_09:26:34.txt
As the resizer is cleaning & validating the pool, all the connection requests are made to wait. This will be completed in few seconds, again dependent on the number of connections to be validated. You can confirm this by taking another thread dump in a couple of minutes. However, you can tune the pool settings to minimize the frequency of resizer calls.
This is not the solution to the actual issue you may be facing (as your other stack traces confirm that the real issue is not w.r.t database), but it will help to further reduce the number of database validation calls / cleanup jobs.
idle-timeout : 300 seconds can be made to 1500 seconds validate-atmost-once-period-in-seconds : 10 seconds can be made to 60 seconds max-connection-usage : 300 can be made to 500
Hope this helps.
Thanks, -Jagadish
On Tue, 2009-03-24 at 11:44 -0700, glassfish@javadesktop.org wrote: > I'm not sure which connection pool you were referring to, so here are both: > > <connection-pool max-pending-count="4096" queue-size-in-bytes="4096" receive-buffer-size-in-bytes="4096" send-buffer-size-in-bytes="8192"/> > > And the JDBC pool: > <jdbc-connection-pool allow-non-component-callers="false" associate-with-thread="false" connection-creation-retry-attempts="0" connection-creation-retry-interval-in-seconds="10" connection-leak-reclaim="false" connection-leak-timeout-in-seconds="0" connection-validation-method="auto-commit" datasource-classname="com.mysql.jdbc.jdbc2.optional.MysqlDataSource" fail-all-connections="false" idle-timeout-in-seconds="300" is-connection-validation-required="true" is-isolation-level-guaranteed="false" lazy-connection-association="false" lazy-connection-enlistment="false" match-connections="false" max-connection-usage-count="300" max-pool-size="400" max-wait-time-in-millis="60000" name="fathom-pool2" non-transactional-connections="false" pool-resize-quantity="10" res-type="javax.sql.DataSource" statement-timeout-in-seconds="55" steady-pool-size="40" validate-atmost-once-period-in-seconds="10" wrap-jdbc-objects="false"> > <property name="SelfDestructOnPingMaxOperations" value="0"/> > <property name="UseUltraDevWorkAround" value="false"/> > <property name="UseStreamLengthsInPrepStmts" value="true"/> > <property name="TcpRcvBuf" value="0"/> > <property name="UltraDevHack" value="false"/> > <property name="InitialTimeout" value="2"/> > <property name="UseNanosForElapsedTime" value="false"/> > <property name="MetadataCacheSize" value="50"/> > <property name="UseColumnNamesInFindColumn" value="false"/> > <property name="RequireSSL" value="false"/> > <property name="LogXaCommands" value="false"/> > <property name="CacheCallableStatements" value="false"/> > <property name="NoDatetimeStringSync" value="false"/> > <property name="EmulateUnsupportedPstmts" value="true"/> > <property name="TreatUtilDateAsTimestamp" value="true"/> > <property name="EmulateLocators" value="false"/> > <property name="ClientInfoProvider" value="com.mysql.jdbc.JDBC4CommentClientInfoProvider"/> > <property name="IsInteractiveClient" value="false"/> > <property name="IncludeInnodbStatusInDeadlockExceptions" value="false"/> > <property name="ProcessEscapeCodesForPrepStmts" value="true"/> > <property name="GatherPerformanceMetrics" value="false"/> > <property name="UseOnlyServerErrorMessages" value="true"/> > <property name="HoldResultsOpenOverStatementClose" value="false"/> > <property name="LoadBalanceBlacklistTimeout" value="0"/> > <property name="UseServerPreparedStmts" value="false"/> > <property name="UseTimezone" value="false"/> > <property name="BlobsAreStrings" value="false"/> > <property name="IgnoreNonTxTables" value="false"/> > <property name="UseSSL" value="false"/> > <property name="AllowLoadLocalInfile" value="true"/> > <property name="SocketFactoryClassName" value="com.mysql.jdbc.StandardSocketFactory"/> > <property name="SelfDestructOnPingSecondsLifetime" value="0"/> > <property name="AutoClosePStmtStreams" value="false"/> > <property name="NullCatalogMeansCurrent" value="true"/> > <property name="UseInformationSchema" value="false"/> > <property name="DynamicCalendars" value="false"/> > <property name="CreateDatabaseIfNotExist" value="false"/> > <property name="UseServerPrepStmts" value="false"/> > <property name="Pedantic" value="false"/> > <property name="UseSSPSCompatibleTimezoneShift" value="false"/> > <property name="ReconnectAtTxEnd" value="false"/> > <property name="Url" value="jdbc:mysql://:3306/"/> > <property name="UseJvmCharsetConverters" value="false"/> > <property name="FunctionsNeverReturnBlobs" value="false"/> > <property name="NoAccessToProcedureBodies" value="false"/> > <property name="AllowNanAndInf" value="false"/> > <property name="CachePreparedStatements" value="false"/> > <property name="ExplainSlowQueries" value="false"/> > <property name="CacheResultSetMetadata" value="false"/> > <property name="TcpKeepAlive" value="true"/> > <property name="PreparedStatementCacheSqlLimit" value="256"/> > <property name="SlowQueryThresholdNanos" value="0"/> > <property name="LoginTimeout" value="0"/> > <property name="UseCompression" value="false"/> > <property name="AutoSlowLog" value="true"/> > <property name="TcpSndBuf" value="0"/> > <property name="TcpTrafficClass" value="0"/> > <property name="RoundRobinLoadBalance" value="false"/> > <property name="CallableStatementCacheSize" value="100"/> > <property name="InteractiveClient" value="false"/> > <property name="NoTimezoneConversionForTimeType" value="false"/> > <property name="UseHostsInPrivileges" value="true"/> > <property name="UseSqlStateCodes" value="true"/> > <property name="GatherPerfMetrics" value="false"/> > <property name="ReportMetricsIntervalMillis" value="30000"/> > <property name="UseFastDateParsing" value="true"/> > <property name="LoggerClassName" value="com.mysql.jdbc.log.StandardLogger"/> > <property name="DontTrackOpenResources" value="false"/> > <property name="UseOldUTF8Behavior" value="false"/> > <property name="OverrideSupportsIntegrityEnhancementFacility" value="false"/> > <property name="RollbackOnPooledClose" value="true"/> > <property name="GenerateSimpleParameterMetadata" value="false"/> > <property name="SecondsBeforeRetryMaster" value="30"/> > <property name="ResultSetSizeThreshold" value="100"/> > <property name="TransformedBitIsBoolean" value="false"/> > <property name="SocketTimeout" value="0"/> > <property name="Paranoid" value="false"/> > <property name="FailOverReadOnly" value="true"/> > <property name="CapitalizeTypeNames" value="true"/> > <property name="VerifyServerCertificate" value="true"/> > <property name="TinyInt1isBit" value="true"/> > <property name="UseDynamicCharsetInfo" value="true"/> > <property name="UseBlobToStoreUTF8OutsideBMP" value="false"/> > <property name="EnableQueryTimeouts" value="true"/> > <property name="PopulateInsertRowWithDefaultValues" value="false"/> > <property name="UseDirectRowUnpack" value="true"/> > <property name="PadCharsWithSpace" value="false"/> > <property name="UseOldAliasMetadataBehavior" value="false"/> > <property name="PrepStmtCacheSqlLimit" value="256"/> > <property name="MaxReconnects" value="3"/> > <property name="AutoDeserialize" value="false"/> > <property name="JdbcCompliantTruncationForReads" value="true"/> > <property name="CompensateOnDuplicateKeyUpdateCounts" value="false"/> > <property name="PinGlobalTxToPhysicalConnection" value="false"/> > <property name="ProfileSQL" value="false"/> > <property name="RewriteBatchedStatements" value="false"/> > <property name="UseUnicode" value="true"/> > <property name="Logger" value="com.mysql.jdbc.log.StandardLogger"/> > <property name="DumpMetadataOnColumnNotFound" value="false"/> > <property name="MaxQuerySizeToLog" value="2048"/> > <property name="LocatorFetchBufferSize" value="1048576"/> > <property name="CallableStmtCacheSize" value="100"/> > <property name="ClobberStreamingResults" value="false"/> > <property name="TcpNoDelay" value="true"/> > <property name="LoadBalanceStrategy" value="random"/> > <property name="StrictFloatingPoint" value="false"/> > <property name="AutoReconnectForPools" value="false"/> > <property name="CacheCallableStmts" value="false"/> > <property name="MaxRows" value="-1"/> > <property name="UseLocalTransactionState" value="false"/> > <property name="UseLegacyDatetimeCode" value="true"/> > <property name="BlobSendChunkSize" value="1048576"/> > <property name="NetTimeoutForStreamingResults" value="600"/> > <property name="UseGmtMillisForDatetimes" value="false"/> > <property name="AllowMultiQueries" value="false"/> > <property name="JdbcCompliantTruncation" value="true"/> > <property name="AutoGenerateTestcaseScript" value="false"/> > <property name="EnablePacketDebug" value="false"/> > <property name="ContinueBatchOnError" value="true"/> > <property name="TraceProtocol" value="false"/> > <property name="UseCursorFetch" value="false"/> > <property name="UseAffectedRows" value="false"/> > <property name="EmptyStringsConvertToZero" value="true"/> > <property name="PreparedStatementCacheSize" value="25"/> > <property name="StrictUpdates" value="true"/> > <property name="DumpQueriesOnException" value="false"/> > <property name="PacketDebugBufferSize" value="20"/> > <property name="AllowUrlInLocalInfile" value="false"/> > <property name="UseUsageAdvisor" value="false"/> > <property name="QueriesBeforeRetryMaster" value="50"/> > <property name="SocketFactory" value="com.mysql.jdbc.StandardSocketFactory"/> > <property name="DefaultFetchSize" value="0"/> > <property name="ProfilerEventHandler" value="com.mysql.jdbc.profiler.LoggingProfilerEventHandler"/> > <property name="ConnectTimeout" value="0"/> > <property name="YearIsDateType" value="true"/> > <property name="RunningCTS13" value="false"/> > <property name="UseJDBCCompliantTimezoneShift" value="false"/> > <property name="UseFastIntParsing" value="true"/> > <property name="ProfileSql" value="false"/> > <property name="RelaxAutoCommit" value="false"/> > <property name="NullNamePatternMatchesAll" value="true"/> > <property name="RetriesAllDown" value="120"/> > <property name="CachePrepStmts" value="true"/> > <property name="UseReadAheadInput" value="false"/> > <property name="SlowQueryThresholdMillis" value="30000"/> > <property name="LogSlowQueries" value="true"/> > <property name="MaintainTimeStats" value="false"/> > <property name="PrepStmtCacheSize" value="512"/> > <property name="ZeroDateTimeBehavior" value="convertToNull"/> > <property name="AlwaysSendSetIsolation" value="false"/> > <property name="UseUnbufferedInput" value="false"/> > <property name="CacheServerConfiguration" value="true"/> > <property name="ElideSetAutoCommits" value="true"/> > <property name="UseLocalSessionState" value="true"/> > <property name="DatabaseName" value="XXX"/> > <property name="PortNumber" value="XXX"/> > <property name="Port" value="XXX"/> > <property name="User" value="XXX"/> > <property name="URL" value="XXX"/> > <property name="Password" value=""/> > </jdbc-connection-pool> > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=338782 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 12:20 AM
in response to: Jagadish Prasat...
|
|
|
Hi Jagadish,
I have made your changes with respect to the connection pool, as well as disabled auto-commit for the time being. Also, I updated to jdk 1.6.0_13, as well as Glassfish 2.1-b60e.
Unfortunately, I am not sure whether these changes have had a positive effect because we started experiencing general sluggishness (10s+ page loads) about 12 hours ago that we have been unable to remedy. Our traffic during this period has been unremarkable, and our munin graphs have appeared normal. Nonetheless, the server has responding very slowly. I don't think it's related to our application, since this is the same version of the app we had deployed for the previous week without experiencing this slowness. Also, I don't think it's database related, since even static content is affected by the slow responses.
I thought the change may have been due to running at Xmx3000m and Xms3000m, instead of Xmx2500m and Xms2500m, so we changed back to 2500 but that was unremarkable as well.
I'm not sure what to do. Sometimes, page loads are quite snappy, but other times they take 20s, and I have been unable to pinpoint differences.
Please let me know if there is any other data I can provide.
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 12:31 AM
in response to: rwillie6
|
|
|
Actually, there is one remarkable change: the application server is now using less CPU than ever. I suspect this is partially due to decreased traffic, but the decrease seems too large to be explainable by that alone...
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 2:33 AM
in response to: rwillie6
|
|
|
On Wed, 2009-03-25 at 00:20 -0700, glassfish@javadesktop.org wrote: > Nonetheless, the server has responding very slowly. I don't think > it's related to our application, since this is the same version of the > app we had deployed for the previous week without experiencing this > slowness. Also, I don't think it's database related, since even > static content is affected by the slow responses. JeanFrancois : hope this clarifies that this is not w.r.t DB, can you look into the issue ?
Thanks, -Jagadish
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 3:50 AM
in response to: Jagadish Prasat...
|
|
|
|
|
On the whole am having general sluggishness in the last 12+ hours, this includes both static and dynamic content. I setup a new instance using the latest jdk and glassfish, configured it from a clean install changing only know settings and managed to reproduce the problem I was having on the previous server; though I was intending to solve the problem.
I'm not sure how it could be the code base, since the same code base was running for days previously without incident.
I'm checking with our DB expert to see if anything changed there that I was unaware of.
I've attached a jstack dump that was taken while I was waiting for a 20s page load, perhaps there is something in there.
Not sure what to do....
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 4:52 AM
in response to: rwillie6
|
|
|
|
|
As an addendum, I did some further debugging to isolate glassfish with these results:
Bypassing ha-proxy as a load-balancer and directing traffic directly to glassfish does not reduce slowness.
Switching to a different database setup we had previously used without trouble does not reduce slowness. (We tried this because a little over 2 weeks ago we switch to a new database server and even though we hadn't had trouble with it in the last 2 weeks we wanted to confirm that it was not the issue).
We're on ec2 and the ec2 status panel is reporting norma behavior, so I'm not sure about any ways to confirm it isn't ec2 related.
This time I've attached the domain.xml from our currently problematic install; maybe there's something in there that matters that I was forgetting to mention.
Thanks for the help so far!
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 6:55 AM
in response to: rwillie6
|
|
|
I don't see anything suspicious in your domain.xml.
In your stack dump however, *all* the 150 Grizzly threads are just waiting for someone to notify them, since everyone grabs the pipeline and goes back to waiting state. JF, do you have any comment on that?
-Kedar
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 7:55 AM
in response to: km
|
|
|
|
|
I'm not JF (nor do I play him on TV), but I'll weigh in with a few observations.
In the stacks I saw, either all threads are just waiting (as Kedar mentions), or most threads are waiting and a few are blocked reading from the database. None of those threads would be causing CPU to be used, nor would that slow down accessing static resources. [It could mean that requests that need to go to the database are slow; the fact that threads are either idle or reading from the database but not doing anything else indicates that is a potential problem, but not necessarily so either.]
When the OS reports that CPU is being used and it doesn't show up in jstacks with threads actually running (in the RUNNABLE state, but not in socketRead0), it typically means that the GC threads are running and using the CPU. It would be good to use jstat to monitor how much time GC (or use -Xloggc:gc.log in the domain.xml, which will print out the GC usage to config/gc.log, which you can tail and see when GC is running). If GC is spending more than a few % of total time, that's an issue, and I'd guess that the hibernate cache is causing memory pressures.
The spinning poll issue JF mentioned earlier in the thread is still a possibility in my mind if CPU is 100% utilized; I didn't see any discussion of that specifically.
-Scott
On Wed, 2009-03-25 at 06:55 -0700, glassfish@javadesktop.org wrote:
> I don't see anything suspicious in your domain.xml. > > In your stack dump however, *all* the 150 Grizzly threads are just waiting for someone to notify > them, since everyone grabs the pipeline and goes back to waiting state. JF, do you have any > comment on that? > > -Kedar > [Message sent by forum member 'km' (km)] > > http://forums.java.net/jive/thread.jspa?messageID=338958 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net
[att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 12:28 PM
in response to: Scott Oaks
|
|
|
Hi Scott,
Just to clarify, the original issue for creating this thread was investigating increases in CPU and interrupts / context switching that occurred during non-peak traffic. Now, however, our graphs are unremarkable. We're barely using any CPU ( <8% ) and the interruptes /
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 25, 2009 2:58 PM
in response to: rwillie6
|
|
|
|
|
So if you're not seeing the CPU spikes anymore, it's hard to guess what's going on -- but given that the glassfish threads are all idle in the latest stack dumps, it seems likely that the problem is elsewhere.
The most likely culprit seems to me to be the network -- I did notice in some of the stack dumps that a few glassfish threads were blocking writing data back to the clients, which would normally only happen if there is a network issue. I wonder if running tcpdump/snoop on both the client and server machines would show any delays. Or seeing how much ftp thoughput you get during those 30-second intervals. Or even simple ping tests (or traceroutes) could show a long time delay.
We have seen on some platforms (like Windows 2003) that the OS fails to send us the appropriate signal when a request is pending; I suppose something like that could be going on. That might also explain why it started happening suddenly, since an automatic OS update might have introduced the problem. But that seems much less likely to me; it would be a big OS bug (Windows 2003, not being a server-targeted OS, didn't necessarily cause so many difficulties with its issue).
-Scott
On Wed, 2009-03-25 at 12:28 -0700, glassfish@javadesktop.org wrote:
> Hi Scott, > > Just to clarify, the original issue for creating this thread was investigating increases in CPU and interrupts / context switching that occurred during non-peak traffic. Now, however, our graphs are unremarkable. We're barely using any CPU ( <8% ) and the interruptes / [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=339036 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net
[att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 26, 2009 1:15 AM
in response to: Scott Oaks
|
|
|
Here's an update of our progress. We tried to isolate various parts of the setup today to find anything that would help. We tried creating an entirely new app server from scratch, using a different ec2 availability zone, the latest jdk6 and latest glassfish, but the results were the same.
So, I started investigating gc. I added the JVM options Xloggc and XX:+PrintGCDetails and then used apache benchmark to put the server under higher load to try to force memory to grow more quickly. After a while of doing this and watching the gc log, I concluded that gc was not a problem. For the duration, young gen collections took only a few milliseconds and full GC took 2-3 seconds (when it happened, which was rarely). Another lesson from this was that it occurred when the network variable was removed, as ab hit the same slow request loads when running locally on the app server.
I wanted to know more about where the slow requests were spending their time, so I added a filter to our app on /* to benchmark the processing time that each request spent in our app's code. I ran the server with this logging enabled until I'd collected 30K requests, then wrote a little util to parse that log and insert the results into my local DB. I ran some queries on the data and discovered that only 4 of the 30K requests had a processing time of >15s (1 at >20s, and 1 at >30s), of which 3 or those 4 requests were (oddly) to a HttpServlet that did nothing but call request.getSession() and then set two String constants to the session. Overall, 99.9% of requests were <2s.
This was confusing because I had data to show that our code was processing requests very quickly, but I was watching the counter on our down page hover around at 200+ users all afternoon (the down page is served by haproxy when it receives a 504 from glassfish). So, requests were processing slowly enough to timeout frequently, but they were not spending that time in our application code.
To be doubly sure that haproxy wasn't misbehaving, we directed traffic straight to glassfish for an hour or so, but that didn't change anything. On a whim, I cut the http acceptor threads down to 1 with an interesting effect: the site was still slow, but less erratic. Whereas, before, some requests were very fast and others very slow, moving to 1 acceptor just made every request in the range of sort-of-slow to more-slow. Also, with 1 acceptor, the server is noticeably snappier immediately after restart and becomes progressively slower thereafter.
Lacking other ideas, we're considering trying solaris instead of ubuntu, but are hoping to get access to an ec2 ami of OpenSolaris 2008.11 64-bit (if we can't, we might try the 64-bit xsce glassfish ami that we found).
Any other advice? Things to try? Data to collect?
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 26, 2009 3:44 AM
in response to: rwillie6
|
|
|
I work with Ryan (rwillie6) and want to report that we believe we have narrowed down the root of our problem, and have resolved our server slowdowns for the time being.
While debugging our slowness we tried setting our acceptor threads count to 1 (it used to be 5). Unfortunately, when we did this something with lucene caused our threads to lock up permanently so we thought that setting the acceptor threads down to 1 didn't help. However after looking through thread dumps and finding out what lucene was doing we redeployed a version of our app with lucene commented out and have since had fantastic performance, even with just 1 acceptor thread.
However, thinking that the lucene conflict was the root of the problem, we set the acceptor threads back to 5 and discovered that the occasional slowness problem reappeared. After some more testing we discovered that the problem occurs with any acceptor thread value above 1.
I noticed that during the periods of "slowness" when the acceptor threads were greater than 1 our CPU usage on the server was 0%. I made a thread dump during this state. Im just shooting in the dark but I noticed there were two threads there were running sun.nio.ch.FileDispatcher.preClose(). Perhaps a file lock problem? I have attached the full dump so you can take a closer look.
Its still quite a mystery to us why we were able to run with 5 acceptor threads and no problems up until yesterday, especially given that yesterday was not an abnormal day in terms of traffic.
Thanks for all the great help so far. Let us know if we can do anything else to help detail the problem.
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 26, 2009 3:48 AM
in response to: arjunlall
|
|
|
|
|
Forgot to attach the thread dump!
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 8, 2009 7:05 PM
in response to: Scott Oaks
|
|
|
Hi Scott,
See my most recent reply at the bottom. It does seem to be a GC issue, and I attached a couple of gc logs from the most recent occurrences. Unfortunately, I'm not sure it's the Hibernate cache, since we do not have any second-level cache enabled. Or, could it be elsewhere in Hibrnate? It seems to me from the gc logs that something is stuck, since full GC's are running continuously when the problem occurs. It's as if they are running, but failing, and so running over and over again. Where might I be able to look to try to find the problem?
Thanks for your help once again!
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 9, 2009 7:42 AM
in response to: rwillie6
|
|
|
|
|
There's definitely something there using more memory than you want; then at each GC it cleans up just enough to limp along to the next full gc (but not enough to do anything useful).
You'll have to look into some memory tools (jmap to generate a heap dump, and then something like jhat or the Netbeans memory tool or the Eclipse memory analyzer) to see what objects are using all that space and see why they aren't getting reclaimed.
-Scott
On Wed, 2009-04-08 at 19:05 -0700, glassfish@javadesktop.org wrote:
> Hi Scott, > > See my most recent reply at the bottom. It does seem to be a GC issue, and I attached a couple of gc logs from the most recent occurrences. Unfortunately, I'm not sure it's the Hibernate cache, since we do not have any second-level cache enabled. Or, could it be elsewhere in Hibrnate? It seems to me from the gc logs that something is stuck, since full GC's are running continuously when the problem occurs. It's as if they are running, but failing, and so running over and over again. Where might I be able to look to try to find the problem? > > Thanks for your help once again! > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=341268 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net
[att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 5:59 PM
in response to: Jagadish Prasat...
|
|
|
Hi Jagadish,
Hmm... the only noticeable change after switching to "table" connection validation is the whole site seems to be more sluggish. Is that expected?
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 4:16 AM
in response to: Jeanfrancois Ar...
|
|
|
On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: > Salut, > > glassfish@javadesktop.org wrote:
> Jagadish, do you know if this could be the issue or not? But for sure > all Grizzly WorkerThread may gets locked by such call. jstack_dump_2009-03-23_22:10:22.txt I see 250 "httpWorkerThread-80*" threads of which 34 are used for database.
Lets wait for "rwillie" to provide more details, output based on the proposed settings and see.
Thanks, -Jagadish
> > A+ > > -- Jeanfrancois > > > > Thanks! > > [Message sent by forum member 'rwillie6' (rwillie6)] > > > > http://forums.java.net/jive/thread.jspa?messageID=338656 > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > > For additional commands, e-mail: users-help@glassfish.dev.java.net > >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 7:11 AM
in response to: Jagadish Prasat...
|
|
|
Salut,
Jagadish Prasath Ramu wrote: > > > On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: >> Salut, >> >> glassfish@javadesktop.org wrote: > >> Jagadish, do you know if this could be the issue or not? But for sure >> all Grizzly WorkerThread may gets locked by such call. > jstack_dump_2009-03-23_22:10:22.txt > I see 250 "httpWorkerThread-80*" threads of which 34 are used for > database.
Hum, for that one:
http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338654-8150/jstack_dump_2009-03-24_09:26:34.txt
All Grizzly WorkerThread are locked.
> > Lets wait for "rwillie" to provide more details, output based on the > proposed settings and see.
Thanks!
-- Jeanfrancois
> > Thanks, > -Jagadish > > >> A+ >> >> -- Jeanfrancois >> >> >>> Thanks! >>> [Message sent by forum member 'rwillie6' (rwillie6)] >>> >>> http://forums.java.net/jive/thread.jspa?messageID=338656 >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net >>> For additional commands, e-mail: users-help@glassfish.dev.java.net >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 7:20 AM
in response to: Jeanfrancois Ar...
|
|
|
On Tue, 2009-03-24 at 15:11 +0100, Jeanfrancois Arcand wrote: > Salut, > > Jagadish Prasath Ramu wrote: > > > > > > On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: > >> Salut, > >> > >> glassfish@javadesktop.org wrote: > > > >> Jagadish, do you know if this could be the issue or not? But for sure > >> all Grizzly WorkerThread may gets locked by such call. > > jstack_dump_2009-03-23_22:10:22.txt > > I see 250 "httpWorkerThread-80*" threads of which 34 are used for > > database. > > Hum, for that one: > > http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338654-8150/jstack_dump_2009-03-24_09:26:34.txt > > All Grizzly WorkerThread are locked. When resizer (cleaner) is doing cleanup of the pool, all the connections will be validated. This will make all the requesting threads to wait till resizer completes its job. This is a temporary wait stage.
jstack_dump_2009-03-23_22:10:22.txt For the above stack, only 34 out of 250 httpWorkerThreads are used by database calls. So, what could be the issue ?
Thanks, -Jagadish > > > > > Lets wait for "rwillie" to provide more details, output based on the > > proposed settings and see. > > Thanks! > > -- Jeanfrancois > > > > > > Thanks, > > -Jagadish > > > > > >> A+ > >> > >> -- Jeanfrancois > >> > >> > >>> Thanks! > >>> [Message sent by forum member 'rwillie6' (rwillie6)] > >>> > >>> http://forums.java.net/jive/thread.jspa?messageID=338656 > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > >>> For additional commands, e-mail: users-help@glassfish.dev.java.net > >>> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > > For additional commands, e-mail: users-help@glassfish.dev.java.net > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 7:27 AM
in response to: Jagadish Prasat...
|
|
|
Jagadish Prasath Ramu wrote: > > On Tue, 2009-03-24 at 15:11 +0100, Jeanfrancois Arcand wrote: >> Salut, >> >> Jagadish Prasath Ramu wrote: >>> >>> On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: >>>> Salut, >>>> >>>> glassfish@javadesktop.org wrote: >>>> Jagadish, do you know if this could be the issue or not? But for sure >>>> all Grizzly WorkerThread may gets locked by such call. >>> jstack_dump_2009-03-23_22:10:22.txt >>> I see 250 "httpWorkerThread-80*" threads of which 34 are used for >>> database. >> Hum, for that one: >> >> http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338654-8150/jstack_dump_2009-03-24_09:26:34.txt >> >> All Grizzly WorkerThread are locked. > When resizer (cleaner) is doing cleanup of the pool, all the connections > will be validated. This will make all the requesting threads to wait > till resizer completes its job. This is a temporary wait stage. > > jstack_dump_2009-03-23_22:10:22.txt
Also 1 hour pass between the two jstack so I would think the slow down happened during that time. How long does it take for the finalizer to execute? That window might be the issue. It is really hard to say with those 2 jstack as the time frame in between is really long.
A+
- Jeanfrancois
> For the above stack, only 34 out of 250 httpWorkerThreads are used by > database calls. So, what could be the issue ? > > Thanks, > -Jagadish >>> Lets wait for "rwillie" to provide more details, output based on the >>> proposed settings and see. >> Thanks! >> >> -- Jeanfrancois >> >> >>> Thanks, >>> -Jagadish >>> >>> >>>> A+ >>>> >>>> -- Jeanfrancois >>>> >>>> >>>>> Thanks! >>>>> [Message sent by forum member 'rwillie6' (rwillie6)] >>>>> >>>>> http://forums.java.net/jive/thread.jspa?messageID=338656 >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net >>>>> For additional commands, e-mail: users-help@glassfish.dev.java.net >>>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net >>> For additional commands, e-mail: users-help@glassfish.dev.java.net >>> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net >> For additional commands, e-mail: users-help@glassfish.dev.java.net >> >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 7:43 AM
in response to: Jeanfrancois Ar...
|
|
|
On Tue, 2009-03-24 at 15:27 +0100, Jeanfrancois Arcand wrote: > > jstack_dump_2009-03-23_22:10:22.txt > > > Also 1 hour pass between the two jstack so I would think the slow > down > happened during that time. How long does it take for the finalizer to > execute? It will be in the order of seconds. Again, it depends on the total number of connections being validated.
Thanks, -Jagadish > That window might be the issue. It is really hard to say with > those 2 jstack as the time frame in between is really long. > > A+ > > - Jeanfrancois
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 7:45 AM
in response to: Jagadish Prasat...
|
|
|
On Tue, 2009-03-24 at 20:13 +0530, Jagadish Prasath Ramu wrote: > > On Tue, 2009-03-24 at 15:27 +0100, Jeanfrancois Arcand wrote: > > > jstack_dump_2009-03-23_22:10:22.txt > > > > > > Also 1 hour pass between the two jstack so I would think the slow > > down > > happened during that time. How long does it take for the finalizer to > > execute? > It will be in the order of seconds. Again, it depends on the total > number of connections being validated. Also, there is no resizer involved in the first jstack.
> > Thanks, > -Jagadish > > That window might be the issue. It is really hard to say with > > those 2 jstack as the time frame in between is really long. > > > > A+ > > > > - Jeanfrancois
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 5:47 PM
in response to: Jagadish Prasat...
|
|
|
|
|
More jstack dumps attached from earlier today.
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 9:33 PM
in response to: rwillie6
|
|
|
|
|
attaching another thread dump from rwillie6
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 24, 2009 9:52 PM
in response to: Jeanfrancois Ar...
|
|
|
Hi JeanFrancois :
Except the follwing thread dump jstack_dump_2009-03-24_09:26:34.txt
all others are not related to connection pool /db.
As there are three other stack traces which are indicating that the database operations are fine during the "appserver unresponsive" moments, the issue is not really due to database calls.
Can you take a look at the three stack traces below ?
http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338857-8163/jstack_dump_2009-03-22_20-25-12.txt.tar.gz grep -ri "httpWorkerThread-80" jstack_dump_2009-03-22_20:25:12.txt | wc -l 250
grep -ri "AbstractResourcePool" jstack_dump_2009-03-22_20:25:12.txt | wc -l 0
grep -ri "org.apache.log4j.Category.forcedLog" jstack_dump_2009-03-22_20:25:12.txt | wc -l 67
grep -ri "BLOCKED" jstack_dump_2009-03-22_20:25:12.txt | wc -l 68
http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338654-8151/jstack_dump_2009-03-23_22:10:22.txt
grep -ri "httpWorkerThread-80" jstack_dump_2009-03-23_22:10:22.txt | wc -l 250
grep -ri "com.mysql.jdbc.MysqlIO.readFully" jstack_dump_2009-03-23_22:10:22.txt | wc -l 24
grep -ri "AbstractResourcePool.getResource" jstack_dump_2009-03-23_22:10:22.txt | wc -l 5
http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338842-8160/jstack_dump_2009-03-24_18:49:21.txt grep -ri "httpWorkerThread-80" jstack_dump_2009-03-24_18:49:21.txt | wc -l 50
grep -ri "AbstractResourcePool" jstack_dump_2009-03-24_18:49:21.txt | wc -l 0
grep -ri "com.mysql.jdbc.ConnectionImpl" jstack_dump_2009-03-24_18:49:21.txt | wc -l 5
Thanks, -Jagadish
On Tue, 2009-03-24 at 15:27 +0100, Jeanfrancois Arcand wrote: > > Jagadish Prasath Ramu wrote: > > > > On Tue, 2009-03-24 at 15:11 +0100, Jeanfrancois Arcand wrote: > >> Salut, > >> > >> Jagadish Prasath Ramu wrote: > >>> > >>> On Tue, 2009-03-24 at 11:46 +0100, Jeanfrancois Arcand wrote: > >>>> Salut, > >>>> > >>>> glassfish@javadesktop.org wrote: > >>>> Jagadish, do you know if this could be the issue or not? But for sure > >>>> all Grizzly WorkerThread may gets locked by such call. > >>> jstack_dump_2009-03-23_22:10:22.txt > >>> I see 250 "httpWorkerThread-80*" threads of which 34 are used for > >>> database. > >> Hum, for that one: > >> > >> http://forums.java.net/jive/servlet/JiveServlet/download/56-59323-338654-8150/jstack_dump_2009-03-24_09:26:34.txt > >> > >> All Grizzly WorkerThread are locked. > > When resizer (cleaner) is doing cleanup of the pool, all the connections > > will be validated. This will make all the requesting threads to wait > > till resizer completes its job. This is a temporary wait stage. > > > > jstack_dump_2009-03-23_22:10:22.txt > > > Also 1 hour pass between the two jstack so I would think the slow down > happened during that time. How long does it take for the finalizer to > execute? That window might be the issue. It is really hard to say with > those 2 jstack as the time frame in between is really long. > > A+ > > - Jeanfrancois > > > > For the above stack, only 34 out of 250 httpWorkerThreads are used by > > database calls. So, what could be the issue ? > > > > Thanks, > > -Jagadish > >>> Lets wait for "rwillie" to provide more details, output based on the > >>> proposed settings and see. > >> Thanks! > >> > >> -- Jeanfrancois > >> > >> > >>> Thanks, > >>> -Jagadish > >>> > >>> > >>>> A+ > >>>> > >>>> -- Jeanfrancois > >>>> > >>>> > >>>>> Thanks! > >>>>> [Message sent by forum member 'rwillie6' (rwillie6)] > >>>>> > >>>>> http://forums.java.net/jive/thread.jspa?messageID=338656 > >>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > >>>>> For additional commands, e-mail: users-help@glassfish.dev.java.net > >>>>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > >>> For additional commands, e-mail: users-help@glassfish.dev.java.net > >>> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > >> For additional commands, e-mail: users-help@glassfish.dev.java.net > >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Mar 27, 2009 6:45 AM
in response to: Jeanfrancois Ar...
|
|
|
Ah, I misread the timestamps on the thread dumps I thought I had (turns out they were actually taken by Arjun 2 hours before the CPU spike I was investigating). Sorry, JF.
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 1, 2009 1:02 PM
in response to: Jeanfrancois Ar...
|
|
|
|
|
Hi JF,
I'm still having the intermittent spikes that were the original reason for the post and, because of increasing traffic, they've increased in frequency to every 24 hours or so. They are easy to see coming, because of increases in CPU and Interrupts / Context-Switches in my munin graphs, and can be remedied by restarting the app server. But, without a restart, the situation does not resolve itself. Eventually, the spikes go even higher and the server becomes non-responsive.
As an update, to resolve the sluggishness that we started experiencing after originally creating this thread, I cut the acceptors down to 1 and the thread-count down to 80 (I'm not sure why this helped, since we had been running 5 acceptors and 150 thread-count for 6 months without problem). After doing so, I'm now back to the original problem and am still hoping to find a solution, since waking up in the middle of the night to restart the server after it blew up is getting exhausting ...
I've attached two more thread dumps from when the CPU and Interrupts were escalating and I had to restart the server. Please let me know if these are indicative of the spin state you previously referenced or whether there is any other data I can provide.
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 3, 2009 1:02 PM
in response to: rwillie6
|
|
|
Salut,
glassfish@javadesktop.org wrote: > Hi JF, > > I'm still having the intermittent spikes that were the original reason for the post and, because of increasing traffic, they've increased in frequency to every 24 hours or so. They are easy to see coming, because of increases in CPU and Interrupts / Context-Switches in my munin graphs, and can be remedied by restarting the app server. But, without a restart, the situation does not resolve itself. Eventually, the spikes go even higher and the server becomes non-responsive. > > As an update, to resolve the sluggishness that we started experiencing after originally creating this thread, I cut the acceptors down to 1 and the thread-count down to 80 (I'm not sure why this helped, since we had been running 5 acceptors and 150 thread-count for 6 months without problem). After doing so, I'm now back to the original problem and am still hoping to find a solution, since waking up in the middle of the night to restart the server after it blew up is getting exhausting ... > > I've attached two more thread dumps from when the CPU and Interrupts were escalating and I had to restart the server. Please let me know if these are indicative of the spin state you previously referenced or whether there is any other data I can provide. > [Message sent by forum member 'rwillie6' (rwillie6)]
Nothing strange. Can you try openJDK b 50? That one has the fix for the spinning Selector so it will clear the issue.
Thanks
-- Jeanfrancois
> > http://forums.java.net/jive/thread.jspa?messageID=340113 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 5, 2009 2:32 PM
in response to: Jeanfrancois Ar...
|
|
|
Hi JF,
Thanks for the help. Unfortunately, after 12 hours of running on JDK 7 b50 the server exhibited the same increase in CPU usage and context-switch activity as before. Any other suggestions?
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 6, 2009 3:11 AM
in response to: rwillie6
|
|
|
|
|
Hi all,
We are suffering with that CPU usage peeks in GFv3 b43 promoted. We discovered that it has nothing to do with the applications loaded to GFv3. It is a very strange behaviour because when garbage collector starts to run, CPU usage never drops from 100% in Linux. We have profiled it via Netbeans6.5 profiler. Our gc related jvm settings is below:
<jvm-options>-server</jvm-options> <jvm-options>-XX:NewRatio=2</jvm-options> <jvm-options>-XX:PermSize=32m</jvm-options> <jvm-options>-XX:MaxPermSize=256m</jvm-options> <jvm-options>-XX:+UseConcMarkSweepGC</jvm-options> <jvm-options>-XX:+CMSPermGenSweepingEnabled</jvm-options> <jvm-options>-XX:+CMSClassUnloadingEnabled</jvm-options> <jvm-options>-Dsun.rmi.dgc.server.gcInterval=3600000</jvm-options> <jvm-options>-Dsun.rmi.dgc.client.gcInterval=3600000</jvm-options> <jvm-options>-Dcom.sun. enterprise.server.ss.ASQuickStartup=false</jvm-options> <jvm-options>-Xms40m</jvm-options> <jvm-options>-Xmx512m</jvm-options>
In order to avoid PermGenSpace exception we have changed our GC to ConcMark and after that it starts to give this non-understandable CPU usage issue. Also we observed that it starts this strange behaviour when memory usage is higher than Xmx which means GC starts to work. We tried too many configurations but could not solve that problem.
Any help or idea will be greatly appreciated.
- Burak
On Mon, Apr 6, 2009 at 12:32 AM, <glassfish@javadesktop.org> wrote:
> Hi JF, > > Thanks for the help. Unfortunately, after 12 hours of running on JDK 7 b50 > the server exhibited the same increase in CPU usage and context-switch > activity as before. Any other suggestions? > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=340680 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net > > [att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 6, 2009 7:58 AM
in response to: Burak Oguz
|
|
|
|
|
Hi all again,
We have tried it crossing these 1.5.0_18, 6u7, 6u12, 6u13 JDKs with GFv3 builds b43, b31. We experienced 100% CPU problem again.
-Burak
On Mon, Apr 6, 2009 at 1:11 PM, Burak Oguz <burak.oguz@gmail.com> wrote:
> Hi all, > > We are suffering with that CPU usage peeks in GFv3 b43 promoted. We > discovered that it has nothing to do with the applications loaded to GFv3. > It is a very strange behaviour because when garbage collector starts to run, > CPU usage never drops from 100% in Linux. We have profiled it via > Netbeans6.5 profiler. Our gc related jvm settings is below: > > <jvm-options>-server</jvm-options> > <jvm-options>-XX:NewRatio=2</jvm-options> > <jvm-options>-XX:PermSize=32m</jvm-options> > <jvm-options>-XX:MaxPermSize=256m</jvm-options> > <jvm-options>-XX:+UseConcMarkSweepGC</jvm-options> > <jvm-options>-XX:+CMSPermGenSweepingEnabled</jvm-options> > <jvm-options>-XX:+CMSClassUnloadingEnabled</jvm-options> > <jvm-options>-Dsun.rmi.dgc.server.gcInterval=3600000</jvm-options> > <jvm-options>-Dsun.rmi.dgc.client.gcInterval=3600000</jvm-options> > <jvm-options>-Dcom.sun. > enterprise.server.ss.ASQuickStartup=false</jvm-options> > <jvm-options>-Xms40m</jvm-options> > <jvm-options>-Xmx512m</jvm-options> > > In order to avoid PermGenSpace exception we have changed our GC to ConcMark > and after that it starts to give this non-understandable CPU usage issue. > Also we observed that it starts this strange behaviour when memory usage is > higher than Xmx which means GC starts to work. We tried too many > configurations but could not solve that problem. > > Any help or idea will be greatly appreciated. > > - Burak > > > > On Mon, Apr 6, 2009 at 12:32 AM, <glassfish@javadesktop.org> wrote: > >> Hi JF, >> >> Thanks for the help. Unfortunately, after 12 hours of running on JDK 7 >> b50 the server exhibited the same increase in CPU usage and context-switch >> activity as before. Any other suggestions? >> [Message sent by forum member 'rwillie6' (rwillie6)] >> >> http://forums.java.net/jive/thread.jspa?messageID=340680 >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net >> For additional commands, e-mail: users-help@glassfish.dev.java.net >> >> > [att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 7, 2009 1:52 AM
in response to: Jeanfrancois Ar...
|
|
|
|
|
Unfortunately, this is still occurring, nearly like clockwork every 8-12 hours. I've attached two images; one of the CPU usage for the last day and week, the other for the context-switch activity during the same period.
Every 8-12 hours, there is an (as yet) unexplained increase in CPU and context-switch activity that is unrelated to traffic patterns. I've haven't noticed any unusual GC times, but I will post a GC log next time this happens in case anybody sees something I'm missing.
From looking at the stack traces repeatedly, many of which are posted above, I haven't seen anything indicative; mostly threads just waiting.
We're currently running on Ubuntu, but are considering trying Solaris. Any thoughts on whether that could help, or would be better for performance anyway?
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 8, 2009 7:00 PM
in response to: Jeanfrancois Ar...
|
|
|
|
|
Hi JF,
I had observed the GC log when the server was behaving appropriately but had not captured a GC log from when it was being problematic (since the gc.log was overwritten on server restart). After copying the GC log on the last two cpu / context-switch spikes, I think the problem is GC related. Attached are two zips, each containing a jstack dump and the gc.log when the problem was occurring.
I noticed for most of the GC log looks normal (many PSYoungGen collections with intermittent Full GC), but at the end, when the CPU usage was so high, Full GC's are continually running back to back. Since the jstack dumps have shown mostly waiting threads, it seems like the GC must be responsible for the increased CPU and context-switches and eventual non-responsiveness of the server if not restarted. Any idea why this might be? Is it configuration related? Application related? Etc?
Thanks for all your help so far. I can't believe I didn't notice this sooner!
For reference, here are the jvm-options from domain.xml: -XX:+PrintGCDetails -Xloggc:${com.sun.aas.instanceRoot}/logs/gc.log -XX:HeapDumpPath=/mnt/dumps -XX:+HeapDumpOnOutOfMemoryError -server -Xmx2500m -Xms2500m -Xmn1000m -Xss128k -XX:+AggressiveOpts -XX:+AggressiveHeap -XX:+DisableExplicitGC -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -Dcom.sun.enterprise.server.ss.ASQuickStartup=false -XX:MaxPermSize=192m -Djava.endorsed.dirs=${com.sun.aas.installRoot}/lib/endorsed -Djava.security.policy=${com.sun.aas.instanceRoot}/config/server.policy -Djava.security.auth.login.config=${com.sun.aas.instanceRoot}/config/login.conf -Djavax.net.ssl.keyStore=${com.sun.aas.instanceRoot}/config/keystore.jks -Djavax.net.ssl.trustStore=${com.sun.aas.instanceRoot}/config/cacerts.jks -Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext${path.separator}${com.sun.aa\ s.derbyRoot}/lib -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver -Djavax.management.builder.initial=com.sun.enterprise.admin.server.core.jmx.AppServerMBeanServerBuilder -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory -Dcom.sun.enterprise.taglibs=appserv-jstl.jar,jsf-impl.jar -Dcom.sun.enterprise.taglisteners=jsf-impl.jar -XX:NewRatio=2
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 9, 2009 10:05 AM
in response to: rwillie6
|
|
|
That looks like a memory leak to me (constantly increasing Old Generation to near-full), and you start really being in trouble around 46,000 seconds in (on the first one, anyway), when a full GC doesn't empty the young generation (and thus, the old generation is full). It looks like you are leaking from the very beginning, though, even if you don't notice until the old generation fills up and you start running a GC every few seconds. Since it seems unlikely that the memory leak is GF's fault, it's almost certainly going to be application related (it's difficult to configure things to make them leak memory =).
To try to deal with it, I'd suggest taking a jmap dump (as Scott Oaks suggested above) and looking through that. If you aren't afraid of a bit of command-line diving, you can forgo the memory analyzers and use jmap -histo:live [process id] (which will print out the types, instance counts, and memory sizes of everything in memory). You'll want to run it once towards the beginning and once when you start having trouble, then compare the two for significant differences. The leaking object type will likely stick out, and give you a clue as to what in your application isn't going away properly.
glassfish@javadesktop.org wrote: > Hi JF, > > I had observed the GC log when the server was behaving appropriately but had not captured a GC log from when it was being problematic (since the gc.log was overwritten on server restart). After copying the GC log on the last two cpu / context-switch spikes, I think the problem is GC related. Attached are two zips, each containing a jstack dump and the gc.log when the problem was occurring. > > I noticed for most of the GC log looks normal (many PSYoungGen collections with intermittent Full GC), but at the end, when the CPU usage was so high, Full GC's are continually running back to back. Since the jstack dumps have shown mostly waiting threads, it seems like the GC must be responsible for the increased CPU and context-switches and eventual non-responsiveness of the server if not restarted. Any idea why this might be? Is it configuration related? Application related? Etc? > > Thanks for all your help so far. I can't believe I didn't notice this sooner! > > For reference, here are the jvm-options from domain.xml: > -XX:+PrintGCDetails > -Xloggc:${com.sun.aas.instanceRoot}/logs/gc.log > -XX:HeapDumpPath=/mnt/dumps > -XX:+HeapDumpOnOutOfMemoryError > -server > -Xmx2500m > -Xms2500m > -Xmn1000m > -Xss128k > -XX:+AggressiveOpts > -XX:+AggressiveHeap > -XX:+DisableExplicitGC > -XX:+UseParallelGC > -XX:+UseParallelOldGC > -XX:ParallelGCThreads=8 > -Dcom.sun.enterprise.server.ss.ASQuickStartup=false > -XX:MaxPermSize=192m > -Djava.endorsed.dirs=${com.sun.aas.installRoot}/lib/endorsed > -Djava.security.policy=${com.sun.aas.instanceRoot}/config/server.policy > -Djava.security.auth.login.config=${com.sun.aas.instanceRoot}/config/login.conf > -Djavax.net.ssl.keyStore=${com.sun.aas.instanceRoot}/config/keystore.jks > -Djavax.net.ssl.trustStore=${com.sun.aas.instanceRoot}/config/cacerts.jks > -Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext${path.separator}${com.sun.aa\ > s.derbyRoot}/lib > -Djdbc.drivers=org.apache.derby.jdbc.ClientDriver > -Djavax.management.builder.initial=com.sun.enterprise.admin.server.core.jmx.AppServerMBeanServerBuilder > -Dcom.sun.enterprise.config.config_environment_factory_class=com.sun.enterprise.config.serverbeans.AppserverConfigEnvironmentFactory > -Dcom.sun.enterprise.taglibs=appserv-jstl.jar,jsf-impl.jar > -Dcom.sun.enterprise.taglisteners=jsf-impl.jar > -XX:NewRatio=2 > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=341267 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net > >
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net For additional commands, e-mail: users-help@glassfish.dev.java.net
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 10, 2009 1:34 AM
in response to: Jacob Kessler
|
|
|
Hi Jacob,
love a good CLI session... so, I setup a cron to run jmap -histo:live every hour and collected the data until the problem resurfaced. After looking at the dumps, the class using the most memory was [C, which from what I have discovered seems to mean char arrays. When I started collecting the data (which was hours after restarting the server, but hours before the symptoms surfaced again), [C was using approx. 700MB. When the CPU spikes began to occur, [C was using 1.3GB. The next most memory intensive class in each dump was [B, but the usage was roughly the same in all dumps, fluctuating less than 20MB.
So, any advice on how to determine what is creating / using / holding on to the char arrays? I'm going to try generating a heap dump and analyzing it with eclipse memory analyzer or jhat. Hopefully that will show me the enclosing object ...
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 10, 2009 7:28 AM
in response to: rwillie6
|
|
|
|
|
There are several programs that will read a heap dump and show you what is holding onto the memory -- you can see who is holding on to your char arrays (likely strings, which in turn are held in some container, and so on -- the idea is to look at those objects and determine who is holding references to them and preventing them from being collected). The JDK comes with jhat, which lets you use a web browser to look through it and even do rudimentary queries on the heap; Netbeans will load a heap and let you poke around inside of it; there is an Eclipse memory analyzer which does a good job too.
--Scott
On Fri, 2009-04-10 at 01:34 -0700, glassfish@javadesktop.org wrote:
> Hi Jacob, > > love a good CLI session... so, I setup a cron to run jmap -histo:live every hour and collected the data until the problem resurfaced. After looking at the dumps, the class using the most memory was [C, which from what I have discovered seems to mean char arrays. When I started collecting the data (which was hours after restarting the server, but hours before the symptoms surfaced again), [C was using approx. 700MB. When the CPU spikes began to occur, [C was using 1.3GB. The next most memory intensive class in each dump was [B, but the usage was roughly the same in all dumps, fluctuating less than 20MB. > > So, any advice on how to determine what is creating / using / holding on to the char arrays? I'm going to try generating a heap dump and analyzing it with eclipse memory analyzer or jhat. Hopefully that will show me the enclosing object ... > [Message sent by forum member 'rwillie6' (rwillie6)] > > http://forums.java.net/jive/thread.jspa?messageID=341594 > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscribe@glassfish.dev.java.net > For additional commands, e-mail: users-help@glassfish.dev.java.net
[att1.html]
|
|
|
|
|
|
|
|
Re: Debugging Glassfish CPU usage
Posted:
Apr 10, 2009 8:05 AM
in response to: Scott Oaks
|
|
|
Tracked it down to a HTML FragmentCache that was not appropriately releasing expired items. It would replace expired items as needed with their new versions, but expired items that were not accessed would build up indefinitely. Quite a leak. Created a cleaner task to run every couple hours to remove the expired items, hopefully that will take care of it. If it doesn't, then we're just trying to cache too much in the java heap (and our machine can only address 4GB). If that's the case, we'll have to either cache less overall, cache for shorter periods and clean often, externalize the memory consumption of the cache to something like memcached, or try a 64-bit solaris installation so we can address more memory.
Thanks all for the help!
|
|
|
|
|