Details

    • Type: Bug
    • Status: Resolved
    • Resolution: Fixed
    • Affects Version/s: 1.3.3
    • Fix Version/s: None
    • Labels: None

      Description

      Nicolas,

      You were right. Thank you.

      Background.

      20 shippers, all dev servers with minimal traffic, running CentOS 5.10 and JDK 1.6.0_34, with 256 MB of RAM for each shipper JVM.

      Shipper config:

      input {
        file {
          type => "web-access"
          path => "/zzz/platform/write/logs/access*"
          codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => previous
          }
          start_position => "beginning"
          exclude => "*.gz"
          tags => ["web", "access"]
        }

        file {
          type => "web-catalina"
          path => "/zzz/platform/write/logs/catalina*"
          codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => previous
          }
          start_position => "beginning"
          exclude => "*.gz"
          tags => ["web", "catalina"]
        }

        file {
          type => "web-zzz"
          path => "/zzz/platform/write/logs/zzz*"
          codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => previous
          }
          start_position => "beginning"
          exclude => "*.gz"
          tags => ["web", "zzz"]
        }

        file {
          type => "web-localhost"
          path => "/zzz/platform/write/logs/localhost*"
          codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => previous
          }
          start_position => "beginning"
          exclude => "*.gz"
          tags => ["web", "localhost"]
        }

        file {
          type => "web-tomcat"
          path => "/zzz/platform/write/logs/tomcat*"
          codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => previous
          }
          start_position => "beginning"
          exclude => "*.gz"
          tags => ["web", "tomcat"]
        }
      }

      output {
        redis {
          host => "xxx.xxx.xxx.xxx"
          data_type => "list"
          key => "logstash"
        }
      }
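
      To spell out the intent: with negate => true and what => previous, each multiline codec appends any line that does not start with an ISO8601 timestamp to the previous event, so a Java stack trace like

      2014-02-20 04:45:51,123 ERROR - - - [CachePoller] - Failed updating cache.
      java.lang.IllegalArgumentException: Site Name must not be null
          at com.zzz.emweb.business.util.SingleQuerySyncMessage.<init>(SingleQuerySyncMessage.java:9)

      is collapsed into a single event with one "message" field (you can see this in the "output received" entries in the logs below).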

      All 20 have the same config.

      Relevant part of the shippers' init script:

      export JAVA_HOME=/usr
      export SINCEDB_DIR=/opt/logstash/sincedb

      # Directory where the logstash all-in-one jar lives
      LS_HOME=/opt/logstash

      # Additional Java OPTS
      LS_JAVA_OPTS="-Xmx256m -Djava.io.tmpdir=$LS_HOME/tmp"

      # logstash log directory
      LOG_DIR=/var/log/logstash

      # logstash configuration directory
      CONF_DIR=/etc/logstash/conf.d

      # logstash log file
      LOG_FILE=$LOG_DIR/$NAME.log

      # Open file limit
      OPEN_FILES=2048

      # Nice level
      NICE=19

      # Filter threads
      FILTER_THREADS=1
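
      For completeness, the script ends up launching logstash roughly like this (paraphrased; the exact exec line is omitted above -- the jar path matches the one in the stack traces below):

      ulimit -n $OPEN_FILES
      nice -n $NICE $JAVA_HOME/bin/java $LS_JAVA_OPTS -jar $LS_HOME/logstash.jar \
        agent -f $CONF_DIR -l $LOG_FILE -w $FILTER_THREADS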

      1 Redis and 1 indexer on a dedicated host.

      1 ES cluster of 2 nodes, each with a dedicated server.

      1 web server with Kibana 3 on a dedicated host.
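
      Putting the pieces together, the data flow is:

      20 shippers -> redis (list "logstash") -> indexer -> ES (2 nodes) -> Kibana 3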

      Everything works as it should; the only issue at hand is that the shippers are dying with OOMEs.

      Logs:

      Exception in thread "LogStash::Runner" java.lang.OutOfMemoryError: Java heap space
      at org.joni.OptExactInfo.<init>(OptExactInfo.java:33)
      at org.joni.NodeOptInfo.<init>(NodeOptInfo.java:27)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:2113)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:1928)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:2177)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:1928)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:2165)
      at org.joni.Analyser.optimizeNodeLeft(Analyser.java:1939)
      at org.joni.Analyser.setOptimizedInfoFromTree(Analyser.java:2208)
      at org.joni.Analyser.compile(Analyser.java:157)
      at org.joni.Regex.<init>(Regex.java:162)
      at org.joni.Regex.<init>(Regex.java:139)
      at org.jruby.RubyRegexp.makeRegexp(RubyRegexp.java:165)
      at org.jruby.RubyRegexp.getPreprocessedRegexpFromCache(RubyRegexp.java:214)
      at org.jruby.RubyRegexp.preparePattern(RubyRegexp.java:474)
      at org.jruby.RubyString.gsubCommon19(RubyString.java:3119)
      at org.jruby.RubyString.gsubCommon19(RubyString.java:3107)
      at org.jruby.RubyString.gsub19(RubyString.java:3065)
      at org.jruby.RubyString$INVOKER$i$gsub19.call(RubyString$INVOKER$i$gsub19.gen)
      at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:336)
      at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:179)
      at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)
      at org.jruby.ast.CallOneArgBlockNode.interpret(CallOneArgBlockNode.java:60)
      at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
      at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
      at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
      at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:214)
      at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:346)
      at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:204)
      at org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
      at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
      at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
      A plugin had an unrecoverable error. Will restart this plugin.
      Plugin: <LogStash::Inputs::File type=>"web-access", path=>["/zzz/platform/write/logs/access*"], start_position=>"beginning", exclude=>["*.gz"], tags=>["web", "access"]>
      Error: Java heap space {:level=>:error}

      A plugin had an unrecoverable error. Will restart this plugin.
      Plugin: <LogStash::Inputs::File type=>"web-zzz", path=>["/zzz/platform/write/logs/zzz*"], start_position=>"beginning", exclude=>["*.gz"], tags=>["web", "zzz"]>
      Error: GC overhead limit exceeded {:level=>:error}
      output received {:event=>#<LogStash::Event:0x606c3f10 @data={"@timestamp"=>"2014-02-20T09:45:51.800Z", "message"=>"2014-02-20 04:45:51,123 ERROR - - - [CachePoller] - Failed updating cache.\njava.lang.IllegalArgumentException: Site Name must not be null\n\tat com.zzz.emweb.business.util.SingleQuerySyncMessage.<init>(SingleQuerySyncMessage.java:9)\n\tat com.zzz.emweb.business.util.SystemCacheDAOSync.sendSync(SystemCacheDAOSync.java:66)\n\tat com.zzz.emweb.business.data.SystemCacheDAO.clearCache(SystemCacheDAO.java:249)\n\tat com.zzz.emweb.business.data.SystemCacheDAO.clearCache(SystemCacheDAO.java:244)\n\tat com.zzz.emweb.business.data.SystemCacheDAO.refreshCache(SystemCacheDAO.java:297)\n\tat com.zzz.emweb.business.data.SystemCacheDAO.populateCache(SystemCacheDAO.java:273)\n\tat com.zzz.emweb.crunchware.business.module.cache.CachePoller.loadData(CachePoller.java:56)\n\tat com.zzz.emweb.crunchware.business.module.cache.CachePoller.runIt(CachePoller.java:41)\n\tat com.zzz.emweb.crunchware.business.module.cache.CachePoller.run(CachePoller.java:29)", "@version"=>"1", "tags"=>["multiline", "web", "zzz"], "type"=>"web-zzz", "host"=>"smokevm-a.zzz.local", "path"=>"/zzz/platform/write/logs/zzz-05.00.01.01.log"}, @cancelled=false>, :level=>:info}
      failed to open /zzz/platform/write/logs/zzz-05.00.01.00.log.2013-12-27: GC overhead limit exceeded {:level=>:warn}
      A plugin had an unrecoverable error. Will restart this plugin.
      Plugin: <LogStash::Inputs::File type=>"web-zzz", path=>["/zzz/platform/write/logs/zzz*"], start_position=>"beginning", exclude=>["*.gz"], tags=>["web", "zzz"]>
      Error: GC overhead limit exceeded {:level=>:error}
      failed to open /zzz/platform/write/logs/zzz-05.00.01.00.log.2013-12-09: GC overhead limit exceeded {:level=>:warn}
      output received {:event=>#<LogStash::Event:0x4627f992 @data={"@timestamp"=>"2014-02-20T09:48:10.898Z", "message"=>"2014-02-20 04:48:04,052 WARN - - - [PropertyPlaceholderConfigurer] - Could not load properties from URL [file:/etc/zzz/shared-messaging.properties]: /etc/zzz/shared-messaging.properties (No such file or directory)", "@version"=>"1", "type"=>"web-zzz", "tags"=>["web", "zzz"], "host"=>"smokevm-a.zzz.local", "path"=>"/zzz/platform/write/logs/zzz-05.00.01.01.log"}, @cancelled=false>, :level=>:info}

      java.lang.OutOfMemoryError: GC overhead limit exceeded

      Exception in thread "LogStash::Runner" java.lang.OutOfMemoryError: Java heap space
      at java.util.Arrays.copyOf(Arrays.java:2882)
      at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
      at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
      at java.lang.StringBuilder.append(StringBuilder.java:119)
      at org.jruby.RubyBasicObject.inspectObj(RubyBasicObject.java:1051)
      at org.jruby.RubyBasicObject.hashyInspect(RubyBasicObject.java:1020)
      at org.jruby.RubyBasicObject.inspect(RubyBasicObject.java:999)
      at org.jruby.RubyKernel.inspect(RubyKernel.java:2112)
      at org.jruby.RubyKernel$INVOKER$s$0$0$inspect.call(RubyKernel$INVOKER$s$0$0$inspect.gen)
      at org.jruby.RubyClass.finvoke(RubyClass.java:623)
      at org.jruby.runtime.Helpers.invoke(Helpers.java:495)
      at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:355)
      at org.jruby.RubyObject.inspect(RubyObject.java:534)
      at org.jruby.RubyHash$5.visit(RubyHash.java:810)
      at org.jruby.RubyHash.visitLimited(RubyHash.java:642)
      at org.jruby.RubyHash.visitAll(RubyHash.java:628)
      at org.jruby.RubyHash.inspectHash19(RubyHash.java:805)
      at org.jruby.RubyHash.inspect19(RubyHash.java:842)
      at org.jruby.RubyHash$INVOKER$i$0$0$inspect19.call(RubyHash$INVOKER$i$0$0$inspect19.gen)
      at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
      at rubyjit.Cabin::Outputs::IO$$tty_write_CD0D4CBEAEBA45932B68352C125A83C659DD8494._file_(file:/opt/logstash/logstash.jar!/cabin/outputs/io.rb:79)
      at rubyjit.Cabin::Outputs::IO$$tty_write_CD0D4CBEAEBA45932B68352C125A83C659DD8494._file_(file:/opt/logstash/logstash.jar!/cabin/outputs/io.rb)
      at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:181)
      at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
      at rubyjit.Cabin::Outputs::IO$$\=^^0ECBA4B605298C45B0B8A6C0E3990F2D99834468.block_0$RUBY$file_(file:/opt/logstash/logstash.jar!/cabin/outputs/io.rb:54)
      at rubyjit$Cabin::Outputs::IO$$\=^^0ECBA4B605298C45B0B8A6C0E3990F2D99834468$block_0$RUBY$file.call(rubyjit$Cabin::Outputs::IO$$\=^^_0ECBA4B605298C45B0B8A6C0E3990F2D99834468$block_0$RUBY$file_)
      at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:135)
      at org.jruby.runtime.Block.yield(Block.java:142)
      at org.jruby.ext.thread.Mutex.synchronize(Mutex.java:149)
      at org.jruby.ext.thread.Mutex$INVOKER$i$0$0$synchronize.call(Mutex$INVOKER$i$0$0$synchronize.gen)
      at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
      at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)

      Exception in thread "LogStash::Runner" java.lang.OutOfMemoryError: Java heap space
      Exception in thread "<file" java.lang.NullPointerException
      at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1214)
      at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:110)
      at java.lang.Thread.run(Thread.java:662)
      Exception in thread "<file" java.lang.NullPointerException
      at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1214)
      at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:110)
      at java.lang.Thread.run(Thread.java:662)

      jmap

      jmap 17032
      Attaching to process ID 17032, please wait...
      Debugger attached successfully.
      Server compiler detected.
      JVM version is 20.9-b04
      0x0000000040000000 49K /usr/java/jdk1.6.0_34/bin/java
      0x000000378ca00000 141K /lib64/ld-2.5.so
      0x000000378ce00000 1686K /lib64/libc-2.5.so
      0x000000378d200000 22K /lib64/libdl-2.5.so
      0x000000378d600000 146K /lib64/libpthread-2.5.so
      0x000000378de00000 600K /lib64/libm-2.5.so
      0x000000378e200000 52K /lib64/librt-2.5.so
      0x0000003790a00000 111K /lib64/libnsl-2.5.so
      0x00002b432956b000 47K /usr/java/jdk1.6.0_34/jre/lib/amd64/jli/libjli.so
      0x00002b4329678000 13030K /usr/java/jdk1.6.0_34/jre/lib/amd64/server/libjvm.so
      0x00002b432a384000 64K /usr/java/jdk1.6.0_34/jre/lib/amd64/libverify.so
      0x00002b432a493000 229K /usr/java/jdk1.6.0_34/jre/lib/amd64/libjava.so
      0x00002b432a5d0000 52K /lib64/libnss_files-2.5.so
      0x00002b432a7db000 90K /usr/java/jdk1.6.0_34/jre/lib/amd64/libzip.so
      0x00002b432ed9c000 110K /usr/java/jdk1.6.0_34/jre/lib/amd64/libnet.so
      0x00002b432eeb3000 43K /usr/java/jdk1.6.0_34/jre/lib/amd64/libnio.so
      0x00002b432efbb000 88K /opt/logstash/tmp/jffi6209999662540504739.tmp

      jmap -finalizerinfo 17032
      Attaching to process ID 17032, please wait...
      Debugger attached successfully.
      Server compiler detected.
      JVM version is 20.9-b04
      Number of objects pending for finalization: 0

      jmap -heap 17032
      Attaching to process ID 17032, please wait...
      Debugger attached successfully.
      Server compiler detected.
      JVM version is 20.9-b04

      using thread-local object allocation.
      Parallel GC with 2 thread(s)

      Heap Configuration:
      MinHeapFreeRatio = 40
      MaxHeapFreeRatio = 70
      MaxHeapSize = 268435456 (256.0MB)
      NewSize = 1310720 (1.25MB)
      MaxNewSize = 17592186044415 MB
      OldSize = 5439488 (5.1875MB)
      NewRatio = 2
      SurvivorRatio = 8
      PermSize = 21757952 (20.75MB)
      MaxPermSize = 85983232 (82.0MB)

      Heap Usage:
      PS Young Generation
      Eden Space:
      capacity = 30867456 (29.4375MB)
      used = 26611440 (25.378646850585938MB)
      free = 4256016 (4.0588531494140625MB)
      86.2119638236465% used
      From Space:
      capacity = 29818880 (28.4375MB)
      used = 0 (0.0MB)
      free = 29818880 (28.4375MB)
      0.0% used
      To Space:
      capacity = 28770304 (27.4375MB)
      used = 0 (0.0MB)
      free = 28770304 (27.4375MB)
      0.0% used
      PS Old Generation
      capacity = 178978816 (170.6875MB)
      used = 178978792 (170.6874771118164MB)
      free = 24 (2.288818359375E-5MB)
      99.99998659059182% used
      PS Perm Generation
      capacity = 40501248 (38.625MB)
      used = 40175848 (38.314674377441406MB)
      free = 325400 (0.31032562255859375MB)
      99.19656796748585% used

      I also have a heap dump available; it's about 240 MB, but I can't share that via email. I'm not sure what else you would need, but anything else, just ask and I will submit it.

      Things I have tried (flag combinations sketched below): increasing memory to 512 MB, which just takes longer for the same thing to happen; forcing CMS collection to start at 60% occupancy, same results; using the parallel collector, etc. No matter what I do, the old gen in the JVM just grows until all space is consumed. I checked Redis and there was no backlog. The indexer gets data less than a second after the log entry is created on the server and pumps it to ES with no issues. The issue is only on the shipper.
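
      Roughly the JVM flag combinations tried (paraphrased; exact values varied between runs), plus the redis check:

      # 1) Bigger heap -- same OOME, it just takes longer:
      LS_JAVA_OPTS="-Xmx512m -Djava.io.tmpdir=$LS_HOME/tmp"

      # 2) CMS starting at 60% old-gen occupancy -- same results:
      LS_JAVA_OPTS="-Xmx256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly"

      # 3) Explicit parallel collector (the default here, per jmap -heap above):
      LS_JAVA_OPTS="-Xmx256m -XX:+UseParallelGC"

      # Redis backlog check -- list length stayed at/near zero:
      redis-cli -h xxx.xxx.xxx.xxx llen logstash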

      I have the 240 MB dump and jmap -histo output available if you need them.
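
      (Both were captured with the stock JDK tools, along these lines; the output file names are just placeholders:

      jmap -dump:format=b,file=logstash-heap.hprof 17032
      jmap -histo 17032 > logstash-histo.txt
      )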

      If there's anything else I can provide, let me know. Running Logstash 1.3.3.

      Thanks in advance.

      --Martin

      People

      • Assignee: logstash-dev Logstash Developers (Inactive)
      • Reporter: mcrespo Martin Crespo
      • Votes: 0
      • Watchers: 5
