File input partially reading lines

Description

My IIS logs directory has a total of 250~ sub directories at 2 levels and a total of about 10,000~ files coming up to 850mb. When I try to run log stash to just grok all of them and output any errors I start to see _grokparsefailure tags appear and go into my stdout. The message that comes with it is mangled, like it started reading the line not at the beginning. Below I have listed all of th information that should help you debug the issue.

An important note... if I make the path property of the file input more specific it seems like it works fine. The more unspecific I make it the more erratic it gets.

LogStash: 1.2.2
JRE: 1.7.0_45
OS: Windows Server 2008 R2
Ran with: C:\>java -jar logstash-1.2.2-flatjar.jar agent -f logstash.conf -vv -l logstash.log

logstash.conf
input {
file {
type => "iis"
path => "X:/IISLogs///*.log"
debug => true
}
}

filter {
if [type] == "iis" {
if [message] =~ /^#/ {
drop { }
}

grok {
match => ["path", "IISLogs/(:?%{HOST:application})/\d+/u_ex\d+_%{NOTSPACE:server}.log"]
}

grok {
match => ["message", "(?<event_time>\d{4}\d{2}\d{2} \d{2}:\d{2}:\d{2}) %{IP:host_ip} %{URIPROTO:method} %{NOTSPACE:uri_path} (?:

%{NOTSPACE:uri_query}) %{NUMBERort:int} (?:

|%{WORD:username}) %{IP:client_ip} %{NOTSPACE:user_agent} (?:

%{NOTSPACE:referer}) (?:

|%{NOTSPACE:uri_host}) %{NUMBER:status:int} %{NUMBER:sub_status} %{NUMBER:win32_status} %{NUMBER:bytes_sent:int} %{NUMBER:bytes_received:int} %{NUMBER:time_taken:int}"]
}

date {
match => ["event_time", "yyyy-MM-dd HH:mm:ss"]
remove_field => ["event_time"]
}
}
}

output {
if "_grokparsefailure" in [tags] {
stdout {
codec => "rubydebug"
}
}
}

log output:
{:timestamp=>"2013-11-10T22:13:46.546000-0600", :message=>"Received line", ath=>"X:/IISLogs/My.IIS.Web.Project/131101/u_ex13110109_SERVERNAME.log", :text=>"09:59:28 10.10.0.1 GET /status - 443 - 10.10.0.1 A-User/Agent - 10.10.0.1 200 0 0 362 114 0\r", :level=>:debug, :file=>"/Users/cmoran/Downloads/logstash-1.2.2-flatjar.jar!/logstash/inputs/file.rb", :line=>"129"}

The actual line from the log:
2013-11-01 09:59:28 10.10.0.1 GET /status - 443 - 10.10.0.1 A-User/Agent - 10.10.0.1 200 0 0 362 114 0

Activity

Show:
Chad Moran
June 26, 2014, 4:17 PM

It looks like this is likely due to using an array for each folder. JRuby arrays are NOT thread safe.

http://deployingjruby.blogspot.com/

Tom Robinson
June 26, 2014, 8:17 AM

Is there likely to be a fix for this anytime soon? It's been 7 months since it was raised. I don't know Ruby so can't help by submitting a patch.

Bart Am.
March 12, 2014, 11:06 AM

I can reproduce this on a directory with 4 files (7Ko, 60Ko, 17Ko, 27Ko).

One of the file seems to have an invalid entry line (which is not the case).

When I remove the first file (which is not the one with the error), the previous error disappears.

The problem occurs proportionnaly with file number (generate partial lines on valid log file).

Seems to be a concurrent read issue as the line are cut in random position.

By using the TCP input and sending the files over TCP, there is no problem either.

Any patch or workaround ?

Tom Robinson
December 23, 2013, 3:55 PM

I can reproduce this on a directory containing 123 files, no subdirectories.

input {
file {
start_position => "beginning"
charset => "UTF-8"
type => "iislog"
path => "C:/inetpub/logs/LogFiles/W3SVC3/*.log"
}
}

Tom Robinson
December 23, 2013, 3:17 PM

I'm having the same problem, it's making it unusable. Would going back to an earlier version help?

Assignee

Logstash Developers

Reporter

Chad Moran

Labels