tcp input breaking JSON messages in half (and screwing them up as a result)

Description

It appears that the tcp input is now reading a fixed number of bytes from the incoming socket rather than reading a single line, and that's causing occasional issues where the tcp input splits an incoming message into two parts and tries to handle each part independently rather than as a single message.

After upgrading to 1.2.1 and making the appropriate changes to my config (changing the tcp input's format => "json_event" to codec => "json"), I started to see that some of my longer JSON events sent via TCP are, on occasion, being split into two parts by the tcp input, which of course leads to total JSON parse failure. Watching the wire traffic, the split always happened at the point where the message was split across two TCP packets. The message was always split across two packets, but the tcp input didn't always screw up, just occasionally. Any time the input did split the message, it handled the two parts as independent messages, and as JSON this is a miserable failure, since neither of the two parts is complete JSON.
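To illustrate why a split is fatal for JSON, here is a minimal Ruby sketch. The event contents and the split offset are made up for illustration, and `parses?` is a hypothetical helper, not part of Logstash. It only shows that each half of a split document fails to parse on its own:

```ruby
require 'json'

# A made-up event, serialized as a single JSON document.
message = JSON.generate({ "message" => "x" * 40, "level" => "ERROR" })

# Pretend the TCP stack delivered the document in two pieces
# (the offset 20 is arbitrary).
first_half  = message[0, 20]
second_half = message[20..-1]

# Hypothetical helper: true if the text is a complete JSON document.
def parses?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

puts parses?(message)      # the whole document parses
puts parses?(first_half)   # a truncated document does not
puts parses?(second_half)  # neither does the leftover tail
```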

After looking into the differences between the 1.1.13 tcp input and the 1.2.1 tcp input, it looks like in 1.1.13 you were reading the socket data via socket.readline, but in 1.2.1 you're now using socket.sysread(16384)... my guess is that this is likely the cause.
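The difference between line reads and fixed-size reads can be sketched as follows. This is not the Logstash code or the eventual fix, just an assumed illustration: `LineBuffer` is a hypothetical class that collects raw chunks (as a call like sysread would return them) and only hands back complete, newline-terminated lines, which is the behaviour a readline-style input provides implicitly.

```ruby
# Hypothetical helper, not part of Logstash: accumulates raw chunks
# and returns only the complete lines seen so far.
class LineBuffer
  def initialize
    @pending = +""   # unfrozen, mutable buffer for partial data
  end

  # Append a chunk and return every complete line it finishes.
  # Any trailing partial line stays buffered for the next call.
  def feed(chunk)
    @pending << chunk
    lines = []
    while (idx = @pending.index("\n"))
      lines << @pending.slice!(0..idx).chomp
    end
    lines
  end
end

buffer = LineBuffer.new
# First chunk ends mid-message, so nothing is emitted yet.
p buffer.feed('{"a":')
# Second chunk completes the first message and carries a second one.
p buffer.feed("1}\n{\"b\":2}\n")
```

Treating each chunk as a message, by contrast, would hand `{"a":` to the JSON parser on its own, which matches the failure described above.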

After asking on the IRC channel just now, I switched the tcp input to the plain codec and added the json filter. This worked for a short while, but when running my full test suite I am still seeing occasional messages that get split, again right at the point where the message is split between TCP packets. So this doesn't appear to be a solution.

Activity

Jordan Sissel
October 10, 2013, 10:44 PM

commit fd89b220d50634306f75d5ca008f4d5867a6173a fixes this.

Jordan Sissel
October 13, 2013, 6:23 AM
Darren Hobbs
May 27, 2014, 6:51 PM

This bug appears to have resurfaced in 1.4.0. It was happening for me when I set tcpNoDelay to true and sent a reasonably large stack trace over a TCP socket, using the json codec.

Jordan Sissel
May 28, 2014, 2:33 AM

You most likely should be using the json_lines codec, not the json codec. In the next release, we'll automatically select json_lines whenever anyone chooses json for tcp.

Darren Hobbs
May 28, 2014, 10:51 AM

Thanks, that sorted it, once I realised I needed to append "\n" to my json. Leaving this here in the hope it helps the next person to happen along!
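For anyone sending events from their own client, the newline framing can be sketched in Ruby like this. `frame_event` is a hypothetical helper name, and the sample event is made up. The point is that JSON encoding escapes the newlines inside a stack trace, so the only raw newline on the wire is the terminator that a line-oriented codec such as json_lines waits for:

```ruby
require 'json'

# Hypothetical helper: serialize one event as a single line of JSON
# followed by the "\n" terminator.
def frame_event(event)
  JSON.generate(event) + "\n"
end

# A made-up event whose message contains embedded newlines,
# as a stack trace would.
event  = { "message" => "Boom\n  at foo.rb:1\n  at bar.rb:2" }
framed = frame_event(event)

puts framed.count("\n")                        # raw newlines in the frame
puts JSON.parse(framed.chomp) == event         # round-trips intact
```

The framed string can then be written to the socket as-is, for example with TCPSocket#write from Ruby's standard socket library; the host and port depend on your tcp input configuration.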

Assignee

Nick Ethier

Reporter

Jason Levine
