Tailing Existing Log Files
td-agent can “tail” log files like the UNIX tail command, then import the results into the cloud.
Prerequisites
- Basic knowledge of Treasure Data, including the toolbelt.
- Basic knowledge of td-agent.
Tailing JSON-based Logs
This example uses the tail input plugin with the configuration file below. It assumes that each line of the log is a single well-formed JSON object; a record must not span multiple lines.
| This feature is supported in td-agent v1.1.5.1 and higher. |
<source>
  type tail
  path /path/to/the/file
  tag td.test_db.test_table
  format json
  pos_file /var/log/td-agent/test_db_test_table.pos
</source>

<match td.*.*>
  type tdlog
  apikey ...
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
Here is a sample log file. Every time a new line is appended to the log file, td-agent parses the line and adds it to its buffer. td-agent uploads the buffered data into the cloud every 5 minutes; to upload the data immediately, send a SIGUSR1 signal to the td-agent process.
{"a":"b","c":"d"}
{"a":"b","c":"d","e":1}
{"a":"b","c":"d","e":1,"f":2.0}
{"a":"b","c":"d"}
{"a":"b","c":"d","e":1}
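Because the configuration assumes one well-formed JSON object per line, it can help to check a log file before pointing td-agent at it. The following is a minimal sketch in plain Ruby, not part of td-agent; the helper name `invalid_json_lines` is hypothetical. Note that a line written with `=>` between keys and values is Ruby hash notation rather than JSON and does not parse.

```ruby
require 'json'

# Returns [line_number, reason] pairs for lines that are unlikely to be
# accepted as JSON records: lines that do not parse, or that parse to
# something other than a JSON object. Blank lines are ignored.
# Hypothetical helper for illustration; not part of td-agent.
def invalid_json_lines(lines)
  problems = []
  lines.each_with_index do |line, index|
    text = line.strip
    next if text.empty?
    begin
      record = JSON.parse(text)
      problems << [index + 1, 'not a JSON object'] unless record.is_a?(Hash)
    rescue JSON::ParserError
      problems << [index + 1, 'not valid JSON']
    end
  end
  problems
end

sample = [
  '{"a":"b","c":"d"}',
  '{"a"=>"b", "c"=>"d"}', # Ruby hash notation, not JSON
  '[1, 2, 3]'             # valid JSON, but not an object
]
invalid_json_lines(sample).each do |number, reason|
  puts "line #{number}: #{reason}"
end
```

To check a real file, the same helper could be called as `invalid_json_lines(File.readlines(path))`.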
Issue the commands below to confirm that everything is configured correctly.
# append new entries
$ tail -n 3 /path/to/the/file > sample.txt   # take the last three lines of the log...
$ cat sample.txt >> /path/to/the/file        # ...and append them to the tailed log file to trigger the tail plugin.

# flush the buffer
$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`

# confirm the upload
$ td tables test_db
| td-agent handles log rotation. td-agent keeps a record of the last position it read in the log, ensuring that each line is read exactly once even if the td-agent process goes down. However, because this position is stored in a file (the pos_file), the "exactly once" guarantee breaks down if that file becomes corrupted. |
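The idea of remembering the last read position can be illustrated with a few lines of plain Ruby. This is only a sketch of the concept under simplifying assumptions (a single file, no rotation or truncation handling); it is not td-agent's implementation, and the helper name `read_new_lines` and the file names are hypothetical.

```ruby
require 'tmpdir'

# Reads the complete lines appended to log_path since the byte offset saved
# in pos_path, then saves the new offset. A trailing line without a newline
# is treated as still being written and is left for the next call.
# Hypothetical helper for illustration; not td-agent code.
def read_new_lines(log_path, pos_path)
  offset = File.exist?(pos_path) ? File.read(pos_path).to_i : 0
  lines = []
  File.open(log_path, 'r') do |log|
    log.seek(offset)
    while (line = log.gets)
      break unless line.end_with?("\n") # partial line: retry later
      lines << line.chomp
      offset = log.pos
    end
  end
  File.write(pos_path, offset.to_s)
  lines
end

Dir.mktmpdir do |dir|
  log = File.join(dir, 'app.log')
  pos = File.join(dir, 'app.pos')
  File.write(log, "first\nsecond\n")
  p read_new_lines(log, pos) # prints ["first", "second"]
  File.open(log, 'a') { |f| f.puts 'third' }
  p read_new_lines(log, pos) # prints ["third"]
end
```

If the position file in this sketch were deleted or overwritten with a wrong offset, lines would be re-read or skipped, which is the failure mode the note above describes.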
Tailing Custom-Formatted Logs
If your logs are in a custom format, you will need to write a custom parser. Once you have written the parser, put the file into your /etc/td-agent/plugins/ directory.
We provide two example parsers: “URL-param style key-value pairs” and “ascii character delimited format”. Both formats are fairly common among our users.
# URL-param style key-value pairs
last_name=smith&first_name=brian&age=22&state=CA

# ASCII character delimited format. In this case, the delimiter is '|'.
# There is usually a separate file that annotates the column names
smith|brian|22|CA
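The core of a parser for either format is turning one line into a key-value record. The sketch below shows only that step in plain Ruby; it does not use the td-agent parser plugin interface, and the function names and the column list are assumptions made for illustration. All values are left as strings, so any type conversion (for example, age to an integer) would be a separate step.

```ruby
require 'uri'

# Parses "k1=v1&k2=v2" into a hash. URI.decode_www_form also decodes
# percent-encoded characters. Hypothetical helper for illustration.
def parse_url_params(line)
  URI.decode_www_form(line.chomp).to_h
end

# Parses a delimited line, pairing each value with a column name supplied
# by the caller (for example, read from the separate annotation file).
# Hypothetical helper for illustration.
def parse_delimited(line, columns, delimiter = '|')
  values = line.chomp.split(delimiter, -1) # -1 keeps trailing empty fields
  columns.zip(values).to_h
end

p parse_url_params('last_name=smith&first_name=brian&age=22&state=CA')
p parse_delimited('smith|brian|22|CA', %w[last_name first_name age state])
```

Both calls yield a record with the keys last_name, first_name, age, and state, which is the same shape as a record parsed from a JSON log line.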
| Tailing existing logs is by far the easiest way to get started with Treasure Data. We recommend logging everything as JSON. Here's why. |
Filtering Out the Records
If you need to filter logs (e.g., dropping impressions and keeping only clicks), the exec-filter plugin is useful. This plugin launches an external script that reads records from STDIN and writes the records to keep to STDOUT.
Here’s an example configuration.
<source>
  type tail
  path /path/to/the/file1
  tag filter
  format json
  pos_file /var/log/td-agent/file1.pos
</source>

<match filter>
  type exec_filter
  command /usr/lib64/fluent/ruby/bin/ruby /etc/td-agent/filter.rb
  in_format json   # takes a JSON string from STDIN
  out_format json  # generates a JSON string to STDOUT
  tag_key tag      # The key for tags is "tag".
  time_key time    # The key for timestamps is "time".
</match>

<match td.*.*>
  type tdlog
  apikey ...
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>
/etc/td-agent/filter.rb is the filter script (shown below). It filters out all the lines where the field “field0” is equal to “certain_value”. Errors are recorded in /var/log/td-agent/filter.rb.log.
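open('/var/log/td-agent/filter.rb.log', 'a') { |f|
  f.puts "-- begin --"
  begin
    require 'json'
    STDOUT.sync = true  # write each record immediately instead of buffering
    while line = STDIN.gets
      # parse
      begin
        h = JSON.parse line
      rescue => e
        f.puts "skipped broken line: #{e.message}"
        next
      end

      # filter: drop records whose "field0" equals "certain_value"
      next if h["field0"] == "certain_value"

      # emit: the "tag" key routes the record to the td.*.* match section
      h['tag'] = 'td.testdb.test_table'
      puts h.to_json
    end
  rescue LoadError => e
    f.puts e.to_s
  end
}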
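Before wiring a filter into td-agent, it can be convenient to try the per-line logic on its own. The sketch below restates the filtering rule described above as a standalone function that can be called without STDIN or a log file; the function name `filter_line` and the sample records are assumptions for illustration, not part of the original script.

```ruby
require 'json'

# Returns the JSON string to emit for one input line, or nil if the line
# should be dropped (broken JSON, a non-object value, or a record whose
# "field0" equals "certain_value").
# Hypothetical helper for illustration.
def filter_line(line, tag = 'td.testdb.test_table')
  record = JSON.parse(line)
  return nil unless record.is_a?(Hash)
  return nil if record['field0'] == 'certain_value'
  record['tag'] = tag
  record.to_json
rescue JSON::ParserError
  nil
end

inputs = [
  '{"field0":"certain_value","n":1}', # dropped by the filter
  '{"field0":"other","n":2}',         # kept and tagged
  'not json'                          # dropped as a broken line
]
inputs.each do |line|
  out = filter_line(line)
  puts out if out
end
# prints {"field0":"other","n":2,"tag":"td.testdb.test_table"}
```

Keeping the decision in one small function makes it easier to add further conditions and to verify them with sample lines before restarting td-agent.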