Streaming Import from Node.js Applications

fluent-logger-node, a third-party library, is used to import data from Node.js applications into Treasure Data.

This article explains how to use the fluent-logger-node library.

Prerequisites

  • Basic knowledge of Node.js and NPM.
  • Basic knowledge of Treasure Data, including the toolbelt.
  • Node.js 0.6 or higher (for local testing).

Note: The fluent-logger-node library does not work on Heroku or EngineYard.

Installing td-agent

fluent-logger-node requires td-agent to be installed on your application servers. td-agent is a daemon dedicated to the streaming upload of any kind of time-series data. It is developed and maintained by Treasure Data, Inc.



The fluent-logger-node library enables Node.js applications to post records to their local td-agent. td-agent in turn uploads the data to the cloud every 5 minutes. Because the daemon runs on a local node, the logging latency is negligible.

To set up td-agent, please refer to the following articles; we provide deb/rpm packages for Linux systems.

If you have...            Please look at...
MacOS X                   Installing td-agent on MacOS X
Debian / Ubuntu System    Installing td-agent for Debian and Ubuntu
Redhat / CentOS System    Installing td-agent for Redhat and CentOS
Joyent SmartOS            Installing fluentd + td plugin on Joyent SmartOS
AWS Elastic Beanstalk     Installing td-agent on AWS Elastic Beanstalk

td-agent is fully open-sourced under the fluentd project. td-agent extends fluentd with custom plugins for Treasure Data.

Modifying /etc/td-agent/td-agent.conf

First, look up your API key, which td-agent uses to authenticate with Treasure Data. You can view it with the td apikey:show command.

Note: You must first authenticate your account using the ‘td account’ command.

$ td apikey:show
3b7118fd3ad7e35bbd3c0e4f607ec7263aa93c30

Next, please set the apikey option in your td-agent.conf file.

Note: YOUR_API_KEY should be your actual apikey string.

# Treasure Data Input and Output
<source>
  type forward
  port 24224
</source>
<match td.*.*>
  type tdlog
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

Please restart your agent once these lines are in place.

$ sudo /etc/init.d/td-agent restart

td-agent will now accept data via port 24224, buffer it in /var/log/td-agent/buffer/td, and automatically upload it into the cloud.
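
Before wiring up the application, it can be useful to confirm that something is actually listening on port 24224. The following is a minimal sketch using only Node's built-in net module; checkPort is a hypothetical helper name, and a successful TCP connection only shows that the port is open, not that td-agent is configured correctly.

```javascript
var net = require('net');

// Hypothetical helper: calls done(true) if a TCP connection to host:port
// succeeds within timeoutMs, and done(false) on error or timeout.
function checkPort(host, port, timeoutMs, done) {
  var socket = net.connect(port, host);
  var finished = false;

  function finish(ok) {
    if (finished) return; // report only the first outcome
    finished = true;
    socket.destroy();
    done(ok);
  }

  socket.setTimeout(timeoutMs, function() { finish(false); });
  socket.on('connect', function() { finish(true); });
  socket.on('error', function() { finish(false); });
}

// Usage, assuming td-agent runs locally with the configuration above:
// checkPort('localhost', 24224, 1000, function(ok) {
//   console.log(ok ? 'port 24224 is open' : 'port 24224 is not reachable');
// });
```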

Using fluent-logger-node

Obtaining the Most Recent Version

The most recent version of fluent-logger-node is published on npm under the package name fluent-logger.

A Sample Application

A sample Express app using fluent-logger-node is shown below.

package.json

{
  "name": "node-example",
  "version": "0.0.1",
  "dependencies": {
    "express": "2.5.9",
    "fluent-logger": "0.1.0"
  }
}

Now use npm to install your dependencies locally:

$ npm install
fluent-logger@0.1.0 ./node_modules/fluent-logger
express@2.5.9 ./node_modules/express
|-- qs@0.4.2
|-- mime@1.2.4
|-- mkdirp@0.3.0
|-- connect@1.8.6 (formidable@1.0.9)

web.js

This is the simplest web app.

var express = require('express');
// express.createServer() is the Express 2.x API, matching package.json.
var app = express.createServer(express.logger());

// All events get the tag prefix td.test_db and go to the local td-agent.
var logger = require('fluent-logger');
logger.configure('td.test_db', {host: 'localhost', port: 24224});

app.get('/', function(request, response) {
  // Emits one record with the tag td.test_db.follow.
  logger.emit('follow', {from: 'userA', to: 'userB'});
  response.send('Hello World!');
});

var port = process.env.PORT || 3000;
app.listen(port, function() {
  console.log("Listening on " + port);
});

Execute the app and go to http://localhost:3000/ in your browser.

$ node web.js
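
In web.js the route handler calls logger.emit directly. As a sketch, the event can instead be built in a small function that receives the logger as an argument, which allows it to be exercised without a running td-agent. trackFollow and createStubLogger are hypothetical names introduced for illustration; the only assumption about the logger is that it exposes emit(label, record), as used in web.js.

```javascript
// Hypothetical helper: emits one 'follow' event through the given logger.
// `logger` must expose emit(label, record), as the configured
// fluent-logger module does in web.js.
function trackFollow(logger, fromUser, toUser) {
  var record = {from: fromUser, to: toUser};
  logger.emit('follow', record);
  return record;
}

// Stand-in logger for local experiments; it only collects events in memory
// and sends nothing to td-agent.
function createStubLogger() {
  var events = [];
  return {
    events: events,
    emit: function(label, record) {
      events.push({label: label, record: record});
    }
  };
}

var stub = createStubLogger();
trackFollow(stub, 'userA', 'userB');
console.log(stub.events.length); // prints 1
```

In the real application, the object returned by require('fluent-logger') would be passed in place of the stub after calling logger.configure.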

Confirming Data Import

Sending a SIGUSR1 signal will flush td-agent’s buffer; upload will start immediately.

$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`
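
The same signal can also be sent from a Node.js script, for example in a test helper. The following is a minimal sketch, assuming the PID file path shown above and sufficient permissions to signal the td-agent process; signalFromPidFile is a hypothetical helper name, not part of td-agent or fluent-logger.

```javascript
var fs = require('fs');

// Hypothetical helper: reads a PID from pidFile and sends it a signal.
// Defaults to SIGUSR1, which asks td-agent to flush its buffer.
function signalFromPidFile(pidFile, signal) {
  var text = fs.readFileSync(pidFile, 'utf8');
  var pid = parseInt(text.trim(), 10);
  if (isNaN(pid)) {
    throw new Error('No PID found in ' + pidFile);
  }
  // process.kill() sends a signal; despite its name, it does not
  // necessarily terminate the target process.
  process.kill(pid, signal === undefined ? 'SIGUSR1' : signal);
  return pid;
}

// Usage (needs permission to signal the td-agent process):
// signalFromPidFile('/var/run/td-agent/td-agent.pid');
```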

To confirm that your data has been uploaded successfully, issue the td tables command as shown below.

$ td tables
+------------+------------+------+-----------+
| Database   | Table      | Type | Count     |
+------------+------------+------+-----------+
| test_db    | follow     | log  | 1         |
+------------+------------+------+-----------+
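
The tag prefix passed to logger.configure(), joined with the label passed to logger.emit(), determines the database name and table name. In the sample app, the prefix td.test_db and the label follow produce the tag td.test_db.follow. In general, a tag of td.test_db.test_table imports the data into the table test_table within the database test_db. Both are created automatically at upload time.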
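
The tag-to-table mapping can be illustrated with a short sketch. parseTag is a hypothetical function written for this article; it mirrors the td.<database>.<table> convention and the <match td.*.*> pattern from td-agent.conf, and is not code from td-agent or fluent-logger.

```javascript
// Illustrative only: splits a tag of the form td.<database>.<table>.
// Returns null for tags that would not match <match td.*.*>.
function parseTag(tag) {
  var parts = tag.split('.');
  if (parts.length !== 3 || parts[0] !== 'td') {
    return null;
  }
  return {database: parts[1], table: parts[2]};
}

// web.js configures the prefix 'td.test_db' and emits the label 'follow'.
var tag = 'td.test_db' + '.' + 'follow';
console.log(parseTag(tag)); // { database: 'test_db', table: 'follow' }
```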

Production Deployments

High-Availability Configurations of td-agent

For high-traffic websites (more than 5 application nodes), we recommend using a high availability configuration of td-agent. This will improve data transfer reliability and query performance.

Monitoring td-agent

Monitoring td-agent itself is also important. Please refer to this document for general monitoring methods for td-agent.

Next Steps

We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.
