Tiles: An Easy Tool for Managing tmux Sessions

I use tmux extensively whenever I write code. Typically, I have ten or so windows open in my main tmux session and may have one or two other sessions with fewer windows. My main session is where I do most of my work, and I generally keep one window per project or bug I am working on. I use my other sessions for writing notes, doing operational tasks on the cluster, and so on.

I found working with raw tmux commands cumbersome, so I wrote a simple Python script, Tiles, to make it easier to manage my tmux sessions, create sessions with a predefined list of windows, and attach to existing sessions.

Tiles reads a .tiles configuration file in your home directory. Its DSL, inspired by the syntax of the Bazel build system, looks like the following:

    name = "session-name",
    windows = [
        ["window-name", "/path/to/directory/for/window"],
    ],

Typically, my .tiles file on my home machine (where I often work on open source projects in my spare time) might look something like the following:

    name = "default",
    windows = [
        ["tensorflow", "~/Projects/tensorflow/tensorflow"],
        ["bazel", "~/Projects/bazelbuild/bazel"],
        ["jsonnet", "~/Projects/google/jsonnet"],
    ],

    name = "notes",
    windows = [
        ["notes", "~/Notes"],
        ["blog", "~/Projects/dzc/davidzchen.github.io"],
    ],
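Since the format is this regular, a session file in this shape can be parsed with a couple of regular expressions. Here is a rough sketch of such a parser; this is illustrative only, not the actual Tiles implementation:

```python
import re

def parse_tiles(text):
    """Parse a Bazel-inspired .tiles config into a list of sessions.

    Illustrative sketch only -- not the actual Tiles parser.
    """
    sessions = []
    # Split the file into blocks, one per `name = ...` declaration.
    for block in re.split(r'(?=name\s*=)', text):
        m = re.search(r'name\s*=\s*"([^"]+)"', block)
        if not m:
            continue
        # Each window is a two-element list: ["window-name", "directory"].
        windows = re.findall(r'\[\s*"([^"]+)"\s*,\s*"([^"]+)"\s*\]', block)
        sessions.append({"name": m.group(1), "windows": windows})
    return sessions
```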

To launch a tmux session with the windows "tensorflow", "bazel", and "jsonnet", with each window starting in its respective directory, run:

tiles start default

The name "default" is special: running a tiles command without specifying a name causes tiles to look for a session called "default". Thus, to start my default session, I can simply run:

tiles start
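Under the hood, `tiles start` only needs to issue a handful of tmux commands: one `new-session` for the first window, one `new-window` per remaining window, and an `attach-session` at the end. A sketch of that translation (hypothetical; not Tiles' actual code):

```python
def tmux_start_commands(name, windows):
    """Build the argv lists a Tiles-like tool could pass to subprocess.run()
    to create a session with one window per (name, directory) pair.

    Hypothetical sketch, not the actual Tiles implementation.
    """
    (first_window, first_dir), rest = windows[0], windows[1:]
    commands = [
        # Create the session detached, with its first window.
        ["tmux", "new-session", "-d", "-s", name,
         "-n", first_window, "-c", first_dir],
    ]
    for window, directory in rest:
        commands.append(
            ["tmux", "new-window", "-t", name, "-n", window, "-c", directory])
    # Finally, attach to the newly created session.
    commands.append(["tmux", "attach-session", "-t", name])
    return commands
```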

At work, I generally keep my tmux sessions running all the time on my desktop and simply ssh in and attach to them. For example, to attach to an existing tmux session called “ops”, simply run:

tiles attach ops

Tiles also has a handy tiles ls command, which simply runs tmux list-sessions to list the currently active sessions.
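Since `tiles ls` is a thin wrapper, it can delegate straight to tmux. A minimal sketch of such a wrapper, assuming tmux is on the PATH (again, not the actual Tiles code):

```python
import shutil
import subprocess

def tiles_ls():
    """List active tmux sessions by shelling out to `tmux list-sessions`.

    Sketch only; returns tmux's output (or an error message) as a string.
    """
    if shutil.which("tmux") is None:
        return "tmux not found"
    result = subprocess.run(["tmux", "list-sessions"],
                            capture_output=True, text=True)
    return result.stdout or result.stderr
```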

Some future improvements I am planning to make to Tiles include:

  • Making tiles available on PyPI so that it can be installed with pip
  • Configuring panes within each window
  • Supporting GNU Screen in addition to tmux

If you want to give Tiles a try, check out the Tiles website and documentation and repository on GitHub. Feel free to open an issue or send a pull request if you have any feature requests or find any bugs.

A New Look

I realize that I have not posted to this blog for over a year. After looking over it again, I felt that the current design looked a bit dated and decided to give it a bit of a facelift. I wanted a look that is cleaner, better focused on the content, and more flexible for different types of media, whether it be text, code, images, or embedded content.

I began by reducing the size of the ridiculously large heading, making it look like a more conventional navbar, and giving it a dark background. Then, I shrunk the sidebar and expanded the main content, added a subtle blue background color to the sidebar modules for contrast, and changed the link color from red to blue. To match the header, I applied dark backgrounds to the page footer and pager buttons and changed the color scheme of the code blocks from base16-ocean-light to its dark variant.

In addition, I restored the font sizes of the h1-h6 heading tags to Bootstrap’s defaults and used the normal font weight in order to better bring out the beauty of Knuth’s Computer Modern typefaces. Finally, I converted the style sheets from Less to Sass in order to take advantage of Jekyll’s built-in Sass support. After all, Bootstrap is also moving from Less to Sass starting with Bootstrap 4.

As for content, there are many topics that I am either itching to write about or have started writing. I have also slightly reorganized my blog categories since I am planning to expand the scope of this blog to also include some of my non-software hobbies and interests. In any event, I am expecting to update this blog a lot more frequently, so stay tuned.

Building a Self-Service Hadoop Platform at LinkedIn with Azkaban

At this year’s Hadoop Summit in San Jose, CA, I gave a talk on Building a Self-Service Hadoop Platform at LinkedIn with Azkaban. Azkaban is LinkedIn’s open-source workflow manager first developed back in 2009 with a focus on ease of use. Over the years, Azkaban has grown from being just a workflow scheduler for Hadoop to being an integrated environment for Hadoop tools and the primary front-end to Hadoop at LinkedIn.

The abstract and slides are below. A video of my talk will be available in the coming weeks.


Hadoop comprises the core of LinkedIn’s data analytics infrastructure and runs a vast array of our data products, including People You May Know, Endorsements, and Recommendations. To schedule and run the Hadoop workflows that drive our data products, we rely on Azkaban, an open-source workflow manager developed and used at LinkedIn since 2009. Azkaban is designed to be scalable, reliable, and extensible, and features a beautiful and intuitive UI. Over the years, we have seen tremendous growth, both in the scale of our data and our Hadoop user base, which includes over a thousand developers, data scientists, and analysts. We evolved Azkaban to not only meet the demands of this scale, but also support query platforms including Pig and Hive and continue to be an easy to use, self-service platform. In this talk, we discuss how Azkaban’s monitoring and visualization features allow our users to quickly and easily develop, profile, and tune their Hadoop workflows.


A Curious Case of GCC Include Paths

One time, I was building a large C++ codebase and encountered a number of compiler errors that appeared to be caused by constants defined in the system <time.h> not getting picked up. Curiously, it appeared that the time.h in the current source directory was being included instead, even though the include statement read:

#include <time.h>

As I understood it, the difference between #include <header.h> and #include "header.h" is that the former searches a set of system header directories first, while the latter searches the current directory first. Something was causing GCC to search the current directory even for system headers.

To verify that this behavior was not caused by the project’s build system, I created a simple Hello World source file hello.cc that included <time.h>:

#include <stdio.h>
#include <time.h>

int main(int argc, char **argv) {
  printf("Hello world.");
  return 0;

I created a time.h in the same directory that would raise a compiler error if included:

#error "Should not be included"

Sure enough, when I compiled hello.cc, it raised the error:

$ gcc -o hello hello.cc
In file included from hello.cc:2:0:
./time.h:1:2: error: #error "Should not be included"

However, when I ran the same command as root, the compilation succeeded. This meant that something in my user’s environment differed from root’s and was causing GCC to search the current directory for system headers. That was when I remembered that I set CPLUS_INCLUDE_PATH in my shell startup script so that GCC would search additional directories, such as /opt/local/include, since I use MacPorts.

I finally found that the reason the current directory was being searched was that I had set my CPLUS_INCLUDE_PATH as follows:

export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/opt/local/include:...

Appending paths to path variables this way seems innocuous, since most of us follow this convention when adding to our $PATHs, but in this case, it turned out not to be so harmless.

Because $CPLUS_INCLUDE_PATH is not set by default, the first entry is an empty string. One would expect an empty string simply to be skipped, as is the case for $PATH. However, I started to wonder whether an empty entry in CPLUS_INCLUDE_PATH actually signified to GCC that the current directory should be searched. A simple test proved that it did:

$ export CPLUS_INCLUDE_PATH=/opt/include
$ gcc -o hello hello.cc
$ export CPLUS_INCLUDE_PATH=:/opt/include
$ gcc -o hello hello.cc
In file included from hello.cc:2:0:
./time.h:1:2: error: #error "Should not be included"

I eventually found that this is actually a documented, if obscure, feature of GCC: an empty element in these include path variables instructs the compiler to search the current working directory. I am curious why this feature was implemented in the first place. The only use case that comes to mind is making #include <header.h> behave exactly like #include "header.h", which seems more like a hack than a valid use case.
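One way to avoid the pitfall is to drop empty entries when building the variable in the first place. A small helper along these lines (a hypothetical sketch; the same logic is easy to express in any shell):

```python
import os

def extend_include_path(new_dirs, var="CPLUS_INCLUDE_PATH"):
    """Join existing and new search paths, dropping empty entries so that
    an unset variable cannot introduce a leading ':'.

    Hypothetical helper for illustration only.
    """
    existing = os.environ.get(var, "")
    # Filter out empty strings: an empty element would make GCC
    # search the current directory for system headers.
    parts = [p for p in existing.split(":") + list(new_dirs) if p]
    return ":".join(parts)
```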

Gradle Dust.js Plugin

LinkedIn Dust.js is a powerful, high-performance, and extensible front-end templating engine. Here is an excellent article comparing Dust.js with other template engines.

After learning Gradle, I have been using it almost exclusively for my JVM projects. While Dust.js plugins have been written for Play Framework and JSP, it seems that nobody had written one for Gradle to compile Dust.js templates at build time.

As a result, I wrote my own, which is available on GitHub. The plugin uses Mozilla Rhino to invoke the dustc compiler. You do not need to have Node.js or NPM installed to use the plugin.

Using the plugin is easy. First, add a buildscript dependency to pull the gradle-dustjs-plugin artifact:

buildscript {
  repositories {
    mavenCentral()
  }
  dependencies {
    classpath 'com.linkedin:gradle-dustjs-plugin:1.0.0'
  }
}
Then, apply the plugin:

apply plugin: 'dustjs'

Finally, configure the plugin to specify your input files:

dustjs {
  source = fileTree('src/main/tl') {
    include 'template.tl'
  }
  dest = 'src/main/webapp/assets/js'
}

At build time, the dustjs task will compile your templates to JavaScript files. The basename of each template file is used as the template name. For example, compiling the template template.tl is equivalent to running the following dustc command:

dustc --name=template source/template.tl dest/template.js

Please check it out and feel free to open issues and pull requests.