I use tmux extensively whenever I write code. Typically, I have about ten or so tmux windows open on my main tmux session and may have one or two other tmux sessions with fewer windows. My main tmux session is where I do most of my work, and typically, I keep one window per project or bug I am working on. I would use my other sessions for writing notes, doing operational tasks on the cluster, etc.
I found working with raw tmux commands to be cumbersome, so I wrote a simple Python script, Tiles, to make it easier for me to manage my tmux sessions, create sessions with a predefined list of windows, and attach to existing sessions.
Tiles reads a .tiles configuration file in your home directory. The syntax of the Tiles DSL was inspired by that of the Bazel build system and is as follows:
tmux_session(
    name = "session-name",
    windows = [
        ["window-name", "/path/to/directory/for/window"],
        ...
    ],
)
Typically, my .tiles file on my home machine (where I often work on open source projects in my spare time) might look something like the following:
tmux_session(
    name = "default",
    windows = [
        ["tensorflow", "~/Projects/tensorflow/tensorflow"],
        ["bazel", "~/Projects/bazelbuild/bazel"],
        ["jsonnet", "~/Projects/google/jsonnet"],
    ],
)

tmux_session(
    name = "notes",
    windows = [
        ["notes", "~/Notes"],
        ["blog", "~/Projects/dzc/davidzchen.github.io"],
    ],
)
To launch a tmux session with the windows "tensorflow", "bazel", and "jsonnet", with each window starting in its respective directory, run:
tiles start default
Now, the "default"
name is special, and running a tiles
command without
specifying a name will cause tiles
to look for a session called "default"
.
Thus, to start my default session, I can simply run the following command:
tiles start
At work, I generally keep my tmux sessions running all the time on my desktop and simply ssh in and attach to them. For example, to attach to an existing tmux session called “ops”, simply run:
tiles attach ops
Tiles also has a handy tiles ls command, which simply runs tmux list-sessions to list the currently active sessions.
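In other words, the following two commands are equivalent (the output simply depends on whatever sessions are currently running):

$ tiles ls
$ tmux list-sessions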
Some future improvements I am planning to make to Tiles include making tiles available on PIP.
If you want to give Tiles a try, check out the Tiles website and documentation and the repository on GitHub. Feel free to open an issue or send a pull request if you have any feature requests or find any bugs.
I realize that I have not posted to this blog for over a year. After looking over it again, I felt that the current design looked a bit dated and decided to give it a bit of a facelift. I wanted a look that is cleaner, better focused on the content, and more flexible for different types of media, whether it be text, code, images, or embedded content.
I began by reducing the size of the ridiculously large heading, making it look like a more conventional navbar, and giving it a dark background. Then, I shrunk the sidebar and expanded the main content, added a subtle blue background color to the sidebar modules for contrast, and changed the link color from red to blue. To match the header, I applied dark backgrounds to the page footer and pager buttons and changed the color scheme of the code blocks from base16-ocean-light to its dark variant.
In addition, I restored the font sizes of the h1-h6 heading tags to Bootstrap’s defaults and used the normal font weight in order to better bring out the beauty of Knuth’s Computer Modern typefaces. Finally, I converted the style sheets from Less to Sass in order to take advantage of Jekyll’s built-in Sass support. After all, Bootstrap is also moving from Less to Sass starting with Bootstrap 4.
As for content, there are many topics that I am either itching to write about or have started writing. I have also slightly reorganized my blog categories since I am planning to expand the scope of this blog to also include some of my non-software hobbies and interests. In any event, I am expecting to update this blog a lot more frequently, so stay tuned.
At this year’s Hadoop Summit in San Jose, CA, I gave a talk on Building a Self-Service Hadoop Platform at LinkedIn with Azkaban. Azkaban is LinkedIn’s open-source workflow manager first developed back in 2009 with a focus on ease of use. Over the years, Azkaban has grown from being just a workflow scheduler for Hadoop to being an integrated environment for Hadoop tools and the primary front-end to Hadoop at LinkedIn.
The abstract and slides are below. A video of my talk will be available in the coming weeks.
Hadoop comprises the core of LinkedIn’s data analytics infrastructure and runs a vast array of our data products, including People You May Know, Endorsements, and Recommendations. To schedule and run the Hadoop workflows that drive our data products, we rely on Azkaban, an open-source workflow manager developed and used at LinkedIn since 2009. Azkaban is designed to be scalable, reliable, and extensible, and features a beautiful and intuitive UI. Over the years, we have seen tremendous growth, both in the scale of our data and our Hadoop user base, which includes over a thousand developers, data scientists, and analysts. We evolved Azkaban to not only meet the demands of this scale, but also support query platforms including Pig and Hive and continue to be an easy to use, self-service platform. In this talk, we discuss how Azkaban’s monitoring and visualization features allow our users to quickly and easily develop, profile, and tune their Hadoop workflows.
One time, I was building a large C++ codebase and encountered a number of compiler errors that appeared to be caused by constants defined in the system <time.h> not getting picked up. Curiously, it appeared that the time.h in the current source directory was being included instead, even though the include statement read:
#include <time.h>
From my understanding, the difference between the rules for #include <header.h> and #include "header.h" was that the former searched a set of system header directories first while the latter first searched the current directory. Something was causing GCC to search the current directory for system headers.
To verify that this behavior was not caused by the project’s build system, I created a simple Hello World source file hello.cc that included <time.h>:
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv) {
  printf("Hello world.");
  return 0;
}
I created a time.h in the same directory that would raise a compiler error if included:
#error "Should not be included"
Sure enough, when I compiled hello.cc, it raised the error:
$ gcc -o hello hello.cc
In file included from hello.cc:2:0:
./time.h:1:2: error: #error "Should not be included"
However, when I ran the same command as root, the compilation succeeded. This meant that there was something in the environment for my user that differed from that of root that was causing GCC to search the current directory for system headers. This was when I remembered that I was setting CPLUS_INCLUDE_PATH in my shell startup script, which I set so that GCC would search other directories, such as /opt/local/include, since I use MacPorts.
I finally found that the reason the current directory was being searched was that I had set my CPLUS_INCLUDE_PATH as follows:
export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/opt/local/include:...
Appending paths to path variables this way seemed innocuous since most of us follow this convention when adding to our $PATHs, but in this case, it turned out to not be so harmless.
Because $CPLUS_INCLUDE_PATH is not set by default, the first entry is an empty string. One would expect that an empty string would simply be skipped, as is the case for $PATH. However, I started to wonder whether an empty string in the CPLUS_INCLUDE_PATH actually signified to GCC that the current directory should be searched. A simple test proved that it did:
$ export CPLUS_INCLUDE_PATH=/opt/include
$ gcc -o hello hello.cc
$ export CPLUS_INCLUDE_PATH=:/opt/include
$ gcc -o hello hello.cc
In file included from hello.cc:2:0:
./time.h:1:2: error: #error "Should not be included"
I eventually found that this was actually an obscure feature of GCC.
I am curious to know why this feature was implemented in the first place. The only use case that comes to mind is to get #include <header.h> to behave exactly like #include "header.h", which seems more like a hack than a valid use case.
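As an aside, the empty entry is easy to avoid. Here is a minimal sketch of a safer way to append to the variable, assuming a POSIX-compatible shell and the same /opt/local/include path as above:

# Only include the existing value (and the colon separator) when
# CPLUS_INCLUDE_PATH is already set, so an unset variable does not
# introduce a leading empty entry.
export CPLUS_INCLUDE_PATH="${CPLUS_INCLUDE_PATH:+$CPLUS_INCLUDE_PATH:}/opt/local/include"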
LinkedIn Dust.js is a powerful, high-performance, and extensible front-end templating engine. Here is an excellent article comparing Dust.js with other template engines.
After learning Gradle, I have been using it almost exclusively for my JVM projects. While Dust.js plugins have been written for Play Framework and JSP, it seems that nobody had written one for Gradle to compile Dust.js templates at build time.
As a result, I wrote my own, which is available on GitHub. The plugin uses Mozilla Rhino to invoke the dustc compiler. You do not need to have Node.js or NPM installed to use the plugin.
Using the plugin is easy. First, add a buildscript dependency to pull the gradle-dustjs-plugin artifact:
buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'com.linkedin:gradle-dustjs-plugin:1.0.0'
    }
}
Then, apply the plugin:
apply plugin: 'dustjs'
Finally, configure the plugin to specify your input files:
dustjs {
    source = fileTree('src/main/tl') {
        include 'template.tl'
    }
    dest = 'src/main/webapp/assets/js'
}
At build time, the dustjs task will compile your templates to JavaScript files. The basename of the template file is used as the template name. For example, compiling the template template.tl is equivalent to running the following dustc command:
dustc --name=template source/template.tl dest/template.js
Please check it out and feel free to open issues and pull requests.