Tuesday, December 27, 2011

Google's Closure Compiler (Advanced Optimizations) -- Lessons learned while refactoring a reusable library to survive compilation

I recently started a project where we wanted to convert a commonly used JavaScript library so that it would still be usable after being minified with Google's Closure Compiler with Advanced Optimizations.  I wanted to share some of the lessons I learned while doing this conversion.

Closure Compiler defaults to Simple Optimizations, which basically removes whitespace and comments; just about any code works after being compiled with this option.  Advanced Optimizations does much more: variable renaming, dead code removal, function inlining, and so on.  Variable renaming was the source of most of the bugs I hit doing this conversion.  Any one of these optimizations can make your code unusable if it is applied to a method/property/argument which needs to present a consistent interface outside of your library.  I call the technique for making the code survivable "exporting the interface".  Here were the most common conditions which led to bugs while using Advanced Optimizations:
  • Having functions/properties which need to be called "externally" by other libraries and scripts.
  • Having a caller create an object and pass it into a function in your library.
  • Having one of your methods create an object and pass it outside your library.
Here is an example of a class which isn't going to work after being compiled with Closure Compiler. 
        /**
        * TimeManager.js -
        * A version that shouldn't be minified with Advanced Optimizations
        */

        var TimeManager = {
           getDate: function () {
              return (new Date());
           },
           //Problem: The compiler will rename obj.timeOffset to something like b.a,
           //         but the caller is still using the property name timeOffset.
           getDateOffset: function (obj) {
              var timeOffset = 0;
              if (obj && obj.timeOffset)
                 timeOffset = obj.timeOffset;
              return (new Date(TimeManager.getTime() + timeOffset));
           },
           getTime: function () {
              return TimeManager.getDate().getTime();
           },
           //Problem: In this case we add the property name (vs. reading it), but
           //         again the compiler is going to rename the property to something
           //         like G.c and the caller will be expecting MyObj.date.
           addDateProp: function (obj) {
              var res = obj || {};
              res.date = TimeManager.getDate();
              return res;
           }
        };
        //Problem: The class name is never used, so all of this code gets removed as
        //         dead code.  Oops!
        //Problem: Even if you export the class, by adding "window.foo = TimeManager",
        //         all the methods are going to be renamed to something like window.A,
        //         A.a, A.b, A.c, ...

    In many cases, Closure Compiler considers any property accessed with dot notation to be fair game for renaming.  If you use square bracket notation with a string literal, Closure Compiler converts it to dot notation for you without renaming the property.  For example, Foo['bar'] would be compiled to Foo.bar.  We can use this to solve all of the problems with the above code.  Here is a version which can be compiled with Closure Compiler and still be usable:

    (function () {
        /**
        * TimeManager.js -
        * A version that can be minified with Advanced Optimizations
        */

        var TimeManager = {
           getDate: function () {
              return (new Date());
           },
           //Solution: Using the square bracket notation tells Closure Compiler to
           //          leave the property name intact.
           getDateOffset: function (obj) {
              var timeOffset = 0;
              if (obj && obj['timeOffset'])
                 timeOffset = obj['timeOffset'];
              return (new Date(TimeManager.getTime() + timeOffset));
           },
           getTime: function () {
              return TimeManager.getDate().getTime();
           },
           //Solution: Using the square bracket notation tells Closure Compiler to
           //          leave the property name intact.
           addDateProp: function (obj) {
              var res = obj || {};
              res['date'] = TimeManager.getDate();
              return res;
           }
        };
        //Solution: Export each of the methods with string literals and then export
        //          the entire class to the window object.
        TimeManager['getDate'] = TimeManager.getDate;
        TimeManager['getDateOffset'] = TimeManager.getDateOffset;
        TimeManager['getTime'] = TimeManager.getTime;
        TimeManager['addDateProp'] = TimeManager.addDateProp;
        window['TimeManager'] = TimeManager;
    })();
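    For reference, compiling with Advanced Optimizations from the command line looks roughly like this (compiler.jar and the file names here are just placeholders for your own paths):

        # Compile TimeManager.js with Advanced Optimizations
        java -jar compiler.jar \
             --compilation_level ADVANCED_OPTIMIZATIONS \
             --js TimeManager.js \
             --js_output_file TimeManager.min.js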

    Finally, here is what it looks like after being compiled:

    (function(){var a={getDate:function(){return new Date},b:function(b){timeOffset=0;b&&b.timeOffset&&(timeOffset=b.timeOffset);return new Date(a.getTime()+timeOffset)},getTime:function(){return a.getDate().getTime()},a:function(b){b=b||{};b.date=a.getDate();return b}};a.getDate=a.getDate;a.getDateOffset=a.b;a.getTime=a.getTime;a.addDateProp=a.a;window.TimeManager=a})();

    Saturday, January 22, 2011

    Taking Cygwin from cool to rockin...

    Here are three suggestions I have for making Cygwin work better for you:

    PuttyCyg - http://code.google.com/p/puttycyg/ :  If you're like me, you can't stand the limited command shell provided by Windows.  It just sucks, full stop.  PuTTY supports select-to-copy and resizing the terminal without going into Properties.  After you download PuttyCyg into the same directory as Cygwin, just enter '-' as the host to open a local shell.

    EDIT: I've started using Console2 in place of PuttyCyg.  I created a custom "tab" for Cygwin (Settings -> Tabs), with a Shell setting of C:\cygwin\Cygwin.bat.


    apt-cyg:  http://stephenjungels.com/jungels.net/projects/apt-cyg/ or its new home: http://code.google.com/p/apt-cyg/ :  Ubuntu has been my Linux distro of choice for some time now, partly because I love apt-get!  Now you can get something similar in Cygwin (apt-cyg).  To install the script simply do the following:
      cd /usr/bin/
      wget http://stephenjungels.com/jungels.net/projects/apt-cyg/apt-cyg
      chmod 550 apt-cyg


      Here are the command options:

    • "apt-cyg install " to install packages
    • "apt-cyg remove " to remove packages
    • "apt-cyg update" to update setup.ini
    • "apt-cyg show" to show installed packages
    • "apt-cyg find " to find packages matching patterns
    • "apt-cyg describe " to describe packages matching patterns
    • "apt-cyg packageof " to locate parent packages
       Here is a sample: (installing gcc-java)

       $ apt-cyg install gcc-java
       Working directory is /setup
       Mirror is ftp://mirror.mcs.anl.gov/pub/cygwin
       .... Truncated 
       Unpacking...
       Package gcc-mingw-java requires the following packages, installing:
       gcc-core gcc-java cygwin
       Package gcc-core is already installed, skipping
       Package gcc-java is already installed, skipping
       Package cygwin is already installed, skipping
       Package gcc-mingw-java installed
       Package gcc-core is already installed, skipping
       Package zlib is already installed, skipping
       Package _update-info-dir is already installed, skipping
       Running postinstall scripts
       *** Unpacking /etc/postinstall/gcc-mingw-java-3.4.4-20050522-1.tgz.  Please wait. ***
       Package gcc-java installed




    Notepad++ - http://notepad-plus-plus.org/ : seems to work better with Cygwin paths than UltraEdit.  I have aliases for both, but UltraEdit always gets confused by the tilde (~) in the path name.  I suggest downloading and installing Notepad++ and then creating an alias for it in your .bashrc file.
       alias npedit='/cygdrive/c/Program\ Files\ \(x86\)/Notepad++/notepad++.exe '
    I hope to wrap this in a bash function and use "cygpath -w" to fix the few remaining pathing issues, along the lines of the sketch below.
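    Here is a rough, untested sketch of what that function might look like (it would replace the alias above; adjust the notepad++.exe location for your install):

       # Hypothetical replacement for the alias above: convert the Cygwin path
       # to a Windows path with cygpath -w before handing it to Notepad++.
       npedit() {
          local winpath
          winpath=$(cygpath -w "$1")
          '/cygdrive/c/Program Files (x86)/Notepad++/notepad++.exe' "$winpath" &
       }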


    Thursday, February 11, 2010

    Getting started with Amazon Web Services

    Amazon Web Services (AWS) is designed so that you can build a system of any size without having to go buy the hardware/software for it.  With these services you could easily create something as complex as Hulu or Facebook.  AWS exposes a very expensive investment in technology so that anyone can build on top of it.  The best part is it's cheap!!!  They charge you based on usage, not on potential, so they don't make money until you start getting usage.  It's great for a company of any size.


    To get started, you are going to need to set up an AWS account and then turn on individual services in that account.  The first time you click "Sign up for XXX", Amazon asks you to input a billing method; every service you add after that will default to the first billing method, though you can opt to change it.


    Here are some helpful links before we get started:
    http://aws.amazon.com/solutions/aws-solutions/ - a list of AWS solutions.
    http://calculator.s3.amazonaws.com/calc5.html - lets you estimate your monthly expenses.


    For this blog I'm assuming you want to set up a very basic web server, and I will talk about the services involved in that setup.  While I may go into some detail on how to set up a service, the main goal of this blog is to get you familiar with the cloud and how its components are used.


    Basic services for a first-time AWS user:
    Before you get started, go to https://console.aws.amazon.com/ec2/home; this is the AWS Management Console, where most of the basics can be set up.


    From Computing:
       EC2 - If you know what a VM image is, then you get the idea of how these work.  It's basically a preconfigured VM image with an OS on it.  When you create an instance you get the option of choosing a base image to start with.  The AWS standard images are Windows or RedHat, but I'm a big fan of Ubuntu so I used the image (ami-55739e3c).  In the Ubuntu EC2 Starters Guide, https://help.ubuntu.com/community/EC2StartersGuide , you can find a list of the most up-to-date images.  After you select your image and move on, the console will ask you to select a security group, which is like a firewall for your instance (later you will need to use this to open port 80).  Finally (if you are using a Linux image) it will ask you to download a *.pem file (an SSH private key).  If you are using PuTTY, then read http://docs.amazonwebservices.com/AmazonEC2/gsg/2006-06-26/putty.html for instructions on how to import a .pem key.  If you used an Ubuntu image, the initial username is ubuntu and you will have to sudo for root-level permissions (it's much safer that way anyway); the RedHat images default to using root.  I would create a new user right away, add it to the admin group, and set up a new SSH key for that user (the Ubuntu wiki has instructions on how to do this via the command line).
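       For example, connecting to an Ubuntu instance and adding a new sudo-capable user looks roughly like this (the key file, hostname, username, and group name are placeholders; the group that grants sudo may differ on your image):

          # Connect using the downloaded private key (placeholder names)
          ssh -i my-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

          # On the instance: create a new user and give it sudo rights
          sudo adduser myadmin
          sudo usermod -a -G admin myadmin   # 'admin' was the sudo group on older Ubuntu images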


    From Networking:
          Elastic IP - This one is pretty easy.  You create a public IP address which points to your EC2 instance, which is on a private IP (something like 10.##.##.##).  The AWS Management Console has a UI for pointing this external IP to your instance name.  This external IP address is what you would point your DNS A record at.


    From Storage:
    An EC2 instance's base drive volume is returned to the base snapshot's state any time you terminate your instance (think of this as shutting down your machine).  IMO, dealing with this is one of the more confusing parts of getting started with your AWS cloud.  AWS gives us two storage services to solve this problem: you can snapshot your instance (after setting it up) to S3 so that the next time you start up your instance your setup is preserved, and you can use EBS volumes, which are like SAN volumes, for data that changes often (e.g. a database's data files).
    So, for example, if I was installing PostgreSQL I would use a setup like this (a shell sketch follows the list):
    1. First I would install Postgres on my EC2 instance.
    2. Then I would attach a new EBS volume to the EC2 instance (create the EBS volume in the AWS console, format it, mount it, and add it to fstab).
    3. Then move /opt/postgres/data to /newEBSvol/postgres/data.
    4. Then create a symlink that points /opt/postgres/data to /newEBSvol/postgres/data.
    5. Then take a new snapshot of the instance using the AWS tools and store it as a private snapshot in my S3 bucket.
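    Steps 2 through 4 look roughly like this on the instance (the device name /dev/sdf, the mount point, the filesystem, and the Postgres paths/service name are all assumptions; adjust them for your setup):

       # Format and mount the new EBS volume (attached as /dev/sdf in this example)
       sudo mkfs -t ext3 /dev/sdf
       sudo mkdir -p /newEBSvol
       sudo mount /dev/sdf /newEBSvol
       echo '/dev/sdf /newEBSvol ext3 defaults 0 0' | sudo tee -a /etc/fstab

       # Move the Postgres data onto the volume and symlink the old location to it
       sudo /etc/init.d/postgresql stop
       sudo mkdir -p /newEBSvol/postgres
       sudo mv /opt/postgres/data /newEBSvol/postgres/data   # assumes this data path
       sudo ln -s /newEBSvol/postgres/data /opt/postgres/data
       sudo /etc/init.d/postgresql start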
          EBS - These are persistent blocks of storage that you preallocate to some size.  Each one performs about as well as a SATA hard drive.  To get a large drive you could just allocate a single 1TB EBS volume, or you could lump many smaller ones together using software RAID.  The RAID option would give you faster IO, but since it's software RAID it will cost you some CPU, and the RAID'ed disks will use more transfer, so the monthly cost will be a little higher.  For trying it out, I would suggest using a single large volume :-).  Just do a search for "EC2 mounting EBS" and you will find plenty of guides on setting up an EBS volume for your instance.  Note that Amazon doesn't guarantee that EBS volumes will never experience data loss, but the chance of data loss is very small, so I suggest you set up a nightly backup to S3, which is designed for durability.


          S3 - Slower but extremely cheap storage.  It's basically like an FTP site that you can keep any data on.  I use it for backing up all my home computers (I back up to it using JetS3t), and doing a full nightly backup of my SVN server costs about $0.90 a month. :-)  Instead of volumes you have what are called buckets; each bucket can be made public or private and is given a unique URL for access over the web.  I currently have my web server set up to keep images, PDFs, etc. (i.e. non-dynamic data) there, so that I reduce the IO load on my EC2 Tomcat server.  That required setting up special code to manage it and would be a more advanced topic, so for a basic setup the only thing we need S3 for is taking snapshots and doing backups.  Here is a guide for taking a snapshot of your instance: http://www.philchen.com/2009/05/19/how-to-save-a-snapshot-of-your-amazon-ec2-instance and here is a guide for using S3 for backup: http://jeremy.zawodny.com/blog/archives/007641.html .  I use JetS3t since it's easy to use in a cron job: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=617
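          The cron side of that is nothing special; assuming the JetS3t upload is wrapped in a script (backup-to-s3.sh is a hypothetical name and path), a nightly entry looks like this:

             # Hypothetical crontab entry: run the backup script every night at 2:30 AM
             # and append its output to a log.  backup-to-s3.sh wraps the JetS3t upload.
             30 2 * * * /home/myuser/bin/backup-to-s3.sh >> /var/log/s3-backup.log 2>&1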


    NOTE: As I was typing this up I found out that as of December you can now boot your entire instance from an EBS volume, which means you don't have to take snapshots every time you change anything.  I haven't tried this yet, but you can read about it here: http://ec2-downloads.s3.amazonaws.com/BootFromEBSGSGGuide.pdf.  This would be a big help if it works out.


    Advanced Topics for a future blog:



    Queuing services - A SOAP-based queue you set up to pass messages between your EC2 instances.  Useful if you plan on setting up a cluster of machines that work together.  For example: if you wanted to hand off file conversion work to a different EC2 instance than your web server, the conversion server could read filenames from the in_queue and send the converted files back via the out_queue.


    Map Reduce - Used to get many machines working in parallel on a task and then recombine the work when they are all done.  For example: Hadoop Sort uses this concept to sort large amounts of data very fast using many machines.


    SimpleDB - A SOAP-based database which can be accessed directly by your clients (because it's SOAP) and is very fast.


    CloudFront - A service for edging your network traffic, meaning your data can travel internationally on Amazon's faster backbone and pop out locally.  For example: if someone in Hong Kong loads a site (which uses CloudFront), instead of getting to the US via the public network, they hit Amazon's access point in Hong Kong and get to the servers via Amazon's network.


    Amazon RDS - This is a shared-disk MySQL cluster that allows you to put nodes online and offline on the fly.  You can see my blog on shared-disk vs. shared-nothing clustering if you want to know more.
    Synergy


    Synergy - http://synergy2.sourceforge.net/ :  This is a very cool program that lets you control more than one PC from one keyboard and mouse; in my case, my laptop and my desktop.  I permanently have the Synergy server running on my home desktop, so that when I bring my laptop home all I have to do is start up the Synergy client and I can control my laptop from my desktop's keyboard and mouse (I have an old monitor arm to hold my laptop up next to the PC's monitor).  Simply moving the mouse to the right-hand side of my desktop's monitor transfers the mouse pointer onto my laptop's screen and causes keyboard input to be routed to the second machine.  You can't move windows between the two PCs, but you can cut-and-paste between them.  Here is a video showing you how to use it: http://cnettv.cnet.com/using-synergy/9742-1_53-50003392.html .  Oh, and BTW, my laptop is running Linux and my desktop is running Vista.  NOTE: there is an annoying setting on the Synergy server which changes the client's screensaver settings to match the server's settings (it's on by default).  This got me in trouble at work, because it kept turning off my "password on resume" setting and my work had a security program to check for this.  Eventually I figured out it was Synergy and stopped getting emails from my manager :-).
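    For what it's worth, the screen layout described above (laptop sitting to the right of the desktop) maps to a Synergy server config along these lines; the screen names here are placeholders, this is from memory of the 1.x synergy.conf text format, and on Windows the GUI exposes the same screens/links idea:

       section: screens
          desktop:
          laptop:
       end

       section: links
          desktop:
             right = laptop
          laptop:
             left = desktop
       end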