HtmlCompressor and Guava – LimitInputStream ClassNotFoundException in Java when using Google Closure Compiler

So I started off my week by upgrading my organization’s code base to the latest available libraries to make it more efficient. Little did I know that HtmlCompressor would make it a hard day for me. Here is what I was doing :-

I upgraded all the libraries, including the HtmlCompressor library to 2.4.8 and the Guava library to version 23. The build itself went through seamlessly. But as soon as I ran the code, I was greeted with a runtime exception.
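The exact stack trace will depend on your setup, but the error pointed at the missing Guava class, along these lines :-

```
java.lang.ClassNotFoundException: com.google.common.io.LimitInputStream
```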

HtmlCompressor provides two built-in implementations of JavaScript compressors :-
1. YUI Compressor
2. Google Closure Compiler

I was using the Google Closure Compiler.
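For context, this is roughly how the Closure-based compressor gets selected (a minimal sketch; the class name and sample HTML are illustrative):

```java
import com.googlecode.htmlcompressor.compressor.ClosureJavaScriptCompressor;
import com.googlecode.htmlcompressor.compressor.HtmlCompressor;

public class CompressorExample {
    public static void main(String[] args) {
        HtmlCompressor compressor = new HtmlCompressor();
        // Enable JavaScript compression and pick the Closure-based
        // implementation instead of the default YUI one
        compressor.setCompressJavaScript(true);
        compressor.setJavaScriptCompressor(new ClosureJavaScriptCompressor());

        String html = "<html> <body> <script>var a = 1;</script> </body> </html>";
        System.out.println(compressor.compress(html));
    }
}
```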

After scratching my head for some time, I found out this was an issue with HtmlCompressor itself.
HtmlCompressor uses a class called ClosureJavaScriptCompressor, which is a wrapper over the Google Closure Compiler.

This class internally uses the LimitInputStream class, which is a part of the Guava library. Guava version 14 deprecated the LimitInputStream class, and version 15 removed it entirely. More information at Deprecated LimitInputStream.
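As an aside, the replacement Guava points to is ByteStreams.limit(). That does not help HtmlCompressor’s pre-compiled reference, but it is what your own code should use instead (a minimal sketch; the input stream here is illustrative):

```java
import com.google.common.io.ByteStreams;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LimitExample {
    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[100]);
        // Since Guava 14, ByteStreams.limit() is the replacement for
        // the removed LimitInputStream class
        InputStream limited = ByteStreams.limit(source, 10);
        System.out.println(limited.read()); // reads at most 10 bytes in total
    }
}
```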

So, if you have a later version of the Guava library on your classpath, it takes the place of the older one, LimitInputStream is no longer present, and the Closure-based compressor throws the above exception.

This surfaces at runtime rather than at compile time because the reference to LimitInputStream sits inside the pre-compiled HtmlCompressor jar; your own code compiles fine, and the missing class is only discovered when the JVM tries to load it.

How to correct this?

There are two ways to correct this :-

Way I:
Stick with an older Guava library. The LimitInputStream class is available in Guava versions <= 14 (for example, 14.0.1), though it is marked as deprecated in version 14 and was removed in version 15.

Way II:
Use your own implementation of Compressor. HtmlCompressor makes provision for exactly this scenario, as shown in the sketch below.

The only requirement is that your custom class must implement the Compressor interface provided by HtmlCompressor.
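Here is a minimal sketch of such a custom implementation; the class name and the pass-through body are placeholders for whatever minifier you plug in:

```java
import com.googlecode.htmlcompressor.compressor.Compressor;
import com.googlecode.htmlcompressor.compressor.HtmlCompressor;

// Hypothetical custom JavaScript compressor; the Compressor interface
// expects a single compress() method.
public class MyJavaScriptCompressor implements Compressor {

    @Override
    public String compress(String source) {
        // Placeholder pass-through: a real implementation would delegate
        // to the JavaScript minifier of your choice.
        return source;
    }

    public static void main(String[] args) {
        HtmlCompressor compressor = new HtmlCompressor();
        compressor.setCompressJavaScript(true);
        // Swap in the custom implementation instead of ClosureJavaScriptCompressor
        compressor.setJavaScriptCompressor(new MyJavaScriptCompressor());
        System.out.println(compressor.compress("<html> <script>var a = 1;</script> </html>"));
    }
}
```

Since the custom class never touches LimitInputStream, the missing Guava class is no longer an issue.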

You can read more about HtmlCompressor and tune it to suit your needs.

This is how I did it. Please put your questions and suggestions in the comments section, and I will try to answer them or incorporate them into this post.

How to make a part of the page non-indexable for the Google crawler and bot

We have all faced a situation where we want a page to be indexed but need to keep some parts of it non-indexable. This is common for content that is consumed from a third party.

For example, TripAdvisor provides user reviews and other content through its APIs. Any website can buy access and start showing TripAdvisor’s content on its own pages. But, as per basic SEO principles, this leads to content duplication and may result in the website being penalized. TripAdvisor published the content first on its own pages, so it will never be the one penalized.

So, how do we go about making sure such content is not indexed by the crawler, but is still visible to users?

Google provides the googleon and googleoff tags. They are written as follows :-
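The tags take the form of HTML comments; the index mode shown below is the most commonly used one:

```html
<!--googleoff: index-->
<!--googleon: index-->
```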

All you need to do is put the ‘googleoff’ tag, place the content you don’t want indexed after it, and then put the ‘googleon’ tag to make the crawler resume indexing. An example is :-
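A minimal sketch, with placeholder review markup standing in for the third-party content:

```html
<p>Our own description of the hotel. This text is indexed normally.</p>

<!--googleoff: index-->
<div class="third-party-reviews">
  <!-- Syndicated content, e.g. TripAdvisor reviews, goes here -->
  <p>"Great stay, would book again!" - A traveller</p>
</div>
<!--googleon: index-->

<p>Indexing resumes from this point onwards.</p>
```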

More about other such lesser-known tweaks at :-

Google Guidelines