LebGeeks

A community for technology geeks in Lebanon.

#1 May 12 2011

rolf
Member

Lebgeek text content is not compressed

Yesterday I was randomly checking websites to see whether they compress their text content, and I noticed that Lebgeeks does not. A lot of sites compress their content, it is supported by modern browsers, and some hosts even enable it automatically. It can reduce the size of text files (HTML/CSS/JS...) by 50% or more. That may sound like a lot, but it is not that dramatic in practice, since most bandwidth goes to image transfers, and even more so on lebgeeks.com, which seems to have rather light HTML/CSS/JS source.
Still, as a matter of principle, compression sounds like a good idea, and one worth trying. It should be easy to enable: Apache's mod_deflate can be turned on from .htaccess with a couple of lines, and it can also be done from PHP. There is also mod_gzip, but it involves a bit more tinkering. If anyone wants to know more about this topic, I can share my findings here.
This does sound like a topic about lebgeeks.com, but please also consider it a thread about HTTP content/transfer compression in general.
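
For example, the mod_deflate route really is just a couple of lines in .htaccess (a sketch that assumes the host loads mod_deflate; the MIME type list is illustrative):

    <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/html text/css application/javascript
    </IfModule>

From PHP, wrapping output with ob_start('ob_gzhandler') achieves roughly the same thing for script-generated pages.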

Offline

#2 May 12 2011

arithma
Member

Re: Lebgeek text content is not compressed

The downside of web page compression is CPU load. If the CPU of this website's server becomes the bottleneck, then compression will become the reason the website is slow. I wonder if there's an adaptive compression module.
Lebgeeks uses nginx, not Apache.

Offline

#3 May 12 2011

Joe
Member

Re: Lebgeek text content is not compressed

It does seem like a very interesting idea, although I should say that lebgeeks mainly deals with text (well, markup, but that's mainly text). Implementing such compression could introduce problems (load, compatibility, ...) for very small gain.

I may be wrong; maybe some compression algorithms aimed at text could prove very successful. Time for preliminary benchmarks :)

Offline

#4 May 12 2011

rolf
Member

Re: Lebgeek text content is not compressed

rahmu wrote:

It does seem like a very interesting idea, although I should say that lebgeeks mainly deals with text (well, markup, but that's mainly text). Implementing such compression could introduce problems (load, compatibility, ...) for very small gain.

Not really. As I mentioned, it is supported by modern browsers (which list "gzip" in the "Accept-Encoding" request header), it is standards-compliant, and it is used extensively; google.com uses it, for example. If the browser does not support it (i.e. gzip is not listed in Accept-Encoding), the client is served plain text as a fallback.
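
To illustrate, the negotiation looks roughly like this on the wire (a hand-written sketch, trimmed to the relevant headers):

    GET /forums/ HTTP/1.1
    Host: www.lebgeeks.com
    Accept-Encoding: gzip, deflate

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=iso-8859-1
    Content-Encoding: gzip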

Offline

#5 May 12 2011

proners
Member

Re: Lebgeek text content is not compressed

arithma wrote:

The downside of web page compression is CPU load. If the CPU of this website's server becomes the bottleneck, then compression will become the reason the website is slow. I wonder if there's an adaptive compression module.
Lebgeeks uses nginx, not Apache.

rahmu wrote:

for very small gain

Nope, it is quite the opposite. Transferring files over the network is much more costly; you can save more than 60% of the transfer time/size. For example, the latest version of jQuery is 33 KB minified and gzipped, while the uncompressed minified version is 90 KB.
It is highly recommended by all best-practice guides to enable compression.

Last edited by proners (May 12 2011)

Offline

#6 May 12 2011

rolf
Member

Re: Lebgeek text content is not compressed

I must look stupid now... it seems lebgeeks.com is compressed after all. I was misled by one of those "test your website's compression" pages. I still don't know of a reliable and easy way to test for compression. Of course, I could try to install Wireshark...

Last edited by rolf (May 12 2011)

Offline

#7 May 12 2011

proners
Member

Re: Lebgeek text content is not compressed

rolf wrote:

I must look stupid now... it seems lebgeeks.com is compressed after all. I was misled by one of those "test your website's compression" pages. I still don't know of a reliable and easy way to test for compression. Of course, I could try to install Wireshark...

Nope, not totally.
A YSlow analysis of this page gave the following result: 4 uncompressed components, and an E score on "compress components with gzip":

    http://www.lebgeeks.com/forums/viewtopic.php?...
    http://www.lebgeeks.com/forums/style/lebgeeksv2.css
    http://www.lebgeeks.com/forums/style/imports/base.css
    http://www.lebgeeks.com/forums/style/imports/lebgeeksv2_cs.css

Last edited by proners (May 12 2011)

Offline

#8 May 13 2011

arithma
Member

Re: Lebgeek text content is not compressed

proners wrote:

Nope, it is quite the opposite. Transferring files over the network is much more costly; you can save more than 60% of the transfer time/size. For example, the latest version of jQuery is 33 KB minified and gzipped, while the uncompressed minified version is 90 KB.
It is highly recommended by all best-practice guides to enable compression.

You're thinking one-dimensionally.
Compression is not always a win, since there are two components at play: CPU share (which can be very small on shared hosting, sometimes 1/64 of an average dedicated server's CPU) and the network.
It's not so simple, yet it's not that difficult to understand.
CPU and network are like the necks of two beer bottles connected in series. On a small server, the CPU bottleneck can be very problematic: even though you're saving on the network side, you'd have to wait (sometimes longer) on the CPU. The worst of it is that the CPU gets so busy that we'd start getting 500 "server busy" errors off the bat. Those "server is busy" responses are usually CPU bottleneck issues rather than congested networks.
Note: I'm not saying it doesn't work; I'm saying it's worth measuring.
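
As a first measurement, the per-page CPU cost is easy to time from PHP itself (a rough sketch; 'sample.html' stands in for a saved copy of a forum page):

    <?php
    // Rough micro-benchmark: how much CPU time does gzipping one page cost?
    $page = file_get_contents('sample.html'); // hypothetical saved forum page
    $start = microtime(true);
    for ($i = 0; $i < 1000; $i++) {
        $gz = gzencode($page, 6); // mid-range compression level
    }
    $perPage = (microtime(true) - $start) / 1000;
    printf("plain: %d bytes, gzipped: %d bytes, %.3f ms per page\n",
           strlen($page), strlen($gz), $perPage * 1000);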

Last edited by arithma (May 13 2011)

Offline

#9 May 13 2011

proners
Member

Re: Lebgeek text content is not compressed

If you are on shitty shared hosting (which most shared hosting is), performance shouldn't be your concern, because it will be fair at best.
Even so, compression is still a good option in that environment because of how TCP works: longer transmissions mean longer CPU wait cycles, and don't forget the network latency and the ARP packets.
So at worst you wouldn't want to compress frequently changing dynamic content, but at the very least you would want to compress all the static JS and CSS files.
Although I don't have numbers, I don't think that compression is more costly than a single fairly complex database operation.
Compression should be enabled by default, and if it turns out to be the bottleneck (which I doubt), it can be deactivated for dynamic content, or it is worth considering better hosting.

If compression is the bottleneck, IMO your application is no more complex than a fairly simple HTML page, lol.

Last edited by proners (May 13 2011)

Offline

#10 May 13 2011

MSD
Member

Re: Lebgeek text content is not compressed

proners wrote:

longer transmissions mean longer CPU wait cycles

Can you elaborate on that?

proners wrote:

you wouldn't want to compress frequently changing dynamic content

AFAIK this (frequently changing content) is only relevant to caching, not to compression; am I missing something?

Last edited by MSD (May 13 2011)

Offline

#11 May 13 2011

proners
Member

Re: Lebgeek text content is not compressed

proners wrote:

at worst you wouldn't want to compress frequently changing dynamic content

I meant that if you have a CPU bottleneck, you can skip compressing dynamic content, but you should still compress static components.

longer transmissions mean longer CPU wait cycles

Well, for example, one aspect of this issue is that the number of connections a server can accept is limited, so if the number of connections becomes a bottleneck, you would want to speed up transfers to release resources on the server and allow new connections in.

Look man, when you have a system where generating a dynamic page takes 500 ms, I doubt that 5 ms more will matter.
If you are hitting a CPU bottleneck, you should look at your queries first.

Offline

#12 May 13 2011

CSGeek
Member

Re: Lebgeek text content is not compressed

@arithma, we're in mid-2011 and you're still debating CPU load?

That sounds unprofessional, because compression is a must. This is the "how can we reduce bandwidth consumption" era.

Offline

#13 May 13 2011

rolf
Member

Re: Lebgeek text content is not compressed

arithma wrote:

You're thinking one-dimensionally.
Compression is not always a win, since there are two components at play: CPU share (which can be very small on shared hosting, sometimes 1/64 of an average dedicated server's CPU) and the network.
It's not so simple, yet it's not that difficult to understand.
CPU and network are like the necks of two beer bottles connected in series. On a small server, the CPU bottleneck can be very problematic: even though you're saving on the network side, you'd have to wait (sometimes longer) on the CPU. The worst of it is that the CPU gets so busy that we'd start getting 500 "server busy" errors off the bat. Those "server is busy" responses are usually CPU bottleneck issues rather than congested networks.
Note: I'm not saying it doesn't work; I'm saying it's worth measuring.

I just want to say that HTML, being so verbose and repetitive, can be compressed pretty easily and quickly.
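
A toy PHP check of that claim (the repeated row just stands in for real forum markup):

    <?php
    // Toy demonstration: repetitive markup compresses extremely well.
    $html = str_repeat("<tr><td class=\"post\">Hello, LebGeeks!</td></tr>\n", 500);
    $gz   = gzencode($html, 2);
    printf("plain: %d bytes, gzipped: %d bytes (%.1f%% of original)\n",
           strlen($html), strlen($gz), 100 * strlen($gz) / strlen($html));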

Offline

#14 May 14 2011

arithma
Member

Re: Lebgeek text content is not compressed

CSGeek wrote:

@arithma, we're in mid-2011 and you're still debating CPU load?

That sounds unprofessional, because compression is a must. This is the "how can we reduce bandwidth consumption" era.

With virtualization, and mobiles (to take the more general case), CPU load is an important factor.

Try this: stream a file through a script, then serve it directly off the hard drive, and enjoy the difference in performance. I am trying to research a hunch, but I couldn't find any concrete references about DMA that web servers could be using (from hard disk to memory) to serve static files.

If you are on shitty shared hosting (which most shared hosting is), performance shouldn't be your concern, because it will be fair at best.

This is a very obtuse and strange statement. On whatever hardware is available, I should deliver peak performance, whether to a client or for my own software. What I am suggesting is measuring, because it is worth measuring; I am not claiming to be right or wrong. It's not too difficult to do either, so the potential cost savings may justify the research (at least as a write-up by someone for everyone else, since then not everyone has to repeat it).

Even so, compression is still a good option in that environment because of how TCP works: longer transmissions mean longer CPU wait cycles, and don't forget the network latency and the ARP packets.

I am talking about lebgeeks in particular. We have a peak time, since most of our visitors are from the same area and finish work at almost the same time. samer may be able to give us some figures here.

Although I don't have numbers, I don't think that compression is more costly than a single fairly complex database operation.

I am not sure about this yet, but it could be that reading directly from the hard disk and sending to the network is a great offload from the CPU. That would explain why scripts streaming files usually perform much worse than serving the files directly from the server's disk.

So at worst you wouldn't want to compress frequently changing dynamic content, but at the very least you would want to compress all the static JS and CSS files.

That is of course assuming the web server knows how to cache the compressed output it produces.

Although I don't have numbers, I don't think that compression is more costly than a single fairly complex database operation.

Database servers are usually distinct from the web servers, especially in a shared environment, so your argument is moot.

and don't forget the network latency and the ARP packets

I can fold those into a baseline network latency that will always exist, no matter what happens. But if that is so, why should I care about it? I guess we don't.

compression is still a good option in that environment because of how TCP works: longer transmissions mean longer CPU wait cycles, and don't forget the network latency and the ARP packets.

Usually, servers stream script output (that's why in PHP, for example, you'd want to echo directly rather than accumulate everything in a string). So, almost always, by the time the script has finished executing, the client has already been receiving content for a while. That is not possible with dynamic compression done through output buffering, which means the CPU is a little busier with each request and the user starts receiving the response a little later. Which one finishes later is exactly what we're arguing about. I'm saying it's worth benchmarking, since there are just a lot of variables, some of which we're not even familiar with or can't easily reason about (concurrent client loads and their interplay, the threshold number of concurrent requests before the server starts queuing them, whether the particular web server uses DMA to send files from disk to network without CPU intervention).
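
In PHP terms, the two behaviors being contrasted look something like this (a sketch; generate_page_chunks() is a hypothetical function that returns the page in pieces):

    <?php
    // Streamed: the client starts receiving bytes as soon as they're produced.
    function render_streamed() {
        foreach (generate_page_chunks() as $chunk) { // hypothetical generator
            echo $chunk;
            flush(); // push output to the client immediately
        }
    }

    // Buffered + gzipped: nothing leaves the server until the buffer is
    // flushed, so time-to-first-byte grows while total bytes shrink.
    function render_compressed() {
        ob_start('ob_gzhandler');
        foreach (generate_page_chunks() as $chunk) {
            echo $chunk;
        }
        ob_end_flush(); // compress the whole buffer and send it
    }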

If compression is the bottleneck, IMO your application is no more complex than a fairly simple HTML page, lol.

Your argument is that an HTML page that a lot of people use and that is dynamically generated is not worth the attention? Something is wrong somewhere, either in your argument or in your thinking process.

Note: I do understand that compression ought to be enabled by default. However, not everyone can afford decent server technology (which starts at around $100 a month). I can't accept the argument that if you're feeling limited by your budget, then you should just spend more money. It's illogical.

Offline

#15 May 14 2011

J4D
Member

Re: Lebgeek text content is not compressed

Lebgeeks is fine. No need for that.

Offline

#16 May 14 2011

Georges
Member

Re: Lebgeek text content is not compressed

J4D wrote:

Lebgeeks is fine. No need for that.

I can browse the forums at 2 KB/s. If the content were compressed to half the size, it would be as if that speed had doubled.

Sounds useful to me here in Lebanon.
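
To put numbers on it, using proners' jQuery figures above: at 2 KB/s, the 90 KB uncompressed file needs about 45 seconds to arrive, while the 33 KB gzipped one needs about 16 seconds. The link speed doesn't change; the page simply needs far fewer bytes to get here.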

Offline

#17 May 14 2011

XhacK
Member

Re: Lebgeek text content is not compressed

Server: nginx
Date: Sat, 14 May 2011 14:06:03 GMT
Content-Type: text/html; charset=iso-8859-1
Expires: Thu, 21 Jul 1977 07:30:00 GMT
Last-Modified: Sat, 14 May 2011 14:06:03 GMT
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Transfer-Encoding: chunked
Connection: Keep-Alive
Content-Encoding: gzip

200 OK

Offline

#18 May 14 2011

arithma
Member

Re: Lebgeek text content is not compressed

It's funny. But anyway, if you profile this particular page on a 256k DSL connection, the "waiting" time on the server is on the same order as the "receiving" period; it ranges from half of it to about equal to it.
I believe that counts as a significant figure.

Offline

#19 May 14 2011

samer
Admin

Re: Lebgeek text content is not compressed

Gzip is enabled (with a compression level of 2).
You can check the response header in Wireshark.
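
For the curious, the relevant directives look something like this (a sketch of a typical nginx gzip block, not necessarily the exact lebgeeks.com configuration):

    gzip            on;
    gzip_comp_level 2;    # the level mentioned above
    gzip_types      text/css application/x-javascript;   # text/html is always compressed
    gzip_vary       on;   # emit "Vary: Accept-Encoding" for intermediate caches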

Offline

#20 May 15 2011

proners
Member

Re: Lebgeek text content is not compressed

arithma wrote:

With virtualization, and mobiles (to take the more general case), CPU load is an important factor.

We are not talking about running a web server off a mobile device, lol.

Try this: stream a file through a script, then serve it directly off the hard drive, and enjoy the difference in performance. I am trying to research a hunch, but I couldn't find any concrete references about DMA that web servers could be using (from hard disk to memory) to serve static files.

Web servers cache commonly requested static files in RAM, so you gzip once and serve thousands of requests.
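
nginx can even take that a step further with its optional static-gzip module, compressing files once at deploy time instead of per request (a sketch; the module is not compiled in by default):

    gzip_static on;   # if foo.css.gz exists next to foo.css, serve it as-is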

This is a very obtuse and strange statement. On whatever hardware is available, I should deliver peak performance, whether to a client or for my own software.

So you want to spend a few hours researching this matter;
therefore either your ego is blinding you to the simple truth,
or you are a philanthropist and you won't bill these extra hours to your client... which I would have to commend you for *cough*

That is of course assuming the web server knows how to cache the compressed output it produces.

Of course it does; web servers have different handlers for different types of files.

Database servers are usually distinct from the web servers, especially in a shared environment, so your argument is moot.

no comment

Your argument is that an HTML page that a lot of people use and that is dynamically generated is not worth the attention? Something is wrong somewhere, either in your argument or in your thinking process.

I guess you missed my point here. If compression is "the" bottleneck of your application, then your application is fairly simple.

Note: I do understand that compression ought to be enabled by default. However, not everyone can afford decent server technology (which starts at around $100 a month)

no comment

I can't accept the argument that if you're feeling limited by your budget, then you should just spend more money. It's illogical.

This is a topic where a universal consensus has been reached.
I can't accept the argument that one should reinvent the wheel just because he feels the urge to do so.

samer wrote:

Gzip is enabled (with a compression level of 2).
You can check the response header in Wireshark.

So YSlow is being dumb; anybody fancy filing a bug?

Offline

#21 May 15 2011

XhacK
Member

Re: Lebgeek text content is not compressed

samer wrote:

Gzip is enabled (with a compression level of 2).
You can check the response header in Wireshark.

Or the Web Developer FF extension. :-)

Offline

#22 May 15 2011

proners
Member

Re: Lebgeek text content is not compressed

@samer
I have inspected this matter using the Live HTTP Headers Firefox add-on.
Here is a sample request from google:
requestfromgoogle.png
Now here is the request for this page:
requestfromlebgeeks.png

No gzip encoding is present...?

Offline

#23 May 15 2011

samer
Admin

Re: Lebgeek text content is not compressed

Here's what I get:
Screen_shot_2011-05-15_at_6.57.41_PM.png

Also, it is set in the nginx configuration file.

Offline

#24 May 15 2011

proners
Member

Re: Lebgeek text content is not compressed

samer wrote:

Here's what I get:
http://cl.ly/6nJG/Screen_shot_2011-05-1 … .41_PM.png

Also, it is set in the nginx configuration file.

Can you post the GET request for this page? Maybe therein lies the answer to why some people are getting the pages uncompressed.

Offline

#25 May 15 2011

rolf
Member

Re: Lebgeek text content is not compressed

Maybe the server omits the Content-Encoding header and the browser automatically sniffs the content as gzip. Or maybe it is only compressing static content (CSS, etc.), or only some files... Or, on the contrary, maybe it's declaring content as gzip but sending it as cleartext... Any abnormality like these could produce conflicting reports like the ones we saw. Maybe it is also serving compressed data only to some browsers.
A crude but reliable way to check compression is to peek inside the captured packets for a text page in Wireshark. If you look at the data section of a packet, you should see some HTML if it's not compressed, or a bunch of seemingly random characters and symbols if it is. Make sure you're not looking at the packets of an image file, because those are always compressed!
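
A less crude option: explicitly ask for gzip and inspect the response headers, which even a short PHP script can do (a sketch pointed at the forum index):

    <?php
    // Request the page while advertising gzip support, then check whether
    // the server actually answered with a gzipped body.
    $context = stream_context_create([
        'http' => ['header' => "Accept-Encoding: gzip\r\n"],
    ]);
    file_get_contents('http://www.lebgeeks.com/forums/', false, $context);

    // PHP fills $http_response_header after an http:// file_get_contents().
    foreach ($http_response_header as $header) {
        if (stripos($header, 'Content-Encoding:') === 0) {
            echo $header, "\n"; // "Content-Encoding: gzip" means it's on
        }
    }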

Last edited by rolf (May 15 2011)

Offline
