Transcription
version 2.6
by
Marliza Ramly
Zurina Saaya
Wahidah Md Shah
Mohammad Radzi Motsidi
Haniza Nahar
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka (UTeM)
May 2007
Copyright © 2007 Fakulti Teknologi Maklumat dan Komunikasi, UTeM
TABLE OF CONTENTS

1. PROXY SERVERS ...................................................... 1
   1.2  KEY FEATURES OF PROXY SERVERS ................................ 2
        Proxy Servers and Caching .................................... 2
2. INTERNET CACHING ................................................... 4
   2.1  HIERARCHICAL CACHING ......................................... 4
   2.2  TERMINOLOGY FOR HIERARCHICAL CACHING ........................ 5
   2.3  INTERNET CACHE PROTOCOL ...................................... 7
   2.4  BASIC NEIGHBOUR SELECTION PROCESS ............................ 7
3. INTRODUCTION TO SQUID ............................................. 9
   3.1  HARDWARE AND SOFTWARE REQUIREMENT .......................... 10
   3.2  DIRECTORY STRUCTURE ......................................... 11
   3.3  GETTING AND INSTALLING SQUID ................................ 11
        Custom Configuration for Network ............................ 11
   3.4  INSTALL SQUID ................................................ 16
   3.5  BASIC SQUID CONFIGURATION ................................... 17
        Configure SQUID .............................................. 17
        Basic Configuration ......................................... 17
        Starting Squid Daemon ....................................... 20
   3.6  BASIC CLIENT SOFTWARE CONFIGURATION ........................ 22
        Configuring Internet Browser ................................ 22
        Using proxy.pac File ........................................ 23
4. ACL CONFIGURATION ................................................ 25
   4.1  ACCESS CONTROLS ............................................. 25
        List of ACL types ........................................... 26
        src ......................................................... 27
        srcdomain ................................................... 28
        dst ......................................................... 29
        dstdomain ................................................... 29
        srcdom_regex ................................................ 30
        dstdom_regex ................................................ 30
        time ........................................................ 31
        url_regex ................................................... 32
        urlpath_regex ............................................... 33
        port ........................................................ 34
        proto ....................................................... 35
        method ...................................................... 36
        browser ..................................................... 36
        proxy_auth .................................................. 37
        maxconn ..................................................... 38
        Create custom error page .................................... 39
   4.2  EXERCISES ................................................... 40
5. CACHING .......................................................... 42
   5.1  CONCEPTS .................................................... 42
   5.2  CONFIGURING A CACHE FOR PROXY SERVER ....................... 42
6. SQUID AND WEBMIN ................................................. 47
   6.1  ABOUT WEBMIN ................................................ 47
   6.2  OBTAINING AND INSTALLING WEBMIN ............................ 47
        Installing from a tar.gz .................................... 48
        Installing from an RPM ...................................... 48
        After Installation .......................................... 49
   6.3  USING SQUID IN WEBMIN ....................................... 49
   6.4  PORTS AND NETWORKING ........................................ 50
        Proxy port .................................................. 51
        ICP port .................................................... 51
        Incoming TCP address ........................................ 51
        Outgoing TCP address ........................................ 52
        Incoming UDP address ........................................ 52
        Outgoing UDP address ........................................ 52
        Multicast groups ............................................ 52
        TCP receive buffer .......................................... 53
   6.5  OTHER CACHES ................................................ 53
        Internet Cache Protocol ..................................... 53
        Parent and Sibling Relationships ............................ 54
        When to Use ICP? ............................................ 54
   6.6  OTHER PROXY CACHE SERVERS ................................... 55
        Edit Cache Host ............................................. 56
        Hostname .................................................... 56
        Type ........................................................ 57
        Proxy port .................................................. 57
        ICP port .................................................... 57
        Proxy only? ................................................. 58
        Send ICP queries? ........................................... 58
        Default cache ............................................... 58
        Round-robin cache? .......................................... 58
        ICP time-to-live ............................................ 59
        Cache weighting ............................................. 59
        Closest only ................................................ 59
        No digest? .................................................. 59
        No delay? ................................................... 60
        Login to proxy .............................................. 60
        Multicast responder ......................................... 60
        Query host for domains, Don’t query for domains ............ 60
        Cache Selection Options ..................................... 61
        Directly fetch URLs containing .............................. 61
        ICP query timeout ........................................... 62
        Multicast ICP timeout ....................................... 62
        Dead peer timeout ........................................... 62
        Memory Usage ................................................ 63
        Memory usage limit .......................................... 63
        Maximum cached object size .................................. 64
   6.7  LOGGING ..................................................... 64
        Cache metadata file ......................................... 65
        Use HTTPD log format ........................................ 65
        Log full hostnames .......................................... 66
        Logging netmask ............................................. 66
   6.8  CACHE OPTIONS ............................................... 67
   6.9  ACCESS CONTROL .............................................. 68
        Access Control Lists ........................................ 69
        Edit an ACL ................................................. 69
        Creating new ACL ............................................ 70
        Available ACL Types ......................................... 71
   6.10 ADMINISTRATIVE OPTIONS ...................................... 75
7. ANALYZER ......................................................... 78
   7.1  STRUCTURE OF LOG FILE ....................................... 78
        Access log .................................................. 78
        Cache log ................................................... 90
        Store log ................................................... 93
   7.2  METHODS ..................................................... 96
        Log Analysis Using Grep Command ............................. 96
        Log Analysis Using Sarg-2.2.3.1 ............................. 96
   7.3  SETUP SARG-2.2.3.1 .......................................... 97
   7.4  REPORT MANAGEMENT USING WEBMIN ............................. 98
   7.5  LOG ANALYSIS AND STATISTIC ................................. 105
ABBREVIATIONS

Abbreviation   Details
ACL            Access Control List
CARP           Cache Array Routing Protocol
CD             Compact Disc
DNS            Domain Name System
FTP            File Transfer Protocol
GB             Gigabyte
HTCP           Hyper Text Caching Protocol
HTTP           Hypertext Transfer Protocol
I/O            Input/Output
ICP            Internet Cache Protocol
IP             Internet Protocol
LAN            Local Area Network
MAC            Media Access Control
MB             Megabyte
RAM            Random Access Memory
RPM            Red Hat Package Manager
RTT            Round Trip Time
SNMP           Simple Network Management Protocol
SSL            Secure Socket Layer
UDP            User Datagram Protocol
URL            Uniform Resource Locator
UTeM           Universiti Teknikal Malaysia Melaka
WCCP           Web Cache Coordination Protocol
Chapter 1

1. Proxy Servers

A proxy server is an intermediary server between the Internet browser
and the remote server. It acts as a "middleman" between the two ends
of the client/server network connection, and it works with browsers,
servers, and other applications by supporting underlying network
protocols such as HTTP. Furthermore, it stores downloaded documents
in its local cache, so that subsequent downloads are faster because
the document is served from a local server. For example, imagine a
user downloading a document through the Internet browser from a
specific URL such as http://www.yahoo.com; the document is then
transferred to the workstation (e.g. from the UTeM proxy to a local
workstation). In that situation, the Internet browser communicates
directly with the UTeM proxy server to get the document.

In addition, a cache is combined with the proxy server, which makes
transfers quicker and more reliable. The Internet browser no longer
contacts the remote server directly; instead, it requests documents
from the proxy server.
1.2 Key features of proxy servers

Four main functions provided are:
• Firewalling and filtering (security)
• Connection sharing
• Administrative control
• Caching service
Proxy Servers and Caching

A proxy server that caches Web pages can noticeably improve QoS in a
network, as in Figure 1-1. The benefits can be described in three ways:
• Caching preserves bandwidth on the network and improves scalability
• Response time is enhanced (e.g. an HTTP proxy cache can load Web
  pages into the browser more quickly)
• Proxy server caches boost availability: Web pages or other files in
  the cache remain accessible even if the original source or an
  intermediate network link goes offline.
Figure 1-1: Generic Diagram for Proxy Server (several clients connect
through a proxy server to the Internet)
Chapter 2

2. Internet Caching

2.1 Hierarchical Caching

Cache hierarchies are a logical extension of the caching concept.
Sharing caches can benefit both a group of Web caches and a group of
Web clients. Figure 2-1 shows how this works. However, there are some
disadvantages as well; whether the advantages outweigh the
disadvantages depends on the specific situation, as discussed below.
Figure 2-1: Proxy Server Caching Process
1. The client browser initiates a request to the proxy server for the
   URL; the proxy server checks whether the requested page is in its
   cache.
2. If not, the proxy server requests the page from the web server.
3. The web server returns the requested URL to the proxy server.
4. The proxy server caches the returned page.
5. The proxy server returns the requested page to the client (directly
   from the cache if the page was already there).
The major advantages are:
• Additional cache hits. In general, some requests that would miss in
  a single cache will be hits at the neighbour caches.
• Request routing. Routing requests to specific caches makes it
  possible to direct HTTP traffic along a certain path. For example,
  when accessing the Internet over two paths, one cheap and the other
  expensive, request routing lets the user send HTTP traffic over the
  cheaper link.

The disadvantages of the concept are:
• Configuration hassles. Coordination from both parties is required
  to configure neighbour caches, which adds to the burden of
  membership.
• Additional delay for cache misses. Many factors contribute to the
  delay, for example delays between peers, link congestion, and
  whether or not ICP is used.
2.2 Terminology for Hierarchical Caching

Cache
A cache refers to an HTTP proxy that stores some requested objects.

Objects
An object is a generic term for any document, image, or other type of
data available on the Web. Nowadays, Uniform Resource Locators (URLs)
identify objects (such as images, audio, video, and binary files)
rather than only documents or pages, drawn from the data available on
HTTP, FTP, Gopher, and other types of servers.
Hits and misses
A cache hit occurs when a valid copy of the requested object exists in
the cache. If the object does not exist, or is no longer valid, it is
a cache miss. In that situation, the cache must forward the request
toward the origin server.

Origin Server
The origin server is the authoritative source for an object. For
example, the origin server is the host named in the URL.

Hierarchy vs. Mesh
Caches are arranged hierarchically when the topology is like a tree
structure, or in a mesh when the structure is flat. In either case
these terms simply refer to the fact that caches can be "connected"
to each other. In Squid, this can be seen in the cache directory
after creating it.

Neighbours, Peers, Parents, Siblings
In general, the terms neighbour and peer are interchangeable for
caches in a hierarchy or mesh, while parent and sibling refer to the
relationship between a pair of caches.

Fresh, Stale, Refresh
The status of cached objects can be:
• Fresh: a cache hit is returnable.
• Stale: the object must be refreshed; Squid refreshes it by including
  an IMS (If-Modified-Since) request header and forwarding the request
  on toward the origin server.
2.3 Internet Cache Protocol

The Internet Cache Protocol (ICP) offers a quick and efficient method
of inter-cache communication, and provides a mechanism for
establishing complex cache hierarchies. The advantages of using it
are:
• ICP can be utilized by Squid to provide an indication of network
  conditions.
• ICP messages are transmitted as UDP packets, which makes ICP easy
  to implement because each cache needs to maintain only a single UDP
  socket.

ICP has some disadvantages as well. One failure mode is that when
links are highly congested, ICP becomes useless exactly where caching
is needed most. Furthermore, the transmission time of the UDP packets
adds extra delay to processing a request; in some situations this
delay makes ICP inappropriate.
2.4 Basic Neighbour Selection Process

Before describing Squid's features for hierarchical caching, let us
briefly explain the neighbour selection process. When Squid is unable
to satisfy a request from its cache, it must decide where to forward
the request. There are basically three choices:
• a parent cache
• a sibling cache
• the origin server
How does ICP help Squid decide?
• Squid sends an ICP query message containing the requested URL to
  its parent and sibling neighbours, usually in UDP packets, and
  remembers how many queries it sent for a given request.
• Each neighbour receiving the ICP query searches for the URL in its
  own cache. If a valid copy of the URL exists, the cache sends an
  ICP_HIT message; otherwise, an ICP_MISS.
• The querying cache then collects the ICP replies from its peers.
• If the cache receives an ICP_HIT reply from a peer, it immediately
  forwards the HTTP request to that peer.
• If the cache does not receive an ICP_HIT reply, then all replies
  will be ICP_MISS.
• Squid waits until it receives all replies, up to two seconds.
• If one of the ICP_MISS replies comes from a parent, Squid forwards
  the request to the parent whose reply was the first to arrive. We
  call this reply the FIRST_PARENT_MISS. If there is no ICP_MISS from
  a parent cache, Squid forwards the request to the origin server.

We have described the basic algorithm, to which Squid offers numerous
possible modifications, including mechanisms to:
• Send ICP queries to some neighbours and not to others.
• Include the origin server in the ICP "pinging", so that if the
  origin server's reply arrives before any ICP_HITs, the request is
  forwarded there directly.
• Disallow or require the use of some peers for certain requests.
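The neighbour relationships described above are declared in squid.conf with cache_peer lines. A minimal sketch (the hostnames here are hypothetical, used only for illustration):

```
# Format: cache_peer <host> <type> <http_port> <icp_port> [options]
# A parent cache: misses may be forwarded here (FIRST_PARENT_MISS).
cache_peer parent.example.edu.my  parent  3128 3130
# A sibling cache: fetched from only when it returns ICP_HIT.
cache_peer sibling.example.edu.my sibling 3128 3130
# How long Squid waits for ICP replies (milliseconds).
icp_query_timeout 2000
```

Port 3128 is the peer's HTTP port and 3130 its ICP port, matching the defaults used elsewhere in this book.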
Chapter 3

3. Introduction to Squid

Squid is a high-performance proxy caching server for Web clients,
supporting FTP, Gopher, and HTTP data objects. It has two basic
purposes:
• to provide proxy service for machines that must pass Internet
  traffic through some form of masquerading firewall
• caching

Unlike traditional caching software, Squid handles all requests in a
single, non-blocking, I/O-driven process. Squid keeps metadata and
especially hot objects cached in RAM, caches DNS lookups, supports
non-blocking DNS lookups, and implements negative caching of failed
requests.

Squid consists of a main server program, a Domain Name System lookup
program (dnsserver), a program for retrieving FTP data (ftpget), and
some management and client tools.

In other words, Squid is:
1. a full-featured Web proxy cache
2. free, open-source software
3. the result of many contributions by unpaid (and paid) volunteers
Squid Support

Squid supports:
• proxying and caching of Hypertext Transfer Protocol (HTTP), File
  Transfer Protocol (FTP), and other Uniform Resource Locators (URLs)
• proxying for Secure Socket Layer (SSL)
• cache hierarchies
• Internet Cache Protocol (ICP), Hyper Text Caching Protocol (HTCP),
  Cache Array Routing Protocol (CARP), and Cache Digests
• transparent caching
• Web Cache Coordination Protocol (WCCP) (Squid v2.3 and above)
• extensive access controls
• HTTP server acceleration
• Simple Network Management Protocol (SNMP)
• caching of DNS lookups
3.1 Hardware and Software Requirement

• RAM: minimum recommended 128 MB (scales with user count and size of
  disk cache)
• Disk: small user count, 512 MB to 1 GB; large user count, 16 GB to
  24 GB
• Operating system: most versions of UNIX. Also works on AIX, Digital
  UNIX, FreeBSD, HP-UX, IRIX, Linux, NetBSD, NeXTStep, SCO, Solaris,
  and SunOS
3.2 Directory Structure

Squid normally creates a few directories, shown in Table 3-1.

Directory      Explanation
/var/cache     Stores the actual cached data
/etc/squid     Contains squid.conf, the only Squid configuration file
/var/log       Logs each connection (check this directory if it is
               getting large)

Table 3-1: Squid Directories
3.3 Getting and installing squid

Custom Configuration for Network

There are three configurations for a proxy server in the network; the
configuration file will follow the requirements of your network. They
are transparent proxy, reverse proxy, and web cache proxy.
Configuring squid for transparency

Figure 3-1: Transparent Proxy (LAN clients reach the Internet through
a transparent proxy server, e.g. 10.1.1.1, which intercepts port 80
traffic)
A transparent proxy (Figure 3-1) is configured when you want to grab
a certain type of traffic at your gateway or router and send it
through a proxy without the knowledge of the user or client. In other
words, the router forwards all traffic to port 80 to the proxy
machine using a route policy.

Using Squid as a transparent proxy involves two parts:
1. Squid must be configured properly to accept non-proxy requests
2. web traffic must be redirected to the Squid port
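As a sketch of those two parts (the interface name and ports here are assumptions for illustration; the redirect rule must run as root on the gateway):

```
# Part 1 - squid.conf: accept intercepted (non-proxy) requests
http_port 3128 transparent

# Part 2 - gateway: redirect outbound web traffic to the Squid port
# (eth0 is the assumed LAN-facing interface)
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
         -j REDIRECT --to-port 3128
```

The `transparent` option is the Squid 2.6 syntax; without it, Squid rejects requests that were not explicitly sent to a proxy.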
This type of transparent proxy is suitable for:
• Intercepting network traffic transparently to the browser
• Simplified administration: the browser does not need to be
  configured to talk to a cache
• Central control: the user cannot change the browser to bypass the
  cache

The disadvantages of using this type of proxy are:
• Browser dependency: transparent proxying does not work very well
  with certain web browsers
• User control: transparent caching takes control away from the
  user, who may change ISPs to avoid it
Configuring squid for reverse proxy

Figure 3-2: Reverse Proxy (a client on the Internet reaches a web
server cluster through a reverse proxy server)
A reverse proxy (also known as Web Server Acceleration) (Figure 3-2)
is a method of reducing the load on a busy web server by placing a
web cache between the server and the Internet.

In this case, when a client browser makes a request, DNS routes the
request to the reverse proxy (not the actual web server). The reverse
proxy then checks its cache to find out whether the requested content
is available to fulfil the client request. If not, it contacts the
real web server and downloads the requested content to its disk
cache.

Benefits that can be gained are:
1. improved security
2. improved scalability, without increasing the complexity of
   maintenance too much
3. a lighter burden on a web server that provides both static and
   dynamic content: the static content can be cached on the reverse
   proxy, while the web server is freed up to better handle the
   dynamic content.

To run Squid as an accelerator, you probably want to listen on port
80. Hence, you have to define the machine you are accelerating for
(not covered in this chapter).
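Although a full accelerator setup is beyond this chapter's scope, a minimal Squid 2.6 sketch gives the idea (the backend address 10.0.5.20 is a hypothetical web server, not from this book):

```
# Listen on port 80 as an accelerator (reverse proxy)
http_port 80 accel vhost
# The web server being accelerated; fetch cache misses from it
# directly, without ICP queries
cache_peer 10.0.5.20 parent 80 0 no-query originserver
```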
Configuring squid for Web Cache proxy

Figure 3-3: Web Cache Proxy (clients reach the Internet through
routers and a web cache proxy server)
By default, Squid is configured as a direct proxy (Figure 3-3). In
order to cache web traffic with Squid, the browser must be configured
to use the Squid proxy. This requires the following information:
• the proxy server's IP address
• the port number on which the proxy server accepts connections
3.4 Install squid

The Squid proxy caching server software package comes with Fedora
Core 6, so we do not have to install it; just edit the configuration
file to make it work.

If Squid is not installed on your server, you can install it from the
Squid RPM file. To do so, download the RPM file from the Internet or
copy it from the installation CD, then run this command:

# rpm -i squid-2.6.STABLE4-1.fc6.i386.rpm

NOTE: The RPM file name may differ depending on the version of Squid
you have downloaded.

Alternatively, you can install it from the Squid source distribution,
which can be downloaded from the official Squid proxy server web
site, http://www.squid-cache.org. To do so, copy the installation
folder onto your local drive and run the following commands:

# ./configure
# make
# make install

NOTE: Make sure all the dependency files are already installed on
your machine before starting to install Squid.
3.5 Basic Squid Configuration

Configure SQUID

All Squid configuration files are kept in the directory /etc/squid.
The following paragraphs work through the options that may need
changes to get Squid running. Most people will not need to change all
of these settings, but at least one part of the configuration file
usually does need changing: the default squid.conf denies access to
all browsers. If you don't change this, Squid will not be very
useful.

Basic Configuration

All of Squid's configuration goes in one file: squid.conf. This
section details the configuration of Squid as a caching proxy only,
not as an HTTP accelerator.

Some basic configuration needs to be done. First, uncomment and edit
the following lines in the configuration file, found by default at
/etc/squid/squid.conf.

To configure the Squid server, do the following tasks:
1. log in as root to the machine
2. type the following command:

# vi /etc/squid/squid.conf

The above command opens the Squid configuration file for editing.
Then, set the port on which Squid listens. Normally, Squid listens on
port 3128. While it is convenient to listen on this port, network
administrators often configure the proxy to listen on port 8080 as
well. These are non-well-known ports (ports below 1024 are well-known
ports and are restricted from use by ordinary user processes), and
are therefore not going to conflict with ports such as 80, 443, 22,
23, etc. Squid need not be restricted to one port; it can easily be
started on two or more ports.

In the squid.conf file, find the following directive and change it,
or leave it as the default if its port is 3128:

http_port

Check:
http_port 3128 (the default)
or
http_port 8080 3128 (for multiple ports)
Additionally, if you have multiple network cards in your proxy server
and would like to restrict the proxy to listen on port 8080 on the
first network card and port 3128 on the second, you can add the
following line:

http_port 10.1.5.49:8080 10.0.5.50:3128
http_access
By default, http_access is denied. The Access Control List (ACL)
rules should be modified to allow access only to trusted clients.
This is important because it prevents people from stealing your
network resources. ACLs will be discussed in Chapter 4.
cache_dir
This directive specifies the cache directory storage format and its
size, as given below:

cache_dir ufs /var/spool/squid 100 16 256

The value 100 denotes a 100 MB cache size, which can be adjusted to
the required size; 16 and 256 denote the number of first- and
second-level subdirectories in the cache directory. (Caching will be
discussed later in Chapter 5.)

cache_effective_user
cache_effective_group
These directives set the user and group that Squid runs as (for
example, cache_effective_user squid).

NOTE: You can edit the squid.conf file using gedit instead of the
command line.
Starting Squid Daemon

In this section, we will learn how to start Squid. Make sure you have
finished editing the configuration file; then you can start Squid for
the first time.

First, check for errors in the conf file by typing this command at
your terminal:

# squid -k parse

If an error is detected, for example:

# squid -k parse
FATAL: could not determine fully qualified hostname. Please
set 'visible_hostname'
Squid Cache (version 2.6.STABLE4): Terminated abnormally.
CPU Usage: 0.0004 seconds = 0.0004 user + 0.000 sys
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Aborted.

Solution: add the following line to the squid.conf file:

visible_hostname localhost

If no error is detected, continue with the following command to start
Squid (this starts Squid temporarily, for the current session only):

# service squid start
If everything is working fine, then your console displays:

Starting squid: .                                            [OK]

If you want to stop the service:

# service squid stop

Then your console will display:

Stopping squid: .                                            [OK]

You should be a privileged user to start or stop Squid.
To make Squid start permanently at boot, try these commands:

# chkconfig --list
# chkconfig --level 5 squid on

You can restart the Squid service by typing:

# /etc/init.d/squid restart
While the daemon is running, there are several ways you can run the
squid command to change how the daemon works, using these options:

# squid -k reconfigure
- causes Squid to re-read its configuration file

# squid -k shutdown
- causes Squid to exit after waiting briefly for current connections
  to finish

# squid -k interrupt
- shuts down Squid immediately, without waiting for connections to
  close

# squid -k kill
- kills Squid immediately, without closing connections or log files
  (use this option only if other methods don't work)
3.6 Basic Client Software Configuration

Basic Configuration

To configure any browser, you need at least two pieces of
information:
• the proxy server's IP address
• the port number on which the proxy server accepts requests

Configuring Internet Browser

The following section explains the steps to configure the proxy
server in Internet Explorer, Mozilla Firefox, and Opera.

Internet Explorer 7.0
1. Select the Tools menu option
2. Select Internet Options
3. Click on the Connections tab
4. Select LAN settings
5. Check the "Use a proxy server for your LAN" box
6. Type the proxy IP address in the Address field, and the port
   number in the Port field.
Example:
Address: 10.0.5.10   Port: 3128
Mozilla Firefox
1. Click Tools → Options → Advanced
2. Click the Network tab → go to Connection → Settings
3. Under "Configure Proxies to Access the Internet", choose Manual
   proxy configuration
4. At HTTP Proxy, enter 10.0.5.10 and Port: 3128
5. Check the box to use the proxy server for all protocols
6. Then click OK
7. Now, the client can access the Internet.
Opera 9.1
1. Click Tools → Preferences → Advanced
2. Choose Network
3. Click Proxy Servers, then check and fill in:
   HTTP:   10.0.5.10   Port: 3128
   HTTPS:  10.0.5.10   Port: 3128
   FTP:    10.0.5.10   Port: 3128
   Gopher: 10.0.5.10   Port: 3128
4. Then click OK
Using proxy.pac File

This setting is for clients who want their browsers to pick up proxy
settings automatically. The browser can be configured with a simple
proxy.pac file, as shown in the example below:

function FindProxyForURL(url, host)
{
    if (isInNet(myIpAddress(), "10.0.5.0", "255.255.255.0"))
        return "PROXY 10.0.5.10:3128";
    else
        return "DIRECT";
}
proxy.pac needs to be installed on a web server such as Apache, and the client can then configure its proxy settings using the automatic configuration script. This script is useful when there is a possibility that the proxy server will change its IP address. To use the script, the client needs to add the URL of proxy.pac as its automatic proxy configuration script (Figure 3-4).
Figure 3-4: Using automatic configuration script
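Because proxy.pac is fetched over HTTP, the web server should deliver it with the PAC MIME type so browsers interpret it as a configuration script rather than plain text. A minimal sketch for Apache (the file location is illustrative, not taken from this guide):

```
# Hypothetical Apache fragment: serve .pac files with the PAC MIME type
AddType application/x-ns-proxy-autoconfig .pac
```

The browser is then pointed at the URL of proxy.pac on that web server as its automatic configuration script.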
Chapter 4

4. ACL Configuration
4.1 Access controls
Access control lists (ACLs) are the most important part of configuring Squid. Their main use is to implement simple access control: restricting people from using the cache infrastructure without permission. Rules can be written for almost any type of requirement, from very complex configurations for large organisations to simple ones for home users.
ACLs are written in the squid.conf file using the following format:
acl name type (string|"filename") [string2] ["filename2"]
name is a user-defined variable and should be descriptive, while type is one of the ACL types described in the next section.
There are two elements in access control: classes and operators. Classes are defined by acl lines, while the names of the operators vary. The most common operators are http_access and icp_access. The actions for these operators are allow and deny: allow enables access for the ACL, while deny restricts it.
General format for an operator:
http_access allow|deny [!]aclname [!]aclname2 ...
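The way ACL elements combine is worth spelling out: all ACLs named on a single http_access line are ANDed together, separate http_access lines are checked in order (effectively OR), and ! negates an element. A sketch with illustrative names and addresses:

```
acl lan src 10.0.5.0/24
acl office_hours time MTWHF 9:00-17:00
# ANDed on one line: deny only LAN clients *during* office hours
http_access deny lan office_hours
# Lines are checked in order: allow the LAN at all other times
http_access allow lan
# Negation: deny everyone who is NOT on the LAN
http_access deny !lan
```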
List of ACL types

ACL Type        Details
src             client IP address
srcdomain       client domain name
dst             destination IP address
dstdomain       destination domain name
srcdom_regex    regular expression matching the client domain name
dstdom_regex    regular expression matching the destination domain name
time            specify the time
url_regex       regular expression matching the whole URL of the destination (web server)
urlpath_regex   regular expression matching the URL path of the destination (not including its domain name)
port            specify the port number
proto           specify the protocol
method          specify the request method
browser         specify the browser
proxy_auth      user authentication via external processes
maxconn         specify the number of connections
src
Description
This ACL allows the server to recognize a client (the computer that uses the server as a proxy to access the Internet) by its IP address. The IP addresses can be listed as a single IP address, a range of IPs, or IP addresses defined in an external file.
Syntax
acl aclname src ip-address/netmask ...    (client's IP address)
acl aclname src addr1-addr2/netmask ...   (range of addresses)
acl aclname src "filename" ...            (client IP addresses in an external file)
Example 1
acl fullaccess src "/etc/squid/fullaccess.txt"
http_access allow fullaccess
This ACL uses an external file named fullaccess.txt, which contains a list of client IP addresses.
Example of fullaccess.txt
198.123.56.12
198.123.56.13
198.123.56.34
Example 2
acl office.net src 192.123.56.0/255.255.255.0
http_access allow office.net
This ACL sets the source addresses for office.net to the range 192.123.56.x and grants them Internet access with the http_access allow operator.
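A common pattern combines the file-based and subnet forms above with a closing deny rule, so anything not explicitly allowed is refused. A sketch using the same illustrative addresses:

```
acl fullaccess src "/etc/squid/fullaccess.txt"
acl office.net src 192.123.56.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow fullaccess
http_access allow office.net
# Last rule: refuse every client not matched above
http_access deny all
```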
srcdomain
Description
This ACL allows the server to recognize a client by its computer name. To do so, Squid must perform a reverse DNS lookup (from the client IP address to the client domain name) before the ACL is interpreted, which can cause processing delays.
Syntax
acl aclname srcdomain domain-name ...    (reverse lookup of the client IP)
Example 1
acl staff.net srcdomain staff20 staff21
http_access allow staff.net
This ACL is for clients with the computer names staff20 and staff21. The http_access operator allows the ACL named staff.net to access the Internet. This option is not very efficient, since the server must do a reverse name lookup to determine the source name.
NOTE: Please ensure the DNS server is running in order to use the DNS lookup service.
dst
Description
This is the same as src, except that it refers to the server's (destination's) IP address. Squid first performs a DNS lookup for the IP address of the domain name in the request header, and then interprets the ACL.
Syntax
acl aclname dst ip_address/netmask ...    (IP address of the URL host or site)
Example 1
acl tunnel dst 209.8.233.0/24
http_access deny tunnel
This ACL denies any destination with an IP address of 209.8.233.x.
Example 2
acl allow_ip dst 209.8.233.0-209.8.233.100/255.255.0.0
http_access allow allow_ip
This ACL allows destinations with IP addresses in the range 209.8.233.0 to 209.8.233.100.
dstdomain
Description
This ACL recognizes the destination by its domain. This is an effective method to control access to a specific domain.
Syntax
acl aclname dstdomain domain.com    (domain name from the site's URL)
Example 1
acl banned_domain dstdomain www.terrorist.com
http_access deny banned_domain
This ACL denies destinations with the domain www.terrorist.com. Note that to match a domain and all its subdomains, use a leading dot (for example, .terrorist.com).
srcdom_regex
Description
This ACL is similar to srcdomain in that the server must perform a reverse DNS lookup (from the client IP address to the client domain name) before the ACL is interpreted. The difference is that this ACL allows the use of a regular expression to define the client's domain.
Syntax
acl aclname srcdom_regex -i source_domain_regex
Example 1
acl staff.net srcdom_regex -i staff
http_access allow staff.net
This ACL allows all nodes whose domain contains the word staff to access the Internet. The -i option makes the expression case-insensitive.
dstdom_regex
Description
This ACL allows the server to recognize the destination using a regular expression on its domain name.
Syntax
acl aclname dstdom_regex -i dst_domain_regex
Example 1
acl banned_domain dstdom_regex -i terror porn
http_access deny banned_domain
This ACL denies clients access to destinations whose domain name contains the word terror or porn. For example, access to the domains www.terrorist.com and www.pornography.net will be denied by the proxy server.
time
Description
This ACL allows the server to control the service using a time function. Accessibility to the network can be set according to the time scheduled in the ACL.
Syntax
acl aclname time [day-abbrevs] h1:m1-h2:m2
where h1:m1 must be less than h2:m2 and day is represented using the abbreviations in Table 4-1.
Abbreviation    Day
S               Sunday
M               Monday
T               Tuesday
W               Wednesday
H               Thursday
F               Friday
A               Saturday

Table 4-1: Abbreviations for days
Example 1
acl SABTU time A 9:00-17:00
The ACL SABTU refers to Saturday from 9:00 to 17:00.
Example 2
acl pagi time 9:00-11:00
acl office.net src 10.2.3.0/24
http_access deny pagi office.net
pagi refers to the time from 9:00 to 11:00, while office.net refers to the clients' IP addresses. This combination of ACLs denies access for office.net between 9:00 am and 11:00 am.
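Time ACLs are typically combined with another ACL type on a single http_access line, since the elements on one line are ANDed together. For instance, to block a site only during office hours — a sketch with illustrative names:

```
acl office_hours time MTWHF 9:00-17:00
acl fun_sites dstdomain .games.example.com
# Denied only when both elements match: a listed site *and* office hours
http_access deny fun_sites office_hours
```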
url_regex
Description
url_regex searches the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive; to make them case-insensitive, use the -i option.
Syntax
acl aclname url_regex -i url_regex ...
Example 1
acl banned_url url_regex -i terror porn
http_access deny banned_url
This ACL denies URLs that contain the word terror or porn. For example, the following destinations will be denied by the proxy server:
http://www.google.com/pornography
http://www.news.com/terrorist.html
http://www.terror.com/
urlpath_regex
Description
urlpath_regex performs regular expression pattern matching on the URL, excluding the protocol and hostname. If the URL is http://www.free.com/latest/games/tetris.exe, this ACL type only looks at the part after http://www.free.com/; it leaves out the http protocol and the www.free.com hostname.
Syntax
acl aclname urlpath_regex pattern
Example 1
acl blocked_free urlpath_regex free
http_access deny blocked_free
This ACL blocks any URL path containing "free" (but not "Free"), without looking at the protocol and hostname. These regular expressions are case-sensitive; to make them case-insensitive, add the -i option.
Example 2
acl blocked_games urlpath_regex -i games
http_access deny blocked_games
blocked_games refers to URLs containing the word "games", whether spelled in upper or lower case.
Example 3
To block several URLs:
acl block_site urlpath_regex -i "/etc/squid/acl/block_site"
http_access deny block_site
To block several URLs, it is recommended to put the list in one file. As in Example 3, the block_site list is in the /etc/squid/acl/block_site file. The block_site file may contain, for example:
\.exe$
\.mp3$
port
Description
Access can be controlled by the destination (server) port number.
Syntax
acl aclname port port-number
Example 1
Deny requests to unknown ports:
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443 563   # https, snews
http_access deny !Safe_ports
Example 2
Deny access to several untrusted ports listed in an external file:
acl safeport port "/etc/squid/acl/safeport"
http_access deny safeport
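The external file used in Example 2 simply holds one port number (or range) per line. Hypothetical contents of /etc/squid/acl/safeport (the values are illustrative):

```
# One port or port range per line
70
210
280-289
1025-65535
```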
proto
Description
This specifies the transfer protocol.
Syntax
acl aclname proto protocol
Example 1
acl protocol proto HTTP FTP
This refers to the protocols HTTP and FTP.
Example 2
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager
Only allow cachemgr access from localhost.
Example 3
acl ftp proto FTP
http_access deny ftp
http_access allow all
This blocks every FTP request.
method
Description
This specifies the request method.
Syntax
acl aclname method method-type
Example 1
acl connect method CONNECT
http_access allow localhost
http_access allow allowed_clients
http_access deny connect
This denies the CONNECT method, preventing outside people from trying to tunnel connections through the proxy server.
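The stock squid.conf takes a subtler approach than denying CONNECT outright: it denies CONNECT to any port other than the SSL ports, so HTTPS still works while arbitrary tunnels do not. A sketch along those lines:

```
acl SSL_ports port 443 563
acl CONNECT method CONNECT
# Refuse tunnelling to anything except the recognised SSL ports
http_access deny CONNECT !SSL_ports
```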
browser
Description
This performs regular expression pattern matching on the request's User-Agent header. To log the User-Agent header information, add this line to squid.conf:
useragent_log /var/log/squid/useragent.log
Then run the Mozilla browser; its User-Agent header should look like the one matched in the example below.
Syntax
acl aclname browser pattern
Example 1
acl mozilla browser ^Mozilla/5\.0
http_access deny mozilla
This denies Mozilla browsers, or any other browser whose User-Agent string begins with Mozilla/5.0.
proxy_auth
Description
User authentication via external processes. proxy_auth requires an EXTERNAL authentication program to check username/password combinations. In this configuration, we use the NCSA authentication method because it is the easiest to implement.
Syntax
acl aclname proxy_auth username ...
Example 1
To validate a list of users, follow these steps.
Creating the passwd file
# touch /etc/squid/passwd
# chown root:squid /etc/squid/passwd
# chmod 640 /etc/squid/passwd
Adding users
# htpasswd /etc/squid/passwd shah
You will be prompted to enter a password for that user; in the example, this is the password for the user shah.
Setting the rules
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
These lines are already in the configuration file but need to be adjusted to suit your environment.
Authentication configuration
acl LOGIN proxy_auth REQUIRED
http_access allow LOGIN
This only allows users who have been authenticated to access the network connection.
CAUTION!! proxy_auth can't be used with a transparent proxy.
maxconn
Description
A limit on the maximum number of connections from a single client IP address. This ACL is true if the client has more than maxconn connections open.
Syntax
acl aclname maxconn number_of_connections
Example 1
acl someuser src 10.0.5.0/24
acl 5conn maxconn 5
http_access deny someuser 5conn
This restricts users in the 10.0.5.0/24 subnet to a maximum of five (5) connections at once. If the limit is exceeded, an error page appears. Other users are not affected, because the deny line matches only the someuser subnet.
CAUTION!! The maxconn ACL requires the client_db feature. If client_db is disabled (for example with client_db off), then maxconn ACLs will not work.
Create a custom error page
# vi /etc/squid/error/ERROR_MESSAGE
Append the following:
<HTML>
<HEAD>
<TITLE> ERROR : ACCESS DENIED FROM PROXY SERVER </TITLE>
</HEAD>
<BODY>
<H1> The site is blocked due to IT policy</H1>
<p> Please contact helpdesk for more information: </p>
Phone: 06-2333333 (ext 33) <br>
Email: helpdesk@utem.edu.my <br>
CAUTION!! Do not include the HTML closing tags </BODY></HTML>
Displaying the custom error message
acl blocked_port port 80
deny_info ERROR_MESSAGE blocked_port
http_access deny blocked_port
4.2 Exercises
1. Why can users still download files with the following configuration?
acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl GET method GET
acl it_user1 src 192.168.1.88
acl it_user2 src 192.168.1.89
acl nodownload1 src 192.168.1.10
acl nodownload2 src 192.168.1.11
http_access allow it_user1
http_access allow it_user2
http_access allow nodownload1
http_access allow nodownload2
http_access deny GET office_hours nodownload1 nodownload2
http_access deny all
http_access rules are evaluated in order, so the allow lines for nodownload1 and nodownload2 match before the deny rule is ever reached; those two allow lines should be deleted. Note also that ACLs on a single http_access line are ANDed, so the deny line only matches a request coming from both source addresses at once, which can never happen — nodownload1 and nodownload2 must be denied on separate lines or merged into one src ACL.
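One possible corrected rule set, with the allow lines removed and the two restricted addresses merged into a single ACL so the deny line can match (assuming the intent is to block .exe downloads during office hours for those two machines):

```
acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl nodownload src 192.168.1.10 192.168.1.11
# ANDed: an .exe request + office hours + one of the restricted machines
http_access deny download office_hours nodownload
http_access allow all
```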
2. Why does this configuration still allow access to game.free.com?
acl ban dstdomain free.com
http_access deny ban
3. The following access control configuration will never work. Why?
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU
Chapter 5

5. Caching
5.1 Concepts
• Caching is the process of storing copies of data on an intermediate system (the proxy server) between the Web server and the client.
• The proxy server can then serve the content requested by the client from its copy in the cache.
• The assumption is that later requests for the same data can be serviced more quickly by not having to go all the way back to the original server.
• Caching can also reduce demands on network resources and on the information servers.
5.2 Configuring a cache for the proxy server
There are many parameters related to caching in Squid; they can be divided into three main groups:
A. Cache size
B. Cache directories and log file path names
C. Peer cache servers and Squid hierarchy
However, in the following subsections, only the first two groups will be covered.
A. Cache Size
The following are the common cache size parameters.
i. cache_mem
Syntax
cache_mem size(MB)
This parameter specifies the amount of memory (RAM) used to store in-transit objects (those currently being transferred), hot objects (those used often) and negative-cached objects (recent failed requests). The default value is 8 MB.
Example:
cache_mem 16 MB
ii. maximum_object_size
Syntax
maximum_object_size size(MB)
This parameter prevents caching of objects larger than or equal to the size set. The default value is 4 MB.
Example:
maximum_object_size 8 MB
iii. ipcache_size
Syntax
ipcache_size number_of_entries
This parameter sets how many IP address resolution results Squid stores. The value is a number of entries, not a size in megabytes; the default is 1024.
Example:
ipcache_size 2048
iv. ipcache_high
Syntax
ipcache_high percentage
This parameter specifies the high-water mark (as a percentage of ipcache_size) at which Squid starts clearing out the least-used IP address resolutions. The default value is usually kept.
Example:
ipcache_high 95
v. ipcache_low
Syntax
ipcache_low percentage
This parameter specifies the low-water mark at which Squid stops clearing out the least-used IP address resolutions. The default value is usually kept.
Example:
ipcache_low 90
B. Cache Directories
i. cache_dir
Syntax
cache_dir type dir size(MB) L1 L2
This parameter specifies the directory (or directories) in which cache swap files are stored. The default dir is /var/spool/squid. We can specify how much disk space to use for the cache in megabytes (100 is the default); the default numbers of first-level directories (L1) and second-level directories (L2) are 16 and 256 respectively.
Example:
cache_dir aufs /var/cache01 7000 16 256
NOTE: /var/cache01 is a partition that was created during the Linux Fedora installation.
Formula to calculate the number of first-level directories (L1):
Given:
x = size of the cache dir in KB (e.g., 6 GB = 6,291,456 KB)
y = average object size (e.g., 13 KB)
z = objects per L2 directory (assume 256)
calculate L1 (the number of L1 directories) and L2 (the number of L2 directories) such that:
L1 x L2 = x / y / z
Example:
x = 6 GB = 6 * 1024 * 1024 = 6291456 KB
so:
x / y / z = 6291456 / 13 / 256 = 1890
and, with L2 = 256:
L1 * 256 = 1890
L1 ≈ 7
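The arithmetic above can be checked with shell integer arithmetic, using the example's figures:

```shell
# x: cache size in KB (6 GB), y: average object size in KB, z: objects per L2 dir
x=6291456; y=13; z=256
# Total number of L2 directories needed across the whole cache
total_l2=$(( x / y / z ))
echo "$total_l2"             # 1890
# Divide by 256 L2 directories per L1 directory to get L1
echo $(( total_l2 / 256 ))   # 7
```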
ii. access_log
Syntax
access_log dir
This parameter specifies the location where HTTP and ICP accesses are logged. The default, /var/log/squid/access.log, is usually used.
Example:
access_log /var/log/squid/access.log
Chapter 6

6. SQUID and Webmin
6.1 About Webmin
Webmin is a graphical user interface for Unix system administration. It is web-based and can be installed on most Unix systems. Webmin is free software, and the installation package can be downloaded from the Net. Webmin is largely written in Perl and runs as its own process and web server. It usually uses TCP port 10000 for communication, and can be configured to use SSL if OpenSSL is installed.
6.2 Obtaining and Installing Webmin
The Webmin installation package is available at the official Webmin site, http://www.webmin.com/download.html. You can download the latest package and place it on the local machine.
Installation of Webmin differs slightly depending on which type of
package you choose to install. Note that Webmin requires a relatively
recent Perl for any of these installation methods to work. Nearly all, if
not all, modern UNIX and UNIX-like OS variants now include Perl as a
standard component of the OS, so this should not be an issue.
Installing from a tar.gz
First you must untar and unzip the archive in the directory where you
would like Webmin to be installed. The most common location for
installation from tarballs is /usr/local. Some sites prefer /opt. If
you’re using GNU tar, you can do this all on one command line:
#tar zxvf webmin-1.340.tar.gz
If you have a less capable version of tar, you must unzip the file first
and then untar it:
# gunzip webmin-1.340.tar.gz
# tar xvf webmin-1.340.tar.gz
Next, you need to change to the directory that was created when you
untarred the archive, and execute the setup.sh script, as shown in
the following example. The script will ask several questions about your
system and your preferences for the installation. Generally, accepting
the default values will work. The command for installation as below:
# ./setup.sh
Installing from an RPM
Installing from an RPM is even easier. You only need to run one
command:
# rpm -Uvh webmin-1.340-1.noarch.rpm
This will copy all of the Webmin files to the appropriate locations and
run the install script with appropriate default values. For example, the
Webmin perl files will be installed in /usr/libexec/webmin while the
configuration files will end up in /etc/webmin. Webmin will then be
started on port 10000. You may log in using root as the login name
and your system root password as the password. It's unlikely you will
need to change any of these items from the command line, because
they can all be modified using Webmin. If you do need to make any
changes, you can do so in miniserv.conf in /etc/webmin.
After Installation
After installation, your Webmin installation will behave nearly identically regardless of operating system vendor or version, location of installation, or method of installation. The only apparent differences between systems will be that some have more or fewer modules, because some modules are specific to one OS. Others will feature slightly different versions of modules to take into account different behaviour of the underlying system. For example, the package manager module may behave differently, or be missing from the available options entirely, depending on your OS.
6.3 Using Squid in Webmin
To launch Webmin, open a web browser, such as Netscape or Mozilla Firefox, on any machine that has network access to the server you wish to log in to. Browse to port 10000 on the IP address or host name of the server using http://computername:10000/. Go to the Squid Proxy Server menu (in the Servers submenu) to open the main panel (Figure 6-1).

Figure 6-1: Squid Proxy Main Page
6.4 Ports and Networking
The Ports and Networking page lets you configure most of Squid's network-level options. Squid has a number of options to define which ports it operates on, which IP addresses it uses for client and intercache traffic, and multicast options. On dedicated caching systems these options will usually not be needed, but in some cases you may have to adjust them to prevent the Squid daemon from interfering with other services on the system or on your network.
Proxy port
Sets the network port on which Squid operates. This is usually 3128 by default and can almost always be left as is, except when multiple Squid instances run on the same system, which is usually ill-advised. This option corresponds to the http_port directive in squid.conf.
ICP port
This is the port on which Squid listens for Internet Cache Protocol, or ICP, messages. ICP is a protocol used by web caches to communicate and share data. Using ICP, it is possible for multiple web caches to share cached entries, so that if any one local cache has an object, the distant origin server will not have to be queried for it. Further, cache hierarchies can be constructed from multiple caches at multiple privately interconnected sites to provide improved hit rates and higher-quality web response for all sites. More on this in later sections. This option correlates to the icp_port directive.
Incoming TCP address
The address on which Squid opens an HTTP socket that listens for
client connections and connections from other caches. By default Squid
does not bind to any particular address and will answer on any address
that is active on the system. This option is not usually used, but can
provide some additional level of security, if you wish to disallow any
outside network users from proxying through your web cache. This
option correlates to the tcp_incoming_address directive.
Outgoing TCP address
Defines the address from which Squid sends out HTTP packets to clients and other caches. Again, this option is rarely used. It refers to the tcp_outgoing_address directive.
Incoming UDP address
Sets the address on which Squid will listen for ICP packets from other web caches. This option allows you to restrict which subnets may connect to your cache on a multi-homed Squid host (one containing multiple subnets). It correlates to the udp_incoming_address directive.
Outgoing UDP address
The address from which Squid will send out ICP packets to other web caches. This option correlates to the udp_outgoing_address directive.
Multicast groups
The multicast groups that Squid will join to receive multicast ICP
requests. This option should be used with great care, as it is used to
configure your Squid to listen for multicast ICP queries. Clearly if your
server is not on the MBone, this option is useless. And even if it is, this
may not be an ideal choice.
TCP receive buffer
The size of the buffer used for TCP packets being received. By default
Squid uses whatever the default buffer size for your operating system
is. This should probably not be changed unless you know what you’re
doing, and there is little to be gained by changing it in most cases.
This correlates to the tcp_recv_bufsize directive.
6.5 Other Caches
The Other Caches page provides an interface to one of Squid’s most
interesting, but also widely misunderstood, features. Squid is the
reference implementation of ICP, a simple but effective means for
multiple caches to communicate with each other regarding the content
that is available on each. This opens the door for many interesting
possibilities when one is designing a caching infrastructure.
Internet Cache Protocol
It is probably useful to discuss how ICP works and some common
usages for ICP within Squid, in order to quickly make it clear what it is
good for, and perhaps even more importantly, what it is not good for.
The most popular uses for ICP are discussed, and more good ideas will
probably arise in the future as the Internet becomes even more global
in scope and the web-caching infrastructure must grow with it.
Parent and Sibling Relationships
The ICP protocol specifies that a web cache can act as either a parent or a sibling. A parent cache is simply an ICP-capable cache that will answer both hits and misses for child caches, while a sibling will only answer hits for other siblings. This subtle distinction means that a parent cache can proxy for caches that have no direct route to the Internet. A sibling cache, on the other hand, cannot be relied upon to answer all requests, and your cache must have another method to retrieve requests that cannot come from the sibling. This usually means that in sibling relationships, your cache will also have a direct connection to the Internet or a parent proxy that can retrieve misses from the origin servers. ICP is a somewhat chatty protocol, in that an ICP request is sent to every neighbor cache each time a cache miss occurs. By default, whichever cache replies with an ICP hit first will be the cache used to request the object.
When to Use ICP?
ICP is often used in situations where one has multiple Internet connections, or several types of paths to Internet content. It is also possible, though usually not recommended, to implement a rudimentary form of load balancing through the use of multiple parents and multiple child web caches.
One of the common uses of ICP is cache meshes. A cache mesh is, in
short, a number of web caches at remote sites interconnected using
ICP. The web caches could be in different cities, or they could be in
different buildings of the same university or different floors in the same
office building. This type of hierarchy allows a large number of caches
to benefit from a larger client population than is directly available to it.
All other things being equal, a cache that is not overloaded will
perform better (with regard to hit ratio) with a larger number of
clients. Simply put, a larger client population leads to a higher quality
of cache content, which in turn leads to higher hit ratios and improved
bandwidth savings. So, whenever it is possible to increase the client
population without overloading the cache, such as in the case of a
cache mesh, it may be worth considering. Again, this type of hierarchy
can be improved upon by the use of Cache Digests, but ICP is usually
simpler to implement and is a widely supported standard, even on
non-Squid caches.
Finally, ICP is also sometimes used for load balancing multiple caches at the same site. ICP, or even Cache Digests for that matter, is almost never the best way to implement load balancing. Using ICP for load balancing can be achieved in a few ways:
• by having several local siblings, which can each provide hits to the others' clients, while the client load is evenly divided across the caches;
• by using a fast but low-capacity web cache in front of two or more lower-cost, but higher-capacity, parent web caches, which then serve the requests in roughly equal amounts.
6.6 Other Proxy Cache Servers
This section of the Other Caches page provides a list of currently
configured sibling and parent caches, and also allows one to add more
neighbor caches. Clicking the name of a neighbor cache will allow you
to edit it. This section also provides the vital information about the
neighbor caches, such as the type (parent, sibling, multicast), the
proxy or HTTP port, and the ICP or UDP port of the caches. Note that
55
SQUID and Webmin
Proxy port is the port where the neighbor cache normally listens for
client traffic, which defaults to 3128.
Edit Cache Host
Clicking a cache peer name, or clicking Add another cache on the primary Other Caches page, brings you to this page, which allows you to edit most of the relevant details about neighbor caches (Figure 6-2).
Figure 6-2: Create Cache Host page
Hostname
The name or IP address of the neighbor cache you want your cache to communicate with. Note that this will be one-way traffic. Access Control Lists, or ACLs, are used to allow ICP requests from other caches; ACLs are covered later. This option, like most of the rest of the options on this page, corresponds to cache_peer lines in squid.conf.
Type
The type of relationship you want your cache to have with the neighbor
cache. If the cache is upstream, and you have no control over it, you
will need to consult with the administrator to find out what kind of
relationship you should set up. If it is configured wrong, cache misses
will likely result in errors for your users. The options here are sibling,
parent, and multicast.
Proxy port
The port on which the neighbor cache is listening for standard HTTP
requests. Even though the caches transmit availability data via ICP,
actual web objects are still transmitted via HTTP on the port usually
used for standard client traffic. If your neighbor cache is a Squid-based
cache, then it is likely to be listening on the default port of 3128. Other
common ports used by cache servers include 8000, 8888, 8080, and
even 80 in some circumstances.
ICP port
The port on which the neighbor cache is configured to listen for ICP
traffic. If your neighbor cache is a Squid-based proxy, this value can
be found by checking the icp_port directive in the squid.conf file on
the neighbor cache. Generally, however, the neighbor cache will listen
on the default port 3130.
Proxy only?
A simple yes or no question to tell whether objects fetched from the
neighbor cache should be cached locally. This can be used when all
caches are operating well below their client capacity, but disk space is
at a premium or hit ratio is of prime importance.
Send ICP queries?
Tells your cache whether or not to send ICP queries to a neighbor. The default is Yes, and it should probably stay that way. ICP queries are the method by which Squid knows which caches are responding and which caches are closest or best able to quickly answer a request.
Default cache
This is switched to Yes if this neighbor cache is to be the last-resort
parent cache to be used in the event that no other neighbor cache is
present as determined by ICP queries. Note that this does not prevent
it from being used normally while other caches are responding as
expected. Also, if this neighbor is the sole parent proxy, and no other
route to the Internet exists, this should be enabled.
Round-robin cache?
Choose whether to use round-robin scheduling between multiple
parent caches in the absence of ICP queries. This should be set on all
parents that you would like to schedule in this way.
ICP time-to-live
Defines the multicast TTL for ICP packets. When using multicast ICP, it is usually wise, for security and bandwidth reasons, to use the minimum TTL suitable for your network.
Cache weighting
Sets the weight for a parent cache. When using this option it is
possible to set higher numbers for preferred caches. The default value
is 1, and if left unset for all parent caches, whichever cache responds
positively first to an ICP query will be sent a request to fetch that
object.
Closest only
Allows you to specify that your cache wants only CLOSEST_PARENT_MISS replies from parent caches. This allows your cache to request the object from the parent cache closest to the origin server.
No digest?
Chooses whether this neighbor cache should send cache digests.
No NetDB exchange?
When using ICP, it is possible for Squid to keep a database of network
information about the neighbor caches, including availability and RTT
(Round Trip Time) information. This usually allows Squid to choose more
wisely which caches to make requests to when multiple caches have the
requested object.
No delay?
Prevents accesses to this neighbor cache from affecting delay pools.
Delay pools, discussed in more detail later, are a means by which
Squid can regulate bandwidth usage. If a neighbor cache is on the
local network, and bandwidth usage between the caches does not need
to be restricted, then this option can be used.
Login to proxy
Select this if you need to send authentication information when
challenged by the neighbor cache. On local networks, this type of
security is unlikely to be necessary.
Multicast responder
Allows Squid to know where to accept multicast ICP replies. Because
multicast is fed on a single IP to many caches, Squid must have some
way of determining which caches to listen to and what options apply to
that particular cache. Selecting Yes here configures Squid to listen for
multicast replies from the IP of this neighbor cache.
Query host for domains, Don’t query for domains
These two options are the only options on this page to configure a
directive other than cache_peer in Squid. In this case it sets the
cache_peer_domain option. This allows you to configure whether
requests for certain domains can be queried via ICP and which should
not. It is often used to configure caches not to query other caches for
content within the local domain. Another common usage, such as in
the national web hierarchies discussed above, is to define which web
cache is used for requests destined for different TLDs. So, for example,
if one has a low cost satellite link to the U. S. backbone from another
country that is preferred for web traffic over the much more expensive
land line, one can configure the satellite-connected cache as the cache
to query for all .com, .edu, .org, .net, .us, and .gov addresses.
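That arrangement corresponds roughly to cache_peer_domain lines like the following (the peer name is hypothetical):

```
# Route requests for these TLDs through the satellite-connected cache
cache_peer satcache.example.net parent 3128 3130
cache_peer_domain satcache.example.net .com .edu .org .net .us .gov
```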
Cache Selection Options
This section provides configuration options for general ICP
configuration (Figure 6-3). These options affect all of the other
neighbor caches that you define.
Figure 6-3: Global ICP options
Directly fetch URLs containing
Allows you to configure a match list of items to always fetch directly
rather than query a neighbor cache. The default here is "cgi-bin ?" and
should continue to be included unless you know what you're doing.
This helps prevent wasting intercache bandwidth on lots of requests
that are usually never considered cacheable, and so will never return
hits from your neighbor caches. This option sets the
hierarchy_stoplist directive.
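The default corresponds to this single squid.conf line:

```
# Fetch dynamic (usually uncacheable) content directly
hierarchy_stoplist cgi-bin ?
```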
ICP query timeout
The time in milliseconds that Squid will wait before timing out ICP
requests. The default allows Squid to calculate an optimum value
based on average RTT of the neighbor caches. Usually, it is wise to
leave this unchanged. However, for reference, the default value in the
distant past was 2000, or 2 seconds. This option edits the
icp_query_timeout directive.
Multicast ICP timeout
Timeout in milliseconds for multicast probes, which are sent out to
discover the number of active multicast peers listening on a given
multicast address. This configures the mcast_icp_query_timeout
directive and defaults to 2000 ms, or 2 seconds.
Dead peer timeout
Controls how long Squid waits to declare a peer cache dead. If there
are no ICP replies received in this amount of time, Squid will declare
the peer dead and will not expect to receive any further ICP replies.
However, it continues to send ICP queries for the peer and will mark it
active again on receipt of a reply. This timeout also affects when Squid
expects to receive ICP replies from peers. If more than this number of
seconds has passed since the last ICP reply was received, Squid will
not expect to receive an ICP reply on the next query. Thus, if your
time between requests is greater than this timeout, your cache will
send more requests DIRECT rather than through the neighbor caches.
Memory Usage
This page provides access to most of the options available for
configuring the way Squid uses memory and disks (Figure 6-4). Most
values on this page can remain unchanged, except in very high load or
low resource environments, where tuning can make a measurable
difference in how well Squid performs.
Figure 6-4: Memory and disk usage
Memory usage limit
The limit on how much memory Squid will use for some parts of its
core data. Note that this does not restrict or limit Squid’s total process
size. What it does do is set aside a portion of RAM for use in storing
in-transit and hot objects, as well as negatively cached objects. Generally,
the default value of 8MB is suitable for most situations, though it is
safe to lower it to 4 or 2MB in extremely low load situations. It can
also be raised significantly on high-memory systems to increase
performance by a small margin. Keep in mind that large cache
directories increase the memory usage of Squid by a large amount,
and even a machine with a lot of memory can run out of memory and
go into swap if cache memory and disk size are not appropriately
balanced. This option edits the cache_mem directive. See the section on
cache directories for more complete discussion of balancing memory
and storage.
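In squid.conf this is a single directive; for example, the default:

```
# RAM set aside for in-transit, hot, and negatively cached objects
cache_mem 8 MB
```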
Maximum cached object size
The size of the largest object that Squid will attempt to cache. Objects
larger than this will never be written to disk for later use. Refers to the
maximum_object_size directive.
IP address cache size, IP cache high-water mark, IP cache low-water mark
The size of the cache used for IP addresses, and the high and low water
marks for that cache, respectively. This option configures the
ipcache_size, ipcache_high, and ipcache_low directives, which default
to 1024 entries, 95%, and 90%.
6.7 Logging
Squid provides a number of logs that can be used for debugging
problems, measuring the effectiveness of the cache, and identifying
users and the sites they visit (Figure 6-5). Because Squid can be used
to "snoop" on users' browsing habits, you should carefully consider
privacy laws in your region and, more importantly, be considerate to
your users. That being said, logs can be very valuable tools in ensuring
that your users get the best service possible from your cache.
Figure 6-5: Logging configuration
Cache metadata file
Filename used in each store directory to store the Web cache
metadata, which is a sort of index for the Web cache object store. This
is not a human readable log, and it is strongly recommended that you
leave it in its default location on each store directory, unless you really
know what you're doing. This option correlates to the cache_swap_log
directive.
Use HTTPD log format
Allows you to specify that Squid should write its access.log in HTTPD
common log file format, such as that used by Apache and many other
Web servers. This allows you to parse the log and generate reports
using a wider array of tools. However, this format does not provide
several types of information specific to caches, and is generally less
useful when tracking cache usage and solving problems. Because there
are several effective tools for parsing and generating reports from the
Squid standard access logs, it is usually preferable to leave this at its
default of being off. This option configures the emulate_httpd_log
directive. The Calamaris cache access log analyzer does not work if
this option is enabled.
Log full hostnames
Configures whether Squid will attempt to resolve the host name, so that
the fully qualified domain name can be logged. This can, in some
cases, increase latency of requests. This option correlates to the
log_fqdn directive.
Logging netmask
Defines what portion of the requesting client IP is logged in the
access.log. For privacy reasons it is often preferred to only log the
network or subnet IP of the client. For example, a netmask of
255.255.255.0 will log the first three octets of the IP, and fill the last
octet with a zero. This option configures the client_netmask directive.
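The masking described above is one directive; for example:

```
# Log only the first three octets of each client IP
client_netmask 255.255.255.0
```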
6.8 Cache Options
The Cache Options page provides access to some important parts of
the Squid configuration file. This is where the cache directories are
configured as well as several timeouts and object size options (Figure
6-6).
Figure 6-6: Configuring Squid's Cache Directories
The directive is cache_dir while the options are the type of filesystem,
the path to the cache directory, the size allotted to Squid, the number
of top level directories, and finally the number of second level
directories. In the example, I've chosen the filesystem type ufs, which
is a name for all standard UNIX filesystems. This type includes the
standard Linux ext2 filesystem as well. Other possibilities for this
option include aufs and diskd.
The next field is simply the space, in megabytes, of the disk that you
want to allow Squid to use. Finally, the directory fields define the upper
and lower level directories for Squid to use.
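A typical cache_dir line combining these fields (the path and size here are examples) might be:

```
# 1000 MB ufs store with 16 first-level and 256 second-level directories
cache_dir ufs /var/spool/squid 1000 16 256
```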
6.9 Access Control
There are three types of options for configuring access control.
These three types of definition are separated in the Webmin panel into
three sections. The first is labeled Access control lists, which lists
existing ACLs and provides a simple interface for generating and
editing lists of match criteria (Figure 6-7). The second is labeled Proxy
restrictions and lists the current restrictions in place and the ACLs they
affect. Finally, the ICP restrictions section lists the existing access rules
regarding ICP messages from other Web caches.
Figure 6-7: Access Control Lists
Access Control Lists
The first field in the table is the name of the ACL, which is simply an
assigned label that can be just about anything the user chooses. The
second field is the type of the ACL, which can be one of a number of
choices that indicate to Squid what part of a request should be matched
against for this ACL. The possible types include the requesting client's
address, the Web server address or host name, a regular expression
matching the URL, and many more. The final field is the actual string to
match. Depending on the ACL type, this may be an IP address, a series
of IP addresses, a URL, a host name, etc.
Edit an ACL
To edit an existing ACL, simply click on the highlighted name. You will
then be presented with a screen containing all relevant information
about the ACL. Depending on the type of the ACL, you will be shown
different data entry fields. The operation of each type is very similar,
so for this example, you'll step through editing of the localhost ACL.
Clicking the localhost button presents the page that's shown in Figure
6-8.
Figure 6-8: Edit an ACL
The title of the table is Client Address ACL, which means the ACL is of
the Client Address type and tells Squid to compare the incoming IP
address with the IP address in the ACL. It is possible to select an IP
based on the originating IP or the destination IP. The netmask can also
be used to indicate whether the ACL matches a whole network of
addresses, or only a single IP. It is possible to include a number of
addresses, or ranges of addresses, in these fields. Finally, the Failure
URL is the address to send clients to if they have been denied access
due to matching this particular ACL. Note that the ACL by itself does
nothing; there must also be a proxy restriction or ICP restriction rule
that uses the ACL before Squid will act on it.
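As a sketch, the classic pairing of ACLs with http_access rules in squid.conf looks like this:

```
# Define the match lists, then the rules that use them
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
http_access allow localhost
http_access deny all
```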
Creating a new ACL
Creating a new ACL is equally simple (Figure 6-9). From the ACL page,
in the Access control lists section, select the type of ACL you'd like to
create. Then click Create new ACL. From there, as shown, you can
enter any number of ACLs for the list.
Figure 6-9: Creating an ACL
Available ACL Types
Browser Regexp
A regular expression that matches the client's browser type based on
the user agent header. This allows for ACLs operating based on the
browser type in use. For example, using this ACL type, one could
create an ACL for Netscape users and another for Internet Explorer
users. This could then be used to redirect Netscape users to a
Navigator-enhanced page, and IE users to an Explorer-enhanced page.
Probably not the wisest use of an administrator's time, but it does
indicate the unmatched flexibility of Squid. This ACL type correlates to
the browser ACL type.
Client IP Address
The IP address of the requesting client. This option refers to the src
ACL in the Squid configuration file. An IP address and netmask are
expected. Address ranges are also accepted.
Client Hostname
Matches against the client domain name. This option correlates to the
srcdomain ACL, and can be either a single domain name, a list of
domain names, or the path to a file that contains a list of domain
names. If a path to a file is given, it must be surrounded by
parentheses. This ACL type can increase latency and decrease
throughput significantly on a loaded cache, as it must perform an
address-to-name lookup for each request, so it is usually preferable to
use the Client IP Address type.
Client Hostname Regexp
Matches against the client domain name. This option correlates to the
srcdom_regex ACL, and can be either a single domain name, a list of
domain names, or a path to a file that contains a list of domain names.
If a path to a file is given, it must be surrounded by parentheses.
Date and Time
This type is just what it sounds like, providing a means to create ACLs
that are active during certain times of the day or certain days of the
week. This feature is often used to block some types of content or
some sections of the Internet during business or class hours. Many
companies block pornography, entertainment, sports, and other clearly
non-work related sites during business hours, but then unblock them
after hours. This might improve workplace efficiency in some situations
(or it might just offend the employees). This ACL type allows you to
enter days of the week and a time range, or select all hours of the
selected days. This ACL type is the same as the time ACL type
directive.
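A hedged sketch of such a time-based rule in squid.conf (the blocked domain is hypothetical):

```
# Deny a leisure site Monday-Friday during business hours
acl worktime time MTWHF 9:00-17:00
acl fun dstdomain .games.example.com
http_access deny fun worktime
```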
Ethernet Address
The ethernet or MAC address of the requesting client. This option only
works for clients on the same local subnet, and only for certain
platforms. Linux, Solaris, and some BSD variants are the supported
operating systems for this type of ACL. This ACL can provide a
somewhat secure method of access control, because MAC addresses
are usually harder to spoof than IP addresses, and you can guarantee
that your clients are on the local network (otherwise no ARP resolution
can take place).
External Auth
This ACL type calls an external authenticator process to decide whether
the request will be allowed. Note that authentication cannot work on a
transparent proxy or HTTP accelerator. The HTTP protocol does not
provide for two authentication stages (one local and one on remote
Web sites). So in order to use an authenticator, your proxy must
operate as a traditional proxy, where a client will respond appropriately
to a proxy authentication request as well as external Web server
authentication requests. This correlates to the proxy_auth directive.
External Auth Regex
As above, this ACL calls an external authenticator process, but allows
regex pattern or case insensitive matches. This option correlates to the
proxy_auth_regex directive.
Proxy IP Address
The local IP address on which the client connection exists. This allows
ACLs to be constructed that only match one physical network, if
multiple interfaces are present on the proxy, among other things. This
option configures the myip directive.
Request Method
This ACL type matches on the HTTP method in the request headers.
This includes the methods GET, PUT, etc. This corresponds to the
method ACL type directive.
URL Path Regex
This ACL matches on the URL path minus any protocol, port, and host
name information. It does not include, for example, the
"http://www.swelltech.com" portion of a request, leaving only the
actual path to the object. This option correlates to the urlpath_regex
directive.
URL Port
This ACL matches on the destination port for the request, and
configures the port ACL directive.
URL Protocol
This ACL matches on the protocol of the request, such as FTP, HTTP,
ICP, etc.
URL Regexp
Matches using a regular expression on the complete URL. This ACL can
be used to provide access control based on parts of the URL or a case
insensitive match of the URL, and much more. This option is equivalent
to the url_regex ACL type directive.
Web Server Address
This ACL matches based on the destination Web server's IP address.
Squid accepts a single IP, a network IP with netmask, as well as a
range of addresses in the form "192.168.1.1-192.168.1.25". This
option correlates to the dst ACL type directive.
Web Server Hostname
This ACL matches on the host name of the destination Web server.
Web Server Regexp
Matches using a regular expression on the host name of the
destination Web server.
6.10 Administrative Options
Administrative Options provides access to several of the
behind-the-scenes options of Squid. This page allows you to configure
a diverse set of options, including the user ID and group ID of the
Squid process, cache hierarchy announce settings, and the
authentication realm (Figure 6-10).
Figure 6-10: Administrative Options
Run as Unix user and group
The user name and group name Squid will operate as. Squid is
designed to start as root but very soon after drop to the user/group
specified here. This allows you to restrict, for security reasons, the
permissions that Squid will have when operating. By default, Squid will
operate as either the nobody user and nogroup group or, in the case of
some Squids installed from RPM, as the squid user and group. These
options correlate to the cache_effective_user and
cache_effective_group directives.
Proxy authentication realm
The realm that will be reported to clients when performing
authentication. This option usually defaults to "Squid proxy-caching
web server", and correlates to the proxy_auth_realm directive. This
name will likely appear in the browser pop-up window when the client
is asked for authentication information.
Cache manager email address
The email address of the administrator of this cache. This option
corresponds to the cache_mgr directive and defaults to either
webmaster or root on RPM based systems. This address will be added
to any error pages that are displayed to clients.
Visible hostname
The host name that Squid will advertise itself on. This affects the host
name that Squid uses when serving error messages. This option may
need to be configured in cache clusters if you receive IP-Forwarding
errors. This option configures the visible_hostname directive.
Unique hostname
Configures the unique_hostname directive, and sets a unique host
name for Squid to report in cache clusters in order to allow detection of
forwarding loops. Use this if you have multiple machines in a cluster
with the same Visible Hostname.
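For a two-member cluster this might be sketched as follows (the names are examples):

```
# Both members advertise the same visible name, but each reports
# a unique name so forwarding loops can be detected
visible_hostname proxy.example.com
unique_hostname proxy1.example.com
```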
Cache announce host, port and file
The host address and port that Squid will use to announce its
availability to participate in a cache hierarchy. The cache announce file
is simply a file containing a message to be sent with announcements.
These options correspond to the announce_host, announce_port, and
announce_file directives.
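For reference, the historical defaults for the announce host and port directives were:

```
announce_host tracker.ircache.net
announce_port 3131
```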
Announcement period
Configures the announce_period directive, and refers to the frequency
at which Squid will send announcement messages to the announce
host.
Most of the content in Chapter 6 is taken from Unix System Administration with
Webmin by Joe Cooper (2002) available online at
http://www.swelltech.com/support/webminguide/
Chapter 7
Analyzer
7.1 Structure of log file
In Fedora, the Squid log files are stored in the /var/log/squid directory
by default. Squid maintains three log files:
- Access log
- Cache log
- Store log
Throughout this section, each log will be discussed, including its
content as well as how these logs might help an administrator debug
potential problems.
Access log
Location : /var/log/squid/access.log
Description
- It contains an entry for each time the cache has been hit or missed
when a client requests HTTP content.
- It records the identity of the host making the request (its IP address)
and the content being requested.
- It also indicates when content is served from the cache and when the
remote server must be accessed to obtain the content.
- It contains the HTTP transactions made by the users.
Format
Option 1 : This option will be used if the emulate http daemon log is
off.
Native format (emulate_httpd_log off)
Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content
Option 2 : This option will be used if the emulate http daemon log is
on.
Common format (emulate_httpd_log on)
Client Ident - [Timestamp1] "Method URI" Type Size
With:
Timestamp
The time when the request is completed (socket closed). The format is
"Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
Timestamp1
When the request is completed
(Day/Month/CenturyYear:Hour:Minute:Second GMT-Offset)
Elapsed
The elapsed time of the request, in milliseconds. This is the time
between the accept() and close() of the client socket.
Client
The IP address of the connecting client, or the FQDN if the 'log_fqdn'
option is enabled in the config file.
Action
The Action describes how the request was treated locally (hit, miss,
etc).
Code
The HTTP reply code taken from the first line of the HTTP reply header.
For ICP requests this is always "000." If the reply code was not given,
it will be logged as "555."
Size
For TCP requests, the amount of data written to the client. For UDP
requests, the size of the request. (in bytes)
Method
The HTTP request method (GET, POST, etc), or ICP_QUERY for ICP
requests.
URI
The requested URI.
Ident
The result of the RFC931/ident lookup of the client username. If
RFC931/ident lookup is disabled (default: `ident_lookup off'), it is
logged as - .
Hierarchy
A description of how and where the requested object was fetched.
From
Hostname of the machine from which we got the object.
Content
Content-type of the Object (from the HTTP reply header).
The example of access.log file.
Figure 7-1 Access.log
From Figure 7-1, we know that the native format has been used. Here,
we examine each format field against the contents of the access.log
file. Taking the first line, we find the result shown in Table 7-1.
Format      Value
Timestamp   1173680297.727
Elapsed     450
Client      10.0.5.10
Action      TCP_MISS
Code        302
Size        786
Method      GET
URI         http://www.google.com/search?
Ident       -
Hierarchy   DIRECT
From        64.233.189.104
Content     text/html
Table 7-1 The format and its value
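The field split above can be reproduced with a short Python sketch. This is not part of the original guide; the field names are assumed from Table 7-1, and the sample line is reconstructed from the table values:

```python
# Split one native-format access.log line into named fields.
FIELDS = ["timestamp", "elapsed", "client", "action_code", "size",
          "method", "uri", "ident", "hierarchy_from", "content"]

def parse_access_line(line):
    """Return a dict of named fields for a native-format access.log line."""
    record = dict(zip(FIELDS, line.split()))
    # Action/Code and Hierarchy/From are slash-separated pairs
    record["action"], record["code"] = record.pop("action_code").split("/")
    record["hierarchy"], record["from"] = record.pop("hierarchy_from").split("/")
    return record

line = ("1173680297.727 450 10.0.5.10 TCP_MISS/302 786 GET "
        "http://www.google.com/search? - DIRECT/64.233.189.104 text/html")
rec = parse_access_line(line)
print(rec["action"], rec["code"], rec["client"])  # prints: TCP_MISS 302 10.0.5.10
```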
Some further elaboration on selected fields:
Timestamp
The timestamp is represented in UNIX time with millisecond resolution.
However, it can be converted into a more readable form by using this
short Perl script:
#!/usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;
Action
The TCP_ codes (Table 7-2) refer to requests on the HTTP port (usually
3128), while the UDP_ codes refer to requests on the ICP port (usually
3130).
TCP_HIT: A valid copy of the requested object was in the cache.
TCP_MISS: The requested object was not in the cache.
TCP_REFRESH_HIT: The requested object was cached but STALE. The IMS query for the object resulted in "304 Not Modified".
TCP_REF_FAIL_HIT: The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
TCP_REFRESH_MISS: The requested object was cached but STALE. The IMS query returned the new content.
TCP_CLIENT_REFRESH_MISS: The client issued a "no-cache" pragma, or some analogous cache control command, along with the request. Thus the cache has to re-fetch the object.
TCP_IMS_HIT: The client issued an IMS request for an object which was in the cache and fresh.
TCP_SWAPFAIL_MISS: The object was believed to be in the cache, but could not be accessed.
TCP_NEGATIVE_HIT: Request for a negatively cached object, e.g. "404 Not Found", which the cache believes to be inaccessible. See also the explanation of negative_ttl in your squid.conf file.
TCP_MEM_HIT: A valid copy of the requested object was in the cache, and it was in memory, thus avoiding disk accesses.
TCP_DENIED: Access was denied for this request.
TCP_OFFLINE_HIT: The requested object was retrieved from the cache during offline mode. Offline mode never validates any object; see offline_mode in the squid.conf file.
UDP_HIT: A valid copy of the requested object was in the cache.
UDP_MISS: The requested object is not in this cache.
UDP_DENIED: Access was denied for this request.
UDP_INVALID: An invalid request was received.
UDP_MISS_NOFETCH: During "-Y" startup, or during frequent failures, a cache in hit-only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
NONE: Seen with errors and cachemgr requests.
Table 7-2 TCP codes and Explanation
Code
- These codes are taken from RFC 2616 and verified for Squid. Squid-2
uses almost all codes except 307 (Temporary Redirect), 416 (Request
Range Not Satisfiable) and 417 (Expectation Failed).

Code  Explanation
000   Used mostly with UDP traffic
100   Continue
101   Switching Protocols
102   Processing
200   OK
201   Created
202   Accepted
203   Non-Authoritative Information
204   No Content
205   Reset Content
206   Partial Content
207   Multi Status
300   Multiple Choices
301   Moved Permanently
302   Moved Temporarily
303   See Other
304   Not Modified
305   Use Proxy
[307  Temporary Redirect]
400   Bad Request
401   Unauthorized
402   Payment Required
403   Forbidden
404   Not Found
405   Method Not Allowed
406   Not Acceptable
407   Proxy Authentication Required
408   Request Timeout
409   Conflict
410   Gone
411   Length Required
412   Precondition Failed
413   Request Entity Too Large
414   Request URI Too Large
415   Unsupported Media Type
[416  Request Range Not Satisfiable]
[417  Expectation Failed]
*424  Locked
*424  Failed Dependency
*433  Unprocessable Entity
500   Internal Server Error
501   Not Implemented
502   Bad Gateway
503   Service Unavailable
504   Gateway Timeout
505   HTTP Version Not Supported
*507  Insufficient Storage
600   Squid header parsing error
Method
- Squid recognizes several request methods as defined in RFC 2616.
Newer versions of Squid (2.2.STABLE5 and above) also recognize the
RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV"
extensions (Table 7-3).

Method     Defined     Cachability  Meaning
GET        HTTP/0.9    possibly     object retrieval and simple searches
HEAD       HTTP/1.0    possibly     metadata retrieval
POST       HTTP/1.0    CC or Exp.   submit data (to a program)
PUT        HTTP/1.1    never        upload data (e.g. to a file)
DELETE     HTTP/1.1    never        remove resource (e.g. file)
TRACE      HTTP/1.1    never        appl. layer trace of request route
OPTIONS    HTTP/1.1    never        request available comm. options
CONNECT    HTTP/1.1r3  never        tunnel SSL connection
ICP_QUERY  Squid       never        used for ICP based exchanges
PURGE      Squid       never        remove object from cache
PROPFIND   rfc2518     ?            retrieve properties of an object
PROPPATCH  rfc2518     ?            change properties of an object
MKCOL      rfc2518     never        create a new collection
COPY       rfc2518     never        create a duplicate of src in dst
MOVE       rfc2518     never        atomically move src to dst
LOCK       rfc2518     never        lock an object against modifications
UNLOCK     rfc2518     never        unlock an object
Table 7-3 List of Methods
Hierarchy
The following hierarchy codes are used in Squid-2 (Table 7-4):
NONE: For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
DIRECT: The object was fetched from the origin server.
SIBLING_HIT: The object was fetched from a sibling cache which replied with UDP_HIT.
PARENT_HIT: The object was requested from a parent cache which replied with UDP_HIT.
DEFAULT_PARENT: No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
SINGLE_PARENT: The object was requested from the only parent appropriate for the given URL.
FIRST_UP_PARENT: The object was fetched from the first parent in the list of parents.
NO_PARENT_DIRECT: The object was fetched from the origin server, because no parents existed for the given URL.
FIRST_PARENT_MISS: The object was fetched from the parent with the fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS: This parent was chosen because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
CLOSEST_PARENT: The parent selection was based on our own RTT measurements.
CLOSEST_DIRECT: Our own RTT measurements returned a shorter time than any parent.
NO_DIRECT_FAIL: The object could not be requested because of a firewall configuration (see also never_direct and related material), and no parents were available.
SOURCE_FASTEST: The origin site was chosen, because the source ping arrived fastest.
ROUNDROBIN_PARENT: No ICP replies were received from any parent. The parent was chosen because it was marked for round robin in the config file and had the lowest usage count.
CACHE_DIGEST_HIT: The peer was chosen because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
CD_PARENT_HIT: The parent was chosen because the cache digest predicted a hit.
CD_SIBLING_HIT: The sibling was chosen because the cache digest predicted a hit.
NO_CACHE_DIGEST_DIRECT: This output seems to be unused?
CARP: The peer was selected by CARP.
ANY_PARENT: part of src/peer_select.c:hier_strings[].
INVALID CODE: part of src/peer_select.c:hier_strings[].
Table 7-4 Hierarchy Codes in Squid-2
Cache log
Location : /var/log/squid/cache.log
Description
- It contains various messages such as information about Squid
configuration, warnings about possible performance problems, and
serious errors.
- It also contains error and debugging messages from particular Squid
modules.
Format
[Timestamp1]| Message
With
Timestamp1
When the event occurred (Year/Month/Day Hour:Minute:Second)
Message
Description of the event

Error messages that may appear:

ERR_READ_TIMEOUT: The remote site or network is unreachable; it may be down.
ERR_LIFETIME_EXP: The remote site or network may be too slow or down.
ERR_NO_CLIENTS_BIG_OBJ: All clients went away before the transmission completed and the object is too big to cache.
ERR_READ_ERROR: The remote site or network may be down.
ERR_CLIENT_ABORT: The client dropped the connection before the transmission completed. Squid fetches the object according to its settings for quick_abort.
ERR_CONNECT_FAIL: The remote site or server may be down.
ERR_INVALID_REQ: Invalid HTTP request.
ERR_UNSUP_REQ: Unsupported request.
ERR_INVALID_URL: Invalid URL syntax.
ERR_NO_FDS: Out of file descriptors.
ERR_DNS_FAIL: DNS name lookup failure.
ERR_NOT_IMPLEMENTED: Protocol not supported.
ERR_CANNOT_FETCH: The requested URL can not currently be retrieved.
ERR_NO_RELAY: There is no WAIS relay host defined for this cache.
ERR_DISK_IO: The system disk is out of space or failing.
ERR_ZERO_SIZE_OBJECT: The remote server closed the connection before sending any data.
ERR_FTP_DISABLED: This cache is configured to NOT retrieve FTP objects.
ERR_PROXY_DENIED: Access denied. The user must authenticate before accessing this cache.
Table 7-5 List of Error Messages
The example of cache.log file (Figure 7-2).
Figure 7-2 Cache.log
Store log
Location : /var/log/squid/store.log
Description
- It contains the information and status of stored (and not stored)
objects.
Format
Timestamp Tag Code Date LM Expire Content Expect/Length Methods Key
With:
Timestamp
The time entry was logged. (Millisecond resolution since 00:00:00 UTC,
January 1, 1970)
Tag
SWAPIN (swapped into memory from disk), SWAPOUT (saved to disk)
or RELEASE (removed from cache)
Code
The HTTP replies code when available. For ICP requests this is always
"0". If the reply code was not given, it will be logged as "555."
The following three fields are timestamps parsed from the HTTP reply
headers. All are expressed in Unix time (i.e. seconds since 00:00:00
UTC, January 1, 1970). A missing header is represented as -2 and an
unparsable header as -1.
Date
The time parsed from the HTTP Date: reply header. If the Date header
is missing or invalid, the time of the request is used instead.
LM
The value of the HTTP Last-Modified: reply header.
Expires
The value of the HTTP Expires: reply header.
Content
The HTTP Content-Type reply header.
Expect
The value of the HTTP Content-Length: reply header. Zero is logged if
the Content-Length header was missing.
/Length
The number of bytes of content actually read. If Expect is nonzero and
not equal to Length, the object is released from the cache.
Method
The request method (GET, POST, etc).
Key
The cache key. Often this is simply the URL. Cache objects which never
become public will have cache keys that include a unique integer
sequence number, the request method, and then the URL.
( /[post|put|head|connect]/URI )
An example of a store.log file is shown in Figure 7-3.
Figure 7-3 Store.log
Based on Figure 7-3, we try to understand each format field from the
contents of the store.log file. Taking the second line, we found the
values shown in Table 7-6:
Format         Value
Timestamp      1173680297.727
Tag            RELEASE
Code           302
Date           1173680306
LM             -1
Expire         -1
Content        text/html
Expect         -1
/Length        /278
Method         GET
Key            http://www.google.com/search?
(Note: in Squid-2.6 the raw store.log line also carries the swap
directory number, the swap file number and the MD5 cache key - here
-1, FFFFFFFF and 7832CBDDD1604B89D0F75A2437F37AD7 - between the Tag
and Code fields; they are not part of the format described above.)
Table 7-6 Format in Store.log
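The Tag field lends itself to a quick health check: counting SWAPOUT versus RELEASE entries gives a feel for how much of the traffic is actually being cached. A sketch using awk (the two sample lines are illustrative, laid out like Figure 7-3; in practice use /var/log/squid/store.log):

```shell
# Build a tiny sample store.log so the example is self-contained.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1173680297.727 RELEASE -1 FFFFFFFF 7832CBDDD1604B89D0F75A2437F37AD7 302 1173680306 -1 -1 text/html -1/278 GET http://www.google.com/search?
1173680301.544 SWAPOUT 00 000000A3 3F2504E04F8911D39A0C0305E82C3301 200 1173680301 1173600000 -1 text/html 5120/5120 GET http://www.example.com/
EOF

# Count entries per Tag (field 2 of store.log):
awk '{count[$2]++} END {for (t in count) print t, count[t]}' "$LOG"
```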
7.2 Methods
Log Analysis Using the Grep Command
The log files can also be analysed using Linux or UNIX commands such
as grep, which is used to filter the required information from any log
file. Using a terminal, run a command such as the following to analyse
the related log file.
For example:
# cat /var/log/squid/access.log | grep www.google.com
Referring to Figure 7-4, the output shows the result of the grep
command on the access.log file. The same technique can be applied to
the cache.log and store.log files.
Figure 7-4 Analysing the Access.log using the Grep command
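grep also combines well with other text tools such as awk, sort and uniq. For example, a per-client request count can be produced from access.log, whose third field in the native format is the client address. A self-contained sketch (the sample lines are illustrative only; in practice point LOG at /var/log/squid/access.log):

```shell
# Build a tiny sample access.log so the example runs on its own.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1173680297.727    110 192.168.0.2 TCP_MISS/200 5120 GET http://www.google.com/search? - DIRECT/64.233.189.104 text/html
1173680301.120     15 192.168.0.3 TCP_DENIED/403 1450 GET http://www.example.com/ - NONE/- text/html
1173680305.330     90 192.168.0.2 TCP_HIT/200 2048 GET http://www.google.com/ - NONE/- text/html
EOF

# Requests per client IP address, busiest first:
awk '{print $3}' "$LOG" | sort | uniq -c | sort -rn

# How many requests were denied by the access controls:
grep -c TCP_DENIED "$LOG"
```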
Log Analysis Using Sarg-2.2.3.1
Basically, the preferred log file for analysis is the access.log file
in the native format. We chose the Squid Analysis Report Generator
(Sarg) as the tool. It analyses users' Internet surfing patterns and
generates HTML reports with fields such as users, IP addresses, bytes,
sites and times.
This tool can be downloaded from:
http://linux.softpedia.com/get/Internet/Log-Analyzers/sarg-102.shtml
7.3 Setup Sarg-2.2.3.1
Step:
Download the software named sarg-2.2.3.1.tar.gz for the Linux and Unix
environment.
Make a new directory called installer under the root directory.
# mkdir /installer
Copy the downloaded file into the installer directory.
# cp sarg-2.2.3.1.tar.gz /installer
Then, go into the directory and extract sarg-2.2.3.1.tar.gz using the
following command.
# cd /installer
# tar -zxvf sarg-2.2.3.1.tar.gz
After the archive is successfully extracted, go into the sarg-2.2.3.1
directory and configure, build and install it with the following
commands:
# cd /installer/sarg-2.2.3.1
# ./configure
# make
# make install
NOTE: Make sure Squid is already started before running the following
script.
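One way to confirm this before running sarg is a simple process check (a sketch; process and service names can vary by distribution):

```shell
# Report whether a Squid process can be found before generating reports.
if pgrep squid >/dev/null 2>&1; then
    echo "Squid is running"
else
    echo "Squid is NOT running - start it before generating reports"
fi
```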
Go into the sarg-2.2.3.1 directory and run the sarg script.
# ./sarg
The generated result will be kept in /var/www/html/squid-reports. It
is recommended to view it in a GUI environment.
7.4 Report Management Using Webmin
For managing the report, we chose Webmin, a web-based interface for
Unix system administration. In our case, it helps the admin to set
information such as the location of the log source and the report
destination, the format of the generated report, the size of the
report, and the schedule on which reports are generated automatically.
Step:
1. Make sure Webmin is already set up on the server. Then, open the
browser and type http://127.0.0.1:10000/ to reach Webmin. After that,
log in to Webmin.
Figure 7-5 Login
2. Choose the Servers tab, and then click on Squid Analysis Report
Generator. Four (4) modules are offered: Log Source and Report
Destination, Report Option, Report Style and Scheduled Report
Generation.
Figure 7-6 Sarg Main Modules in Webmin
3. Click on the Log Source and Report Destination icon. This module
allows the admin to set the source of the log file and to define the
destination of the generated report. For report maintenance, it also
allows the admin to set the number of reports to keep in a certain
location, and an acknowledgement can be sent to the admin's e-mail.
Note: Please check the sarg.conf file, located at
/usr/local/sarg/sarg.conf, to ensure the correct path for locating the
source log files.
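The two sarg.conf directives that matter most here are the ones naming the log source and the report destination. A minimal excerpt (directive names as used by Sarg 2.x; adjust the paths to your installation):

```
# /usr/local/sarg/sarg.conf (excerpt)
access_log /var/log/squid/access.log
output_dir /var/www/html/squid-reports
```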
Figure 7-7 Setting on Source and Destination Report
After setting the changes, click on the Save button.
4. Click on the Report Option icon. In this module, the admin can
manage the pattern of the generated report, including data ordering,
the size of data displayed, the data format and log file rotation.
Several types of report can be generated, depending on the access
control lists (ACLs) that have been set up beforehand.
Log file rotation is important to ensure there is enough disk space
for log storage, especially for long-term evaluations. This is covered
further in Scheduled Report Generation.
Figure 7-8 Setting on Report Content and Generation Option
5. Click on the Report Style icon. This allows the admin to make the
generated report look more attractive in terms of language, title and
other common style settings.
Figure 7-9 Setting on HTML Report Style and Colour Option
6. Click on the Scheduled Report Generation icon. In this module, the
admin can define how frequently reports are generated by enabling the
selected or the default schedule.
Regarding the rotate feature in Squid, it is recommended to apply a
simple schedule. During an idle period, the log files are safely
transferred to the report destination in one burst; before transport,
they can be compressed during off-peak time. At the destination, the
log files are concatenated into one file, so the yield is one file per
selected hour. However, how reports are generated ultimately depends
on the company's requirements.
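The rotation itself can be driven by Squid's own rotate feature. A sketch (it assumes logfile_rotate is set in squid.conf and that the squid binary is on the PATH; the guard only keeps the line from failing on machines without Squid):

```shell
# Ask a running Squid to close, rename and reopen its log files.
# With "logfile_rotate 10" in squid.conf, Squid keeps access.log.0 .. access.log.9.
# A typical crontab entry would run this during off-peak time, e.g.:
#   0 4 * * * /usr/sbin/squid -k rotate
command -v squid >/dev/null 2>&1 && squid -k rotate 2>/dev/null || echo "squid binary not found"
```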
Figure 7-10 Setting on Scheduled Reporting Options
7. After setting some information in Scheduled Report Generation, the
following statement will be displayed on the main page.
Figure 7-11 Generate Report Setting
There are some considerations to be taken:
1. Never delete access.log, store.log or cache.log while Squid is
running. There is no recovery file.
2. In the squid.conf file, the following statements can be applied if
the admin wants to disable a certain log file. For example:
To disable access.log:
cache_access_log /dev/null
To disable store.log:
cache_store_log none
To disable cache.log:
cache_log /dev/null
However, cache.log is not suitable to be disabled because it contains
many important status and debugging messages.
7.5 Log Analysis and Statistics
After running the Sarg analyser, reports will be generated for
access.log. They can be found in /var/www/html/squid-reports.
Figure 7-12 Collection of Squid Report for Access.log
From Figure 7-12, we can see that three (3) reports were generated in
this example. Basically, the latest version has no number at the end
of the filename. Each time the access log file is analysed, the
existing report is renamed and an incremental number is placed
automatically at the end of the filename.
For example, 2007Mar22-2007Mar22.2 was the first report generated,
while 2007Mar22-2007Mar22 is the latest version of the report.
Based on Figure 7-13, the index.html file shows the list of reports
that have been generated by Sarg. To get more detailed information on
a specific report, click on the selected file name.
Figure 7-13 Summary of Squid reports
For example, the folder named 2007Mar22-2007Mar22 has been selected
and opened. From Figure 7-14, there are several standard files which
can be found in all Squid reports. Briefly, five (5) HTML reports show
statistical information: index, denied, download, siteuser and
topsites. Besides these, the folder also contains a collection of
reports for specific users, identified by their IP addresses.
Figure 7-14 Contents of 2007Mar22-2007Mar22 as example
The following figures show the HTML reports:
1. Index html
Figure 7-15 Index html
2. Denied html
Figure 7-16 Denied html
3. Download html
Figure 7-17 Download html
4. Sites and Users
Figure 7-18 Siteuser html
5. Top 100 Sites
Figure 7-19 Topsites html
If we click on a specific IP address, we can view all its information,
as in Figure 7-20.
Figure 7-20 Reports generated for a specific user (IP address)