12. Chapter - Statistics and Data Privacy

12.1. Proxy Statistics

12.1.1. Proxy Logging

Under Information > Statistics > Settings, you can configure whether the proxy server of the Intra2net system (see 11. Chapter, "Proxy") logs all website accesses to a log file. These log files can also be analyzed and processed automatically.

If activated, the proxy log files are saved as monthly files. These are available in the Information > System > Logfiles menu. They are saved in the standard format of the Squid proxy. The time is given as Unix time, i.e. in seconds since January 1, 1970, 00:00 UTC. If you want to search the files manually, it is advisable to have the time converted by using "Download with normal time".
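If a file has already been downloaded with Unix timestamps, the conversion can also be done afterwards. The following is a minimal sketch in Python, assuming that the timestamp is the first whitespace-separated field of each line, as in Squid's native log layout. The sample line and the function name squid_time_to_utc are made up for illustration.

```python
from datetime import datetime, timezone

def squid_time_to_utc(field: str) -> datetime:
    """Convert a Unix timestamp field (seconds, optionally with a
    fractional part) into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(float(field), tz=timezone.utc)

# Hypothetical log line; only the first field matters for this sketch.
line = ("1700000000.123 45 192.168.1.10 TCP_MISS/200 5120 GET "
        "http://www.web.com/shopping/ - DIRECT/203.0.113.5 text/html")

timestamp = squid_time_to_utc(line.split()[0])
print(timestamp.strftime("%Y-%m-%d %H:%M:%S UTC"))  # 2023-11-14 22:13:20 UTC
```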

12.1.2. Analysis

If activated, the proxy log files are analyzed on a monthly basis and displayed as statistics. The current month is always updated on the hour. These statistics are available under Information > Statistics > Proxy.

The statistics can be filtered by web pages, clients or users by using the drop-down menu above. Displaying user logins only makes sense if the proxy is used with authentication.

By default, the rows are sorted by access time. They can be rearranged and sorted by the other values displayed by using the header line.

The statistics can be further narrowed down to individual clients, web pages or days from the overview of web pages, clients and users. This can be done by clicking on the initially displayed column.

By clicking the arrow symbol beside each web page, it can be opened in the browser and its contents can be examined. If a page is to be blocked in the future, it can be marked with the checkbox in the last column and directly added to a URL-blacklist by clicking the button below.

Many websites load their content from different sources, be it text or banner advertising. Hence, there will be servers like google-analytics.com, doubleclick.net etc. in the "Top 50 websites" evaluation, which were loaded passively on websites. This content was not actively visited by a user.

12.1.3. Methodology

The following describes how each access is compiled and converted into the displayed values.

In order to provide an overview, the statistics store only a shortened name of the visited web address. For example, both "http://www.web.com/shopping/" and "web.com/mail/" would become "web.com".
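The exact shortening rule used by the system is not documented here. As an illustration only, the following Python sketch reproduces the two examples above by keeping the last two labels of the host name. The function name shorten_address is hypothetical, and this naive rule would give poor results for domains such as "example.co.uk" or for plain IP addresses.

```python
from urllib.parse import urlsplit

def shorten_address(address: str) -> str:
    """Reduce a web address to the last two labels of its host name.
    Assumed rule for illustration, not the system's actual method."""
    # urlsplit only recognizes the host part if a scheme is present.
    if "://" not in address:
        address = "http://" + address
    host = urlsplit(address).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])

print(shorten_address("http://www.web.com/shopping/"))  # web.com
print(shorten_address("web.com/mail/"))                 # web.com
```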

Most web pages consist not only of text formatted in HTML, but also of graphics, Flash animations etc. In order to get a meaningful figure for the number of web pages visited, the number displayed under page accesses counts only those requests for which one of the following data types was transmitted:

  • text/html

  • text/plain

  • text/javascript
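The filter on data types can be sketched as follows in Python. This is a simplified illustration, not the system's implementation: it assumes that the content type of each request is available as a string, and that parameters such as a character set may follow the type. The names PAGE_TYPES, is_page_access and the sample list are hypothetical.

```python
# Data types that count as page accesses (taken from the list above).
PAGE_TYPES = {"text/html", "text/plain", "text/javascript"}

def is_page_access(content_type: str) -> bool:
    """Return True if the transmitted data type counts as a page access."""
    # Drop optional parameters such as "; charset=utf-8" (assumption).
    mime = content_type.split(";")[0].strip().lower()
    return mime in PAGE_TYPES

# Hypothetical content types of five logged requests.
logged_types = ["text/html", "image/png", "text/javascript",
                "text/html; charset=utf-8", "application/pdf"]

page_accesses = sum(1 for t in logged_types if is_page_access(t))
print(page_accesses)  # 3
```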

After retrieving a web page, there is unfortunately no way for the proxy to determine for how long the page has actually been read. Therefore, the proxy statistics can only roughly calculate the duration.

For each first visit to a website, a 60-second retention period is used. If there is another access to the same server within this minute, the time interval is added to the last access period. If the interval between two accesses is more than 60 seconds, the original 60 seconds are reapplied. For the retention period, only calls of data types that are also counted as page accesses are counted (see above).
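One possible reading of this rule, sketched in Python: the first access to a server counts as 60 seconds, and every further access adds either the interval since the previous access or, if that interval exceeds 60 seconds, another 60 seconds. The function name estimate_duration is hypothetical, and the input is assumed to contain only the times of accesses that count as page accesses.

```python
RETENTION = 60  # retention period in seconds, as described above

def estimate_duration(access_times):
    """Estimate the viewing duration in seconds for one server from the
    Unix times of its page accesses (an interpretation of the rule)."""
    times = sorted(access_times)
    if not times:
        return 0
    total = RETENTION  # the first visit counts as 60 seconds
    for previous, current in zip(times, times[1:]):
        gap = current - previous
        # Short gaps are added as they are; longer gaps count as
        # a fresh 60-second period.
        total += min(gap, RETENTION)
    return total

# Accesses at 0 s, 20 s, 50 s and 200 s: 60 + 20 + 30 + 60
print(estimate_duration([0, 20, 50, 200]))  # 170
```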

With period overviews, the number of page views in an hour is summarized and the displayed box becomes darker as more hits have occurred in this hour.
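The hourly summary can be illustrated with a short Python sketch. It assumes Unix timestamps of page accesses as input and groups them by UTC hour. How the system actually maps counts to the darkness of a box is not documented here, so the relative shade (count divided by the busiest hour) is purely an assumption, as is the function name hits_per_hour.

```python
from collections import Counter
from datetime import datetime, timezone

def hits_per_hour(timestamps):
    """Count page accesses per (date, hour) bucket, using UTC."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(t.date().isoformat(), t.hour)] += 1
    return counts

# Hypothetical access times: two in one hour, one in the next hour.
counts = hits_per_hour([1700000000, 1700000100, 1700003600])
busiest = max(counts.values())
for (day, hour), hits in sorted(counts.items()):
    shade = hits / busiest  # 1.0 = darkest box (assumed mapping)
    print(day, hour, hits, shade)
```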

If access to a web page is blocked by a proxy filter mechanism, the access is still logged and evaluated as a normal access. A separate evaluation based on allowed and blocked access is not possible.