Optimizing Network Performance through Advanced Caching
Presenter: Caroline Namuddu (cnamuddu@renu.ac.ug)
Knowledge | Community | Solutions
31st October 2024

Outline
Background
What is Caching
Caching Flow Process
Cache Management
Implementation
Observations
Conclusion

Background (Online Activity for LMCs)
Google Cloud Location vs Internet Activity for LMCs

Caching means storing reusable responses so that subsequent requests are served faster.
Reduced Bandwidth Usage: By storing frequently accessed content locally, caching avoids fetching the same data repeatedly from external servers. This saves significant bandwidth, especially for large files and repeated requests.
Enhanced User Experience: Faster page loads and quick access to frequently used content noticeably improve the browsing experience, leading to higher user satisfaction.

Background (RENU Traffic Composition)

Background (Problem Statement)
Most internet users in Africa, particularly in Uganda, rely on services hosted internationally for education, research, and personal business.
However, the limited bandwidth capacity subscribed to by member institutions, often constrained by budget limitations, results in overutilization of the available resources. This overuse leads to degraded service quality, poor latency, and ultimately dissatisfied users, because the network cannot handle the high demand effectively.

RENU Solution
Solution: set up a web caching service in RENU.
Objectives:
Reduce bandwidth cost on RENU
Improve performance (user experience)

What is Caching
Caching is the process of storing copies of frequently accessed data in a temporary location (a cache) to speed up future access to that data.
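To make the idea concrete, a cache can be sketched as a small store of responses with a time-to-live. This is a minimal illustration under assumed names (the `SimpleCache` class is hypothetical, not RENU's implementation):

```python
import time

class SimpleCache:
    """Minimal TTL cache: keeps copies of responses for fast reuse."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self.store.get(key)
        if entry is None:
            return None                  # cache miss
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.store[key]          # stale: evict and treat as a miss
            return None
        return value                     # cache hit

    def put(self, key, value):
        """Store a copy of the response with an expiry time."""
        self.store[key] = (value, time.time() + self.ttl)

cache = SimpleCache(ttl_seconds=60)
cache.put("http://example.com/logo.png", b"...image bytes...")
print(cache.get("http://example.com/logo.png") is not None)  # hit
print(cache.get("http://example.com/missing") is None)       # miss
```

Production caches add far more (eviction policies such as LRU, and per-object freshness derived from HTTP headers), but the store/lookup/expire cycle is the core of the technique.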
Modes of Caching:
Transparent Proxy
Reverse Proxy
Forward Proxy

Transparent Proxy Caching:
Definition: A proxy server that intercepts and caches web traffic without requiring any client-side configuration. Users are unaware that their requests are being cached.
Use Case: Often used by ISPs or network administrators to reduce bandwidth usage by storing frequently accessed content.
Example: A user's web request is intercepted by the proxy; if the requested content is cached, it is served from the cache without contacting the origin server.

Forward Proxy Caching:
Definition: A forward proxy caches content on behalf of the client and forwards requests to the internet. It sits between the client and the web server.
Use Case: Organizations or users configure their devices to use a forward proxy for content filtering, privacy, or caching frequently accessed websites.
Example: A company might use a forward proxy to cache large files requested by multiple employees, reducing the need to download them repeatedly.

Reverse Proxy Caching:
Definition: A reverse proxy caches content on behalf of the server. It sits between the client and the server, serving cached content to users when available.
Use Case: Popular in large web infrastructures, where caching static content (such as images and videos) offloads the origin server.
Example: A website using a reverse proxy like NGINX or Varnish to cache static assets, improving response times for users.

Content Delivery Network (CDN) Caching:
Definition: CDNs are distributed networks of servers that cache web content globally, delivering it from the server closest to the user.
Use Case: Used to improve website performance by reducing latency and server load.
Example: A user in Africa requests content from a U.S.-based website, and the CDN serves it from a nearby server in Africa instead of fetching it from the U.S.
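The hit/miss behaviour shared by all of these proxy modes can be sketched as follows. This is a simplified model rather than a working proxy; `proxy_request` and `fetch_from_origin` are hypothetical stand-ins for the real request handling and network fetch:

```python
def proxy_request(url, cache, fetch_from_origin):
    """Serve from the cache on a hit; fetch, store, and serve on a miss.

    Returns (response, status) where status is "HIT" or "MISS".
    """
    if url in cache:
        return cache[url], "HIT"       # served without contacting the origin
    response = fetch_from_origin(url)  # cache miss: go to the origin server
    cache[url] = response              # keep a copy for subsequent requests
    return response, "MISS"

# Example: the second request for the same URL is a cache hit,
# so the origin server is contacted only once.
origin_fetches = []
def fake_origin(url):
    origin_fetches.append(url)
    return f"<content of {url}>"

cache = {}
print(proxy_request("http://example.com/", cache, fake_origin))  # MISS
print(proxy_request("http://example.com/", cache, fake_origin))  # HIT
print(len(origin_fetches))  # 1
```

The modes differ mainly in where this logic sits and who configures it: in transparent mode the network redirects traffic to the cache, in forward mode the client is configured to use it, and in reverse mode it fronts the server.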
Uses of Caching
Caching: Store frequently accessed web content to serve users faster.
Content Filtering: Block access to certain websites or content.
Monitoring: Monitor user activity and bandwidth usage.

Caching: Often used for web caching, storing frequently accessed web content and serving it to users faster.
Content Filtering: It can also be used to block access to certain websites or content without the user knowing.
Monitoring: Administrators can monitor user activity and bandwidth usage without disrupting the user experience.
Transparent Operation: Since it is "transparent," users do not realize their data is being routed through the proxy server.
Network Positioning: Typically deployed at the network's gateway, where it can inspect and manage traffic before it reaches its destination.

Caching Flow Process
Client Request
Interception by cache
Cache Check (Cache Hit, Cache Miss)
Retrieving Content from the Internet (if Cache Miss)
Caching & User Response
Subsequent Requests (Cache Hit)

1. Client Request (User Initiates Web Traffic)
A user on the network attempts to access a web resource, such as a website, without knowing about the cache. They make a request by typing a URL into their browser or by making an HTTP/HTTPS request. The user's device sends this request to the router, expecting it to be forwarded to the internet.
2. Interception by Transparent Proxy
The router is configured to redirect web traffic to a transparent proxy/cache server. It uses traffic-filtering rules (such as iptables or firewall rules) to divert HTTP or HTTPS traffic to the caching system. This interception occurs without any modification on the user's device, and without the user's knowledge that the request is being processed through a proxy.
3. Cache Check
When the request reaches the transparent proxy or caching server, it checks whether the requested content (such as a webpage, image, or video) is already stored in the local cache.
Cache Hit: If the content is found in the cache, it is served directly from the cache, eliminating the need to fetch it from the original web server.
Cache Miss: If the content is not in the cache, the proxy forwards the request to the original web server to fetch the content.
4. Retrieving Content from the Internet (if Cache Miss)
For a cache miss, the proxy fetches the content from the original web server as usual. Once it retrieves the content, it serves it to the user and simultaneously stores a copy in the cache for future requests. This storage follows the caching policies set on the proxy, such as the time-to-live (TTL) that controls how long the content remains in the cache.
5. Caching and Response to the User
After fetching the content from the original server (cache miss) or retrieving it from the local cache (cache hit), the transparent proxy forwards the content to the user's browser. The user receives the content as if they were interacting directly with the website, without knowing that the traffic was routed through a caching system.
6. Subsequent Requests (Cache Hit)
When another user (or the same user) requests content that has already been cached, the caching server serves it directly from the cache. This reduces bandwidth consumption and speeds up the response time, since the request does not need to go out to the internet again.
7. Cache Management
The proxy uses algorithms to manage cached content:
Eviction Policies: Determine which content to remove when the cache is full (e.g., Least Recently Used (LRU)).
Time-to-Live (TTL): Determines how long cached content is valid before it must be refreshed from the original server.
Content Filtering: Determines which content is cacheable (e.g., static content such as images, CSS, and HTML pages) versus non-cacheable (e.g., dynamic content, user-specific data).

Why These Algorithms Matter:
Efficiency: They ensure the cache is used optimally, storing the most relevant and frequently accessed data while discarding less important or outdated content.
Performance: Proper eviction and TTL policies balance keeping fresh data in the cache against minimizing repeated requests to the origin server, improving performance for users.
Resource Management: Since cache storage is typically limited, these algorithms help ensure that valuable storage space is allocated wisely, improving overall system performance and user satisfaction.

RENU Implementation (Transparent Mode)
Operation: Transparent Mode
Positioning: Edge Router
Software: Apache Traffic Server

Operation: Users do not realize their data is being routed through the proxy server.
Positioning: Typically deployed at the network's gateway.
Software: Apache Traffic Server (recommended for large networks).

Observations
Notes on the graph: TCP_MISS and the other cache event types are described below; the listed URLs are dynamic-looking content.
1. TCP_MISS:
Definition: The requested content was not found in the cache, so the proxy had to fetch it from the origin server.
Implication: A high number of TCP_MISS events means many requests are not being served from the cache, increasing bandwidth usage as the proxy retrieves content from the original web server.
2. TCP_MEM_HIT:
Definition: The requested content was found in the cache and retrieved from memory (RAM) rather than from disk. This is the fastest way to serve cached content.
Implication: The higher the TCP_MEM_HIT count, the better the cache performance, since content stored in memory can be delivered to users quickly without contacting the origin server.
3. TCP_HIT:
Definition: The requested content was found in the cache and served directly from the local cache, usually from disk.
Implication: This indicates effective caching: the user received the content directly from the proxy, without external bandwidth being used to fetch it.
4. TCP_REFRESH_MISS:
Definition: The content was cached but had expired. The proxy had to revalidate it with the origin server and possibly update the cache.
Implication: The content was cached but not fresh, so the cache had to be updated. This can still reduce network load, since only part of the content may need to be fetched.
5. ERR_CLIENT_READ_ERROR:
Definition: The client made a request, but the proxy encountered an error while reading it. The request might have been incomplete, or there may have been network issues.
Implication: Errors like this can be caused by connection failures, malformed client requests, or other network-related problems.
6. Cache Event Count Over Time (Graph)
This graph shows the number of cache events (e.g., TCP_MISS, TCP_HIT) over a specific period of time.
Observation: Most events are TCP_MISS, especially at the beginning of the time frame, indicating that the cache often had to fetch data from the internet rather than serve it from local storage.
7. Client Request Stats Table
This section lists specific client requests to different URLs and how the cache responded (e.g., TCP_MISS or TCP_MEM_HIT).
Example: The first row shows that the client with IP address 102.34.0.7 requested http://connectivitycheck.gstatic.com/generate_204, which resulted in a TCP_MISS, meaning the cache had to fetch the content from the origin server. The number under "Count" indicates how many times this type of event occurred for that URL.

Observations
HTTP content can be cached, hence LESS international traffic.
The majority of RENU HTTP content is NOT cacheable:
Dynamic URLs
Dynamic content
Possibility of filtering malicious URLs.
HTTPS content cannot be cached without TLS/SSL certificates (reverse proxy mode).

Significant Improvement with Caching: The implementation of caching has shown a clear improvement in network performance, reducing bandwidth usage and improving content delivery speeds for users.
Better User Experience: Faster content access and reduced latency have enhanced user satisfaction, addressing the issues of poor service experience and slow load times.
Sustainable Network Optimization: Caching provides a sustainable way to optimize network resources, especially in environments with budgetary and bandwidth constraints, such as Uganda. By reducing external traffic, it also alleviates the load on international servers.
Potential for Further Enhancements: While the current caching setup is effective, future enhancements, including handling HTTPS traffic and improving cache hit ratios, can further optimize performance.
Closing Statement: Caching is a key solution for improving network efficiency and user satisfaction, especially in resource-constrained environments. Continued optimization and monitoring will ensure long-term success.

Challenges
Dynamic Content
HTTPS and Encrypted Traffic
Stale or Expired Data
Memory and Storage Constraints
Cache Miss Penalties

Dynamic Content: Content that changes frequently (such as live data or personalized user data) cannot be effectively cached. This limits the benefits of caching for certain types of websites and applications.
HTTPS and Encrypted Traffic: HTTPS content cannot be cached unless additional measures (like SSL/TLS termination) are taken. This limits caching effectiveness, as most websites today use HTTPS for security.
Stale or Expired Data: Cached content may become outdated if not refreshed regularly. Managing cache expiration and invalidation can be complex, leading to potential inconsistencies.
Memory and Storage Constraints: Caching consumes storage and memory resources, which must be managed effectively to avoid overloading the system. Large-scale caching requires sufficient infrastructure.
Cache Miss Penalties: If data is not in the cache (a cache miss), fetching it from the origin server causes delays, especially when request volume is high and the cache hit ratio is low.

Conclusion
Caching provides a sustainable way to optimize network resources, especially in environments with budgetary and bandwidth constraints, such as Uganda. By reducing external traffic, it also alleviates the load on international servers. Caching is a key solution for improving network efficiency and user satisfaction, especially in resource-constrained environments.
Continued optimization and monitoring will ensure long-term success.

Thank You!