Skip to main content

Wondering Cache

cache-indexing

Some of you may have already heard about the computer cache in your curriculum or some of you may have not. But this article is intended for all of those who want to understand cache concepts. The next part of it consists of developing a cache simulator using C programming language. I believe that this article will give a clear insight of cache and how it works. So lets see what we will be learning in this series of articles.

Cache Introduction

The first thing we should know what are we talking about. Cache is a highly efficient (really fast) memory which can bring data to processor at a much more speed than our typical RAM. CPU cache memory operates between 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond to the CPU request. Cache operates faster because it resides usually on the same chip as the processor. But it makes them expensive. Also caches are smaller in size (around 512KB to 3MB) whereas computer RAM can be upto 16GB. The reason why caches are smaller is because it makes the mapping techniques faster and also reduces cost for a normal computer. So cache behaves a temporary storage for RAM data. Computer processor looks for data in cache first and if it is not present, then it gets the data from RAM and stores it in cache for future reference.

cache-basic-block-diagram

Cache Organisation

As we know now that cache is smaller than RAM, it cannot store all the RAM data. In RAM memory, a memory cell number serves the purpose of index for the memory which is also our physical address. For example, if we want to store x on some address 32 then this means that we just go to 32nd cell number into RAM and writes our data. So we don't need to consider an index table for RAM because RAM is all the memory we have got.

ram-indexing

But the same address cannot work for cache memory because lets say our cache has only 16 cells. This means we need to create a seperate index table for our cache memory into which we will search for cache cell number which contains data for given physical address. For our earlier example, if our cache has only 16 memory cells and we want to search for data for address 32 then we will look into cache index and find which cache cell holds data for address 32. One way would be to have a 16-entry index table where each entry represents the cache cell number and contains the physical address for which the data is stored in that particular cell. The following picture will give you a better idea than my words (But read my words again after looking at the picture, you will understand what i am saying).

cache-indexing

So this is how it works.There are several ways to find a physical address in cache index. One of which would be to search all index entries one by one but that will make our cache inefficient. Thats why we use mapping techniques which maps RAM addresses to Cache addresses and make it efficient. The next article will make you understand what are the main mapping techniques are and how they work.

Understanding a few Concepts

There are 3 mapping techniques all of which have their advantages and disadvantages. But all of them have some things in common. Each technique breaks a physical address into some components which we will discuss first. If you don't understand what these terms mean then its not a problem as the picture will make their meanings more clear. There are also some formulas related but you don't need to cram them. You will understand how these formulas are derived on your own.

  • Word: This is the unit of data which is transferred for each operation.
  • Word Length: This is the maximum number of bytes allowed to transfer in one operation. For eg., In a 32-bit CPU, word length is 4 bytes (32 bits/8 bits)
  • Block: A group of words which is transferred between cache and main memory is called a block. This implies that whenever the requested data is missing from cache then a whole block of words is transferred from main memory to cache instead of just one word. The significance of transferring a whole block comes from the fact that most of the times data is stored sequentially by a program (stack memory) and thus transferring a whole block reduces the chances of next failed request. So all data transfer between cache and main memory happens in forms of blocks but data transfer between any memory and CPU happens in word.
  • Cache Hit: When the processor requests data from cache and finds it in the cache, then it is said to be a cache hit.
  • Cache Miss: Just the opposite of cache hit. When requested data is not in cache then it is a cache miss. Caches are designed to reduce cache misses and to increase cache hits. In case of a cache miss during data read, the required data is read from main memory after transfering a block of data to cache.
  • Byte Select Bits: These are the lower significance bits of an address which are used to select a specific byte of a word to select for operation. This is internal for a CPU. That is when a CPU asks for some data, whole word is transferred and which byte of that word is used for operation is out of our concern. For eg., In a 16-bit CPU, total 2 bytes (word length) are transferred for each request. This means only 1 bit (log(word_length)) is used for byte select. If the address is of 16 bits (64KB RAM) then we look at only more significant 15 bits (address bits - byte select bits) to look for data in RAM and transfer the 2 bytes regardless of what the last bit was. So if CPU asks for address XXX0010 or XXX0011 (XXX - first 12 bits) then we look for same location that is XXX001 (last bit ignored) and send both bytes in each case i.e. XXX0010 , XXX0011
  • ram-data-transfer
  • Cache Line bits: Consider the cache index is index of some book and it contains several pages. So our index is written on several pages. Lines bits determine the page number on which to search for our physical address. This way we save our time by limiting our search to only one page instead of many. Remember that more entries on one page means that we will have to match each entry against given address. This means a linear search for the address is necessary after finding the page number of our memory address. Number of line bits depend on mapping techniques which we will cover in next article. Line bits are next bits after byte select bits which help us to limit our search for address in cache index
  • Tag Bits: After finding the page number, we will have to search our address. But we do not need to store the whole address as index entries because we already know some part of it which are the line bits and word bits. So we store the remaining bits of the address (excluding byte select, word and line bits) which are called the tag bits. Hence an address is broken into 4 parts : tag bits, line bits, word bits, byte select bits.
  • Word Bits: As the data transfer between main memory and cache memory happens in blocks, some bits are needed to transfer the required word from cache to processor. These are the word bits. Cache index tells the address of memory block in cache memory. The correct word which needs to be transferred is decided by word bits.
  • Valid Bit: This bit is used to indicate whether the data stored in the cache is valid or not (1- valid, 0-invalid). We need this bit in our index so we can keep track of memory locations which contain valid data. If we need to clear the cache, we can just set valid bit of each entry to 0
  • Dirty Bit: This bit is important in write back policy. When some data is loaded into cache and it is updated then it is marked as dirty. So that when the CPU needs to replace (which happens frequently) the data , it can save the updated data into RAM before replacing it.
cache-index-structure

Now lets see how the whole process of reading data from cache takes place

  1. Cache index contains several pages depending on the mapping technique
  2. Each entry of index contains tag bits, valid bit and dirty bit
  3. Entry number represents the location of block in cache memory.
  4. Processor sends a physical address to cache of which the data is required.
  5. Physical address is broken into components in the following manner (Starting from lower bits)
    • Byte Select Bits = log(word_length)
    • Word Bits = log(number_of_words_per_block)
    • Line Bits = log(number_of_pages)
    • Tag Bits = remaining_bits_of_address
  6. Correct Index Page is found using line bits
  7. Each entry of the page is matched with tag bits
  8. Correct word is found by adding word bits to matched entry.
  9. Found word is transferred to processor if it is valid (valid=1)

This was pretty much about the terms we need to understand before moving on. Make sure you understand these concepts because in the next article, these terms will be repeating a lot.

Comments

Popular posts from this blog

Cordova viewport problem solved

Include the viewport settings in Cordova If you are facing the auto zooming problem of cordova then go read on the full article. Cordova actually ignores the viewport meta tag which causes the pixel density problem. So we need to tell cordova that viewport tag is equally important as other tags. To do this, we need to add some code to a file which is specify in the article. Corodva messes with pixels If you are using the latest cordova version or creating the cordova app for latest android versions then you may have faced the zoom malfunctioning.I also faced it when creating an app. Many of you may have already searched the web and found the answer of changing the meta tag attributes to get it working. But adding target-densitydpi=medium-dpi does not solve the problem for latest android versions. It may work for gingerbread but not for kitkat and others. So the final solution which i found was one of the stackexchange answer but rarely found. So i am gonna two things here, i ...

Understanding Python Decorators

If you have ever wondered what those @something mean above a python function or method then you are going to have your answers now. This @something line of code is actually called a decorator. I have red from various articles about them but some of them were not able to clarify the concept of a decorator and what we can achieve with them. So in this post we'll learn a lot about python decorators. Here is a list of topics we'll be covering. What is python decorator Understanding the concept Multiple decorators on same function class method decorator Where can we use decorators What is python decorator A python decorator is nothing but a function which accepts your given function as a parameter and returns a replacement function. So its like something this def decorator(your_func): def replacement(your_func_args): #do some other work return replacement @decorator your_func(your_func_args): #your_func code Now when your_func gets called then...

Image Search Engine Using Python

Images provide a lot more information than audio or text. Image processing is the prime field of research for robotics as well as search engines. In this article we will explore the concept of finding similarity between digital images using python. Then we will use our program to find top 10 search results inside a dataset of images for a given picture. It won't be as good as google's search engine because of the technique we will be using to find similarity between images. But what we are going to make will be pretty cool. So lets start. Setting up the Environment Our Algorithm How the code looks Lets build the GUI Additional Techniques Setting up the Environment The code we are going to write requires a few tools which we need to install first. I will try to be as precise as i can and if you get stuck into installing some tool then you can drop a comment below and i will help you sort out the problem. So here are the tools and the steps to install ...