Friday, March 28, 2008

How File Compression Works

If you download many programs and files off the Internet, you've probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users, because it lets you reduce the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk. Once you download the file, your computer uses a program such as WinZip to expand the file back to its original size. If everything works correctly, the expanded file is identical to the original file before it was compressed.

At first glance, this seems very mysterious. How can you reduce the number of bits and bytes and then add those exact bits and bytes back later? As it turns out, the basic idea behind the process is fairly straightforward. In this article, we'll examine this simple method as we take a very small file through the basic process of compression.

Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original file.

As an example, let's look at a type of information we're all familiar with: words.

In John F. Kennedy's 1961 inaugural address, he delivered this famous line:

"Ask not what your country can do for you -- ask what you can do for your country."

The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units. To get the file size down, we need to look for redundancies.

Immediately, we notice that:

  • "ask" appears two times
  • "what" appears two times
  • "your" appears two times
  • "country" appears two times
  • "can" appears two times
  • "do" appears two times
  • "for" appears two times
  • "you" appears two times

Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote. To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.

We'll look at how file-compression systems deal with redundancy in more detail in the next section.


Redundancy and Algorithms

Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data.

The system for arranging dictionaries varies, but it could be as simple as a numbered list. When we go through Kennedy's famous words, we pick out the words that are repeated and put them into the numbered index. Then, we simply write the number instead of writing out the whole word.

So, if this is our dictionary:

  1. ask
  2. what
  3. your
  4. country
  5. can
  6. do
  7. for
  8. you

Our sentence now reads:

"1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4"

If you knew the system, you could easily reconstruct the original phrase using only this dictionary and number pattern. This is what the expansion program on your computer does when it expands a downloaded file. You might also have encountered compressed files that open themselves up. To create this sort of file, the programmer includes a simple expansion program with the compressed file. It automatically reconstructs the original file once it's downloaded.
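
To make the idea concrete, here is a minimal Python sketch (a toy illustration, not how a real expansion program is written) that rebuilds the phrase from the numbered dictionary and the encoded pattern above:

    # Toy decoder: rebuild the phrase from the numbered dictionary and the
    # encoded number pattern (capitalization and the final period are ignored).
    dictionary = {
        "1": "ask", "2": "what", "3": "your", "4": "country",
        "5": "can", "6": "do", "7": "for", "8": "you",
    }

    encoded = "1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4"

    # Replace each number token with its dictionary word; leave everything
    # else ("not", "--") exactly as it is.
    decoded = " ".join(dictionary.get(token, token) for token in encoded.split())

    print(decoded)
    # ask not what your country can do for you -- ask what you can do for your country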

But how much space have we actually saved with this system? "1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4" is certainly shorter than "Ask not what your country can do for you -- ask what you can do for your country," but keep in mind that we need to save the dictionary itself along with the file.

In an actual compression scheme, figuring out the various file requirements would be fairly complicated; but for our purposes, let's go back to the idea that every character and every space takes up one unit of memory. We already saw that the full phrase takes up 79 units. Our compressed sentence (including spaces) takes up 37 units, and the dictionary (words and numbers) also takes up 37 units. This gives us a file size of 74, so we haven't reduced the file size by very much.

But this is only one sentence! You can imagine that if the compression program worked through the rest of Kennedy's speech, it would find these words and others repeated many more times. And, as we'll see in the next section, it would also be rewriting the dictionary to get the most efficient organization possible.

Searching for Patterns

In our previous example, we picked out all the repeated words and put those in a dictionary. To us, this is the most obvious way to write a dictionary. But a compression program sees it quite differently: It doesn't have any concept of separate words -- it only looks for patterns. And in order to reduce the file size as much as possible, it carefully selects which patterns to include in the dictionary.

If we approach the phrase from this perspective, we end up with a completely different dictionary.

If the compression program scanned Kennedy's phrase, the first redundancy it would come across would be only a couple of letters long. In "ask not what your," there is a repeated pattern of the letter "t" followed by a space -- in "not" and "what." If the compression program wrote this to the dictionary, it could write a "1" every time a "t" was followed by a space. But in this short phrase, this pattern doesn't occur enough to make it a worthwhile entry, so the program would eventually overwrite it.

The next thing the program might notice is "ou," which appears in both "your" and "country." If this were a longer document, writing this pattern to the dictionary could save a lot of space -- "ou" is a fairly common combination in the English language. But as the compression program worked through this sentence, it would quickly discover a better choice for a dictionary entry: Not only is "ou" repeated, but the entire words "your" and "country" are both repeated, and they are actually repeated together, as the phrase "your country." In this case, the program would overwrite the dictionary entry for "ou" with the entry for "your country."

The phrase "can do for" is also repeated, one time followed by "your" and one time followed by "you," giving us a repeated pattern of "can do for you." This lets us write 15 characters (including spaces) with one number value, while "your country" only lets us write 13 characters (with spaces) with one number value, so the program would overwrite the "your country" entry as just "r country," and then write a separate entry for "can do for you." The program proceeds in this way, picking up all repeated bits of information and then calculating which patterns it should write to the dictionary. This ability to rewrite the dictionary is the "adaptive" part of LZ adaptive dictionary-based algorithm. The way a program actually does this is fairly complicated, as you can see by the discussions on Data-Compression.com.

No matter what specific method you use, this in-depth searching system lets you compress the file much more efficiently than you could by just picking out words. Using the patterns we picked out above, and adding "__" for spaces, we come up with this larger dictionary:

  1. ask__
  2. what__
  3. you
  4. r__country
  5. __can__do__for__you

And this smaller sentence:

"1not__2345__--__12354"

The sentence now takes up 18 units of memory, and our dictionary takes up 41 units. So we've compressed the total file size from 79 units to 59 units! This is just one way of compressing the phrase, and not necessarily the most efficient one. (See if you can find a better way!)
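
Here is the same kind of sketch for the pattern-based dictionary, again purely illustrative; it rebuilds the phrase character by character and reproduces the unit counts above (the "__" entries are written with real spaces in the code):

    # Pattern dictionary from above, written with real spaces instead of "__".
    dictionary = {
        "1": "ask ",
        "2": "what ",
        "3": "you",
        "4": "r country",
        "5": " can do for you",
    }

    encoded = "1not 2345 -- 12354"

    # Expand: each digit becomes its dictionary entry; every other character
    # (letters, spaces, dashes) is copied through unchanged.
    decoded = "".join(dictionary.get(ch, ch) for ch in encoded)
    print(decoded)
    # ask not what your country can do for you -- ask what you can do for your country

    # Reproduce the unit accounting: one unit per character, plus one unit
    # for each dictionary number.
    sentence_units = len(encoded)                                             # 18
    dictionary_units = sum(1 + len(entry) for entry in dictionary.values())   # 41
    print(sentence_units + dictionary_units)                                  # 59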

So how good is this system? The file-reduction ratio depends on a number of factors, including file type, file size and compression scheme.

In most languages of the world, certain letters and words often appear together in the same pattern. Because of this high rate of redundancy, text files compress very well. A reduction of 50 percent or more is typical for a good-sized text file. Most programming languages are also very redundant because they use a relatively small collection of commands, which frequently go together in a set pattern. Files that include a lot of unique information, such as graphics or MP3 files, cannot be compressed much with this system because they don't repeat many patterns (more on this in the next section).
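
You can see this effect with any general-purpose compressor. The snippet below (a rough demonstration; the exact ratios will vary from run to run) uses Python's built-in zlib module, which is based on an LZ-family algorithm, to compare redundant text against random data:

    import os
    import zlib

    # Redundant input: the same sentence repeated many times.
    text = (b"Ask not what your country can do for you -- "
            b"ask what you can do for your country. ") * 100

    # Non-redundant input: random bytes of the same length.
    noise = os.urandom(len(text))

    for label, data in (("repeated text", text), ("random bytes", noise)):
        compressed = zlib.compress(data)
        print(f"{label}: {len(data)} -> {len(compressed)} bytes")

    # The repeated text shrinks dramatically; the random data barely shrinks
    # at all (it may even grow slightly), because there are no patterns to reuse.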

If a file has a lot of repeated patterns, the rate of reduction typically increases with file size. You can see this just by looking at our example -- if we had more of Kennedy's speech, we would be able to refer to the patterns in our dictionary more often, and so get more out of each entry's file space. Also, more pervasive patterns might emerge in the longer work, allowing us to create a more efficient dictionary.

This efficiency also depends on the specific algorithm used by the compression program. Some programs are particularly suited to picking up patterns in certain types of files, and so may compress them more succinctly. Others have dictionaries within dictionaries, which might compress efficiently for larger files but not for smaller ones. While all compression programs of this sort work with the same basic idea, there is actually a good deal of variation in the manner of execution. Programmers are always trying to build a better system.

Lossy and Lossless Compression

The type of compression we've been discussing here is called lossless compression, because it lets you recreate the original file exactly. All lossless compression is based on the idea of breaking a file into a "smaller" form for transmission or storage and then putting it back together on the other end so it can be used again.

Lossy compression works very differently. These programs simply eliminate "unnecessary" bits of information, tailoring the file so that it is smaller. This type of compression is used a lot for reducing the file size of bitmap pictures, which tend to be fairly bulky. To see how this works, let's consider how your computer might compress a scanned photograph.

A lossless compression program can't do much with this type of file. While large parts of the picture may look the same -- the whole sky is blue, for example -- most of the individual pixels are a little bit different. To make this picture smaller without compromising the resolution, you have to change the color value for certain pixels. If the picture had a lot of blue sky, the program would pick one color of blue that could be used for every pixel. Then, the program rewrites the file so that the value for every sky pixel refers back to this information. If the compression scheme works well, you won't notice the change, but the file size will be significantly reduced.
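
A very rough sketch of that idea (a toy model only -- real image formats like JPEG are far more sophisticated) is to snap each pixel's color value to the nearest of a few allowed values, so that many nearly identical sky pixels become exactly identical and therefore easy to compress:

    # Toy lossy step: quantize 8-bit color values (0-255) to multiples of 32.
    # Nearly identical blues collapse onto the same value, which a lossless
    # pass can then compress well -- but the original exact values are gone.
    def quantize(value, step=32):
        return min(255, round(value / step) * step)

    sky_pixels = [200, 201, 199, 202, 198, 200, 203]   # slightly different blues
    print([quantize(p) for p in sky_pixels])
    # [192, 192, 192, 192, 192, 192, 192] -- one value for the whole patch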

Of course, with lossy compression, you can't get the original file back after it has been compressed. You're stuck with the compression program's reinterpretation of the original. For this reason, you can't use this sort of compression for anything that needs to be reproduced exactly, including software applications, databases and presidential inauguration speeches.



Saturday, March 22, 2008

How Hackers Work



Thanks to the media, the word "hacker" has gotten a bad reputation. The word summons up thoughts of malicious computer users finding new ways to harass people, defraud corporations, steal information and maybe even destroy the economy or start a war by infiltrating military computer systems. While there's no denying that there are hackers out there with bad intentions, they make up only a small percentage of the hacker community.

The term computer hacker first showed up in the mid-1960s. A hacker was a programmer -- someone who hacked out computer code. Hackers were visionaries who could see new ways to use computers, creating programs that no one else could conceive. They were the pioneers of the computer industry, building everything from small applications to operating systems. In this sense, people like Bill Gates, Steve Jobs and Steve Wozniak were all hackers -- they saw the potential of what computers could do and created ways to achieve it.

A unifying trait among these hackers was a strong sense of curiosity, sometimes bordering on obsession. These hackers prided themselves not only on their ability to create new programs, but also on learning how other programs and systems worked. When a program had a bug -- a section of bad code that prevented the program from working properly -- hackers would often create and distribute small sections of code called patches to fix the problem. Some managed to land a job that leveraged their skills, getting paid for what they'd happily do for free.

As computers evolved, computer engineers began to network individual machines together into a system. Soon, the term hacker had a new meaning -- a person using computers to explore a network to which he or she didn't belong. Usually hackers didn't have any malicious intent. They just wanted to know how computer networks worked and saw any barrier between them and that knowledge as a challenge.

(Side note: German hackers have hacked into the European passport system and stolen biometric information.)

In fact, that's still the case today. While there are plenty of stories about malicious hackers sabotaging computer systems, infiltrating networks and spreading computer viruses, most hackers are just curious -- they want to know all the intricacies of the computer world. Some use their knowledge to help corporations and governments construct better security measures. Others might use their skills for more unethical endeavors.

In this article, we'll explore common techniques hackers use to infiltrate systems. We'll examine hacker culture and the various kinds of hackers as well as learn about famous hackers, some of whom have run afoul of the law.

Hackers And Crackers
Many computer programmers insist that the word "hacker" applies only to law-abiding enthusiasts who help create programs and applications or improve computer security. Anyone using his or her skills maliciously isn't a hacker at all, but a cracker.

Crackers infiltrate systems and cause mischief, or worse. Unfortunately, most people outside the hacker community use the word as a negative term because they don't understand the distinction between hackers and crackers.

The Hacker Tool Box

The main resource hackers rely upon, apart from their own ingenuity, is code. While there is a large community of hackers on the Internet, only a relatively small number of hackers actually program code. Many hackers seek out and download code written by other people. There are thousands of different programs hackers use to explore computers and networks. These programs give hackers a lot of power over innocent users and organizations -- once a skilled hacker knows how a system works, he can design programs that exploit it.

Malicious hackers use programs to:

  • Hack passwords: There are many ways to hack someone's password, from educated guesses to simple algorithms that generate combinations of letters, numbers and symbols. The trial and error method of hacking passwords is called a brute force attack, meaning the hacker tries to generate every possible combination to gain access. Another way to hack passwords is to use a dictionary attack, a program that inserts common words into password fields. (A rough sense of the numbers a brute force attack is up against is sketched just after this list.)
  • Infect a computer or system with a virus: Computer viruses are programs designed to duplicate themselves and cause problems ranging from crashing a computer to wiping out everything on a system's hard drive. A hacker might install a virus by infiltrating a system, but it's much more common for hackers to create simple viruses and send them out to potential victims via email, instant messages, Web sites with downloadable content or peer-to-peer networks.
  • Log keystrokes: Some programs allow hackers to review every keystroke a computer user makes. Once installed on a victim's computer, the programs record each keystroke, giving the hacker everything he needs to infiltrate a system or even steal someone's identity.
  • Gain backdoor access: Similar to hacking passwords, some hackers create programs that search for unprotected pathways into network systems and computers. In the early days of the Internet, many computer systems had limited security, making it possible for a hacker to find a pathway into the system without a username or password. Another way a hacker might gain backdoor access is to infect a computer or system with a Trojan horse.
  • Create zombie computers: A zombie computer, or bot, is a computer that a hacker can use to send spam or commit Distributed Denial of Service (DDoS) attacks. After a victim executes seemingly innocent code, a connection opens between his computer and the hacker's system. The hacker can secretly control the victim's computer, using it to commit crimes or spread spam.
  • Spy on e-mail: Hackers have created code that lets them intercept and read e-mail messages -- the Internet's equivalent of wiretapping. Today, most e-mail programs use encryption formulas so complex that even if a hacker intercepts the message, he won't be able to read it.
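
To get a feel for why a pure brute force attack quickly becomes impractical against long passwords, here is a small back-of-the-envelope calculation; the 62-character set and the guessing rate are assumptions chosen purely for illustration:

    # Illustrative only: count the candidate passwords a brute-force search
    # would face, assuming a 62-character set (a-z, A-Z, 0-9).
    charset_size = 62

    for length in (4, 6, 8, 10):
        combinations = charset_size ** length
        print(f"length {length:2d}: {combinations:,} candidates")

    # Even at an (assumed) billion guesses per second, trying every
    # 10-character combination would take 62**10 / 1e9 seconds --
    # on the order of decades. This is why long, random passwords matter.
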
Hacker Hierarchy
Psychologist Marc Rogers says there are several subgroups of hackers -- newbies, cyberpunks, coders and cyber terrorists. Newbies are hackers who have access to hacking tools but aren't really aware of how computers and programs work. Cyberpunks are savvier and are less likely to get caught than a newbie while hacking a system, but they have a tendency to boast about their accomplishments. Coders write the programs other hackers use to infiltrate and navigate computer systems. A cyber terrorist is a professional hacker who infiltrates systems for profit -- he might sabotage a company or raid a corporation's databases for proprietary information.

Famous Hackers

Steve Jobs and Steve Wozniak, founders of Apple Computer, are both hackers. Some of their early exploits even resemble the questionable activities of some malicious hackers. However, both Jobs and Wozniak outgrew their malicious behavior and began concentrating on creating computer hardware and software. Their efforts helped usher in the age of the personal computer -- before Apple, computer systems remained the property of large corporations, too expensive and cumbersome for average consumers.
Linus Torvalds, creator of Linux, is another famous honest hacker. His open source operating system is very popular with other hackers. He has helped promote the concept of open source software, showing that when you open information up to everyone, you can reap amazing benefits.
Richard Stallman, also known as "rms," founded the GNU Project, a free operating system. He promotes the concept of free software and computer access. He works with organizations like the Free Software Foundation and opposes policies like Digital Rights Management.
On the other end of the spectrum are the black hats of the hacking world. At the age of 16, Jonathan James became the first juvenile hacker to get sent to prison. He committed computer intrusions on some very high-profile victims, including NASA and a Defense Threat Reduction Agency server. Online, Jonathan used the nickname (called a handle) "c0mrade." Originally sentenced to house arrest, James was sent to prison when he violated parole.


Kevin Mitnick gained notoriety in the 1980s as a hacker who allegedly broke into the North American Aerospace Defense Command (NORAD) when he was 17 years old. Mitnick's reputation seemed to grow with every retelling of his exploits, eventually leading to the rumor that Mitnick had made the FBI's Most Wanted list. In reality, Mitnick was arrested several times for hacking into secure systems, usually to gain access to powerful computer software.

Kevin Poulsen, or Dark Dante, specialized in hacking phone systems. He's famous for hacking the phones of a radio station called KIIS-FM. Poulsen's hack allowed only calls originating from his house to make it through to the station, allowing him to win various radio contests. Since then, he has turned over a new leaf, and now he's famous for being a senior editor at Wired magazine.

Adrian Lamo hacked into computer systems using computers at libraries and Internet cafes. He would explore high-profile systems for security flaws, exploit the flaws to hack into the system, and then send a message to the corresponding company, letting them know about the security flaw. Unfortunately for Lamo, he was doing this on his own time rather than as a paid consultant -- his activities were illegal. He also snooped around a lot, reading sensitive information and giving himself access to confidential material. He was caught after breaking into the computer system belonging to the New York Times.

It's likely that there are thousands of hackers active online today, but an accurate count is impossible. Many hackers don't really know what they are doing -- they're just using dangerous tools they don't completely understand. Others know what they're doing so well that they can slip in and out of systems without anyone ever knowing.

What Is the Year 2038 Problem?

The Year 2000 problem is understood by most people these days because of the large amount of media attention it received. Most programs written in the C programming language are relatively immune to the Y2K problem, but suffer instead from the Year 2038 problem. This problem arises because most C programs use a library of routines called the standard time library. This library establishes a standard 4-byte format for the storage of time values, and also provides a number of functions for converting, displaying and calculating time values.

The standard 4-byte format assumes that the beginning of time is January 1, 1970, at 12:00:00 a.m. This value is 0. Any time/date value is expressed as the number of seconds following that zero value. So the value 919642718 is 919,642,718 seconds past 12:00:00 a.m. on January 1, 1970, which is Sunday, February 21, 1999, at 16:18:38 (U.S.). This is a convenient format because if you subtract any two values, what you get is the number of seconds between them. You can then use other functions in the library to determine how many minutes, hours, days, months or years have passed between the two times.

A signed 4-byte integer has a maximum value of 2,147,483,647, and this is where the Year 2038 problem comes from. The maximum time value before it rolls over to a negative (and invalid) value is 2,147,483,647 seconds, which translates into January 19, 2038. On this date, any C programs that use the standard time library will start to have problems with date calculations.

Fortunately, this problem is somewhat easier to fix than the Y2K problem on mainframes. Well-written programs can simply be recompiled with a new version of the library that uses, for example, 8-byte values for the storage format. This is possible because the library encapsulates the whole time activity with its own time types and functions (unlike most mainframe programs, which did not standardize their date formats or calculations). So the Year 2038 problem should not be nearly as hard to fix as the Y2K problem was.
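
The arithmetic is easy to check. This short sketch (in Python rather than C, just for illustration) reproduces both the example value above and the 2038 rollover point of a signed 32-bit time value:

    from datetime import datetime, timedelta, timezone

    # Time value 0: January 1, 1970, 00:00:00 UTC (the Unix epoch).
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

    # The example value from the text: 919,642,718 seconds past the epoch.
    print(epoch + timedelta(seconds=919_642_718))
    # 1999-02-22 00:18:38+00:00 -- i.e. Sunday, February 21, 1999, 16:18:38
    # in U.S. Pacific time, matching the figure above.

    # A signed 32-bit integer tops out at 2**31 - 1 seconds past the epoch.
    print(epoch + timedelta(seconds=2**31 - 1))
    # 2038-01-19 03:14:07+00:00 -- one second later, a 32-bit time value
    # wraps around to a large negative number (a date back in 1901).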

An alert reader was kind enough to point out that IBM PC hardware suffers from the Year 2116 problem. For a PC, time begins at January 1, 1980, and increments by seconds in an unsigned 32-bit integer, in a manner similar to UNIX time. By 2116, the integer overflows.

Windows NT uses a 64-bit integer to track time. However, it uses 100 nanoseconds as its increment and the beginning of time is January 1, 1601, so NT suffers from the Year 2184 problem.

Apple states that the Mac is okay out to the year 29,940!

Friday, March 21, 2008

How Caching Works

If you have been shopping for a computer, then you have heard the word "cache." Modern computers have both L1 and L2 caches, and many now also have L3 cache. You may also have gotten advice on the topic from well-meaning friends, perhaps something like "Don't buy that Celeron chip, it doesn't have any cache in it!"

It turns out that caching is an important computer-science process that appears on every computer in a variety of forms. There are memory caches, hardware and software disk caches, page caches and more. Virtual memory is even a form of caching. In this post, we will explore caching so you can understand why it is so important.


A Simple Example: Before Cache

Caching is a technology based on the memory subsystem of your computer. The main purpose of a cache is to accelerate your computer while keeping the price of the computer low. Caching allows you to do your computer tasks more rapidly.

To understand the basic idea behind a cache system, let's start with a super-simple example that uses a librarian to demonstrate caching concepts. Let's imagine a librarian behind his desk. He is there to give you the books you ask for. For the sake of simplicity, let's say you can't get the books yourself -- you have to ask the librarian for any book you want to read, and he fetches it for you from a set of stacks in a storeroom (the Library of Congress in Washington, D.C., is set up this way). First, let's start with a librarian without cache.

The first customer arrives. He asks for the book Harry Potter. The librarian goes into the storeroom, gets the book, returns to the counter and gives the book to the customer. Later, the customer comes back to return the book. The librarian takes the book and returns it to the storeroom. He then returns to his counter and waits for another customer. Let's say the next customer asks for Harry Potter (you saw it coming...). The librarian then has to return to the storeroom to get the book he recently handled and give it to the customer. Under this model, the librarian has to make a complete round trip to fetch every book -- even very popular ones that are requested frequently. Is there a way to improve the performance of the librarian?

Yes, there's a way -- we can put a cache on the librarian. In the next section, we'll look at this same example but this time, the librarian will use a caching system.


A Simple Example: After Cache

Let's give the librarian a backpack into which he will be able to store 10 books (in computer terms, the librarian now has a 10-book cache). In this backpack, he will put the books the clients return to him, up to a maximum of 10. Let's use the prior example, but now with our new-and-improved caching librarian.

The day starts. The backpack of the librarian is empty. Our first client arrives and asks for Harry Potter. No magic here -- the librarian has to go to the storeroom to get the book. He gives it to the client. Later, the client returns and gives the book back to the librarian. Instead of returning to the storeroom to return the book, the librarian puts the book in his backpack and stands there (he checks first to see if the bag is full -- more on that later). Another client arrives and asks for Harry Potter. Before going to the storeroom, the librarian checks to see if this title is in his backpack. He finds it! All he has to do is take the book from the backpack and give it to the client. There's no journey into the storeroom, so the client is served more efficiently.

What if the client asked for a title not in the cache (the backpack)? In this case, the librarian is less efficient with a cache than without one, because the librarian takes the time to look for the book in his backpack first. One of the challenges of cache design is to minimize the impact of cache searches, and modern hardware has reduced this time delay to practically zero. Even in our simple librarian example, the latency time (the waiting time) of searching the cache is so small compared to the time to walk back to the storeroom that it is irrelevant. The cache is small (10 books), and the time it takes to notice a miss is only a tiny fraction of the time that a journey to the storeroom takes.
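
Here is the librarian's backpack written as a tiny Python sketch (a toy model of a cache, with made-up behavior, not real cache hardware), showing the hit/miss pattern described above:

    # Toy one-level cache: a 10-book "backpack" in front of a slow storeroom.
    BACKPACK_SIZE = 10
    backpack = []                     # the most recently returned books

    def fetch(title):
        if title in backpack:         # cache hit: no trip to the storeroom
            print(f"hit:  {title} (handed over from the backpack)")
        else:                         # cache miss: slow round trip
            print(f"miss: {title} (walk to the storeroom and back)")

    def give_back(title):
        if title not in backpack:
            backpack.append(title)
        if len(backpack) > BACKPACK_SIZE:
            backpack.pop(0)           # backpack full: oldest book goes back to the storeroom

    fetch("Harry Potter")             # miss -- first request of the day
    give_back("Harry Potter")
    fetch("Harry Potter")             # hit -- served straight from the backpack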

From this example you can see several important facts about caching:

  • Cache technology is the use of a faster but smaller memory type to accelerate a slower but larger memory type.

  • When using a cache, you must check the cache to see if an item is in there. If it is there, it's called a cache hit. If not, it is called a cache miss and the computer must wait for a round trip from the larger, slower memory area.

  • A cache has some maximum size that is much smaller than the larger storage area.

  • It is possible to have multiple layers of cache. With our librarian example, the smaller but faster memory type is the backpack, and the storeroom represents the larger and slower memory type. This is a one-level cache. There might be another layer of cache consisting of a shelf that can hold 100 books behind the counter. The librarian can check the backpack, then the shelf and then the storeroom. This would be a two-level cache.

Computer Caches

A computer is a machine in which we measure time in very small increments. When the microprocessor accesses the main memory (RAM), it does it in about 60 nanoseconds (60 billionths of a second). That's pretty fast, but it is much slower than the typical microprocessor. Microprocessors can have cycle times as short as 2 nanoseconds, so to a microprocessor 60 nanoseconds seems like an eternity.

What if we build a special memory bank on the motherboard, small but very fast (around 30 nanoseconds)? That's already twice as fast as main memory access. That's called a level 2 cache, or L2 cache. What if we build an even smaller but faster memory system directly into the microprocessor's chip? That way, this memory will be accessed at the speed of the microprocessor and not the speed of the memory bus. That's an L1 cache, which on a 233-megahertz (MHz) Pentium is 3.5 times faster than the L2 cache, which is two times faster than the access to main memory.

Some microprocessors have two levels of cache built right into the chip. In this case, the motherboard cache -- the cache that exists between the microprocessor and main system memory -- becomes level 3, or L3 cache.

There are a lot of subsystems in a computer; you can put cache between many of them to improve performance. Here's an example. We have the microprocessor (the fastest thing in the computer). Then there's the L1 cache, which caches the L2 cache, which caches the main memory, which can be used (and often is) as a cache for even slower peripherals like hard disks and CD-ROMs. The hard disks are also used to cache an even slower medium -- your Internet connection.

Caching Subsystems

Your Internet connection is the slowest link in your computer. So your browser (Internet Explorer, Netscape, Opera, etc.) uses the hard disk to store HTML pages, putting them into a special folder on your disk. The first time you ask for an HTML page, your browser renders it and a copy of it is also stored on your disk. The next time you request this page, your browser checks whether the date of the file on the Internet is newer than the one cached. If the date is the same, your browser uses the copy on your hard disk instead of downloading it from the Internet. In this case, the smaller but faster memory system is your hard disk and the larger and slower one is the Internet.

Cache can also be built directly on peripherals. Modern hard disks come with fast memory, around 512 kilobytes, hardwired to the hard disk. The computer doesn't directly use this memory -- the hard-disk controller does. For the computer, these memory chips are the disk itself. When the computer asks for data from the hard disk, the hard-disk controller checks into this memory before moving the mechanical parts of the hard disk (which is very slow compared to memory). If it finds the data that the computer asked for in the cache, it will return the data stored in the cache without actually accessing data on the disk itself, saving a lot of time.

Here's an experiment you can try. Your computer caches your floppy drive with main memory, and you can actually see it happening. Access a large file from your floppy -- for example, open a 300-kilobyte text file in a text editor. The first time, you will see the light on your floppy turning on, and you will wait. The floppy disk is extremely slow, so it will take 20 seconds to load the file. Now, close the editor and open the same file again. The second time (don't wait 30 minutes or do a lot of disk access between the two tries) you won't see the light turning on, and you won't wait. The operating system checked into its memory cache for the floppy disk and found what it was looking for. So instead of waiting 20 seconds, the data was found in a memory subsystem much faster than when you first tried it (one access to the floppy disk takes 120 milliseconds, while one access to the main memory takes around 60 nanoseconds -- that's a lot faster). You could have run the same test on your hard disk, but it's more evident on the floppy drive because it's so slow.

To give you the big picture of it all, here's the hierarchy of a normal caching system:

  • L1 cache - Memory accesses at full microprocessor speed (10 nanoseconds, 4 kilobytes to 16 kilobytes in size)
  • L2 cache - Memory access of type SRAM (around 20 to 30 nanoseconds, 128 kilobytes to 512 kilobytes in size)
  • Main memory - Memory access of type RAM (around 60 nanoseconds, 32 megabytes to 128 megabytes in size)
  • Hard disk - Mechanical, slow (around 12 milliseconds, 1 gigabyte to 10 gigabytes in size)
  • Internet - Incredibly slow (between 1 second and 3 days, unlimited size)
As you can see, the L1 cache caches the L2 cache, which caches the main memory, which can be used to cache the disk subsystems, and so on.

Cache Technology

One common question asked at this point is, "Why not make all of the computer's memory run at the same speed as the L1 cache, so no caching would be required?" That would work, but it would be incredibly expensive. The idea behind caching is to use a small amount of expensive memory to speed up a large amount of slower, less-expensive memory.

In designing a computer, the goal is to allow the microprocessor to run at its full speed as inexpensively as possible. A 500-MHz chip goes through 500 million cycles in one second (one cycle every two nanoseconds). Without L1 and L2 caches, an access to the main memory takes 60 nanoseconds, or about 30 wasted cycles accessing memory.

When you think about it, it is kind of incredible that such relatively tiny amounts of memory can maximize the use of much larger amounts of memory. Think about a 256-kilobyte L2 cache that caches 64 megabytes of RAM. In this case, 256,000 bytes efficiently caches 64,000,000 bytes. Why does that work?

In computer science, we have a theoretical concept called locality of reference. It means that in a fairly large program, only small portions are ever used at any one time. As strange as it may seem, locality of reference works for the huge majority of programs. Even if the executable is 10 megabytes in size, only a handful of bytes from that program are in use at any one time, and their rate of repetition is very high.
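
A quick way to convince yourself of this is to simulate it. The sketch below (all the numbers are made up for illustration) puts a 256-slot cache in front of a 64,000-slot memory and sends it accesses with strong locality -- most requests go to a small hot set -- and the small cache ends up serving the large majority of them:

    import random
    from collections import OrderedDict

    # Toy simulation (assumed parameters): a 256-slot cache in front of a
    # 64,000-slot "main memory". 90% of accesses go to a small hot set of
    # 200 addresses -- that's locality of reference.
    CACHE_SIZE, MEMORY_SIZE, HOT_SET = 256, 64_000, 200

    cache = OrderedDict()             # OrderedDict doubling as a simple LRU cache
    hits = misses = 0

    random.seed(0)
    for _ in range(100_000):
        if random.random() < 0.9:
            address = random.randrange(HOT_SET)        # hot address, reused often
        else:
            address = random.randrange(MEMORY_SIZE)    # cold address, rarely reused
        if address in cache:
            hits += 1
            cache.move_to_end(address)                 # mark as recently used
        else:
            misses += 1
            cache[address] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)              # evict the least recently used entry

    print(f"hit rate: {hits / (hits + misses):.0%}")   # roughly 90% on this toy workload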

How to Convert FAT into NTFS

To convert a FAT partition to NTFS, perform the following steps:

  1. Click Start, click Programs, and then click Command Prompt. (In Windows XP, click Start, click Run, type cmd and then click OK.)
  2. At the command prompt, type CONVERT [driveletter]: /FS:NTFS.
  3. Convert.exe will attempt to convert the partition to NTFS.

NOTE: Although the chance of corruption or data loss during the conversion from FAT to NTFS is minimal, it is best to perform a full backup of the data on the drive that is to be converted before executing the convert command. It is also recommended to verify the integrity of the backup before proceeding, as well as to run RDISK and update the emergency repair disk (ERD).