Networking Basics

To build an ethernet network, you need to understand a few basic concepts. (Actually, if you are prepared to follow blindly the instructions given in these pages, you don't need to understand any of this, but you will get into difficulties when things go wrong.)

Networking hardware - the actual bits of wire - is divided into two main types: those suitable for local area networks (LANs) and those suitable for wide area networks (WANs). As the name implies, a LAN spans a small area, maybe a couple of kilometres wide. A WAN can span the world. WAN connections are often used to link distant LANs together. Fast LAN connections are cheap and fast WAN connections are expensive, so WANs tend to be slower than LANs.

The world-wide network known as the Internet is a collection of LANs linked by WAN connections. The complexities of the internet are hidden by the TCP/IP protocol, which lets you address any computer by its IP address, however convoluted the path between your computer and it.

TCP/IP is a networking protocol, a set of rules that two computers use to communicate with each other. Strictly speaking, IP and TCP are separate protocols, but they are closely associated. TCP depends upon IP, so you can have IP without TCP, but not TCP without IP.

Other protocols include NetBEUI, developed by IBM and widely used by Microsoft on Windows PCs, and IPX/SPX, developed by Novell and used in NetWare networks.

The networking protocol and the networking hardware are independent of each other. A set of computers connected together on a network can communicate with each other using different protocols - some could be using NetBEUI while others were using TCP/IP. In fact, two computers could communicate simultaneously using different protocols for different purposes. Conversely, a network of equipment all using the same protocol can be built from several bits based on different hardware - ethernet, ADSL and so on.

Protocols can be used together. For example, a web server and a browser communicate via the Hypertext Transfer Protocol (HTTP), but this usually runs over TCP, which itself runs over IP. (When we say that one protocol "runs over" another, we mean that the data is encoded according to the one protocol, and then the result of that is encoded according to the next, and so on.)
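The idea of one protocol running over another can be pictured as nested wrapping. The following is a toy sketch only: the "[TCP]" and "[IP]" labels are invented placeholders, not the real binary header formats, and the function names are made up for illustration.

```python
# Toy illustration of protocol layering. The "headers" are made-up text
# labels, not the real TCP or IP header layouts.

def wrap(header: bytes, payload: bytes) -> bytes:
    """Prefix the payload with a header, as a lower layer would."""
    return header + payload

def unwrap(header: bytes, data: bytes) -> bytes:
    """Strip the expected header and return the payload."""
    if not data.startswith(header):
        raise ValueError("unexpected header")
    return data[len(header):]

http_request = b"GET / HTTP/1.0\r\n\r\n"
tcp_segment = wrap(b"[TCP]", http_request)   # HTTP runs over TCP
ip_packet = wrap(b"[IP]", tcp_segment)       # TCP runs over IP

# The receiver peels the layers off in the reverse order.
recovered = unwrap(b"[TCP]", unwrap(b"[IP]", ip_packet))
print(ip_packet)
print(recovered == http_request)
```

Each layer treats whatever the layer above hands it as opaque data, which is why the layers can be mixed and matched.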

The IP protocol

According to the Internet Protocol (IP), each computer on a network has a unique address. This is expressed as four numbers, each 0-255, for example 212.250.9.243. If you know your binary numbers, you can see that this is just an easy way to represent a 32-bit number divided into four fields, eight bits each. (It will help if you understand binary numbers, but don't worry if you don't. Just skip the parts that mention them.)
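If you want to see the 32-bit number behind a dotted address, a few lines of Python will show it. This is just an illustrative sketch using Python's standard ipaddress module; the variable names are my own.

```python
import ipaddress

address = ipaddress.IPv4Address("212.250.9.243")

# The same address as a single 32-bit number.
as_int = int(address)

# Split the 32 bits into four 8-bit fields, highest field first.
fields = [(as_int >> shift) & 0xFF for shift in (24, 16, 8, 0)]

print(as_int)
print(fields)
print(" ".join(f"{field:08b}" for field in fields))
```

The last line prints the four fields in binary, eight bits each.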

Many devices on the network have IP addresses. A printer may have its own IP address, for example, so that it can be shared between a set of computers. A computer may have no IP address (in which case it's not on the Internet), one IP address or many.

Data travelling through the Internet is broken into pieces called packets and each packet contains the IP address of the computer that sent it (the source address) and the computer that it's sent to (the destination address). The sending computer breaks the stream of data into packets and the receiving computer reassembles the stream as it receives the packets.

The packets are delivered across the network using control devices called routers (see below). The sending computer only needs to know how to get the packet to its local router.

If two computers on the internet have the same IP address, this will usually cause serious problems - packets will be sent to one or the other unpredictably. There is a central authority responsible for doling out the addresses.

A number of address ranges are reserved for local networks, for example 192.168.x.x (192.168.0.0 to 192.168.255.255). The router at the edge of a local network should not let traffic in or out destined for one of these addresses. This allows you to assign addresses within your own local network without having to ask the central authority. There are millions of other computers around the world with the same address, but they are all isolated from each other.
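Besides 192.168.x.x, the other ranges reserved for private networks are 10.x.x.x and 172.16.x.x to 172.31.x.x. Python's standard ipaddress module knows about these, so a quick check might look like the sketch below. The helper name is_private is my own, and note that the module also counts a few other special ranges (such as loopback) as private.

```python
import ipaddress

def is_private(text: str) -> bool:
    """True if the address lies in a range reserved for private use."""
    return ipaddress.ip_address(text).is_private

for text in ("192.168.0.4", "10.1.2.3", "172.16.0.1", "212.250.9.243"):
    print(text, "private" if is_private(text) else "public")
```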

If I want one of the computers at my house to communicate with one outside and get an answer back, these private local addresses will not do, because the routers around the world won't pass traffic for them. I need a unique public IP address, and I need to connect to a router which will cooperate by passing my traffic to other routers on the internet. For that, I need an Internet Service Provider (ISP). I use the cable TV company NTL. A fixed monthly fee buys me a public IP address for one of my PCs plus a connection to one of NTL's routers and from there onto the rest of the internet. My access is through a cable modem, which connects over the same cable that they use to deliver TV to my house.

NTL could give each customer's PC a fixed (or static) IP address, but some of the time, some of those devices are turned off, and don't need an address. NTL supply a temporary (dynamic) IP address on demand taken from a small pool. These addresses are supplied on a lease of a few hours. At the end of the lease, the computer can ask for a lease extension, but it may get a different address.

So, with NTL's cable modem system, I get a public IP address for one computer which will work for the next few hours. To connect a local area network containing several computers, I need a gateway of some sort. More on that later.

Most people connect a single computer at their house to an ISP via a telephone modem. This provides a TCP/IP connection just like any other, usually with a dynamic IP address which lasts until the end of the phone call. There is nothing to stop you connecting a small network using a telephone modem. A cable modem is just faster.

One final thing about IP. Most of the current networking equipment implements IP version 4, with its 32-bit addresses. That only allows about 4,000,000,000 unique addresses, and the day is approaching when that won't be enough. IP version 6 has 128-bit addresses, which solves the problem, but to use it, you have to buy new equipment. Not everybody is going to do that overnight, so IP version 6 is being rolled out in the core parts of the internet first and will gradually spread outwards. Wherever a bit of equipment that complies with version 4 meets something that complies with version 6, various tricks are done to join them up. The cheap equipment that I describe in these pages runs IP version 4. One day it will become obsolete, but not for a while yet. You will get your money's worth out of it.
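The difference between 32-bit and 128-bit addresses is easier to appreciate as numbers. This is simple arithmetic, shown here in Python; the variable names are my own.

```python
# Number of distinct addresses available with each address width.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"IP version 4: {ipv4_addresses:,}")
print(f"IP version 6: {ipv6_addresses:,}")
```

The version 4 figure is the "about 4,000,000,000" mentioned above. The version 6 figure is that number multiplied by itself four times.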

Subnets and the Subnet Mask

There is a lot of confusion about this matter, but really, it is very simple (although you do need to understand binary numbers).

A set of computers which are close to each other often need to communicate when they don't know each other's IP addresses. Various discovery protocols work by broadcasting packets to all local computers. Effectively, these say "This is my IP address. Is anybody there?". The others reply with their IP addresses, and then everybody can communicate.

You don't want these broadcast packets circulating round the entire internet, or it would get swamped, so broadcasts are only allowed locally, within a range of IP addresses. This is called a subnet. The subnet can be any address range, for example, the address range 192.168.0.0 to 192.168.0.15. A computer in a subnet needs to know its own IP address and the subnet address range.

The subnet address range is defined by the subnet mask. For a sixteen-address subnet, the mask is 255.255.255.240. This is clearer in binary:

11111111 11111111 11111111 11110000

The 1 bits at the top define those parts of the address which are fixed, the 0 bits at the bottom define those that can vary. This information, combined with any address in the subnet, gives you the list of addresses. If my computer knows its address and the subnet mask, it can figure out the subnet address range. For example, if my address is 192.168.0.4, then the addresses in the range are 192.168.0.x, where x is a four-bit number (0 to 15).
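The calculation a computer does here can be sketched in Python using the standard ipaddress module. The function name subnet_range is my own invention; the addresses are the ones from the example.

```python
import ipaddress

def subnet_range(address: str, mask: str):
    """Return the first and last address of the subnet containing `address`."""
    interface = ipaddress.ip_interface(f"{address}/{mask}")
    network = interface.network
    return str(network[0]), str(network[-1])

print(subnet_range("192.168.0.4", "255.255.255.240"))
print(subnet_range("192.168.0.100", "255.255.255.240"))
```

The second call shows that the same mask applied to a different address gives a different sixteen-address range.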

Exercise

What is the subnet mask for the address range 192.168.0.0 to 192.168.255.255?

What range does the mask 255.255.255.0 give?

Work these out before reading further.

To broadcast within the subnet, a computer just sends to the highest address (for the mask 255.255.255.240, that is 192.168.0.15). The network interface cards take care of the details.

This means that you can't assign that address to a computer. For historical reasons, you shouldn't assign the lowest address in the range either (in our case, 192.168.0.0) because that was used as the broadcast address a long time ago, and certain very old equipment may get confused. That leaves the range 192.168.0.1 to 192.168.0.14.
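Python's standard ipaddress module follows the same convention of leaving out the lowest and highest addresses, so it can be used to list the assignable ones. A small sketch, using the sixteen-address subnet from the example:

```python
import ipaddress

network = ipaddress.ip_network("192.168.0.0/255.255.255.240")

# hosts() skips the lowest address and the broadcast address.
hosts = list(network.hosts())

print("lowest address:   ", network.network_address)
print("broadcast address:", network.broadcast_address)
print("usable addresses: ", hosts[0], "to", hosts[-1], f"({len(hosts)} in all)")
```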

By convention, the first address in the range is used for the local gateway, the device which connects the subnet to the rest of the network.

To communicate, each of the computers in a subnet needs a unique IP address, and they all need to agree on the same subnet mask. To communicate with devices outside the subnet, each computer needs to know the address of the gateway.

In a large LAN, the network managers might divide the network into several subnets, maybe of different sizes (that is, with different subnet masks). If you are building your own small network, you can just use a single subnet, for example 192.168.0.x (192.168.0.0 to 192.168.0.255, subnet mask 255.255.255.0). All your computers can now broadcast to each other when they need to, and those broadcast packets are kept within your house, which is as it should be.

The Internet is composed of a set of subnets within subnets, each identified by a unique network number and a subnet mask. If you have an IP address and a subnet mask, the network number is the logical AND of those two 32-bit numbers. Using the example above, if your IP address is 192.168.0.4 and your subnet mask is 255.255.255.240, the network number of your subnet is 192.168.0.0.
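The AND operation can be written out directly. In this sketch the function name network_number is my own; it converts both dotted values to 32-bit numbers, ANDs them, and converts the result back.

```python
import ipaddress

def network_number(address: str, mask: str) -> str:
    """Bitwise AND of a 32-bit address and a 32-bit subnet mask."""
    address_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(address_bits & mask_bits))

print(network_number("192.168.0.4", "255.255.255.240"))
print(network_number("192.168.0.100", "255.255.255.240"))
```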

Many people find network numbers and subnet masks very mysterious, and it takes a bit of work to get your head around them. The subnet is defined that way to make it easier to build certain types of networking equipment, particularly routers (see next section). If the equipment is easy to build, then it's going to be cheap and run quickly.

Routers

To understand IP fully, you need to know how routers work.

The internet is a set of interconnected networks, controlled by routers. Each router has a number of network cables plugged into it and it mediates the traffic between them. It's called a router, because it knows which cable to send each packet down, ie it knows the next bit of the route through the network that the packet should take. Routers maintain lists of other routers that they know about, and exchange information with them. If a router knows the route to get to another router, it will share this information when asked. If it doesn't know, it will ask other routers for help, they may ask others, and so on.

A router may have more than one cable (or network interface) connected to it, ie it acts as a gateway connecting many subnets together. Each of these interfaces has its own IP address so a computer can send data packets to it. To get a packet delivered to a computer in another subnet, send it to your local gateway router. That router is connected to a set of routers connected to other subnets, and so on.

Routers exchange routing information, so if one router doesn't know where to send a data packet, it can ask nearby routers if they know, and so on. (It only needs to know the first bit of the route - "If I have a packet of data for this address, I send it to this router, and it knows where to send it next".) Once a router has figured out a route to a distant computer, it remembers it, so it doesn't go through that rigmarole for every packet.

If a part of the network goes out of action for a while, the routers around it try to figure out an alternative route, so the network is self-healing to an extent.

A router spends most of its time passing packets of data between its network interfaces, so when a packet arrives, it has to figure out what to do with it quickly. It maintains a routing table which is little more than a list of subnets (network numbers and subnet masks) and a note of which router deals with each subnet. For each incoming packet, the router takes the destination IP address from the packet header. (The packet does not carry a subnet mask; the masks come from the routing table.) It combines the address with the mask of each entry to get a network number and checks whether that matches the entry. The matching entry tells it whether to deliver the packet locally or send it to another router.
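A routing table lookup of this kind can be sketched in a few lines of Python. Everything specific here is invented for illustration: the table entries, the next-hop names and the function name. The rule of preferring the most specific match (the longest mask) when several entries match is how real routers break ties, although the text above does not go into it.

```python
import ipaddress

# Hypothetical routing table: (subnet, where to send matching packets).
ROUTING_TABLE = [
    (ipaddress.ip_network("192.168.0.0/28"), "deliver locally"),
    (ipaddress.ip_network("192.168.0.0/16"), "router-a"),
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),  # matches everything
]

def next_hop(destination: str) -> str:
    """Pick the most specific routing table entry containing the destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if address in net]
    best_net, best_hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return best_hop

for destination in ("192.168.0.4", "192.168.5.9", "212.250.9.243"):
    print(destination, "->", next_hop(destination))
```

The test `address in net` is the AND-and-compare operation: it masks the destination with the entry's subnet mask and compares the result with the entry's network number.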

The computers in your house or office get data packets into the internet by sending them to an Internet Service Provider (ISP). They have network connections to lots of other ISPs, all controlled by routers. This forms a complex net of cross-connections. The big backbone providers like WorldCom/Pipex offer connections between countries. A local Internet Service Provider rents capacity from one or more backbone providers. You can think of your ISP as an organisation that buys network capacity wholesale from other ISPs and sells it retail to you.

You can get an insight into routing by running the tracert tool under MS Windows. On a PC that is connected to the internet, go to the start menu, choose run, and type the following into the resulting box:

	 tracert www.amazon.com

(On a Linux PC, use the traceroute command.)

This will produce a list of information showing the chain of routers that will be used to get a packet of data from your computer to Amazon's web server.

The TCP protocol

The IP protocol lets you send a packet of data from computer A to computer B across the network, but that's pretty much all it does. The Transmission Control Protocol (TCP) uses the facilities of IP to create and maintain a large stream of data between the two ends. The data is broken into packets (packetised, also called segmented), and put back together (reassembled) at the other end. Packets may take different routes through the network. They may arrive out of order or get lost altogether. The TCP protocol specifies that the sending computer numbers the packets so that the receiving computer can reassemble them correctly and request that any missing ones be resent. The protocol also specifies that the sender adds extra data which allows the receiver to detect if the data in the packet got scrambled. The receiving computer can then guarantee that the data has been transmitted faithfully. (Technically, it either makes that guarantee or announces that it has failed. The point is that it doesn't lose data and not tell you.)
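The numbering and checking can be illustrated with a much simplified sketch. It is not real TCP: real TCP numbers the bytes in the stream rather than whole packets and uses its own 16-bit checksum, whereas this sketch numbers packets and borrows the CRC-32 function from Python's zlib module. The function names are invented for the example.

```python
import zlib

def packetise(data: bytes, size: int):
    """Split data into numbered packets, each carrying a checksum."""
    packets = []
    for number, start in enumerate(range(0, len(data), size)):
        chunk = data[start:start + size]
        packets.append((number, zlib.crc32(chunk), chunk))
    return packets

def reassemble(packets, expected_count: int):
    """Return (data, missing). data is None if any packet must be resent."""
    good = {}
    for number, checksum, chunk in packets:
        if zlib.crc32(chunk) == checksum:   # ignore scrambled packets
            good[number] = chunk
    missing = [n for n in range(expected_count) if n not in good]
    if missing:
        return None, missing
    return b"".join(good[n] for n in range(expected_count)), []

message = b"The quick brown fox jumps over the lazy dog"
packets = packetise(message, 8)

# Packets arriving in the wrong order are put back in sequence.
print(reassemble(list(reversed(packets)), len(packets)))

# A lost packet is reported so that it can be requested again.
print(reassemble(packets[:2] + packets[3:], len(packets)))
```

The receiver either returns the complete data or says which packets it still needs, which mirrors the guarantee described above.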

TCP works using the facilities of IP. This is called a layered protocol, and the TCP part is said to run "on top of" the IP part. There are more layers. Underneath IP is the electrical part of the network. This can be ethernet, token ring, voice telephone lines, fibre optic, or whatever. If you connect to a distant computer, then it's highly likely that the connection will be made in stages, using different networking systems at each stage. IP conceals all these details from you, and lets you direct packets to an IP address.

There are layers on top of TCP as well. Many computers around the Internet provide specific services, using TCP connections to exchange data. For example, a computer acting as a web server uses the HTTP protocol over TCP over IP to deliver pages written in HTML (the HyperText Markup Language).

One very important service is the Domain Name System (DNS), which usually runs on top of UDP (see below) and sometimes on top of TCP. A domain name server translates a name to an IP address and an IP address to a name. This allows you to address a computer by a name ("www.amazon.com" or whatever) rather than by its IP address. For example, a web browser issues HTTP requests containing the DNS name of a web server. Behind the scenes, the networking software on the client computer uses a DNS server to translate the web server's name into its IP address. It packages up the HTTP request into one or more packets encoded according to the TCP protocol, encodes them according to the IP protocol, and sends them to that IP address. As already explained, the client computer sends each packet to its local gateway router, which knows which router is at the other end of the next leg of the journey, and it sends the packet on, and so on. TCP handles the integrity of the data as it is broken up into packets and IP takes care of getting each packet to its destination. The web server software (Apache, IIS or whatever) interprets the resulting HTTP request and sends a stream of HTML in response. This travels back to the client computer by a similar process. The browser software interprets the stream of HTML and displays it on the screen.

It's worth mentioning that when a computer sends a long stream of packets across the network using TCP/IP (for example when a web server sends a page with lots of graphics to a web browser), the receiver has to send packets back - acknowledgements, requests to resend, requests to slow down the rate, speed up the rate and so on. This housekeeping data is quite substantial. The designers of my cable modem assumed that I will mainly use it for web browsing, so at first sight, it only needs to handle a tiny amount of data out (a request for a web page) but a lot of data in (the resulting page), so the incoming channel can be much faster than the outgoing one. However, a stream of incoming packets generates so many housekeeping packets that the outgoing channel has to be about one quarter as fast as the incoming one. If the outgoing channel is too slow, housekeeping packets will be lost. The sending computer will then tend to send more data than the receiver can handle, causing resend requests, which waste network capacity.

The UDP Protocol

The User Datagram Protocol (UDP) is an alternative to TCP. It also runs over IP (so it should really be UDP/IP). It does a similar job to TCP, but it doesn't guarantee delivery. If packets go astray, they just get lost. Without the overhead of these checks, it is much faster.

UDP is used for applications such as video and audio, where the odd lost packet does not matter, but the stream has to keep up with a schedule. If you are listening to music online, you probably won't notice the glitch when the odd packet is lost, and you don't want the player to pause for a tenth of a second while it negotiates a resend.
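To give a feel for how little ceremony UDP involves, here is a minimal sketch that sends one datagram from a program to itself over the loopback address 127.0.0.1, using Python's standard socket module. The function name and the payload are invented for the example. Notice that there is no connection set-up and no acknowledgement: if the datagram does not arrive, the receiver simply times out.

```python
import socket
from typing import Optional

def send_one_datagram(payload: bytes, timeout: float = 1.0) -> Optional[bytes]:
    """Send a UDP datagram to ourselves; return it, or None if it got lost."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the system pick a free port
        receiver.settimeout(timeout)
        port = receiver.getsockname()[1]
        sender.sendto(payload, ("127.0.0.1", port))
        try:
            data, _source = receiver.recvfrom(2048)
            return data
        except socket.timeout:
            return None                   # UDP gives no resend; the data is gone
    finally:
        sender.close()
        receiver.close()

print(send_one_datagram(b"audio frame 1"))
```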

Ethernet

Ethernet is a set of networking technologies. The most common are 10base-T and 100base-T. These run at 10 Mbit/sec and 100 Mbit/sec respectively and use twisted pair cables. These are just pairs of copper wires twisted together, packaged inside a plastic sheath. Twisting a pair of wires makes them less susceptible to interference - the effect of the interference on a section of cable tends to be cancelled out by its effect on the next section, which is facing the opposite direction.

Category 5 cable can handle both speeds. It comes in either Shielded Twisted Pair (STP) cable (which has an earthed screen around the pairs to protect them further) or Unshielded Twisted Pair (UTP) cable, terminated at each end with RJ45 connectors. STP is better, UTP is cheaper and easier to get. There are two further options: patch cable, which is more flexible, and trunk cable, which is less so, and meant for permanent installations underneath the floor, or whatever. For home use, you will probably buy UTP Cat 5 patch cable. It's easily available and fairly cheap.

(Prices can vary by a factor of ten so it's worth shopping around.)

Cat 5 cable can also carry telephone signals, so it's often used in offices to carry network and phone traffic between desks. (The voltages are different, so don't get the cables mixed up. If you plug a 50 volt phone cable into an ethernet card, you are liable to wreck the card.)

To connect to an ethernet network, a PC needs an adapter card. Just about all the cards you can buy now support 10base-T and 100base-T and are auto-sensing, which means that they figure out what speed the equipment they are connected to will handle, and run at that.

A cat 5 cable connects two points. If you have just two computers, you can just connect them together. To create a bigger network, you need to link the connections together. The easy way to do this is to use a hub with one port per computer. The hub takes the signal on any wire and replicates it on the others - it's also called a repeater. That means that all the wires must run at the same speed: 10 Mbps if the slowest device goes at that speed.

The alternative is a switch. A switch acts like a hub, but it figures out the addresses of the computers attached to each port. It only sends traffic for those addresses to that port. Most switches can also handle multiple speeds - one port can run at 10 Mbps while another runs at 100 Mbps. If two computers are connected to fast ports, they can communicate at full speed. Switches used to be a bit too expensive for home use, but both switches and hubs are now very cheap.

When I built a network at home a few years ago I used a five-port hub. My 3COM CMX cable modem only has a 10base-T ethernet port on it, so I got a 10base-T hub, and everything ran at that speed. A few months ago, I ran out of ports on my hub and bought a sixteen-port dual-speed autosensing switch. This is complete overkill, but it was very cheap.

A 10base-T wire can theoretically carry 10 Mbits of data per second. In fact, protocol overheads reduce the speed to about 6 Mbit/sec, and that's only achievable if one device is sending a constant stream of data to one other. If there are more than two devices connected to the hub, there's a high probability that two will try to send a packet at once. The resulting interference is called a collision. Ethernet does not handle collisions very well, and the resulting tizwaz reduces the data rate to about 3 Mbit/sec. By restricting the traffic to the right wires, a switch reduces the number of collisions and localises the bad effects.

Multiply all those numbers by ten for a 100base-T network.

At present, a 10base-T hub is ample for a network of half a dozen PCs in a house. A PC just cannot stream out data quickly enough to swamp the network, regardless of the nominal speed of its interface card. In any case, my cable modem will only carry 500 Kbps.

In the future, when we all run heavy video-based network applications, 10 Mbps may not be enough, but by that time, better home networking systems will probably be available. Meanwhile, you can build an effective LAN at home for less than 50 GBP.

For a bigger network, say in an office, you should definitely use a switch instead of a hub.

Another aspect of a hub which is worth mentioning is that every computer on the network sees every packet, including all the ones that are not aimed at it. It is fairly easy to set a computer up as a "network sniffer" and spy on the passing traffic. Hackers catch passwords this way. If you are worried about your flat-mates monitoring your email, don't build a shared network with a hub. Use a switch.

As I explained elsewhere in these notes, ethernet and TCP/IP are not inextricably linked. A single ethernet network can handle different networking protocols at once, in fact there could be two connections open between two PCs running over the same wires, one connection using TCP/IP and the other using Novell's IPX. Furthermore, you can run a TCP/IP connection over a network which is not ethernet - the NTL side of my cable modem is not ethernet, for example. Ethernet and TCP/IP are just common networking standards which are often used together.

NAT routers

A Network Address Translation (NAT) router connects a subnetwork into a bigger network, making it look as if all traffic from the internal network comes from a single IP address. This address can be dynamic (assigned through the Dynamic Host Configuration Protocol (DHCP)) or static (fixed).

The local network usually uses addresses such as 192.168.0.x, which are reserved for private local use, as explained above. If a computer on the internal network with address 192.168.0.5 tries to connect to an external server, it sends the request packets to the NAT router, which is its gateway. Each packet contains the destination address and the sending address, so that the recipient knows where to send replies. The gateway passes the packets from one network interface to the other, but it changes the sending address so that the packet appears to come from it (using the address of the external interface), so that replies come back to it. When a reply arrives, the router passes it back to the address which is expecting it. It appears to all the servers on the main network that they are talking to a single computer when they are in fact talking to many.
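The address rewriting can be sketched as a small lookup table. The text above does not say how the router remembers who is expecting a reply; most NAT routers do it by giving each internal connection its own port number on the external interface, and that is what this sketch assumes. The class name, the starting port 40000 and the public address 203.0.113.7 (taken from a range reserved for documentation) are all invented for the example.

```python
class NatRouter:
    """Toy NAT router that rewrites source addresses on the way out."""

    def __init__(self, public_address: str, first_port: int = 40000):
        self.public_address = public_address
        self.next_port = first_port   # arbitrary starting point for this sketch
        self.by_public_port = {}      # public port -> (private address, private port)
        self.by_private = {}          # (private address, private port) -> public port

    def outbound(self, source: str, source_port: int):
        """Return the (address, port) that the outside world will see."""
        key = (source, source_port)
        if key not in self.by_private:
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return self.public_address, self.by_private[key]

    def inbound(self, public_port: int):
        """Return the private (address, port) for a reply, or None to drop it."""
        return self.by_public_port.get(public_port)

nat = NatRouter("203.0.113.7")
seen_outside = nat.outbound("192.168.0.5", 5555)
print(seen_outside)                   # what the external server sees
print(nat.inbound(seen_outside[1]))   # the reply finds its way back
print(nat.inbound(12345))             # unsolicited traffic has nowhere to go
```

The last line returns None because nothing inside asked for that traffic, so the router has no table entry for it.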

It is not possible to initiate a connection from outside the local network, only reply to a request. This protects the computers inside from attack. Also, the gateway does not pass broadcast packets between the external network and the internal. This ensures that the computers inside can only offer services to each other. Computers outside the local network cannot even "see" the ones inside.

A NAT router can be used for many purposes. In particular, it can be used to extend a network connection that is intended to support one computer so that it supports many. For example, this is how we connect all the computers in a home network to the internet through a single phone line.

Firewalls

If you want to run a web server on your premises and you want people outside to access it, then a simple NAT router won't do. You need a set of static IP addresses (one per public server) and a device that gives limited access to them. This is the job of a firewall.

A firewall is a router (it has external and internal network interfaces and passes traffic between them). The firewall examines each incoming packet and only allows it through if it conforms to a set of rules which are preset by the network manager.

For example, inside the network are three server computers, one running database server software, one running web server software and the third running mail server software. The firewall only allows incoming requests addressed to the web server and the mail server. You can't reach the database server at all from outside. The firewall also ensures that you can only send web (HTTP) requests to the web server, not any other sort of request, and you can only send mail (SMTP) requests to the mail server. Under this scheme, the web server can get information from the database server, but computers outside cannot. The servers are said to be "behind the firewall".
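The rules in this example can be written down as a simple table. This is only a sketch of the idea, not how any particular firewall product is configured: the server addresses are invented (from a range reserved for documentation), and it assumes the usual port numbers, 80 for HTTP and 25 for SMTP.

```python
# Hypothetical addresses for the three servers in the example.
WEB_SERVER = "203.0.113.10"
MAIL_SERVER = "203.0.113.11"
DATABASE_SERVER = "203.0.113.12"

# Allowed (destination address, destination port) pairs.
# Anything not listed is refused.
ALLOW_RULES = {
    (WEB_SERVER, 80),    # HTTP requests to the web server
    (MAIL_SERVER, 25),   # SMTP requests to the mail server
}

def allow_incoming(destination: str, port: int) -> bool:
    """True if an incoming packet may pass through the firewall."""
    return (destination, port) in ALLOW_RULES

print(allow_incoming(WEB_SERVER, 80))        # web request to the web server
print(allow_incoming(WEB_SERVER, 25))        # mail request to the web server
print(allow_incoming(DATABASE_SERVER, 80))   # anything to the database server
```

Starting from "refuse everything" and listing only what is allowed means that a service the network manager has forgotten about stays unreachable.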

The web server can create pages to order using data from the database. This gives restricted access to the data from outside, mediated by the web server.

The computer running the web server software may be running other server software too, and the network manager may not even know they are there. These may contain bugs that a hacker could exploit. Only allowing HTTP requests through the firewall to the web server prevents this.

If we have two computers offering public services and a firewall, then we need to buy three public static IP addresses from our ISP. We need to buy a domain and we need to assign names to our addresses: www.mycompany.com, mail.mycompany.com, gateway.mycompany.com or whatever.

We could run the mail server software and the web server software on the same computer. If so, we could assign two public IP addresses to the one computer, or use just one address (in which case www.mycompany.com and mail.mycompany.com will have to refer to the same address). It's best to use separate equipment for each service, to keep any problems localised. Using other networking tricks you can also use many computers to respond to the same name, so that if one fails, you still offer a reduced service.

The firewall may also control traffic from within the company to servers outside, in fact, it is usually the only route for data in and out. It may also offer a DHCP and NAT routing service to client PCs within the company. In that case, all access to external servers from within that company will appear to come from the same address - that of the firewall. This makes analysis of access logs something of an inexact science.

Any computer which is publicly accessible across the internet should either be protected by a firewall or it should itself be a firewall - ie provide the protection itself. This applies to your PC at home. If it is directly connected to the internet by any means, you should run some "personal firewall" software on it.