Archives for posts with tag: ping

IPv4 MTU issues can be hard to spot initially, but there is a mechanism designed to help: Path MTU Discovery (RFC 1191). The RFC describes it as "a technique for using the Don't Fragment (DF) bit in the IP header to dynamically discover the PMTU of a path".

Further to that, the RFC states: "The basic idea is that a source host initially assumes that the PMTU of a path is the (known) MTU of its first hop, and sends all datagrams on that path with the DF bit set. If any of the datagrams are too large to be forwarded without fragmentation by some router along the path, that router will discard them and return ICMP Destination Unreachable messages with a code meaning 'fragmentation needed and DF set'" (Type 3, Code 4).

The catch is that the ICMP message that comes back doesn't always tell you what MTU the path will actually support (RFC 1191 does define a Next-Hop MTU field for this, but not every device fills it in).

A colleague of mine, who is a Windows 7 expert, has reliably informed me that Windows 7 has PMTUD enabled by default.

The important point to focus on is the ICMP unreachable (Type 3, Code 4). To put it quite simply: if you don't receive an ICMP message back with the "fragmentation needed" code, your PC will assume that the MTU is fine and keep sending full-size packets, even though somewhere in the path those packets are potentially being dropped.

There can be a number of reasons for this, including firewalls blocking the message, ICMP unreachables being disabled on an interface, or a transparent host between the two endpoints (common in service provider networks) that has a lower MTU.

I recently ran into an issue where IP connectivity between two sites looked to be fine: ping, traceroute and SSH were all working, but certain applications and protocols were not, most notably HTTPS.

Below I will explain how to spot this issue.

Take a look at the diagram below. I have deliberately used a transparent device, as it's most likely what you might see in an L3VPN (MPLS) network: the last-mile provider supplies a layer 2 path (perhaps L2TPv3) from CE to PE, and the underlying hops are hidden from us. From the service provider's perspective the routers are directly connected.

This is exactly where an MTU issue can creep in. For this scenario I have reduced the MTU quite significantly for effect.

Capture3

Let's say, for example, you have a perfectly functioning network where the MTU is fine along the path. Initially you can send a ping with 1,460 bytes and you will get a reply. Now let's increase this to something we know is too big (1,550 bytes). In a healthy network you receive an ICMP Type 3 back and see the "Packet needs to be fragmented but DF set" message.
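On a Windows host that test looks something like the commands below (the destination address is just a placeholder): -l sets the payload size and -f sets the DF bit.

C:\>ping 192.0.2.1 -l 1460 -f     (fits within a 1,500-byte path MTU, so you get replies)
C:\>ping 192.0.2.1 -l 1550 -f     (too big; a healthy path returns "Packet needs to be fragmented but DF set.")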

Capture2

Now let's try that through our network, where the MTU is set lower but the sending device doesn't know about it.

Capture4

At first you think it's OK because you can ping along the path and get a reply; you try SSH and it works too. Now let's try pinging with different packet sizes. Remember, your PC doesn't receive the ICMP message this time, so what you get instead is a "Request timed out" message.
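Against the broken path the same style of test behaves differently (again, the address is a placeholder):

C:\>ping 192.0.2.1 -l 1300 -f     (bigger than the 1,000-byte MTU in the path; nothing comes back, so you just see "Request timed out.")
C:\>ping 192.0.2.1 -l 900 -f      (small enough to fit, so you get replies)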

Capture5

The reason for that is the packet is being dropped and the ICMP message isn't being returned. If I ping with a size lower than the 1,000-byte MTU, I get a reply.

Capture6

Now for the question: why would HTTPS not work? Well, in some cases web applications or your client will set the Don't Fragment bit in the IP header of the SYN request. This means the packet must not be fragmented, so when it is sent across our network with the bad MTU in the path, the packet is dropped and the sending device never receives the ICMP message. It never learns that it has to reduce the packet size. The packet capture below shows where the DF bit is set.

Capture7

I had a look through RFC 2246 (TLS 1.0) and it doesn't specify that the DF bit should be set. It's most likely a vendor- or OS-specific behaviour, so your observed results may differ from vendor to vendor.

RH

This lab is set up for Multicast Source Discovery Protocol (MSDP), and I will also be applying Source Active (SA) filtering on our Rendezvous Points (RPs). On this occasion I will use Auto-RP to discover the RP in each domain, then use MSDP to share SA messages about our multicast sources. The SA filtering will be applied to R2 and R3. When using MSDP, set up a multicast boundary: imagine a line between the domains where you apply an access list that permits or denies the multicast addresses you specify.

The reason I am using Auto-RP is to show the significance of the multicast boundary: we will block the Auto-RP multicast addresses 224.0.1.39 and 224.0.1.40, which guarantees the boundary between our domains (although the boundary is not limited to that purpose).

The main commands you will need are "ip msdp peer [ip address] connect-source [interface]" for the peering and "ip msdp sa-filter in | out [peer ip] list [access-list]" for SA filtering; for the multicast boundary, under the interface, use "ip multicast boundary [access-list] in | out filter-autorp". My previous blog, "Multicast Notes", provides a little more detail and also some examples of access lists. A rough sketch of how these pieces fit together is shown below.
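As an illustration only (the peer address, interface and ACL numbers are placeholders chosen to match this lab, and your platform's exact boundary syntax may vary), the configuration on one of the RPs would look something like this:

! MSDP peering to the other domain's RP, sourced from a loopback
ip msdp peer 10.255.255.2 connect-source Loopback1
!
! Only accept SA messages from that peer which match access-list 101
ip msdp sa-filter in 10.255.255.2 list 101
access-list 101 permit ip host 74.74.74.5 host 239.4.4.4
!
! Multicast boundary on the inter-domain interface, blocking the Auto-RP groups
access-list 10 deny   224.0.1.39
access-list 10 deny   224.0.1.40
access-list 10 permit 224.0.0.0 15.255.255.255
interface FastEthernet0/0
 ip multicast boundary 10 filter-autorp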

MSDP Topology


Attached are the configs for each router (Router Configs). As I'm using GNS3 I had to enter the commands "no ip route-cache" and "no ip mroute-cache" on each PIM interface, and also add "no ip cef" globally. You won't need this on physical routers.
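For reference, on each router that just means something along these lines (the interface name is a placeholder):

no ip cef
!
interface FastEthernet0/0
 no ip route-cache
 no ip mroute-cache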

Verifying our MSDP peers with the command "show ip msdp summary" from R3 shows we have two MSDP peers, and both show as Up. All is looking good at this point.

R3#sh ip msdp summary
MSDP Peer Status Summary
Peer Address     AS    State    Uptime/   Reset  SA     Peer Name
                                Downtime  Count  Count
10.255.255.2     2     Up       00:09:29  0      0      ?
10.255.255.6     6     Up       00:10:19  0      0      ?

We can also use the command "show ip msdp peer 10.255.255.2" from R3 to show us whether any SA filtering is being applied and how many SA messages have been received from the peer. In the output below you can see we have received 1 SA message from our peer 10.255.255.2 ("SAs learned from this peer"), and that we are filtering inbound SA messages ("Input (S,G) filter: 101").

MSDP Peer 10.255.255.2 (?), AS 2
  Connection status:
    State: Up, Resets: 0, Connection source: Loopback1 (10.255.255.3)
    Uptime(Downtime): 00:17:41, Messages sent/received: 18/19
    Output messages discarded: 0
    Connection and counters cleared 00:19:31 ago
  SA Filtering:
    Input (S,G) filter: 101, route-map: none
    Input RP filter: none, route-map: none
    Output (S,G) filter: none, route-map: none
    Output RP filter: none, route-map: none
  SA-Requests:
    Input filter: none
  Peer ttl threshold: 0
  SAs learned from this peer: 1
  Input queue size: 0, Output queue size: 0
  Message counters:
    RPF Failure count: 0
    SA Messages in/out: 1/0
    SA Requests in: 0
    SA Responses out: 0
    Data Packets in/out: 0/0

To display any SA messages the router has received, we use the command "show ip msdp sa-cache". This displays the source-tree entry for any multicast sources our neighbour is aware of, bearing in mind that all SA messages are flooded to MSDP neighbours unless an SA filter drops them.

R3#sh ip msdp sa-cache
MSDP Source-Active Cache - 1 entries
(74.74.74.5, 239.4.4.4), RP 2.2.2.2, BGP/AS 2, 00:01:50/00:05:04, Peer 10.255.255.2

Now let's consider SA filtering. As per my Multicast Notes blog, I will use this to filter messages sent from R2 to R3. The scenario is as follows: R5 is the multicast source for 239.4.4.4 and 232.8.8.8, but R7 doesn't need to receive multicast traffic from R5 on 232.8.8.8; only traffic from R5 to 239.4.4.4 is required. So on R2, which is the RP nearest to R5, I will filter outbound SA messages to R3. (The reverse approach would be an inbound filter on R3.) The configuration on R2 is as follows.

access-list 101 permit ip host 74.74.74.5 host 239.4.4.4
!
ip msdp sa-filter out 10.255.255.3 list 101

Let’s test this with a ping to the multicast address.

R5#ping 239.4.4.4 rep 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 239.4.4.4, timeout is 2 seconds:
Reply to request 0 from 74.74.74.1, 28 ms
Reply to request 0 from 17.17.17.7, 156 ms
Reply to request 1 from 74.74.74.1, 52 ms
Reply to request 1 from 17.17.17.7, 180 ms
Reply to request 2 from 74.74.74.1, 48 ms
Reply to request 2 from 17.17.17.7, 176 ms

R5#ping 232.8.8.8 rep 3
Type escape sequence to abort.
Sending 3, 100-byte ICMP Echos to 232.8.8.8, timeout is 2 seconds:
Reply to request 0 from 74.74.74.1, 52 ms
Reply to request 1 from 74.74.74.1, 48 ms
Reply to request 2 from 74.74.74.1, 40 ms

So, as you can see above, R7 replies to 239.4.4.4 but not to 232.8.8.8, so the filter works as expected.

I also used an inbound SA filter on R2, which does the opposite of an outbound filter. With the commands below, R2 will only accept SA messages from peer 10.255.255.3 for (10.10.10.4, 232.8.8.8).

ip msdp sa-filter in 10.255.255.3 list 102
access-list 102 permit ip host 10.10.10.4 host 232.8.8.8

Below you can see some packet captures on the routers showing the Source Active messages being sent from R2 to R3 and also from R3 to R6.

MSDP SA from R2 -> R3

MSDP SA from R3 -> R6

RH

This lab is set up for a static Rendezvous Point (RP), which you might use in a small environment with only one or two multicast sources. A static RP is managed by manually configuring each router with "ip pim rp-address x.x.x.x". As you can probably tell, manually updating every router with a new RP could be time consuming on larger networks, so I will cover other ways to assign or elect the RP in future blogs. A minimal sketch of the static RP configuration follows.
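As a minimal sketch (the interface name is a placeholder; 10.10.10.3 is the RP used in this lab), each router needs something along these lines:

ip multicast-routing
!
interface FastEthernet0/0
 ip pim sparse-mode
!
! Every router points at the same statically defined RP
ip pim rp-address 10.10.10.3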

Static RP topology

Attached are the configs for each router (Router Configs). As I'm using GNS3 I had to enter the commands "no ip route-cache" and "no ip mroute-cache" on each PIM interface, and also add "no ip cef" globally. You won't need this on physical routers.

Now let's confirm our configuration with some well-known commands.

From R4 we need to confirm the IGMP memberships. "show ip igmp groups" will show all the IGMP joins the router has received from hosts wanting to join the multicast stream.

R4#sh ip igmp groups
IGMP Connected Group Membership
Group Address    Interface                Uptime    Expires   Last Reporter   Group Accounted
239.4.4.4        FastEthernet0/0          00:21:26  stopped   10.10.10.4

Next we confirm the RP with the command "show ip pim rp". Let's do this from R2; it will show us all the RPs and the multicast group addresses they are responsible for.

R2#sh ip pim rp
Group: 239.4.4.4, RP: 10.10.10.3, v2, uptime 00:03:55, expires never
Group: 224.0.1.40, RP: 10.10.10.3, v2, uptime 00:24:12, expires never

Next, from R1, we initiate some traffic using "ping 239.4.4.4" and should get a reply. Now we can run "show ip mroute summary" on R2. This will show us the shared tree and source tree entries for the multicast source.

R2#sh ip mroute summary
IP Multicast Routing Table
(*, 239.4.4.4), 00:02:47/stopped, RP 10.10.10.3, OIF count: 0, flags: SPF
(56.56.56.1, 239.4.4.4), 00:02:47/00:00:46, OIF count: 1, flags: FT

This confirms our multicast setup is working correctly.

If you are struggling with source versus shared trees, check out this blog: http://packetlife.net/blog/2008/oct/20/pim-sm-source-versus-shared-trees

RH

 

Link Aggregation Control Protocol (LACP) is one way in which we can group switch interfaces together as one; this group of interfaces is called a port-channel. The main advantages are redundancy and increased bandwidth. For my example we have two layer 2 switches; you can in fact have a layer 3 EtherChannel, which I'll cover in a future blog.

Etherchannel

With your two switches cabled up as they are, the usual spanning-tree rules apply: a root bridge is elected, some of the ports are blocked, and so on. Here's how you bundle them into a single logical link.

switch1(config)#interface range fa 1/0/1 - 4
switch1(config-if-range)#switchport trunk encapsulation dot1q
switch1(config-if-range)#switchport mode trunk
switch1(config-if-range)#channel-protocol lacp
switch1(config-if-range)#channel-group 1 mode active

Once you've put in the last command your switch will create the port-channel interface. Any changes you want to make to the config should be made under the port-channel and not the individual interfaces, for example as shown below.
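For instance, if you later wanted to prune the VLANs carried over the bundle, you would do it under the port-channel (the VLAN numbers here are just placeholders):

switch1(config)#interface port-channel 1
switch1(config-if)#switchport trunk allowed vlan 10,20,30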

Now do exactly the same on the other switch, with the same commands. Something worth mentioning here is that if both sides are set to active, they will both attempt to form the EtherChannel. You could set one side to passive and have only one switch negotiate, but I'd recommend setting both to active. Cisco recommends that the access layer switch be set to passive and the distribution layer switch be set to active.
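If you did follow that access/distribution recommendation, the only difference on the passive side would be the channel-group mode (the hostname here is hypothetical):

access-sw(config)#interface range fa 1/0/1 - 4
access-sw(config-if-range)#channel-group 1 mode passive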

Confirm your configs by using the commands below

sh int trunk
sh etherchannel summary
sh etherchannel port-channel
sh etherchannel protocol
sh etherchannel detail

Next you will want to test the redundancy. Set up IP addresses on VLAN 1 on both switches and send a continuous ping across to the other switch, then pull the cables out one by one; the pings won't fail until all the cables are pulled. A rough sketch of that test is shown below.
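Something like the following would do, with purely illustrative addressing (switch2 would get 192.168.1.2/24 on its VLAN 1 interface):

switch1(config)#interface vlan 1
switch1(config-if)#ip address 192.168.1.1 255.255.255.0
switch1(config-if)#no shutdown
switch1#ping 192.168.1.2 repeat 1000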

The raw bandwidth of this port-channel is 4 x 100 Mbps, i.e. 400 Mbps in each direction, or 800 Mbps aggregate if you count both directions of full duplex.

Another topic that should be mentioned here is load balancing. This is done by an algorithm based, by default, on the source MAC address: if you use the command "sh etherchannel load-balance" you will see the default is "src-mac", so the EtherChannel balances traffic across the member links based on the source MAC. This can be changed; use the command "port-channel load-balance ?" to see the options and choose what best suits your environment, for example as shown below.
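For example, to balance on source and destination IP instead (the available options vary by platform, so check the "?" output on your switch):

switch1(config)#port-channel load-balance src-dst-ip
switch1#sh etherchannel load-balance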
RH

I was asked a question some time ago; it went something like this: "Robert, why does it take so long for me to open a file over the network? It's taking over 5 minutes." So I went through the usual troubleshooting steps, asking what is the file, where is the file, how big is the file, and so on. It quickly dawned on me that he was opening a file somewhere in the region of 150 Megabytes, so my reply was "I'm sorry, but this is the normal time you can expect for downloading a file of 150 Megabytes over the WAN." The user's reply was "Robert, we have a 50mbps link at each end, surely this should take like 3 seconds?"

Firstly, let me explain the difference between Megabytes and Megabits. Megabytes are what you and I refer to as file sizes: you might say to your friend "my flash drive is 2 Gigabytes" (which is 2,048 Megabytes), or the song you just bought off iTunes is 6 Megabytes. We are familiar with this and accept it as the standard unit for measuring file size, and also for your home broadband speed… WRONG!

Your ISP will give you a broadband connection of, say, 10 Megabits per second (Mbps), which is in fact only 1.25 Megabytes per second. This is because bandwidth is measured in Megabits per second, not Megabytes.

There are 8 Megabits in 1 Megabyte, so just divide your broadband speed by 8 to get your speed in Megabytes per second.

So, to answer the user's question: downloading 150 Megabytes at 50 Mbps should take 24 seconds, right? Let's work it out.

50 Mbps / 8 = 6.25 Megabytes per second, and 150 Megabytes / 6.25 MBps = 24 seconds… WRONG!

We haven't thought about latency and how it can affect your TCP traffic flow. When you open a command prompt and ping across the WAN to your file server, you can see something called the RTT; see the example below.

C:\Users\Robert>ping 8.8.8.8
Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=32ms TTL=44
Reply from 8.8.8.8: bytes=32 time=32ms TTL=44
Reply from 8.8.8.8: bytes=32 time=32ms TTL=44
Reply from 8.8.8.8: bytes=32 time=32ms TTL=44

Ping statistics for 8.8.8.8:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 32ms, Maximum = 32ms, Average = 32ms

The "time=32ms" value is your RTT, or round-trip time: the time it takes a packet to get to the destination and back again. The further away the host, the higher the RTT.

How does latency affect download speed? Well, we have to think about how TCP works when transferring files over the WAN. The default TCP window size is 64 KiloBytes, which means that for every 64 KiloBytes of data sent, the receiving end has to send an acknowledgement; in a 150 Megabyte file that's 2,400 acknowledgements (150 MB = 153,600 KB, and 153,600 / 64 = 2,400), and in the simplest model the sender waits a full round trip for each one.

So you can now see why latency affects throughput. You could increase the default TCP window size on both the sending and receiving devices, but this is not the whole answer: if there is any packet loss on your link you will have to resend whatever data gets lost along the way, and that guaranteed delivery is the fundamental point of TCP.

So what can you do to speed things up? You could bring the two countries closer together, or work out a way of making the speed of light faster?

Luckily there are technologies out there that fall under the category of WAN acceleration or WAN optimisation. I won't mention any particular company, but there are quite a few out there, and some even optimise UDP traffic.

They use many techniques such as de-duplication, compression, caching, CIFS optimisation, latency mitigation and so on. All of this together can make the connection between two offices appear a lot quicker and can even reduce the amount of traffic sent over the link by as much as 50%. Both ends of the connection must have an accelerator, usually connecting back to a main bridgehead or control server, and each accelerator usually contains some sort of hard drive, perhaps an SSD for faster read/write times, which stores the data previously accessed over the link.

The calculation for working out the maximum TCP throughput on a WAN link, no matter what the bandwidth, is as follows.

Throughput in bits per second = TCP window size in bits / latency (RTT) in seconds. To convert to Mbps, divide by 1,048,576 (the number of bits in a megabit, using the binary convention).

So here's an example to answer the user's question. The latency was 145 ms at the time.

524,288 bits / 0.145 = 3,615,779 bits per second (a 64 KB window is 65,536 bytes x 8 = 524,288 bits).

Let's turn that into Mbps: 3,615,779 / 1,048,576 = 3.4 Mbps

Divide this by 8 to get Megabytes per second: 3.4 Mbps / 8 = 0.425 MBps

So finally: 150 Megabytes / 0.425 MBps = roughly 353 seconds, which is the 5+ minutes the user was seeing.

This calculation is based on optimal network conditions with zero packet loss and zero jitter.

RH