http://www.candelatech.com
sales@candelatech.com
+1 360 380 1618 [PST, GMT -8]
Network Testing and Emulation Solutions

Trouble-Shooting and Debugging for LANforge

This document aims to provide information on how to trouble-shoot various issues seen when using LANforge. Unless otherwise noted, commands are to be run on the Linux command line.

  1. Gathering Logs

    The first and easiest method to gather logs is to run the lfdebug.bash script to automatically gather logs and system info.

    su - root
    cd /home/lanforge
    ./lfdebug.bash
    
    The script creates the lfdebug.tar.gz file. Upload that file, or otherwise send it, to support personnel. The upload link is: https://www.candelatech.com/private/upload.php
    If you do not know your user name and password, use: user: guest, password: guest

    If you know how to reproduce the problem, please give support as many details as possible. If unsure, please send the logs with whatever info you have; the problem may be obvious from the logs.

    Specific Logs of Interest
    • LANforge logs are good for debugging LANforge configuration issues, problems starting/stopping auxiliary services (such as wpa_supplicant) and other LANforge related problems. The LANforge manager log is: /home/lanforge/lanforge_log_0.txt
      The LANforge resource log is: /home/lanforge/lanforge_log_1.txt for resource 1, and /home/lanforge/lanforge_log_2.txt for resource 2, etc.
    • 'dmesg' output shows kernel-specific logs. This includes ath10k (AC NIC) crash and debug logs, interface information, some wireless connection related info, driver errors, etc. Save this to a file with a command similar to: dmesg > /home/lanforge/dmesg_log.txt
    • The system logs are found in /var/log/messages on older systems. On newer systems, view them with the 'journalctl' command (see 'man journalctl' for details). For instance, journalctl -f follows the logs as new messages arrive.
    • Wifi station (vUE) logs are generated by the wpa_supplicant program, and are located in /home/lanforge/wifi/wpa_supplicant_log_<radio-name>.txt Radios are normally named wiphyX or vphyX. You may view the whole log, or you may wish to use tail -f <log-file-name> to view the new messages as you take some action (like reset a wifi station, perhaps).
    • The serial console (if connected) shows serious operating system errors such as fatal crashes, ath10k firmware crash logs, and other such problems. If you have general system instability this log may be very useful.
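
    The log locations listed above can be gathered by hand when lfdebug.bash is not available. The following is only a sketch: collect_lf_logs is a hypothetical helper name, and the /tmp/lf_logs output directory is an assumption, not a LANforge convention.

```shell
# Sketch: copy the logs named above into one directory for support.
# collect_lf_logs and the default /tmp/lf_logs directory are hypothetical.
collect_lf_logs() {
    lf_home="${1:-/home/lanforge}"   # where LANforge keeps its logs
    lf_dest="${2:-/tmp/lf_logs}"     # output directory (assumed location)
    mkdir -p "$lf_dest" || return 1

    # Manager log (_0) and resource logs (_1, _2, ...).
    cp "$lf_home"/lanforge_log_*.txt "$lf_dest"/ 2>/dev/null || true
    # wpa_supplicant logs, one per radio.
    cp "$lf_home"/wifi/wpa_supplicant_log_*.txt "$lf_dest"/ 2>/dev/null || true
    # Kernel log; may need root, so a failure is tolerated.
    dmesg > "$lf_dest/dmesg_log.txt" 2>/dev/null || true
    # System log: journalctl on newer systems, /var/log/messages on older ones.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -b --no-pager > "$lf_dest/journal.txt" 2>/dev/null || true
    elif [ -r /var/log/messages ]; then
        cp /var/log/messages "$lf_dest"/ 2>/dev/null || true
    fi
    ls "$lf_dest"
}
```

    Run it as root with no arguments to use the default paths, then send the resulting directory to support along with a description of the problem.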

    Teamviewer Access
    • If you can reproduce the problem or leave the system in the errored state (assuming the OS is basically functional), providing teamviewer access to the system can be extremely useful.
    • Send teamviewer ID and password to support.
    • Make sure the teamviewer PC is set to US keyboard layout.
    • Ensure that the system running teamviewer has an ssh client that can connect to the LANforge system. Notify support of the IP address of the LANforge system, and the root password if it has been changed from the default.
    • Ensure the teamviewer machine will not go to sleep, as support may access the system while you are not at work if you are in a different time zone. If you think it might go to sleep or enable a screen-saver, tell support the password of the teamviewer machine.
    • Describe the problem so that support personnel may better understand what needs to be fixed.
  2. Phantom Objects

    LANforge reports an object as Phantom if it is configured to exist but does not actually exist in the operating system. For instance, if you remove a physical NIC, the old ports will remain in LANforge, but they will be marked Phantom.

    When dealing with virtual interfaces, such as VLANs, wifi Stations, etc, you may see phantom devices if there was a problem creating the virtual interfaces or if the virtual interface drivers could not be properly loaded. Unloading a driver will also cause phantom ports, and manually deleting an interface outside of LANforge on the Linux command line will do the same. Please always manage interfaces from within LANforge.

    Virtual Radios and Stations (vUE) that use them.
    • Ensure that the kernel is version 3.17.8+ or higher. You can check the kernel version with the 'uname -a' command on the Linux command line.
    • Use 'lsmod' to make sure the mac80211_hwsim module is loaded.
    • In LANforge, check to see if the Port-Type is WIFI-RADIO by double-clicking on the virtual radio device in the Port-Mgr tab of the LANforge GUI.
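
    The kernel version check can be scripted instead of read off 'uname -a' by eye. This is a minimal sketch: kernel_at_least is a hypothetical helper, and it assumes a sort that supports -V (GNU coreutils does). Any suffix after the dotted numbers, such as a trailing '+', is ignored for the comparison.

```shell
# kernel_at_least MIN [CURRENT]
# Succeeds if CURRENT (default: output of uname -r) is MIN or newer.
# Hypothetical helper; relies on 'sort -V' (version sort).
kernel_at_least() {
    ka_min="$1"
    ka_cur="${2:-$(uname -r)}"
    # Keep only the leading dotted-number part, e.g. "3.17.8+" -> "3.17.8".
    ka_num=$(printf '%s\n' "$ka_cur" | sed 's/^\([0-9][0-9.]*\).*/\1/')
    # After a version sort, MIN must come first (or be equal to CURRENT).
    ka_low=$(printf '%s\n%s\n' "$ka_min" "$ka_num" | sort -V | head -n 1)
    [ "$ka_low" = "$ka_min" ]
}

# Example use for the virtual-radio requirement:
#   kernel_at_least 3.17.8 && echo "kernel is new enough"
```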

    Ath10k virtual Stations
    • Ensure that the kernel is version 3.14.14 or higher. You can check the kernel version with the uname -a command on the Linux command line.
    • Use lsmod to make sure the ath10k_pci module is loaded.
    • Make sure firmware does not crash on boot: rmmod ath10k_pci; modprobe ath10k_pci; dmesg
    • If it does crash on startup, verify that /etc/modprobe.d/ath10k.conf is configured properly, for instance:
                 [greearb@ath10k ~]$ cat /etc/modprobe.d/ath10k.conf 
                 options ath10k_core nohwcrypt=1
                 options ath10k_core override_eeprom_regdomain=840
                 options ath10k_core num_vdevs_ct=64
                 options ath10k_core num_peers_ct=128
                    
    • Use lspci to make sure the OS was able to find the Ath10k hardware.
    • In LANforge, check to see if the Port-Type is the proper type (WIFI-STATION, VAP, etc).
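
    The lsmod checks above (mac80211_hwsim for virtual radios, ath10k_pci for ath10k stations) can be done without scanning the output by eye. A minimal sketch, where module_loaded is a hypothetical helper that reads lsmod output on standard input:

```shell
# module_loaded NAME
# Reads lsmod output on stdin; succeeds if NAME is in the first column.
# Hypothetical helper, not part of LANforge.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example use:
#   lsmod | module_loaded ath10k_pci     || echo "ath10k_pci is not loaded"
#   lsmod | module_loaded mac80211_hwsim || echo "mac80211_hwsim is not loaded"
```

    Matching on the whole first column avoids false hits such as ath10k_pci appearing only in the "Used by" column of ath10k_core.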
  3. Cannot Authenticate/Connect Wireless Stations (vUE)

    LANforge uses wpa_supplicant to manage its wireless stations. For 'interesting' problems, you often have to look at the supplicant logs in /home/lanforge/wifi to better understand why it is not working. Some other things to check are below.

    Check Scan Results.
    • Wireless Events window in the LANforge GUI should be continuously showing scan requests. If not, make sure that the station is Admin-UP. You can set the admin status with the 'Batch Modify' window in the Port-Mgr screen.
    • Double-click the station interface in the Port-Mgr screen, then select the 'Display Scan' window. Check that the expected APs are in the scan. If not, AP may be out of range or mis-configured. You can force a full scan of all channels with the 'Scan' button.
    • If AP appears in scan results, you can double-click the line in the table to see the scan details. Also, check that encryption is proper (i.e., WPA2 is enabled if AP is WPA2, or WPA is enabled if AP is just configured to use WPA).
    • If AP is 'hidden', then you have to enable the 'Scan Hidden' button on the 'Misc Configuration' tab in the Port-Modify widget. Double-click the interface in the LANforge Port-Mgr to get this window.

    Check logs
    • Check dmesg output on the Linux command line to see if there are any obvious errors.
    • Make supplicant logs be verbose by double-clicking the radio (wiphy) device in the LANforge Port-Mgr screen and selecting the 'Verbose Debug' checkbox and click OK.
    • Check /home/lanforge/wifi/wpa_supplicant_log_<phy-name>.txt. You might 'tail -f' this file while you reset the station in LANforge to see what happens when it tries to scan and connect. Check for connection denied messages, supplicant not using the AP because of encryption configuration mismatches, etc. Connection timeout issues may indicate poor signal strength, very poor RF environment, buggy AP, or buggy LANforge.
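
    A verbose supplicant log is long, so it can help to filter it down to connection-related events first. The sketch below greps for a few standard wpa_supplicant control events (CTRL-EVENT-CONNECTED, CTRL-EVENT-ASSOC-REJECT, and so on). supplicant_events is a hypothetical helper, and which events actually appear in a given log depends on the wpa_supplicant version and debug level, so treat an empty result as inconclusive.

```shell
# supplicant_events LOGFILE
# Prints lines carrying common connect/disconnect/reject control events.
# Hypothetical helper; the event list is a starting point, not exhaustive.
supplicant_events() {
    grep -E 'CTRL-EVENT-(CONNECTED|DISCONNECTED|ASSOC-REJECT|AUTH-REJECT|SSID-TEMP-DISABLED|SCAN-FAILED)' "$1"
}

# Example use (wiphy0 is an assumed radio name):
#   supplicant_events /home/lanforge/wifi/wpa_supplicant_log_wiphy0.txt | tail -n 20
```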

    Sniff packets
    • Try sniffing the air (or virtual transport if using virtual radios) with another wifi NIC on LANforge or an external machine. In many cases, only a sniff can determine whether it is the LANforge or the AP that is at fault.
  4. Cannot train to expected Wifi Link Speed

    The usual problem here is that the AP does not have 'WMM' enabled, or is otherwise mis-configured. LANforge can also be configured to disable HT speeds or otherwise force its stations to run at legacy rates, so make sure that you are not disabling HT or forcing 802.11b speeds or something similar.

    Check Scan Results.
    • Double-click the station interface in the Port-Mgr screen, then select the 'Display Scan' window. Check that the AP supports the expected rates and has WMM enabled, etc. (Double-click the AP line in the Scan-Results window for details on that AP.)

    Check logs
    • Check dmesg output on the Linux command line to see if there are any obvious errors about disabling HT and/or VHT.
    • Reset station in the LANforge-GUI's Port-Mgr screen to see if rate-control normalizes at higher rate.
    • While running at least some traffic in both directions, check the reported TX-Rate and RX-Rate rates in the Port-Mgr tab. This is the link speed, not the current 'bits per second' for actual traffic flowing across the interfaces.

    Sniff on air
    • Try sniffing the air with another wifi NIC on LANforge or an external machine. Missing ACKs would cause rate-control to back-off, and can be caused by congested RF environments. For ath9k a/b/g/n radios, check the 'Activity' column in the Port-Mgr tab. Unfortunately, ath10k does not support this yet.
  5. No IP address on Interface (WiFi, Ethernet, etc)

    There can be multiple reasons why an interface will not have an IP address. First, make sure it is connected. For wireless, this means the AP field in the Port-Mgr tab of the LANforge-GUI shows the BSSID of the AP. For Ethernet, double-click and make sure it reports Link status.

    Check configuration.
    • Double-click the interface in the Port-Mgr screen, make sure DHCP is enabled if attempting DHCP, or otherwise assign a fixed IP address.
    • Make sure port is not admin-down. Batch-Modify on the Port-Mgr tab can be used to enable the interface if it is down.

    Check logs and Alerts
    • Check the 'Alerts' tab in the LANforge GUI. Some DHCP failures are shown there. Link-Down will also indicate that the port has no connectivity, so it cannot even try to do DHCP yet.
    • Reset station in the LANforge-GUI's Port-Mgr screen to see if bouncing the interface makes it come up properly.
    • Check dmesg output on Linux command line to see if there are any obvious errors after resetting the interface.
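
    From the command line, the one-line output mode of 'ip' makes it easy to confirm whether an interface actually received an address after a reset. A minimal sketch, where ipv4_addrs is a hypothetical helper and sta0 is an assumed interface name:

```shell
# ipv4_addrs
# Reads `ip -o -4 addr show` output on stdin and prints
# "<ifname> <address/prefix>" for every IPv4 address found.
# Hypothetical helper; prints nothing if the interface has no IPv4 address.
ipv4_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $2, $(i + 1) }'
}

# Example use:
#   ip -o -4 addr show dev sta0 | ipv4_addrs
```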
  6. General Network Connectivity

    When the interfaces have IP addresses, and you have configured Layer-3 or some other protocol to generate traffic, you may find that the traffic still does not flow properly. Here are some things to check...

    Check connectivity on Command Line
    • See if you can ping: ping -I <port-name> <destination-ip> Use similar command to ping the gateway if one is expected to exist on your network.
    • Check ARP tables with command: ip neigh show
    • Check route tables: ip route show table <table-id> The table-id is the numeric port-id (last number in the EID as seen in the Port-Mgr tab of the GUI). For port-ids greater than 252, the table-id is the port-id plus 4. The management port uses table-id 0 (the default table).
    • For DNS related issues, you can use 'dig' to query the expected DNS server.
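
    The table-id rule above is easy to get wrong by hand, so here is a small sketch of it. lf_table_id is a hypothetical helper, and it reads the rule as applying to the numeric port-id (the last number of the EID); the management port is a special case and always uses table 0.

```shell
# lf_table_id PORT_ID
# Prints the routing table id for a LANforge port, per the rule above:
# the port-id itself, or port-id + 4 when the port-id is greater than 252.
# Hypothetical helper; does not cover the management port (table 0).
lf_table_id() {
    lt_port="$1"
    if [ "$lt_port" -gt 252 ]; then
        echo $((lt_port + 4))
    else
        echo "$lt_port"
    fi
}

# Example use, for a port whose EID ends in 7:
#   ip route show table "$(lf_table_id 7)"
```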

    LANforge Configuration
    • Try Layer-3 UDP first when testing connectivity. It is a simpler protocol and often easier to debug.
    • If you are using NAT or a Firewall in the system under test, use TCP protocol because NAT and Firewalls may reject UDP traffic from the outside network. Make sure the 'B' side of the network is on the outside of the NAT/Firewall since the B side acts as the server.
    • Sniff the ports using tshark/tcpdump on command line, or wireshark through the Port-Mgr 'Sniff' button to better understand what is happening on the interface.
  7. LANforge System Hangs

    There can be many reasons for real or apparent system hangs. If possible, connect a serial port to the system and make sure it is logging console output. True system crashes will normally dump debug info to the serial console.

    Check Total OS Hang.
    • From serial console or keyboard/monitor, see if you can get any response or log into the system. If not, then OS is truly hung. Ensure serial console is properly set up, and the serial console monitor system is set to log data to send to Candela support.

      Reboot system and contact support.

    Check Network Connectivity.
    • From serial console or keyboard/monitor, see if you can ping from LANforge to another nearby system (gateway, for instance).
    • Try to ping the LANforge from a nearby system if you do not have console access.
    • If you cannot ping, but OS otherwise seems fine, double-check networking, especially the management port. Make sure you do not have duplicate IP addresses (especially management IP) on different interfaces. 'ip addr show' will list all IP addresses.
    • Make sure /etc/resolv.conf points to valid name-server, or set it to 'nameserver 127.0.0.1' so that it fails fast.
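
    Duplicate addresses are hard to spot in a long 'ip addr show' listing on a system with many interfaces. The following sketch flags any IPv4 address configured on more than one interface; duplicate_ips is a hypothetical helper that reads the one-line 'ip -o' output format.

```shell
# duplicate_ips
# Reads `ip -o -4 addr show` output on stdin and prints each IPv4 address
# that appears on more than one interface, followed by those interfaces.
# Hypothetical helper; no output means no duplicates were found.
duplicate_ips() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "inet") {
                split($(i + 1), a, "/")          # drop the /prefix length
                ifs[a[1]] = ifs[a[1]] " " $2     # remember the interface name
                count[a[1]]++
            }
        }
    }
    END { for (ip in count) if (count[ip] > 1) print ip ":" ifs[ip] }'
}

# Example use:
#   ip -o -4 addr show | duplicate_ips
```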

    Check Process Hangs.
    • If the OS and networking seem OK, you might try to strace the 'btserver' process to see if it is stuck. A common cause of apparent LANforge lockups is LANforge being stuck making a system('foo') call, which shows up as btserver being stuck in the 'wait' system call. The PID will be printed in that wait call trace, and in a separate command line window you can run ps -auxwww | grep <pid> to find what is hanging. This info will help support staff better understand the problem.
    • If system is sluggish, waiting several seconds in wait() but then continuing, you may be experiencing ath10k (AC NIC) problems. Check dmesg output to see if it is showing ath10k errors like -11 or firmware crashes. Often in this case, the best you can do is to save dmesg output to the file system and reboot. Send output to support staff upon reboot.
    • If strace shows no btserver activity at all for many seconds, you might try attaching to the btserver process with 'gdb' and then issue the 'bt' command to get the program backtrace.
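
    The strace steps above can be partly scripted. This is a sketch under assumptions: waited_pids is a hypothetical helper, the process is assumed to be named exactly 'btserver', and the wait is assumed to show up in strace as wait4() or waitpid() with the child PID as the first argument (the usual form on Linux, but it can vary by architecture and libc).

```shell
# waited_pids
# Reads saved strace output on stdin and prints the PID argument of each
# wait4()/waitpid() call that names a specific child (a positive PID).
# Hypothetical helper; calls waiting on "any child" (-1) are skipped.
waited_pids() {
    sed -nE 's/.*(wait4|waitpid)\(([0-9]+),.*/\2/p'
}

# Example use, as root (the /tmp path is an assumed scratch location):
#   pid=$(pgrep -x btserver | head -n 1)
#   timeout 5 strace -f -p "$pid" -o /tmp/btserver.strace
#   for child in $(waited_pids < /tmp/btserver.strace | sort -u); do
#       ps -o pid,ppid,etime,args -p "$child"
#   done
#   # If strace shows no activity at all, grab a backtrace instead:
#   gdb -p "$pid" -batch -ex bt
```

    Saving the trace to a file with -o matters here: a call that is still blocked is only written as a complete line once strace detaches.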
  8. Plugin Crashes in LANforge GUI

    If you get an exception message from running a LANforge GUI Plugin, please copy the exception message and email it to us. In case you have to install a new plugin, refer to this FAQ on adding GUI plugins.


Candela  Technologies, 2417 Main Street, Suite 201, Ferndale, WA 98248, USA