This document aims to provide information on how to troubleshoot various
issues seen when using LANforge. Unless otherwise noted, commands are
to be run on the Linux command line.
-
Gathering Logs
The first and easiest method to gather logs is to run the lfdebug.bash script
to automatically gather logs and system info.
su - root
cd /home/lanforge
./lfdebug.bash
The lfdebug.tar.gz file will be created. Upload that or otherwise send it
to support personnel. The upload link is: https://www.candelatech.com/private/upload.php
If you do not know your user name and password, use: user: guest, password: guest
If you know how to reproduce the problem, please give support as many details as possible.
If unsure, send the logs with whatever information you have; the problem may be
obvious from the logs.
- Specific Logs of Interest
-
- LANforge logs are good for debugging LANforge configuration issues, problems
starting/stopping auxiliary services (such as wpa_supplicant), and other
LANforge-related problems. The LANforge manager log is: /home/lanforge/lanforge_log_0.txt
The LANforge resource log is: /home/lanforge/lanforge_log_1.txt for
resource 1, /home/lanforge/lanforge_log_2.txt for resource 2, etc.
- 'dmesg' output shows kernel-specific logs. This includes ath10k (AC NIC) crash and
debug logs, interface information, some wireless connection related info, driver errors, etc.
Save this to a file with a command similar to: dmesg > /home/lanforge/dmesg_log.txt
- The system logs are found in /var/log/messages on older systems. On newer systems, the logs
may be seen with the 'journalctl' command. Use 'man journalctl' for details.
To follow the logs as they are written, use journalctl -f, for instance.
- Wifi station (vUE) logs are generated by the wpa_supplicant program, and are located in
/home/lanforge/wifi/wpa_supplicant_log_<radio-name>.txt Radios are normally
named wiphyX or vphyX. You may view the whole log, or you may wish to use
tail -f <log-file-name>
to view the new messages as you take some action (like reset a wifi station, perhaps).
- The serial console (if connected) shows serious operating system errors such as fatal
crashes, ath10k firmware crash logs, and other such problems. If you have general system
instability this log may be very useful.
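If lfdebug.bash is not available, or you only want the specific logs listed above, they can be bundled by hand. The following is a minimal sketch, not part of LANforge: the function name collect_lanforge_logs, its arguments, and the default output path are hypothetical; the log locations are the ones given above.

```shell
# Sketch only: bundle the LANforge, wpa_supplicant and kernel logs into a tarball.
# collect_lanforge_logs and its default paths are hypothetical helper names.
collect_lanforge_logs() {
    clf_home="${1:-/home/lanforge}"            # LANforge home directory
    clf_out="${2:-/tmp/lanforge_logs.tar.gz}"  # where to write the bundle
    clf_stage="$(mktemp -d)" || return 1

    # Manager (log_0) and resource (log_1, log_2, ...) logs, if present.
    cp "$clf_home"/lanforge_log_*.txt "$clf_stage"/ 2>/dev/null || true
    # wpa_supplicant logs, one per radio, if present.
    cp "$clf_home"/wifi/wpa_supplicant_log_*.txt "$clf_stage"/ 2>/dev/null || true
    # Kernel log; this may require root on some systems.
    dmesg > "$clf_stage/dmesg_log.txt" 2>/dev/null || true

    tar -czf "$clf_out" -C "$clf_stage" . && rm -rf "$clf_stage"
}

# Usage (as root on the LANforge system):
#   collect_lanforge_logs /home/lanforge /tmp/lanforge_logs.tar.gz
```

Missing files are skipped silently, so check the tarball contents with tar -tzf before sending it.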
- Teamviewer Access
-
- If you can reproduce the problem or leave the system in the errored state (assuming
the OS is basically functional), providing teamviewer access to the system can be
extremely useful.
- Send the teamviewer ID and password to support.
- Make sure the teamviewer PC is set to US keyboard layout.
- Ensure that the system running teamviewer has
an ssh client that can connect to the LANforge. Notify support of the IP address of
the LANforge, and the root password if it has been changed from the default.
- Ensure the teamviewer machine will not go to sleep, as support may access the system while
you are not at work if you are in a different time zone. If you think it might go to sleep
or enable a screen-saver, also tell support the password of the teamviewer machine.
- Describe the problem so that support personnel can better understand what needs to be fixed.
-
Phantom Objects
LANforge reports an object as Phantom if it is configured to exist but does
not actually exist in the operating system. For instance, if you remove a physical
NIC, the old ports will remain in LANforge, but they will be marked Phantom.
When dealing with virtual interfaces, such as VLANs, wifi Stations, etc, you may
see phantom devices if there was a problem creating the virtual interfaces or if the
virtual interface drivers could not be properly loaded. Unloading a driver will
also cause phantom ports, and manually deleting an interface outside of LANforge on
the Linux command line will do the same. Please always manage interfaces from within
LANforge.
- Virtual Radios and Stations (vUE) that use them.
-
- Ensure that the kernel is version 3.17.8+ or higher. You
can check the kernel version with the 'uname -a' command on the Linux command line.
- Use 'lsmod' to make sure the mac80211_hwsim module is loaded.
- In LANforge, check to see if the Port-Type is WIFI-RADIO by double-clicking on the
virtual radio device in the Port-Mgr tab of the LANforge GUI.
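The first two checks above can be scripted. This is a rough sketch under some assumptions: the helper names (kernel_base_version, version_ge, module_loaded) are hypothetical, 'sort -V' (GNU coreutils) is assumed to be available, and any suffix such as '+' is stripped from the kernel release before comparing.

```shell
# kernel_base_version: read a kernel release on stdin (e.g. from `uname -r`)
# and print only the leading numeric part, e.g. 4.16.18+ -> 4.16.18
kernel_base_version() {
    sed 's/[^0-9.].*//'
}

# version_ge A B: succeed when version A >= version B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# module_loaded NAME: read `lsmod` output on stdin, succeed if NAME is listed.
module_loaded() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Usage on the LANforge system:
#   kver="$(uname -r | kernel_base_version)"
#   version_ge "$kver" 3.17.8 || echo "kernel $kver is older than 3.17.8"
#   lsmod | module_loaded mac80211_hwsim || echo "mac80211_hwsim is not loaded"
```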
- Ath10k virtual Stations
-
-
Cannot Authenticate/Connect Wireless Stations (vUE)
LANforge uses wpa_supplicant to manage its wireless stations. For 'interesting'
problems, you often have to look at the supplicant logs in /home/lanforge/wifi
to better understand why it is not working. Some other things to check are below.
- Check Scan Results.
-
- Wireless Events window in the LANforge GUI should be continuously showing
scan requests. If not, make sure that the station is Admin-UP. You can set
the admin status with the 'Batch Modify' window in the Port-Mgr screen.
- Double-click the station interface in the Port-Mgr screen, then select the
'Display Scan' window. Check that the expected APs are in the scan. If not,
AP may be out of range or mis-configured. You can force a full scan of all
channels with the 'Scan' button.
- If the AP appears in scan results, you can double-click the line in the table to
see the scan details. Also, check that the encryption setting matches (i.e., WPA2 is enabled
if the AP is WPA2, or WPA is enabled if the AP is configured to use only WPA).
- If AP is 'hidden', then you have to enable the 'Scan Hidden' button on the
'Misc Configuration' tab in the Port-Modify widget. Double-click the interface
in the LANforge Port-Mgr to get this window.
- Check logs
-
- Check dmesg output on the Linux command line to see if there are any obvious errors.
- Make the supplicant logs verbose by double-clicking the radio (wiphy) device in the
LANforge Port-Mgr screen, selecting the 'Verbose Debug' checkbox, and clicking OK.
- Check /home/lanforge/wifi/wpa_supplicant_log_<phy-name>.txt. You might 'tail -f' this
file while you reset the station in LANforge to see what happens when it tries to
scan and connect. Check for connection denied messages, supplicant not using the AP
because of encryption configuration mismatches, etc. Connection timeout issues may
indicate poor signal strength, very poor RF environment, buggy AP, or buggy LANforge.
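A verbose supplicant log is noisy, so it can help to filter it down to connection-state events while you reset the station. The sketch below is an illustration only: supplicant_events is a hypothetical helper, and the event names are standard wpa_supplicant CTRL-EVENT strings; your log may contain other relevant messages that this filter hides.

```shell
# supplicant_events: read a wpa_supplicant log on stdin and keep only
# lines that report connection-state changes or rejections.
supplicant_events() {
    grep -E 'CTRL-EVENT-(CONNECTED|DISCONNECTED|ASSOC-REJECT|SSID-TEMP-DISABLED|NETWORK-NOT-FOUND)'
}

# Usage (wiphy0 is an example radio name):
#   tail -f /home/lanforge/wifi/wpa_supplicant_log_wiphy0.txt | supplicant_events
```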
- Sniff packets
-
- Try sniffing the air (or virtual transport if using virtual radios)
with another wifi NIC on LANforge or an external machine. In many cases, only
a sniff can determine whether it is the LANforge or the AP that is at fault.
-
Cannot train to expected Wifi Link Speed
The usual problem here is that the AP does not have 'WMM' enabled, or
is otherwise mis-configured. LANforge can also be configured to disable
HT speeds or otherwise force its stations to run a legacy protocol, so
make sure that you are not disabling HT or forcing legacy rates such as
802.11b (/b).
- Check Scan Results.
-
- Double-click the station interface in the Port-Mgr screen, then select the
'Display Scan' window. Check that the AP supports the expected rates
and has WMM enabled, etc. (Double-click the AP line in the Scan-Results window for details on that AP.)
- Check logs
-
- Check dmesg output on the Linux command line to see if there are any obvious errors about disabling HT and/or VHT.
- Reset station in the LANforge-GUI's Port-Mgr screen
to see if rate-control normalizes at a higher rate.
- While running at least some traffic in both directions, check the reported TX-Rate and RX-Rate
values in the Port-Mgr tab. These show the link speed, not the current 'bits per second' of the actual
traffic flowing across the interfaces.
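The negotiated link rate can also be read on the command line, assuming the 'iw' tool is installed (an assumption, it is not mentioned elsewhere in this document). The helper name link_rates is hypothetical; it simply pulls the bitrate lines out of 'iw dev <station> link' output.

```shell
# link_rates: read `iw dev <station> link` output on stdin and print
# the "tx bitrate" / "rx bitrate" lines without leading whitespace.
link_rates() {
    grep -E '^[[:space:]]*(tx|rx) bitrate:' | sed 's/^[[:space:]]*//'
}

# Usage (wlan1 is an example station name; on VRF-enabled systems you may
# need to source lanforge.profile first):
#   iw dev wlan1 link | link_rates
```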
- Sniff on air
-
- Try sniffing the air
with another wifi NIC on LANforge or an external machine. Missing ACKs would cause
rate-control to back-off, and can be caused by congested RF environments. For ath9k
a/b/g/n radios, check the 'Activity' column in the Port-Mgr tab. Unfortunately,
ath10k does not support this yet.
-
No IP address on Interface (WiFi, Ethernet, etc)
There can be multiple reasons why an interface will not have an IP address.
First, make sure it is connected. For wireless, this means the AP field in
the Port-Mgr tab of the LANforge-GUI shows the BSSID of the AP. For Ethernet,
double-click and make sure it reports Link status.
- Check configuration.
-
- Double-click the interface in the Port-Mgr screen, make sure DHCP is enabled
if attempting DHCP, or otherwise assign a fixed IP address.
- Make sure port is not admin-down. Batch-Modify on the Port-Mgr tab can be used
to enable the interface if it is down.
- Check logs and Alerts
-
- Check the 'Alerts' tab in the LANforge GUI. Some DHCP failures are shown there.
Link-Down will also indicate that the port has no connectivity, so it cannot even
try to do DHCP yet.
- Reset station in the LANforge-GUI's Port-Mgr screen
to see if bouncing the interface makes it come up properly.
- Check dmesg output on Linux command line to see if there are any obvious errors after
resetting the interface.
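To confirm from the command line whether a port actually holds an IPv4 address after a reset, something like the following can be used. It is a small sketch: ipv4_addrs is a hypothetical helper that parses the one-line output of 'ip -o -4 addr show'.

```shell
# ipv4_addrs: read `ip -o -4 addr show dev <port>` output on stdin and
# print each address/prefix; prints nothing if there is no IPv4 address.
ipv4_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }'
}

# Usage (wlan1 is an example port name):
#   addrs="$(ip -o -4 addr show dev wlan1 | ipv4_addrs)"
#   [ -n "$addrs" ] && echo "wlan1 has: $addrs" || echo "wlan1 has no IPv4 address"
```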
-
General Network Connectivity
When the interfaces have IP addresses, and you have configured Layer-3 or
some other protocol to generate traffic, you may find that the traffic still
does not flow properly. Here are some things to check...
- Check connectivity on Command Line (5.3.8 and higher, with VRF)
-
- LANforge 5.3.8 and higher, with kernel 4.16 and higher, will use VRF virtual
routing by default. When running commands on the Linux command line, you may need
to use the ip vrf exec tool.
- First, ensure your environment is set up to use the VRF enabled tools:
cd /home/lanforge
. lanforge.profile
- Next, you find the VRF interface for the network device in question. Notice the
master _vrf21 in the output below:
[root@lf0313-6477 lanforge]# ip link show wlan1
9: wlan1: mtu 1500 qdisc noqueue master _vrf21 state UP mode DORMANT group default qlen 1000
link/ether 04:f0:21:38:98:f3 brd ff:ff:ff:ff:ff:ff
- Launch ping bound to that particular VRF device:
[root@lf0313-6477 lanforge]# ip vrf exec _vrf21 ping 7.7.7.1
PING 7.7.7.1 (7.7.7.1) 56(84) bytes of data.
64 bytes from 7.7.7.1: icmp_seq=1 ttl=64 time=4.71 ms
64 bytes from 7.7.7.1: icmp_seq=2 ttl=64 time=21.6 ms
- Check ARP tables with command:
[root@lf0313-6477 lanforge]# ip neigh show vrf _vrf21
7.7.7.1 dev wlan1 lladdr 04:f0:21:9f:c9:b0 DELAY
- Check route tables: ip route show table <table-id> The table-id is
the numeric port-id (last number in the EID as seen in the Port-Mgr tab of the GUI).
When that number is greater than 252, the table-id is the number + 4. The management port uses table-id 0 (the default table).
[root@lf0313-6477 lanforge]# ip route show vrf _vrf21
default via 7.7.7.1 dev wlan1
7.7.7.0/24 dev wlan1 scope link src 7.7.7.11
[root@lf0313-6477 lanforge]# ip route show table 21
default via 7.7.7.1 dev wlan1
broadcast 7.7.7.0 dev wlan1 proto kernel scope link src 7.7.7.11
7.7.7.0/24 dev wlan1 scope link src 7.7.7.11
local 7.7.7.11 dev wlan1 proto kernel scope host src 7.7.7.11
broadcast 7.7.7.255 dev wlan1 proto kernel scope link src 7.7.7.11
- For DNS related issues, you can use 'dig' to query the expected DNS server.
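The steps of finding the VRF device and then running a command inside it can be combined. This is a sketch only: vrf_of is a hypothetical helper that prints whatever name follows "master" in 'ip link show' output, so it assumes the port's master is a VRF device (as in the example above) and not a bridge or bond.

```shell
# vrf_of: read `ip link show <port>` output on stdin and print the name
# that follows "master"; prints nothing if the port has no master device.
vrf_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "master") { print $(i + 1); exit } }'
}

# Usage (after sourcing lanforge.profile; wlan1 and 7.7.7.1 are the
# example port and gateway used above):
#   vrf="$(ip link show wlan1 | vrf_of)"
#   [ -n "$vrf" ] && ip vrf exec "$vrf" ping -c 3 7.7.7.1
```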
- Check connectivity on Command Line (5.3.7 and higher, without VRF)
-
- See if you can ping: ping -I <port-name> <destination-ip>
Use similar command to ping the gateway if one is expected to exist on your network.
- Check ARP tables with command: ip neigh show
- Check route tables: ip route show table <table-id> The table-id is
the numeric port-id (last number in the EID as seen in the Port-Mgr tab of the GUI).
When that number is greater than 252, the table-id is the number + 4. The management port uses table-id 0 (the default table).
- For DNS related issues, you can use 'dig' to query the expected DNS server.
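The table-id rule described above is simple arithmetic. A minimal sketch, where route_table_id is a hypothetical helper name and the argument is the last number of the port's EID:

```shell
# route_table_id PORT_ID: print the routing table-id for a port, following
# the rule above (same as the port-id, or port-id + 4 when it exceeds 252).
route_table_id() {
    if [ "$1" -gt 252 ]; then
        echo $(( $1 + 4 ))
    else
        echo "$1"
    fi
}

# Usage (port-id 21 is the example port used in this document):
#   ip route show table "$(route_table_id 21)"
```

The management port is the exception: it uses the default table regardless of its port-id.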
- LANforge Configuration
-
- Try Layer-3 UDP first when testing connectivity. It is a simpler protocol and often easier
to debug.
- If you are using NAT or a Firewall in the system under test, use TCP protocol because
NAT and Firewalls may reject UDP traffic from the outside network. Make sure the 'B'
side of the network is on the outside of the NAT/Firewall since the B side acts as the
server.
- Sniff the ports using tshark/tcpdump on command line, or wireshark through the Port-Mgr 'Sniff'
button to better understand what is happening on the interface.
-
LANforge System Hangs
There can be many reasons for real or apparent system hangs. If possible,
connect a serial port to the system and make sure it is logging console output.
True system crashes will normally dump debug info to the serial console.
- Collect system console logs over the serial connection.
- Please follow the Configuring Serial Connection to LANforge cookbook to connect your laptop to the LANforge machine.
- Open the PuTTY session settings. Look for section Session→Logging.
Select All Session Output. With this enabled, you will collect session data to
the specified log file.
- Check Total OS Hang.
-
- From serial console or keyboard/monitor, see if you can get any
response or log into the system. If not, then the OS is truly hung.
Ensure the serial console is properly set up, and the serial console monitor
system is set to log data to send to Candela support.
- Reboot system and contact support.
- Check Network Connectivity.
-
- From serial console or keyboard/monitor, see if you can ping from
LANforge to another nearby system (gateway, for instance).
- Try to ping the LANforge from a nearby system if you do not have
console access.
- If you cannot ping, but OS otherwise seems fine, double-check networking,
especially the management port. Make sure you do not have duplicate
IP addresses (especially management IP) on different interfaces. 'ip addr show'
will list all IP addresses.
- Make sure /etc/resolv.conf points to valid name-server,
or set it to nameserver 127.0.0.1 so that it fails quickly.
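Spotting a duplicate IP address by eye in a long 'ip addr show' listing is error-prone. The following sketch automates it; duplicate_ipv4 is a hypothetical helper, and it only reports IPv4 addresses that appear on more than one interface of the local system.

```shell
# duplicate_ipv4: read `ip -o -4 addr show` output on stdin and print each
# IPv4 address configured on more than one interface, followed by the
# interface names.
duplicate_ipv4() {
    awk '{
        for (i = 1; i < NF; i++) if ($i == "inet") {
            split($(i + 1), a, "/")            # drop the /prefix length
            count[a[1]]++
            ports[a[1]] = ports[a[1]] " " $2   # $2 is the interface name
        }
    } END {
        for (ip in count) if (count[ip] > 1) print ip ":" ports[ip]
    }'
}

# Usage:
#   ip -o -4 addr show | duplicate_ipv4
```

No output means no duplicates were found on this machine; it does not detect another host on the network using the same address.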
- Check Process Hangs.
-
- If the OS and networking seem OK, you might try to strace the 'btserver' processes to see if
one is stuck. A common cause of apparent LANforge lockups is LANforge being
stuck making a system('foo') call, which shows up as btserver being stuck
in the 'wait' system call. The PID will be printed in that wait call
trace, and in a separate command line window you can run ps auxwww | grep <pid> to find
what is hanging. This info will help support staff better understand the problem.
- If system is sluggish, waiting several seconds in wait() but then continuing,
you may be experiencing ath10k (AC NIC) problems. Check dmesg output to see if
it is showing ath10k errors like -11 or firmware crashes. Often in this case,
the best you can do is to save dmesg output to the file system and reboot. Send
output to support staff upon reboot.
- If strace shows no btserver activity at all for many seconds, you might try attaching
to the btserver process with 'gdb' and then issue the 'bt' command to get the program
backtrace.
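The strace and gdb steps above might look like the following on the command line. Treat this as a sketch under assumptions: waited_pid is a hypothetical helper, and it assumes strace prints the blocked call in the form 'wait4(<pid>, ...'; the exact syscall name and format can differ between systems.

```shell
# waited_pid: read strace output on stdin and print the PID argument of
# the last wait4() call seen (prints nothing if there is none).
waited_pid() {
    sed -n 's/.*wait4(\([0-9][0-9]*\),.*/\1/p' | tail -n 1
}

# Usage (as root; strace writes its trace to stderr, hence 2>&1):
#   pgrep btserver                         # list the btserver PIDs
#   timeout 5 strace -p <btserver-pid> 2>&1 | waited_pid
#   ps auxwww | grep <pid>                 # what is that child process?
#   gdb -p <btserver-pid> -batch -ex bt    # backtrace if strace shows no activity
```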
-
LANforge GUI hangs or other problems
To debug the LANforge-GUI, you should run it from the command-line and watch the
console for Java exceptions and other debug information. If it hangs,
you can press CTRL+\ on Linux in the console window and it will dump a stack trace of all
threads. On Windows, the key combination is CTRL+BREAK.
This is vital for debugging hangs (application freezes).
To run the GUI from the console:
- Become 'lanforge' user on the LANforge machine.
- Change to the GUI directory. If you are using LANforge version 5.4.1, it would be this:
cd /home/lanforge/LANforgeGUI_5.4.1
- Start the GUI:
./lfclient.bash -s localhost
- Use the GUI normally, and grab the output in case there are problems. Here is an example of
an exception due to GUI version mismatch.
java.lang.RuntimeException: Assert failed: i: 80 buf.length: 96
at candela.lanforge.LANforgeMgr.lfassert(LANforgeMgr.java:1685)
at candela.lanforge.TrafficProfile.<init>(TrafficProfile.java:255)
at candela.lanforge.TrafficProfileHandler.handle(CtrlHandlerMgr.java:988)
at candela.lanforge.CtrlHandlerRunnable.run(CtrlHandlerMgr.java:239)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:756)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:726)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
- Save the console output and send it to your support contact.
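One way to save the console output is to redirect it to a file and then pick out the exception lines afterwards. This is a sketch, not an official procedure: the file name gui_console.txt and the helper exception_lines are hypothetical. With the output redirected, a thread dump can still be requested from a second terminal by sending SIGQUIT to the Java process, which is the same signal CTRL+\ sends.

```shell
# exception_lines: read saved GUI console output on stdin and print the
# header line of each Java exception or error (not the stack frames).
exception_lines() {
    grep -E '^[A-Za-z_][A-Za-z0-9_.$]*(Exception|Error)(:|$)'
}

# Usage:
#   ./lfclient.bash -s localhost > gui_console.txt 2>&1
#   kill -QUIT <java-pid>        # from another terminal: dump all thread stacks
#   exception_lines < gui_console.txt
```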
-
Plugin Crashes in LANforge GUI
If you get an exception message from running a LANforge GUI Plugin, please copy the exception
message and email it to us. In case you have to install a new plugin, refer to this
FAQ on adding GUI plugins.