Chapter 9. Debugging and Troubleshooting

This chapter explains various methods for debugging problems with Flumotion. Though we are always working to make the server more robust and improve the quality of the feedback through our setup and configuration tools, you might sometimes need to do some investigation to diagnose your problem.

Debugging Flumotion

Many things can go wrong when streaming. Flumotion depends on a large stack of underlying technologies, all of which need to function correctly for your configuration. There is a huge array of hardware devices, drivers and libraries out there, so when something goes wrong the cause can be anything from unsupported or broken hardware to incomplete drivers or untested library versions. Fluendo and the Flumotion community are only able to test a small subset of the available options, so if you are not running a tested configuration you might run into problems. This chapter should help you debug and resolve such problems.

Debug levels

Flumotion and GStreamer log events at debug levels ranging from 0 to 5, in increasing order of verbosity:

  • 0: No debug output at all.

  • 1: Only errors (default)

  • 2: Errors and warnings

  • 3: Errors, warnings and info messages

  • 4: Errors, warnings, info, and debugging messages

  • 5: Everything

Logging of events is also done in different categories. This allows you to log certain categories of events at a higher debug level than others. The debug variable can be set to a comma-separated list of category:level pairs. For example, to set logging to level 3 for all categories, but to level 4 for the admin category, you could run a worker with:

FLU_DEBUG=*:3,admin:4 flumotion-worker -u user -p test > worker.log 2>&1
         
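Once the worker has written its log, you can pull out just the lines for the category you raised. The sketch below works on a small hypothetical log excerpt (the layout mimics Flumotion's "LEVEL [pid] category date message" lines) and filters it by category with awk:

```shell
# Create a small hypothetical log excerpt for illustration:
cat > /tmp/worker.log <<'EOF'
INFO  [17204]    manager           Mar 01 15:37:52 Starting manager 'planet'
DEBUG [17204]    admin             Mar 01 15:37:53 connecting to manager
WARN  [17204]    admin             Mar 01 15:37:54 reconnect scheduled
EOF

# Print only the lines logged under the 'admin' category (field 3):
awk '$3 == "admin"' /tmp/worker.log
```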

Debugging on the Flumotion level

When Flumotion fails to run correctly the first place to start looking is, of course, at Flumotion itself. Even if the failure is caused by something further down the stack, Flumotion's internal error messages might be able to tell you what the problem is.

The Flumotion log files are by default placed under the /var/log/flumotion directory. The manager log is by default called 'manager.default.log' and the worker log is called 'worker.default.log'.

Inside these log files you will see information on the workings of the Flumotion server, with the level of detail depending on the log level used. By default relatively little information is stored in these files so in many cases you will need to increase the logging level in order to get enough information to solve the problem.

There are two ways of increasing the amount of debug information in the log files. The simplest is to add <debug>5</debug> near the top of your default.xml file under /etc/flumotion/workers, just underneath the <worker> tag.
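As a sketch, the top of such a worker file might then look like this (the attribute value and surrounding content are illustrative; only the <debug> line is the addition):

```xml
<worker name="default">
  <!-- log at the most verbose level -->
  <debug>5</debug>
  <!-- ... rest of the worker configuration ... -->
</worker>
```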

The other way is to set the FLU_DEBUG environment variable for your process. Setting FLU_DEBUG=5, for example, is equivalent to setting <debug>5</debug> in the worker's default.xml file.

Debugging GStreamer problems

Sometimes Flumotion problems are caused by issues originating in the GStreamer multimedia framework that Flumotion is built upon. These issues might not even be in GStreamer itself but further down the stack, such as the DV camera interfacing library or the video4linux driver for your specific camera; debugging at the GStreamer level will often help you identify your problem.

Applications like Flumotion can make use of the extensive GStreamer debugging system to debug pipeline problems. Elements will write output to this system to log what they're doing. It's not used for error reporting, but it is very useful for tracking exactly what an element is doing, which can come in handy when debugging Flumotion issues.

The GST_DEBUG environment variable is used to turn on GStreamer debugging.

GST_DEBUG=*:5 flumotion-worker -u user -p test > worker.log 2>&1
	

Here too, the * can be replaced by a category name to restrict the debug output to that category. For example, oggmux:5 would turn on full debugging for the Ogg muxer element.
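As with FLU_DEBUG, several category:level pairs can be combined in one GST_DEBUG value. A small sketch (the category names are examples) of splitting such a value to double-check what you have set:

```shell
# Everything at level 2, the Ogg muxer at level 5:
GST_DEBUG='*:2,oggmux:5'

# List each category:level pair on its own line:
echo "$GST_DEBUG" | tr ',' '\n'
```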

If, as a result of reading the generated log files, you think that the cause of your problems is a GStreamer plugin or element not working as expected, GStreamer provides a command line tool called gst-launch which you can use to do some manual tests. For instance, if you are unsure whether the v4l plugin is providing Flumotion with the needed data flow from your TV card, you can test it by running a command like this:

gst-launch v4lsrc device=/dev/video0 ! ffmpegcolorspace ! ximagesink
	

This pipeline tries to access the video0 Linux device, run the output through a colorspace converter, and display it in an X window. If this fails, you know that either the GStreamer plugin or the v4l driver in question is broken somehow; the reported error might even tell you which.
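If the pipeline fails, it helps to separate driver problems from GStreamer problems. A hedged sketch: first check that the device node exists and is readable (the path is an example), then swap the capture source for videotestsrc, a software source that does not touch the driver at all:

```shell
dev=/dev/video0   # example device path

# Check that the device node is present and readable by this user:
if [ -r "$dev" ]; then
    msg="device present and readable"
else
    msg="device missing or unreadable"
fi
echo "$msg"

# If the device looks fine, rule out the driver entirely by using a
# software test source instead of v4lsrc:
# gst-launch videotestsrc ! ffmpegcolorspace ! ximagesink
```

If the test source pipeline works but the v4lsrc one does not, the problem is almost certainly in the driver or the v4l plugin rather than in the rest of the pipeline.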

Debugging other parts of the system

Sometimes neither Flumotion nor GStreamer level debug output lets you understand why things are not working, although it should at least give you a better idea of where in the stack the problem is to be found. We cannot give you much general debugging advice for these underlying libraries, except to suggest that you look at their documentation and mailing lists for information. Sometimes simply searching Google for your hardware device and Linux will turn up information about driver problems with your specific hardware.

Reading log files

Here are some tips for reading Flumotion log files. First of all, make sure you are looking at a relevant part of the log file. The log files contain all the accumulated logging from multiple runs, and typically you are only interested in the run you are analyzing. The manager, worker and job processes log four specific points in their lifetime at level 3 (INFO) that you can search for in the logs:

  • The process is starting. This gets logged as early as possible. If you want to check the logs for the last run, you go to the end of the log file and search backwards for this marker. Example:

    INFO  [17204]    manager           Mar 01 15:37:52
          Starting manager 'planet' (flumotion/manager/main.py:253)
    	

  • The process has started. From this point on, the process should keep running on its own; any case where it stops without being told to stop is a bug in the software and should be filed and fixed. Example:

    INFO  [17226]    worker            Mar 01 15:41:07
          Started worker 'localhost' (flumotion/worker/main.py:315)
    	

  • The process has been told to stop, and will now begin cleanup. For example, the worker will wait for its child jobs to terminate.

    INFO  [17226]    worker            Mar 01 15:43:40
          Stopping worker 'localhost' (flumotion/worker/main.py:329)
            

  • The process is ready to terminate. This is the last point it logs before exiting.

    INFO  [13329]    job               Mar 01 14:42:26
          Stopped job '/default/producer-audio' (flumotion/job/main.py:96)

Look for important errors first: search for ERROR and WARN, and start with the first ones you see after the starting point, since later errors are often a consequence of the first.
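These steps can be sketched as a small shell helper. The log excerpt and its layout below are simplified for illustration: find the line number of the last "Starting" marker, print the log from there on, and then list that run's errors and warnings:

```shell
# Build a small sample log covering two runs (format simplified):
cat > /tmp/manager.log <<'EOF'
INFO  [100] manager Mar 01 10:00:00 Starting manager 'planet'
WARN  [100] manager Mar 01 10:00:05 old warning from a previous run
INFO  [200] manager Mar 01 15:37:52 Starting manager 'planet'
ERROR [200] manager Mar 01 15:38:10 could not bind port 8800
EOF

# Line number of the last "Starting" marker (start of the current run):
start=$(grep -n "Starting" /tmp/manager.log | tail -n 1 | cut -d: -f1)

# Everything from the current run onwards:
tail -n "+$start" /tmp/manager.log

# Just the errors and warnings from that run:
tail -n "+$start" /tmp/manager.log | grep -E 'ERROR|WARN'
```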

Analyzing core dumps

Sometimes Flumotion will encounter a bug and dump core. This happens because of a bug in one of the lower layers: GStreamer, the Python interpreter, a support library, and so on.

To help the Flumotion developers locate and fix the bug, you need to generate a good stack trace from the core dump.

When this happens in a component, the UI will show a message like this one:

The component crashed.
The core dump is '/var/cache/flumotion/core.21328' on the host running 
'localhost'.
Posted on Mon 24 Mar 2008 05:58:15 PM CET.
    

To produce a backtrace, you need to launch gdb on the python interpreter with the given core file, like this:

# gdb python /var/cache/flumotion/core.21328
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".


Core was generated by `/usr/bin/python /usr/bin/flumotion-job 
/default/video-encoder-ogg /tmp/flumotion'.
Program terminated with signal 11, Segmentation fault.

#0  0x4f40388a in _int_free (av=0xb3f00010, mem=0xb3f82230) at malloc.c:4410
4410            unlink(nextchunk, bck, fwd);
(gdb) thread apply all backtrace

Thread 5 (process 28840):
#0  0x008b2410 in ?? ()
#1  0xbfea3a38 in ?? ()
#2  0x00000008 in ?? ()
#3  0x00000004 in ?? ()
#4  0x4f45ecac in *__GI___poll (fds=0x4f4ceff4, nfds=4, timeout=8)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#5  0x4f736363 in g_main_context_iterate (context=0xa0935d8, block=1, 
    dispatch=1, self=0x9fc7120) at gmain.c:2849
#6  0x4f7366d9 in IA__g_main_loop_run (loop=0xa127bd0) at gmain.c:2751
#7  0x006138ec in _wrap_g_main_loop_run (self=0xb7fd4a50) at pygmainloop.c:257
#8  0x4f587b72 in PyEval_EvalFrame (f=0xa0ea86c) at Python/ceval.c:3547
#9  0x4f588748 in PyEval_EvalCodeEx (co=0xb7d54760, globals=0xb7d4b3e4, 
    locals=0x0, args=0xa1d588c, argcount=1, kws=0xa1d5890, kwcount=0, 
    defs=0xb7c56c18, defcount=1, closure=0x0) at Python/ceval.c:2736
#10 0x4f586afb in PyEval_EvalFrame (f=0xa1d5714) at Python/ceval.c:3656
#11 0x4f588748 in PyEval_EvalCodeEx (co=0xb7b7f9a0, globals=0xb7b5af0c, 
    locals=0x0, args=0xb7cd0638, argcount=1, kws=0xa1cb6a8, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2736
#12 0x4f53a5ae in function_call (func=0xb7affe64, arg=0xb7cd062c, 
    kw=0xb7cfd13c) at Objects/funcobject.c:548
#13 0x4f522587 in PyObject_Call (func=0xb7affe64, arg=0xb7cd062c, 
    kw=0xb7cfd13c) at Objects/abstract.c:1795
#14 0x4f585c38 in PyEval_EvalFrame (f=0x9f845e4) at Python/ceval.c:3840
#15 0x4f588748 in PyEval_EvalCodeEx (co=0xb7f5cda0, globals=0xb7f5b13c, 
    locals=0x0, args=0x9fa08f0, argcount=2, kws=0x9fa08f8, kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2736
#16 0x4f586afb in PyEval_EvalFrame (f=0x9fa074c) at Python/ceval.c:3656
#17 0x4f588748 in PyEval_EvalCodeEx (co=0xb7f5cde0, globals=0xb7f5b13c, 
    locals=0x0, args=0x9f9d8ec, argcount=1, kws=0x9f9d8f0, kwcount=0, 
    defs=0xb7f64718, defcount=3, closure=0x0) at Python/ceval.c:2736
#18 0x4f586afb in PyEval_EvalFrame (f=0x9f9d79c) at Python/ceval.c:3656
#19 0x4f588748 in PyEval_EvalCodeEx (co=0xb7f5ca20, globals=0xb7fb0824, 
    locals=0xb7fb0824, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:2736
#20 0x4f5887d3 in PyEval_EvalCode (co=0xb7f5ca20, globals=0xb7fb0824, 
    locals=0xb7fb0824) at Python/ceval.c:484
#21 0x4f5a5538 in run_node (n=Variable "n" is not available.
) at Python/pythonrun.c:1265
#22 0x4f5a6c48 in PyRun_SimpleFileExFlags (fp=0x9f68008, 
    filename=0xbfea5c4c "/usr/bin/flumotion-job", closeit=1, flags=0xbfea4914)
    at Python/pythonrun.c:860
#23 0x4f5a732a in PyRun_AnyFileExFlags (fp=0x9f68008, 
    filename=0xbfea5c4c "/usr/bin/flumotion-job", closeit=1, flags=0xbfea4914)
    at Python/pythonrun.c:664
#24 0x4f5add55 in Py_Main (argc=3, argv=0xbfea49e4) at Modules/main.c:493
#25 0x080485b2 in main (argc=0, argv=0x0) at Modules/python.c:23

Thread 4 (process 28842):
#0  0x008b2410 in ?? ()
#1  0xb7a1f4e8 in ?? ()
#2  0x00000000 in ?? ()

Thread 3 (process 28843):
#0  0x008b2410 in ?? ()
#1  0xb6d11148 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000003 in ?? ()
#4  0x4f45ecac in *__GI___poll (fds=0x4f4ceff4, nfds=3, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:87
#5  0x0030e4ef in gst_fdset_wait (set=0xa47bc00, timeout=-1) at gstfdset.c:523
#6  0x00312b29 in gst_multi_fd_sink_thread (sink=0xa4a8210)
    at gstmultifdsink.c:2269
#7  0x4f7508ef in g_thread_create_proxy (data=0xa470998) at gthread.c:582
#8  0x4f6e23b6 in start_thread (arg=0xb6d11ba0) at pthread_create.c:261
#9  0x4f46833e in ?? () from /lib/libc.so.6

Thread 2 (process 28846):
#0  0x008b2410 in ?? ()
#1  0xb4a4e438 in ?? ()
#2  0x00000003 in ?? ()
#3  0x00000000 in ?? ()

Thread 1 (process 14372):
#0  0x4f40388a in _int_free (av=0xb3f00010, mem=0xb3f82230) at malloc.c:4410
#1  0x4f40741d in *__GI___libc_free (mem=0xb3f82230) at malloc.c:3447
#2  0x00817126 in theora_comment_query () from /usr/lib/libtheora.so.0
#3  0x00835910 in theora_decode_header () from /usr/lib/libtheora.so.0
#4  0x0082a724 in theora_clear () from /usr/lib/libtheora.so.0
#5  0x002f61aa in theora_enc_reset (enc=0xa4a6078) at theoraenc.c:314
#6  0x002f838b in theora_enc_chain (pad=0xa1f30e0, buffer=0xa41b510)
    at theoraenc.c:871
#7  0x41044ab9 in gst_pad_chain_unchecked (pad=0xa1f30e0, buffer=0xa41b510)
    at gstpad.c:3459
#8  0x410451ab in gst_pad_push (pad=0xa1f3020, buffer=0xa41b510)
    at gstpad.c:3625
#9  0x004f73c5 in gst_base_transform_chain (pad=0xa1f3560, buffer=0xa41b510)
    at gstbasetransform.c:1606
#10 0x41044ab9 in gst_pad_chain_unchecked (pad=0xa1f3560, buffer=0xa41b510)
    at gstpad.c:3459
#11 0x410451ab in gst_pad_push (pad=0xa1f34a0, buffer=0xa41b510)
    at gstpad.c:3625
#12 0x004f73c5 in gst_base_transform_chain (pad=0xa1f33e0, buffer=0xa41b510)
    at gstbasetransform.c:1606
#13 0x41044ab9 in gst_pad_chain_unchecked (pad=0xa1f33e0, buffer=0xa41b510)
    at gstpad.c:3459
#14 0x410451ab in gst_pad_push (pad=0xa1f3320, buffer=0xa41b510)
    at gstpad.c:3625
#15 0x00152c3a in gst_gdp_depay_chain (pad=0xa1f3260, buffer=0xa46f868)
    at gstgdpdepay.c:315
#16 0x41044ab9 in gst_pad_chain_unchecked (pad=0xa1f3260, buffer=0xa46f868)
    at gstpad.c:3459
#17 0x410451ab in gst_pad_push (pad=0xa1f31a0, buffer=0xa46f868)
    at gstpad.c:3625
#18 0x004f2996 in gst_base_src_loop (pad=0xa1f31a0) at gstbasesrc.c:1601
#19 0x4105dd66 in gst_task_func (task=0xb55d4598, tclass=0xa470698)
    at gsttask.c:192
#20 0x4f75236c in g_thread_pool_thread_proxy (data=0xa219c78)
    at gthreadpool.c:158
#21 0x4f7508ef in g_thread_create_proxy (data=0xa343328) at gthread.c:582
#22 0x4f6e23b6 in start_thread (arg=0xb544fba0) at pthread_create.c:261
#23 0x4f46833e in ?? () from /lib/libc.so.6
(gdb) 
    

Create a bug report about the issue, including the output of gdb.
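If you need the same backtrace non-interactively, for pasting straight into the bug report, gdb's batch mode can run the command for you. A sketch using the example core file path from the crash message above; the snippet checks for gdb first so it degrades gracefully on machines without it:

```shell
core=/var/cache/flumotion/core.21328   # example path from the crash message

if command -v gdb >/dev/null 2>&1; then
    # -batch exits after running the -ex commands; capture everything:
    gdb -batch -ex "thread apply all backtrace" python "$core" \
        > backtrace.txt 2>&1 || true
    msg="backtrace written to backtrace.txt"
else
    msg="gdb is not installed"
fi
echo "$msg"
```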