Pyinotify: monitor filesystem events with Python under Linux.

Pyinotify is now hosted at http://github.com/seb-m/pyinotify.

New versions of Pyinotify are released at this new location.
Edited Sun, 30 Dec 2007 21:13

Pyinotify is a Python module for monitoring filesystem changes. It relies on a Linux kernel feature called inotify (merged in kernel 2.6.13). inotify is an event-driven notifier: its notifications are exported from kernel space to user space through three system calls. Pyinotify binds these system calls and builds on them an implementation that offers a generic, abstract way to use this functionality.

Edited Tue, 29 Jan 2008 02:43
Download the current version of Pyinotify from its new site (http://github.com/seb-m/pyinotify). This version is recommended for all recent systems, i.e. Python 2.5 or higher, and contains all the new developments.

Download the old version 0.7.1 (deprecated): this version doesn't contain the latest developments, but it is the last version of Pyinotify that supports Python 2.3 and Python 2.4. The documentation below applies to this specific version (and only to this version).

Edited Sat, 06 May 2006 13:07

To familiarize yourself with pyinotify, run a first example like this:

$ cd pyinotify-x-x-x && python setup.py build
$ python src/pyinotify/pyinotify.py -v my-dir-to-watch

Here my-dir-to-watch is a path to a valid directory. Now go into this directory and play with files: read one, create another, and so on, then compare your actions with the output produced by pyinotify. Enjoy, you have just watched your first directory :).

Note: if you want to install pyinotify, just type:

$ cd pyinotify-x-x-x && python setup.py install

Read the README file to know where the files are installed.

Edited Tue, 29 Jan 2008 02:37

Browse the generated documentation online (versions 0.7.x) to view the whole class hierarchy.

Edited Sun, 26 Nov 2006 10:47

Let's introduce the Python namespaces through which pyinotify can be accessed, which should help in understanding its logic. pyinotify is composed of two modules, named inotify and pyinotify:

  • The namespace inotify is a simple raw wrapper around inotify (wrapping 3 system calls and 3 variables available in /proc/sys/). You will probably never need or want to import this namespace directly unless you know what you're doing.
  • The namespace pyinotify exposes the higher-level developments built on top of inotify, which are the purpose of pyinotify.
Edited Mon, 28 Jan 2008 23:38

Let's start with a more detailed example. Say we want to monitor the temp directory '/tmp' and all its subdirectories for every file creation or deletion. For the sake of simplicity, we only print a message on standard output for every notification.

You now have the choice between two approaches. You can receive and process the notifications in the thread that instantiates the monitoring: the main benefit is that no new thread is needed, the drawback is that your program blocks in this task. Or, if you don't want to block your main thread, you can handle the notifications in a separate thread. Choose the one best suited to your needs and consistent with your constraints and design choices. Both approaches are detailed next:

                  Notifier   ThreadedNotifier
Current thread    Yes        No
Separate thread   No         Yes

First, the import statements: WatchManager stores the watches and provides operations on them. EventsCodes brings a set of codes, each associated with an event. ProcessEvent is the processing class.

import os
from pyinotify import WatchManager, Notifier, ThreadedNotifier, EventsCodes, ProcessEvent

wm = WatchManager()

The following class inherits from ProcessEvent. It handles notifications and performs the defined actions in individual processing methods whose names follow the specific syntax process_EVENT_NAME, where EVENT_NAME is the name of the event to process.

mask = EventsCodes.IN_DELETE | EventsCodes.IN_CREATE  # watched events

class PTmp(ProcessEvent):
    def process_IN_CREATE(self, event):
        print "Create: %s" %  os.path.join(event.path, event.name)

    def process_IN_DELETE(self, event):
        print "Remove: %s" %  os.path.join(event.path, event.name)

Next, we describe the classes Notifier and ThreadedNotifier in turn:

  • Class Notifier:

    This statement instantiates our notifier class and performs its initialization, in particular the instantiation of inotify. The second parameter is a callable object, which will be used to process notified events this way: PTmp()(event), where event is the notified event.
    notifier = Notifier(wm, PTmp())
    
    The next statement adds a watch on its first parameter and, recursively, on all its subdirectories; note that symlinks are not followed. The recursion is due to the optional parameter 'rec' being set to True. By default, monitoring is limited to the level of the given directory. The call returns a dict whose keys are paths and whose values are the corresponding watch descriptors (wd); it is assigned to wdd. A unique wd is assigned to every new watch. It is useful (and often necessary) to keep these wds in order to update or remove a watch later, see the dedicated section. Obviously, if the monitored element had been a file, the rec parameter would have been ignored whatever its value.
    wdd = wm.add_watch('/tmp', mask, rec=True)
    
    Let's start reading the events and processing them. Note that during the loop we can freely add, update or remove any watch; we can also do anything else we want, even things unrelated to pyinotify. We call the stop() method when we want to stop monitoring.
    while True:  # loop forever
        try:
            # process the queue of events as explained above
            notifier.process_events()
            if notifier.check_events():
                # read notified events and enqueue them
                notifier.read_events()
            # you can do some tasks here...
        except KeyboardInterrupt:
            # destroy the inotify's instance on this interrupt (stop monitoring)
            notifier.stop()
            break
    
  • Class ThreadedNotifier:

    The second line starts the new thread, which actually does nothing yet since no directory or file is being monitored.
    notifier = ThreadedNotifier(wm, PTmp())
    notifier.start()
    
    wdd = wm.add_watch('/tmp', mask, rec=True)
    
    At any moment we can, for example, remove the watch on '/tmp' like this:
    if wdd['/tmp'] > 0:  # test if the wd is valid, this test is not mandatory
        wm.rm_watch(wdd['/tmp'])
    
    Note that its subdirectories (if any) are still being watched. If we wanted to remove the watch on '/tmp' and all the watches on its subdirectories, we could have done it like this:
    wm.rm_watch(wdd['/tmp'], rec=True)
    
    Or, even better, like this:
    wm.rm_watch(wdd.values())
    
    With that, most of the code is written; we can go on to add, update or remove watches on files or directories following the same principles. The only remaining important task is to stop the thread when we wish to stop monitoring, which automatically destroys the inotify instance. Call the following method:
    notifier.stop()
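
The start()/stop() lifecycle described for ThreadedNotifier follows the usual pattern for a worker thread. The sketch below illustrates that pattern with the standard library only, so it runs without pyinotify installed. PollingWorker and its attributes are hypothetical names chosen for this illustration; this is not pyinotify's actual implementation.

```python
import threading


class PollingWorker(threading.Thread):
    """Hypothetical stand-in for a threaded notifier (not pyinotify code)."""

    def __init__(self, poll_interval=0.01):
        threading.Thread.__init__(self)
        self._stop_requested = threading.Event()
        self._poll_interval = poll_interval
        self.iterations = 0

    def run(self):
        # Loop until stop() is called. A real notifier would check, read and
        # process events here instead of counting iterations.
        while not self._stop_requested.is_set():
            self.iterations += 1
            self._stop_requested.wait(self._poll_interval)

    def stop(self):
        # Ask the loop to finish, then wait for the thread to exit.
        self._stop_requested.set()
        self.join()


worker = PollingWorker()
worker.start()   # the main thread stays free to do other work here
worker.stop()    # returns once the worker thread has finished
```

The point of the design is that stop() both signals the loop and waits for it, so the caller knows monitoring has really ended when stop() returns.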
    
Edited Sun, 26 Nov 2006 10:53
Event Name        Is an Event  Description
IN_ACCESS         Yes          file was accessed.
IN_ATTRIB         Yes          metadata changed.
IN_CLOSE_NOWRITE  Yes          unwritable file was closed.
IN_CLOSE_WRITE    Yes          writable file was closed.
IN_CREATE         Yes          file/dir was created in watched directory.
IN_DELETE         Yes          file/dir was deleted in watched directory.
IN_DELETE_SELF    Yes          watched item itself was deleted.
IN_DONT_FOLLOW    No           don't follow a symlink (lk 2.6.15).
IN_IGNORED        Yes          raised when a watched item is removed. Probably useless for you, prefer IN_DELETE* instead.
IN_ISDIR          No           event occurred against a directory. It is always piggybacked onto an event. The Event structure automatically provides this information (via .is_dir).
IN_MASK_ADD       No           update a mask without overwriting the previous value (lk 2.6.14). Useful when updating a watch.
IN_MODIFY         Yes          file was modified.
IN_MOVE_SELF      Yes          watched item itself was moved. Currently its full destination pathname can only be traced if its source directory and destination directory are both watched. Otherwise, the file is still being watched but you can no longer rely on the given path (.path).
IN_MOVED_FROM     Yes          file/dir in a watched dir was moved from X. The full move of an item can be traced when IN_MOVED_TO is available too; in this case, if the moved item is itself watched, its path will be updated (see IN_MOVE_SELF).
IN_MOVED_TO       Yes          file/dir was moved to Y in a watched dir (see IN_MOVED_FROM).
IN_ONLYDIR        No           only watch the path if it is a directory (lk 2.6.15). Usable when calling .add_watch.
IN_OPEN           Yes          file was opened.
IN_Q_OVERFLOW     Yes          event queue overflowed. This event doesn't belong to any particular watch.
IN_UNMOUNT        Yes          backing fs was unmounted. Notified to all watches located on this fs.
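
Because a mask is a bitwise OR of event codes, a flag such as IN_ISDIR can be separated from the event it accompanies with a bitwise AND. The following sketch shows this with the numeric values defined in the Linux <sys/inotify.h> header instead of importing pyinotify. The describe() helper and the NAMES table are hypothetical and exist only for this illustration.

```python
# Numeric values as defined in the Linux <sys/inotify.h> header.
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ISDIR = 0x40000000

# Only the two events used in this example are listed.
NAMES = {IN_CREATE: 'IN_CREATE', IN_DELETE: 'IN_DELETE'}


def describe(mask):
    """Hypothetical helper: split a raw mask into (event names, is_dir)."""
    names = sorted(name for code, name in NAMES.items() if mask & code)
    return names, bool(mask & IN_ISDIR)


watched = IN_CREATE | IN_DELETE   # the kind of mask passed to add_watch
raw = IN_CREATE | IN_ISDIR        # e.g. a directory created in a watched dir

print(describe(raw))
```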
Edited Sun, 26 Nov 2006 10:58

Each instance contains all the useful information about the observed event. However, the presence of each field depends on the type of event: some fields are irrelevant for some kinds of events (for example, cookie is meaningless for IN_CREATE whereas it is useful for IN_MOVED_TO).

Each raised event is dispatched to the appropriate processing method (according to its type), which can take action in response to this event. The possible fields are:

wd (int): the Watch Descriptor, a unique identifier that represents the watched item through which this event was observed.
path (str): the complete path of the watched item, as given as a parameter to the method .add_watch.
name (str): not None only if the watched item is a directory and the current event occurred against an element inside that directory.
mask (int): a bitmask of events; it carries all the types of events watched on wd.
event_name (str): readable event name.
is_dir (bool): set to True if the event occurred against a directory.
cookie (int): a unique identifier that ties together two related 'moved to' and 'moved from' events.
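
To make the role of cookie concrete, here is a minimal sketch that pairs 'moved from' and 'moved to' events sharing the same cookie. It does not use pyinotify: the Event namedtuple is a hypothetical stand-in that keeps only the fields needed here, pair_moves() is an illustrative helper, and the sample paths and cookie value are made up.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the Event class, reduced to the fields used here.
Event = namedtuple('Event', 'event_name path name cookie')


def pair_moves(events):
    """Return (source, destination) tuples for moves seen on both sides."""
    pending = {}   # cookie -> full source path
    moves = []
    for ev in events:
        full = os.path.join(ev.path, ev.name)
        if ev.event_name == 'IN_MOVED_FROM':
            pending[ev.cookie] = full
        elif ev.event_name == 'IN_MOVED_TO' and ev.cookie in pending:
            moves.append((pending.pop(ev.cookie), full))
    return moves


events = [
    Event('IN_MOVED_FROM', '/tmp/a', 'x.txt', 42),
    Event('IN_CREATE', '/tmp/a', 'y.txt', None),
    Event('IN_MOVED_TO', '/tmp/b', 'x.txt', 42),
]
print(pair_moves(events))
```

An IN_MOVED_TO whose cookie was never seen (the source directory was not watched) is simply left unpaired in this sketch.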
Edited Sun, 26 Nov 2006 11:05

We can provide our own handling and processing implementation by subclassing ProcessEvent and defining the appropriate methods. See the commented example below:

class MyProcessing(ProcessEvent):
    def __init__(self):
        """
        Does nothing in this case, but you can implement this constructor
        as well, and you don't need to explicitly call the base class
        constructor.
        """
        pass

    def process_IN_DELETE(self, event):
        """
        This method processes a specific kind of event (IN_DELETE). event
        is an instance of Event.
        """
        print '%s: deleted' % os.path.join(event.path, event.name)

    def process_IN_CLOSE(self, event):
        """
        This method is called for these events: IN_CLOSE_WRITE,
        IN_CLOSE_NOWRITE.
        """
        print '%s: closed' % os.path.join(event.path, event.name)

    def process_default(self, event):
        """
        Ultimately, this method is called for all other kinds of events.
        This method can be used when similar processing can be applied
        to various events.
        """
        print 'default processing'

Explanations and details:

  • IN_DELETE has its own method providing specific treatment. We associate an individual processing method with an event by providing a method whose name follows the syntax process_EVENT_NAME, where EVENT_NAME is the name of the event to process. For the sake of simplicity, our methods are very basic: they only print messages on standard output.
  • Some related events need the same treatment most of the time, and it would be annoying to implement the same code twice. In this case we can define a common method. For example, say we want to share the same method for these two related events:
    mask = EventsCodes.IN_CLOSE_WRITE | EventsCodes.IN_CLOSE_NOWRITE
    Then it is enough to provide a single processing method named process_IN_CLOSE, following the general syntax process_IN_FAMILYBASENAME. The two previous events will be processed by this method. In this case, be careful not to implement process_IN_CLOSE_WRITE or process_IN_CLOSE_NOWRITE: these methods have a higher precedence (see below), so they are looked up first and would be called instead of process_IN_CLOSE (for a complete example see: src/examples/close.py).
  • It only makes sense to define process_IN_Q_OVERFLOW when its class instance is given to Notifier. It could never be called from a processing object associated with a watch, because this event isn't associated with any watch.
  • EventsCodes.ALL_EVENTS isn't an event by itself, so you don't have to implement a method process_ALL_EVENTS (even worse, it would be wrong to define it). It is just an alias telling the kernel that we want to be notified of all kinds of events on a given watch. The kernel raises individual events (with the IN_ISDIR flag if necessary). If we need to apply the same actions whatever the kind of event, we should instead implement a process_default method (for a complete example see: src/examples/simple.py).
  • Lookup order of processing methods (by decreasing priority): specialized method (ex: process_IN_CLOSE_WRITE) first, then family method (ex: process_IN_CLOSE), then default method (process_default).
  • One more thing: say you redefine the method process_default so that it opens /etc/mtab (for example to check whether a partition is mounted). It would be a mistake to have this method called for every IN_OPEN event occurring in /etc, because we could easily end up in this kind of endless situation: call process_default, open /etc/mtab, call process_default, open /etc/mtab ... loop forever.
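
The lookup order described above (specialized method, then family method, then process_default) can be sketched with getattr. This is a simplified stand-in written to run without pyinotify, not the library's own dispatch code. Dispatcher, Demo and the FAMILIES tuple are hypothetical names, only the IN_CLOSE family named in the text is modelled, and the methods receive an event name instead of an Event instance to keep the example short.

```python
# Only the family named in the text is modelled here (an assumption of this sketch).
FAMILIES = ('IN_CLOSE',)


class Dispatcher(object):
    """Hypothetical stand-in illustrating the lookup order (not pyinotify code)."""

    def __call__(self, event_name):
        # 1. Specialized method, e.g. process_IN_CLOSE_WRITE.
        method = getattr(self, 'process_' + event_name, None)
        if method is None:
            # 2. Family method, e.g. process_IN_CLOSE.
            for family in FAMILIES:
                if event_name.startswith(family + '_'):
                    method = getattr(self, 'process_' + family, None)
                    break
        if method is None:
            # 3. Fallback for everything else.
            method = self.process_default
        return method(event_name)

    def process_default(self, event_name):
        return 'default'


class Demo(Dispatcher):
    def process_IN_DELETE(self, event_name):
        return 'delete'

    def process_IN_CLOSE(self, event_name):
        return 'close'


demo = Demo()
names = ('IN_DELETE', 'IN_CLOSE_WRITE', 'IN_CLOSE_NOWRITE', 'IN_OPEN')
print([demo(name) for name in names])
```

Adding a process_IN_CLOSE_WRITE method to Demo would take that event away from process_IN_CLOSE, which is the precedence pitfall the text warns about.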

Whenever possible, you should process your notifications this way, with a single processing object. It is easy to see the benefits of dealing with only one instance (efficiency, data sharing, ...):

notifier = Notifier(wm, MyProcessing())

However, some watches might need a different kind of processing. You can attach to them an instance that will be called only for their associated watch:

mask = EventsCodes.ALL_EVENTS
wm.add_watch('/one/path', mask, proc_fun=MyProcessing())
Edited Sat, 06 May 2006 13:07
Notifier(watch_manager, default_proc_fun=ProcessEvent())
Read notifications, process events.
watch_manager is an instance of WatchManager.
default_proc_fun is an instance of ProcessEvent or one of its subclasses.
check_events(timeout=4) => None
Check for new events available to read.
timeout (int): timeout passed on to select.select().
process_events() => None
Routine for processing events from queue by calling their associated processing function (instance of ProcessEvent or one of its subclasses).
read_events() => None
Read events from device and enqueue them, waiting to be processed.
stop() => None
Stop the notifications.
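
Since check_events() passes its timeout on to select.select(), the waiting behaviour can be illustrated with select alone. In this sketch a pipe stands in for the inotify file descriptor, which is an assumption made so that the example runs without pyinotify (on Unix-like systems). has_events() is a hypothetical helper, and the 0.05 value is an arbitrary timeout in seconds as understood by select.select().

```python
import os
import select

# A pipe stands in for the inotify file descriptor (assumption of this sketch).
read_fd, write_fd = os.pipe()


def has_events(fd, timeout):
    """Hypothetical helper: True if fd becomes readable before the timeout."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


before = has_events(read_fd, 0.05)   # nothing has been written yet
os.write(write_fd, b'x')             # simulate an incoming notification
after = has_events(read_fd, 0.05)    # data is now waiting to be read

os.close(read_fd)
os.close(write_fd)
print(before)
print(after)
```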
Edited Sat, 06 May 2006 13:07
ThreadedNotifier(watch_manager, default_proc_fun=ProcessEvent())
This notifier inherits from threading.Thread and from Notifier, instantiating a separate thread and providing the standard Notifier functionality. This is a threaded version of Notifier.
watch_manager is an instance of WatchManager.
default_proc_fun is an instance of ProcessEvent or one of its subclasses.
Inherits all the methods of Notifier but overrides the stop() method.
start() => None
Start the new thread, start events notifications.
stop() => None
Stop the thread, stop the notifications.
Edited Fri, 24 Nov 2006 17:24
Watch(wd, path, mask, proc_fun, auto_add)
Represent a watch, i.e. a file or directory being watched.
wd (int): Watch Descriptor.
path (str): Path of the file or directory being watched.
mask (int): Mask.
proc_fun (ProcessEvent): Processing object.
auto_add (bool): Automatically add watches on creation of directories.
Edited Fri, 24 Nov 2006 17:26
WatchManager()
The Watch Manager lets the client add a new watch, stores the active watches, and provides operations on these watches.
add_watch(path, mask, proc_fun=None, rec=False, auto_add=False) => dict
Add watch(s) on given path(s) with the specified mask.
path (str or list of str): Path(s) to watch, the path can either be a file or a directory.
mask (int): Bitmask of events.
proc_fun (ProcessEvent): Processing object (must be callable). Will be called if provided, otherwise, notifier.default_proc_fun will be called.
rec (bool): Recursively add watches on the given path and on all its subdirectories.
auto_add (bool): Automatically add watches on newly created directories in the watch's path.
update_watch(wd, mask=None, proc_fun=None, rec=False, auto_add=False) => dict
Update existing watch(s). All these parameters are updatable.
rm_watch(wd, rec=False) => dict
Remove watch(s).
get_wd(path) => int
Return the watch descriptor associated to path.
get_path(wd) => str
Return the path associated to wd, if wd is invalid, None is returned.
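
To give an idea of what rec=True covers, the sketch below lists the directories that a recursive watch would have to include, using os.walk from the standard library. It does not call pyinotify, and dirs_to_watch() is a hypothetical helper, not part of the API described above. By default os.walk does not descend into symlinked directories, which matches the statement that symlinks are not followed.

```python
import os
import tempfile


def dirs_to_watch(root, rec=False):
    """Hypothetical helper: list the directories a watch on root would cover."""
    if not rec:
        return [root]
    # os.walk does not follow symlinks to directories unless asked to.
    return [dirpath for dirpath, _, _ in os.walk(root)]


# Build a small throwaway tree: root/a/b
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))

found = sorted(os.path.relpath(p, root) for p in dirs_to_watch(root, rec=True))
print(found)
```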
Edited Fri, 24 Nov 2006 17:26

Question: among these methods, which one must be called with a string path and which ones with a watch descriptor?

This question should be fairly simple to answer, but it's worth clarifying once and for all with a simple table. This table specifies the kind of parameter accepted by each method:

Method        Parameter                Returned result
add_watch     path (or list of paths)  {path1: wd1, path2: wd2, ...}
update_watch  wd (or list of wds)      {wd1: success, wd2: success, ...}
rm_watch      wd (or list of wds)      {wd1: success, wd2: success, ...}

For add_watch, wdx is the watch descriptor associated with pathx, and is positive on success. For update_watch and rm_watch, success is True if the operation on wdx succeeded, False otherwise.

Examples:

ra = wm.add_watch('/a-dir', mask)
if ra['/a-dir'] > 0: print "added"

ru = wm.update_watch(ra['/a-dir'], new_mask)
if ru[ra['/a-dir']]: print "updated"

rr = wm.rm_watch(ra['/a-dir'])
if rr[ra['/a-dir']]: print "deleted"

The methods updating or removing a watch only take watch descriptors and return a dictionary notifying the success or failure of operations.

In the extreme case where your parameter doesn't fit the expected format, which can happen if you have lost previously returned values, you can use the method get_wd(a_path) and its counterpart get_path(a_wd). The former takes a path and returns its wd; the latter takes a wd and returns its path. Caution: in the worst case get_wd(a_path) has to iterate over the whole list of watches, so its cost can be high. Avoid this method whenever possible.
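
The difference in cost between get_path and get_wd is easier to see with a toy model of the bookkeeping. The sketch below assumes watches are stored in a dict keyed by wd; WatchRegistry is a hypothetical in-memory stand-in written for this illustration and says nothing certain about how WatchManager is implemented.

```python
class WatchRegistry(object):
    """Hypothetical in-memory stand-in for the path/wd bookkeeping."""

    def __init__(self):
        self._paths = {}      # wd -> path
        self._next_wd = 1

    def add_watch(self, path):
        wd = self._next_wd
        self._next_wd += 1
        self._paths[wd] = path
        return {path: wd}     # result keyed by path

    def rm_watch(self, wd):
        removed = self._paths.pop(wd, None) is not None
        return {wd: removed}  # result keyed by wd

    def get_path(self, wd):
        # Direct dictionary lookup; None if wd is unknown.
        return self._paths.get(wd)

    def get_wd(self, path):
        # Linear scan over every watch: this is why get_wd can be costly.
        for wd, watched in self._paths.items():
            if watched == path:
                return wd
        return None


registry = WatchRegistry()
added = registry.add_watch('/a-dir')
wd = added['/a-dir']
print(registry.get_path(wd))
print(registry.get_wd('/a-dir') == wd)
print(registry.rm_watch(wd))
```

Keeping the dictionary returned by add_watch, as the earlier examples do with wdd, avoids the scan entirely.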

Edited Sat, 06 May 2006 13:07

For more detailed messages from pyinotify, turn on the VERBOSE variable located in src/pyinotify.py.

Edited Thu, 11 May 2006 11:26

Either as an introduction to inotify or as a means to dive into pyinotify, here is some interesting reading worth mentioning:

  • Read this introduction to inotify written by one of its authors, Robert Love.
  • Obviously, read pyinotify's documentation as specified above.
  • If you want to write code beyond the basic example, read the Python files in the example directory src/examples/*.py and the tests in src/tests/*.py provided with pyinotify.
  • And finally, you can directly read parts of the src/pyinotify/pyinotify.py source code; it is relatively short and readable (thanks to Python :) ).
-- seb |at| dbzteam |dot| org