Short (and useful) python snippets

Question

In spirit of the existing “what’s your most useful C/C++ snippet” – thread:

Do you guys have short, monofunctional Python snippets that you use (often) and would like to share with the StackOverlow Community? Please keep the entries small (under 25
lines maybe?) and give only one example per post.

I’ll start of with a short snippet i use from time to time to count sloc (source lines of code) in python projects:

# prints recursive count of lines of python source code from current directory
# includes an ignore_list. also prints total sloc

import os
cur_path = os.getcwd()
ignore_set = set(["__init__.py", "count_sourcelines.py"])

loclist = []

for pydir, _, pyfiles in os.walk(cur_path):
    for pyfile in pyfiles:
        if pyfile.endswith(".py") and pyfile not in ignore_set:
            totalpath = os.path.join(pydir, pyfile)
            loclist.append( ( len(open(totalpath, "r").read().splitlines()),
                               totalpath.split(cur_path)[1]) )

for linenumbercount, filename in loclist: 
    print "%05d lines in %s" % (linenumbercount, filename)

print "nTotal: %s lines (%s)" %(sum([x[0] for x in loclist]), cur_path)

Asked By: ChristopheD

||

Source

Answer 1

Hardlink identical files in current directory (on unix, this means they have share physical storage, meaning much less space):

import os
import hashlib

dupes = {}

for path, dirs, files in os.walk(os.getcwd()):
    for file in files:
        filename = os.path.join(path, file)
        hash = hashlib.sha1(open(filename).read()).hexdigest()
        if hash in dupes:
            print 'linking "%s" -> "%s"' % (dupes[hash], filename)
            os.rename(filename, filename + '.bak')
            try:
                os.link(dupes[hash], filename)
                os.unlink(filename + '.bak')
            except:
                os.rename(filename + '.bak', filename)
            finally:
        else:
            dupes[hash] = filename

Answered By: rmmh

Answer 2

I like this one to zip everything up in a directory. Hotkey it for instabackups!

import zipfile

z = zipfile.ZipFile('my-archive.zip', 'w', zipfile.ZIP_DEFLATED)
startdir = "/home/johnf"
for dirpath, dirnames, filenames in os.walk(startdir):
  for filename in filenames:
    z.write(os.path.join(dirpath, filename))
z.close()

Answered By: John Feminella

Answer 3

The only ‘trick’ I know that really wowed me when I learned it is enumerate. It allows you to have access to the indexes of the elements within a for loop.

>>> l = ['a','b','c','d','e','f']
>>> for (index,value) in enumerate(l):
...     print index, value
... 
0 a
1 b
2 c
3 d
4 e
5 f

Answered By: theycallmemorty

Answer 4

Initializing a 2D list

While this can be done safely to initialize a list:

lst = [0] * 3

The same trick won’t work for a 2D list (list of lists):

>>> lst_2d = [[0] * 3] * 3
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [5, 0, 0], [5, 0, 0]]

The operator * duplicates its operands, and duplicated lists constructed with [] point to the same list. The correct way to do this is:

>>> lst_2d = [[0] * 3 for i in xrange(3)]
>>> lst_2d
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> lst_2d[0][0] = 5
>>> lst_2d
[[5, 0, 0], [0, 0, 0], [0, 0, 0]]

Answered By: Eli Bendersky

Answer 5

To find out if line is empty (i.e. either size 0 or contains only whitespace), use the string method strip in a condition, as follows:

if not line.strip():    # if line is empty
    continue            # skip it

Answered By: Eli Bendersky

Answer 6

Suppose you have a list of items, and you want a dictionary with these items as the keys. Use fromkeys:

>>> items = ['a', 'b', 'c', 'd']
>>> idict = dict().fromkeys(items, 0)
>>> idict
{'a': 0, 'c': 0, 'b': 0, 'd': 0}
>>>

The second argument of fromkeys is the value to be granted to all the newly created keys.

Answered By: Eli Bendersky

Answer 7

zip(*iterable) transposes an iterable.

>>> a=[[1,2,3],[4,5,6]]
>>> zip(*a)
    [(1, 4), (2, 5), (3, 6)]

It’s also useful with dicts.

>>> d={"a":1,"b":2,"c":3}
>>> zip(*d.iteritems())
[('a', 'c', 'b'), (1, 3, 2)]

Answered By: AKX

Answer 8

For list comprehensions that need current, next:

[fun(curr,next) 
 for curr,next 
 in zip(list,list[1:].append(None)) 
 if condition(curr,next)]

For circular list zip(list,list[1:].append(list[0])).

For previous, current: zip([None].extend(list[:-1]),list) circular: zip([list[-1]].extend(list[:-1]),list)

Answered By: vartec

Answer 9

Huge speedup for nested list and dictionaries with:

deepcopy = lambda x: cPickle.loads(cPickle.dumps(x))

Answered By: vartec

Answer 10

To flatten a list of lists, such as

[['a', 'b'], ['c'], ['d', 'e', 'f']]

into

['a', 'b', 'c', 'd', 'e', 'f']

use

[inner
    for outer in the_list
        for inner in outer]

Answered By: George V. Reilly

Answer 11

I like using any and a generator:

if any(pred(x.item) for x in sequence):
    ...

instead of code written like this:

found = False
for x in sequence:
    if pred(x.n):
        found = True
if found:
    ...

I first learned of this technique from a Peter Norvig article.

Answered By: Jacob Gabrielson

Answer 12

For Python 2.4+ or earlier:

for x,y in someIterator:
  listDict.setdefault(x,[]).append(y)

In Python 2.5+ there is alternative using defaultdict.

Answered By: vartec

Answer 13

A custom list that when multiplied by other list returns a cartesian product… the good thing is that the cartesian product is indexable, not like that of itertools.product (but the multiplicands must be sequences, not iterators).

import operator

class mylist(list):
    def __getitem__(self, args):
        if type(args) is tuple:
            return [list.__getitem__(self, i) for i in args]
        else:
            return list.__getitem__(self, args)
    def __mul__(self, args):
        seqattrs = ("__getitem__", "__iter__", "__len__")
        if all(hasattr(args, i) for i in seqattrs):
            return cartesian_product(self, args)
        else:
            return list.__mul__(self, args)
    def __imul__(self, args):
        return __mul__(self, args)
    def __rmul__(self, args):
        return __mul__(args, self)
    def __pow__(self, n):
        return cartesian_product(*((self,)*n))
    def __rpow__(self, n):
        return cartesian_product(*((self,)*n))

class cartesian_product:
    def __init__(self, *args):
        self.elements = args
    def __len__(self):
        return reduce(operator.mul, map(len, self.elements))
    def __getitem__(self, n):
        return [e[i] for e, i  in zip(self.elements,self.get_indices(n))]
    def get_indices(self, n):
        sizes = map(len, self.elements)
        tmp = [0]*len(sizes)
        i = -1
        for w in reversed(sizes):
            tmp[i] = n % w
            n /= w
            i -= 1
        return tmp
    def __add__(self, arg):
        return mylist(map(None, self)+mylist(map(None, arg)))
    def __imul__(self, args):
        return mylist(self)*mylist(args)
    def __rmul__(self, args):
        return mylist(args)*mylist(self)
    def __mul__(self, args):
        if isinstance(args, cartesian_product):
            return cartesian_product(*(self.elements+args.elements))
        else:
            return cartesian_product(*(self.elements+(args,)))
    def __iter__(self):
        for i in xrange(len(self)):
            yield self[i]
    def __str__(self):
        return "[" + ",".join(str(i) for i in self) +"]"
    def __repr__(self):
        return "*".join(map(repr, self.elements))

Answered By: fortran

Answer 14

A “progress bar” that looks like:

|#############################---------------------|
59 percent done

Code:

class ProgressBar():
    def __init__(self, width=50):
        self.pointer = 0
        self.width = width

    def __call__(self,x):
         # x in percent
         self.pointer = int(self.width*(x/100.0))
         return "|" + "#"*self.pointer + "-"*(self.width-self.pointer)+
                "|n %d percent done" % int(x)

Test function (for windows system, change “clear” into “CLS”):

if __name__ == '__main__':
    import time, os
    pb = ProgressBar()
    for i in range(101):
        os.system('clear')
        print pb(i)
        time.sleep(0.1)

Answered By: Theodor

Answer 15

Emulating a switch statement. For example switch(x) {..}:

def a():
  print "a"

def b():
  print "b"

def default():
   print "default"

apply({1:a, 2:b}.get(x, default))

Answered By: GabiMe

Answer 16

like another person above, I said ‘Wooww !!’ when I discovered enumerate()
I sang a praise to Python when I discovered repr() that gave me possibility to see precisely the content of strings that I wanted to analyse with a regex
I was very satisfied to discover that print 'n'.join(list_of_strings) is displayed much more rapidly with ‘n’.join(…) than for ch in list_of_strings: print ch
splitlines(1) with an argument keeps the newlines

These four “tricks” combined in one snippet very useful to rapidly display the code source of a web page , line after line, each line being numbered , all the special characters like ‘t’ or newlines being not interpreted, and with the newlines present:

import urllib
from time import clock,sleep

sock = urllib.urlopen('http://docs.python.org/')
ch = sock.read()
sock.close()


te = clock()
for i,line in enumerate(ch.splitlines(1)):
    print str(i) + ' ' + repr(line)
t1 = clock() - te


print "nnIn 3 seconds, I will print the same content, using '\n'.join(....)n" 

sleep(3)

te = clock()
# here's the point of interest:
print 'n'.join(str(i) + ' ' + repr(line)
                for i,line in enumerate(ch.splitlines(1)) )
t2 = clock() - te

print 'n'
print 'first  display took',t1,'seconds'
print 'second display took',t2,'seconds'

With my not very fast computer, I got:

first  display took 4.94626048841 seconds
second display took 0.109297410704 seconds

Answered By: eyquem

Answer 17

Here are few which I think are worth knowing but might not be useful on an everyday basis.
Most of them are one liners.

Removing Duplicates from a List

L = list(set(L))

Getting Integers from a string (space seperated)

ints = [int(x) for x in S.split()]

Finding Factorial

fac=lambda(n):reduce(int.__mul__,range(1,n+1),1)

Finding greatest common divisor

>>> def gcd(a,b):
...     while(b):a,b=b,a%b
...     return a

Answered By: jack_carver

Answer 18

import tempfile
import cPickle

class DiskFifo:
    """A disk based FIFO which can be iterated, appended and extended in an interleaved way"""
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.pickler = cPickle.Pickler(self.fd)
        self.unpickler = cPickle.Unpickler(self.fd)
        self.size = 0

    def __len__(self):
        return self.size

    def extend(self, sequence):
        map(self.append, sequence)

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.clear_memo()
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1

    def next(self):
        try:
            self.fd.seek(self.rpos)
            x = self.unpickler.load()
            self.rpos = self.fd.tell()
            return x

        except EOFError:
            raise StopIteration

    def __iter__(self):
        self.rpos = 0
        return self

Answered By: piotr

Answer 19

Fire up a simple web server for files in the current directory:

python -m SimpleHTTPServer

Useful for sharing files.

Answered By: Adam Lehenbauer

Answer 20

I actually just created this, but I think it’s going to be a very useful debugging tool.

def dirValues(instance, all=False):
    retVal = {}
    for prop in dir(instance):
        if not all and prop[1] == "_":
            continue
        retVal[prop] = getattr(instance, prop)
    return retVal

I usually use dir() in a pdb context, but I think this will be much more useful:

(pdb) from pprint import pprint as pp
(pdb) from myUtils import dirValues
(pdb) pp(dirValues(someInstance))

Answered By: Josh Russo

Answer 21

Iterate over any iterable (list, set, file, stream, strings, whatever), of ANY size (including unknown size), by chunks of x elements:

from itertools import chain, islice

def chunks(iterable, size, format=iter):
    it = iter(iterable)
    while True:
        yield format(chain((it.next(),), islice(it, size - 1)))

>>> l = ["a", "b", "c", "d", "e", "f", "g"]
>>> for chunk in chunks(l, 3, tuple):
...         print chunk
...     
("a", "b", "c")
("d", "e", "f")
("g",)

Answered By: e-satis

Answer 22

When debugging, you sometimes want to see a string with a basic editor. For showing a string with notepad:

import os, tempfile, subprocess

def get_rand_filename(dir_=os.getcwd()):
    "Function returns a non-existent random filename."
    return tempfile.mkstemp('.tmp', '', dir_)[1]

def open_with_notepad(s):
    "Function gets a string and shows it on notepad"
    with open(get_rand_filename(), 'w') as f:
        f.write(s)
        subprocess.Popen(['notepad', f.name])

Answered By: iTayb

Short (and useful) python snippets

Question:

Answers: