2014/A Few Python Tips

From Open Source Bridge Wiki
Nothing fancy here, just several tips that help you work effectively with Python.

Speaker: Sumana Harihareswara. This presentation is licensed CC BY; please feel free to reuse it at your company or conference, as long as you credit Harihareswara.

Contributed notes

Thank you for coming.

This is "A Few Python Tips". I am presenting a few tips to help you work effectively with Python. This talk is aimed at experienced programmers who are accustomed to other scripting languages, and at new programmers for whom Python is their first language.

This talk will *not* discuss: programming basics, why you might choose to work in Python, web frameworks such as Django and Flask, Wikimedia Python projects such as Wikimetrics and pywiki and mwclient and mwparserfromhell, or scientific computing distributions such as scipy and numpy.

I am aiming for breadth rather than depth, because the point here is to give you a little taste of a lot of useful stuff, and show you just enough to see why I use what I use. So I'm not going to go into a lot of detail for any one thing. And of course all my notes and the links are available on mediawiki.org so you can follow up.

I'm going to cover four big topics:

  • Initial setup (some tips)
  • Things to try while debugging
  • Code style tools
  • Favorite modules


Setup

2 or 3? So, you have a choice of writing code in Python 2 or Python 3. Python 3 came out in 2008 and everyone's really slowly transitioning to it. If you are writing new code, I advise you to write it in Python 3 if possible. There are some libraries and dependencies out there that aren't compatible with Python 3 yet, but that's less and less of a problem over time.

The examples I'm going to give you are Python 2 just because that is still the default on most systems when you run Python.

Once you've installed Python, the next thing you should do is install the package manager, "pip". I personally find it annoying that you have to do this separately, that it doesn't just come with Python. They're fixing that in the new version, where pip will just come with Python. Installation instructions are here:

http://pip.readthedocs.org/en/latest/installing.html

If you're contributing to MediaWiki right now you probably already have pip installed because it's part of the process for getting git-review set up. https://www.mediawiki.org/wiki/Gerrit/Tutorial#Prepare_to_work_with_gerrit

pip is your interface to the Python Package Index, also known as PyPI. That's where most of the Python ecology lives, in terms of reusable Python code.

I can show you what I have installed via

   pip freeze

Let me show you how I install something via pip.

   sudo pip install simplejson


So you can see that I have a lot of stuff here. And it's great that I have access to all those modules, but this causes two problems. One is, what if I start working on two projects that require different versions of a particular module? Like, one of your codebases needs pytwitter version 1, and the other needs pytwitter version 2. And second, how do I keep track of, and easily tell someone else, what dependencies a project requires?

So we use virtual environments. Sometimes you'll hear people call them virtualenvs or venvs.

For installation instructions, go to http://docs.python-guide.org/en/latest/dev/virtualenvs/#basic-usage

Within a virtual environment, you can just pip install something, with no sudo, and then it's only in that environment.

I use "virtualenvwrapper" which provides a little syntactic sugar and tracks where all the virtual environments are. See http://virtualenvwrapper.readthedocs.org/en/latest/install.html to install that. So with virtualenv and virtualenvwrapper installed, I can:

   mkvirtualenv foo
   pip freeze
   pip install simplejson
   pip freeze
   pip uninstall simplejson
   deactivate
   workon [tab]
   rmvirtualenv foo
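
If you ever lose track of whether you're inside a virtual environment, you can ask Python itself. This is a minimal sketch that isn't part of the original demo; the function name in_virtualenv is made up for illustration. It assumes either Python 3.3 or later (where sys.base_prefix exists) or the older virtualenv tool (which sets sys.real_prefix).

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to be running in a virtual environment."""
    # Older versions of the virtualenv tool set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # In Python 3.3+ environments, sys.prefix differs from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print(in_virtualenv())
print(sys.prefix)  # the root directory of the current environment
```

If you run it after "workon foo" and again after "deactivate", I'd expect the first line to flip between True and False.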


And the last bit of setup I'll talk about is the interpreter, the command-line environment where you can type one line of code at a time and see what Python does with it. You'll also hear it called a REPL, Read-Evaluate-Print Loop. When you install Python it comes with the stock interpreter.


    python


  a = 2
  a
  def foo():
      print("yay")
  foo()
  help(a)

And just like on the command line, you can easily go back to something you've done in the past:

  [up arrow]
  blah Control-C
  [up arrow]
  Control-R   a =
  quit()


But I also want to mention two other interpreters to you. One is especially good for sharing your work, and it's called IPython. You install it with

   pip install ipython
and then you activate it on the command line with
   ipython

The big reason to use IPython is so that you can make IPython Notebooks, which are web-based presentations that let you show your code and the output. I'm not going to give instructions here on how to install and run that, but I'll show you an example from a Wikimedia volunteer, Merlijn van Deen:

  http://nbviewer.ipython.org/gist/valhallasw/7723637


And the third REPL is great for exploring new stuff. It's called bpython. You install it using your operating system's package manager, like aptitude on Debian.

   bpython


   a = "wiki"
   a.     [tab]      (

It pops up information about what you can do.


   import s  [tab]

So, if I'm trying out a new library:

   import mwclient
   mwclient.
   mwclient.Site( 

And it lets you undo something you've just done.


   b = 2
   b = "mag"
   Control-R
   b

So I like it for exploring.


Things to try while debugging

A lot of people don't know about the -i option when you run a Python script. Basically, it runs the script, and then it dumps you into an interactive session with the Python interpreter, at the state it was in at the end of the script. So you can dig around and see what values are stored.

  less interaction-example.py
  python -i interaction-example.py
  dir()
  toprint

python -i spits you out after the script has finished running, or crashed. But maybe you want to see what happens just before it crashes. Or you want to be able to sketch, figure out what you want to happen next, just before it crashes. So another thing you can do is use pdb, the Python debugger. If you've ever set breakpoints in a program so that you can step through it, then this will be familiar to you.

   less demo-of-pdb.py
   python demo-of-pdb.py

See where I say set_trace? That's what sets the breakpoint.

The pdb environment is different from the regular Python interpreter. It lets you step through lines, or continue to the next breakpoint, or evaluate expressions.


   s


   s
   print a
   c
   c

The most common thing you'll do is set traces. But there's a lot more you can do. There was a talk at PyCon http://pyvideo.org/video/2673/in-depth-pdb and there's a reference manual at https://docs.python.org/2/library/pdb.html .

One more debugging tip is, when you're just starting out, you may want to know what you've actually defined, and whether or not something is in your path, so Python knows how to get to it.

  bpython
  dir()
  a = 1
  dir()

So you can see that once you have successfully defined a variable, its name shows up in the list of strings that dir() returns. I called dir() with no arguments at the top level of the interpreter, so it's returning the names in the global scope.
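
Here is the same idea as a script rather than an interactive session. It's only a sketch, and the variable names are made up for illustration.

```python
names_before = set(dir())   # names visible in the current scope right now
answer = 42                 # define something new
names_after = set(dir())    # look again

# Whatever appeared between the two calls to dir().
# names_before itself is in here too, because it was bound after the first call.
new_names = names_after - names_before
print(sorted(new_names))
```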

  import sys
  dir()
  sys.path

So if you've tried to install some Python library but the directory it landed in doesn't show up in sys.path, then that's something you can follow up on.
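
You can also ask Python directly whether it can find a module, without importing it. This is a minimal sketch, assuming Python 3.4 or later (the examples in this talk are otherwise Python 2); is_importable and the module names in the loop are made up for illustration.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "simplejson", "no_such_module_for_demo"]:
    print(name, is_importable(name))

# The directories Python searches, in order.
for entry in sys.path:
    print(entry)
```

If a module you installed comes back False, compare where pip put it with the directories printed at the end.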


A few favorite modules

random!

   bpython
   import random
   a = range(1,15)
   a

random.choice just gives you a random choice from a list.

   random.choice(a)

And since you're so often doing something like this, you don't even have to do that initial step of creating the range.

   random.randrange(1, 15)

random.sample gives you k unique items from a list.

   random.sample(a, 4)

And if you just want a random float between 0 and 1, there's random.random().

   random.random()
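
Here are those four calls together in one script. It's a sketch with made-up variable names; I seed the generator only so that repeated runs give the same results, which you wouldn't normally do.

```python
import random

random.seed(2014)  # fixed seed so repeated runs give the same results

a = list(range(1, 15))               # the numbers 1 through 14

one_pick = random.choice(a)          # one item from the list
same_idea = random.randrange(1, 15)  # an integer from 1 through 14, no list needed
                                     # (random.randrange(15) would give 0 through 14)
four_picks = random.sample(a, 4)     # four distinct items from the list
fraction = random.random()           # a float in the range [0.0, 1.0)

print(one_pick, same_idea, four_picks, fraction)
```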


requests! AKA, HTTP for Humans. Basically, if you're making basic API requests, to web sites, over HTTP, you should NOT use the built-in "urllib2" part of Python's standard library, because it's going to make you write a lot of boilerplate code that you shouldn't have to write. requests takes care of that for you.

Let's take a look at https://github.com/fhocutt/obscure-enwiki-fact/blob/master/obscurefact.py .

And when I run it:

   import obscurefact

If you have a .py file in your path, then you can import it with "import nameoffile".

   obscurefact.wikipediarecentchange()
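
For a sense of what a requests call looks like without clicking through, here is a minimal sketch. It is not the code from obscurefact.py. The function name, the constants and the "get" parameter are made up for illustration, and it assumes that the requests package is installed and that the MediaWiki API's recentchanges list returns JSON shaped like {"query": {"recentchanges": [{"title": ...}]}}.

```python
API_URL = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    "action": "query",
    "list": "recentchanges",
    "rclimit": 1,
    "format": "json",
}
# A descriptive User-Agent is polite when talking to Wikimedia sites.
HEADERS = {"User-Agent": "python-tips-demo/0.1 (example only)"}


def latest_change_title(get=None):
    """Return the title of the most recent change on English Wikipedia."""
    if get is None:
        import requests  # third-party package: pip install requests
        get = requests.get
    response = get(API_URL, params=PARAMS, headers=HEADERS, timeout=10)
    response.raise_for_status()   # turn HTTP error codes into exceptions
    data = response.json()        # requests decodes the JSON body for us
    return data["query"]["recentchanges"][0]["title"]
```

With a network connection, print(latest_change_title()) should print one page title. The "get" parameter is only there so you can pass in a stand-in function and try this out without touching the network.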


codecs! UTF-8. In Python 3, all strings are Unicode, so you won't have to worry about this, but in Python 2, you will run into a zillion headaches over UTF-8 and ASCII conversion. So use the codecs module. https://docs.python.org/2/library/codecs.html

   cd ~/test/python-play/missing-from-wikipedia
   bpython
   import codecs
   with open("namelist-sample.txt","r") as f:
       namelist = [line.strip('\n') for line in f]
       print namelist


   with codecs.open("namelist-sample.txt", encoding='utf-8') as f:
       namelist = [line.strip('\n') for line in f]
       print namelist
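
If you don't have a file like namelist-sample.txt handy, here is a self-contained sketch of the same round trip. The file name and the two sample names are made up; the code writes them to a temporary directory as UTF-8 and reads them back with codecs.open. I'd expect it to behave the same way in Python 2 and Python 3, apart from how the list is printed.

```python
import codecs
import os
import tempfile

# Two names containing non-ASCII characters (e with diaeresis, e and i with acute).
names = [u"Zo\u00eb Smith", u"Jos\u00e9 Garc\u00eda"]

path = os.path.join(tempfile.mkdtemp(), "namelist-demo.txt")

# Write the names out, one per line, encoded as UTF-8.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(u"\n".join(names) + u"\n")

# Read them back; codecs.open decodes each line to a Unicode string.
with codecs.open(path, encoding="utf-8") as f:
    namelist = [line.strip(u"\n") for line in f]

print(namelist)
```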

Optional add-ons

In case we have time

Unit tests

Unit tests! Let's go to https://github.com/brainwane/obamaspeech/blob/master/speech-tests.py and I'll show you.
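
If you haven't seen Python's built-in unittest module before, this is roughly the shape of a test file. It's a minimal sketch rather than the code in speech-tests.py; word_count, WordCountTests and run_tests are made-up names.

```python
import unittest


def word_count(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


class WordCountTests(unittest.TestCase):
    def test_counts_words(self):
        self.assertEqual(word_count("four score and seven"), 4)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


def run_tests():
    """Load and run the tests above, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test file you would usually end with a call to unittest.main() under an if __name__ == "__main__" check, instead of calling run_tests() by hand.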


And you can check how much code coverage you have with the "coverage" tool. http://nedbatchelder.com/code/coverage/

   cd ~/test/python-play/obamaspeech
   coverage run speech-tests.py
   coverage report

And coverage can give you an HTML version.

   coverage html
   file:///home/sumanah/test/python-play/obamaspeech/htmlcov/index.html


Style

The standard for how you should style your Python code - spaces and comments and naming conventions - is called PEP 8, Python Enhancement Proposal 8. http://legacy.python.org/dev/peps/pep-0008/ And you can automatically check whether your code complies with PEP 8.

   python /usr/local/lib/python2.7/dist-packages/pep8.py ~/test/python-play/interaction-example.py
   python /usr/local/lib/python2.7/dist-packages/pep8.py ~/test/python-play/datr.py

There's also a module you can install called pep8ify that actually tells you what change to make.

   pep8ify ~/test/python-play/interaction-example.py
   pep8ify -w ~/test/python-play/interaction-example.py

My own experience and another useful link are at http://www.harihareswara.net/sumana/2013/11/01/0

Sample code

interaction-example.py:

#!/usr/bin/python
import random
orignum = range(1,10)
toprint = random.choice(orignum)
print(toprint)


demo-of-pdb.py:

#!/usr/bin/python
import pdb


def foo():
    return "test"

pdb.set_trace()  # first breakpoint
a = 1
b = 2
print a
print b
pdb.set_trace()  # second breakpoint
print spam  # spam is never defined, so this line raises NameError


requests example: https://github.com/fhocutt/obscure-enwiki-fact/blob/master/obscurefact.py

unittests example: https://github.com/brainwane/obamaspeech/blob/master/speech-tests.py

Example of why foo function only shows In/Out in some situations: https://gist.github.com/jayofdoom/6375e4ff023cd83c4d7e