Fun with ones and zeros - python



Monday, 22 September 2008

bpgsql 2.0 alpha 1

For many years I've been using bpgsql, my own pure-Python PostgreSQL client, and I've finally sat down and polished it up enough to put together as a real package.

One thing that motivated the work was the desire to use it with Django - after seeing psycopg2 do some funny things when used under mod_wsgi. There's no doubt it's slower than psycopg2, but it's much easier to hack on, and might be of interest to people running Django under other Pythons such as PyPy or Jython. Getting it to pass all the Django unit tests really ironed out a lot of bugs, so I think it's in fairly decent shape now.

posted at: 14:58 | tags: django postgresql python | 0 comments | permanent link to this entry

Monday, 04 August 2008

amqplib 0.5

Put out a new release of py-amqplib, labeled 0.5, featuring the reworking mentioned earlier of how frames from the server are handled, and a big speed improvement in receiving messages, prompted by some profiling I did after reading Initial Queuing Experiments on the Second p0st blog.

posted at: 23:30 | tags: amqp python | 7 comments | permanent link to this entry

Saturday, 08 December 2007

amqplib 0.2

I noticed the other day that my two RabbitMQ servers were consuming more and more memory - one had gone from an initial 22 MB size to over 600 MB. As I sat and watched, it would grow by 4 KB or so at regular intervals.

I think what had happened is that I had created an exchange which received lots of messages, and then ran scripts that created automatically-named queues bound to that exchange, but defaulted to not auto-deleting them. I ran these scripts many many times, which left many many queues on the server, all swelling up with lots of messages that would never be consumed. Good thing I caught it, it might have eventually killed my server.

This message in the rabbitmq-discuss list gives useful info on how to get in and see what queues exist on a RabbitMQ server, and how big they are.

It seems to me that having the auto_delete parameter of Channel.queue_declare() default to False is a really bad idea. If you want to keep a queue around after your program exits, I think you should explicitly say so, so I changed the default to True. Channel.exchange_declare() also has an auto_delete parameter, and I changed its default to True as well for consistency.
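
To make the failure mode concrete, here is a toy in-memory model of scripts that declare automatically-named queues on a busy exchange and then exit. It is only a sketch in Python 3: ToyBroker and its methods are made-up names for illustration, not amqplib or RabbitMQ code, and the exchange is assumed to copy every message to every bound queue.

```python
import itertools


class ToyBroker:
    """A toy stand-in for a broker with a single busy exchange."""

    def __init__(self):
        self.queues = {}                      # queue name -> list of messages
        self.auto_delete = {}                 # queue name -> bool
        self._names = itertools.count(1)

    def queue_declare(self, auto_delete=True):
        # Server-generated name, as with an automatically-named queue
        name = 'gen-%d' % next(self._names)
        self.queues[name] = []
        self.auto_delete[name] = auto_delete
        return name

    def consumer_gone(self, name):
        # An auto-delete queue goes away with its last consumer;
        # any other queue stays and keeps collecting messages.
        if self.auto_delete[name]:
            del self.queues[name]
            del self.auto_delete[name]

    def publish(self, msg):
        # Every bound queue gets its own copy of the message
        for messages in self.queues.values():
            messages.append(msg)

    def backlog(self):
        return sum(len(m) for m in self.queues.values())


def simulate(auto_delete, runs=5, messages_per_run=10):
    """Run a short-lived script several times; return (queues left, backlog)."""
    broker = ToyBroker()
    for _ in range(runs):
        name = broker.queue_declare(auto_delete=auto_delete)
        broker.consumer_gone(name)            # the script exits
        for i in range(messages_per_run):
            broker.publish('message %d' % i)  # the exchange stays busy
    return len(broker.queues), broker.backlog()
```

With these numbers, simulate(auto_delete=False) leaves five orphaned queues holding 150 undelivered messages, while simulate(auto_delete=True) leaves nothing behind.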

I also did some work on supporting the redirect feature of AMQP, where a server you connect to can tell you to go somewhere else, which is useful for balancing a cluster. I don't actually have a RabbitMQ cluster, so I put together a utility to fake an AMQP server that tells you to redirect. It works well enough to run the unit tests unchanged against it, each test case being redirected from the fake server to the real server.

With those two changes, I put out a 0.2 release, on my software page and on the Cheeseshop.

posted at: 14:34 | tags: amqp python | 0 comments | permanent link to this entry

Monday, 03 December 2007

amqplib 0.1

I broke down and put together a tarball of my Python AMQP library, and stuck it up as a release 0.1 on the software section of this website, under the section py-amqplib.

Interestingly, someone hit the page and downloaded the tarball less than 3 minutes after I dropped a note about it to the RabbitMQ discussion list - so I guess there's at least some interest out there in this sort of thing :)

posted at: 16:06 | tags: amqp python | 0 comments | permanent link to this entry

Wednesday, 14 November 2007

AMQP

For some time I've been using Spread as a messaging bus between processes and machines using Python bindings, but there are a few things that make it not quite ideal for what I've been trying to do.

  • There's no access control
  • Messages are non-persistent - so if a receiver daemon is down and some important message comes through, it's SOL
  • The wire protocol is not documented; the docs basically just say to use the C client library.
  • The Python bindings to the C library have a glitch of some sort when used in py-exim-localscan, so I had to resort to a small ctypes wrapper to get around this.

I ran across the Advanced Message Queuing Protocol (AMQP), with RabbitMQ as one implementation of the protocol, which looks like a better fit for my needs.

There's a Python client library available named QPID, but there are a few issues with that:

  • Relies on threading, which is trouble when Python is embedded in something else, or if you want to try using it in Stackless Python
  • Lacking documentation
  • Has to load a big AMQP XML Spec file, which takes a few seconds.

I decided to take a whack at my own AMQP client, partially as an exercise to learn more about the protocol. I wrote a program to take the AMQP 0-8 spec file and statically generate the skeleton of a Python module, and then fleshed it out by hand. The generator is able to put lots of documentation from the spec file into Python docstrings, so the pydoc of this module is fairly decent. Because the module is statically generated, it should be easier to debug than QPID, which generates lots of stuff on-the-fly. It's also much faster at making the first connection because it's not parsing the spec file. I also threw in SSL support since it wasn't too difficult.
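
As a rough illustration of the static-generation idea (not the actual generator), here is a Python 3 sketch that turns a spec-like XML fragment into function stubs with docstrings. The SPEC fragment is a made-up, heavily simplified stand-in for the real AMQP 0-8 spec file, and the names pyname() and generate() are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily simplified fragment in the spirit of the AMQP spec XML
SPEC = """
<amqp>
  <class name="queue" index="50">
    <method name="declare" index="10">
      <doc>Create a queue, or check that it exists.</doc>
      <field name="queue"/>
      <field name="auto-delete"/>
    </method>
  </class>
</amqp>
"""


def pyname(name):
    # Spec names use hyphens, which Python identifiers can't contain
    return name.replace('-', '_')


def generate(spec_xml):
    """Return Python source with one stub function per spec method."""
    out = []
    for cls in ET.fromstring(spec_xml).iter('class'):
        for method in cls.iter('method'):
            args = ', '.join(pyname(f.get('name')) for f in method.iter('field'))
            # Collapse the whitespace of the spec's <doc> text into one line
            doc = ' '.join((method.findtext('doc') or '').split())
            out.append('def %s_%s(%s):' % (
                pyname(cls.get('name')), pyname(method.get('name')), args))
            out.append('    """%s"""' % doc)
            out.append('    raise NotImplementedError  # fleshed out by hand later')
            out.append('')
    return '\n'.join(out)


source = generate(SPEC)
print(source)
```

The generated text is ordinary source code that can be saved to a file and edited, which is what makes it easier to step through in a debugger than code built at import time.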

It has a ways to go, and some parts are probably naively conceived, but it does seem to work.

The first thing I've used it for is a syslog->AMQP bridge. I've set up my FreeBSD syslogd to feed all info or higher events to a Python daemon, which extracts the date, time, program name, host name, etc., reformats them as an AMQP message, and publishes it to a 'syslog' topic exchange with the program name as the routing key.
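
The parsing half of such a bridge might look roughly like this Python 3 sketch. It assumes the traditional "Mon DD HH:MM:SS host program[pid]: text" line layout, and the names parse_syslog_line() and to_message() are made up for illustration; the actual publish call is left out.

```python
import re

# Assumed traditional BSD syslog layout: "Mon DD HH:MM:SS host program[pid]: text"
LINE_RE = re.compile(
    r'^(?P<date>\w{3}\s+\d{1,2})\s+'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<host>\S+)\s+'
    r'(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*'
    r'(?P<text>.*)$'
)


def parse_syslog_line(line):
    """Split a syslog line into fields, or return None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.groupdict()


def to_message(line):
    """Return (routing_key, body) for publishing, or None for unparsable lines."""
    fields = parse_syslog_line(line)
    if fields is None:
        return None
    body = '%(date)s %(time)s %(host)s %(text)s' % fields
    # The program name doubles as the routing key on the topic exchange
    return fields['program'], body


sample = 'Nov 14 19:38:01 gw sshd[1234]: Failed password for root from 192.0.2.7 port 4022 ssh2'
print(to_message(sample))
```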

My plan is then to write other daemons that subscribe to the 'sshd' topic for example, and then generate higher-level messages that say things like: 'block IP address xx.xx.xx.xx' in case of failed login attempts. Then I just need one daemon to listen for these firewall control messages and make changes to the PF tables.
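
The decision logic of one of those daemons could be sketched like this in Python 3. The sshd wording matched by FAILED_RE, the threshold of three failures, and the FailedLoginWatcher class are all assumptions for illustration; subscribing to the topic and publishing the result are left out.

```python
import re
from collections import defaultdict

# Assumed sshd wording; real log text varies between OpenSSH versions
FAILED_RE = re.compile(
    r'Failed password for .* from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) ')


class FailedLoginWatcher:
    """Turn low-level sshd messages into 'block IP address ...' messages."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = defaultdict(int)      # IP address -> failure count
        self.blocked = set()

    def handle(self, body):
        """Take one message body; return a control message or None."""
        m = FAILED_RE.search(body)
        if m is None:
            return None
        ip = m.group('ip')
        if ip in self.blocked:
            return None                       # already asked for a block
        self.failures[ip] += 1
        if self.failures[ip] >= self.threshold:
            self.blocked.add(ip)
            return 'block IP address %s' % ip
        return None
```

Feeding the same failed-login message to handle() three times returns None twice and then 'block IP address 192.0.2.7' (for that example address); further failures from the same address are ignored.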

It's fun stuff. The only weak part is that there's no way to tell if the original syslog message was spoofed, but after that point, AMQP access controls should keep things trustworthy.

See py-amqplib for a Mercurial repository and eventual downloads.

posted at: 19:38 | tags: amqp python | 0 comments | permanent link to this entry

Sunday, 07 October 2007

Markdown and Pygments

This blog is mainly being written as Markdown text stored in a database, and I thought it would be nice to add the ability to use Pygments to add syntax highlighting to various bits of code within the entries.

There are some DjangoSnippets entries on how to do this, notably #360 which first runs text through Markdown to generate HTML and then BeautifulSoup to extract parts marked up in the original pre-Markdown text as <pre class="foo">...</pre> to be run through Pygments and then re-inserted back into the overall Markdown-generated HTML.

The problem with this is that the text within <pre>...</pre> needs to be valid HTML, with things like e_mail='<foo@bar.edu>' escaped as e_mail='&lt;foo@bar.edu>'. Otherwise BeautifulSoup thinks, in that example, that you have a screwed up <foo> tag and tries to fix it up.

Making sure all the <, &, and other characters special to HTML are escaped within a large chunk of code misses out on the convenience of using Markdown. I decided to go with an arrangement in which regular Markdown code blocks are used, but if the first line begins with pygments:<lexer>, then that block is pygmentized.

So if I enter something like:

Here is some code

    pygments:python
    if a < b:
        print a

It ends up as:


Here is some code

if a < b:
    print a

What I came up with is this derivative of Snippet #360:

from htmlentitydefs import name2codepoint
from HTMLParser import HTMLParser
from markdown import markdown
from BeautifulSoup import BeautifulSoup
from pygments.lexers import LEXERS, get_lexer_by_name
from pygments import highlight
from pygments.formatters import HtmlFormatter

# a tuple of known lexer names
_lexer_names = reduce(lambda a,b: a + b[2], LEXERS.itervalues(), ())

# default formatter
_formatter = HtmlFormatter(cssclass='source')    

class _MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.text = []
    def handle_data(self, data):
        self.text.append(data)
    def handle_entityref(self, name):
        self.text.append(unichr(name2codepoint[name]))

def _replace_html_entities(s):
    """
    Replace HTML entities in a string
    with their unicode equivalents.  For
    example, '&amp;' is replaced with just '&'

    """
    mp = _MyParser()
    mp.feed(s)
    mp.close()
    return u''.join(mp.text)  

def markdown_pygment(txt):
    """
    Convert Markdown text to Pygmentized HTML

    """
    html = markdown(txt)
    soup = BeautifulSoup(html)
    dirty = False
    for tag in soup.findAll('pre'):
        if tag.code:
            txt = tag.code.renderContents()
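            if txt.startswith('pygments:') and '\n' in txt:
                # First line is the 'pygments:<lexer>' marker, the rest is the code
                lexer_name, txt = txt.split('\n', 1)
                lexer_name = lexer_name.split(':', 1)[1].strip()
                # Markdown escaped &, < and > in the code; undo that for Pygments
                txt = _replace_html_entities(txt)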
                if lexer_name in _lexer_names:
                    lexer = get_lexer_by_name(lexer_name, stripnl=True, encoding='UTF-8')
                    tag.replaceWith(highlight(txt, lexer, _formatter))
                    dirty = True
    if dirty:
        html = unicode(soup)

    return html
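
For anyone on Python 3, where htmlentitydefs and unichr no longer exist, the marker-splitting and unescaping steps can be sketched with just the standard library. This is an assumption-laden illustration rather than a port of the code above: split_pygments_header() is a made-up name, and the Markdown, BeautifulSoup and Pygments calls are left out.

```python
import html


def split_pygments_header(code_text):
    """
    Given the escaped text of a Markdown code block, return
    (lexer_name, source).  lexer_name is None when the block
    doesn't start with a 'pygments:<lexer>' line.
    """
    first, sep, rest = code_text.partition('\n')
    if not sep or not first.startswith('pygments:'):
        return None, code_text
    lexer_name = first.split(':', 1)[1].strip()
    # Markdown escaped &, < and > when it built the <code> element;
    # undo that before handing the text to a highlighter.
    return lexer_name, html.unescape(rest)
```

A block without the marker comes back untouched with None as the lexer name, so the caller can leave it as an ordinary code block.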

posted at: 22:11 | tags: python | 0 comments | permanent link to this entry

Monday, 24 September 2007

Stackless Python and Sockets

I've been intrigued by Stackless Python for a while, and finally got around to installing it on one of my machines. FreeBSD doesn't have a port available, so after creating an ezjail to isolate the installation, it was just a matter of fetching and extracting stackless-251-export.tar.bz2 and doing a standard ./configure && make && make install.

The installation looks pretty much like a normal Python installation on FreeBSD, with a /usr/local/bin/python binary and libraries in /usr/local/lib/python2.5

Networking is something I especially wanted to check out with Stackless, and the examples on the Stackless website mostly make use of a stacklesssocket.py module which is a separate download. That module has unittests built in as the module's main function, but when running it on my FreeBSD 7.0-CURRENT box, it died with an exception ending in:

File "stacklesssocket.py.ok", line 286, in handle_connect
  self.connectChannel.send(None)
AttributeError: 'NoneType' object has no attribute 'send'

After doing some digging, I found that stacklesssocket.py has a dispatcher class which is a subclass of a class by the same name in Python's asyncore.py module. stacklesssocket.dispatcher.connect() calls asyncore.dispatcher.connect(), which may directly call the object's handle_connect() method before returning back to stacklesssocket.dispatcher.connect(). However, stacklesssocket.dispatcher.connect() doesn't set up its connectChannel until after the call to asyncore.dispatcher.connect() returns. So when handle_connect() tries to send a message over a channel that doesn't exist yet, an exception is raised.
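
The ordering problem can be reproduced without Stackless or asyncore at all. In this Python 3 sketch, BaseDispatcher and Channel are made-up, much simplified stand-ins for asyncore.dispatcher and a Stackless channel, and the base connect() is assumed to always complete immediately.

```python
class BaseDispatcher:
    """Stands in for asyncore.dispatcher (hypothetical, much simplified)."""

    def connect(self, address):
        # The connection may complete immediately, in which case the
        # callback fires before connect() has returned to the caller.
        self.handle_connect()


class Channel:
    """Stands in for a Stackless channel; just records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, value):
        self.sent.append(value)


class UnpatchedDispatcher(BaseDispatcher):
    connectChannel = None

    def connect(self, address):
        BaseDispatcher.connect(self, address)  # handle_connect() runs in here...
        self.connectChannel = Channel()        # ...but the channel appears only now

    def handle_connect(self):
        self.connectChannel.send(None)         # AttributeError while still None


class PatchedDispatcher(UnpatchedDispatcher):
    def handle_connect(self):
        if self.connectChannel:                # only signal a channel that exists
            self.connectChannel.send(None)
```

Calling UnpatchedDispatcher().connect(...) raises the same kind of AttributeError as in the traceback, while PatchedDispatcher skips the send because nobody can be waiting on a channel that hasn't been created yet.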

This trivial patch seems to fix the problem - only sending a message over the channel if it exists (which should only happen if there's another tasklet waiting on it back in a stacklesssocket.dispatcher.connect() method).

--- stacklesssocket.py.orig       2007-09-18 20:58:02.000835000 -0500
+++ stacklesssocket.py  2007-09-18 22:03:13.370709131 -0500
@@ -282,7 +282,7 @@

    # Inform the blocked connect call that the connection has been made.
    def handle_connect(self):
-        if self.socket.type != SOCK_DGRAM:
+        if (self.socket.type != SOCK_DGRAM) and self.connectChannel:
            self.connectChannel.send(None)

    # Asyncore says its done but self.readBuffer may be non-empty

With that patch, the unittests run successfully - at least on my box.

posted at: 12:36 | tags: python | 0 comments | permanent link to this entry

Thursday, 11 January 2007

Going to PyCon 2007

I've got my plane ticket, hotel reservation and conference registration for PyCon 2007 all lined up, so I'll be headed for Texas in 6 weeks.

posted at: 12:43 | tags: pycon python | 0 comments | permanent link to this entry

Sunday, 22 October 2006

Expy Update

I just updated Exim on my home server to 4.63, and built it with my py-exim-localscan (AKA expy) module linked to Python 2.5.

The only minor glitch was a C compile warning, which is probably due to better warnings in a newer version of GCC than what I had when the package was originally developed. I fixed it and bundled up a new release - mainly to show that it's not abandonware.

posted at: 13:20 | tags: python | 0 comments | permanent link to this entry

Monday, 11 September 2006

Django-powered

Haven't posted anything in a while, because I've been redoing this site in Django. Previously I had a photo-gallery written as a direct mod_python app, the software part was Zope 2.x, and this blog was in PyBlosxom.

mod_python is pretty bare-bones (as it should be), and I've been down on Zope for some time now. PyBlosxom was nice, but I've become quite a Django fan, and felt I could do much more with that framework. So I figured it would be good to do a kind of unification - and learn some more Django at the same time.

I'm using Markdown for editing the bodies of blog entries now, and found it was pretty easy to transfer the old PyBlosxom files into Django database records, with Markdown mostly able to handle the HTML I had entered for those old entries with just a few minor tweaks.

The Django URLs were planned so that Apache would be able to rewrite the old PyBlosxom URLs into the new format - so hopefully existing links will still work. URLs for the old feeds should be handled transparently, but I'm omitting the old entries from the feeds because their links have changed, and I didn't want them to reappear as new entries for whoever's subscribed to them.

posted at: 11:29 | tags: django python | 0 comments | permanent link to this entry
