Of rows and columns…

Let's say you have a 2 × 5 matrix.


Using the traditional way of specifying matrix sizes, that’d mean you had a matrix with 2 rows and 5 columns. Okay, fine.

But what if you had a 2 × 3 × 4 matrix?

In Python with NumPy, this might look something like the following (with a ton of whitespace added for clarity):

arr = numpy.array([

The way that we’re specifying the array in numpy actually helps add some clarity here. The 2 × 3 × 4 matrix is really just a list of lists of lists of elements.

I thought you said it added clarity?

Well, it does. Rows are made up of elements, 2D matrices are made up of rows, and 3D matrices are made up of 2D matrices.
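To make that nesting concrete, here is a minimal sketch using plain Python lists rather than numpy. The element values are placeholders I picked only so each position is distinguishable, and the shape() helper is a made-up name, not a numpy function.

```python
# A 2 x 3 x 4 "matrix" as a list of lists of lists (values are arbitrary).
arr = [
    [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
    [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]],
]

def shape(nested):
    """Return the size of each nesting level, outermost first."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

print(shape(arr))   # (2, 3, 4)
print(arr[1][2])    # last row of the second 2D matrix: [20, 21, 22, 23]
```

The outermost list holds two 2D matrices, each of those holds three rows, and each row holds four elements.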

Anyway, the point of this post is, what do you call the measure of that third dimension? If the measure of the first dimension (the row) is called “columns”, the measure of the second dimension “rows”, what should the measure of the 3rd dimension be called?

This thread in the /r/math subreddit makes the following suggestions, in order of upvotes:

  • Aisles
  • Pages
  • Slices
  • Tensors
  • Layers
  • Depth(s?)

I’m not a huge fan of “aisles”, despite it getting the most upvotes. It just doesn’t seem intuitive enough. When I first think about an aisle, I’m thinking about the narrow passageways you walk down in a store, not the shelves on either side that contain the product (which is where I think that commenter was headed). That could just be me, however.

I personally also rule out “slices” and “tensors”: slices have a somewhat specific meaning in Python, and tensors have specific meanings in linear algebra and mechanics (which don’t align completely with the concept of the measure of the 3rd dimension).

I also rule out “depth”, if only because it’s not easily pluralized like row → rows, column → columns. A phrase like “number of columns” has some issues when you move to “number of depths”. That being said, the “depth” of a 3D matrix makes a lot of sense to me, just as you might gain “depth” perception when you go from a “flat” 2D Cartesian coordinate system to a 3D one.

So to that end, “layers” and “pages” both fit my intuitive concept of the measure of the 3rd dimension. You have a stack of “pages”, each with a 2D matrix printed on it — fine.

Between the two, I think “layers” is probably the better term — it seems more general.

However, there is an interesting “natural” relation that springs up if you go with “pages”. Consider:

  • Columns
  • Rows
  • Pages
  • Books
  • Series

“Series” is pretty weak, I admit, but the first four degrees are pretty solid. Nevertheless, I’m still going with “layers”.
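If you do settle on a vocabulary, it's easy to bake it into code. The sketch below is purely illustrative: the Shape name, its fields, and the name_dims() helper are my own labels (nothing numpy provides), and it assumes the outermost dimension is listed first.

```python
from collections import namedtuple

# Hypothetical names for up to five dimensions, outermost first.
Shape = namedtuple("Shape", ["series", "books", "layers", "rows", "columns"])

def name_dims(dims):
    """Label a tuple of sizes, padding missing outer dimensions with 1."""
    padded = (1,) * (len(Shape._fields) - len(dims)) + tuple(dims)
    return Shape(*padded)

s = name_dims((2, 3, 4))
print(s.layers, s.rows, s.columns)   # 2 3 4
```

Swapping “layers” for “pages” would just be a matter of renaming one field.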


Gist – simple __repr__() helper

Consider the following sample code:

class Foo(object):
    def __init__(self, fid, name, bar):
        self.fid    = fid
        self.name   = name
        self.bar    = bar

f1 = Foo(1, "Red",   [1,2,3])
f2 = Foo(2, "Green", [4,5,6])
f3 = Foo(3, "Blue",  [7,8,9])

foos = [f1,f2,f3]
print(foos)


The output will be something like:

[<__main__.Foo object at 0x00000000020F36A0>, <__main__.Foo object at 0x00000000020F9C88>, <__main__.Foo object at 0x00000000020F9CF8>]

Of course, this isn’t super helpful. It’d be better to get some sort of idea of what each of the Foo instances was.

The trick here is to override the class’s __repr__ function (docs):

class Foo(object):
    def __init__(self, fid, name, bar):
        self.fid    = fid
        self.name   = name
        self.bar    = bar

    def __repr__(self):
        return "Foo(fid=%s, name=%s, bar=%s)" % (self.fid, self.name, self.bar)

f1 = Foo(1, "Red",   [1,2,3])
f2 = Foo(2, "Green", [4,5,6])
f3 = Foo(3, "Blue",  [7,8,9])

foos = [f1,f2,f3]
print(foos)


Now, the output will be something like:

[Foo(fid=1, name=Red, bar=[1, 2, 3]), Foo(fid=2, name=Green, bar=[4, 5, 6]), Foo(fid=3, name=Blue, bar=[7, 8, 9])]

But what if you want to show 10 attributes, not necessarily for concise output, but for verbose debugging purposes? What about 20 attributes? 50?

The __repr__ function would get pretty long and needlessly complicated.

Enter the describe() function:

def describe(obj, keys, quoted_keys=None):
    qk = quoted_keys if quoted_keys is not None else []
    def desc_pair(obj, key, quoted_keys):
        val = getattr(obj, key)

        # Quote strings, plus any keys the caller asked to quote "by force".
        if isinstance(val, basestring) or (key in quoted_keys):
            return '%s="%s"' % (key, val)
        else:
            return '%s=%s' % (key, val)

    desc = "%s(%s)" % (obj.__class__.__name__, ', '.join([desc_pair(obj, k, qk) for k in keys]))
    return desc

It's pretty straightforward, but it allows you to generate the same __repr__ output, by simply specifying a list of attribute "keys":


def __repr__(self):
    return describe(self, ["fid", "name", "bar"])


This would output the following:

[Foo(fid=1, name="Red", bar=[1, 2, 3]), Foo(fid=2, name="Green", bar=[4, 5, 6]), Foo(fid=3, name="Blue", bar=[7, 8, 9])]

The output is basically the same as the long-hand manually-typed __repr__ implementation, with the added benefit of quoting strings.

Lastly, you can pass describe() an optional third argument: a list of keys whose values should be quoted "by force", even if they're not strings.
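Here's a rough sketch of that third argument in action. Note that this is a Python 3 adaptation I put together for illustration (str stands in for basestring, and the helper is slightly condensed), so treat it as an approximation of the function above rather than a drop-in copy.

```python
def describe(obj, keys, quoted_keys=()):
    """Python 3 adaptation of describe(); str replaces basestring."""
    def desc_pair(key):
        val = getattr(obj, key)
        if isinstance(val, str) or key in quoted_keys:
            return '%s="%s"' % (key, val)
        return '%s=%s' % (key, val)
    return "%s(%s)" % (type(obj).__name__, ", ".join(desc_pair(k) for k in keys))

class Foo(object):
    def __init__(self, fid, name, bar):
        self.fid, self.name, self.bar = fid, name, bar

    def __repr__(self):
        # Force-quote fid even though it is an int.
        return describe(self, ["fid", "name", "bar"], quoted_keys=["fid"])

print(Foo(1, "Red", [1, 2, 3]))   # Foo(fid="1", name="Red", bar=[1, 2, 3])
```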

Do with it what you will :-)


Mutable Numeric Types in Python? Pass-by-reference?

People familiar with languages like C, where there are two distinct methods of passing variables to functions (pass-by-value and pass-by-reference), often ask how you can emulate these methods in Python.

For example, consider the following C code:

#include <stdio.h>

/* Receives a copy of the caller's int; doubling it changes only the copy. */
void double_by_val(int x)
{
    x *= 2;
    printf("Inside double_by_val(): x = %d\n", x);
}

/* Receives the address of the caller's int; doubling through it changes the original. */
void double_by_ref(int *x)
{
    *x *= 2;
    printf("Inside double_by_ref(): x = %d\n", *x);
}

int main(void)
{
    int v = 5;
    printf("v = %d\n", v);
    double_by_val(v);
    printf("After double_by_val(): v = %d\n", v);
    double_by_ref(&v);
    printf("After double_by_ref(): v = %d\n", v);
    return 0;
}

When compiled and run, the previous code outputs:

v = 5
Inside double_by_val(): x = 10
After double_by_val(): v = 5
Inside double_by_ref(): x = 10
After double_by_ref(): v = 10

So what’s happening here?

Note: This is a simplified discussion, see the Technical Notes section at the bottom for more information.

In both functions, the value of x is 10 after the *= 2 operation. However, only the second function causes the change to persist once the function exits. This is the basic difference between pass-by-value (PBV) and pass-by-reference (PBR).

In PBV, the function receives a “copy” of the variable, and operates on that. Any changes made to the copy of the variable aren’t reflected in the original variable.

In PBR, the function receives a reference to the original variable. Any changes made through the reference are made directly on the original variable and will “persist” once the function ends.

So this explains why, after the function double_by_val() exited, the value of v remained unchanged at 5. It also explains why, after the function double_by_ref() exited, the updated value of 10 persisted and was reflected in the final line of the output.

So C is both Pass-By-Value and Pass-By-Reference?

Well, no. I kind of hand-waved in the previous section to illustrate the difference between PBV and PBR. In fact, strictly speaking, C is only pass-by-value. You can emulate pass-by-reference behavior by passing the function a pointer, but even that is still pass-by-value; in this case, the value being passed is the memory address at which the variable is stored.

Safe Deposit Boxes


Let me try to explain that a bit further.

A scenario

Let’s first think about a bank with a wall of safe deposit boxes. One day, you decide you want to store your Personal Documents, so you go into the bank and reserve a safe deposit box. An unreserved safe deposit box is labelled and set aside for your use. Somewhere in the bank, there’s a log that now contains an entry linking you, your purpose, and the number of the box.

Date        Customer    Use                 Box No.
2013-02-14  John Smith  Personal Documents  114

For the rest of this discussion, let’s assume that we’ve chosen a very unscrupulous bank to store our things, where the only thing you need to gain access to a safe deposit box is the number. That’s it! Just having the box number will grant you access to the contents inside the box.

Let’s say a few months later you want to store some valuables as well, but your current safe deposit box is too small. So you go into the bank and reserve a second box, this one for your larger valuables. The entries in the bank’s log corresponding to you might now look like:

Date        Customer    Use                 Box No.
2013-02-14  John Smith  Personal Documents  114
2013-09-22  John Smith  Valuables           237

Now, sometime down the road our spouse wants our birth certificate. But for some odd reason, the only way we’re able to communicate with them is over a fax machine. (Bear with me here).

Anything we attempt to give them will first be scanned, digitized, sent as bits over the phone line, and reconstructed on the other side.

So how can we get our spouse our birth certificate?

  1. We could go into the bank, access our safe deposit box, take out the birth certificate, and fax it to our spouse.
  2. Or, instead we could simply fax our spouse a piece of paper like the following:

    Shade E. Bank
    1402 Main Street
    Box 114

    Leaving it up to them to retrieve it from the safe deposit box.

What does this have to do with anything?

Well, this scenario is very much like what happens in your computer when you declare variables. In C, any time you declare a variable, the system reserves a memory location of sufficient size to store that variable. This is just like you taking out a safe deposit box at the bank.

Just like the safe deposit boxes have box numbers, the memory locations that you’ve reserved have addresses. But instead of Box 114 and Box 237, the memory addresses are a bit longer and often expressed in hexadecimal, for example 0x02f41ae7.

Just like safe deposit boxes are uniquely identified by the box number (that is, there are no two boxes in the bank with the same box number), memory addresses uniquely identify memory locations in your computer.

So every time you declare a variable, it’s like you’re reserving a safe deposit box in the bank. And every time you define that variable, or change its value, it’s like you’re changing the contents of that safe deposit box.

And the two options we discussed about how we could get our spouse our birth certificate?

Well the first, where we faxed the original, can be thought of as pass-by-value. We’re giving them a copy of a document, and no matter what they do to it, it won’t affect the original that remains in our safe deposit box.

The second, where we faxed them the location of the original, can be thought of as pass-by-reference. Instead of faxing them the document which they can use directly (albeit a copy of the original), we give them the location of the original itself. In other words, we give them a pointer to the document.
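If it helps, the two faxing options can be modelled in a few lines of Python. This is only a rough analogy: the dict stands in for the bank's wall of boxes, and the function names are invented for the illustration.

```python
import copy

# The bank: box numbers map to box contents (names are illustrative).
bank = {114: ["birth certificate"], 237: ["valuables"]}

def spouse_gets_fax(document):
    """Option 1: the spouse receives a copy and scribbles on it."""
    document.append("coffee stain")

def spouse_gets_box_number(boxes, box_no):
    """Option 2: the spouse visits the box and scribbles on the original."""
    boxes[box_no].append("coffee stain")

spouse_gets_fax(copy.copy(bank[114]))   # the fax machine makes the copy
print(bank[114])                        # ['birth certificate']

spouse_gets_box_number(bank, 114)
print(bank[114])                        # ['birth certificate', 'coffee stain']
```

Only the second option, where the box number travels instead of the document, lets the recipient change what's actually in the box.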

Pointers (Courtesy of Sven)


Pointers are simply addresses, like Box 114 or 0x02f41ae7.

Just like we had two methods of passing our birth certificate, we have two methods of passing a value to a function. But remember that C is always pass-by-value, so that anything we pass will first be “copied”, eliminating our ability to modify it and have those modifications persist once the function ends.

So since we’re always passing a copy of the variable, we can either pass a copy of the value, or we can pass a copy of a pointer to the value (the address of the value).

Just like our birth certificate scenario, passing a copy of the value gives us no way to edit the original. But passing a copy of the address of the original allows us to (through a little bit of extra work) access and edit the original.

So when we pass the function a copy of the value directly, this is pass-by-value.

And when we pass the function a copy of the address, this is also pass-by-value, but here, that value is a pointer. And using this method allows us, by “visiting the address” (or “dereferencing the pointer”), to access and edit the original. This behavior is sometimes called pass-by-reference-by-value, but pass-by-value(-of-reference) might make a bit more sense.

Pass-by-value(-of-reference) is a technique used in other languages too. For example, Java passes everything by value, but for objects, that value is a reference.

Quick Aside: What’s True Pass-By-Reference?
True pass-by-reference allows a function to reassign the variables that are passed to it, and have those changes be reflected once the function exits.

For example, for this function to work, exactly as coded, the variables would need to be passed by reference:

function swap(a,b)
    t = a;
    a = b;
    b = t;

Note that you could write a very similar function in C, with pointers:

void swap(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

But you’re not actually passing the things you’re swapping — you’re passing pointers / memory addresses and swapping the values at those addresses. This is the reason that C only emulates pass-by-reference.

Okay, so what about Python?

Consider the following Python code:

def double_me(x):
    x *= 2

v = 5
print("v = %d" % v)
double_me(v)
print("After double_me(): v = %d" % v)

In Python, parameters are passed by value, so after the call to double_me() the value of v will remain unchanged at 5. This mimics the behavior of our double_by_val() function in C.

So how do we pass-by-reference in Python?

Well, while it’s true that Python is pass-by-value, the value that is actually getting passed is a reference: another example of pass-by-value(-of-reference).
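A small sketch may make this easier to see. The function names here are just for illustration; the first shows that the parameter refers to the very same object the caller holds, and the second shows that rebinding parameters inside a function (the swap() from the aside above) never reaches the caller.

```python
def receives_same_object(x, original):
    """The parameter refers to the very object the caller passed in."""
    return x is original

def swap(a, b):
    """Rebinds the local names only; the caller's variables are untouched."""
    a, b = b, a

v = [1, 2, 3]
print(receives_same_object(v, v))   # True

p, q = "left", "right"
swap(p, q)
print(p, q)                         # left right
```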

One common way of “passing-by-reference” is to enclose the object in a list:

def double_me(x):
    x[0] *= 2

v = [5]
print("v = %d" % v[0])
double_me(v)
print("After double_me(): v = %d" % v[0])

After double_me(), v[0] will be 10.

This is made possible because a list is a mutable datatype and we never rebind the reference.

With an immutable datatype (int, float, string) this is not possible: the object can't be changed in place, so the assignment rebinds the function's local name to a new object and the caller's variable is left untouched.
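One way to watch the difference is to compare id() before and after the operation. This is a minimal sketch with made-up function names; each function reports whether its parameter still refers to the same object afterwards.

```python
def double_int(x):
    before = id(x)
    x *= 2                      # int is immutable: builds a new object, rebinds x
    return id(x) == before

def double_first(x):
    before = id(x)
    x[0] *= 2                   # list is mutable: changes the same object in place
    return id(x) == before

n = 5
print(double_int(n), n)         # False 5

lst = [5]
print(double_first(lst), lst)   # True [10]
```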

Another possibility, other than wrapping a number in a list, is to create and use some mutable object and pass that to the function.

One example of a mutable numeric class would be the following:

MutableNum class

Allows you to pass the instance to a function, and with proper coding, allows you to modify the 
value of the instance inside the function and have the modifications persist.

For example, consider:

>   def foo(x): x *= 2
>   x = 5
>   foo(x)
>   print(x)

This will print 5, not 10 like you may have hoped.  Now using the MutableNum class:

>   def foo(x): x *= 2
>   x = MutableNum(5)
>   foo(x)
>   print(x)

This *will* print 10, as the modifications you made to x inside of the function foo will persist.

Note, however, that the following *will not* work:

>   def bar(x): x = x * 2
>   x = MutableNum(5)
>   bar(x)
>   print(x)

The difference being that [x *= 2] modifies the current variable x, while [x = x * 2] creates a new 
variable x and assigns the result of the multiplication to it.

If, for some reason you can't use the compound operators ( +=, -=, *=, etc.), you can do something
like the following:

>   def better(x):
>       t = x
>       t = t * 2
>       # ... (Some operations on t) ...
>       # End your function with a call to x.set()
>       x.set(t)

class MutableNum(object):
    __val__ = None
    def __init__(self, v): self.__val__ = v
    # Comparison Methods
    def __eq__(self, x):        return self.__val__ == x
    def __ne__(self, x):        return self.__val__ != x
    def __lt__(self, x):        return self.__val__ <  x
    def __gt__(self, x):        return self.__val__ >  x
    def __le__(self, x):        return self.__val__ <= x
    def __ge__(self, x):        return self.__val__ >= x
    def __cmp__(self, x):       return 0 if self.__val__ == x else 1 if self.__val__ > x else -1
    # Unary Ops
    def __pos__(self):          return self.__class__(+self.__val__)
    def __neg__(self):          return self.__class__(-self.__val__)
    def __abs__(self):          return self.__class__(abs(self.__val__))
    # Bitwise Unary Ops
    def __invert__(self):       return self.__class__(~self.__val__)
    # Arithmetic Binary Ops
    def __add__(self, x):       return self.__class__(self.__val__ + x)
    def __sub__(self, x):       return self.__class__(self.__val__ - x)
    def __mul__(self, x):       return self.__class__(self.__val__ * x)
    def __div__(self, x):       return self.__class__(self.__val__ / x)
    def __mod__(self, x):       return self.__class__(self.__val__ % x)
    def __pow__(self, x):       return self.__class__(self.__val__ ** x)
    def __floordiv__(self, x):  return self.__class__(self.__val__ // x)
    def __divmod__(self, x):    return self.__class__(divmod(self.__val__, x))
    def __truediv__(self, x):   return self.__class__(self.__val__.__truediv__(x))
    # Reflected Arithmetic Binary Ops
    def __radd__(self, x):      return self.__class__(x + self.__val__)
    def __rsub__(self, x):      return self.__class__(x - self.__val__)
    def __rmul__(self, x):      return self.__class__(x * self.__val__)
    def __rdiv__(self, x):      return self.__class__(x / self.__val__)
    def __rmod__(self, x):      return self.__class__(x % self.__val__)
    def __rpow__(self, x):      return self.__class__(x ** self.__val__)
    def __rfloordiv__(self, x): return self.__class__(x // self.__val__)
    def __rdivmod__(self, x):   return self.__class__(divmod(x, self.__val__))
    def __rtruediv__(self, x):  return self.__class__(x.__truediv__(self.__val__))
    # Bitwise Binary Ops
    def __and__(self, x):       return self.__class__(self.__val__ & x)
    def __or__(self, x):        return self.__class__(self.__val__ | x)
    def __xor__(self, x):       return self.__class__(self.__val__ ^ x)
    def __lshift__(self, x):    return self.__class__(self.__val__ << x)
    def __rshift__(self, x):    return self.__class__(self.__val__ >> x)
    # Reflected Bitwise Binary Ops
    def __rand__(self, x):      return self.__class__(x & self.__val__)
    def __ror__(self, x):       return self.__class__(x | self.__val__)
    def __rxor__(self, x):      return self.__class__(x ^ self.__val__)
    def __rlshift__(self, x):   return self.__class__(x << self.__val__)
    def __rrshift__(self, x):   return self.__class__(x >> self.__val__)
    # Compound Assignment
    def __iadd__(self, x):      self.__val__ += x; return self
    def __isub__(self, x):      self.__val__ -= x; return self
    def __imul__(self, x):      self.__val__ *= x; return self
    def __idiv__(self, x):      self.__val__ /= x; return self
    def __imod__(self, x):      self.__val__ %= x; return self
    def __ipow__(self, x):      self.__val__ **= x; return self
    # Casts
    def __nonzero__(self):      return self.__val__ != 0
    def __int__(self):          return self.__val__.__int__()               # XXX
    def __float__(self):        return self.__val__.__float__()             # XXX
    def __long__(self):         return self.__val__.__long__()              # XXX
    # Conversions
    def __oct__(self):          return self.__val__.__oct__()               # XXX
    def __hex__(self):          return self.__val__.__hex__()               # XXX
    def __str__(self):          return self.__val__.__str__()               # XXX
    # Random Ops
    def __index__(self):        return self.__val__.__index__()             # XXX
    def __trunc__(self):        return self.__val__.__trunc__()             # XXX
    def __coerce__(self, x):    return self.__val__.__coerce__(x)
    # Representation
    def __repr__(self):         return "%s(%r)" % (self.__class__.__name__, self.__val__)
    # Define innertype, a function that returns the type of the inner value self.__val__
    def innertype(self):        return type(self.__val__)
    # Define set, a function that you can use to set the value of the instance
    def set(self, x):
        if   isinstance(x, (int, long, float)): self.__val__ = x
        elif isinstance(x, self.__class__): self.__val__ = x.__val__
        else: raise TypeError("expected a numeric type")
    # Pass anything else along to self.__val__
    def __getattr__(self, attr):
        print("getattr: " + attr)
        return getattr(self.__val__, attr)

if __name__ == "__main__":
    import sys
    import pprint
    import inspect

    def __assert(exp, act):
        if exp != act:
            lineno = inspect.currentframe().f_back.f_back.f_lineno
            print('%4d: Assertion Failed: expected %s, got %s' % (lineno, [exp], [act]))
    def assertEquals(exp, act): __assert(exp, act)        
    def assertTrue(act):        __assert(True, act)
    def assertFalse(act):       __assert(False, act)

    def assertInstanceEquals(exp, act):
        __assert(type(exp), type(act))  # isinstance(inst, class) is probably better here    
        __assert(exp, act)

    def add5(x): x += 5
    def sub3(x): x -= 3
    def trip(x): x *= 3
    def half(x): x /= 2
    def modtwo(x): x %= 2
    def square(x): x **= 2

    ## Ensure additional assertions even work
    assertInstanceEquals("Test", "Test")
    #assertInstanceEquals(5,"5") #This should fail

    ## Equality
    x = MutableNum(5)
    assertTrue(x == 5)
    assertTrue(5 == x)
    assertEquals(5, x)
    assertEquals(x, 5)

    # Comparisons
    x = MutableNum(5)
    assertTrue(x < 7)
    assertTrue(x > 3)
    assertTrue(7 > x)
    assertTrue(3 < x)
    assertFalse(x > 7)
    assertFalse(x < 3)
    assertFalse(7 < x)
    assertFalse(3 > x)
    assertEquals( 1, cmp(MutableNum(10), MutableNum(-5)))
    assertEquals( 0, cmp(MutableNum( 0), MutableNum( 0)))
    assertEquals(-1, cmp(MutableNum(-7), MutableNum(-2)))

    # Unary Ops
    x = MutableNum(13)
    y = MutableNum(-14)
    xp = +x
    xn = -x
    yp = +y
    yn = -y
    assertInstanceEquals(MutableNum( 13), xp)
    assertInstanceEquals(MutableNum(-13), xn)
    assertInstanceEquals(MutableNum(-14), yp)
    assertInstanceEquals(MutableNum( 14), yn)
    assertInstanceEquals(MutableNum(13), abs(x))
    assertInstanceEquals(MutableNum(14), abs(y))

    # Bitwise Unary Ops
    assertInstanceEquals(MutableNum(~20), ~MutableNum(20))
    assertInstanceEquals(MutableNum(~-5), ~MutableNum(-5))

    # Arithmetic Binary Ops
    assertInstanceEquals(MutableNum(10 + 2), MutableNum(10) + 2)
    assertInstanceEquals(MutableNum(10 - 2), MutableNum(10) - 2)
    assertInstanceEquals(MutableNum(10 * 2), MutableNum(10) * 2)
    assertInstanceEquals(MutableNum(10 / 2), MutableNum(10) / 2)
    assertInstanceEquals(MutableNum(10 % 2), MutableNum(10) % 2)
    assertInstanceEquals(MutableNum(pow(10,2)), pow(MutableNum(10), 2))

    # Reflective Arithmetic Binary Ops
    assertInstanceEquals(MutableNum(10) + 2, MutableNum(10 + 2))
    assertInstanceEquals(MutableNum(10) - 2, MutableNum(10 - 2))
    assertInstanceEquals(MutableNum(10) * 2, MutableNum(10 * 2))
    assertInstanceEquals(MutableNum(10) / 2, MutableNum(10 / 2))
    assertInstanceEquals(MutableNum(10) % 2, MutableNum(10 % 2))
    assertInstanceEquals(pow(MutableNum(10), 2), MutableNum(pow(10,2)))

    # Bitwise Binary Ops
    assertInstanceEquals(MutableNum(2 & 3), MutableNum(2) & 3)
    assertInstanceEquals(MutableNum(2 | 3), MutableNum(2) | 3)
    assertInstanceEquals(MutableNum(2 ^ 3), MutableNum(2) ^ 3)
    assertInstanceEquals(MutableNum(2 << 3), MutableNum(2) << 3)
    assertInstanceEquals(MutableNum(2 >> 3), MutableNum(2) >> 3)

    ## Compound Assignment / "Pass-by-reference"
    x = MutableNum(6)
    xid = id(x)
    assertInstanceEquals(MutableNum(6), x)
    assertEquals(xid, id(x))
    trip(x)
    assertInstanceEquals(MutableNum(18), x)
    assertEquals(xid, id(x))
    half(x)
    assertInstanceEquals(MutableNum(9), x)
    assertEquals(xid, id(x))
    add5(x)
    assertInstanceEquals(MutableNum(14), x)
    assertEquals(xid, id(x))
    sub3(x)
    assertInstanceEquals(MutableNum(11), x)
    assertEquals(xid, id(x))
    square(x)
    assertInstanceEquals(MutableNum(121), x)
    assertEquals(xid, id(x))
    modtwo(x)
    assertInstanceEquals(MutableNum(1), x)
    assertEquals(xid, id(x))

    ## Casts
    # Boolean
    # Int
    assertInstanceEquals(int( -5), int(MutableNum( -5)))
    assertInstanceEquals(int(  0), int(MutableNum(  0)))
    assertInstanceEquals(int(1.2), int(MutableNum(1.2)))
    assertInstanceEquals(int(  5), int(MutableNum(  5)))
    # Long
    assertInstanceEquals(long( -5), long(MutableNum( -5)))
    assertInstanceEquals(long(  0), long(MutableNum(  0)))
    assertInstanceEquals(long(1.2), long(MutableNum(1.2)))
    assertInstanceEquals(long(  5), long(MutableNum(  5)))
    # Float
    assertInstanceEquals(float( -5), float(MutableNum( -5)))
    assertInstanceEquals(float(  0), float(MutableNum(  0)))
    assertInstanceEquals(float(1.2), float(MutableNum(1.2)))
    assertInstanceEquals(float(  5), float(MutableNum(  5)))

    ## Conversions
    # Oct
    assertInstanceEquals(oct(12), oct(MutableNum(12)))
    assertInstanceEquals(oct(-2), oct(MutableNum(-2)))
    # Hex
    assertInstanceEquals(hex(12), hex(MutableNum(12)))
    assertInstanceEquals(hex(-2), hex(MutableNum(-2)))
    # Str
    assertInstanceEquals(str(12), str(MutableNum(12)))
    assertInstanceEquals(str(-2), str(MutableNum(-2)))

    ## Set
    x = MutableNum(0)
    x.set(5)
    assertInstanceEquals(MutableNum(5), x)
    x.set(1.2)
    assertInstanceEquals(MutableNum(1.2), x)
    x.set(4)
    assertInstanceEquals(MutableNum(4), x)

    print("ALL TESTS PASSED")

Note that the compound assignment operator must be used in order for this to work.
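The full class above targets Python 2 (it relies on long, cmp, and __div__). For readers on Python 3, here is a stripped-down sketch of the same idea. It is not a port of the whole class: the attribute name value and the handful of methods are my own simplifications, kept only to show why the compound operator matters.

```python
class MutableNum:
    """Minimal Python 3 sketch: just enough to show in-place updates."""
    def __init__(self, value):
        self.value = value

    def __imul__(self, other):
        self.value *= other     # mutate in place...
        return self             # ...and hand back the same object

    def __mul__(self, other):
        return MutableNum(self.value * other)   # a brand-new object

    def __eq__(self, other):
        return self.value == getattr(other, "value", other)

    def __repr__(self):
        return "MutableNum(%r)" % (self.value,)

def foo(x):
    x *= 2          # calls __imul__, so the caller's object changes

def bar(x):
    x = x * 2       # calls __mul__, rebinding the local name only

a = MutableNum(5)
foo(a)
print(a)            # MutableNum(10)
bar(a)
print(a)            # MutableNum(10)
```

Because __imul__ returns self, the name x inside foo() is never rebound to a different object, which is exactly what makes the change visible to the caller.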


HTML5 Input Type Support Detector

While working on a web interface for a project we’re doing, I got really into reading up on HTML5.

One (of the many) cool additions is the ability to specify different tailored input types. Whereas we may have previously just resorted to <input type="text"> as a catch-all input for everything, we can now give the browser some context about what we expect to be entered in the field.

For example, I’ve written many a form with something like the following:

<form action="some-handler.php">
    <input name="email" type="text">
</form>

And I’d hope that the user would enter something like Jones.McGillicutty@foobar.org in the field before submitting.

Of course, if I wasn’t sure that would happen (any production code), I could do some validation. This would always be server-side (as it should be), but sometimes I’d even throw in client-side validation as well for that extra spice.

In the end, the real validation was whether the user was able to receive the confirmation email at the address they specified, but catching errors / omissions immediately, before asking the user to wait around for a confirmation email they’d never get was, of course, a better approach.
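For a sense of what such an immediate check might look like on the server, here's a deliberately loose sketch in Python. The function name is hypothetical, and the check only catches obvious omissions (a missing @, an empty local part, a domain without a dot); it makes no claim that the address is actually valid.

```python
def looks_like_email(address):
    """Loose sanity check only; it does NOT prove the address is valid."""
    local, sep, domain = address.strip().rpartition("@")
    return bool(sep and local and "." in domain.strip("."))

print(looks_like_email("Jones.McGillicutty@foobar.org"))   # True
print(looks_like_email("Jones.McGillicutty"))              # False
```

Anything that passes still has to survive the real test: the confirmation email.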

But as it turns out, validating email addresses isn’t as easy as it might appear to be. As Mark Pilgrim notes,

Seriously, you’ll get it wrong. Determining whether a random string of characters is a valid email address is unbelievably complicated. The harder you look, the more complicated it gets. Did I mention it’s really, really complicated?

(More on Mark and his articles below)

So, to assist us[1] in this task, we can leverage some of the new HTML5-inspired intelligence of the browser.

By simply changing the input type to email, we can take advantage of one of the 13 new input types HTML5 introduces. For example, the previous form could be rewritten as:

<form action="some-handler.php">
    <input name="email" type="email">
</form>

Without any[2] negative side effects.

Without any negative side effects? you may ask. Indeed. As it turns out, browsers as early as IE6 (circa 2001) treat inputs with an unrecognized type attribute as having the text type. This means that by changing your input type from text to email, the worst that will happen on any modern (or not so modern) browser is that it will be silently changed back to text in the DOM.

Fine, no big deal. And in the browsers that do support it, you may even be rewarded with free client-side input validation. In fact, Apple takes this a step further and slightly modifies the iPhone and iPad’s soft keyboard accordingly. So when an iPhone/iPad user enters an input field of type email, the space bar shrinks a bit and buttons are added for the characters @ . _ (at sign, dot/period, and underscore). Screenshots of the iPhone’s automagically adapted keyboard can be found here in Mark’s article.

If you want to test it out for yourself, to see how your browser stacks up, I’ve thrown together a few things you might find helpful:

  • A tool to show you which of these new types your browser supports: Using the heuristic method described in Mark’s article, it just generates a new input element of each type, and sees whether the element type “sticks”. The pertinent JavaScript is contained within the file itself, so just view the source if you have any questions. The text under the table is just your User Agent string, which I tacked on there so it would be included in the screenshots I post below.
  • An example of each of the 13 new input types: Just click the link, then click on one of the input types on the left. A form will load on the right, along with a note indicating whether your browser seems to support it. The types email, url, and search seem well-supported, so you may want to start with them. If your browser supports it, try submitting the form with invalid contents and see if you get a 100% code-free client-side error preventing your submission. Again, all pertinent JavaScript is included in the file itself, so feel free to view source.[3]


Firefox (23.0.1)

Google Chrome (29.0.1547.66 m)

Internet Explorer (10.0.9200.16660) (UA string truncated)

Interesting Reads

  • Mark Pilgrim’s Dive Into HTML5 — an online companion-of-sorts to Mark’s HTML5: Up & Running (O’Reilly), updated by a team of his friends.
  • Richard Clark’s article focusing on the new input types — an illustrative look at the new input types and how some various browsers render the various widgets and the associated error messages one gets when attempting to submit invalid input.
  • Modernizr — a well-polished Javascript library that among (many) other things, you can use to determine what capabilities users’ browsers have. My heuristic detection method was based off the one in Mark’s article, attributed to Modernizr.
  • If you noticed the “Mozilla” in Internet Explorer’s UA string, this article has an interesting and humorous history of the Browser Wars and the origins of “Mozilla” in IE’s User Agent string.

  1. Proper server-side validation is still very good practice, for a number of reasons including security and the fact that not all browsers yet support this built-in validation.

  2. I could come up with some edge-cases where this would affect your code (e.g. using a CSS selector like input[type="text"]), but it seems extremely unlikely. 

  3. There is some server-side logic involved in validating the query string and displaying an input of the appropriate type, but it’s pretty trivial. 


Cross-Domain AJAX – A simple workaround

First off, this isn’t new. It’s not actually breaking any security protection; it’s just a way to perform cross-domain AJAX requests, something that is usually blocked by your browser.

Consider the following javascript code[1]:

var requestData = {
    op: "fetchRecords",
    id: 124,
    key: "Q1GxWcxWOrKY"
};
var request = $.ajax({
    url: "http://some-other-domain.com/json",
    data: requestData,
    dataType: 'json'
});
request.done(function(msg) {
    // ... handle the response in msg ...
});

Unless the other domain is explicitly configured to allow for cross-domain AJAX requests, your request will fail.

Quick sidenote here: This failure may be very confusing, especially if you’re not aware of the same-origin policy restriction. The AJAX query will fail (so the fail and error methods will be called), but it will not indicate why it failed. More puzzling is that Firebug shows the response status as 200 OK, but no data is passed back. Just a heads up in case this happens.

One option, if the server is set up to handle it, is JSONP. You can read more about it in the Wikipedia article, but unless the server is already configured to handle JSONP requests, or you can modify it to do so, this will not be an option for you.

When using third-party data providers, like in my case, these are not options.

What I needed was some way to asynchronously fetch the contents of a URL, but since I couldn’t do it directly, I did it indirectly.

To work around this issue, I first created an extremely simple PHP script[2] that I called proxy.php:

<?php
// Fetch the URL given in the requrl GET parameter and echo its contents
$file = file_get_contents($_GET['requrl']);
echo $file;

All this script does is fetch the URL that is specified in the requrl GET parameter and return its contents.

For example, requesting proxy.php with requrl set to the address of Google’s main page would just display the contents of that page.[3]

And because this file, proxy.php, was hosted locally on the same domain, I’m able to send AJAX requests to it without issue.

If we take the javascript example above, it could be re-written to use the proxy as follows:

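var requestData = {
    op: "fetchRecords",
    id: 124,
    key: "Q1GxWcxWOrKY"
};
var request = $.ajax({
    url: "proxy.php",
    // Note the "?" separating the remote URL from its query string
    data: {requrl: "http://some-other-domain.com/json?" + $.param(requestData) },
    dataType: 'json'
});
request.done(function(msg) {
    // msg is the parsed JSON response; handle it here
});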

Now, the client will fetch proxy.php via AJAX, the PHP server will fetch the specified URL and return it in response to the client’s AJAX query.

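Notice that we’re even able to pass GET parameters through the proxy, and those parameters will be used by the webserver when fetching the page. This works because the query string is URL encoded twice: jQuery’s param function builds and encodes the inner query string from the map, and $.ajax then encodes the whole requrl value when it serializes the data map. The & and = characters inside requrl arrive at proxy.php percent-encoded, which eliminates any ambiguity about which request the parameters belong to.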
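The nesting is easier to see outside of jQuery. Here is a small Python sketch of the same idea, using urllib.parse.urlencode as a stand-in for $.param. It assumes the remote URL and its query string are joined with a "?", and the variable names are mine.

```python
from urllib.parse import urlencode, parse_qs

request_data = {"op": "fetchRecords", "id": 124, "key": "Q1GxWcxWOrKY"}

# Inner query string, destined for the remote server (the role $.param plays).
target_url = "http://some-other-domain.com/json?" + urlencode(request_data)

# Outer query string, destined for proxy.php; the whole target URL is one encoded value.
proxy_query = urlencode({"requrl": target_url})

# What the proxy sees after decoding its own query string: the original URL, intact.
decoded = parse_qs(proxy_query)["requrl"][0]

print(proxy_query)
print(decoded == target_url)
```

Because the inner & and = are percent-encoded in proxy_query, the proxy sees exactly one parameter (requrl) rather than four.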

This doesn’t break the same-origin policy because that policy is meant to prevent the client from making unintended or unexpected requests (which usually exploit the fact that the client’s cookies are sent along and can be used to imitate a user’s intentional actions). In this workaround, it is the server that makes the request, so your cookies are safe.

With any code samples you find on the internet, this one included, you should read up on the functions you’re considering using before putting them in any (especially production) code. :-)


  1. This snippet, and the other javascript snippets require jQuery. Also, I specify the response dataType as json so that jQuery will parse the returned json and return an object. If you don’t want this, you can remove or edit this line.
  2. This is just a proof-of-concept proxy script and shouldn’t be used in practice. Without first checking things like referrer or requested URL against a whitelist, anyone could use the proxy for whatever they wanted. It’s just not a good idea to leave production code like this.
  3. Because google uses relative paths for things like CSS and images, the page you fetch via the proxy may look a bit different, specifically it may be missing styling, images or scripts. This is almost never an issue if you’re using a JSON API or even just scraping pages for the text data.
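As a rough illustration of the whitelist check mentioned in footnote 2, here is a Python sketch of the kind of test a proxy could run on requrl before fetching anything. The ALLOWED_HOSTS set and the is_allowed name are hypothetical, and this is only one of the checks a real proxy would want.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist; replace with the hosts your proxy should be able to reach.
ALLOWED_HOSTS = {"some-other-domain.com"}

def is_allowed(requrl, allowed_hosts=ALLOWED_HOSTS):
    """Accept only http(s) URLs whose host is explicitly whitelisted."""
    parts = urlsplit(requrl)
    if parts.scheme not in ("http", "https"):
        return False  # rejects file://, ftp:// and anything without a scheme
    return parts.hostname in allowed_hosts

print(is_allowed("http://some-other-domain.com/json?op=fetchRecords"))
print(is_allowed("file:///etc/passwd"))
```

Checking the scheme matters as much as checking the host: functions that fetch a URL will often read local files too if handed a path or a file:// address.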

X Forwarding with sudo

With PuTTY and Xming, I use X Forwarding to do a lot of work on my local Linux boxes from my Windows box.

After a reformat of both, I ran into the issue where a command like wireshark would work fine but running it via sudo (sudo wireshark) failed with the error:

X11 proxy: wrong authorisation protocol attempted

For me, the only thing that was required was to properly define the $DISPLAY (which was fine) and $XAUTHORITY (which wasn’t set) environment variables and make sure they’re passed through to root via sudo.

The first can be done via:

export DISPLAY="localhost:10"                    # For example, this might be different for you -- this is probably already set
export XAUTHORITY="/home/<username>/.Xauthority" # For me, this wasn't set.  Also, make sure to use the full path and not ~

Once that is done, ensure that these two environment variables are “kept” via sudo by checking /etc/sudoers and finding the lines that look like:

Defaults    env_reset

These lines define what environment variables persist for the root user when sudo is invoked. By default in my installation (Fedora 16, x64) both DISPLAY and XAUTHORITY persist (first and last lines). If your file doesn’t have this, the environment variables won’t persist and you won’t be able to use X Forwarding as easily/transparently, so you should consider adding them.

Lastly, since $DISPLAY is already set correctly, I just added the $XAUTHORITY definition (shown above) to my ~/.bashrc file and I was good to go.


Installing Eclipse/Subclipse on Windows 7 x64

Subclipse is a great subversion plugin for Eclipse allowing very tight integration into the IDE. Installing it on a 64-bit JVM is a bit tricky, however, and not as straightforward as it probably should be. Today’s post will walk through setting up Subclipse on a fresh Eclipse install.

We’ll be doing this the “right way”, using public key authentication (“passwordless”), but I’ll also mark with an asterisk* the steps that can be skipped if you’d rather enter a password each time. Note: entering a password for every transaction gets very tedious, so I recommend using public key authentication as I outline.

First, I’ll list the versions I’m working with:

  • Java: 1.7.0_03(x64)
  • Eclipse: 3.7.2 (Indigo) (x64)
  • SlikSVN: (x64)
  • Subclipse: 1.8

You’re free to use a newer version if this post becomes outdated, but things may change which may or may not invalidate this walkthrough.


A 64-bit JVM installed. Installing subclipse is presumably a lot easier if you have a 32-bit JVM installed.


Update: I originally outlined the steps necessary to install a version of Subclipse that included packages that were outdated shortly thereafter. Mark Phippard, the project manager for Subclipse and a developer for Subversion, stopped by and updated us on the current status of the project, including the fact that the updated packages make this install a lot easier. As a result, I’ve struck out the steps (4, 5) that were made unnecessary by the latest update. Thanks Mark!
  1. Install Eclipse (Indigo x64)
  2. Install Subclipse via Eclipse (Help – Install New Software – Add)
  3. Exit Eclipse
  4. Download and Install SlikSVN — This is needed for the 64-bit JavaHL libraries. (What is JavaHL and why do I need it?) Make sure you select the 64-bit edition.
  5. Ensure you have MSVC++ 2010 (x64) Redistributable Package Installed. If you think you have this installed, you can skip this step. If you encounter an Eclipse error dialog stating: no msvcp100 in java.library.path, you’re missing this.
  6. Install TortoiseSVN (or just download TortoisePlink.exe). Note that TortoisePlink.exe is different than the PLINK.EXE that is provided by the PuTTY package and using that version will not work.
  7. Download Puttygen.exe* (available here, either as part of the entire PuTTY package or alone). Note that this step will be necessary if (a) you want to use passwordless login and (b) you don’t already have a private/public keypair (if you’re unsure about this, you probably don’t have one and generating another one is no big deal).
  8. Create Public / Private Keypair using Puttygen.exe* — save these somewhere you’ll remember (I use %HOMEPATH%\Keys which is (on Win7) C:\Users\<username>\
  9. Install public key on remote computer.* Note: this might depend on the OS of your subversion server, but generally the steps are the same: (1) copy public key to subversion server, (2) add key to ~/.ssh/authorized_keys. Something like:
    cat id_rsa.pub >> ~/.ssh/authorized_keys
  10. Configure Subclipse to use plink. Open %appdata%\Subversion\config and in the [tunnels] section, add the line
    ssh = C:/Progra~2/Putty/TortoisePlink.exe -l username -i C:/path/to/private_key

    Notes: (1) change paths and username as appropriate, (2) the path separator should be either a single forward slash or a double (escaped) backslash. If you decide not to use public key authentication, drop the last (-i) option.

  11. Launch Eclipse
  12. Open SVN Configuration (Window – Preferences – Team – SVN) and ensure JavaHL (JNI) 1.7.5 … is selected under (SVN Interface – Client) and an error dialog isn’t displayed. If you get an error dialog and the dropdown contains JavaHL (JNI) Not available, you may have forgotten to exit and restart Eclipse, or something went wrong while installing SlikSVN.
  13. Add repository (Window – Open Perspective – Other… – SVN Repository Explorer – Right click in “SVN Repositories” – New – Repository Location) and enter repo location (svn+ssh://subversion.server/path/to/repo)


At this point, one of four things should happen:

  • Your repository opens up in the Repo Browser. Congrats! You set up Subclipse successfully. Code on!
  • A password dialog opens up. Typically this is because either (a) you erred when specifying your private key on the ssh=… line, or (b) you didn’t, or didn’t correctly, add the public key to the authorized_keys file.
  • You get a red console error:
    Network connection closed unexpectedly
    svn: Unable to connect to a repository at URL ‘svn+ssh://path/to/repo’
    svn: To better debug SSH connection problems, remove the -q option from ‘ssh’ in the [tunnels] section of your Subversion configuration file.

    This means that you were able to connect to the server, but for some reason the connection later failed. This could be because you specified an incorrect username. Ensure your TortoisePlink.exe options are correct.
  • You get a red console error:
    The system cannot find the file specified.
    svn: Unable to connect to a repository at URL ‘svn+ssh://path/to/repo/’
    svn: Can’t create tunnel: The system cannot find the file specified.

    This means that Subclipse couldn’t locate TortoisePlink.exe. Ensure your path is correct and your path separators are appropriate.
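Most of the failures above come down to a wrong path on the ssh line. Here is a small Python sketch for pulling the executable and key paths out of the [tunnels] section so they can be checked by eye or with os.path.isfile. The sample config text and the tunnel_paths name are my own, and it assumes the file parses as a plain INI file with shell-style argument splitting, which may not hold for every Subversion config.

```python
import configparser
import shlex

# Sample contents of the Subversion config file (paths are placeholders).
SAMPLE_CONFIG = """
[tunnels]
ssh = C:/Progra~2/Putty/TortoisePlink.exe -l username -i C:/path/to/private_key
"""

def tunnel_paths(config_text):
    """Return (executable, private_key) from the ssh line of the [tunnels] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    args = shlex.split(parser["tunnels"]["ssh"])
    # The key path is the argument following -i; None if password login is used.
    key = args[args.index("-i") + 1] if "-i" in args else None
    return args[0], key

exe, key = tunnel_paths(SAMPLE_CONFIG)
print(exe)
print(key)
```

On the Windows machine itself, both printed paths should point at files that exist; if the first doesn’t, you’d expect the “cannot find the file specified” error.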

Tutorial: Android JNI (Part 2)

This is continued from Part 1.

In the first part, we set up the project structure, wrote the Java wrapper class, generated the C header, wrote the C library and wrote the Android.mk file.

Building shared library

With traditional JNI (not targeting Android devices), you just compile the library with gcc using the -shared option. However, for Android devices, we use the ndk-build script.

To compile, we first switch to the project directory (the directory that contains the jni and src directories), then run the ndk-build script provided in the Android NDK:

cd Android-JNI-Demo
/path/to/ndk-build # for example: ~/build/android-ndk/android-ndk-r7c/ndk-build

If, when trying to compile on Linux, you get the following error:

Invalid attribute name:

You’ll want to check the line endings of your AndroidManifest.xml. I used the dos2unix command to correct them.

If the build script found your Android.mk and the library compiled without issue, you’ll see the following:

Compile thumb  : squared <= squared.c
SharedLibrary  : libsquared.so
Install        : libsquared.so => libs/armeabi/libsquared.so

So we’re happy with the library compilation and now we’ll move on to developing a simple UI to test whether our functions perform as we expect.

Develop Simple UI

By default, when we created a new Android project in Eclipse an activity was generated with the following:

package org.edwards_research.demo.jni;

import android.app.Activity;
import android.os.Bundle;

public class Android_JNI_DemoActivity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
    }
}

If we didn’t care about looking pretty, we could change this to something like:

package org.edwards_research.demo.jni;

import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

public class Android_JNI_DemoActivity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        int b = 3;
        int a = SquaredWrapper.to4(b);
        Log.i("JNIDemo", String.format("%d->%d", b,a));
    }
}

and run it, either in the emulator or on a device (with USB debugging enabled), we would see in LogCat an entry tagged with JNIDemo. In this case, we’d expect something like 3->81, since 3^4 = 81. But we’ll do a little bit more to see the performance of the library directly on the UI.
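To double-check the expected log line without a device, the wrapper logic can be mirrored in a few lines of Python. This is only a stand-in for the native code: the function names follow the wrapper class, and Python integers don’t overflow the way a Java int would for large inputs.

```python
def squared(base):
    """Stand-in for the native squared() function."""
    return base * base

def to4(base):
    """Mirror of SquaredWrapper.to4: square the input, then square the result."""
    return squared(squared(base))

b = 3
print("%d->%d" % (b, to4(b)))  # same format string as the Log.i call
```

This prints 3->81, matching what we’d expect to see in LogCat.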

Instead of walking through the specific steps of creating the UI, I’ll simply post the pertinent files:


package org.edwards_research.demo.jni;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.EditText;
import android.widget.TextView;

public class Android_JNI_DemoActivity extends Activity {
    private EditText etInput;
    private TextView txtTo2;
    private TextView txtTo4;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        // Define Input EditText, TextViews
        etInput = (EditText) findViewById(R.id.etInput);
        txtTo2 =  (TextView) findViewById(R.id.resTo2);
        txtTo4 =  (TextView) findViewById(R.id.resTo4);
    }

    /** Click handler for the calculate button. */
    public void cbCalculate(View view)
    {
        int in = 0;
        try {
            in = Integer.valueOf( etInput.getText().toString() );
        } catch(NumberFormatException e) { return ; }
        txtTo2.setText(String.format("%d", SquaredWrapper.squared(in)));
        txtTo4.setText(String.format("%d", SquaredWrapper.to4(in)));
    }
}


<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical" >

        android:layout_height="wrap_content" >

            android:layout_height="wrap_content" >

                android:inputType="number" />



            android:layout_height="wrap_content" >

                android:text="@string/squared" />

                android:text="" />


            android:layout_height="wrap_content" >

                android:text="@string/to4" />

                android:text="" />


            android:layout_height="wrap_content" >



<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="app_name">Android-JNI-Demo</string>
    <string name="squared">Squared:</string>
    <string name="to4">To 4:</string>
</resources>

When run in the emulator the app looks as follows:

You can enter a number in the text field and press the calculate button which will result in the squared and ^4 calculations:

Of course, this was a pretty silly demo library: a squared function could have been trivially implemented in Java without the need for C code, cross compiling or dealing with the Java Native Interface at all. However, it still illustrates the steps necessary to compile a native library against the Android NDK and how to import and use it in an Android project.


Tutorial: Android JNI

Today I’ll be posting a quick walkthrough of how to create and build a simple android project that includes native code using the Java Native Interface (JNI). As a note, there are sample projects included in the Android NDK, but this will walk you through building your own. After going through this, it’s suggested you review these sample projects.


As a prerequisite for this tutorial, you’ll need:

  1. Eclipse installed and configured to create Android projects. There are a number of tutorials out there about how to do this if you need help.
  2. A JDK installed, as I don’t believe the standard JRE contains the javah command that will be needed.
  3. The Android NDK (available here) downloaded and extracted somewhere.

Create the Android Project

For this tutorial we’re going to create a new project. However there is no reason you couldn’t integrate this into an already-created project.

To create the project, right click in Eclipse’s Package Explorer → New → Android Project. Give your project a name and select an API. For this tutorial I chose the latest Gingerbread API, 2.3.3.

Add jni folder, Android.mk makefile

Once your project has been created, you’ll need to create a new folder inside the top level of the project. To do this right click on your project name → New → Folder. Name this folder jni.

Inside this folder, create a new blank text file. To do this right click on your newly-created jni folder → New → File. Name this file Android.mk. Leave this file blank for now; we’ll come back to it later.

Your project should look something like this:

Create java source

For this tutorial, we’re going to have a simple c program — squared — that accepts an int and returns the square (e.g. 2 → 4, 3 → 9, etc.)

In order to accommodate that, we first create a Java source wrapper. The wrapper’s job is to load the library, expose any native functions we wish to use directly, and provide any additional functions that build on native functions we keep private.

For this tutorial, we’re going to expose directly the native squared function as well as provide a to4 “derivative” function.

To expose the native function directly, we just declare it public. Alternatively, we could declare it private and limit its availability to other functions of the class.

Our full java source is

package org.edwards_research.demo.jni;

public class SquaredWrapper {
    // Declare native method (and make it public to expose it directly)
    public static native int squared(int base);

    // Provide additional functionality, that "extends" the native method
    public static int to4(int base)
    {
        int sq = squared(base);
        return squared(sq);
    }

    // Load library (libsquared.so)
    static {
        System.loadLibrary("squared");
    }
}

Create C header

After we outline the native methods we’ll be using, we can use this java source to create a c header file with the function prototypes for the native methods we used. To do this, we first have to compile the java source into a class file. You can do this manually via the javac command, e.g.:

cd src # change into the source directory
javac -d /tmp/ org/edwards_research/demo/jni/SquaredWrapper.java

Note that the -d switch specifies the output directory for the class file — in this case, I’m just throwing it into /tmp.

Now that we have the class, we can create the C header file, e.g.:

cd /tmp
javah -jni org.edwards_research.demo.jni.SquaredWrapper

Note the need to specify the fully-qualified class name (including package), without the .class file extension.

The resulting header file in our case is /tmp/org_edwards_research_demo_jni_SquaredWrapper.h, but we can rename it to whatever we want. In this case, we’ll rename it to squared.h and place it in the jni folder in our project directory.

The resulting squared.h file looks like:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class org_edwards_research_demo_jni_SquaredWrapper */

#ifndef _Included_org_edwards_research_demo_jni_SquaredWrapper
#define _Included_org_edwards_research_demo_jni_SquaredWrapper
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     org_edwards_research_demo_jni_SquaredWrapper
 * Method:    squared
 * Signature: (I)I
 */
JNIEXPORT jint JNICALL Java_org_edwards_1research_demo_jni_SquaredWrapper_squared
  (JNIEnv *, jclass, jint);

#ifdef __cplusplus
}
#endif
#endif

The function name is annoyingly long, I agree. We could eliminate a lot of that if we didn’t use a package in our java source, but I’m not really that concerned.
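If you’re curious where that name comes from (including the odd _1 in the middle), here is a Python sketch of the JNI naming rules as I understand them: dots become underscores, and a literal underscore is escaped as _1. The helper names are mine, and the sketch ignores the extra signature suffix that overloaded native methods get.

```python
def jni_escape(name):
    """Escape one component of a JNI function name."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")       # package separators become underscores
        elif ch == "_":
            out.append("_1")      # a literal underscore is escaped
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))  # other characters: _0 plus four hex digits
    return "".join(out)

def jni_function_name(class_name, method_name):
    """Build the C function name for a (non-overloaded) native method."""
    return "Java_" + jni_escape(class_name) + "_" + jni_escape(method_name)

print(jni_function_name("org.edwards_research.demo.jni.SquaredWrapper", "squared"))
```

This prints the same Java_org_edwards_1research_demo_jni_SquaredWrapper_squared that javah generated, which is why the underscore in the package name shows up as _1.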

Create C source

Using the prototype generated by javah, we can implement our c source as follows:

#include "squared.h"

JNIEXPORT jint JNICALL Java_org_edwards_1research_demo_jni_SquaredWrapper_squared
  (JNIEnv * je, jclass jc, jint base)
{
        return (base*base);
}

Note we have to give the parameters names: I arbitrarily chose je and jc, and chose base to match the parameter name in our Java source.

In this case, the c source is very simple, but this tutorial is meant to illustrate how to include native code into your Android app and more complex c functions could be substituted with few modifications.

Create Android.mk

After we create our C source file, we have to create our Android.mk file. This file serves as a sort of makefile for the Android build tools. There are a number of sample Android.mk files in the samples/ directory of the NDK and we’ll actually be using almost the exact lines from the hello-jni sample project.

Our Android.mk file looks like:

LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

LOCAL_MODULE    := squared
LOCAL_SRC_FILES := squared.c

include $(BUILD_SHARED_LIBRARY)

At this point, we have laid most of our groundwork for setting up and compiling the library. The next steps are actually creating the shared library, and implementing some simple UI code to show that our native function (squared) and derivative function (to4) work as expected.

This is continued in Part 2.


Quick tip: “Commenting out” current command in terminal

So if you’re ever writing a command in a terminal, only to realize you forgot to do something first (e.g. switch to the right directory, change permissions etc.), you don’t have to delete the entire row, do what you forgot to do and retype.

To some, this might be obvious: they might suggest holding down the left arrow until you reach the beginning of the line and “commenting it out” by inserting a # in front of the text.

But there are actually two quicker solutions, one somewhat obvious, and the other not so much. The reason I’m writing this is just to point out the second one.

The first quick solution would be just to press the “Home” key and then insert the # symbol and press enter. This is the pretty obvious method.

The second method, which is actually a keystroke shorter, is to press Escape then the # symbol. These two keys have the same result as the three keys in the first method.

Saving a single keystroke is really not the motivation for posting this tip. Rather, it’s that some terminals, especially when you’re first configuring your client or initialization scripts, don’t accept the Home key correctly (or, quite often, your client isn’t sending it correctly).

Anyway, in that situation, it’s helpful to have a backup option, and the “Escape #” method comes in very handy.
