Python Pipe Operations

Python programs can link their input and output streams together using pipe (‘|’) operations. Piping isn’t a feature of Python, but rather comes from operating system. Nevertheless, a Python program can leverage such a feature to create powerful operating system scripts that are highly portable across platforms. For example, developer toolchains can be scripted together using Python. I personally have used Python to feed input into unit testing for my Java/Kotlin programs.

This post is a modified example of a demonstration found in Programming Python: Powerful Object-Oriented Programming. It uses a producer and consumer script to demonstrate one program feeding input into another Python program.

writer.py

Here is the code for writer.py

family = [
    'Bob Belcher',
    'Linda Belcher',
    'Tina Belcher',
    'Gene Belcher',
    'Louise Belcher'
]
for f in family:
    print(f)

Nothing special here. We are just building up a list that prints out the names of our favorite TV family, the Belchers.

reader.py

This code receives the output from writer.

while True:
    try:
        print('Entering {}'.format(input()))
    except EOFError:
        break

Once again, this is a simple script. Without a pipe operation, the input() statement on line 3 would normally collect the input from the keyboard. That’s not what is going to happen here.

Demonstration

We are going to execute these scripts by running the command below in the terminal.

python writer.py | python reader.py

The pipe '|' character does the job of connecting writer.py's output stream to reader.py's input stream. Thus, print statements in writer.py connect to input statements in reader.py. Here is the output of this operation.

Entering Bob Belcher
Entering Linda Belcher
Entering Tina Belcher
Entering Gene Belcher
Entering Louise Belcher
Advertisements

Python Console Streams

Programs on all major operating systems are connected to input and output streams (and usually an error stream). When run from the commandline without a redirect operator, a program is normally connnected to the shells standard input stream (where a user can type commands into the program) and the standard output stream (which prints output back on the console).

However, we aren’t limited to such streams. It’s perfectly possible to use the contents of a file as a program’s input stream or even use the output of one program and link it’s output to another program’s input stream. Such chaining of streams allows for powerful OS scripting.

This isn’t really a Python feature, but here is an example found in Programming Python: Powerful Object-Oriented Programming that demonstraes connecting the output of one OS stream to the input stream of a Python program. I added some comments that help explain the program.

teststreams.py

def interact():
    print('Hello stream world')
    while True:
        try:
            # Input normally reads from the keyboard because our program
            # is connected to that input stream. However, if we execute this program
            # in a way that connects the program's input to some other stream,
            # the input command reads from there instead!
            reply = input('Enter a number => ')
        except EOFError:
            # We have reached the end of our input stream (for example user entered ctrl+c at the shell)
            # So we exit the looop
            break
        else:
            num = int(reply)
            print("%d squared is %d" % (num, num ** 2))
    print('Bye')

if __name__ == '__main__':
    interact()

When this program is run on it’s own, it will collect input from the keyboard until we press ctrl+c. That’s not the part that we are demonstrating here. Let’s suppose we have a text file that has the following contents.

input.txt

1
2
3
4
5
6
7
8
9
10

Now when we run our program with a redirect operator, we get the following output.

Patricks-MacBook-Pro:Streams stonesoup$ python teststreams.py  1 squared is 1
Enter a number => 2 squared is 4
Enter a number => 3 squared is 9
Enter a number => 4 squared is 16
Enter a number => 5 squared is 25
Enter a number => 6 squared is 36
Enter a number => 7 squared is 49
Enter a number => 8 squared is 64
Enter a number => 9 squared is 81
Enter a number => 10 squared is 100
Enter a number => Bye

Notice the unix redirect operator. This program was run python teststreams.py < input.txt. That < input.txt connects the contents of input.txt to the teststreams.py script. Thus, when input is called, the function simply collects the next line in input.txt rather than waiting for the keyboard.

Python Environmental Variables

Every operating system allows system adminstrators and users to set and use environmental variables. Such variables can be highly useful when writing shell scripts that automate commands and executes tasks on the system. It’s highly feasible to imagine situations where build toolchains may need to know about the location of programs such as compilers, linkers, etc.

Python’s os module has an environ object. It acts like a dictionary and allows scripts to read and write to enviornmental variables. Here are two example scripts found in Programming Python: Powerful Object-Oriented Programming that demonstrate using environmental varaibles in Python. I added comments to help explain the programs.

setenv.py

import os

# First print the value associated with the USER
# environmental variable
print('setenv...', end=' ')
print(os.environ['USER'])

# Now overwrite the value stored in USER. It will
# propagate through the system
os.environ['USER'] = 'Brian'

# Run the echoenv script to prove the environmental
# variable has been updated
os.system('python echoenv.py')

# Repeat
os.environ['USER'] = 'Arthur'
os.system('python echoenv.py')

# This time, we ask the user for a name
os.environ['USER'] = input('?')

# Now we are going to run the echoevn script, but connect
# it's output to this process and print
print(os.popen('python echoenv.py').read())

echoenv.py

import os
print('echoenv...')

# Just print the value for USER found in os.environ
print('Hello, ', os.environ['USER'])

Detailed Explanation

Although the main purpose of this script is to demonstrate how to use environmental variables in Python. However, we are going to use the os.system and os.popopen functions to execute our helper echoenv script to help prove that changes to the environmental variables propogate throughout the system.

Line 6 is the first call to os.environ. We look up the USER value in this environ object which prints out the current user to the console. On line 10, we overwrite the value in USER with “Brian”. Now Brian is the user. Line 14 proves the change by calling os.system and executing echoenv.py. That script will print “Brain” in the console.

Lines 17 and 18 are a repeat of lines 10 and 14. Line 21 is only different in the sense that we ask the user for a value this time. Line 25 uses the os.popopen command to execute echoenv.py. This function returns an object that gives us a handle into the sister process. Rather than having echoenv print to the console directly this time, we use the read() method to print the output in the setenv process.

Example Output

Here is the output when run on my machine.

Patricks-MacBook-Pro:Environment stonesoup$ python setenv.py
setenv... stonesoup
echoenv...
('Hello, ', 'Brian')
echoenv...
('Hello, ', 'Arthur')
?Bob Belcher
echoenv...
('Hello, ', 'Bob Belcher')

Parse Command Line Arguments in Python

Most command line programs allow users to pass command line arguments into a program. Consider, for example, the unix ls:

ls -a

Python programs can utilize such techniques by using sys.argv and performing string operations on each element in the argv list. Here is an example script from Programming Python: Powerful Object-Oriented Programming with my own comments added to the script. (Of course, the Python standard library provides much more sophisticated getopt and optparse modules which should be used in production programs)

def getopts(argv):
    # Create a dictionary to store command line options
    opts = {}

    # Iterate through each command line argument
    while argv:
        # Look for the dash symbol
        if argv[0][0] == '-':

            # Create an entry in the opts dictionary
            opts[argv[0]] = argv[1]

            # Slice off the last entry and continue
            argv = argv[2:]
        else:

            # Otherwise just slice of the last entry since there
            # was no command line switch
            argv = argv[1:]

    # Return the dictionary of the command line options
    return opts

# Only run this code if this is a standalone script
if __name__ == '__main__':
    # Put the argv option into our namespace
    from sys import argv

    # call our getopts function
    myargs = getopts(argv)

    # Print our output
    if '-i' in myargs:
        print(myargs['-i'])

    print(myargs)

Detailed Explanation

Our goal in this script is to look for any command line switches that are passed to the script. To start with, we need to import the argv object from the sys module which is a list that provides the command line options passed to the script. On line 30, we call our getopts function and pass the argv object to the function. The program’s execution jumps up to line 3.

Line 3 creates an empty dictionary object that will store all command line switches with their arguments. For example, we may end up with ‘-n’:”Bob Belcher” in this dictionary. Starting on line 6, we enter into a while loop that examines each entry in argv.

Line 8 looks at the first character in the first entry of argv (remember that Python String implement the [] operator). We are just checking if the character in question is a dash (‘-‘) character. If this character is a dash, we are going to create an entry into our opts dictionary on line 11. Then we slice off the next 2 entries in argv from the beginning of the list (one for the switch, the other one for the argument).

Line 15 is the alternative case. If argv[0][0] is something other than a dash, we are going to ignore it. We just slice off the first entry in argv (only one this time since there is no switch). Once argv is empty, we can return the opts dictionary to the caller.

The program execution resumes on line 33. For demonstration purposes, it checks if the myargs dictionary has a ‘-i’ key. If it does, it will print the value associated with ‘-i’. Then line 36 prints out the myarg dictionary.

When run from the console, the script will show output such as this.

Patricks-MacBook-Pro:system stonesoup$ python testargv2.py -h "Bob Belcher" -w "Linda Belcher" -d1 "Tina Belcher" -s "Gene Belcher" -d2 "Louise Belcher" 
{'-h': 'Bob Belcher', '-w': 'Linda Belcher', '-d1': 'Tina Belcher', '-s': 'Gene Belcher', '-d2': 'Louise Belcher'}

Python Command Line Arguments

The sys module provides developers with an access point to command line arguments. Here is an example program that prints command line arguments to the console.

import sys

# The sys object has an argv field that is a
# list of the command line arguments passed to the program
# The first entry is the name of the script
print(sys.argv)

Here is the program’s output when run from the command line.

Patricks-MacBook-Pro:system stonesoup$ python testargv.py
['testargv.py']

The sys.argv object is a list of all command line arguments supplied to the script when run as a python program. Generally speaking, the first entry in the list is the name of the script. For more information on Python, see Programming Python: Powerful Object-Oriented Programming

Python Page Through A File

Many operating systems have command line tools that allow a user to page through a file in chunks. As a demonstration of how to read text files in Python, I used an example from Programming Python: Powerful Object-Oriented Programming.

Code

def more(text, numlines=15):
    # This splits the text into a list object based on line
    # endings
    lines = text.splitlines()

    # Now continue to loop until we are out of lines
    while lines:
        # Slice off numLines into chunk
        chunk = lines[:numlines]
        
        # Remove numLines from the beginning of lines
        lines = lines[numlines:]

        # Now loop through each line in chunk
        for line in chunk:
            # and then print a line
            print(line)
            
        # Now ask the user if we want to keep going
        if lines and input('More?') not in ['y', 'Y']:
            break

if __name__ == '__main__':
    # Import sys so that we can read command line arguments
    import sys
    
    # Next, we are grabbing the first argument from the
    # command line, and passing it the open function
    # which returns a file object. Calling read on this
    # object will dump the contents of the file into a String
    # which gets passed to our more function above
    more(open(sys.argv[1]).read(), 10)

Detailed Explanation

The comments in the code above are mine and explain what is going on in the program. The program starts by testing if this script is getting called as a standalone program or if we are importing this code as a module.

Assuming this is a standalone program, we import the sys module so that we can examine the command line arguments. The second command line argument needs to be a text file or this program will crash. We pass the name of the file to the open function, which returns a file object. Calling read() on the file object dumps the entire contents of the file into a String.

At this point, we pass the string into our more() function. It starts out by splitting the string by lines, which returns a list object. We start to loop through this list object, which continues until the list is empty.

Inside of the while loop, we slice off numLines from lines and store then in chunk. Then we remove those lines from the lines list. The next step is to print out each line in chunk. Once that is complete, we test if we still have more lines to print and if we do, we ask the user if they want to keep going or exit.

Here is the program output when run on my screen.

Patricks-MacBook-Pro:System stonesoup$ python more.py more.py
def more(text, numlines=15):
    lines = text.splitlines()

    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]

        for line in chunk:
            print(line)
        if lines and input('More?') not in ['y', 'Y']:
More?y
            break

if __name__ == '__main__':
    import sys
    more(open(sys.argv[1]).read(), 10)

Python Current Working Directory

Many programs have a need to figure out the current working directory (CWD) at runtime. The Python os package has a getcwd() function that returns a program’s CWD. This is an example taken from Programming Python: Powerful Object-Oriented Programming

Code

import os, sys

# This prints the current working directory
print('my os.getcwd =>', os.getcwd())

# This prints the system path
print('my sys.path =>', sys.path[:6])

input('Press any key to exit')

Explanation

There isn’t much going on in this program. The first line imports the os and sys modules. The next line calls the print statement and passes the value returned from os.getcwd(). That will print the current working directy.

The next line prints the system paths, limited to 6 paths. Finally there is an input statement that causes the program to wait until the user presses a key to exit the program.

Output

Here is the output when ran on my system.

my os.getcwd => /Users/stonesoup/IdeaProjects/ProgrammingPython/PP4E/System
my sys.path => ['/Users/stonesoup/IdeaProjects/ProgrammingPython/PP4E/System', '/Users/stonesoup/Library/Application Support/IntelliJIdea2017.2/python/helpers/pydev', '/Users/stonesoup/IdeaProjects/ProgrammingPython', '/Users/stonesoup/IdeaProjects/ProgrammingPython/PP4E', '/Users/stonesoup/IdeaProjects/ProgrammingPython/PP4E/System', '/Users/stonesoup/Library/Application Support/IntelliJIdea2017.2/python/helpers/pydev']
Press any key to exit