Library search via web page

A few month ago, I wrote about three short Python scripts I could run from the command line to search for books in my technical library. The information on the books and authors was kept in an SQLite3 database, and the SQL queries built into the scripts were originally written with the help of ChatGPT and then improved with comments from readers. I said at the time that my goal was to move the database to a server and access it through a web page, the idea being that I could check on what books I have when I’m away from my computer. (In a used book store, for example.) I made the move last weekend and figured I’d write up a quick overview.

There’s just a single web page. Here’s what it looks like on my iPhone (a 16 Pro running Safari) before and after a search:

Library search on iPhone

It’s set up to search by title, author, or both, depending on which fields I fill in.

Because I’m not a web programmer and I wanted to reuse as much of the previously written Python code as possible, the page is generated by an old-fashioned CGI script. This one:

python:
  1:  #!/path/to/python3
  2:  
  3:  import os
  4:  from urllib.parse import parse_qs
  5:  import sqlite3
  6:  
  7:  # Get form data from QUERY_STRING
  8:  def parse_query_string():
  9:      query = os.environ.get("QUERY_STRING", "")
 10:      params = parse_qs(query)
 11:      return {k: v[0] for k, v in params.items() if v}
 12:  
 13:  tString = ''
 14:  aString = ''
 15:  params = parse_query_string()
 16:  tString = params.get('title', '').strip()
 17:  aString = params.get('author', '').strip()
 18:  
 19:  # Set up query strings
 20:  qTitle = '''SELECT b.loc, b.title, GROUP_CONCAT(a.name, '; ')
 21:  FROM book_author ba
 22:  JOIN author a ON a.id = ba.author_id
 23:  JOIN book b ON b.id = ba.book_id
 24:  WHERE b.title LIKE ? GROUP BY b.id ORDER BY b.loc;'''
 25:  
 26:  qAuthor = '''SELECT b.loc, b.title, GROUP_CONCAT(a.name, '; ')
 27:  FROM book_author ba
 28:  JOIN author a ON a.id = ba.author_id
 29:  JOIN book b ON b.id = ba.book_id
 30:  WHERE b.id IN (
 31:    SELECT ba.book_id
 32:    FROM book_author ba
 33:    JOIN author a ON a.id = ba.author_id
 34:    WHERE a.name LIKE ?
 35:  )
 36:  GROUP BY b.id ORDER BY b.loc;'''
 37:  
 38:  qTitleAuthor = '''SELECT b.loc, b.title, GROUP_CONCAT(a.name, '; ')
 39:  FROM book_author ba
 40:  JOIN author a ON a.id = ba.author_id
 41:  JOIN book b ON b.id = ba.book_id
 42:  WHERE b.title LIKE ?
 43:  AND b.id IN (
 44:    SELECT ba.book_id
 45:    FROM book_author ba
 46:    JOIN author a ON a.id = ba.author_id
 47:    WHERE a.name LIKE ?
 48:  )
 49:  GROUP BY b.id ORDER BY b.loc;'''
 50:  
 51:  # Query the database
 52:  def search(t, a):
 53:      con = sqlite3.connect('library.db')
 54:      cur = con.cursor()
 55:      if (t != '') and (a == ''):
 56:        sql = qTitle
 57:        vals = [f'%{t}%']
 58:      elif (t == '') and (a != ''):
 59:        sql = qAuthor
 60:        vals = [f'%{a}%']
 61:      elif (t != '') and (a != ''):
 62:        sql = qTitleAuthor
 63:        vals = [f'%{t}%', f'%{a}%']
 64:      else:
 65:        results = []
 66:        con.close()
 67:        return results
 68:      cur.execute(sql, vals)
 69:      results = cur.fetchall()
 70:      con.close()
 71:      return results
 72:  
 73:  results = search(tString, aString)
 74:  if results:
 75:    rcount = len(results)
 76:  else:
 77:    rcount = 0
 78:  
 79:  if rcount == 1:
 80:    rheader = '1 Result'
 81:  else:
 82:    rheader = f'{rcount} Results'
 83:  
 84:  # Output HTML
 85:  print(f'''Content-Type: text/html
 86:  
 87:  <html>
 88:    <head>
 89:      <title>Library Search</title>
 90:      <link rel="stylesheet" href="search.css">
 91:    </head>
 92:    <body>
 93:      <h1>Search Library</h1>
 94:      <form method="get" action="search.py">
 95:        <p class="desc">Enter strings for title, author, or both</p>
 96:        <label for="title">Title:</label>
 97:        <input type="text" name="title" value="{tString}">
 98:        <label for="author">Author:</label>
 99:        <input type="text" name="author" value="{aString}">
100:        <input type="submit" value="Submit">
101:      </form>
102:      <div class="results">
103:      <h2>{rheader}</h2>''')
104:  
105:  if results:
106:    print('<ul>')
107:    for row in results:
108:        print(f'<li>{"<br/>".join(row)}</li>')
109:    print('</ul>')
110:  
111:  print('''    </div>
112:    </body>
113:  </html>''')

Once upon a time, a script like this would’ve imported the cgi module to handle the form data. But that module was deprecated in Python 3.11 and removed in 3.13, so the parse_query_string function in Lines 8–11 was built “by hand” by pulling in data from the $QUERY_STRING environment variable.

(Unfortunately, the word query will be doing double-duty in this post. In the parse_query_string function, it means the query portion of a URL—the part after the question mark. In the rest of the script, it’ll mean an SQL query. The potential confusion is unavoidable; query is the standard term for both situations.)

The title and author strings, tString and aString, are set in Lines 13–17. The default values are empty strings; they get set to the corresponding field values when the script is run via the Submit button.

The next portion of the script, Lines 20–49, define the three SQL queries we’ll need: one for searching on a title only, one for searching on an author only, and one for searching on both. These strings were taken from the earlier Python scripts.

Lines 52–71 define the search function, which runs the appropriate SQL query according to which strings were provided in the input fields. Lines 73–82 perform the search and set up the header for the Results section of the page.

The rest of the script outputs the page’s HTML. The part that’s most variable is given in Lines 105–109, which spit out the search results in a unordered list.

The CSS file that styles the page is this:

css:
  1:  /* Base styles */
  2:  body {
  3:    font-family: Helvetica, sans-serif;
  4:    margin: 1em;
  5:    padding: 0;
  6:    line-height: 1.6;
  7:    background-color: #f9f9f9;
  8:    color: #333;
  9:  }
 10:  
 11:  /* Form container */
 12:  form {
 13:    max-width: 40em;
 14:    margin: 0 auto 2em;
 15:    background-color: #fff;
 16:    padding: 1em;
 17:    border-radius: 8px;
 18:    box-shadow: 0 0 8px rgba(0, 0, 0, 0.1);
 19:  }
 20:  
 21:  /* Form elements */
 22:  form p {
 23:    padding-top: 0em;
 24:    margin-top: 0em;
 25:    font-size: 1.25em;
 26:  }
 27:  
 28:  form label {
 29:    display: block;
 30:    margin-bottom: 0.5em;
 31:    font-weight: bold;
 32:  }
 33:  
 34:  form input[type="text"] {
 35:    width: 100%;
 36:    max-width: 100%;
 37:    padding: 0.5em;
 38:    font-size: 1em;
 39:    margin-bottom: 1em;
 40:    box-sizing: border-box;
 41:  }
 42:  
 43:  form input[type="submit"], form button {
 44:    padding: 0.6em 1.2em;
 45:    font-size: 1em;
 46:    cursor: pointer;
 47:    background-color: #007bff;
 48:    border: none;
 49:    border-radius: 4px;
 50:    color: #fff;
 51:  }
 52:  
 53:  form input[type="submit"]:hover, form button:hover {
 54:    background-color: #0056b3;
 55:  }
 56:  
 57:  /* Results list */
 58:  .results {
 59:    max-width: 40em;
 60:    margin: 0 auto;
 61:    padding: 1em;
 62:    background-color: #fff;
 63:    border-radius: 8px;
 64:    box-shadow: 0 0 8px rgba(0, 0, 0, 0.1);
 65:  }
 66:  
 67:  .results h2 {
 68:    margin-top: 0em;
 69:    padding-top: 0em;
 70:  }
 71:  
 72:  .results ul {
 73:    font-size: 1.1em;
 74:    line-height: 1.25em;
 75:    padding-left: 1.2em;
 76:  }
 77:  
 78:  .results li {
 79:    margin-bottom: 0.75em;
 80:  }
 81:  
 82:  /* Responsive tweaks */
 83:  @media (pointer: coarse) {
 84:    html {
 85:      font-size: 30px;
 86:    }
 87:    body {
 88:      margin: 0.5em; 
 89:    }
 90:  
 91:    form, .results {
 92:      padding: 0.8em;
 93:    }
 94:    
 95:    form input[type="text"], form button {
 96:      width: 100%;
 97:      font-size: 1.25em;
 98:      margin-top: 0.5em;
 99:    }
100:    form input[type="submit"], form button {
101:      width: 100%;
102:      margin-top: 0.5em;
103:    }
104:  }

About the only clever thing in this is the use of pointer: coarse to make the page easier to read on an iPhone. I started out trying a responsive design with different width values in the @media query (oops! there’s a third use of that word), but I wasn’t happy with the results. I didn’t want the text to get bigger just because the width of the browser window on my Mac got narrow. And I didn’t want to rewrite this when I got a new (and different resolution) phone. Then I saw the pointer feature and decided that was the way to go. Yes, it might make the text larger than I like on my iPad, but I don’t see myself using this on my iPad.

The Python script, the CSS file, and the database file are all kept in the same directory, a directory that’s password-protected through few Apache settings. Whenever I get a new book, I’ll add it to the local version of the database file and then upload that to replace the one on the server.


NBA Finals and Pandas

The NBA season ended last night as the Thunder beat the Pacers 103–91 in the seventh game of the Finals. I read this morning that this was the 20th Finals to go seven games in the 79-year history of the NBA, and I wondered what the distribution of game counts was.

I found all the Finals results on this Wikipedia page in a table that starts out this way:

NBA Finals table excerpt from Wikipedia

I figured I’d use what I learned about how Pandas can read HTML tables to extract the information I wanted. So I started an interactive Python session that went like this:

python:
>>> import pandas as pd
>>> dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_NBA_champions')
>>> df = pd.DataFrame({'Result': dfs[2]['Result']})
>>> df[['West', 'East']] = df.Result.str.split('–', expand=True).astype(int)
>>> df
     Result  West  East
0     1–4     1     4
1     4–2     4     2
2     4–2     4     2
3     4–2     4     2
4     4–3     4     3
..    ...   ...   ...
74    2–4     2     4
75    4–2     4     2
76    4–1     4     1
77    1–4     1     4
78    4–3     4     3

[79 rows x 3 columns]
>>> df['Games'] = df.West + df.East
>>> for n in range(4, 8):
...     print(f'{n}  {len(df[df.Games == n])}')
...     
4  9
5  20
6  30
7  20

OK, you’re right, this is an edited version of the session. The command that created the West and East fields took a few tries to get right. Also, I’m pretty sure there’s a better way to put the Results column from the original table into its own dataframe, but creating an intermediate dictionary was the first thing that came to mind. Overall, I was pleased that I didn’t need to do much thrashing about; I’ve used Pandas long enough now to get most things right or nearly right on the first try.

The upshot is this:

Games Count
4 9
5 20
6 30
7 20

I expected the 4–0 series to be the least common. As a Bulls fan, I suppose I should have guessed that 4–2 series are the most common—Michael Jordan was responsible for five of them.

If next year’s Finals is a sweep, I will definitely do this again. What a neat and tidy table that will be.


Graphing without empty spaces

I’ve mentioned here before that I was on a dietary program to keep my Type II diabetes under control. The program and its associated app also kept track of my blood pressure with a Bluetooth-connected cuff that I used once a week. I left the program at the beginning of the year (insurance wouldn’t cover it anymore), but I’ve continued the diet and tracking my blood pressure. I don’t know if it’s possible to connect to the cuff’s Bluetooth signal, and even if it is, I don’t have the programming chops to do it. But I’ve kept taking my blood pressure once a week and entering it into the Health app on my phone.

Unfortunately, the Health app’s way of plotting blood pressure is kind of crappy. Here’s what the past six months looks like:

Blood pressure graph from Health

Lots of wasted space in there, and because the range is so broad, I can’t see at a glance where I stand. Since seeing at a glance is sort of the whole point of plotting data, I would say this graph is basically useless.

Plotting the data using a tighter range would be better, as it would give me a better chance to figure out the various values to within a couple of mm Hg. Here’s an example:

Blood pressure single plot

This still seems like it could be improved. We’ve traded a bunch of wasted space above the systolic readings for a bunch of wasted space between the systolic and diastolic values. In some situations, it’s good to see how a gap between two sets of values compares to the variation within a set, but I don’t see much use in it for this type of data.

This data could use a scale break. In effect, this means making two plots but arranging them into a single figure to take advantage of their common parts. Here’s one way to do it:

Blood pressure subplots

The scales of the two parts are the same, so the larger variation in systolic values is properly represented. We’ve just cut out the empty space between the systolic and diastolic and pushed them together.

William Cleveland, author of The Elements of Graphing Data, is not a fan of scale breaking:

Use scale breaking only when necessary. If a break cannot be avoided, use a full scale break. Do not connect numerical values on two sides of a break. Taking logs can cure the need for a break.

In this case, taking logs would make the cure worse than the disease, as it would make reading the values harder. If you’re wondering about the difference between full and partial breaks, a partial break is when an axis is broken by a wavy or zigzag line, like this:

Partial scale break

Cleveland thinks partial breaks are too easy for readers to overlook. A full break is what we’ve done, so he may forgive what we’ve done.

Despite Cleveland’s admonitions, scale breaks can be effective and attractive. Here’s an example from Modern Timber Engineering by Scofield and O’Brien:

Scofield connector graph

This gives the capacities of a type of wood connector under a variety of conditions. The graph is broken into three groups according to wood species, and what makes scale breaks useful here is that the plots would overlap and be impossible to read without the breaks. The authors could have make this three separate graphs, of course, but putting them together into a single figure emphasizes the interrelatedness of the data. And the curving gaps between the sections look really cool.

I’m not saying Apple should try curvy scale breaks in the Health app, but it wouldn’t take much to make the blood pressure graphs a lot more useful.


Technical editor needed

I read this article from Scientific American about the GBU-57/B, the “bunker buster” bomb that Donald Trump will… or won’t… or will… or won’t allow Israel to use on Iran’s Fordo nuclear facility. The facility is buried deep within a mountain, and the GBU-57/B is the only non-nuclear bomb that may be able to destroy it. The article is worth reading, but if you do, you’ll probably notice some obvious errors.

The first error is related to concrete, which is why I picked up on it. Here’s the passage:

According to a 2012 Congressional Research Service briefing, the GBU-57/B has been reported to burrow through 200 feet of concrete or bedrock with a density of 5,000 pounds per square inch (comparable to the strength of bridge decks or parking-garage slabs).

The 5,000 psi figure refers to the compressive strength of concrete, not its density. Back when I was a student, 5,000 psi was kind of on the strong side for commercially available concrete; now it’s a garden-variety strength, as suggested by the parenthetical comment. The compressive strength of intact rock is often much higher than this, but natural rock formations tend to have joints and other defects that reduce their strength. By the way, even if you don’t have much experience with concrete or rock, you should know that something’s fishy with this passage. Density is weight or mass per unit volume—it can’t be measured in pounds per square inch.

Later, we see this:

About one fifth of the warhead’s 5,342-pound total weight is made up of two explosives: 4,590 pounds of AFX-757 plus 752 pounds of PBXN-114.

Since the sum of the two explosive weights—4,590 lbs and 752 lbs—is equal to 5,342 lbs, it’s hard to see how their sum could be one-fifth of that total. I’m guessing the intention here is to say that the combined explosive weight is about one-fifth of the missile’s total weight, which is given earlier in the article as about 30,000 lbs.

There’s also a discussion of how the ogive shape of the missile’s nose gives it both good aerodynamic and good penetrating properties. There’s nothing wrong with this, but it suggests the shape is something special. It isn’t. The ogive shape is common in rockets, missiles, and bullets. Maybe the GBU-57/B’s ogive is unusual in some way, but if it is, the article doesn’t say so.

I should say that this article isn’t in the Scientific American magazine proper, it’s just on the web, and maybe web articles aren’t given the same scrutiny as print articles. It does seem odd, though, that piece coming out under the SciAm name is edited at the same level as a blog post.