Browser Caching and How Reload vs Refreshing Works

I recently learned something interesting about how browsers cache assets. I already knew that Control+F5 on Windows or Command+Shift+r on OS X makes the browser bypass its cache and reload the page from the server. This is very useful during development when you need to force the browser to load a fresh copy from your development server. While checking that the TrustAuth site had proper caching headers on its static assets, I discovered a third way the browser can load a page, beyond the two I already knew about (the cache-bypassing reload and what I thought was the normal reload).

It turns out that what I had thought was just a normal page load actually triggers a different kind of reload. Pressing Command+r or clicking the reload button in Chrome or Firefox tells the browser to send a conditional GET request for everything it has in its cache, to verify each entry is still valid. Because I didn't realize that, I spent a while trying to figure out why the browser was sending conditional GET requests for assets that had a proper Cache-Control header and should have been served straight from the cache.

It was all the more confusing when I was checking against another website that runs browser cache tests. It said all of the tests passed, and sometimes I could see in the developer tools that assets were being loaded from the browser's cache, but I couldn't see the pattern. Now that I know about Command+r reloading, it makes perfect sense: every time assets loaded from the cache, I had just clicked one of the links on the page for the next step of the test instead of reloading.

To clarify what they do, here is a summary of the three types of page reloading / refreshing.

  1. Command+Shift+r reloading: Reloading this way makes the browser ignore its cache and reload everything on the page. This is a great way to bypass your browser's cache when you are testing or developing.
  2. Command+r and the browser's reload button: Reloading this way makes the browser fetch the page and revalidate everything it has cached for the page. This is usually good enough to fetch updated assets, assuming you are using ETag or Last-Modified headers.
  3. Clicking a link to the same page, or pressing Enter while the address bar has focus: Loading this way makes the browser use everything it has cached for the page. This is not usually what you want while developing, but it is kinder to other websites when you just want to refresh a page to see if it has been updated.

In the past I always used Command+r for that purpose, but now that I know the difference, I will use this third method instead to save myself, and the website I'm checking, from unnecessary conditional GET requests.

Posted in Uncategorized | Leave a comment

Fun with Ruby Enumerators

One of my favorite features in Ruby is the Enumerable module and the ability to create enumerators. For background, an enumerator in Ruby is like an iterator: it responds to the method next, which returns successive elements from some container, often an array or hash. Ruby's Enumerable module contains many helper methods for working with enumerables such as each, map, reduce, all?, any?, and many more. These methods return arrays (or, when called without a block, enumerators), which allows calls to be chained. As an example, say I have an array of objects, which we'll call users, where each object has a name. If I want to transform that array into a new array where each object becomes an array of the letters in the name string, I can do the following:

> users.map { |user| user.name }.map { |name| name.chars }
=> [["B", "o", "b"], ["D", "a", "n"]]

What happens, though, when you do something like this?

> (1...Float::INFINITY).map { |number| number * number }
=> # Runs until you run out of memory

This happens because Ruby doesn't return from map until it runs out of items in the enumerable you call map on, in this case the range from 1 to infinity. So how might we get this to work if we know how many numbers we want to process? We can make the enumeration lazy by changing the example to:

> lazy_squares = (1...Float::INFINITY).lazy.map { |number| number * number }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: 1...Infinity>:map>
> lazy_squares.take(10).to_a
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This then returns the first 10 results from our lazy enumerator. The call to #to_a is necessary because once you add #lazy, all returned enumerators are also lazy, and #to_a forces the enumerator to evaluate and return a result. This lets us write some pretty powerful transformations while only doing the computation we actually need.
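You can also build your own infinite sequences by hand with Enumerator.new and chain them through #lazy the same way. Here's a quick sketch (this Fibonacci example is purely for illustration, it's not from any of the projects mentioned here):

```ruby
# Build an infinite Fibonacci enumerator by hand with Enumerator.new.
fibs = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a      # emit the next value, then suspend until asked again
    a, b = b, a + b
  end
end

# Chaining through .lazy means only as many values are generated as needed.
p fibs.lazy.select(&:even?).first(5)
# => [0, 2, 8, 34, 144]
```

Without the #lazy, the select would loop forever just like the infinite map above.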

Another cool feature of Ruby is what you can do with the yield keyword. In a few projects now I have created an interface for iteration that is more complicated than just stepping through each item one at a time. I recently wrote an example as part of an implementation of Conway's Game of Life: a method that iterates through each cell on the board and yields the cell along with the number of live neighbors for that cell, then sets the new value of the cell to the return value of the block. This made it really simple to implement the rules of the game. Here's what that looks like:

def next_state
  GameBoard.new(map_cell_with_live_neighbor_count { |cell, live_neighbors|
    # A live cell survives with 2 or 3 live neighbors; a dead cell comes
    # alive with exactly 3. The if/else coerces the result to true/false.
    if (cell && live_neighbors == 2) || live_neighbors == 3
      true
    else
      false
    end
  })
end

Here is the implementation for #map_cell_with_live_neighbor_count:

def map_cell_with_live_neighbor_count
  new_board = duplicate_board
  each_cell_with_location do |cell, (row_number, column_number)|
    live_neighbors = neighbors(row_number, column_number).count(true)
    new_board[row_number][column_number] = yield cell, live_neighbors
  end
  new_board
end

#duplicate_board does what you would expect: it returns a copy of the existing board of cells. Next, we step through each cell in the existing board. This is another use of yield to make a nice interface: #each_cell_with_location yields the value of each cell along with its location on the board. That lets us count the number of live (true) cells with the #neighbors method, which returns an enumerator so we can use Enumerable's #count. Then we yield the value of the cell and the number of live neighbors and store the result in the new board at the appropriate location. Finally, we return the new board for use in creating the next GameBoard.
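For the curious, #each_cell_with_location boils down to a nested loop over rows and columns. Here's a simplified, self-contained sketch; the board representation is an assumption on my part, so the real implementation on GitHub may differ:

```ruby
# Simplified sketch: a board stored as an array of rows of true/false cells.
class GameBoard
  attr_reader :board

  def initialize(board)
    @board = board
  end

  # Yields each cell along with its [row, column] location, matching the
  # |cell, (row_number, column_number)| destructuring used above.
  def each_cell_with_location
    board.each_with_index do |row, row_number|
      row.each_with_index do |cell, column_number|
        yield cell, [row_number, column_number]
      end
    end
  end
end

cells = []
GameBoard.new([[true, false], [false, true]]).each_cell_with_location do |cell, (row, col)|
  cells << [cell, row, col]
end
p cells
# => [[true, 0, 0], [false, 0, 1], [false, 1, 0], [true, 1, 1]]
```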

There are several more methods that make this work, too many to show here, but if you're interested you can see the implementation on GitHub.

I really like the interface you get from these methods by using yield. As seen above, it makes it easy to add layers of abstraction. In #next_state, I am handed the number of live neighbors, so that method doesn't need to worry about computing it and can stick to the single responsibility of deciding the next state of the cell being checked.

I actually have an even more complicated example that helps show off the power of using yield and enumerators like this but it is too large to copy into a blog post. Here is a quick summary though. This example is from a game combat simulator. To explain what the pieces represent requires a little bit of background on the rules of the game.

Many units in the game consist of a number of models (as in miniatures) that are arranged in a grid of ranks and files. To represent this, I effectively used a multi-dimensional array to track what model was in each position. Each rank must contain exactly as many files as the rank in front of it, with the sole exception of the rearmost rank. The rules of the game don't specify where the models in this last rank must be placed, so for this simulator I used the Strategy pattern (from the Gang of Four book).

To start it off I created three strategies to choose from: center alignment, which tries to center the last rank; left alignment, which pushes all models to the left side; and right alignment, which pushes all models to the right side. Given that I know how wide (how many files) the unit is and which strategy is in use, there is a single position that represents the next place to add a model while observing the formation rules. Because of that, I could create a method that yields the rank and file numbers for that next position successively until you no longer want to add more models. Check out this gist for what that looks like. For brevity, I only included the implementation of the left alignment strategy.

Posted in Enumerators, Functional, Ruby | Leave a comment

Cool Unix Terminal-fu

I ran into an interesting problem recently. Looking to do some analysis of the log for a Rails app I maintain, I fetched a copy from the server. Since I don't have a log rotator running for this app, the log is roughly 340 megabytes, in other words, a ton of text! I wanted to explore this data using grep and the like to see what the most accessed routes are, how fast requests complete, and so on. The log contains data from October 2013 to the present, and since the log is appended to, the most recent data is at the end, which left me with a problem: how do I find the most recent matches for a grep search?

Since this is a very large file, I didn't want to have tail read the whole thing and pipe it to grep (it turns out that on my MacBook Pro even 340 MB of text only takes a few seconds to process, so it wasn't actually a big deal). While looking for a solution and talking to a friend about the problem, I wondered: is there a way to pipe data between Unix tools and have them only process the data required? As it turns out, there is! The feature we need is called process substitution. Process substitution lets us easily pass the output of one command to another command that expects a file as input. Here is a basic example. Say we are trying to find a process by name using:

$ ps aux | grep 'ruby'

We can get the same result using process substitution by changing that to:

$ grep 'ruby' <(ps aux)

Now there are several things to explain about this.

  1. The "<" should look familiar if you've used I/O redirection in the shell before. Here it tells the shell to give grep a file to read whose contents are the stdout of the command run in parentheses.
  2. One of the features of process substitution is parallelism. In this example, the ps command is actually run in a child process, which can improve performance if we have multiple CPUs or CPU cores.
  3. This wiring is accomplished by the shell substituting "<(ps aux)" with the path to a file descriptor of a Unix pipe whose input comes from the ps command running in the child process.
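You can actually watch the substitution happen by handing the result to echo instead of grep (this works in bash; the exact /dev/fd number will vary):

```shell
# echo just prints its argument, revealing what the shell substituted in.
echo <(ps aux)
# prints something like: /dev/fd/63

# grep then reads that pipe as if it were an ordinary file argument.
grep -c 'ps' <(ps aux)
```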

How does this allow us to solve the original problem? As a reminder, my original problem was that I wanted to find the most recent matches for some regex pattern without processing the entire log file. I can find the 20 most recent request-starting lines, with 5 lines of context, from the Rails log by running:

$ tail -r <(grep --max-count 20 --before-context 5 'Started' <(tail -r production.log))

Starting from the inside and working out: first we start tail reading production.log. The "-r" switch tells it to output its input in reverse order by line, which is what lets us read the file backwards. Next, that is piped into grep's stdin. We tell grep to stop after 20 matches and to give us 5 lines of context from before each match (for reference, a line that starts a new request in a Rails log begins with 'Started'). Why 5 lines before the match instead of after? Because the input to grep is in reverse order, of course! So that gives us the 5 lines that appear below the match in production.log. Lastly, the output is passed to another tail -r, which reverses it again and gives us the result we want: the last 20 new requests from the production log.

Astute readers will have noticed by now that there is an easier way to write the same command:

$ tail -r production.log | grep --max-count 20 --before-context 5 'Started' | tail -r

Not only does the above give the exact same output as the process substitution version, but the shell also spawns each of those commands in a child process, so they run in parallel too. That doesn't mean process substitution is useless, however. The Wikipedia page on process substitution has an example I wish I had known about when I needed it a few days ago. Diff requires both of its inputs to be files, which prevents us from connecting pipes like in the log example. This is where process substitution comes in. My problem was that I wanted to diff the files in two directories to see what was different. Since I didn't know about process substitution, I resorted to writing the file lists to files and then comparing those. The alternative I now know about is to run this instead:

$ diff <(find dir_a -type f | cut -f 2- -d '/') <(find dir_b -type f | cut -f 2- -d '/')

If you're not familiar with find, "-type f" tells it to look only for files. Find prints the path of each file relative to the current working directory, so the pipe into cut removes the name of the directory we're searching from each side.

The unix shell and the standard unix tools are a major part of what makes me love to use OS X or a Linux distro. There are so many common problems that are really easy to solve with a little knowledge and a few commands such as the comparison of two directories above.

Posted in Terminal / Command Line | Leave a comment

A couple rsync problems

A couple days ago I decided to move my podcast management from my MacBook Pro to my desktop. Since the MacBook runs on an SSD, I don't have much space there to store 80+ GB of podcasts. So after rsyncing my current library and then painstakingly resubscribing to all of my podcasts in iTunes and adding the existing episodes, I decided to back them up using rsync again, this time to my FreeNAS box.

Unfortunately, when I first tried to duplicate the command on Windows (even though I knew the file permissions would not map from Windows to Unix), the permissions on my main share folder were changed to 0000. Naturally, those are useless permissions for a network share, and they also prevented the rsync backup from working on my mac.

Once I discovered the changed permissions, all that was necessary to fix the problem with my mac was to restore them using chmod. Then I turned to Windows to fix the permission settings.

I removed the 'archive' option from the command and replaced it with just recurse into directories and preserve modification times (-rt). This allowed rsync to successfully copy over all of the new podcasts. To be sure, I decided to check the permissions on the new files and was surprised to see that they too were set to 0000. So I turned to the Internet to find out how to make rsync set the permissions itself.

Luckily it didn't take long before I found this post on Super User. The answer is to add a switch that tells rsync to change the permissions. According to the poster, adding "--chmod=ugo=rwX" to the rsync command tells the receiving side what permissions to create the new files with. This created the new files with the expected permissions, like all of the other podcast episodes.

Lastly, since I had already copied over eight or so podcast episodes with bad permissions, I decided to fix them. It would have been quite a bit of work to type out all of the file names or to visit each folder to change them, so I looked into whether find can match files by permissions. Thankfully it can, using the "-perm" switch. Combining this with xargs and chmod gave me a one-liner that finds the files with broken permissions and changes them to the proper ones.

$ find Podcasts -perm 0000 -print0 | xargs -0 chmod 644
Posted in Rsync, Terminal / Command Line | Leave a comment

Using a raw disk as a VirtualBox drive in Windows 7

The other day I was listening to episode 386 of Security Now! and learned from a listener who had written in that VirtualBox can use physical disks for virtual machines and that this works with SpinRite. Using this feature it is possible to run a SpinRite scan on a secondary disk in a virtual machine while still using the host machine. So to make sure I don't forget how this was done, and hopefully to help someone else out there, here are the steps I took to get it working. An interesting thing to note: since you can create raw disks in OS X too, you should be able to run a SpinRite scan on a secondary disk in a virtual machine on a Mac!

Since it was listed on the manual page and I don’t want anyone to lose data, read this:

Warning

Raw hard disk access is for expert users only. Incorrect use or use of an outdated configuration can lead to total loss of data on the physical disk. Most importantly, do not attempt to boot the partition with the currently running host operating system in a guest. This will lead to severe data corruption.

For more information check out this section on creating raw disks in the VirtualBox manual https://www.virtualbox.org/manual/ch09.html#rawdisk.

One final note: while I have not verified that this occurs, it is possible that Windows will assign a different disk number to the drive after rebooting. A few options for handling this include:

  1. Create a raw virtual disk for each drive number, naming them so you can attach the correct one to the virtual machine before running it without needing to recreate the raw disk files.
  2. After running the SpinRite scan, delete the raw virtual disk vmdk file to make sure you don't use it accidentally.
  3. Follow the instructions on this post by Kevin Cave to create a raw virtual disk file that will always point to the correct drive.

Creating the Virtual Drive

1. Connect your hard drive.

This should be obvious. The first thing you should do is connect up the drive that you need to scan to the host machine.

2. Find the disk number.

After you have booted your machine back up you need to find out what number Windows has given to your disk. You can find this information in the disk management pane. First, open the start menu and right click on “Computer”. Select “Manage” from the menu. Next, select “Disk Management” from the pane on the left under the heading “Storage”. In the middle section you will see all of your hard drives and removable media drives listed. At the top of the bottom half should be at least two disks, the first labeled “Disk 0”. Find the disk number for the drive you want to create the raw disk for (because you can’t use your host machine’s drive as a raw disk, obviously this drive should not have the (C:) partition on it).

Here’s a screenshot showing my desktop’s drives, the primary drive first as “Disk 0” and the drive I want to run the scan on is the second, “Disk 1”.

[screenshot of Disk Management]

3. Run VirtualBox as an administrator.

Right click on the VirtualBox shortcut and select “Run as administrator” from the menu. If / when UAC opens a box requesting permission to run, select “yes”.

4. Open up a command prompt.

Open the start menu and in the search box type “cmd”. In the list above right click on “cmd.exe” and select “Run as administrator”.

5. Navigate to the VirtualBox folder.

Next, you need to navigate to the folder where you installed VirtualBox. For me this is the default location, and since I'm running 64-bit Windows 7, the installer installed the 64-bit version of VirtualBox, which means I needed to "cd" (change directory) into the regular Program Files folder.

cd C:\Program Files\Oracle\VirtualBox\

Note: For the more savvy Windows users, all you really need to do is add your VirtualBox install directory to the PATH environment variable.

6. Enter the following command.

To create the raw disk for use with VirtualBox type the following command in the command prompt filling in the file name and the disk number in place of the # symbol:

VBoxManage internalcommands createrawvmdk -filename "FILENAME" -rawdisk "\\.\PhysicalDrive#"

Make sure that you include the double quotes around the filename and the disk name. This ensures that the command uses the entire path even if it contains a space.

This is the command that I ran to create mine as an example using the disk number shown in the screenshot above and saving the disk image to my Downloads folder.

VBoxManage internalcommands createrawvmdk -filename "C:\Users\dan\Downloads\internalssd.vmdk" -rawdisk "\\.\PhysicalDrive1"

7. Create a new VirtualBox VM for running SpinRite.

In order to run SpinRite, you'll need to create a VM with the type "Other" and the version "DOS". Follow the wizard, select how much RAM you'd like to allocate (I chose 128 MB, which worked fine), and skip creating a virtual hard disk since you are not installing an operating system.

8. Add the newly created virtual disk to your VirtualBox VM.

Navigate to the location of the file you created with the command and add it to your SpinRite VM. Lastly, you’ll have to mount the SpinRite.iso in the VM as well by adding a new CD/DVD drive and selecting the file on your machine.

9. Start up the VM and run SpinRite!

If everything went fine, SpinRite should discover the drive and show it in the list.

Note: You’ll need to run VirtualBox as an administrator anytime you want to run the VM so it can access the drive.

[screenshots: SpinRite welcome screen, drive selection, SpinRite running, SpinRite finished]

Running on OS X and Linux

The only change to the above steps that should be required for OS X and Linux (I have not tested these, but if they work for you or something different is required, let me know) is the name of the raw disk in the command. On Linux use "/dev/sda" or whatever your drive is. On OS X use "/dev/disk1" or whatever your drive is.

Troubleshooting

Hopefully those steps worked for you like they did for me. If you get an error like "VERR_ACCESS_DENIED", you probably didn't run the command prompt as an administrator. If you get an error like:

VBoxManage.exe: error: Failed to create the VirtualBox object!
VBoxManage.exe: error: Code CO_E_SERVER_EXEC_FAILURE (0x80080005) - Server execution failed (extended info not available)
VBoxManage.exe: error: Most likely, the VirtualBox COM server is not running or failed to start.

then you probably didn't run VirtualBox as an administrator.


Posted in SpinRite, Virtual Machines, VirtualBox | Tagged , , , , , , | 14 Comments

Unicorn Rails Ubuntu init script

Today I've been updating the TrustAuth project website to reflect the changes I've made over the last couple of months. While deploying, I rediscovered the terrible state of the Ubuntu init script I had been using to run Unicorn. It wouldn't properly stop Unicorn, so every time I updated the website I had to SSH into my server and restart Unicorn by hand; if I didn't, it would keep serving the old Rails pages. So tonight I set out to fix this.

I started by trying to figure out why the script wasn't creating the PID file. Looking through the script I had found in this setup guide, I realized that it never wrote the PID anywhere. I tried to use pidof to output the PID of the process, but it couldn't find any unicorn_rails processes despite five being listed by ps aux. This gave me the idea to look at one of the existing init scripts I was already using, specifically the Nginx script.

In that script I found that every command was using start-stop-daemon, so I decided to see whether it would work with Unicorn. I wasn't sure, because unicorn_rails is actually a Ruby script that launches Unicorn. Sure enough, with a little bit of tinkering I had a working init script that starts, stops, and restarts Unicorn. Not only that, but start-stop-daemon handles the PID, so I have that too. Here's the init script, based on the one from the previously mentioned guide, with my modifications:
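The important parts boil down to something like this trimmed sketch (the app path, PID file location, and Unicorn options here are placeholders, not my exact configuration):

```shell
#!/bin/sh
# Placeholders: point these at your own app and Unicorn config.
APP_ROOT=/var/www/myapp
PID=$APP_ROOT/tmp/pids/unicorn.pid
DAEMON=$(which unicorn_rails) || DAEMON=unicorn_rails
DAEMON_OPTS="-c $APP_ROOT/config/unicorn.rb -E production -D"

start() {
  # start-stop-daemon launches unicorn_rails and tracks it via the PID file.
  start-stop-daemon --start --quiet --pidfile "$PID" --exec "$DAEMON" -- $DAEMON_OPTS
}

stop() {
  # Stops the process recorded in the PID file.
  start-stop-daemon --stop --quiet --pidfile "$PID"
}

case "$1" in
  start)   start ;;
  stop)    stop ;;
  restart) stop; sleep 1; start ;;
  *)       echo "Usage: $0 {start|stop|restart}" ;;
esac
```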

By using `which unicorn_rails`, I don't need to update the script when I change the version of Ruby I'm using. I hope this helps you as much as it has helped me. I tested this on Ubuntu Lucid Lynx with Ruby 1.9.3-p194.

Posted in Rails | Tagged , , , , | 2 Comments

Setting up a symbolic link for Apache in OS X Lion

Today I was rebuilding my OS X installation, which had been wiped out a while back, and ran across a problem I had forgotten about. I store the code for the TrustAuth website in Dropbox for convenience and symlinked it into the appropriate folder to run the site locally. Apache was giving me this error:

Symbolic link not allowed or link target not accessible: /Users/dan/Sites

What I had forgotten was that I needed to change the permissions on the Dropbox folder.

In order for Apache to follow symbolic links, not only does the "FollowSymLinks" option need to be set in the config, but every directory along the path from "/" needs to have the execute bit set. My Dropbox directory did not have it set, so I fixed that with:

$ chmod a+x ~/Dropbox

Once I changed that, Apache could read the files just fine.
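A quick way to spot this problem is to walk up the tree and look at each directory's mode. Here's a small helper, using $HOME as a stand-in for whatever path Apache complained about:

```shell
# Print the permissions of every directory from the target up to /.
# Each line needs an 'x' in the relevant permission triplet for Apache
# (running as another user) to descend into the directory.
check_path() {
  dir=$1
  while [ "$dir" != "/" ] && [ -n "$dir" ]; do
    ls -ld "$dir"
    dir=$(dirname "$dir")
  done
  ls -ld /
}

check_path "$HOME"
```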

Posted in Apache, Mac OS X Lion, PHP | Tagged , , , | 2 Comments

Site back up!

Well, my blog has been down for a while, but I finally got it back up so that I can test out the TrustAuth WordPress plugin! Check out the TrustAuth blog for the latest information about TrustAuth.

Posted in Uncategorized | Leave a comment