OREGON STATE UNIVERSITY

What are good file naming conventions?

File naming conventions are an agreed-upon method of naming files that a group of users follows to better coordinate and manage their shared files. There is no enforced industry standard in IT for naming files, but there are definitely good file naming conventions and bad ones.

Important things to consider when naming your files are spaces in file names, use of non-alphanumeric characters, and case.

Spaces in File Names

In the computer science and IT worlds, it is generally frowned upon to put spaces in the names of files, web addresses, or really anything else that a program has to read.

The reasoning behind this is simple. In many contexts, such as command lines and URLs, a space acts as a separator that signifies the "end" of a character string.  An unescaped space inside a URL or a linked file name produces faulty syntax: the server or program reads the space as the end of the string and stops processing there. When the full string is not processed, the file cannot be found or properly displayed on your computer screen.
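
As a small illustration of the point above, here is a sketch in Python that uses only the standard library. The file name "my file.html" and the "cat" command are made-up examples. The sketch shows a shell-style parser splitting the name at the space, and the same space being escaped as %20 when the name is prepared for use in a URL.

```python
import shlex
from urllib.parse import quote

name = "my file.html"  # hypothetical file name containing a space

# A shell-style parser treats the unquoted space as a separator,
# so one file name is read as two separate arguments.
args = shlex.split("cat " + name)

# In a URL path the space has to be escaped; quote() encodes it as %20.
encoded = quote(name)

print(args)     # ['cat', 'my', 'file.html']
print(encoded)  # my%20file.html
```

A name such as my-file.html passes through both steps unchanged, which is the practical argument for avoiding the space in the first place.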

Non-Alphanumeric Characters

Using non-alphanumeric characters in your file names can be disastrous.  Many characters such as exclamation points (!), question marks (?), and percent signs (%) have a very explicit, programmatic purpose.  In a URL, for example, a question mark begins the query string and a percent sign begins an escape sequence.  Other characters that are commonly used in non-English languages may not be recognized by every system.

There are really only two non-alphanumeric characters that you should use in naming your files: dashes and underscores.

Dashes are preferred for a couple of reasons.

  • They are visible when an underlined hyperlink is rendered on the screen - underscores get covered by the underline.
  • Some search engines may not treat an underscore as a word separator, so a name like my_file can be indexed as a single term, while my-file is read as two words. This can make pages named with underscores harder to find or lower their ranking.

If you prefer to space your words out for clarity, good search engine optimization (SEO) practice encourages the use of a dash rather than an underscore.
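
The guidance in this section can be turned into a quick check. The following is a minimal Python sketch, not an official tool: the function name risky_characters is hypothetical, and allowing the dot (so that a file extension survives) is an added assumption. It lists every character in a name that is not a letter, a digit, a dash, an underscore, or a dot.

```python
import re

# Characters treated as safe here: letters, digits, dash, and underscore.
# The dot is also allowed so the file extension survives (an assumption).
SAFE = re.compile(r"[A-Za-z0-9._-]")

def risky_characters(name: str) -> list[str]:
    """Return each character in name that falls outside the safe set."""
    return [ch for ch in name if not SAFE.fullmatch(ch)]

print(risky_characters("budget 100%!.xls"))  # [' ', '%', '!']
print(risky_characters("budget-2024.xls"))   # []
```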

Case

Case can affect both computer performance and human usability.

In terms of computer performance, different operating systems read case in different ways.  Microsoft Windows is case insensitive by default.  This means that it will read a capital letter M in a file name the same way it will read a lowercase letter m.  By itself, this is not a problem at all.  The problem can begin to manifest, though, when Windows users work with files on the Internet, where most web servers run an entirely different operating system known as Linux.

Linux is case sensitive.  "What's the big deal?" you may ask.  It becomes a big deal when performing file operations between a Linux based web server and a Microsoft based local machine.  Here is the potential problem:

A user would like to download a directory of files to their desktop.  Some of these files have similar names - the only difference is the case.  For example there is a file named My-File and one named my-file.  In Linux this is fine, because Linux clearly distinguishes the M from the m. 

When the files get transferred to a Windows computer, though, one of those files will be overwritten because Windows sees the file name "My-File" as being exactly the same as the file name "my-file".
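
A collision like this can be spotted before the transfer. Below is a minimal Python sketch under stated assumptions: case_collisions is a hypothetical helper, the file list is invented, and str.casefold() is used only as an approximation of how a case-insensitive system compares names. It groups together names that differ only by case.

```python
from collections import defaultdict

def case_collisions(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same case-folded form would clash on a
        # case-insensitive file system.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

files = ["My-File", "my-file", "notes.txt"]  # hypothetical directory listing
print(case_collisions(files))  # [['My-File', 'my-file']]
```

Any group the function returns holds names that should be renamed before the directory is copied to a Windows machine.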

In terms of usability, a Linux operating system typically orders the files in a directory based upon character codes.  A capital letter M, for example, has a different code (77 in ASCII) than its lowercase version m (109).  If there are only a few files in your directory, this doesn't pose much of an issue.  If there are many files, though, it can be very difficult to locate the file you need, because the listing is not simply alphabetical: names are also grouped by the case of their letters.

Capital letters have lower character codes than lower case letters and are grouped that way by the operating system.

For example, names beginning with capital M will group and sort alphabetically, then names beginning with capital N will display, then names beginning with capital O, and so on.  The names beginning with lowercase letters will all display grouped together after the uppercase ones. This is why it is suggested that file names be all lowercase.
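
The grouping described above is easy to reproduce. In this short Python sketch the list of names is invented for illustration. Python's default sort compares character codes, much like the directory ordering discussed here, while sorting on a lowercased key gives the plain alphabetical order most people expect.

```python
names = ["notes", "Map", "map", "Notes", "Oak"]  # hypothetical file names

# Default sort compares character codes, so all capitals come first.
by_code = sorted(names)

# Sorting on a lowercased key ignores case and orders by letter only.
by_letter = sorted(names, key=str.lower)

print(ord("M"), ord("m"))  # 77 109
print(by_code)             # ['Map', 'Notes', 'Oak', 'map', 'notes']
print(by_letter)           # ['Map', 'map', 'notes', 'Notes', 'Oak']
```

If every name is lowercase to begin with, the two orderings are identical and the problem disappears.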

How words are separated will also change the ordering of your files.  MyFile is different than myfile, and both are different from my-file, My-File, my_file, My_File.

If multiple users will be involved in a site, this is often a subject that must be agreed upon by the users. Come to an agreement on how files should be named and make sure to communicate your convention to new users who are introduced to your system.
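
One way a group might write down and apply its convention is a small helper script. The Python sketch below is only an example: the function name to_safe_name is hypothetical, and its rules (all lowercase, dashes between words, other punctuation removed, the dot kept for the file extension) are one possible choice based on the recommendations in this article, not a standard.

```python
import re

def to_safe_name(name: str) -> str:
    """Lowercase a name and keep only letters, digits, dots, and dashes."""
    name = name.strip().lower()
    # Turn each run of spaces and underscores into a single dash.
    name = re.sub(r"[\s_]+", "-", name)
    # Drop anything that is not a-z, 0-9, a dot, or a dash.
    name = re.sub(r"[^a-z0-9.-]", "", name)
    # Collapse repeated dashes left behind by removed characters.
    name = re.sub(r"-{2,}", "-", name)
    return name.strip("-")

print(to_safe_name("My Summer_Report (Final)!.pdf"))  # my-summer-report-final.pdf
```

Note that different original names can map to the same converted name (My_File and my file both become my-file), so a real script would need to check for duplicates before renaming anything.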