Filesystem Tutorial
Introduction
Preliminaries
Reporting the size of a file - (tut1.cpp)
Using status queries to determine file existence and type - (tut2.cpp)
Directory iteration plus catching exceptions - (tut3.cpp)
Using path decomposition, plus sorting results - (tut4.cpp)
Class path: Constructors, including Unicode - (tut5.cpp)
Class path: Generic format vs. Native format
Class path: Iterators, observers, composition, decomposition, and query - (path_info.cpp)
Error reporting
This tutorial develops a little command line program to list information
about files and directories - essentially a much simplified version of the POSIX ls
or Windows dir
commands. We'll start with the simplest possible version and progress to more
complex functionality. Along the way we'll digress to cover topics you'll need
to know about to understand Boost.Filesystem.
Source code for each of the tutorial programs is available, and you are encouraged to compile, test, and experiment with it. To conserve space, we won't always show boilerplate code here, but the provided source is complete and ready to build.
Install the Boost distribution if you haven't already done so. See the Boost Getting Started docs.
This tutorial assumes you are going to compile and test the examples using the provided scripts. That's highly recommended.
If you are planning to compile and test the examples but not use the scripts, make sure your build setup knows where to locate or build the Boost library binaries.
Fire up your command line interpreter, and type the following commands:
Ubuntu Linux:
    $ cd boost-root/libs/filesystem/example/test
    $ ./setup
    $ ./bld
    $ ./tut1
    Usage: tut1 path
Microsoft Windows:
    >cd boost-root\libs\filesystem\example\test
    >setup
    >bld
    >tut1
    Usage: tut1 path
If the tut1 command outputs "Usage: tut1 path", all is well. A set of tutorial programs has been copied (by setup) to boost-root/libs/filesystem/example/test and then built. You are encouraged to modify and experiment with them as the tutorial progresses. Just invoke the bld script again to rebuild.
If something didn't work right, here are troubleshooting suggestions:
If the bjam program executable isn't being found, check your path environment variable if it should have been found; otherwise, see Boost Getting Started.
Look at bjam.log to try to spot an indication of the problem.
Let's get started. One of the simplest things we can do is report the size of a file.
tut1.cpp
    #include <iostream>
    #include <boost/filesystem.hpp>
    using namespace boost::filesystem;

    int main(int argc, char* argv[])
    {
      if (argc < 2)
      {
        std::cout << "Usage: tut1 path\n";
        return 1;
      }
      // file_size throws filesystem_error if argv[1] can't be sized
      std::cout << argv[1] << " " << file_size(argv[1]) << '\n';
      return 0;
    }
The Boost.Filesystem file_size function returns a uintmax_t containing the size of the file named by the argument. The declaration looks like this:
    uintmax_t file_size(const path& p);
For now, all you need to know is that class path has constructors that take const char * and many other useful types. (If you can't wait to find out more, skip ahead to the class path section of the tutorial.)
Please take a minute to try out tut1 on your system, using a file that is known to exist, such as tut1.cpp. Here is what the results look like on two different operating systems:
Ubuntu Linux:
    $ ./tut1 tut1.cpp
    tut1.cpp 569
    $ ls -l tut1.cpp
    -rwxrwxrwx 1 root root 569 2010-02-01 07:31 tut1.cpp
Microsoft Windows:
    >tut1 tut1.cpp
    tut1.cpp 592
    >dir tut1.cpp
    ...
    01/30/2010 10:47 AM 592 tut1.cpp
    ...
So far, so good. The reported Linux and Windows sizes are different because the Linux tests used "\n" line endings, while the Windows tests used "\r\n" line endings.
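To see how line endings alone change the reported size, here is a minimal sketch. It uses C++17 std::filesystem, which was standardized from Boost.Filesystem, as a stand-in for Boost, and size_with_line_ending is a hypothetical helper: it writes the same text with a chosen line ending in binary mode (so the library does no newline translation) and asks file_size for the result.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: writes `lines` copies of `text`, each followed by
// `ending`, then returns the size the file system reports for the file.
std::uintmax_t size_with_line_ending(const fs::path& file, const std::string& text,
                                     const std::string& ending, int lines)
{
  {
    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    for (int i = 0; i < lines; ++i)
      out << text << ending;
  }  // the stream is closed here, before the size is queried
  return fs::file_size(file);
}
```

Three lines of "ab" come to 9 bytes with "\n" endings and 12 bytes with "\r\n" endings: one extra byte per line.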
Now try again, but give a path that doesn't exist:
Ubuntu Linux:
    $ ./tut1 foo
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::filesystem::filesystem_error> >'
      what(): boost::filesystem::file_size: No such file or directory: "foo"
    Aborted
Microsoft Windows:
    >tut1 foo
    An exception is thrown; the exact form of the response depends on Windows system options.
What happens? There's no file named foo in the current directory, so an exception is thrown.
Try this:
Ubuntu Linux:
    $ ./tut1 .
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::filesystem::filesystem_error> >'
      what(): boost::filesystem::file_size: Operation not permitted "."
    Aborted
Microsoft Windows:
    >tut1 .
    An exception is thrown; the exact form of the response depends on Windows system options.
The current directory exists, but file_size() works on regular files, not directories, so again, an exception is thrown. We'll deal with those situations in tut2.cpp.
Boost.Filesystem includes status query functions such as exists, is_directory, and is_regular_file. These return bool: true if the condition described by their name is met, and false otherwise, including when any element of the path argument can't be found.
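The "false, not an exception, when an element is missing" behavior can be checked with a short sketch. It uses C++17 std::filesystem, which was standardized from Boost.Filesystem, as a stand-in for Boost; describe is a hypothetical helper that makes the same decisions as tut2.

```cpp
#include <filesystem>
#include <string>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper mirroring tut2's decisions. The queries return false,
// rather than throwing, when the path or a directory leading to it is missing.
std::string describe(const fs::path& p)
{
  if (!fs::exists(p))
    return "does not exist";
  if (fs::is_regular_file(p))
    return "regular file";
  if (fs::is_directory(p))
    return "directory";
  return "neither a regular file nor a directory";
}
```

Calling describe on a path such as some_dir/no_such_dir/file.txt yields "does not exist" even though the intermediate directory is missing.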
tut2.cpp uses several of the status query functions to cope with non-existent files and with different kinds of files:
tut2.cpp
    int main(int argc, char* argv[])
    {
      path p (argv[1]);   // p reads clearer than argv[1] in the following code

      if (exists(p))    // does p actually exist?
      {
        if (is_regular_file(p))        // is p a regular file?
          cout << p << " size is " << file_size(p) << '\n';

        else if (is_directory(p))      // is p a directory?
          cout << p << " is a directory\n";

        else
          cout << p << " exists, but is neither a regular file nor a directory\n";
      }
      else
        cout << p << " does not exist\n";

      return 0;
    }
Give it a try:
Ubuntu Linux:
    $ ./tut2 tut2.cpp
    tut2.cpp size is 1037
    $ ./tut2 foo
    foo does not exist
    $ ./tut2 .
    . is a directory
Microsoft Windows:
    >tut2 tut2.cpp
    tut2.cpp size is 1079
    >tut2 foo
    foo does not exist
    >tut2 .
    . is a directory
Although tut2 works OK in these tests, the output is less than satisfactory
for a directory. We'd typically like to see a list of the directory's contents. In tut3.cpp
we will see how to iterate over directories.
But first, let's try one more test:
Ubuntu Linux:
    $ ls /home/jane/foo
    ls: cannot access /home/jane/foo: Permission denied
    $ ./tut2 /home/jane/foo
    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::filesystem::filesystem_error> >'
      what(): boost::filesystem::status: Permission denied: "/home/jane/foo"
    Aborted
Microsoft Windows:
    >dir e:\
    The device is not ready.
    >tut2 e:\
    An exception is thrown; the exact form of the response depends on Windows system options.
On the Linux system, the test was being run from an account that did not have permission to access /home/jane/foo. On the Windows system, e: was a Compact Disc reader/writer that was not ready. End users shouldn't have to interpret cryptic exception reports, so as we move on to tut3.cpp we will increase the robustness of the code, too.
Boost.Filesystem's directory_iterator class is just what we need here. It follows the general pattern of the standard library's istream_iterator. Constructed from a path, it iterates over the contents of the directory. A default-constructed directory_iterator acts as the end iterator.
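Because a default-constructed iterator marks the end, a directory can be handed to standard algorithms just like an istream_iterator range. A minimal sketch, using C++17 std::filesystem (standardized from Boost.Filesystem) as a stand-in for Boost, with count_entries as a hypothetical helper name:

```cpp
#include <cstddef>
#include <filesystem>
#include <iterator>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Counts the entries in directory `dir`. As with istream_iterator, the
// default-constructed directory_iterator serves as the end of the range.
std::ptrdiff_t count_entries(const fs::path& dir)
{
  return std::distance(fs::directory_iterator(dir), fs::directory_iterator());
}
```

The "." and ".." entries are never reported, so a directory holding three subdirectories yields 3.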
The value type of directory_iterator is directory_entry. A directory_entry object contains a path and file_status information. A directory_entry object can be used directly, but it can also be passed as an argument wherever a path is expected in a function call.
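Both uses of a directory_entry appear in the following sketch, which again uses C++17 std::filesystem (standardized from Boost.Filesystem) in place of Boost. subdirectory_names is a hypothetical helper; it passes each entry to is_directory as if it were a path and then uses the entry's path() observer.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical helper: returns the names of the subdirectories of `dir`,
// in whatever order the directory iteration produces them.
std::vector<std::string> subdirectory_names(const fs::path& dir)
{
  std::vector<std::string> names;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir))
  {
    if (fs::is_directory(entry))                          // entry converts to const path&
      names.push_back(entry.path().filename().string());  // path() observer
  }
  return names;
}
```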
The other need is increased robustness in the face of the many kinds of errors that can affect file system operations. We could do that at the level of each call to a Boost.Filesystem function (see Error reporting), but it is easier to supply an overall try/catch block.
tut3.cpp
    int main(int argc, char* argv[])
    {
      path p (argv[1]);   // p reads clearer than argv[1] in the following code

      try
      {
        if (exists(p))    // does p actually exist?
        {
          if (is_regular_file(p))        // is p a regular file?
            cout << p << " size is " << file_size(p) << '\n';

          else if (is_directory(p))      // is p a directory?
          {
            cout << p << " is a directory containing:\n";

            copy(directory_iterator(p), directory_iterator(),  // directory_iterator::value_type
              ostream_iterator<directory_entry>(cout, "\n"));  // is directory_entry, which is
                                                               // converted to a path by the
                                                               // path stream inserter
          }
          else
            cout << p << " exists, but is neither a regular file nor a directory\n";
        }
        else
          cout << p << " does not exist\n";
      }

      catch (const filesystem_error& ex)
      {
        cout << ex.what() << '\n';
      }

      return 0;
    }
Give tut3 a try, passing it a path to a directory as a command line argument. Here is a run on a checkout of the Boost Subversion trunk, followed by a repeat of the test cases that caused exceptions on Linux and Windows:
Ubuntu Linux:
    $ ./tut3 ~/boost/trunk
    /home/beman/boost/trunk is a directory containing:
    /home/beman/boost/trunk/tools
    /home/beman/boost/trunk/boost-build.jam
    /home/beman/boost/trunk/dist
    /home/beman/boost/trunk/doc
    /home/beman/boost/trunk/bootstrap.sh
    /home/beman/boost/trunk/index.html
    /home/beman/boost/trunk/bootstrap.bat
    /home/beman/boost/trunk/boost.css
    /home/beman/boost/trunk/INSTALL
    /home/beman/boost/trunk/rst.css
    /home/beman/boost/trunk/boost
    /home/beman/boost/trunk/people
    /home/beman/boost/trunk/wiki
    /home/beman/boost/trunk/boost.png
    /home/beman/boost/trunk/LICENSE_1_0.txt
    /home/beman/boost/trunk/more
    /home/beman/boost/trunk/Jamroot
    /home/beman/boost/trunk/.svn
    /home/beman/boost/trunk/libs
    /home/beman/boost/trunk/index.htm
    /home/beman/boost/trunk/status
    /home/beman/boost/trunk/CMakeLists.txt
Microsoft Windows:
    >tut3 c:\boost\trunk
    c:\boost\trunk is a directory containing:
    c:\boost\trunk\.svn
    c:\boost\trunk\boost
    c:\boost\trunk\boost-build.jam
    c:\boost\trunk\boost.css
    c:\boost\trunk\boost.png
    c:\boost\trunk\bootstrap.bat
    c:\boost\trunk\bootstrap.sh
    c:\boost\trunk\CMakeLists.txt
    c:\boost\trunk\dist
    c:\boost\trunk\doc
    c:\boost\trunk\index.htm
    c:\boost\trunk\index.html
    c:\boost\trunk\INSTALL
    c:\boost\trunk\Jamroot
    c:\boost\trunk\libs
    c:\boost\trunk\LICENSE_1_0.txt
    c:\boost\trunk\more
    c:\boost\trunk\people
    c:\boost\trunk\rst.css
    c:\boost\trunk\status
    c:\boost\trunk\tools
    c:\boost\trunk\wiki
    >tut3 e:\
    boost::filesystem::status: The device is not ready: "e:\"
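Not bad, but we can make further improvements: list only the filenames rather than the full paths, and sort the listing, since directory iteration is not ordered on some file systems. Move on to tut4.cpp to see how those changes play out!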
tut4.cpp
    int main(int argc, char* argv[])
    {
      path p (argv[1]);   // p reads clearer than argv[1] in the following code

      try
      {
        if (exists(p))    // does p actually exist?
        {
          if (is_regular_file(p))        // is p a regular file?
            cout << p << " size is " << file_size(p) << '\n';

          else if (is_directory(p))      // is p a directory?
          {
            cout << p << " is a directory containing:\n";

            typedef vector<path> vec;             // store paths,
            vec v;                                // so we can sort them later

            for (directory_iterator it(p), end; it != end; ++it)
            {
              path fn = it->path().filename();    // extract the filename from the path
              v.push_back(fn);                    // push into vector for later sorting
            }

            sort(v.begin(), v.end());             // sort, since directory iteration
                                                  // is not ordered on some file systems

            for (vec::const_iterator it (v.begin()); it != v.end(); ++it)
            {
              cout << "   " << *it << '\n';
            }
          }
          else
            cout << p << " exists, but is neither a regular file nor a directory\n";
        }
        else
          cout << p << " does not exist\n";
      }

      catch (const filesystem_error& ex)
      {
        cout << ex.what() << '\n';
      }

      return 0;
    }
The key difference between tut3.cpp and tut4.cpp is what happens to each directory entry. Rather than writing every directory_entry straight to cout, tut4.cpp extracts just the filename and stores it in a vector, so the results can be sorted before they are printed:
    path fn = it->path().filename();   // extract the filename from the path
    v.push_back(fn);                   // push into vector for later sorting
path() is a directory_entry observer function. filename() is one of several path decomposition functions. It extracts the filename portion ("index.html") from a path ("/home/beman/boost/trunk/index.html"). These decomposition functions are more fully explored in the Path iterators, observers, composition, decomposition and query portion of this tutorial.
The above was written as two lines of code for clarity. It could have been written more concisely as:
    v.push_back(it->path().filename());  // we only care about the filename
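For comparison, here is a minimal sketch of tut4's filename-and-sort step written against C++17 std::filesystem, which was standardized from Boost.Filesystem, instead of Boost. sorted_filenames is a hypothetical helper name; it returns the sorted filenames rather than printing them, so the result is easy to check.

```cpp
#include <algorithm>
#include <filesystem>
#include <vector>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Keep only the filename of each entry, then sort, because directory
// iteration order is unspecified.
std::vector<fs::path> sorted_filenames(const fs::path& dir)
{
  std::vector<fs::path> v;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir))
    v.push_back(entry.path().filename());
  std::sort(v.begin(), v.end());  // path provides operator<
  return v;
}
```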
Here is the output from a test of tut4.cpp:
Ubuntu Linux:
    $ ./tut4 ~/boost/trunk
    /home/beman/boost/trunk is a directory containing:
       .svn
       CMakeLists.txt
       INSTALL
       Jamroot
       LICENSE_1_0.txt
       boost
       boost-build.jam
       boost.css
       boost.png
       bootstrap.bat
       bootstrap.sh
       doc
       index.htm
       index.html
       libs
       more
       people
       rst.css
       status
       tools
       wiki
Microsoft Windows:
    C:\v3d>tut4 c:\boost\trunk
    c:\boost\trunk is a directory containing:
       .svn
       CMakeLists.txt
       INSTALL
       Jamroot
       LICENSE_1_0.txt
       boost
       boost-build.jam
       boost.css
       boost.png
       bootstrap.bat
       bootstrap.sh
       doc
       index.htm
       index.html
       libs
       more
       people
       rst.css
       status
       tools
       wiki
That completes the main portion of this tutorial. If you haven't already worked through the Class path sections of this tutorial, dig into them now. The Error reporting section may also be of interest, although it can be skipped unless you are deeply concerned about error handling issues.
Traditional C interfaces pass paths as const char* arguments. C++ interfaces may add const std::string& overloads, but adding overloads becomes untenable if wide characters, containers, and iterator ranges need to be supported.
Passing paths as const path& arguments is far simpler, yet more flexible, because class path itself is so flexible:
(1) path supports multiple character types and encodings, including Unicode, to ease internationalization.
(2) path supports multiple source types, such as iterators for null-terminated sequences, iterator ranges, containers (including std::basic_string), and directory_entry objects, so functions taking paths don't need to provide several overloads.
(3) path supports both native and generic pathname formats, so programs can be portable between operating systems yet use native formats where desirable.
(4) path supplies a full set of iterators, observers, composition, decomposition, and query functions, making pathname manipulations easy, convenient, reliable, and portable.
Here is how (1) and (2) work. Class path constructors, assignments, and appends have member templates for sources. For example, here are the constructors that take sources:
    template <class Source>
      path(Source const& source);
    template <class InputIterator>
      path(InputIterator begin, InputIterator end);
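A minimal sketch of the same idea follows, using C++17 std::filesystem (standardized from Boost.Filesystem) as a stand-in for Boost. One difference to keep in mind: std::filesystem::path does not accept a container such as std::list<char> directly as a source, so the list is passed as an iterator range instead. all_sources_agree is a hypothetical name.

```cpp
#include <filesystem>
#include <list>
#include <string>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Builds the same path from several different source types and reports
// whether they all compare equal.
bool all_sources_agree()
{
  const char*     c_string = "smile";
  std::string     narrow   = "smile";
  std::wstring    wide     = L"smile";
  std::list<char> chars    = {'s', 'm', 'i', 'l', 'e'};

  fs::path a(c_string);                     // null-terminated sequence
  fs::path b(narrow);                       // narrow string
  fs::path c(wide);                         // wide string, converted as needed
  fs::path d(chars.begin(), chars.end());   // iterator range

  return a == b && b == c && c == d;
}
```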
Let's look at a little program that shows how comfortable class path is with both narrow and wide characters in C-style strings, C++ strings, and via C++ iterators:
tut5.cpp
    #include <boost/filesystem.hpp>
    #include <boost/filesystem/fstream.hpp>   // fs::ofstream
    #include <string>
    #include <list>
    namespace fs = boost::filesystem;

    int main()
    {
      // \u263A is "Unicode WHITE SMILING FACE = have a nice day!"
      std::string narrow_string ("smile2");
      std::wstring wide_string (L"smile2\u263A");
      std::list<char> narrow_list;
      narrow_list.push_back('s');
      narrow_list.push_back('m');
      narrow_list.push_back('i');
      narrow_list.push_back('l');
      narrow_list.push_back('e');
      narrow_list.push_back('3');
      std::list<wchar_t> wide_list;
      wide_list.push_back(L's');
      wide_list.push_back(L'm');
      wide_list.push_back(L'i');
      wide_list.push_back(L'l');
      wide_list.push_back(L'e');
      wide_list.push_back(L'3');
      wide_list.push_back(L'\u263A');

      // create an empty file named by each kind of source
      { fs::ofstream f("smile"); }
      { fs::ofstream f(L"smile\u263A"); }
      { fs::ofstream f(narrow_string); }
      { fs::ofstream f(wide_string); }
      { fs::ofstream f(narrow_list); }
      { fs::ofstream f(wide_list); }

      // change the trailing '3' to '4', keeping the smiley on the wide list
      narrow_list.pop_back();
      narrow_list.push_back('4');
      wide_list.pop_back();
      wide_list.pop_back();
      wide_list.push_back(L'4');
      wide_list.push_back(L'\u263A');

      // iterator ranges work too
      { fs::ofstream f(fs::path(narrow_list.begin(), narrow_list.end())); }
      { fs::ofstream f(fs::path(wide_list.begin(), wide_list.end())); }

      return 0;
    }
Testing tut5:
Ubuntu Linux:
    $ ./tut5
    $ ls smile*
    smile  smile☺  smile2  smile2☺  smile3  smile3☺  smile4  smile4☺
Microsoft Windows:
    >tut5
    >dir /b smile*
    smile
    smile2
    smile2☺
    smile3
    smile3☺
    smile4
    smile4☺
    smile☺
Note that the exact appearance of the smiling face will depend on the font,
font size, and other settings for your command line window. The above tests were
run with out-of-the-box Ubuntu 9.10 and Windows 7, US Edition. If you don't get
the above results, take a look at the boost-root/libs/filesystem/example/test
directory with your system's GUI file browser, such as Linux Nautilus, Mac OS X
Finder, or Windows Explorer. These tend to be more comfortable with
international character sets than command line interpreters.
Class path takes care of whatever character type or encoding conversions are required by the particular operating system. Thus, as tut5 demonstrates, it's no problem to pass a wide character string to a Boost.Filesystem operational function even if the underlying operating system uses narrow characters, and vice versa. The same applies to user-supplied functions that take const path& arguments.
Class path
also provides path syntax that is portable across operating systems,
element iterators, and observer, composition, decomposition, and query
functions to manipulate the elements of a path. The next section of this
tutorial deals with path syntax.
Class path
deals with two different pathname
formats - generic format and native format. For POSIX-like
file systems, these formats are the same. But for users of Windows and
other non-POSIX file systems, the distinction is important. Even
programmers writing for POSIX-like systems need to understand the distinction if
they want their code to be portable to non-POSIX systems.
The generic format is the familiar /my_directory/my_file.txt
format used by POSIX-like
operating systems such as the Unix variants, Linux, and Mac OS X. Windows also
recognizes the generic format, and it is the basis for the familiar Internet URL
format. The directory
separator character is always one or more slash characters.
The native format is the format as defined by the particular operating system. For Windows, either the slash or the backslash can be used as the directory separator character, so /my_directory\my_file.txt would work fine. Of course, if you write that in a C++ string literal, it becomes "/my_directory\\my_file.txt".
If a drive specifier or a backslash appears in a pathname on a Windows system, it is always treated as the native format.
Class path
has observer functions that allow you to
obtain the string representation of a path object in either the native format
or the generic format. See the next section
for how that plays out.
The distinction between generic format and native format is important when communicating with native C-style APIs and with users. Both tend to expect paths in the native format and may be confused by the generic format. The generic format is great, however, for writing portable programs that work regardless of operating system.
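The two formats can be compared side by side in a small sketch. It uses C++17 std::filesystem, which was standardized from Boost.Filesystem, as a stand-in for Boost; generic_form and preferred_form are hypothetical helpers, and std::filesystem spells the conversion to preferred separators make_preferred().

```cpp
#include <filesystem>
#include <string>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Generic format: directory separators are always slashes.
std::string generic_form(const fs::path& p) { return p.generic_string(); }

// Preferred (native) separators: backslashes on Windows, unchanged on
// POSIX-like systems. Takes a copy because make_preferred() modifies in place.
std::string preferred_form(fs::path p)
{
  return p.make_preferred().string();
}
```

On a POSIX-like system a backslash is an ordinary filename character, so "/my_directory\\my_file.txt" is left alone by both helpers; on Windows the generic form turns it into "/my_directory/my_file.txt".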
The next section covers class path
observers, composition,
decomposition, query, and iteration over the elements of a path.
The path_info.cpp program is handy for learning how class path iterators, observers, composition, decomposition, and query functions work on your system. If it hasn't already been built on your system, please build it now. Run the examples below on your system, and try some different path arguments as we go along.
path_info
produces several dozen output lines every time it's
invoked. We will only show the output lines we are interested in at each step.
First we'll look at iteration over the elements of a path, and then use iteration to illustrate the difference between generic and native format paths.
Ubuntu Linux:
    $ ./path_info /foo/bar/baa.txt
    ...
    elements:
      /
      foo
      bar
      baa.txt
Microsoft Windows:
    >path_info /foo/bar/baa.txt
    ...
    elements:
      /
      foo
      bar
      baa.txt
Thus on both POSIX and Windows based systems the path "/foo/bar/baa.txt"
is seen as having four elements.
Here is the code that produced the above listing:
    cout << "\nelements:\n";

    for (path::iterator it = p.begin(); it != p.end(); ++it)
      cout << "  " << *it << '\n';
Dereferencing a path::iterator yields a path holding a single element, so iteration treats path as a container of filenames.
Let's look at some of the output from a slightly different example:
Ubuntu Linux:
    $ ./path_info /foo/bar/baa.txt

    composed path:
      cout << -------------: /foo/bar/baa.txt
      preferred()----------: /foo/bar/baa.txt
    ...
    observers, native format:
      native()-------------: /foo/bar/baa.txt
      c_str()--------------: /foo/bar/baa.txt
      string()-------------: /foo/bar/baa.txt
      wstring()------------: /foo/bar/baa.txt

    observers, generic format:
      generic_string()-----: /foo/bar/baa.txt
      generic_wstring()----: /foo/bar/baa.txt
Microsoft Windows:
    >path_info /foo/bar\baa.txt

    composed path:
      cout << -------------: /foo/bar/baa.txt
      preferred()----------: \foo\bar\baa.txt
    ...
    observers, native format:
      native()-------------: /foo/bar\baa.txt
      c_str()--------------: /foo/bar\baa.txt
      string()-------------: /foo/bar\baa.txt
      wstring()------------: /foo/bar\baa.txt

    observers, generic format:
      generic_string()-----: /foo/bar/baa.txt
      generic_wstring()----: /foo/bar/baa.txt
Native format observers should be used when interacting with the operating system or with users; that's what they expect.
Generic format observers should be used when the results need to be portable and uniform regardless of the operating system.
path objects always hold pathnames in the native format, but otherwise leave them unchanged from their source. The preferred() function will convert to the preferred form, if the native format has several forms. Thus on Windows, it will convert slashes to backslashes.
Let's move on to decomposition and query functions:
Ubuntu Linux:
    $ ./path_info /foo/bar/baa.txt
    ...
    decomposition:
      root_name()----------:
      root_directory()-----: /
      root_path()----------: /
      relative_path()------: foo/bar/baa.txt
      parent_path()--------: /foo/bar
      filename()-----------: baa.txt
      stem()---------------: baa
      extension()----------: .txt

    query:
      empty()--------------: false
      is_absolute()--------: true
      has_root_name()------: false
      has_root_directory()-: true
      has_root_path()------: true
      has_relative_path()--: true
      has_parent_path()----: true
      has_filename()-------: true
      has_stem()-----------: true
      has_extension()------: true
Microsoft Windows:
    >path_info /foo/bar/baa.txt
    ...
    decomposition:
      root_name()----------:
      root_directory()-----: /
      root_path()----------: /
      relative_path()------: foo/bar/baa.txt
      parent_path()--------: /foo/bar
      filename()-----------: baa.txt
      stem()---------------: baa
      extension()----------: .txt

    query:
      empty()--------------: false
      is_absolute()--------: false
      has_root_name()------: false
      has_root_directory()-: true
      has_root_path()------: true
      has_relative_path()--: true
      has_parent_path()----: true
      has_filename()-------: true
      has_stem()-----------: true
      has_extension()------: true
These are pretty self-evident, but do note the difference in the
result of is_absolute()
between Linux and Windows. Because there is
no root name (i.e. drive specifier or network name), a lone slash (or backslash)
is a relative path on Windows.
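The decomposition results above can be reproduced programmatically. The following minimal sketch uses C++17 std::filesystem, which was standardized from Boost.Filesystem, as a stand-in for Boost; the Decomposition struct and decompose function are hypothetical names.

```cpp
#include <filesystem>
#include <string>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical record of some of the values path_info prints.
struct Decomposition
{
  std::string root_directory, relative_path, parent_path, filename, stem, extension;
  bool is_absolute;
};

Decomposition decompose(const fs::path& p)
{
  return { p.root_directory().string(), p.relative_path().string(),
           p.parent_path().string(),    p.filename().string(),
           p.stem().string(),           p.extension().string(),
           p.is_absolute() };
}
```

As in the listings, is_absolute is true for "/foo/bar/baa.txt" on Linux but false on Windows, where the path has no root name.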
On to composition!
Class path uses the / and /= operators to append elements. The choice of / is a reminder that these operations append the operating system's preferred directory separator if needed. The preferred directory separator is a slash on POSIX-like systems, and a backslash on Windows-like systems.
path_info.cpp composes a path by appending each of the command line elements to an initially empty path:
    path p;  // compose a path from the command line arguments

    for (; argc > 1; --argc, ++argv)
      p /= argv[1];

    cout << "\ncomposed path:\n";
    cout << "  cout << -------------: " << p << "\n";
    cout << "  preferred()----------: " << p.preferred() << "\n";
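Here is a minimal sketch of the same composition loop, written against C++17 std::filesystem (standardized from Boost.Filesystem) rather than Boost; compose is a hypothetical helper that takes the arguments as a vector. In std::filesystem, appending an absolute path replaces the left-hand operand, so, as in path_info.cpp, the loop starts from an empty path.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Appends each argument to an initially empty path with operator/=.
// A separator is inserted only where one is needed.
fs::path compose(const std::vector<std::string>& args)
{
  fs::path p;
  for (const std::string& arg : args)
    p /= arg;
  return p;
}
```

Composing "/", "foo/bar", and "baa.txt" gives a path whose generic form is "/foo/bar/baa.txt" on both Linux and Windows; on Windows the native form contains a backslash before baa.txt, as in the path_info output.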
Let's give this code a try:
Ubuntu Linux:
    $ ./path_info / foo/bar baa.txt

    composed path:
      cout << -------------: /foo/bar/baa.txt
      preferred()----------: /foo/bar/baa.txt
Microsoft Windows:
    >path_info / foo/bar baa.txt

    composed path:
      cout << -------------: /foo/bar\baa.txt
      preferred()----------: \foo\bar\baa.txt
The Boost.Filesystem file_size function has two overloads:
    uintmax_t file_size(const path& p);
    uintmax_t file_size(const path& p, system::error_code& ec);
The only significant difference between the two is how they report errors.
The first signature will throw exceptions to report errors. A filesystem_error exception will be thrown on an operational error. filesystem_error is derived from std::runtime_error. It has a member function to obtain the error_code reported by the source of the error. It also has member functions to obtain the path or paths that caused the error.
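The information carried by the exception can be pulled out as in the following minimal sketch. It uses C++17 std::filesystem, which was standardized from Boost.Filesystem, as a stand-in for Boost; there the error_code comes from code() and the offending paths from path1() and path2(). SizeResult and size_or_error are hypothetical names.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Hypothetical result type: either a size, or the error details.
struct SizeResult
{
  std::uintmax_t  size = 0;
  std::error_code code;    // from filesystem_error::code()
  fs::path        failed;  // from filesystem_error::path1()
};

// Calls the throwing overload of file_size and records what the
// filesystem_error carries instead of letting the exception escape.
SizeResult size_or_error(const fs::path& p)
{
  SizeResult r;
  try
  {
    r.size = fs::file_size(p);
  }
  catch (const fs::filesystem_error& ex)
  {
    r.code   = ex.code();
    r.failed = ex.path1();
  }
  return r;
}
```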
Motivation for the second signature: Throwing exceptions on errors was the entire error reporting story for the earliest versions of Boost.Filesystem, and indeed throwing exceptions on errors works very well for many applications. But user reports trickled in that some code became so littered with try and catch blocks as to be unreadable and unmaintainable. In some applications I/O errors aren't exceptional, and that's the use case for the second signature.
Functions with a system::error_code& argument set that argument to report operational error status, and so do not throw exceptions when I/O related errors occur. For a full explanation, see Error reporting in the reference documentation.
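For the error_code style, here is a minimal sketch using the non-throwing overload. It is written against C++17 std::filesystem, which was standardized from Boost.Filesystem, rather than Boost, so it uses std::error_code where Boost uses system::error_code; print_size is a hypothetical helper. In std::filesystem the returned size is static_cast<uintmax_t>(-1) when an error is reported.

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <system_error>

// Sketch using C++17 std::filesystem in place of boost::filesystem.
namespace fs = std::filesystem;

// Prints either the size of p or the reason it couldn't be obtained.
// Returns true on success. No try/catch is needed: errors arrive in ec.
bool print_size(const fs::path& p, std::ostream& os)
{
  std::error_code ec;
  std::uintmax_t size = fs::file_size(p, ec);   // does not throw on I/O errors
  if (ec)
  {
    os << p << ": " << ec.message() << '\n';
    return false;
  }
  os << p << " " << size << '\n';
  return true;
}
```

This is the shape the motivation above describes: when I/O errors are expected rather than exceptional, each call site checks a value instead of wrapping the call in a try block.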
© Copyright Beman Dawes 2010
Distributed under the Boost Software License, Version 1.0. See www.boost.org/LICENSE_1_0.txt
Revised 20 February 2011