Files listing experiments: `find` vs. `ls`
I wanted to tweak a script of mine; it included the conventional `find </path> -maxdepth 1 -type f`. I wasn’t fully convinced it was the best choice, so I checked what the `ls` equivalent would be.
This post is about the research I’ve done; as usual, it’s an exercise in extensive usage of the available tools.
Note that a reader has pointed out to me the `ls` option `-A`, which simplifies the logic. I’ve kept both sections, for two reasons:
- the concepts not needed with the new approach are interesting to know regardless;
- in the new section I don’t explain in detail the concepts already explained in the old one.
Introduction to the problem, and the `find` tool
Let’s say we want to copy all the files from a directory, without descending into the subdirectories, while processing the filenames.
An extra requirement is that we need to perform this operation from an arbitrary location (therefore, we need full paths).
For simplicity, we use filenames without spaces or wildcard characters; covering those cases requires more careful handling.
Full directory structure of the example:
$ find /tmp/source -mindepth 1
/tmp/source/dir_d
/tmp/source/dir_d/file_f.src
/tmp/source/.file_c.src
/tmp/source/file_a.src
/tmp/source/file_b.src
Note how we skip the parent directory `/tmp/source`, via `-mindepth 1`.
We can perform the operation via an interesting `find` usage:
find /tmp/source -maxdepth 1 -type f -exec sh -c '
cp $0 /tmp/dest/$(basename ${0%.src})
' {} \;
The functionalities used are:
- `-maxdepth 1`: don’t descend into subdirectories;
- `-type f`: only regular files;
- `-exec`: execute the given command for each file found; `sh -c` executes a command in a separate `sh` shell (dash, on Debian/Ubuntu systems); the general `-exec` format is `-exec <command> {} \;`, where `{}` is the filename placeholder;
- `basename <filename>`: Linux tool for printing the basename of a file; the corresponding shell parameter expansion is `${<variable>##*/}`;
- `${<variable>%<suffix>}`: shell parameter expansion (POSIX, so it works in both `sh` and bash) for printing `<variable>` with `<suffix>` removed (in this case, the extension).
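The two string operations in the list can be tried in isolation. This is a minimal sketch, assuming a sample path that matches the example tree; the variable names `path`, `name` and `stem` are illustrative and not part of the original script.

```shell
# Sample path, matching the example directory structure
path=/tmp/source/file_a.src

# ${path##*/} removes the longest prefix matching "*/", i.e. the directory part
name=${path##*/}

# ${name%.src} removes the shortest suffix matching ".src", i.e. the extension
stem=${name%.src}

echo "$name"   # file_a.src
echo "$stem"   # file_a
```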
The clever part of this pattern is that `-exec` passes the filename via the placeholder `{}` to the `sh` subshell, so that the latter can use it as a parameter: the first argument after the command string of `sh -c` becomes `$0`.
We could in theory put the placeholder inside the command, in the form:
find /tmp/source -maxdepth 1 -type f -exec sh -c '
echo {}
' \;
but this way, we couldn’t use the shell string manipulation functions, which need a variable.
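A common variation on this pattern, shown here as a sketch rather than as the original script, passes a dummy name as `$0` and the filename as `$1`, and quotes the expansions so that filenames with spaces survive. The directories are temporary ones created for the demonstration, and `file b.src` is a hypothetical name added to exercise the quoting.

```shell
# Build a throwaway copy of the example tree
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/dir_d"
touch "$src/dir_d/file_f.src" "$src/.file_c.src" "$src/file_a.src" "$src/file b.src"

# "sh" fills $0, the file found becomes $1, the destination becomes $2
find "$src" -maxdepth 1 -type f -exec sh -c '
  name=${1##*/}                 # same result as basename
  cp "$1" "$2/${name%.src}"     # strip the extension while copying
' sh {} "$dest" \;

# List what was copied, hidden files included
ls -1A "$dest"
```

Passing the destination as an argument, instead of embedding it in the command string, keeps the quoting inside `sh -c` simple.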
Exploring `ls`
Let’s explore, instead, what `ls` provides.
First, let’s start with a simple `ls -1`:
$ ls -1 /tmp/source
dir_d
file_a.src
file_b.src
This won’t work: we need the full path for executing the command from an arbitrary location; additionally, the hidden file is missing.
We try using a wildcard:
$ ls -1 /tmp/source/*
/tmp/source/file_a.src
/tmp/source/file_b.src
/tmp/source/dir_d:
file_f.src
Now we have the full path: when `ls -1` receives full paths as parameters, it also prints the files with a full path (like `find`).
We still don’t have the hidden file in the list, and following this path, `-a` won’t work, since the wildcard explicitly selects the files to be listed, and filters out the hidden ones (more on this later).
EDIT: While `-a` is not fit for the purpose, `-A` is; see the following section for an approach that uses it.
Let’s avoid descending into the subdirectory, using `-d`:
$ ls -1d /tmp/source/*
/tmp/source/dir_d
/tmp/source/file_a.src
/tmp/source/file_b.src
We need to exclude directories (like `find -type f`); we can achieve this via `-p`, which appends a `/` to directory names, complemented with `grep`:
$ ls -1dp /tmp/source/*
/tmp/source/dir_d/
/tmp/source/file_a.src
/tmp/source/file_b.src
$ ls -1dp /tmp/source/* | grep -v '/$'
/tmp/source/file_a.src
/tmp/source/file_b.src
The `-v` option inverts the match, so the lines ending with `/` are dropped. The standard `grep` supports basic regex metacharacters (`$` = end of the line), so we don’t need to specify options for advanced regular expressions support (`-E` or `-P`).
The hidden file is missing! Let’s use an interesting bash feature - brace expansion:
$ ls -1dp /tmp/source/{,.}* | grep -v '/$'
/tmp/source/file_a.src
/tmp/source/file_b.src
/tmp/source/.file_c.src
Bash splits the content of the braces at the commas, then generates one string per token, each combined with the surrounding text.
In this case, the tokens are an empty string (between `{` and `,`) and `.`, resulting in `/tmp/source/*` and `/tmp/source/.*` respectively:
$ ls -1dp /tmp/source/* /tmp/source/.* | grep -v '/$'
/tmp/source/file_a.src
/tmp/source/file_b.src
/tmp/source/.file_c.src
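To see the brace expansion on its own, filename globbing can be switched off so that the generated patterns are printed literally. This is a small sketch; bash is invoked explicitly because brace expansion is not available in every `sh`, and the variable name `patterns` is only illustrative.

```shell
# set -f disables globbing, so only the brace expansion is performed
patterns=$(bash -c 'set -f; echo /tmp/source/{,.}*')

echo "$patterns"   # /tmp/source/* /tmp/source/.*
```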
There is an unfortunate side effect to this expression. If there are no hidden files, the statement (expanded, for clarity) will print an error:
$ ls -1dp /tmp/source/* /tmp/source/.* | grep -v '/$'
ls: cannot access '/tmp/source/.*': No such file or directory
/tmp/source/file_a.src
/tmp/source/file_b.src
# we assume .file_c.src is not present
Shame. If we want the command to work whether or not hidden (and/or non-hidden) files are present, we need to filter out stderr:
$ ls -1dp /tmp/source/{,.}* 2> /dev/null | grep -v '/$'
/tmp/source/file_a.src
/tmp/source/file_b.src
# we assume .file_c.src is not present
We accomplish this by sending stderr to `/dev/null` (`2> /dev/null`).
We can finally write the loop as:
for f in $(ls -1dp /tmp/source/{,.}* 2> /dev/null | grep -v '/$'); do
    cp $f /tmp/dest/$(basename ${f%.src})
done
A better approach, via `ls -A`
A better approach exists, which uses the `-A` option of `ls`; it includes the hidden files, with the exclusion of `.` and `..`.
We can therefore list all the children of `/tmp/source` easily, without further descending the tree:
$ ls -1A /tmp/source
dir_d
file_a.src
file_b.src
.file_c.src
There is one difference from the previous approach to take into account: the parent directory is not included in the output, so we’ll need to prepend it ourselves.
Now, let’s exclude the directories, via `ls -p` and `grep`:
$ ls -1Ap /tmp/source | grep -v '/$'
file_a.src
file_b.src
.file_c.src
With this approach, the final version is considerably cleaner:
for f in $(ls -1Ap /tmp/source | grep -v '/$'); do
    cp /tmp/source/$f /tmp/dest/$(basename ${f%.src})
done
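If filenames with spaces need to be handled, one option (an extension of the loop above, not part of the original script) is to read the `ls` output line by line instead of relying on word splitting. This sketch uses temporary directories and a hypothetical `file b.src`, and it still assumes that no filename contains a newline.

```shell
# Build a throwaway copy of the example tree
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/dir_d"
touch "$src/.file_c.src" "$src/file_a.src" "$src/file b.src"

# -A: include hidden entries except . and ..; -p: append / to directories
ls -1Ap "$src" | grep -v '/$' | while IFS= read -r f; do
  # $f has no directory part here, so basename is not needed
  cp "$src/$f" "$dest/${f%.src}"
done

# List what was copied, hidden files included
ls -1A "$dest"
```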
Conclusion
While the mixed `ls`/`grep` version works in a relatively simple fashion, the `find` options are more intuitive, and there is no subshell clutter.
However, we’ve discovered interesting `ls` options, and dusted off the `find` functionalities.