egrep and xargs saved my life - Replacing Line Endings
From time to time I run into a file that contains Windows line endings. Windows line endings (CRLF) look like this to us Linux peeps: \r\n. They can be a real pain in a PHP script if they get in there and interrupt the output of the file. Typically this isn't an issue if you transfer files and sites through a nice FTP client like FileZilla, because these clients can convert non-binary (text) files to match the operating system at the destination. You may have seen "^M" at the end of lines when opening a file in vi (I use vim); that is the carriage return from DOS line endings showing through.
So if you're moving files between a Unix machine and a Windows (DOS-based) machine, the FTP client should auto-detect everything nicely and convert. But for those developers out there who wind up transferring massive projects, let's face it: you can't simply download 100,000 files one at a time whenever you want to. So, like many of us, you zip up (or tar) the entire site, download it as one file, then upload it to the server and uncompress it.
However, because we're simply transferring a compressed file, the FTP client will never convert all of those awesome files formatted with DOS-style line endings. So I bet your next question is "so how can I fix all of these files without editing each by hand?"...
No fear, this isn't that hard :-)
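If you want to see those carriage returns for yourself, here is a minimal sketch. The temp directory and sample.txt are made up for illustration, and the carriage return is built with printf so the snippet does not depend on the shell understanding $'...' quoting.

```shell
# Build a small sample file with Windows (CRLF) line endings in a temp directory.
demo_dir=$(mktemp -d)
printf 'first line\r\nsecond line\r\n' > "$demo_dir/sample.txt"

# od -c prints every byte; each line ends in \r \n instead of a lone \n.
od -c "$demo_dir/sample.txt"

# A literal carriage return, built portably.
cr=$(printf '\r')

# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "${cr}\$" "$demo_dir/sample.txt")
echo "lines ending in CR: $crlf_lines"
```

Both lines of the sample end in \r\n, so the count comes out as 2.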
The Solution
The solution is fairly simple: use dos2unix or fromdos, depending on which flavor of Linux you're running. But if you want to convert more than 5 files, running it by hand starts to be time-consuming again.
So, to recursively find all files that have the offending line endings and pass them along to be converted, we grep for those files and then pipe the results to xargs, which makes the call to dos2unix for us.
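For a single file, the conversion might look like the sketch below. The to_unix helper is a hypothetical wrapper, not something either tool ships with: it tries dos2unix, then fromdos, and as a last resort falls back to tr, which is a swapped-in technique that strips every carriage return in the file (not just the ones at line ends) and does not preserve file permissions.

```shell
# Hypothetical helper: convert one file to Unix line endings in place.
to_unix() {
  if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$1"
  elif command -v fromdos >/dev/null 2>&1; then
    fromdos "$1"
  else
    # Fallback assumption: remove ALL carriage returns with tr.
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  fi
}

# Try it on a throwaway file with CRLF line endings.
sample=$(mktemp)
printf 'line one\r\nline two\r\n' > "$sample"
to_unix "$sample"

# Count the carriage returns that are left in the file.
remaining=$(tr -cd '\r' < "$sample" | wc -c)
echo "carriage returns left: $remaining"
```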
egrep -Ilr $'\r$' * | xargs -p dos2unix
Explain it please...
For the lovers of all things Linux, you're probably familiar with egrep and grep; they are outstanding search tools. I have added a couple of arguments to my egrep. The lowercase L (-l) tells egrep to print only the path of each file that contains a match, rather than the matching lines. The capital "I" (-I) tells egrep to IGNORE all binary files... because after all, if 20,000 of your 30,000 files are images, why waste your day having egrep search those files? Lastly we have the lowercase "r" (-r), which tells egrep to search RECURSIVELY through all subdirectories.
The output of this search is then passed (PIPED) to xargs. If you're not familiar with xargs, essentially it takes the output of the search (each file name and path) and passes it as arguments to another command for you. So if egrep finds 10 files that match the search, xargs will combine those file names into a command for you.
xargs -p dos2unix, then, is essentially just replacing man with machine :-)
The -p tells xargs to prompt you before executing each command, which I highly suggest if you've never done this before. Don't get yourself into a mess, and ALWAYS BACK UP FILES FIRST!
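To watch those three flags at work without touching a real project, here is a small sandbox sketch. The directory layout and file names are invented for the demo, and it uses plain grep, since egrep is just grep -E and this pattern needs no extended syntax. The pattern is a carriage return anchored at the end of a line.

```shell
# Sandbox: one nested CRLF file, one Unix file, one binary file.
work=$(mktemp -d)
mkdir -p "$work/site/inc"
printf '<?php\r\necho 1;\r\n' > "$work/site/inc/dos.php"   # CRLF, in a subdirectory
printf '<?php\necho 2;\n'     > "$work/site/unix.php"      # LF only
printf '\000\001\r\n'         > "$work/site/logo.bin"      # binary that happens to contain CRLF

# A literal carriage return, built portably.
cr=$(printf '\r')

# -I skips logo.bin, -l prints paths only, -r descends into inc/.
found=$(cd "$work/site" && grep -Ilr "${cr}\$" .)
echo "$found"
```

Only ./inc/dos.php should be listed: unix.php has no carriage returns, and logo.bin is skipped as binary.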
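Swapping the real command for echo is a harmless way to see what xargs would build. The file names below are stand-ins for search output. The sketch also shows one catch: by default xargs splits on spaces as well as newlines, so a name like "old page.php" becomes two arguments unless the names are NUL-terminated and xargs is told so with -0.

```shell
# File names standing in for the search output, one per line.
names='a.php
b.php
c.php'

# xargs appends all three names to a single command.
batched=$(printf '%s\n' "$names" | xargs echo dos2unix)
echo "$batched"

# A name containing a space is split into two arguments by default...
split=$(printf 'old page.php\n' | xargs -n 1 echo dos2unix)
echo "$split"

# ...unless the names are NUL-terminated and xargs reads them with -0.
whole=$(printf 'old page.php\0' | xargs -0 -n 1 echo dos2unix)
echo "$whole"
```

In a real pipeline, grep's --null option produces the NUL-terminated names that xargs -0 expects.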
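One way to take that backup is to archive only the files the conversion would touch. This is a sketch under a few assumptions: the sandbox paths and the crlf-backup.tar.gz name are made up, and it relies on grep's --null option and a tar that can read a NUL-separated file list with --null -T - (GNU tar does).

```shell
# Sandbox with one CRLF file and one Unix file.
work=$(mktemp -d)
mkdir -p "$work/site"
printf 'a\r\nb\r\n' > "$work/site/dos.txt"
printf 'a\nb\n'     > "$work/site/unix.txt"

# A literal carriage return, built portably.
cr=$(printf '\r')
backup="$work/crlf-backup.tar.gz"

# Archive only the files that contain lines ending in a carriage return.
(cd "$work/site" && grep -Ilr --null "${cr}\$" . | tar -czf "$backup" --null -T -)

# List what went into the backup.
archived=$(tar -tzf "$backup")
echo "$archived"
```

If the conversion goes wrong, extracting the archive from the same directory puts the original files back.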