Ever wanted to grep a pattern in a PDF document? How about a database or the web? crgrep is a powerful grep-like utility written in Java that can do much more than search for patterns in text files. crgrep stands for Common Resource grep.
Resources crgrep supports:
- text documents, PDFs
- database tables
- ZIP, TAR, WAR, EAR and JAR archive formats
- image metadata (jpeg, gif etc.)
- text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using OCR
- Maven POM files, following dependency trees of resource artifacts
- web resources
- combinations of supported resources
DOWNLOAD
crgrep is distributed as a binary from its SourceForge project page. After extracting the archive, the crgrep binary can be found in the bin directory.
USAGE
- Normal calling convention
$ crgrep <pattern> <resource path(s)>
Wildcards such as * and ? are supported in both the pattern and the resource path(s). Output is displayed in the format:
<resource>[[:pagenum]:linenum:matching_content]
For example:
Output                                   Match
------------------------------------------------------------------
src/foo.java                             File listing match
src/bar.txt:25:some text                 File content match (+lineno)
lib/all.zip[image.gif]                   Archive file listing match
lib/app.war[WEB-INF/web.xml]:6:<d..>     Archive file content match
pom.xml->stuff.zip[doc.txt]              File listing match
mypic.jpg: @{Size=25,Com=Scene}          File meta-data match
TAB: [COL1,COL2,COL3]                    Table column name match
TAB: data1,data2,data3                   Table data match
Node[1]:{name:"John"}                    Graph database node match
sample.pdf:1:1:Sample PDF Document       Text extracted from a PDF (+pageno and +linenum)
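Since every match line starts with the resource, followed by a colon when there is more to report, the output can be post-processed with standard tools much like grep output. Here is a minimal sketch that reduces crgrep-style output to the unique resources that matched. The function name matched_resources is a hypothetical helper, not part of crgrep, and the sample lines are taken from the table above (the second src/bar.txt line is made up to show de-duplication). It assumes resource names contain no colon, so it would not work as-is for web URLs.

```shell
# Hypothetical helper (not part of crgrep): read crgrep-style output on
# stdin and print each matching resource once.
matched_resources() {
    # Everything before the first ':' is treated as the resource name;
    # sort -u then drops repeated resources.
    cut -d: -f1 | sort -u
}

# Sample lines standing in for real crgrep output.
printf '%s\n' \
    'src/foo.java' \
    'src/bar.txt:25:some text' \
    'src/bar.txt:40:other text' \
    'lib/app.war[WEB-INF/web.xml]:6:<d..>' |
    matched_resources
```

In real use the printf would be replaced by the crgrep command itself, piped into the helper.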
- Find files and data matching key under target directory. Include archives.
$ crgrep -r key target
target/simple_file.txt: a key moment
target/misc.zip[misc/nested_monkey.txt]
target/monkey-pics.txt:1:A file about happy monkeys.
target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey
- What column data in my database matches ‘handle’?
(database username and password should be in ~/.crgrep)
For relational DB:
$ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'
For Neo4J graph DB:
$ crgrep -d -U "http://localhost:7474/" handle '*'
-d stands for database and -U for URI.
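The database examples above put the '*' resource path in single quotes. That matters because an unquoted wildcard is expanded by the shell before the program ever sees it. The sketch below illustrates this general shell behaviour with a hypothetical stand-in function, show_args, which simply prints the arguments it receives; it is not crgrep and makes no claim about how crgrep handles its arguments internally.

```shell
# Hypothetical stand-in for crgrep: print each received argument on its own line.
show_args() {
    printf '%s\n' "$@"
}

# Work in a scratch directory with two files so that '*' has something to match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a.txt b.txt

show_args handle '*'   # quoted: receives "handle" and a literal "*"
show_args handle *     # unquoted: the shell passes "handle", "a.txt", "b.txt"
```

Quoting the wildcard leaves its interpretation to the program, which is what you want when it refers to database tables rather than files in the current directory.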
- Search for a pattern in an image using OCR:
$ crgrep --ocr report report_scan.png
- Search in image metadata
$ crgrep --ocr report report_scan.png
- Does the google home page contain a ‘favicon’ reference?
$ crgrep google_favicon http://www.google.com
- Find Maven (POM) dependencies in my project with content matching ‘RunWith’
$ crgrep -m RunWith pom.xml
Webpage: Common Resource grep
Similar software
- ag or The Silver Searcher is a faster grep alternative for developers.
Newer revs (1.0.2 is the latest) add MS Office doc search (doc/ppt/xls[x]), full glob resource patterns such as Ant-style ‘crgrep foo dir/**/*.txt’, and cleaner output and error reporting, with continuous improvements in each release. Hope this helps. FYI: I’m the author, and I hope this tool is useful to you. Thanks for the write-up; please post feedback to the homepage – good or bad!
Thanks for knocking and the update, Craig!