crgrep: grep any resource

Ever wanted to grep a pattern in a PDF document? How about a database or the web? crgrep is a powerful grep-like utility written in Java that can do much more than search for patterns in text files. crgrep stands for Common Resource grep.

Resources crgrep supports:

  • text documents, PDFs
  • database tables
  • ZIP, TAR, WAR, EAR and JAR archive formats
  • image metadata (jpeg, gif etc.)
  • text in scanned documents (jpeg/gif/tiff/bmp/png), extracted using OCR
  • Maven POM files, following dependency trees of resource artifacts
  • web resources
  • combinations of supported resources

DOWNLOAD

crgrep is distributed as a binary archive from its SourceForge project page. After extracting the archive, the crgrep binary can be found in the bin directory.
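
To call crgrep without typing the full path each time, the bin directory can be put on the PATH. The snippet below is a minimal sketch for a POSIX shell, not something taken from the crgrep documentation: the CRGREP_HOME variable, its default of $HOME/crgrep and the add_to_path helper are all assumptions made for illustration, so adjust the path to wherever the archive was actually extracted.

```shell
# Hypothetical install location (an assumption): change this to the
# directory where the crgrep archive was extracted.
CRGREP_HOME="${CRGREP_HOME:-$HOME/crgrep}"

# Prepend a directory to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

add_to_path "$CRGREP_HOME/bin"
export PATH
```

Placed in a shell startup file such as ~/.profile, this makes the crgrep command available in new sessions, provided the binary really lives in that bin directory.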

USAGE

  • Normal calling convention
    $ crgrep <pattern> <resource path(s)>

    Wildcards such as * and ? are supported in both the pattern and the resource path(s). Output is displayed in the format:

    <resource>[[:pagenum]:linenum:matching_content]

    For example:

    Output                                Match
    ------------------------------------------------------------------
    src/foo.java                          File listing match
    src/bar.txt:25:some text              File content match (+lineno)
    lib/all.zip[image.gif]                Archive file listing match
    lib/app.war[WEB-INF/web.xml]:6:<d..>  Archive file content match
    pom.xml->stuff.zip[doc.txt]           File listing match
    mypic.jpg: @{Size=25,Com=Scene}       File meta-data match
    TAB: [COL1,COL2,COL3]                 Table column name match
    TAB: data1,data2,data3                Table data match
    Node[1]:{name:"John"}                 Graph database node match
    sample.pdf:1:1:Sample PDF Document    Text extracted from a PDF 
                                          (+pageno and +linenum)
    
  • Find files and data matching ‘key’ under the target directory, including inside archives. Note that the pattern also matches inside longer words such as ‘monkey’.
    $ crgrep -r key target
    target/simple_file.txt: a key moment
    target/misc.zip[misc/nested_monkey.txt]
    target/monkey-pics.txt:1:A file about happy monkeys.
    target/test-ear.ear[META-INF/MANIFEST.MF]:5:Created-By: Apache monkey
  • What column data in my database matches ‘handle’?
    (database username and password should be in ~/.crgrep)
    For relational DB:

    $ crgrep -d -U "jdbc:sqlite:/databases/db.sqlite3" handle '*'

    For Neo4j graph DB:

    $ crgrep -d -U "http://localhost:7474/" handle '*'

    -d stands for database and -U for URI.

  • Search for a pattern in an image using OCR:
    $ crgrep --ocr report report_scan.png
  • Search in image metadata
    $ crgrep --ocr report report_scan.png
  • Does the Google home page contain a ‘favicon’ reference?
    $ crgrep google_favicon http://www.google.com
  • Find Maven (POM) dependencies in my project with content matching ‘RunWith’
    $ crgrep -m RunWith pom.xml
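
The output format described under the normal calling convention lends itself to post-processing in scripts. Below is a minimal shell sketch, not part of crgrep, that splits one output line into resource, line number and matching content. The function name parse_crgrep_line is made up for illustration. It assumes the resource path contains no ‘:’ (so web URLs are out of scope) and that the line carries no PDF page number, since a page-numbered line cannot be told apart from a plain content match by looking at the text alone.

```shell
# Split one crgrep output line into three tab-separated fields:
#   resource <TAB> line number (or "-") <TAB> matching content
# Assumption: the resource path itself contains no ':' characters.
parse_crgrep_line() {
    line=$1

    # No ':' at all means a listing match: only the resource is present.
    case "$line" in
        *:*) ;;
        *) printf '%s\t-\t\n' "$line"; return ;;
    esac

    resource=${line%%:*}    # text before the first ':'
    rest=${line#*:}         # text after the first ':'

    # A line number is only present if another ':' follows it.
    case "$rest" in
        *:*) lineno=${rest%%:*} ;;
        *)   lineno= ;;
    esac

    case "$lineno" in
        # Empty or non-numeric: no line number (e.g. a meta-data match).
        ''|*[!0-9]*) printf '%s\t-\t%s\n' "$resource" "$rest" ;;
        # Numeric: treat it as the line number; the remainder is the content.
        *)           printf '%s\t%s\t%s\n' "$resource" "$lineno" "${rest#*:}" ;;
    esac
}
```

The function could be fed from a search by reading crgrep output line by line, for instance by piping ‘crgrep -r key target’ into a ‘while IFS= read -r l’ loop that calls parse_crgrep_line "$l" for each line.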

Webpage: Common Resource grep

Similar software

  • ag or The Silver Searcher is a faster grep alternative for developers.

Update from the author

In a comment on this post, the author of crgrep notes that newer releases (1.0.2 is the latest) add MS Office document search (doc/ppt/xls[x]), full glob resource patterns such as the Ant-style ‘crgrep foo dir/**/*.txt’, and cleaner output and error reporting, with improvements continuing in each release. Feedback, good or bad, is welcome on the homepage.
