Embedded the awk.
parent 80356b3878
commit d64be96052
23 changed files with 56 additions and 59 deletions

@@ -27,7 +27,7 @@ import(
 	"github.com/surdeus/goblin/src/tool/useprog"
 	"github.com/surdeus/goblin/src/tool/path"
 	"github.com/surdeus/goblin/src/tool/mk"
-	//"github.com/surdeus/goblin/src/tool/awk"
+	"github.com/surdeus/goblin/src/tool/awk"
 )
 
 func main() {
@@ -57,7 +57,7 @@ func main() {
 		"useprog" : mtool.Tool{useprog.Run, "print the name of the first existing program in arg list"},
 		"path" : mtool.Tool{path.Run, "print cross platform path based on cmd arguments"},
 		"mk" : mtool.Tool{mk.Run, "file dependency system, simpler make"},
-		//"awk" : mtool.Tool{awk.Run, "simple scripting language for working with string templates"},
+		"awk" : mtool.Tool{awk.Run, "simple scripting language for working with string templates"},
 	}
 
 	mtool.Main("goblin", tools)

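Note on the hunk above: the change enables the `awk` entry in a name-to-tool dispatch table. The following is a minimal, self-contained sketch of that pattern, not goblin's actual code. The `Tool` type, the `toolNames` and `runTool` helpers, and the stub tool are all hypothetical stand-ins; the real `mtool.Tool` and `mtool.Main` may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical stand-in for mtool.Tool: an entry point plus a
// one-line description. The real type may have different fields.
type Tool struct {
	Run  func(args []string) int
	Desc string
}

// toolNames returns the registered tool names in sorted order, which is
// handy for printing a stable usage message.
func toolNames(tools map[string]Tool) []string {
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// runTool looks up a tool by name and runs it. It returns the tool's exit
// code, or an error if no tool with that name is registered.
func runTool(tools map[string]Tool, name string, args []string) (int, error) {
	tool, ok := tools[name]
	if !ok {
		return 0, fmt.Errorf("unknown tool %q", name)
	}
	return tool.Run(args), nil
}

func main() {
	tools := map[string]Tool{
		// Stub standing in for awk.Run; it only reports its argument count.
		"awk": {Run: func(args []string) int { return len(args) }, Desc: "stub awk tool"},
	}
	for _, name := range toolNames(tools) {
		fmt.Printf("%s: %s\n", name, tools[name].Desc)
	}
	code, err := runTool(tools, "awk", []string{"BEGIN { print 1 }"})
	fmt.Println(code, err)
}
```

With a table like this, commenting an entry out (as the old line did) simply makes the name unknown to the dispatcher, which is presumably why re-enabling `awk` needs both the import and the map entry.
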
@@ -93,12 +93,12 @@ if err != nil { ... }
 
 Note that `INPUTMODE` and `OUTPUTMODE` set using `Vars` or in the `BEGIN` block will override these settings.
 
-See the [full reference documentation](https://pkg.go.dev/github.com/benhoyt/goawk/interp#Config) for the `interp.Config` struct.
+See the [full reference documentation](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk/interp#Config) for the `interp.Config` struct.
 
 
 ## Examples
 
-Below are some examples using the [testdata/csv/states.csv](https://github.com/benhoyt/goawk/blob/master/testdata/csv/states.csv) file, which is a simple CSV file whose contents are as follows:
+Below are some examples using the [testdata/csv/states.csv](https://github.com/surdeus/goblin/src/tool/awk/blob/master/testdata/csv/states.csv) file, which is a simple CSV file whose contents are as follows:
 
 ```
 "State","Abbreviation"
@@ -278,7 +278,7 @@ NY
 
 The [csvkit](https://csvkit.readthedocs.io/en/latest/index.html) suite is a set of tools that allow you to quickly analyze and extract fields from CSV files. Each csvkit tool allows you to do a specific task; GoAWK is more low-level and verbose, but also a more general tool ([`csvsql`](https://csvkit.readthedocs.io/en/latest/tutorial/3_power_tools.html#csvsql-and-sql2csv-ultimate-power) being the exception!). GoAWK also runs significantly faster than csvkit (the latter is written in Python).
 
-Below are a few snippets showing how you'd do some of the tasks in the csvkit documentation, but using GoAWK (the input file is [testdata/csv/nz-schools.csv](https://github.com/benhoyt/goawk/blob/master/testdata/csv/nz-schools.csv)):
+Below are a few snippets showing how you'd do some of the tasks in the csvkit documentation, but using GoAWK (the input file is [testdata/csv/nz-schools.csv](https://github.com/surdeus/goblin/src/tool/awk/blob/master/testdata/csv/nz-schools.csv)):
 
 ### csvkit example: print column names
 
@@ -363,7 +363,7 @@ $ goawk -i csv -H '/Girls/ { d+=@"Decile"; n++ } END { print d/n }' testdata/csv
 
 The performance of GoAWK's CSV input and output mode is quite good, on a par with using the `encoding/csv` package from Go directly, and much faster than the `csv` module in Python. CSV input speed is significantly slower than `frawk`, though CSV output speed is significantly faster than `frawk`.
 
-Below are the results of some simple read and write [benchmarks](https://github.com/benhoyt/goawk/blob/master/scripts/csvbench) using `goawk` and `frawk` as well as plain Python and Go. The output of the write benchmarks is a 1GB, 3.5 million row CSV file with 20 columns (including quoted columns); the input for the read benchmarks uses that same file. Times are in seconds, showing the best of three runs on a 64-bit Linux laptop with an SSD drive:
+Below are the results of some simple read and write [benchmarks](https://github.com/surdeus/goblin/src/tool/awk/blob/master/scripts/csvbench) using `goawk` and `frawk` as well as plain Python and Go. The output of the write benchmarks is a 1GB, 3.5 million row CSV file with 20 columns (including quoted columns); the input for the read benchmarks uses that same file. Times are in seconds, showing the best of three runs on a 64-bit Linux laptop with an SSD drive:
 
 Test | goawk | frawk | Python | Go
 --------------- | ----- | ----- | ------ | ----
@@ -378,10 +378,10 @@ Writing 1GB CSV | 5.64 | 13.0 | 17.0 | 3.24
 - keys would be ordered by `OFIELDS` (eg: `OFIELDS[1] = "name"; OFIELDS[2] = "age"`) or by "smart name" if `OFIELDS` not set ("smart name" meaning numeric if `a` keys are numeric, string otherwise)
 - `printrow(a)` could take an optional second `fields` array arg to use that instead of the global `OFIELDS`
 * Consider allowing `-H` to accept an optional list of field names which could be used as headers in the absence of headers in the file itself (either `-H=name,age` or `-i 'csv header=name,age'`).
-* Consider adding TrimLeadingSpace CSV input option. See: https://github.com/benhoyt/goawk/issues/109
+* Consider adding TrimLeadingSpace CSV input option. See: https://github.com/surdeus/goblin/src/tool/awk/issues/109
 * Consider supporting `@"id" = 42` named field assignment.
 
 
 ## Feedback
 
-Please [open an issue](https://github.com/benhoyt/goawk/issues) if you have bug reports or feature requests for GoAWK's CSV support.
+Please [open an issue](https://github.com/surdeus/goblin/src/tool/awk/issues) if you have bug reports or feature requests for GoAWK's CSV support.

@@ -1,3 +0,0 @@
-module github.com/benhoyt/goawk
-
-go 1.14

Binary file not shown.

@@ -38,9 +38,9 @@ import (
 	"strings"
 	"unicode/utf8"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/lexer"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 const (

@@ -18,8 +18,8 @@ import (
 	"sync"
 	"testing"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 var (

@@ -7,7 +7,7 @@ import (
 	"strconv"
 	"strings"
 
-	. "github.com/benhoyt/goawk/lexer"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Program is an entire AWK program.

@@ -6,8 +6,8 @@ import (
 	"math"
 	"regexp"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	"github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	"github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Program holds an entire compiled program.

@@ -7,8 +7,8 @@ import (
 	"io"
 	"strings"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	"github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	"github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Disassemble writes a human-readable form of the program's virtual machine

@@ -9,8 +9,8 @@ import (
 	"fmt"
 	"strings"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 func Example() {

@@ -12,8 +12,8 @@ import (
 	"strings"
 	"unicode/utf8"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	. "github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Call native-defined function with given name and arguments, return

@@ -13,8 +13,8 @@ import (
 	"testing"
 	"time"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 func isFuzzTest(test interpTest) bool {

@@ -28,9 +28,9 @@ import (
 	"strings"
 	"unicode/utf8"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	"github.com/benhoyt/goawk/internal/compiler"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 var (
@@ -250,7 +250,7 @@ type Config struct {
 	// or "tsv" in Vars or in the BEGIN block (those override this setting).
 	//
 	// For further documentation about GoAWK's CSV support, see the full docs:
-	// https://github.com/benhoyt/goawk/blob/master/csv.md
+	// https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md
 	InputMode IOMode
 
 	// Additional options if InputMode is CSVMode or TSVMode. The zero value

@@ -18,8 +18,8 @@ import (
 	"sync"
 	"testing"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 var (

@@ -17,8 +17,8 @@ import (
 	"strings"
 	"unicode/utf8"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	. "github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Print a line of output followed by a newline

@@ -6,7 +6,7 @@ import (
 	"context"
 	"math"
 
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 const checkContextOps = 1000 // for efficiency, only check context every N instructions

@@ -10,8 +10,8 @@ import (
 	"testing"
 	"time"
 
-	"github.com/benhoyt/goawk/interp"
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/interp"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 // This definitely doesn't test that everything was reset, but it's a good start.

@@ -10,15 +10,15 @@ import (
 	"strings"
 	"time"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	"github.com/benhoyt/goawk/internal/compiler"
-	"github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+	"github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // Execute a block of virtual machine instructions.
 //
 // A big switch seems to be the best way of doing this for now. I also tried
-// an array of functions (https://github.com/benhoyt/goawk/commit/8e04b069b621ff9b9456de57a35ff2fe335cf201)
+// an array of functions (https://github.com/surdeus/goblin/src/tool/awk/commit/8e04b069b621ff9b9456de57a35ff2fe335cf201)
 // and it was ever so slightly faster, but the code was harder to work with
 // and it won't be improved when Go gets faster switches via jump tables
 // (https://go-review.googlesource.com/c/go/+/357330/).
@@ -1205,7 +1205,7 @@ func (p *interp) getline(redirect lexer.Token) (float64, string, error) {
 	if err != nil {
 		if _, ok := err.(*os.PathError); ok {
 			// File not found is not a hard error, getline just returns -1.
-			// See: https://github.com/benhoyt/goawk/issues/41
+			// See: https://github.com/surdeus/goblin/src/tool/awk/issues/41
 			return -1, "", nil
 		}
 		return 0, "", err

@@ -8,7 +8,7 @@ import (
 	"strings"
 	"testing"
 
-	. "github.com/benhoyt/goawk/lexer"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 func TestLexer(t *testing.T) {

@@ -11,9 +11,9 @@ import (
 	"strconv"
 	"strings"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	"github.com/benhoyt/goawk/internal/compiler"
-	. "github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 // ParseError (actually *ParseError) is the type of error returned by

@@ -8,7 +8,7 @@ import (
 	"strings"
 	"testing"
 
-	"github.com/benhoyt/goawk/parser"
+	"github.com/surdeus/goblin/src/tool/awk/parser"
 )
 
 // NOTE: apart from TestParseAndString, the parser doesn't have

@@ -7,8 +7,8 @@ import (
 	"reflect"
 	"sort"
 
-	"github.com/benhoyt/goawk/internal/ast"
-	. "github.com/benhoyt/goawk/lexer"
+	"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+	. "github.com/surdeus/goblin/src/tool/awk/lexer"
 )
 
 type varType int

@@ -1,13 +1,13 @@
 
 # GoAWK: an AWK interpreter with CSV support
 
-[![Documentation](https://pkg.go.dev/badge/github.com/benhoyt/goawk)](https://pkg.go.dev/github.com/benhoyt/goawk)
-[![GitHub Actions Build](https://github.com/benhoyt/goawk/workflows/Go/badge.svg)](https://github.com/benhoyt/goawk/actions?query=workflow%3AGo)
+[![Documentation](https://pkg.go.dev/badge/github.com/surdeus/goblin/src/tool/awk)](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk)
+[![GitHub Actions Build](https://github.com/surdeus/goblin/src/tool/awk/workflows/Go/badge.svg)](https://github.com/surdeus/goblin/src/tool/awk/actions?query=workflow%3AGo)
 
 
 AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse [*The AWK Programming Language*](https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf) I was inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" and GNU AWK test suites.
 
-GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the [library of the University of Antwerp](https://www.uantwerpen.be/en/library/). Read the [CSV documentation](https://github.com/benhoyt/goawk/blob/master/csv.md).
+GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the [library of the University of Antwerp](https://www.uantwerpen.be/en/library/). Read the [CSV documentation](https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md).
 
 You can also read one of the articles I've written about GoAWK:
 
@@ -21,7 +21,7 @@ You can also read one of the articles I've written about GoAWK:
 To use the command-line version, simply use `go install` to install it, and then run it using `goawk` (assuming `~/go/bin` is in your `PATH`):
 
 ```shell
-$ go install github.com/benhoyt/goawk@latest
+$ go install github.com/surdeus/goblin/src/tool/awk@latest
 
 $ goawk 'BEGIN { print "foo", 42 }'
 foo 42
@@ -82,9 +82,9 @@ if err != nil {
 // 3:abc
 ```
 
-If you need to repeat execution of the same program on different inputs, you can call [`interp.New`](https://pkg.go.dev/github.com/benhoyt/goawk/interp#New) once, and then call the returned object's `Execute` method as many times as you need.
+If you need to repeat execution of the same program on different inputs, you can call [`interp.New`](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk/interp#New) once, and then call the returned object's `Execute` method as many times as you need.
 
-Read the [package documentation](https://pkg.go.dev/github.com/benhoyt/goawk) for more details.
+Read the [package documentation](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk) for more details.
 
 
 ## Differences from AWK
@@ -93,7 +93,7 @@ The intention is for GoAWK to conform to `awk`'s behavior and to the [POSIX AWK
 
 Additional features GoAWK has over AWK:
 
-* It has proper support for CSV and TSV files ([read the documentation](https://github.com/benhoyt/goawk/blob/master/csv.md)).
+* It has proper support for CSV and TSV files ([read the documentation](https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md)).
 * It supports negative field indexes to access fields from the right, for example, `$-1` refers to the last field.
 * It's embeddable in your Go programs! You can even call custom Go functions from your AWK scripts.
 * Most AWK scripts are faster than `awk` and on a par with `gawk`, though usually slower than `mawk`. (See [recent benchmarks](https://benhoyt.com/writings/goawk-compiler-vm/#virtual-machine-results).)
@@ -112,12 +112,12 @@ This project has a good suite of tests, which include my own intepreter tests, t
 
 ## AWKGo
 
-The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can [read more about AWKGo](https://benhoyt.com/writings/awkgo/) or browse the code on the [`awkgo` branch](https://github.com/benhoyt/goawk/tree/awkgo/awkgo).
+The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can [read more about AWKGo](https://benhoyt.com/writings/awkgo/) or browse the code on the [`awkgo` branch](https://github.com/surdeus/goblin/src/tool/awk/tree/awkgo/awkgo).
 
 
 ## License
 
-GoAWK is licensed under an open source [MIT license](https://github.com/benhoyt/goawk/blob/master/LICENSE.txt).
+GoAWK is licensed under an open source [MIT license](https://github.com/surdeus/goblin/src/tool/awk/blob/master/LICENSE.txt).
 
 
 ## The end