Embedded the awk.

Andrey Parhomenko 2023-02-01 02:41:58 +05:00
parent 80356b3878
commit d64be96052
23 changed files with 56 additions and 59 deletions

@ -27,7 +27,7 @@ import(
"github.com/surdeus/goblin/src/tool/useprog"
"github.com/surdeus/goblin/src/tool/path"
"github.com/surdeus/goblin/src/tool/mk"
-//"github.com/surdeus/goblin/src/tool/awk"
+"github.com/surdeus/goblin/src/tool/awk"
)
func main() {
@ -57,7 +57,7 @@ func main() {
"useprog" : mtool.Tool{useprog.Run, "print the name of the first existing program in arg list"},
"path" : mtool.Tool{path.Run, "print cross platform path based on cmd arguments"},
"mk" : mtool.Tool{mk.Run, "file dependency system, simpler make"},
-//"awk" : mtool.Tool{awk.Run, "simple scripting language for working with string templates"},
+"awk" : mtool.Tool{awk.Run, "simple scripting language for working with string templates"},
}
mtool.Main("goblin", tools)


@ -93,12 +93,12 @@ if err != nil { ... }
Note that `INPUTMODE` and `OUTPUTMODE` set using `Vars` or in the `BEGIN` block will override these settings.
-See the [full reference documentation](https://pkg.go.dev/github.com/benhoyt/goawk/interp#Config) for the `interp.Config` struct.
+See the [full reference documentation](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk/interp#Config) for the `interp.Config` struct.
## Examples
-Below are some examples using the [testdata/csv/states.csv](https://github.com/benhoyt/goawk/blob/master/testdata/csv/states.csv) file, which is a simple CSV file whose contents are as follows:
+Below are some examples using the [testdata/csv/states.csv](https://github.com/surdeus/goblin/src/tool/awk/blob/master/testdata/csv/states.csv) file, which is a simple CSV file whose contents are as follows:
```
"State","Abbreviation"
@ -278,7 +278,7 @@ NY
The [csvkit](https://csvkit.readthedocs.io/en/latest/index.html) suite is a set of tools that allow you to quickly analyze and extract fields from CSV files. Each csvkit tool allows you to do a specific task; GoAWK is more low-level and verbose, but also a more general tool ([`csvsql`](https://csvkit.readthedocs.io/en/latest/tutorial/3_power_tools.html#csvsql-and-sql2csv-ultimate-power) being the exception!). GoAWK also runs significantly faster than csvkit (the latter is written in Python).
-Below are a few snippets showing how you'd do some of the tasks in the csvkit documentation, but using GoAWK (the input file is [testdata/csv/nz-schools.csv](https://github.com/benhoyt/goawk/blob/master/testdata/csv/nz-schools.csv)):
+Below are a few snippets showing how you'd do some of the tasks in the csvkit documentation, but using GoAWK (the input file is [testdata/csv/nz-schools.csv](https://github.com/surdeus/goblin/src/tool/awk/blob/master/testdata/csv/nz-schools.csv)):
### csvkit example: print column names
@ -363,7 +363,7 @@ $ goawk -i csv -H '/Girls/ { d+=@"Decile"; n++ } END { print d/n }' testdata/csv
The performance of GoAWK's CSV input and output mode is quite good, on a par with using the `encoding/csv` package from Go directly, and much faster than the `csv` module in Python. CSV input speed is significantly slower than `frawk`, though CSV output speed is significantly faster than `frawk`.
-Below are the results of some simple read and write [benchmarks](https://github.com/benhoyt/goawk/blob/master/scripts/csvbench) using `goawk` and `frawk` as well as plain Python and Go. The output of the write benchmarks is a 1GB, 3.5 million row CSV file with 20 columns (including quoted columns); the input for the read benchmarks uses that same file. Times are in seconds, showing the best of three runs on a 64-bit Linux laptop with an SSD drive:
+Below are the results of some simple read and write [benchmarks](https://github.com/surdeus/goblin/src/tool/awk/blob/master/scripts/csvbench) using `goawk` and `frawk` as well as plain Python and Go. The output of the write benchmarks is a 1GB, 3.5 million row CSV file with 20 columns (including quoted columns); the input for the read benchmarks uses that same file. Times are in seconds, showing the best of three runs on a 64-bit Linux laptop with an SSD drive:
Test | goawk | frawk | Python | Go
--------------- | ----- | ----- | ------ | ----
@ -378,10 +378,10 @@ Writing 1GB CSV | 5.64 | 13.0 | 17.0 | 3.24
- keys would be ordered by `OFIELDS` (eg: `OFIELDS[1] = "name"; OFIELDS[2] = "age"`) or by "smart name" if `OFIELDS` not set ("smart name" meaning numeric if `a` keys are numeric, string otherwise)
- `printrow(a)` could take an optional second `fields` array arg to use that instead of the global `OFIELDS`
* Consider allowing `-H` to accept an optional list of field names which could be used as headers in the absence of headers in the file itself (either `-H=name,age` or `-i 'csv header=name,age'`).
-* Consider adding TrimLeadingSpace CSV input option. See: https://github.com/benhoyt/goawk/issues/109
+* Consider adding TrimLeadingSpace CSV input option. See: https://github.com/surdeus/goblin/src/tool/awk/issues/109
* Consider supporting `@"id" = 42` named field assignment.
## Feedback
-Please [open an issue](https://github.com/benhoyt/goawk/issues) if you have bug reports or feature requests for GoAWK's CSV support.
+Please [open an issue](https://github.com/surdeus/goblin/src/tool/awk/issues) if you have bug reports or feature requests for GoAWK's CSV support.

@ -1,3 +0,0 @@
-module github.com/benhoyt/goawk
-go 1.14

Binary file not shown.


@ -38,9 +38,9 @@ import (
"strings"
"unicode/utf8"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/lexer"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
const (

@ -18,8 +18,8 @@ import (
"sync"
"testing"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
var (

@ -7,7 +7,7 @@ import (
"strconv"
"strings"
-. "github.com/benhoyt/goawk/lexer"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Program is an entire AWK program.

@ -6,8 +6,8 @@ import (
"math"
"regexp"
-"github.com/benhoyt/goawk/internal/ast"
-"github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+"github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Program holds an entire compiled program.

@ -7,8 +7,8 @@ import (
"io"
"strings"
-"github.com/benhoyt/goawk/internal/ast"
-"github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+"github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Disassemble writes a human-readable form of the program's virtual machine

@ -9,8 +9,8 @@ import (
"fmt"
"strings"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
func Example() {

@ -12,8 +12,8 @@ import (
"strings"
"unicode/utf8"
-"github.com/benhoyt/goawk/internal/ast"
-. "github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Call native-defined function with given name and arguments, return

@ -13,8 +13,8 @@ import (
"testing"
"time"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
func isFuzzTest(test interpTest) bool {

@ -28,9 +28,9 @@ import (
"strings"
"unicode/utf8"
-"github.com/benhoyt/goawk/internal/ast"
-"github.com/benhoyt/goawk/internal/compiler"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
var (
@ -250,7 +250,7 @@ type Config struct {
// or "tsv" in Vars or in the BEGIN block (those override this setting).
//
// For further documentation about GoAWK's CSV support, see the full docs:
-// https://github.com/benhoyt/goawk/blob/master/csv.md
+// https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md
InputMode IOMode
// Additional options if InputMode is CSVMode or TSVMode. The zero value

@ -18,8 +18,8 @@ import (
"sync"
"testing"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
var (

@ -17,8 +17,8 @@ import (
"strings"
"unicode/utf8"
-"github.com/benhoyt/goawk/internal/ast"
-. "github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Print a line of output followed by a newline

@ -6,7 +6,7 @@ import (
"context"
"math"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
const checkContextOps = 1000 // for efficiency, only check context every N instructions

@ -10,8 +10,8 @@ import (
"testing"
"time"
-"github.com/benhoyt/goawk/interp"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/interp"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
// This definitely doesn't test that everything was reset, but it's a good start.

@ -10,15 +10,15 @@ import (
"strings"
"time"
-"github.com/benhoyt/goawk/internal/ast"
-"github.com/benhoyt/goawk/internal/compiler"
-"github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+"github.com/surdeus/goblin/src/tool/awk/lexer"
)
// Execute a block of virtual machine instructions.
//
// A big switch seems to be the best way of doing this for now. I also tried
-// an array of functions (https://github.com/benhoyt/goawk/commit/8e04b069b621ff9b9456de57a35ff2fe335cf201)
+// an array of functions (https://github.com/surdeus/goblin/src/tool/awk/commit/8e04b069b621ff9b9456de57a35ff2fe335cf201)
// and it was ever so slightly faster, but the code was harder to work with
// and it won't be improved when Go gets faster switches via jump tables
// (https://go-review.googlesource.com/c/go/+/357330/).
@ -1205,7 +1205,7 @@ func (p *interp) getline(redirect lexer.Token) (float64, string, error) {
if err != nil {
if _, ok := err.(*os.PathError); ok {
// File not found is not a hard error, getline just returns -1.
-// See: https://github.com/benhoyt/goawk/issues/41
+// See: https://github.com/surdeus/goblin/src/tool/awk/issues/41
return -1, "", nil
}
return 0, "", err

@ -8,7 +8,7 @@ import (
"strings"
"testing"
-. "github.com/benhoyt/goawk/lexer"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
func TestLexer(t *testing.T) {

@ -11,9 +11,9 @@ import (
"strconv"
"strings"
-"github.com/benhoyt/goawk/internal/ast"
-"github.com/benhoyt/goawk/internal/compiler"
-. "github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+"github.com/surdeus/goblin/src/tool/awk/internal/compiler"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
// ParseError (actually *ParseError) is the type of error returned by

@ -8,7 +8,7 @@ import (
"strings"
"testing"
-"github.com/benhoyt/goawk/parser"
+"github.com/surdeus/goblin/src/tool/awk/parser"
)
// NOTE: apart from TestParseAndString, the parser doesn't have

@ -7,8 +7,8 @@ import (
"reflect"
"sort"
-"github.com/benhoyt/goawk/internal/ast"
-. "github.com/benhoyt/goawk/lexer"
+"github.com/surdeus/goblin/src/tool/awk/internal/ast"
+. "github.com/surdeus/goblin/src/tool/awk/lexer"
)
type varType int

@ -1,13 +1,13 @@
# GoAWK: an AWK interpreter with CSV support
-[![Documentation](https://pkg.go.dev/badge/github.com/benhoyt/goawk)](https://pkg.go.dev/github.com/benhoyt/goawk)
-[![GitHub Actions Build](https://github.com/benhoyt/goawk/workflows/Go/badge.svg)](https://github.com/benhoyt/goawk/actions?query=workflow%3AGo)
+[![Documentation](https://pkg.go.dev/badge/github.com/surdeus/goblin/src/tool/awk)](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk)
+[![GitHub Actions Build](https://github.com/surdeus/goblin/src/tool/awk/workflows/Go/badge.svg)](https://github.com/surdeus/goblin/src/tool/awk/actions?query=workflow%3AGo)
AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse [*The AWK Programming Language*](https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf) I was inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" and GNU AWK test suites.
-GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the [library of the University of Antwerp](https://www.uantwerpen.be/en/library/). Read the [CSV documentation](https://github.com/benhoyt/goawk/blob/master/csv.md).
+GoAWK is a POSIX-compatible version of AWK, and additionally has a CSV mode for reading and writing CSV and TSV files. This feature was sponsored by the [library of the University of Antwerp](https://www.uantwerpen.be/en/library/). Read the [CSV documentation](https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md).
You can also read one of the articles I've written about GoAWK:
@ -21,7 +21,7 @@ You can also read one of the articles I've written about GoAWK:
To use the command-line version, simply use `go install` to install it, and then run it using `goawk` (assuming `~/go/bin` is in your `PATH`):
```shell
-$ go install github.com/benhoyt/goawk@latest
+$ go install github.com/surdeus/goblin/src/tool/awk@latest
$ goawk 'BEGIN { print "foo", 42 }'
foo 42
@ -82,9 +82,9 @@ if err != nil {
// 3:abc
```
-If you need to repeat execution of the same program on different inputs, you can call [`interp.New`](https://pkg.go.dev/github.com/benhoyt/goawk/interp#New) once, and then call the returned object's `Execute` method as many times as you need.
+If you need to repeat execution of the same program on different inputs, you can call [`interp.New`](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk/interp#New) once, and then call the returned object's `Execute` method as many times as you need.
-Read the [package documentation](https://pkg.go.dev/github.com/benhoyt/goawk) for more details.
+Read the [package documentation](https://pkg.go.dev/github.com/surdeus/goblin/src/tool/awk) for more details.
## Differences from AWK
@ -93,7 +93,7 @@ The intention is for GoAWK to conform to `awk`'s behavior and to the [POSIX AWK
Additional features GoAWK has over AWK:
-* It has proper support for CSV and TSV files ([read the documentation](https://github.com/benhoyt/goawk/blob/master/csv.md)).
+* It has proper support for CSV and TSV files ([read the documentation](https://github.com/surdeus/goblin/src/tool/awk/blob/master/csv.md)).
* It supports negative field indexes to access fields from the right, for example, `$-1` refers to the last field.
* It's embeddable in your Go programs! You can even call custom Go functions from your AWK scripts.
* Most AWK scripts are faster than `awk` and on a par with `gawk`, though usually slower than `mawk`. (See [recent benchmarks](https://benhoyt.com/writings/goawk-compiler-vm/#virtual-machine-results).)
@ -112,12 +112,12 @@ This project has a good suite of tests, which include my own intepreter tests, t
## AWKGo
-The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can [read more about AWKGo](https://benhoyt.com/writings/awkgo/) or browse the code on the [`awkgo` branch](https://github.com/benhoyt/goawk/tree/awkgo/awkgo).
+The GoAWK repository also includes the creatively-named AWKGo, an AWK-to-Go compiler. This is experimental and is not subject to the stability requirements of GoAWK itself. You can [read more about AWKGo](https://benhoyt.com/writings/awkgo/) or browse the code on the [`awkgo` branch](https://github.com/surdeus/goblin/src/tool/awk/tree/awkgo/awkgo).
## License
-GoAWK is licensed under an open source [MIT license](https://github.com/benhoyt/goawk/blob/master/LICENSE.txt).
+GoAWK is licensed under an open source [MIT license](https://github.com/surdeus/goblin/src/tool/awk/blob/master/LICENSE.txt).
## The end