awk to extract a block of text

Accepted Answer

April 11, 2025 Score: 7 Rep: 38,352 Quality: Expert Completeness: 60%

I would ameliorate your code


awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt

in the following way. Let the content of dump.txt be

undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768
another undesired text
more undesired text

then


awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/' dump.txt

gives the output

  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768

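Explanation: I altered the ending condition so that it requires the line to start with 2 whitespace characters followed by an alphabetic character AND (&&) NOT (!) be the Board Info line (by negating the start condition). The negation is needed because awk also tests the end pattern against the line that matched the start pattern; without it, the Board Info line would open and close the range on its own.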

(tested in GNU Awk 5.3.1)
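To see why the negation matters, here is a minimal sketch that runs the end pattern with and without it. It assumes the indentation is made of literal spaces (2 before header lines, 4 before detail lines), so it uses literal blanks instead of the GNU-only \s{2}; the function names are hypothetical. Both functions read the files given as arguments, or standard input if there are none.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; indentation assumed to be literal spaces.

# Without the negation: "  Board Info" also matches the end pattern,
# so the range starts and ends on that single record.
board_range_naive() {
  awk '/^  Board Info/, /^  [[:alpha:]]/' "$@"
}

# With the negation: the start line can no longer close the range, so it
# runs through the next 2-space-indented header line (inclusive).
board_range_guarded() {
  awk '/^  Board Info/, (/^  [[:alpha:]]/ && !/^  Board Info/)' "$@"
}
```

On the dump.txt above, board_range_naive prints only the Board Info line, while board_range_guarded prints everything from Board Info through Other Info.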

How do I modify what you have to just print the indented block?

You might add an action that prints a line only if it starts with at least 3 whitespace characters, in the following way


awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/{if(/^\s{3}/){print}}' dump.txt

which gives the following output

    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
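A portable sketch of that action, assuming literal spaces rather than \s (2 before headers, more than 2 before detail lines) and using a hypothetical function name: inside the range, only lines starting with at least 3 spaces are printed, so the Board Info line and the closing header line are dropped.

```shell
#!/bin/sh
# Hypothetical helper; assumes headers are indented by exactly 2 spaces
# and detail lines by more than 2.
board_details_range() {
  awk '/^  Board Info/, (/^  [[:alpha:]]/ && !/^  Board Info/) {
    if (/^   /) print   # at least 3 leading spaces: a detail line
  }' "$@"
}
```

Usage: board_details_range dump.txt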

April 11, 2025 Score: 8 Rep: 164,872 Quality: Medium Completeness: 50%

Assuming the lines inside the block are indented with 4 whitespace characters, you can make sure that at least one such indented line is present, and print nothing otherwise.


awk '
  /^\s{2}Board Info/ {
    start = 1; buffer = $0; indent = 0; next
  }
  start && /^\s{4}\S/ {
    buffer = buffer == "" ? $0 : buffer "\n" $0;
    indent = 1;
  }
  start && /^\s{2}\S/ {
    if (indent) {
      print buffer "\n" $0;
    }
    start = 0;
  }
' dump.txt

I'm not sure whether you want to print the start and closing lines; if not, you can omit them by leaving the start line out of the buffer and not printing the closing line.

You could change these lines:


start = 1; buffer = ""; indent = 0; next

And:


print buffer;
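For reference, here is a sketch of the same buffering idea that avoids the GNU-only \s by assuming literal spaces (2 before headers, 4 before detail lines). The END block is an addition that is not in the code above: it also prints a block that runs to the end of the file without a closing header line, which the original silently drops. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical portable variant of the buffering approach.
# Prints the Board Info block (start line, details, closing header)
# only if it contains at least one 4-space-indented detail line.
extract_board_buffered() {
  awk '
    /^  Board Info/ { start = 1; buffer = $0; indent = 0; next }
    start && /^    [^ ]/ {      # a detail line: append it to the buffer
      buffer = buffer "\n" $0
      indent = 1
    }
    start && /^  [^ ]/ {        # the next 2-space header closes the block
      if (indent) print buffer "\n" $0
      start = 0
    }
    END {                       # added: flush a block that runs to end of input
      if (start && indent) print buffer
    }
  ' "$@"
}
```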

April 11, 2025 Score: 8 Rep: 790,917 Quality: Medium Completeness: 70%

This awk solution should work with any version of awk:

awk '/^  [^[:blank:]]/ { blk = index($0, "Board Info:"); next } blk' dump.txt

    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."

Explanation:

Regular expression /^  [^[:blank:]]/ matches a line with 2 spaces at the start followed by any non-blank character.
blk = index($0, "Board Info:"): sets the flag blk to the position of Board Info: in the line (a non-zero value) if the line contains it, or to 0 if it does not.
blk at the end prints the line whenever blk is non-zero.
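To see what the flag actually holds, this sketch (hypothetical function name) prints blk next to each 2-space header line instead of skipping it. index() returns the 1-based position of the match, so the Board Info line gives 3 rather than 1; any non-zero number is true in awk, which is why blk works as a flag.

```shell
#!/bin/sh
# Hypothetical debugging helper: show the blk value computed for each header line.
show_blk_values() {
  awk '/^  [^[:blank:]]/ { blk = index($0, "Board Info:"); print blk ": " $0 }' "$@"
}
```

On the sample data this prints 3 for the Board Info line and 0 for the closing header line.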

April 12, 2025 Score: 6 Rep: 209,032 Quality: High Completeness: 100%

Using any awk:


$ awk '/^  [^ ]/{ f=(/Board Info/); next } f' dump.txt
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."

The awk script says in pseudo-code:


' = WHILE lines to be read into $0 DO:
    /^  [^ ]/ { = IF the current line starts with 2 blanks then a non-blank THEN
        f=(/Board Info/) = IF the current line contains "Board Info" THEN
                               set f=1
                           ELSE
                               set f=0
                           ENDIF
        next = stop processing the current line
    } = ENDIF
    f = IF f is non-zero THEN
            print the current record
        ENDIF
' = DONE

so if a line starts with only 2 blanks and contains Board Info, then f gets set to 1 and awk skips to the next line. The next 4 lines start with more than 2 blanks, so /^  [^ ]/ can't be true; awk just evaluates f, which is still 1, and so the lines after Board Info get printed. The next time we see a line that starts with only 2 blanks, it doesn't contain Board Info (it contains Chassis Info instead), so f is set to 0 and no subsequent lines are printed.

Regarding your original code:


awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt

Using range expressions (/start/,/end/) is usually trickier to get right and always harder to enhance than using a flag (f in my code), often leading to duplicate code or other bad software. See "Is a /start/,/end/ range expression ever useful in awk?" for more info.
Only GNU awk recognizes \s as shorthand for [[:space:]], so using it makes your code non-portable; and neither construct is needed anyway when your spaces are all just blank characters, since a literal space does the job.
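As a small illustration of how easy the flag is to adjust: if you also wanted the Board Info header line itself, one option (a sketch, not part of the answer above) is to drop the next, so the header line is tested by f right after f is set. The closing header sets f to 0 and is therefore not printed. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical variant of the flag script: also print the "Board Info" header.
board_with_header() {
  awk '/^  [^ ]/ { f = (/Board Info/) } f' "$@"
}
```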

April 11, 2025 Score: 1 Rep: 3,062 Quality: Low Completeness: 50%

A minor tweak of FS is all you need to get a 1/0 printing indicator without using pattern ranges:

echo ' Board Info: #512 Manufacturer: "Dell Inc." Product: "0X3D66" Version: "A02" Serial: "..CN7016343F00IE." Chassis Info: #768' |


awk '^ < NF ?  = !! index($!NF, " Board Info:") : ! < NF' FS='^  [A-Z]'

1 Board Info: #512 2 Manufacturer: "Dell Inc." 3 Product: "0X3D66" 4 Version: "A02" 5 Serial: "..CN7016343F00IE."

April 13, 2025 Score: 1 Rep: 1 Quality: Low Completeness: 60%

I know you said awk, but would you consider Perl?

Sometimes Perl is a bit easier when the match has to span newlines, because Perl can read the entire file at once and then sort through it using a regex. I created a sample input file (extract.block.txt) with two Board Info: sections and some sample text before, after, and in between them.

The regular expression works like this: when it sees a Board Info: line, it captures everything (non-greedily) until it sees a newline followed by two spaces and a non-space character, then closes the capture group at the end of that line.

Regex looks like this...


/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g

I'm not sure whether you want the last line or not. If you don't, change the regex to close the capture group before the closing line, like this...


/( {2}Board Info:[\w\W]*?\n) {2}\S.*\n/g

Here is the code...


#!/usr/bin/perl -w
my @boardInfo;undef $/; #grab entire file at once because there are newlines
while(<>){ #while loop goes exactly once per file
  while(/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g){ #when you see Board Info: line, capture everything non-greedy
    push(@boardInfo,$1);                             #until you see a newline, and two spaces followed by a non space character,
  }                                                  #then close the capture at the end of the line
}
my $size = @boardInfo;
print "found $size \"Board Info:\" sections\n\n";
for(0 .. $#boardInfo){
  print "match $_:\n$boardInfo[$_]\n";
}
print "\n";
print "\n";

Output looks like this...

$ perl extract.block.pl extract.block.txt
found 2 "Board Info:" sections

match 0:
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Chassis Info: #768

match 1:
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768

Golfed at 61 characters


perl -0777 -ne 'while(/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g){print "$1\n"}' extract.block.txt

April 14, 2025 Score: -1 Rep: 1,027 Quality: Low Completeness: 60%

Raku/Sparrow is ideal for this task:

between: { ^^ \s\s "Board Info:" }  { ^^ \s\s \S+ }
  regexp: ^^ \s\s\s\s .*
end:
code:  $l {
      say $l;
    }
    say "===";
}
RAKU

Input:

  Board Info: #512
      Manufacturer: "Dell Inc."
      Product: "0X3D66"
      Version: "A02"
      Serial: "..CN7016343F00IE."
  Chassis Info: #768
  Board Info: #513
      Manufacturer: "Dell2 Inc."
      Product: "0X3D68"
      Version: "A03"
      Serial: "..CN7016346F00IE."
  bla bla bla: #769

Output:

[task check]
stdout match (r)  True
      Manufacturer: "Dell Inc."
      Product: "0X3D66"
      Version: "A02"
      Serial: "..CN7016343F00IE."
===
      Manufacturer: "Dell2 Inc."
      Product: "0X3D68"
      Version: "A03"
      Serial: "..CN7016346F00IE."
===
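For comparison, the same kind of output as the Sparrow task above (the detail lines of every Board Info block, each block followed by ===) can be sketched with the flag technique in plain awk. It assumes header lines are indented by exactly 2 spaces and detail lines by at least 4; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the detail lines of every "Board Info" block,
# followed by a "===" separator line after each block.
board_blocks_separated() {
  awk '
    /^  [^ ]/ {              # a 2-space header line starts a new section
      if (f) print "==="     # close the Board Info block we were printing
      f = (/Board Info/)     # 1 if this header opens a Board Info block, else 0
      next
    }
    f && /^    /             # detail line inside a Board Info block: print it
    END { if (f) print "===" }
  ' "$@"
}
```

The END rule covers a Board Info block that runs to the end of the input without a following header line.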
