Question Details

No question body available.

Tags

regex linux awk

Answers (7)

Accepted Answer
April 11, 2025 Score: 7 Rep: 38,352 Quality: Expert Completeness: 60%

I would ameliorate your code

awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt

in the following way. Let the content of dump.txt be

undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768
another undesired text
more undesired text

then

awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/' dump.txt

gives output

  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768

Explanation: I altered the ending condition so that it requires the line to start with 2 white-space characters followed by any alphabetic character AND (&&) NOT (!) be a Board Info line (by negating the start condition).

(tested in GNU Awk 5.3.1)
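
For awks that lack \s and {2}, the same range can be sketched with literal blanks. This is only an illustration: the sample input is a shortened stand-in for dump.txt, [A-Za-z] stands in for [[:alpha:]], and the variable names input and result are made up for the example.

```shell
# Assumed sample input, modelled on the dump.txt shown above (shortened).
input='undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
  Other Info: #768
another undesired text'

# Range: starts at the Board Info header, ends at the next line that begins
# with exactly 2 blanks plus a letter and is NOT itself the Board Info header.
result=$(printf '%s\n' "$input" |
  awk '/^  Board Info/,/^  [A-Za-z]/ && !/^  Board Info/')

printf '%s\n' "$result"
```

The negated start condition matters because awk tests the end pattern on the very line that opened the range; without it the range would close on the Board Info line itself.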

How do I modify what you have to just print the indented block?

You might add an action which prints a line only if it has (at least) 3 white-space characters at its beginning, in the following way

awk '/^\s{2}Board Info/,/^\s{2}[[:alpha:]]/&&!/^\s{2}Board Info/{if(/^\s{3}/){print}}' dump.txt

which will give following output

    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
April 11, 2025 Score: 8 Rep: 164,872 Quality: Medium Completeness: 50%

Assuming there are only indented lines with 4 whitespace characters after the start, you can make sure that there is at least a single indented line present, or else do not print anything.

awk '
/^\s{2}Board Info/ {
    start = 1; buffer = $0; indent = 0; next
}
start && /^\s{4}\S/ {
    buffer = buffer == "" ? $0 : buffer "\n" $0; indent = 1;
}
start && /^\s{2}\S/ {
    if (indent) {
        print buffer "\n" $0;
    }
    start = 0;
}
' dump.txt

Not sure if you want to print the start and the closing line, but you can omit printing them if you want by not adding them to the buffer.

You could change these lines:

start = 1; buffer = ""; indent = 0; next

And:

print buffer;
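
With both changes applied, the script might look roughly like the sketch below. It is not the answer's exact code: literal blanks replace \s{2}, \s{4} and \S so it also runs on non-GNU awks, and the names prog, input, and result as well as the shortened sample input are assumptions made for the example.

```shell
# Assumed sample input, modelled on the dump.txt from the question (shortened).
input='undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
  Other Info: #768
another undesired text'

prog='
# Header line: reset the state and skip it, so it is never buffered.
/^  Board Info/ { start = 1; buffer = ""; indent = 0; next }

# Line indented by 4 blanks inside the block: append it to the buffer.
start && /^    [^ ]/ {
    buffer = (buffer == "" ? $0 : buffer "\n" $0)
    indent = 1
}

# Next 2-blank header closes the block: print only if something was buffered.
start && /^  [^ ]/ {
    if (indent) print buffer
    start = 0
}'

result=$(printf '%s\n' "$input" | awk "$prog")
printf '%s\n' "$result"
```

A Board Info header that is followed directly by another header prints nothing, because indent is still 0 when the block closes.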
April 11, 2025 Score: 8 Rep: 790,917 Quality: Medium Completeness: 70%

This awk solution should work with any version of awk:

awk '/^  [^[:blank:]]/ { blk = index($0, "Board Info:"); next } blk' dump.txt

    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."

Explanation:

  • Regular expression /^  [^[:blank:]]/ matches a line that starts with exactly 2 spaces followed by a non-blank character.
  • blk = index($0, "Board Info:") sets the flag blk to the position of Board Info: in the line (a non-zero number) if the line contains it, or to 0 if it does not.
  • blk at the end prints a line if blk is non-zero.
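
The points above can be checked with a small sketch. The sample input is a shortened stand-in for dump.txt, [^ ] is used in place of [^[:blank:]] as a conservative assumption for older awks, and the names input, pos, and result are made up for the example.

```shell
# Assumed sample input, modelled on the dump.txt from the question (shortened).
input='undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
  Other Info: #768
another undesired text'

# index() returns the 1-based position of the substring, or 0 if it is absent,
# so any header containing "Board Info:" yields a non-zero (true) flag.
pos=$(printf '  Board Info: #512\n' | awk '{ print index($0, "Board Info:") }')

# Header lines only update the flag and are skipped; every other line is
# printed while the flag is non-zero.
result=$(printf '%s\n' "$input" |
  awk '/^  [^ ]/ { blk = index($0, "Board Info:"); next } blk')

printf 'index: %s\n%s\n' "$pos" "$result"
```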
April 12, 2025 Score: 6 Rep: 209,032 Quality: High Completeness: 100%

Using any awk:

$ awk '/^  [^ ]/{ f=(/Board Info/); next } f' dump.txt
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."

The awk script says in pseudo-code:

'                        = WHILE lines to be read into $0 DO:
/^  [^ ]/ {              =     IF the current line starts with 2 blanks then a non-blank THEN
    f=(/Board Info/)     =         IF the current line contains "Board Info" THEN set f=1 ELSE set f=0 ENDIF
    next                 =         stop processing the current line
}                        =     ENDIF
f                        =     IF f is non-zero THEN print the current record ENDIF
'                        = DONE

So if a line starts with exactly 2 blanks and contains Board Info, f is set to 1 and awk skips to the next line. The next 4 lines start with more than 2 blanks, so /^  [^ ]/ can't match; awk just evaluates f, which is still 1, and so the lines after Board Info get printed. The next time we see a line that starts with exactly 2 blanks, it doesn't contain Board Info (it contains Chassis Info instead), so f is set to 0 and no subsequent lines are printed.

Regarding your original code:

awk '/^\s{2}Board Info/,/^\s{2}[^B ]/' dump.txt
  1. Using range expressions (/start/,/end/) is usually trickier to get right in general and always harder to enhance than using a flag (f in my code), often leading to duplicate code or other bad software. See Is a /start/,/end/ range expression ever useful in awk? for more info.
  2. Only GNU awk would recognize \s as shorthand for [[:space:]] so that makes your code non-portable but both constructs are unnecessary anyway when your spaces are all just blank chars.
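
As a rough illustration of point 1, here is one way a small change in requirements ("print the header too, but not the line that closes the block") plays out in both styles. The sample input and the names input, with_flag, and with_range are assumptions made for the example, and the range version is just one possible way to write it.

```shell
# Assumed sample input with a second section after the Board Info block.
input='undesired text
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
  Chassis Info: #768
    Manufacturer: "Other"'

# Flag-based: dropping "next" is enough to include the header line.
with_flag=$(printf '%s\n' "$input" |
  awk '/^  [^ ]/ { f = (/Board Info/) } f')

# Range-based: the closing header belongs to the range, so the start
# condition has to be repeated in the end test and again in the action.
with_range=$(printf '%s\n' "$input" |
  awk '/^  Board Info/,/^  [^ ]/ && !/^  Board Info/ { if (/^  Board Info/ || /^   /) print }')

printf '%s\n' "$with_flag"
```

Both commands print the same three lines here, but the range version has to mention the Board Info pattern three times.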
April 11, 2025 Score: 1 Rep: 3,062 Quality: Low Completeness: 50%

A minor tweak of FS is all you need to get a 1/0 print indicator without using pattern ranges:

echo '  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Chassis Info: #768' |

awk '^ < NF ? = !! index($!NF, " Board Info:") : ! < NF' FS='^ [A-Z]'

1   Board Info: #512
2     Manufacturer: "Dell Inc."
3     Product: "0X3D66"
4     Version: "A02"
5     Serial: "..CN7016343F00IE."
April 13, 2025 Score: 1 Rep: 1 Quality: Low Completeness: 60%

I know you said awk, but would you consider Perl?

Sometimes Perl is a bit easier when there are newlines in the solution, because Perl can grab the entire file at once. Then sort through the output using a regex. I created a sample output file with two Board Info: sections with some sample text before, after, and in between them.

The regular expression works like this: when you see the Board Info: line, capture everything non-greedily until you see a newline and two spaces followed by a non-space character, then close the capture group at the end of that line.

Regex looks like this...

/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g

I'm not sure if you want the last line or not. If you don't want it, change the regex to close the capture group before the closing section, like this...

/( {2}Board Info:[\w\W]*?\n) {2}\S.*\n/g

Here is the code...

#!/usr/bin/perl -w

my @boardInfo;

undef $/;       # grab entire file at once because there are newlines
while(<>){      # while loop goes exactly once per file
    while(/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g){   # when you see Board Info: line, capture everything non-greedy
        push(@boardInfo,$1);                           # until you see a newline, and two spaces followed by a non space character,
    }                                                  # then close the capture group at the end of the line
}
my $size = @boardInfo;
print "found $size \"Board Info:\" sections\n\n";
for(0 .. $#boardInfo){
    print "match $_:\n$boardInfo[$_]\n";
}
print "\n";

Output looks like this...

$ perl extract.block.pl extract.block.txt
found 2 "Board Info:" sections

match 0:
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Chassis Info: #768

match 1:
  Board Info: #512
    Manufacturer: "Dell Inc."
    Product: "0X3D66"
    Version: "A02"
    Serial: "..CN7016343F00IE."
  Other Info: #768

Golfed at 61 characters

perl -0777 -ne 'while(/( {2}Board Info:[\w\W]*?\n {2}\S.*\n)/g){print "$1\n"}' extract.block.txt
April 14, 2025 Score: -1 Rep: 1,027 Quality: Low Completeness: 60%

Raku/Sparrow is ideal for this task:

between: { ^^ \s\s "Board Info:" }  { ^^ \s\s \S+ }
  regexp: ^^ \s\s\s\s .*
end:

code: $l { say $l; } say "==="; } RAKU

Input:

  Board Info: #512
      Manufacturer: "Dell Inc."
      Product: "0X3D66"
      Version: "A02"
      Serial: "..CN7016343F00IE."
  Chassis Info: #768

  Board Info: #513
      Manufacturer: "Dell2 Inc."
      Product: "0X3D68"
      Version: "A03"
      Serial: "..CN7016346F00IE."
  bla bla bla: #769

Output:

[task check]
stdout match (r)  True

Manufacturer: "Dell Inc."

Product: "0X3D66"

Version: "A02"

Serial: "..CN7016343F00IE."

===

Manufacturer: "Dell2 Inc."

Product: "0X3D68"

Version: "A03"

Serial: "..CN7016346F00IE."

===