Count lines containing word
I have a file with multiple lines. I want to know, for each word that appears in the total file, how many lines contain that word, for example:
0 hello world the man is world
1 this is the world
2 a different man is the possible one
The result I'm expecting is:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
text-processing
New contributor
add a comment |
I have a file with multiple lines. I want to know, for each word that appears in the total file, how many lines contain that word, for example:
0 hello world the man is world
1 this is the world
2 a different man is the possible one
The result I'm expecting is:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
text-processing
New contributor
What have you try to the moment?
– Romeo Ninov
4 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago
add a comment |
I have a file with multiple lines. I want to know, for each word that appears in the total file, how many lines contain that word, for example:
0 hello world the man is world
1 this is the world
2 a different man is the possible one
The result I'm expecting is:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
text-processing
New contributor
I have a file with multiple lines. I want to know, for each word that appears in the total file, how many lines contain that word, for example:
0 hello world the man is world
1 this is the world
2 a different man is the possible one
The result I'm expecting is:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
text-processing
text-processing
New contributor
New contributor
edited 1 hour ago
Jeff Schaller
38.9k1053125
38.9k1053125
New contributor
asked 4 hours ago
Netzsooc
485
485
New contributor
New contributor
What have you try to the moment?
– Romeo Ninov
4 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago
add a comment |
What have you try to the moment?
– Romeo Ninov
4 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago
What have you try to the moment?
– Romeo Ninov
4 hours ago
What have you try to the moment?
– Romeo Ninov
4 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago
add a comment |
4 Answers
4
active
oldest
votes
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
add a comment |
It's a pretty straight-forward perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
add a comment |
Straightfoward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
add a comment |
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
Here, the string "pattern", given through the -I
option, is a placeholder for xargs
that it substitutes for each single line in its standard input.
The indirection with sh
is needed because xargs
only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs
does not handle things like command substitution or piping.
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Netzsooc is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492501%2fcount-lines-containing-word%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
add a comment |
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
add a comment |
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
answered 3 hours ago
steeldriver
34.5k35083
34.5k35083
add a comment |
add a comment |
It's a pretty straight-forward perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
add a comment |
It's a pretty straight-forward perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
add a comment |
It's a pretty straight-forward perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
It's a pretty straight-forward perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
answered 3 hours ago
Jeff Schaller
38.9k1053125
38.9k1053125
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
add a comment |
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
1
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
2 hours ago
add a comment |
Straightfoward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
add a comment |
Straightfoward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
add a comment |
Straightfoward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Straightfoward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%sn" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
answered 2 hours ago
glenn jackman
50.4k570107
50.4k570107
add a comment |
add a comment |
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
Here, the string "pattern", given through the -I
option, is a placeholder for xargs
that it substitutes for each single line in its standard input.
The indirection with sh
is needed because xargs
only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs
does not handle things like command substitution or piping.
add a comment |
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
Here, the string "pattern", given through the -I
option, is a placeholder for xargs
that it substitutes for each single line in its standard input.
The indirection with sh
is needed because xargs
only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs
does not handle things like command substitution or piping.
add a comment |
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
Here, the string "pattern", given through the -I
option, is a placeholder for xargs
that it substitutes for each single line in its standard input.
The indirection with sh
is needed because xargs
only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs
does not handle things like command substitution or piping.
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
Here, the string "pattern", given through the -I
option, is a placeholder for xargs
that it substitutes for each single line in its standard input.
The indirection with sh
is needed because xargs
only knows how to execute a single program, given it's name, passing everything else as arguments to it. xargs
does not handle things like command substitution or piping.
answered 2 hours ago
Larry
964
964
add a comment |
add a comment |
Netzsooc is a new contributor. Be nice, and check out our Code of Conduct.
Netzsooc is a new contributor. Be nice, and check out our Code of Conduct.
Netzsooc is a new contributor. Be nice, and check out our Code of Conduct.
Netzsooc is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492501%2fcount-lines-containing-word%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What have you try to the moment?
– Romeo Ninov
4 hours ago
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
3 hours ago