Count lines containing word

I have a file with multiple lines. For each word that appears anywhere in the file, I want to know how many lines contain that word. For example:



0 hello world the man is world
1 this is the world
2 a different man is the possible one


The result I'm expecting is:



0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2


Note that the count for "world" is 2, not 3, because the word appears on only 2 lines (it occurs twice on the first line). Because of this, simply translating blanks to newlines and counting the resulting words would not give the exact result.

  • What have you tried so far?
    – Romeo Ninov
    4 hours ago

  • This seems highly relevant: unix.stackexchange.com/a/332890/224077
    – Panki
    3 hours ago

edited 1 hour ago by Jeff Schaller
asked 4 hours ago by Netzsooc
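
To see the difference that the note in the question describes, here is a small sketch that recreates the sample input (saved as `file`, a name assumed here to match the answers) and compares the number of occurrences of "world" with the number of lines that contain it:

```shell
#!/bin/sh
# Recreate the sample input from the question (the file name "file" is an assumption).
printf '%s\n' '0 hello world the man is world' \
  '1 this is the world' \
  '2 a different man is the possible one' > file

# One word per line, then count matches: every occurrence counts, so this prints 3.
tr ' ' '\n' < file | grep -c '^world$'

# grep -c counts matching lines and -w requires a whole-word match, so this prints 2.
grep -cw world file
```

So `grep -cw WORD file` already gives the wanted number for a single word; the question asks for that count for every word in the file at once.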

4 Answers

Another Perl variant, using List::Util:

$ perl -MList::Util=uniq -alne '
  map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2

answered 3 hours ago by steeldriver

It's a pretty straightforward Perl script:

#!/usr/bin/perl -w
use strict;

my %words = ();                  # word => number of lines containing it
while (<>) {
    chomp;
    my %linewords = ();          # distinct words on the current line
    map { $linewords{$_}=1 } split / /;
    foreach my $word (keys %linewords) {
        $words{$word}++;
    }
}

foreach my $word (sort keys %words) {
    print "$word:$words{$word}\n";
}

The basic idea is to loop over the input; for each line, split it into words and store them as keys of a hash (associative array), which removes any duplicates; then loop over the keys of that hash and add one to an overall counter for each word. At the end, report the words and their counts.

answered 3 hours ago by Jeff Schaller

  • A slight problem with this, in my opinion, is that it does not respect the usual definition of a word, since it splits on a single space character. If two spaces appeared somewhere, the empty string in between would be counted as a word as well, if I'm not mistaken, let alone cases where words are separated by punctuation. Of course, the question does not specify whether "word" means the programmer's concept of a "word" or a word of a natural language.
    – Larry
    2 hours ago
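
The per-line de-duplication described in the Perl answer can also be sketched in awk. Unlike `split / /`, awk's default field splitting treats any run of blanks and tabs as a single separator and ignores leading and trailing blanks, so doubled spaces do not create an empty "word" (punctuation is still treated as part of a word). This is a minimal sketch assuming a POSIX awk; `count_lines_per_word` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: for each word, count the number of lines containing it.
count_lines_per_word() {
  awk '
    {
      split("", seen)          # empty the per-line set (portable idiom)
      for (i = 1; i <= NF; i++)
        if (!seen[$i]++)       # true only for the first occurrence on this line
          count[$i]++
    }
    END { for (w in count) print w ":" count[w] }
  ' "$@" | LC_ALL=C sort
}

# Sample input from the question; line 2 deliberately contains a doubled space.
printf '%s\n' '0 hello world the man is world' \
  '1 this is  the world' \
  '2 a different man is the possible one' > file

count_lines_per_word file
```

`for (w in count)` visits keys in an unspecified order, which is why the output is piped through `LC_ALL=C sort`; with the sample input this reproduces the expected output from the question, despite the doubled space.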

Straightforward-ish in bash:

declare -A wordcount
while read -ra words; do
    # unique words on this line
    declare -A uniq
    for word in "${words[@]}"; do
        uniq[$word]=1
    done
    # accumulate the words
    for word in "${!uniq[@]}"; do
        ((wordcount[$word]++))
    done
    unset uniq
done < file

Looking at the data:

$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'

and formatting as you want:

$ printf "%s\n" "${!wordcount[@]}" | sort | while IFS= read -r key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2

answered 2 hours ago by glenn jackman
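
The bash version does not suffer from the empty-word problem raised in the comment on the Perl answer: `read -ra` splits the line using `IFS`, and with the default `IFS` a run of spaces or tabs counts as one separator, while leading and trailing blanks are dropped. Note also that associative arrays (`declare -A`) require bash 4.0 or later. A quick check, assuming it is run by bash:

```shell
#!/usr/bin/env bash
# With the default IFS, runs of blanks act as one separator and
# leading/trailing blanks are dropped, so no empty elements appear.
read -ra words <<< '  this is   the  world  '
echo "${#words[@]}"            # prints 4
printf '[%s]\n' "${words[@]}"  # prints [this], [is], [the], [world] on separate lines

# Associative arrays need bash 4.0+; BASH_VERSINFO[0] is the major version.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "bash ${BASH_VERSION} has no associative arrays" >&2
fi
```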

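A solution that calls several programs from a shell:

fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'printf "%s:%s\n" "$1" "$(grep -cwF -- "$1" words.txt)"' sh pattern

Here, the string "pattern", given through the -I option, is a placeholder that xargs replaces with each line of its standard input. The word is handed to sh as $1 instead of being pasted into the script text, so quotes or other shell metacharacters in the input cannot break the command, and grep's -F option matches it as a fixed string rather than as a regular expression. Because grep -c counts matching lines, a word that occurs twice on one line is still counted once.

The indirection through sh is needed because xargs only knows how to execute a single program, given its name, passing everything else as arguments to it. xargs does not handle things like command substitution or piping.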

answered 2 hours ago by Larry
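
The two xargs mechanisms described in the last answer can be seen in isolation. With -I, xargs runs the command once per input line and replaces the placeholder wherever it appears in the arguments; a command substitution written directly on the xargs command line, by contrast, would be expanded once by the calling shell before xargs even starts, so it has to live inside the `sh -c` script. A small sketch (the words and the `tr` call are only illustrative):

```shell
#!/bin/sh
# -I: one command per input line; "pattern" in the arguments is replaced by the line.
# Prints "word=hello" and "word=world".
printf '%s\n' hello world | xargs -Ipattern echo "word=pattern"

# A per-line command substitution needs its own shell. The line is passed as $1
# (with "sh" as $0) instead of being pasted into the script text.
# Prints "hello -> HELLO" and "world -> WORLD".
printf '%s\n' hello world |
  xargs -Ipattern sh -c 'echo "$1 -> $(printf %s "$1" | tr a-z A-Z)"' sh pattern
```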