
How do I split a String into individual words in Java?




Hello everyone! I have the following task: split a string into words, store them in an array, then compare every element of the array with every other element (that is, each word with each other word) and delete the ones that match. I recently read about equals(), which does a great job of comparing strings, but for some reason it doesn't work on an array. I started learning Java recently, so please don't judge the code too harshly. Thank you all!

public static void main(String[] args) {
  String b = "Hello Hello Hello";
  String s[] = b.split(" ");
  int i;
  for (i = 0; i < s.length; i++) {
    if (s[i].equals(s[i + 1])) {
      System.out.println(s[i]);
    }
  }
}

Answer 1, authority 100%

If you use the Stream API, the problem can be solved even more easily:

String s = ...
Stream.of(s.split("[^A-Za-z]+"))
  .map(String::toLowerCase)
  .distinct().sorted()
  .forEach(System.out::println);
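For anyone who wants to run this as a whole program, here is a minimal sketch of the same pipeline. The class name `StreamWords`, the method name `uniqueSortedWords`, and the sample input are made up for illustration, and the `filter` step is an addition that is not in the snippet above: `split` returns an empty first element when the text starts with a delimiter, so it seemed safer to drop empty strings.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamWords {
    // Splits on runs of non-letter characters, lower-cases each word,
    // removes duplicates and sorts the result alphabetically.
    static List<String> uniqueSortedWords(String text) {
        return Stream.of(text.split("[^A-Za-z]+"))
                .filter(w -> !w.isEmpty())   // split() yields a leading "" if the text starts with a delimiter
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Several spaces and punctuation marks in a row are treated as one delimiter.
        System.out.println(uniqueSortedWords("Hello  hello, World! hello")); // prints [hello, world]
    }
}
```

Note that this pattern only recognises ASCII letters; words in other alphabets would need a different character class.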

Answer 2, authority 50%

Your version does not handle multiple consecutive spaces. A regular expression is a better fit here: extract all the words and then put them into a SortedSet. Why a SortedSet? First, it does not allow duplicates, and second, it keeps the words sorted in ascending order, which makes the result easier to check.

import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class WordParser {
  private static final String EXAMPLE_TEST =
      "There is grass in the yard, firewood on the grass. Don't chop wood on the grass in the yard!";
  public static void main(String[] args) {
    Pattern pattern =
        Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS
            | Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(EXAMPLE_TEST);
    SortedSet<String> words = new TreeSet<>();
    while (matcher.find())
      words.add(matcher.group().toLowerCase());
    for (String word : words)
      System.out.println("word = " + word);
  }
}

"\\w+" is a pattern that matches runs of word characters (letters, digits, and underscores), so only words are found and punctuation and other symbols are skipped.

Pattern.UNICODE_CHARACTER_CLASS makes predefined character classes such as \w follow Unicode rules, so words are found in non-Latin alphabets as well, not only in ASCII text. (To be honest, I don't know how well this works for finding words in Asian languages such as Chinese, Korean, or Japanese.)
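To see what the flag changes, here is a small sketch that runs the same pattern with and without it. The class name `UnicodeFlagDemo`, the helper `findWords`, and the sample text are invented for this illustration. The Russian word is written with Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    // Collects every match of \w+ in the text, compiled with the given flags.
    static List<String> findWords(String text, int flags) {
        Matcher matcher = Pattern.compile("\\w+", flags).matcher(text);
        List<String> words = new ArrayList<>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // One Cyrillic word (written as escapes) followed by one English word.
        String text = "\u0442\u0440\u0430\u0432\u0430 grass";

        // Without the flag, \w covers only [a-zA-Z_0-9]: the Cyrillic word is skipped.
        System.out.println(findWords(text, 0).size());                               // prints 1

        // With the flag, \w also covers Unicode letters: both words are found.
        System.out.println(findWords(text, Pattern.UNICODE_CHARACTER_CLASS).size()); // prints 2
    }
}
```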

This is what the class prints when run:

word = chop
word = don
word = firewood
word = grass
word = in
word = is
word = on
word = t
word = the
word = there
word = wood
word = yard
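If the original order of the words matters (the question asks only to drop the repeats, not to sort them), one possible variation is to split on runs of whitespace and collect the words into a LinkedHashSet. This is just a sketch under that assumption: the class name `DistinctWords` and the method name `distinctInOrder` are invented here, and unlike the examples above it is case-sensitive and does not strip punctuation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctWords {
    // Splits on runs of whitespace and keeps the first occurrence of each word,
    // preserving the order in which the words appeared.
    static List<String> distinctInOrder(String text) {
        String[] parts = text.trim().split("\\s+");
        Set<String> seen = new LinkedHashSet<>();
        for (String part : parts) {
            if (!part.isEmpty()) {   // an empty input still produces one "" element
                seen.add(part);      // add() ignores words that are already present (compared with equals)
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder("Hello   world Hello  Java world")); // prints [Hello, world, Java]
    }
}
```

The set does the "compare each word with each" work internally, so no nested loop over the array is needed.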
