Monday, April 17, 2023

String.replace and String.replaceAll are totally different beasts - and named quite badly

We start out with some innocent code that replaces a placeholder with a value:

    public static void main(String[] args) {
        String quote = "$villain$ has an eye.";
        String villain = "Sauron";
        System.out.println(quote.replace("$villain$", villain));
    }

We get this output:

Sauron has an eye.

All is well until we remember that there was also someone called the Mouth of Sauron, so we change our quote to:

        String quote = "$villain$ has an eye and $villain$ has a mouth.";

And since we now have multiple occurrences of $villain$, we'll also use replaceAll instead of replace, right? Like this:

    public static void main(String[] args) {
        String quote = "$villain$ has an eye and $villain$ has a mouth.";
        String villain = "Sauron";
        System.out.println(quote.replaceAll("$villain$", villain));
    }

Surprisingly, we get this output:

$villain$ has an eye and $villain$ has a mouth.

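This is because replaceAll expects a regular expression as its first parameter, and the dollar sign has a special meaning in regular expressions: it is an anchor that matches the end of the input (or the end of a line in MULTILINE mode). A pattern that demands end-of-input right before the letters "villain" can never match, so nothing gets replaced. Even worse, although the pair of names replace / replaceAll suggests that replace only replaces the first occurrence, this is not the case: replace substitutes every occurrence of a literal target. (The method that replaces only the first match is replaceFirst, and it takes a regular expression as well.) So the correct solution is to stick with replace: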

    public static void main(String[] args) {
        String quote = "$villain$ has an eye and $villain$ has a mouth.";
        String villain = "Sauron";
        System.out.println(quote.replace("$villain$", villain));
    }
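To make the difference concrete, here is a small self-contained sketch; the class name `ReplaceDemo` and its helper methods are hypothetical, chosen only for illustration. It shows that `replace` already handles every occurrence, that the unescaped `replaceAll` call changes nothing, that `replaceAll` behaves like `replace` once the pattern is escaped with `Pattern.quote` and the replacement with `Matcher.quoteReplacement`, and that `replaceFirst`, the method that really replaces just one match, is regex-based too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing replace, replaceAll and replaceFirst.
public class ReplaceDemo {

    static final String QUOTE = "$villain$ has an eye and $villain$ has a mouth.";

    // Literal target: replaces every occurrence.
    static String withReplace(String text, String value) {
        return text.replace("$villain$", value);
    }

    // Unescaped regex: "$" is an anchor, so "$villain$" can never match.
    static String withNaiveReplaceAll(String text, String value) {
        return text.replaceAll("$villain$", value);
    }

    // Quoting both the pattern and the replacement makes replaceAll act like replace.
    static String withQuotedReplaceAll(String text, String value) {
        return text.replaceAll(Pattern.quote("$villain$"), Matcher.quoteReplacement(value));
    }

    // replaceFirst is also regex-based and replaces only the first match.
    static String withQuotedReplaceFirst(String text, String value) {
        return text.replaceFirst(Pattern.quote("$villain$"), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        // Sauron has an eye and Sauron has a mouth.
        System.out.println(withReplace(QUOTE, "Sauron"));
        // $villain$ has an eye and $villain$ has a mouth.
        System.out.println(withNaiveReplaceAll(QUOTE, "Sauron"));
        // Sauron has an eye and Sauron has a mouth.
        System.out.println(withQuotedReplaceAll(QUOTE, "Sauron"));
        // Sauron has an eye and $villain$ has a mouth.
        System.out.println(withQuotedReplaceFirst(QUOTE, "Sauron"));
    }
}
```

Escaping the replacement matters as well: in replaceAll and replaceFirst a value such as "$1" would otherwise be read as a group reference, whereas replace always inserts it verbatim.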

Takeaways:
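  • This is a case of bad API design in Java's String class.
  • Always check whether the parameters of String methods are plain strings or are interpreted as regular expressions: replaceAll, replaceFirst, split and matches all take a regex, and replaceAll and replaceFirst also give $ and \ a special meaning in the replacement string.
    • There is also a possible performance penalty when using regular expressions, since replaceAll compiles its pattern anew on every call.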
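The last two takeaways can be illustrated with a short sketch; the class `RegexParamsDemo` and its methods are hypothetical names for illustration. It shows two other regex-taking String methods, split and matches, tripping over a metacharacter, and it assumes (as in the OpenJDK sources, where String.replaceAll is Pattern.compile(regex).matcher(this).replaceAll(replacement)) that precompiling a Pattern once is the way to avoid recompiling it for every call.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for String methods that silently take regexes.
public class RegexParamsDemo {

    // Compiled once and reused, instead of recompiling on every replaceAll call.
    private static final Pattern VILLAIN = Pattern.compile(Pattern.quote("$villain$"));

    static String fill(String template, String villain) {
        return VILLAIN.matcher(template).replaceAll(Matcher.quoteReplacement(villain));
    }

    // Naive: "." is a regex metacharacter that matches any character.
    static String[] splitOnDotNaive(String s) {
        return s.split(".");
    }

    // Quoting the delimiter splits on the literal dot.
    static String[] splitOnDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        // Every character is a delimiter, all tokens are empty, trailing empties are dropped.
        System.out.println(Arrays.toString(splitOnDotNaive("a.b.c")));  // []
        System.out.println(Arrays.toString(splitOnDot("a.b.c")));       // [a, b, c]

        // matches also takes a regex and must match the whole string.
        System.out.println("1+1".matches("1+1"));                       // false
        System.out.println("1+1".matches(Pattern.quote("1+1")));        // true

        System.out.println(fill("$villain$ has an eye and $villain$ has a mouth.", "Sauron"));
    }
}
```

Whether the precompiled Pattern is worth it depends on how often the replacement runs; for a one-off call, plain replace is both simpler and safer.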