Class JaroWinklerSimilarity

  • All Implemented Interfaces:
    SimilarityScore<java.lang.Double>

    public class JaroWinklerSimilarity
    extends java.lang.Object
    implements SimilarityScore<java.lang.Double>
    A similarity algorithm indicating the percentage of matched characters between two character sequences.

    The Jaro measure is the weighted sum of the percentage of matched characters in each sequence and the percentage of matched characters that are not transposed. Winkler's modification raises this measure for sequences whose initial characters match.

    This implementation is based on the Jaro Winkler similarity algorithm from http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.

    This code has been adapted from Apache Commons Lang 3.3.

    Since:
    1.7
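
The measure described in the class description can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the Apache Commons Text source: the class name `JaroWinklerSketch`, the method names `jaro` and `jaroWinkler`, and the constants (scaling factor 0.1, prefix cap of 4, boost threshold 0.7) are conventional choices assumed here and are not taken from this page.

```java
import java.util.Locale;

/** Illustrative sketch only; names and constants are assumptions, not the library's code. */
final class JaroWinklerSketch {

    // Assumed, conventional constants (not stated in the class documentation).
    private static final double PREFIX_SCALE = 0.1;    // Winkler scaling factor
    private static final int MAX_PREFIX = 4;           // longest prefix that earns a bonus
    private static final double BOOST_THRESHOLD = 0.7; // Jaro score required before the bonus applies

    private JaroWinklerSketch() { }

    /** Plain Jaro similarity in the range [0, 1]. */
    static double jaro(CharSequence a, CharSequence b) {
        int la = a.length();
        int lb = b.length();
        if (la == 0 && lb == 0) {
            return 1.0;
        }
        if (la == 0 || lb == 0) {
            return 0.0;
        }
        // Characters only count as matched if they lie within this distance of each other.
        int window = Math.max(Math.max(la, lb) / 2 - 1, 0);
        boolean[] usedA = new boolean[la];
        boolean[] usedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lb, i + window + 1);
            for (int j = lo; j < hi; j++) {
                if (!usedB[j] && a.charAt(i) == b.charAt(j)) {
                    usedA[i] = true;
                    usedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk the matched characters of both sequences in order; each
        // position where they differ is half of a transposition.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < la; i++) {
            if (!usedA[i]) {
                continue;
            }
            while (!usedB[k]) {
                k++;
            }
            if (a.charAt(i) != b.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0;
        return (m / la + m / lb + (m - t) / m) / 3.0;
    }

    /** Jaro similarity with Winkler's bonus for a shared prefix. */
    static double jaroWinkler(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        double j = jaro(left, right);
        if (j < BOOST_THRESHOLD) {
            return j;
        }
        int limit = Math.min(MAX_PREFIX, Math.min(left.length(), right.length()));
        int prefix = 0;
        while (prefix < limit && left.charAt(prefix) == right.charAt(prefix)) {
            prefix++;
        }
        return j + PREFIX_SCALE * prefix * (1.0 - j);
    }

    public static void main(String[] args) {
        String[][] pairs = { {"foo", "foo "}, {"hello", "hallo"}, {"fly", "ant"} };
        for (String[] p : pairs) {
            System.out.println(String.format(Locale.ROOT, "\"%s\" vs \"%s\" -> %.2f",
                    p[0], p[1], jaroWinkler(p[0], p[1])));
        }
    }
}
```

For the three pairs in `main`, the sketch prints 0.94, 0.88 and 0.00, which agree with the corresponding examples listed for `apply`. Other inputs may differ from the library's results where the two implementations choose matches differently.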
    • Method Summary

      Modifier and Type    Method and Description
      java.lang.Double     apply(java.lang.CharSequence left, java.lang.CharSequence right)
                           Computes the Jaro Winkler Similarity between two character sequences.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • JaroWinklerSimilarity

        public JaroWinklerSimilarity()
    • Method Detail

      • apply

        public java.lang.Double apply(java.lang.CharSequence left,
                                      java.lang.CharSequence right)
        Computes the Jaro Winkler Similarity between two character sequences. Examples (numeric results are shown to two decimal places):
         sim.apply(null, null)          = IllegalArgumentException
         sim.apply("foo", null)         = IllegalArgumentException
         sim.apply(null, "foo")         = IllegalArgumentException
         sim.apply("", "")              = 1.0
         sim.apply("foo", "foo")        = 1.0
         sim.apply("foo", "foo ")       = 0.94
         sim.apply("foo", "foo  ")      = 0.91
         sim.apply("foo", " foo ")      = 0.87
         sim.apply("foo", "  foo")      = 0.51
         sim.apply("", "a")             = 0.0
         sim.apply("aaapppp", "")       = 0.0
         sim.apply("frog", "fog")       = 0.93
         sim.apply("fly", "ant")        = 0.0
         sim.apply("elephant", "hippo") = 0.44
         sim.apply("hippo", "elephant") = 0.44
         sim.apply("hippo", "zzzzzzzz") = 0.0
         sim.apply("hello", "hallo")    = 0.88
         sim.apply("ABC Corporation", "ABC Corp") = 0.91
         sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
         sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
         sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
         
        Specified by:
        apply in interface SimilarityScore<java.lang.Double>
        Parameters:
        left - the first CharSequence, must not be null
        right - the second CharSequence, must not be null
        Returns:
        the Jaro Winkler similarity of the two sequences, from 0.0 (no matching characters) to 1.0 (identical sequences)
        Throws:
        java.lang.IllegalArgumentException - if either CharSequence input is null