Class LongestCommonSubsequence

  • All Implemented Interfaces:
    SimilarityScore<java.lang.Integer>

    public class LongestCommonSubsequence
    extends java.lang.Object
    implements SimilarityScore<java.lang.Integer>
    A similarity algorithm indicating the length of the longest common subsequence between two strings.

    The Longest common subsequence algorithm returns the length of the longest subsequence that two strings have in common. Two strings that are entirely different, return a value of 0, and two strings that return a value of the commonly shared length implies that the strings are completely the same in value and position. Note. Generally this algorithm is fairly inefficient, as for length m, n of the input CharSequence's left and right respectively, the runtime of the algorithm is O(m*n).

    This implementation is based on the Longest Commons Substring algorithm from https://en.wikipedia.org/wiki/Longest_common_subsequence_problem.

    For further reading see:

    Lothaire, M. Applied combinatorics on words. New York: Cambridge U Press, 2005. 12-13

    Since:
    1.0
    • Method Summary

      All Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      java.lang.Integer apply​(java.lang.CharSequence left, java.lang.CharSequence right)
      Calculates longest common subsequence similarity score of two CharSequence's passed as input.
      java.lang.CharSequence logestCommonSubsequence​(java.lang.CharSequence left, java.lang.CharSequence right)
      Deprecated.
      Deprecated as of 1.2 due to a typo in the method name.
      java.lang.CharSequence longestCommonSubsequence​(java.lang.CharSequence left, java.lang.CharSequence right)
      Computes the longest common subsequence between the two CharSequence's passed as input.
      int[][] longestCommonSubstringLengthArray​(java.lang.CharSequence left, java.lang.CharSequence right)
      Computes the lcsLengthArray for the sake of doing the actual lcs calculation.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • LongestCommonSubsequence

        public LongestCommonSubsequence()
    • Method Detail

      • apply

        public java.lang.Integer apply​(java.lang.CharSequence left,
                                       java.lang.CharSequence right)
        Calculates longest common subsequence similarity score of two CharSequence's passed as input.
        Specified by:
        apply in interface SimilarityScore<java.lang.Integer>
        Parameters:
        left - first character sequence
        right - second character sequence
        Returns:
        longestCommonSubsequenceLength
        Throws:
        java.lang.IllegalArgumentException - if either String input null
      • logestCommonSubsequence

        @Deprecated
        public java.lang.CharSequence logestCommonSubsequence​(java.lang.CharSequence left,
                                                              java.lang.CharSequence right)
        Deprecated.
        Deprecated as of 1.2 due to a typo in the method name. Use longestCommonSubsequence(CharSequence, CharSequence) instead. This method will be removed in 2.0.
        Computes the longest common subsequence between the two CharSequence's passed as input.

        Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs and xyzghfm have both the same common substring and subsequence, namely xyz. However, axbyczqrs and abcxyzqtv have the longest common subsequence xyzq because a subsequence need not have adjacent characters.

        For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

        Parameters:
        left - first character sequence
        right - second character sequence
        Returns:
        The longest common subsequence found
        Throws:
        java.lang.IllegalArgumentException - if either String input null
      • longestCommonSubsequence

        public java.lang.CharSequence longestCommonSubsequence​(java.lang.CharSequence left,
                                                               java.lang.CharSequence right)
        Computes the longest common subsequence between the two CharSequence's passed as input.

        Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs and xyzghfm have both the same common substring and subsequence, namely xyz. However, axbyczqrs and abcxyzqtv have the longest common subsequence xyzq because a subsequence need not have adjacent characters.

        For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.

        Parameters:
        left - first character sequence
        right - second character sequence
        Returns:
        The longest common subsequence found
        Throws:
        java.lang.IllegalArgumentException - if either String input null
        Since:
        1.2
      • longestCommonSubstringLengthArray

        public int[][] longestCommonSubstringLengthArray​(java.lang.CharSequence left,
                                                         java.lang.CharSequence right)
        Computes the lcsLengthArray for the sake of doing the actual lcs calculation. This is the dynamic programming portion of the algorithm, and is the reason for the runtime complexity being O(m*n), where m=left.length() and n=right.length().
        Parameters:
        left - first character sequence
        right - second character sequence
        Returns:
        lcsLengthArray