Knuth–Morris–Pratt (KMP) Pattern Matching Substring Search - First Occurrence Of Substring
Free 5-Day Mini-Course: https://backtobackswe.com
Try Our Full Platform: https://backtobackswe.com/pricing
📹 Intuitive Video Explanations
🏃 Run Code As You Learn
💾 Save Progress
❓New Unseen Questions
🔎 Get All Solutions
Subscribe To Live Tech Offers: https://offerfeed.io
Join Our Coaching Service: https://backtobackswe.com/coaching
Question: Given a string s and a pattern p, determine if the pattern exists in the string. Return the index of where the first occurrence starts.
Further Explanation From Tuschar Roy: https://www.youtube.com/watch?v=GTJr8OvyEVQ
The Brute Force
The naive approach to solving this is looking in s for the first character in p.
If a match is found we begin a search from that index, call it i (for intersect).
We compare the 2nd character of p with index i + 1 of s
We compare the 3rd character of p with index i + 2 of s
...and so on until we have matched to all of p without either having overrun s or having found a mismatch between characters being compared.
We can then return i as our answer.
It doesn’t work well in cases where we see many matching characters followed by a mismatching character.
Complexities:
Time: O( len(s) * len(p) )
In a simple worst case we can have
s = "aaaaaab"
p = "aaab"
The problem is that for each first character match we have the potential to naively go into a search on a string that would never yield a correct answer repeatedly.
Other Algorithms
There are three linear time string matching algorithms: KMP (nuth–Morris–Pratt), Boyer-Moore, and Rabin-Karp.
Of these, Rabin-Karp is by far the simplest to understand and implement
Analysis
The time complexity of the KMP algorithm is O(len(s) + len(p)) "linear" in the worst case.
The key behind KMP is that it takes advantage of the succesful character checks during an unsuccessful pattern comparison subroutine.
We may have a series of many comparisons that succeed and then even if one fails at the end, we should not repeat the comparison work done since we already saw that a series matched.
What we will do is very similar to the naive algorithm, it is just that we save comparisons by tracking the longest propert prefixes of pattern that are also suffixes.
The key is that everytime we have a mismatch we try our best to prevent going backwards in s and repeating comparisons.
Algorithm Steps
We will preprocess the pattern string and create an array that indicates the longest proper prefix which is also suffix at each point in the pattern string.
A proper prefix does not include the original string.
For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”.
For example, suffixes of "ABC" are, "", "C", "BC", and "ABC". Proper prefixes are "", "C", and "BC".
Why do we care about these??
We know all characters behind our mismatch character match.
If we can find the length of the longest prefix that matches a suffix to that point, we can skip len(prefix) comparisons at the beginning.
The key reason we care about the prefix to suffix is because we want to "teleport" back to as early in the string to the point that we still know that there is a match.
Our goal is to minimize going backwards in our search string.
Complexities:
Time: O( len(p) + len(s) )
We spend len(p) time to build the prefix-suffix table and we spend len(s) time for the traversal on average.
Space: O( len(p) )
Our prefix-suffix table is going to be the length of the pattern string.
++++++++++++++++++++++++++++++++++++++++++++++++++
HackerRank: https://www.youtube.com/channel/UCOf7UPMHBjAavgD0Qw5q5ww
Tuschar Roy: https://www.youtube.com/user/tusharroy2525
GeeksForGeeks: https://www.youtube.com/channel/UC0RhatS1pyxInC00YKjjBqQ
Jarvis Johnson: https://www.youtube.com/user/VSympathyV
Success In Tech: https://www.youtube.com/channel/UC-vYrOAmtrx9sBzJAf3x_xw
++++++++++++++++++++++++++++++++++++++++++++++++++
This question is number 7.13 in "Elements of Programming Interviews" by Adnan Aziz (they do Rabin-Karp but same problem, different algorithm)
Видео Knuth–Morris–Pratt (KMP) Pattern Matching Substring Search - First Occurrence Of Substring канала Back To Back SWE
Try Our Full Platform: https://backtobackswe.com/pricing
📹 Intuitive Video Explanations
🏃 Run Code As You Learn
💾 Save Progress
❓New Unseen Questions
🔎 Get All Solutions
Subscribe To Live Tech Offers: https://offerfeed.io
Join Our Coaching Service: https://backtobackswe.com/coaching
Question: Given a string s and a pattern p, determine if the pattern exists in the string. Return the index of where the first occurrence starts.
Further Explanation From Tuschar Roy: https://www.youtube.com/watch?v=GTJr8OvyEVQ
The Brute Force
The naive approach to solving this is looking in s for the first character in p.
If a match is found we begin a search from that index, call it i (for intersect).
We compare the 2nd character of p with index i + 1 of s
We compare the 3rd character of p with index i + 2 of s
...and so on until we have matched to all of p without either having overrun s or having found a mismatch between characters being compared.
We can then return i as our answer.
It doesn’t work well in cases where we see many matching characters followed by a mismatching character.
Complexities:
Time: O( len(s) * len(p) )
In a simple worst case we can have
s = "aaaaaab"
p = "aaab"
The problem is that for each first character match we have the potential to naively go into a search on a string that would never yield a correct answer repeatedly.
Other Algorithms
There are three linear time string matching algorithms: KMP (nuth–Morris–Pratt), Boyer-Moore, and Rabin-Karp.
Of these, Rabin-Karp is by far the simplest to understand and implement
Analysis
The time complexity of the KMP algorithm is O(len(s) + len(p)) "linear" in the worst case.
The key behind KMP is that it takes advantage of the succesful character checks during an unsuccessful pattern comparison subroutine.
We may have a series of many comparisons that succeed and then even if one fails at the end, we should not repeat the comparison work done since we already saw that a series matched.
What we will do is very similar to the naive algorithm, it is just that we save comparisons by tracking the longest propert prefixes of pattern that are also suffixes.
The key is that everytime we have a mismatch we try our best to prevent going backwards in s and repeating comparisons.
Algorithm Steps
We will preprocess the pattern string and create an array that indicates the longest proper prefix which is also suffix at each point in the pattern string.
A proper prefix does not include the original string.
For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”.
For example, suffixes of "ABC" are, "", "C", "BC", and "ABC". Proper prefixes are "", "C", and "BC".
Why do we care about these??
We know all characters behind our mismatch character match.
If we can find the length of the longest prefix that matches a suffix to that point, we can skip len(prefix) comparisons at the beginning.
The key reason we care about the prefix to suffix is because we want to "teleport" back to as early in the string to the point that we still know that there is a match.
Our goal is to minimize going backwards in our search string.
Complexities:
Time: O( len(p) + len(s) )
We spend len(p) time to build the prefix-suffix table and we spend len(s) time for the traversal on average.
Space: O( len(p) )
Our prefix-suffix table is going to be the length of the pattern string.
++++++++++++++++++++++++++++++++++++++++++++++++++
HackerRank: https://www.youtube.com/channel/UCOf7UPMHBjAavgD0Qw5q5ww
Tuschar Roy: https://www.youtube.com/user/tusharroy2525
GeeksForGeeks: https://www.youtube.com/channel/UC0RhatS1pyxInC00YKjjBqQ
Jarvis Johnson: https://www.youtube.com/user/VSympathyV
Success In Tech: https://www.youtube.com/channel/UC-vYrOAmtrx9sBzJAf3x_xw
++++++++++++++++++++++++++++++++++++++++++++++++++
This question is number 7.13 in "Elements of Programming Interviews" by Adnan Aziz (they do Rabin-Karp but same problem, different algorithm)
Видео Knuth–Morris–Pratt (KMP) Pattern Matching Substring Search - First Occurrence Of Substring канала Back To Back SWE
Показать
Комментарии отсутствуют
Информация о видео
Другие видео канала
![9.1 Knuth-Morris-Pratt KMP String Matching Algorithm](https://i.ytimg.com/vi/V5-7GzOfADQ/default.jpg)
![](https://i.ytimg.com/vi/7yC1aN87Ydo/default.jpg)
![Knuth–Morris–Pratt(KMP) Pattern Matching(Substring search)](https://i.ytimg.com/vi/GTJr8OvyEVQ/default.jpg)
![Minimum Window Substring: Utilizing Two Pointers & Tracking Character Mappings With A Hashtable](https://i.ytimg.com/vi/eS6PZLjoaq8/default.jpg)
![The 5 String Interview Patterns You Need to Know](https://i.ytimg.com/vi/9clnwaqOU2E/default.jpg)
![Add Two Numbers Without The "+" Sign (Bit Shifting Basics)](https://i.ytimg.com/vi/qq64FrA2UXQ/default.jpg)
![All Nodes Distance K In A Binary Tree - Performing Bidirectional Search On A Tree Using A Hashtable](https://i.ytimg.com/vi/nPtARJ2cYrg/default.jpg)
![Knuth-Morris-Pratt (KMP) algorithm | String Matching Algorithm | Substring Search](https://i.ytimg.com/vi/4jY57Ehc14Y/default.jpg)
![Total Occurrences Of K In A Sorted Array (Facebook Software Engineering Interview Question)](https://i.ytimg.com/vi/RlXtTF34nnE/default.jpg)
![Sliding Window Technique - Algorithmic Mental Models](https://i.ytimg.com/vi/MK-NZ4hN7rs/default.jpg)
![Max Contiguous Subarray Sum - Cubic Time To Kadane's Algorithm ("Maximum Subarray" on LeetCode)](https://i.ytimg.com/vi/2MmGzdiKR9Y/default.jpg)
![Search A 2D Sorted Matrix - Fundamentals of Search Space Reduction](https://i.ytimg.com/vi/FOa55B9Ikfg/default.jpg)
![This Makes Him Fight For You & Not Take You For Granted](https://i.ytimg.com/vi/aZJE0NrE7n0/default.jpg)
![What Is Asymptotic Analysis? And Why Does It Matter? A Deeper Understanding of Asymptotic Bounding.](https://i.ytimg.com/vi/myZKhztFhzE/default.jpg)
![7 things I wish I knew at 20](https://i.ytimg.com/vi/lUqIPo4DDHE/default.jpg)
![Longest Common Subsequence (2 Strings) - Dynamic Programming & Competing Subproblems](https://i.ytimg.com/vi/ASoaQq66foQ/default.jpg)
![KMP字符串匹配算法1](https://i.ytimg.com/vi/dgPabAsTFa8/default.jpg)
![Serialize & Deserialize A Binary Tree - Crafting Recursive Solutions To Interview Problems](https://i.ytimg.com/vi/suj1ro8TIVY/default.jpg)
![Knuth Morris Pratt (KMP) String Search Algorithm - tutorial with failure function in Java](https://i.ytimg.com/vi/EL4ZbRF587g/default.jpg)
![Edit Distance Between 2 Strings - The Levenshtein Distance ("Edit Distance" on LeetCode)](https://i.ytimg.com/vi/MiqoA-yF-0M/default.jpg)