1. Introduction
In today's article, You will learn What is the "Run Length Encoding" technique which is heavily used in string data compression. This is an experienced java interview question for 4-6 years developers. Earlier days this is used to compress the black and white photos.
Run Length Encoding is abbreviated as "RLE".
Run-length encoding (RLE) is a form of lossless data compression in which runs/flows of data are stored as a single data value and count, rather than as the original run.
This works for only sequences in which the same data value occurs in many consecutive data elements.
Examples:
Input: aaabbccccd
Output:a3b2c4d1
Input: xxxyyziiii
Output: x3y2z1i4
Input: kkkkkkiiiiiiuuuuasdfgllll
Output: k6i6u4a1s1d1f1l4
All of the above examples are valid. If the string "aaabba" is invalid because character 'a' is repeated in non-consecutive which is appeared after character 'b'. If the string is invalid then it may produce unexpected results. Please keep in mind this note before implementing it in real-time applications.
2. Algorithm
Now it's time to think about the algorithm that produces the desired output.
Follow these steps to learn and understand the algorithm.
A) Create the output string which is empty and count variable which value is 0 for now.
B) Take the first character and compare it with the next character. If matches, increment count by 1. For each successful match, the count will be incremented by 1. Then take the next character and do compare until it finds no match.
C) Once you find the no match, add the current character and count to the output string.
D) Take the next unmatched character and repeat steps B and C.
F) After completing all characters finally, the output string will have the compressed string.
3. Java Program on String Compression using Run Length Encoding
package com.java.w3schools.blog.java.programs;
public class StringCompression {
public static void main(String[] args) {
String inputString = "aaabbccccd";
String outputString = "";
int count = 1;
for (int i = 0; i < inputString.length(); i++) {
// resetting to 1 for every new character (counting current character).
count = 1;
while (i < inputString.length() - 1 && inputString.charAt(i) == inputString.charAt(i + 1)) {
count++;
i++;
}
outputString = outputString + inputString.charAt(i) + count;
}
System.out.println("Input data string : " + inputString);
System.out.println("Output data string after applying data compression technique : " + outputString);
}
}
Output:
Input data string : aaabbccccd
Output data string after applying data compression technique : a3b2c4d1
This works well for longer inputs.
Input data string : jjjjjjjjjdddddddddiiiuusskkpppuuutttrewqnbhyj
Output data string after applying data compression technique : j9d9i3u2s2k2p3u3t3r1e1w1q1n1b1h1y1j1
4. Time Complexity
This program runs a loop from index 0 to length on the input string. You have seen that the above program has one for loop and nested while loop. Looks time complexity O(n2) but it is not. Actually, it is O(n) only. The reason behind this is the outer loop run for unique character and inner loop for repeated characters.
So in our example, input string "aaabbccccd" outer for loop runs for the character a,b,c,d and inner while loop runs for only repeated characters such as "aa b ccc".
Outer loop: 4 times
Inner loop: 6 times
Input String length: 10
input length = outer loop length + inner loop length. So it is O(n).
5. Conclusion
In this article, you have learned how to use the "Run Length Encoding" compression technique on a string. But this compression is applicable only for the inputs which as many consecutive data elements.
Share this article with your friends and post your questions to me.
0 Comments