Tuy nhiên, kỹ thuật giấu này tạo ra một sự phân bố bất thường trên biểu đồ tần suất RL sau khi được giấu thông tin. Trong bài báo này đưa ra kỹ thuật phát hiện ảnh nhị có giấu tin sử dụng kỹ thuật giấu RL này.
Dưới đây là bài báo khoa học “Phát hiện ảnh nhị phân có giấu tin sử dụng kỹ thuật giấu Run - Length” của ThS. Hồ Thị Hương Thơm - Khoa Công Nghệ Thông Tin, Đại học Dân lập Hải Phòng
In 2008, Guorong Xuan et al proposed a reversible binary image data hiding scheme using run-length (RL) histogram modification at the 19th International Conference on Pattern Recognition (ICPR 2008) . The binary image is scanned from left to right and from top to bottom to create a sequence of black RL and white RL. Concatenating one black RL and its next white RL to form one RL couple, thus generating a sequence of RL couples. Using two thresholds T1 and T to embed secret message bits: the threshold T1 indicates only RL couples which their length is not shorter than T1 for embedding; the threshold T indicates where to embed data in black RL histogram. The capacity of payload of the scheme depends on the selected thresholds T and T1. However, the technique creates an informal distribution in the black RL histogram when the secret massage is hidden in a cover binary image. In this paper, we propose the steganalytic schemes to detect stego-image and estimate length of the embedded secret message approximately for the steganography.
Recently, reversible data hiding schemes have been interested. Reversible data hiding, or so-called lossless data hiding, invertible data hiding, is a kind of fragile watermarking; this enables exact recovery of the original image from the watermarked image after secret message removal. There are such many schemes proposed for both color images and grayscale images [1, 2, 3, 4] and for binary images such as [5, 6, 7].
In 2008, Xuan et al introduced an approach to reversible binary data hiding that can be applied to all types of binary images: text, graph, and their mixture, halftone and non-halftone using run-length histogram modification (we denote RL scheme). In this approach, the cover binary image is scanned, RLs of the image are derived; the RL histogram is modified to embed message bits reversibly. In other words, only the black RL histogram is applied to embed message. The isolated white pixels in the cover image, that may break reversibility, is changed suitably.
However, this scheme forms an informal distribution in the black RL histogram of the cover binary image after embedding secret message bits. Thus, we can easily detect a stego binary image using the RL scheme based on analyzing difference of RL histogram between a binary cover image and a binary stego image.
In this paper, we present a steganalytic technique for the RL scheme. In addition, we also give an expression to estimate length of embedded message bits approximately. The rest of this paper is organized as follows. In Section 2, we describe again the details of Xuan’s steganography. We present our proposed steganalytic methods in Section 3. Our experimental results and conclusion are given in Section 4 and 5, respectively.
2. The RL coding
In this subsection, we give summary of Xuan and his colleague’s binary image data hiding scheme which was proposed in 2008 .
2.1. Data embedding algorithm
Assume there are M bits which are supposed to be embedded into black RL histogram. The two threshold T1 and T are known, in , they chose T1=3, T=1. There, RL stands for run-length, WRL for white RL, and BRL for black RL. Embedding the data following the steps:
(1). The binary image is scanned from left to right and from top to bottom to create a sequence of black RL and white RL. Combining one black RL and its immediate next white RL to form one RL couple. Thus, generating a sequence of RL couples.
(2). Checking all RL couples, only RL couples will be used for embedding if the sum of BRL and WRL within the RL couple is equal to or greater than T1. Eliminating all RL couple with isolated WRL (length of WRL equal to 1) by increasing WRL+1 and BRL – 1. The change needs to be recorded as bookkeeping for future original image recovery. Then, change all RL couple with BRL ≥ T to form a BRL pair [h(T), h(T+1)=0], this make histogram of BRL of T+1 is empty.
(3). with the histogram – pair created, it is now ready for embedding data M. We scan the binary image for all RL couples with their BRL equaling T and their length greater than or equal to T1, when we meet the first RL couple satisfying, examining the to – be – embedded bit from M, if it is ‘1’, we change the pixel at location T+1 of the BRL from ‘0’ to ‘1’ in order to change length of the BRL from T to T+1, if the to-be-embedded bit is ‘0’, we do not change anything.
(4). If all the data are embedded then stop here, we obtain the stego binary image. Otherwise, we go to next RL couple to embed the remaining to-be-embedded data.
2.2. Data extracting algorithm
Data extraction is the reverse process of data embedding. Assume T1 equals to 3. The data extraction process is performed as follows:
(1) Using the same sequencing as used in data embedding to check each pixel.
(2) Scanning all RL couples with their length greater than or equal to T1, when a BRL of 1 is encountered, a bit 0 is extracted, when a BRL of 2 is met, a bit 1 is extracted and the BRL of 2 is changed back to BRL of 1 to recover original BRL of cover image.
(3) Using bookkeeping data for recovering changed pixels in step 2 of the data embedding process to recover isolated white pixels.
The block diagram of the RL scheme is shown in Fig. 1. As discussed in  T1 and T parameters can be selected via computational search for the maximum embedding capacity.
Fig. 1. The flowchart of the RL encoding scheme .
3. Proposed steganalytic method
After embedding a same message into a set of original binary image using RL scheme to get a set of stego image, we calculate again the histogram of BRLs for each image in the original binary image set and the stego binary image set, we found out that the RL method changed natural of the BRL histogram of typical binary image significantly as the following example:
We also use a binary Baboon image of 512 x 512 pixels (Fig.2. (a)) like the example in  to illustrate. We use two message sequences of 128 bits and 7696 bits to embed into the Baboon image, we obtained two stego images, the stego 128 Baboon image (Fig. 2. (b)) and the stego 7696 Baboon image (Fig. 2. (c)), respectively.
Fig. 2. Binary Baboon image of 512 x 512 pixels: (a) Original, (b) 128 bits embedded, (c) 7696 bits embedded
We build the BRL histograms of the three images, resulting are shown in Fig. 3. From Fig. 3. (a), we found out that the BRL histogram of 1 (denote h1) greater than of 2 (denote h2) and the BRL histogram of 2 greater than of 3 (denote h3) and so on (h3>h4>h5…). Opposite, from Fig. 3. (b) The BRL histogram of the stego 128 Baboon image and of the stego 7696 Baboon image (Fig. 3. (c) ) we see h1>h2, h2<h3,>
According to the RL algorithm, the column of h2 of the original Baboon image will be empty prior to embed message bits. After embedding data into the Baboon image, a part of h1 is changed to h2 based on above step 2 of RL’s algorithm.
Based on this observation, the following rule is designed to discriminate a stego binary image of the RL encoding scheme from a nature binary image:
In Equation (1), an image I is detected to be watermarked by the RL scheme if the measured value h2 ≤ h3. Otherwise, the checked image is an original image.
Fig. 3. The BRL histogram of Baboon images
We assume that the message bits are randomly distributed, then the expected probability of bit 0 and the probability bit 1 are the same, i.e. Pmessage_bit(0)»Pmessage_bit(1)=0.5. Therefore, embedding 128 (excluding bookkeeping data) message bits into Baboon image, the RL embedding process had to change about 69 BRLs of 1 to BRLs of 2 to embed bits 1. A part of BRL of 1 did not change to embed bits 0 by default. From Fig.3. (b) and (c), we can estimate length of embedded bits (denote Lembedded_bits) based on h2 as the following expression:
Lembedded_bits » (h2 *2) (2)
Appling to the stego 128 Baboon image, we get Lembedded_bits = 138 (including bookkeeping data), and to the stego 7696 Baboon image, we get Lembedded_bits = 9187 (including bookkeeping data).
In general case, user choose another T1 and T value, (1) and (2) will not suite, assume that user choose T1=4, T=2, that is message bits will be embedded into BRLs of 2 and BRLs of 3, BRLs of 3 will be empty prior to embed message. In this case, the BRL histogram of 3 will smaller than of 4 (h3 < h4). If T1=5 and T=3, then h4<h5, so on. Generally, we have the following general equation:
Where i=2 … , d, i is length of each BRL, d is a threshold, d should be selected smaller than 10, because user don’t usually choose T1 so great, that will embed few message bits. The detection process will check BRL histogram from i=2 to d. If find value i that make S(I)=true, the process will stop immediately, estimating length of message bits based on the following equation:
Lembedded_bits= hi*2 (4)
Other wise, if S(I)=false with all i, then Lembedded_bits will equal to zero.
4. Experimental Results
To show the feasibility of the proposed method, we take 700 images from content based image retrieval (CBIR) image database  and USC-SIPI Image Database . We then transform 700 into binary bitmap format by applying im2bw in Matlap to get a new image set (denote the cover-image set). Next, we embed a sequence of message with the various length (its max length is not greater than 2352 bits including bookkeeping data) into all image of the cover-image set with T=1 and T2=3 to obtain the stego-image set. The hidden messages used in our tests are produced by the pseudo random number generator.
Fig. 5. Detecting stego images on the cover- image set, obtaining 5 stego images and 695 cover images
Fig. 6. Detecting stego images on the stego – image set, obtaining 699 stego-images and 1 cover-image
We use our detection for the cover-image set to classify stego image based on (1) and (2), the program detects 5 stego-images (0,71%) and 695 cover-images (99,29%), resulting is shown in Fig. 5., for the stego-image set, the program detects 699 stego-images (99,86%) and 1 cover-images (0.14%), resulting is shown in Fig. 6.There the horizontal axis represents image number # and the vertical axis represents the estimated data length corresponding image number #.
Fig. 7. Detecting stego images on the im2bw – binary set, obtaining 10 stego-images and 40 cover-images
Fig. 8. Detecting stego images on the stego - im2bw - set, obtaining 50 stego-images and 0 cover-image
Fig. 9. Detecting stego images on the dither – binary set, obtaining 2 stego-images and 48 cover-images
Fig. 10. Detecting stego images on the stego - dither set, obtaining 50 stego-images and 0 cover-image
Another experimentation, we take 50 Mickey images of different small sizes from  and other websites. We then perform two transformations by applying im2bw and dither function in Matlap to get two binary image sets (denote im2bw-binary set and dither-binary set, each set has 50 binary image). We embed a sequence of bits with length not over 128 bits into two sets to get two new stego sets, denote stego-im2bw set and stego-dither set. We use our detection for four sets to classify stego images and cover images, Fig. 7. shows result for detection on the im2bw-binary set with 10 stego-images and 40 cover-images. Fig. 8. shows for the dither-binary set, 50 stego-images and zero cover-image. Fig. 9. illustrates for the stego-im2bw set, 2 stego-images and 48 cover-images. Fig. 10. depicts for the stego-dither set, detect 50 stego-images and zero cover-image.
We found out that, it’s very hard to classify a stego image for small images, because their black RL histogram is irregular, for example, Mickey image has size of 274x312 pixels (show Fig. 4. (a)), Fig.4. (b) is BRL histogram of the Mickey image. Thus, our above proposed method detect incorrectly.
Fig. 4. Tested cover Mickey image
However, we have another way to detect for these images. We embed a message bit sequence into the cover Mickey image with T1=3, T=1 to get the stego Mickey image. The length of message equals to the capacity of the cover Mickey image. We check the number of RL couple of ‘100’ (including RL couples with length of BRL equal 1 and length of WRL greater than 1, for example ‘100’, ‘1000’, ‘10000’, etc) and of ‘110’ (including RL couples with length of BRL equals 2 and length of WRL equal to and greater than 1, for example ‘1100’, ‘11000’, etc) in the stego Mickey image, their value are very close together. While these value in the cover Mickey image are very different. As above saying, the message bits are randomly distributed, the probability of bit 0 and the probability bit 1 are the same, i.e. Pmessage_bit(0)»Pmessage_bit(1)=0.5. That makes the number of ‘100’ equal to the number of ‘110’. Thus, we detect a stego image following this factor, i.e. if the number of RL couple including ‘100’ is close to the number of RL couple including ‘110’, otherwise, this image is a cover image. In general, we don’t know value of T and T1, we can check some of RL couple pairs, such as, RL couple pair including ‘100’ and ‘110’, ‘1100’ and ’1110’,…
In case, the length of message bit sequence is less than capacity of image, we can depend on another factor, which is a binary image is scanned from left to right and from top to bottom to create a sequence of black RL and white RL for embedding. Therefore, we scan a binary image following this order to test with ratios from 10% to 100% of size of image. However, a very small image has deviation of RL couple pairs is not great. So, it’s very hard to detect correctly.
This paper proposes a new steganalytic algorithm, which bases on histogram of black RL. Experimental results show that our methods are reliable. However, it is hard to detect stego-image correctly with small size. We acknowledge that there are many elements in our algorithms that can be changed or replaced with other elements.
. Zhicheng Ni, Yun-Qing Shi, Nirwan Ansari, and Wei Su, “Reversible Data Hiding”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No.3, pp. 354-362, 2006.
. Wen-Chung Kuo, Yan-Hung Lin, “On the Security of Reversible Data Hiding Based-on Histogram Shift”, ICICIC 2008, ISSN/ISBN 9780-769531618, pp. 174-177, 2008.
. Wen-Chung Kuo, Dong-Jin Jiang, Yu-Chih Huang, “A Reversible Data Hiding Scheme Based on Block Division”, Vol. 1, CISP., pp. 365-369, 2008.
. Chia-Chen Lin, Wei-Liang Tai, Chin-Chen Chang, Multilevel reversible data hiding based on histogram modification of difference images, Pattern Recognition 41 (2008), Science Direct Journal, pp 3582-3591.
. S. V. D. Pamboukian and H. Y. Kim, “Reversible data hiding and reversible authentication watermarking for binary images,” in VI Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais (SBSeg), Santos, SP, Brasil, 2006.
. C. C. Wang, C. C. Chang, X. Zhang, and J. K Jan. Senary Huffman Compression - A Reversible Data Hiding Scheme for Binary Images. International Workshop on Multimedia Content Analysis and Mining (MCAM07), Wei-Hai, China. pp. 351-360, Jun., 2007.
. Guorong Xuan, Yun Q. Shi, Peiqi Chai, Xuefeng Tong, Jianzhong Teng, Jue Li, "Reversible Binary Image Data Hiding By Run-Length Histogram Modification", The 19th International Conference on Pattern Recognition (ICPR 2008), December 8-11, 2008, Tampa, Florida, USA.
. USC-SIPI Image Database, “http://sipi.usc.edu/services/database/Database.htm”
. CBIR Image Database, University of Washington,http://www.cs.washington.edu/research/imagedatabase/groundtruth/..
.Mickey image, http://vector-cartoons.com.