Apache Pig - REPLACE()

Syntax

Given below is the syntax of the REPLACE() function. This function accepts the following three parameters:

string − The string in which the replacement is made. To work on a relation, we pass the name of the column that holds the string.
regExp − The string or regular expression to be replaced. This argument is interpreted as a regular expression, so every matching substring is replaced, and characters such as . and * have special meaning.
newChar − The new value that replaces each match.

grunt> REPLACE(string, 'regExp', 'newChar');
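Because the second parameter is a regular expression, it helps to see how such a replacement behaves on a few inputs. The sketch below mimics REPLACE in plain Java. It assumes that Pig's builtin delegates to `java.lang.String.replaceAll` and returns null instead of failing on null arguments or an invalid pattern. `ReplaceSketch` and its `replace` method are hypothetical names and are not part of Pig's API.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper that mimics Pig's REPLACE(string, 'regExp', 'newChar'),
// ASSUMING the builtin delegates to java.lang.String.replaceAll. Not Pig's API.
class ReplaceSketch {

    static String replace(String source, String regExp, String newChar) {
        // Assumption: null arguments produce null instead of an error.
        if (source == null || regExp == null || newChar == null) {
            return null;
        }
        try {
            // Every match of the regular expression is replaced, not only the first.
            return source.replaceAll(regExp, newChar);
        } catch (PatternSyntaxException e) {
            // Assumption: an invalid pattern also yields null rather than failing.
            return null;
        }
    }

    public static void main(String[] args) {
        // Plain text works because it contains no regex metacharacters.
        System.out.println(replace("Bhuwaneshwar", "Bhuwaneshwar", "Bhuw")); // Bhuw
        // An unescaped '.' matches ANY character, so all three characters are replaced.
        System.out.println(replace("a.b", ".", "-"));                        // ---
        // Escaping the dot, or quoting the whole pattern, matches a literal '.' only.
        System.out.println(replace("a.b", "\\.", "-"));                      // a-b
        System.out.println(replace("a.b", Pattern.quote("."), "-"));         // a-b
        // A character class: remove every digit.
        System.out.println(replace("id-001", "[0-9]", ""));                  // id-
    }
}
```

Under the same assumption, the replacement text also goes through `replaceAll`. In that text, `$` and `\` are special, so a literal dollar sign or backslash in newChar would need escaping.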

Example

Assume that there is a file named emp.txt in the HDFS directory /pig_data/ as shown below. Each line of this file holds an employee's id, name, age, and city.

emp.txt

001,Robin,22,newyork
002,BOB,23,Kolkata
003,Maya,23,Tokyo
004,Sara,25,London 
005,David,23,Bhuwaneshwar 
006,Maggy,22,Chennai
007,Robert,22,newyork 
008,Syam,23,Kolkata
009,Mary,25,Tokyo 
010,Saran,25,London 
011,Stacy,25,Bhuwaneshwar 
012,Kelly,22,Chennai

And, we have loaded this file into Pig with a relation named emp_data as shown below.

grunt> emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);
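The AS clause in this LOAD statement casts each field to its declared type. This is why an id stored as 001 appears as 1 in the Dump output later on. The following Java sketch approximates that per-line step for illustration only. It assumes simple comma-separated lines with no quoting, and `EmpLoadSketch` and its `parse` method are hypothetical, not part of Pig.

```java
// Hypothetical sketch of what LOAD ... USING PigStorage(',') AS (id:int, name:chararray,
// age:int, city:chararray) does to one line of emp.txt: split on ',' and cast each field.
// Assumes plain comma-separated lines with no quoted or embedded commas.
class EmpLoadSketch {

    static final class Emp {
        final int id;
        final String name;
        final int age;
        final String city;

        Emp(int id, String name, int age, String city) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.city = city;
        }

        @Override
        public String toString() {
            // Same tuple notation that Pig's Dump operator prints.
            return "(" + id + "," + name + "," + age + "," + city + ")";
        }
    }

    static Emp parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] f = line.split(",", -1);
        // Integer.parseInt("001") yields 1, so ids lose their leading zeros.
        // Unlike Pig, which would typically produce null for a field that fails
        // to cast, this sketch throws NumberFormatException.
        return new Emp(Integer.parseInt(f[0]), f[1], Integer.parseInt(f[2]), f[3]);
    }

    public static void main(String[] args) {
        System.out.println(parse("001,Robin,22,newyork"));      // (1,Robin,22,newyork)
        System.out.println(parse("005,David,23,Bhuwaneshwar")); // (5,David,23,Bhuwaneshwar)
    }
}
```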

Following is an example of the REPLACE() function. In this example, we replace the city name Bhuwaneshwar with its shorter form, Bhuw.

grunt> replace_data = FOREACH emp_data GENERATE (id,city),REPLACE(city,'Bhuwaneshwar','Bhuw');

For each tuple in the emp_data relation, the above statement generates the tuple (id,city) followed by the value of the city column with 'Bhuwaneshwar' replaced by 'Bhuw'. Cities that do not match are returned unchanged. The result is stored in the relation named replace_data. Verify the contents of the relation replace_data using the Dump operator as shown below.

grunt> Dump replace_data;
 
((1,newyork),newyork)
((2,Kolkata),Kolkata)
((3,Tokyo),Tokyo)
((4,London),London) 
((5,Bhuwaneshwar),Bhuw)
((6,Chennai),Chennai)
((7,newyork),newyork) 
((8,Kolkata),Kolkata)
((9,Tokyo),Tokyo) 
((10,London),London) 
((11,Bhuwaneshwar),Bhuw) 
((12,Chennai),Chennai)
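The following plain-Java sketch traces the FOREACH ... GENERATE statement row by row. It applies the same replacement to the (id, city) pairs from emp.txt and prints them in the tuple notation used by the Dump output above. It assumes that REPLACE behaves like Java's `String.replaceAll`. `ReplaceDataSketch` and `replaceData` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of:
//   replace_data = FOREACH emp_data GENERATE (id,city), REPLACE(city,'Bhuwaneshwar','Bhuw');
// assuming REPLACE behaves like String.replaceAll.
class ReplaceDataSketch {

    // The (id, city) columns of emp.txt after loading (ids already cast to int).
    static final int[] IDS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    static final String[] CITIES = {
        "newyork", "Kolkata", "Tokyo", "London", "Bhuwaneshwar", "Chennai",
        "newyork", "Kolkata", "Tokyo", "London", "Bhuwaneshwar", "Chennai"
    };

    static List<String> replaceData() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < IDS.length; i++) {
            String city = CITIES[i];
            // Cities without a match are passed through unchanged.
            String replaced = city.replaceAll("Bhuwaneshwar", "Bhuw");
            // Format as Dump does: the inner tuple (id,city) followed by the new value.
            out.add("((" + IDS[i] + "," + city + ")," + replaced + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        replaceData().forEach(System.out::println);
    }
}
```

Only rows 5 and 11 change. Every other city contains no match for the pattern, so its value is returned as is rather than as null.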
