Apache Pig - LAST_INDEX_OF()
The LAST_INDEX_OF() function accepts a string value and a character. It returns the zero-based index of the last occurrence of the given character in the string, searching backward from the end of the string. If the character does not occur in the string, it returns -1.
Syntax
Given below is the syntax of the LAST_INDEX_OF() function.
grunt> LAST_INDEX_OF(string, 'character')
Example
Assume that there is a file named emp.txt in the HDFS directory /pig_data/ as shown below. This file contains the employee details such as id, name, age, and city.
emp.txt
001,Robin,22,newyork
002,BOB,23,Kolkata
003,Maya,23,Tokyo
004,Sara,25,London
005,David,23,Bhuwaneshwar
006,Maggy,22,Chennai
007,Robert,22,newyork
008,Syam,23,Kolkata
009,Mary,25,Tokyo
010,Saran,25,London
011,Stacy,25,Bhuwaneshwar
012,Kelly,22,Chennai
We have loaded this file into Pig as a relation named emp_data, as shown below.
grunt> emp_data = LOAD 'hdfs://localhost:9000/pig_data/emp.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
Given below is an example of the LAST_INDEX_OF() function. In this example, we find the index of the last occurrence of the letter 'g' in the name of every employee.
grunt> last_index_data = FOREACH emp_data GENERATE (id,name), LAST_INDEX_OF(name, 'g');
The above statement searches the name of each employee backward from the end and returns the zero-based index of the last occurrence of the letter 'g'. If the name doesn't contain the letter 'g', it returns the value -1.
The result of the statement will be stored in the relation named last_index_data. Verify the content of the relation last_index_data using the Dump operator as shown below.
grunt> Dump last_index_data;

((1,Robin),-1)
((2,BOB),-1)
((3,Maya),-1)
((4,Sara),-1)
((5,David),-1)
((6,Maggy),3)
((7,Robert),-1)
((8,Syam),-1)
((9,Mary),-1)
((10,Saran),-1)
((11,Stacy),-1)
((12,Kelly),-1)
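Because a missing character is reported as -1, the function can also serve as a "contains" test. The following is a minimal sketch that reuses the emp_data relation loaded earlier. The relation names g_names and b_index are hypothetical, chosen only for illustration. The second statement assumes the match is case-sensitive and uses the built-in LOWER() function to normalize the name first.

```pig
-- Keep only the employees whose name contains the letter 'g'.
grunt> g_names = FILTER emp_data BY LAST_INDEX_OF(name, 'g') != -1;
grunt> Dump g_names;

(6,Maggy,22,Chennai)

-- Assumption: matching is case-sensitive, so lowercase the name
-- before searching if 'B' and 'b' should be treated alike.
grunt> b_index = FOREACH emp_data GENERATE (id,name), LAST_INDEX_OF(LOWER(name), 'b');
```

With the sample data, only Maggy passes the filter, which agrees with the single non-negative index in the output above.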