Cloud for Beginners

I generally write tutorials about cloud and Hadoop in our private wiki. I realized that some of those tutorials might also be helpful to others, especially to someone who wants to get their hands dirty in this domain. So my primary intention here is to write about things that will help someone at the very beginning and show a path forward. If you are already an expert in this field, this may not be the right place for you. This blog is only for beginners. Happy Clouding!

Thursday, February 21, 2013

Sqoop import/export from/to Oracle

I love Sqoop! It's a fun tool to work with, and it's very powerful. Importing and exporting data from/to Oracle with Sqoop is pretty straightforward. One crucial thing to remember when working with Sqoop and Oracle together is to use all capital letters for Oracle table names (Oracle stores unquoted identifiers in upper case). Otherwise, Sqoop will not recognize the Oracle tables.

This is my database (Oracle) related information:

URL: 10.0.0.24
db_name (SID): test
Schema: Employee_Reporting
Username: username1
Password: password1
Tablename: employee (I'm going to import this table into HDFS with Sqoop)

Import from Oracle to HDFS:

Let's go through this option file (option.par):
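What follows is only a sketch of what such an option file could look like, pieced together from the connection details listed above and the parameters discussed in this section. The port (1521), the target directory and the mapper count are assumptions made for illustration, and the tool name (import) is left out of the file because it is given on the command line.

```shell
# Write a Sqoop options file: one option or value per line, '#' starts a comment.
cat > option.par <<'EOF'
# Connection details from the list above; port 1521 is an assumption.
--connect
jdbc:oracle:thin:@10.0.0.24:1521:test
--username
username1
--password
password1

# Oracle table name in all capital letters, qualified by its schema.
--table
EMPLOYEE_REPORTING.EMPLOYEE

# Columns in the imported files are separated by tabs.
--fields-terminated-by
"\t"

# Fully qualified name of the generated Java class.
--class-name
com.example.reporting.Employee

# Column used to split the work between map tasks, plus a row filter.
--split-by
ID
--where
"ID<=100000"

# Hypothetical HDFS destination; it must not exist before the import runs.
--target-dir
/user/username1/employee

--verbose

# Hypothetical number of parallel map tasks.
-m
4
EOF
```

Keeping each option and its value on separate lines is what the options-file format expects; quoted values must stay on a single line.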

Most of the parameters are self-explanatory. Notice that I'm providing the table name in all capital letters. How do you want the imported columns to appear in HDFS? That is controlled by the --fields-terminated-by parameter. Here I'm passing "\t", which means the columns (fields) of each row will be tab delimited after the import.

Sqoop generates a Java class (with a set of getters and setters) to represent each employee record; the name of that class is defined by the --class-name parameter. In this case it will create a class named Employee.java inside the com.example.reporting package. I'm using the --verbose parameter to print extra information while Sqoop is working. It's not mandatory, and you can leave it out if you want.

The --split-by parameter names the column used for splitting the import data across map tasks. Here, ID is the primary key of the table Employee. You can apply any WHERE clause to your import by passing it with the --where parameter. In the above example, Sqoop will import all rows from the table Employee where ID is less than or equal to 100000 (e.g. importing 100000 rows). You also need to provide an HDFS location to be used as the destination directory for the imported data (the --target-dir parameter). Remember that the target directory must not exist before you run the import command; otherwise Sqoop will throw an error. The last parameter, -m, is the number of map tasks to run in parallel for the entire import job.

Once you have your option file ready, you can execute the Sqoop import command as:

sqoop import --options-file option.par

Using an option file is not mandatory; I'm just using it for my convenience. You can also pass each of the parameters from your console and execute the import job. Example:
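As a rough sketch (not the exact command from my console), the same import with every parameter passed on the command line could look like the function below. The function name import_employee and the SQOOP_BIN variable are hypothetical conveniences, and the port, target directory and mapper count are assumptions, as before.

```shell
# Hypothetical wrapper around the import. SQOOP_BIN is an optional,
# made-up override for the sqoop binary; it defaults to "sqoop" on the PATH.
import_employee() {
  # --password on the command line is visible to other users;
  # Sqoop's -P option prompts for the password instead.
  "${SQOOP_BIN:-sqoop}" import \
    --connect "jdbc:oracle:thin:@10.0.0.24:1521:test" \
    --username username1 \
    --password password1 \
    --table EMPLOYEE_REPORTING.EMPLOYEE \
    --fields-terminated-by '\t' \
    --class-name com.example.reporting.Employee \
    --split-by ID \
    --where "ID<=100000" \
    --target-dir /user/username1/employee \
    --verbose \
    -m 4
}
```

Calling import_employee then starts the job. The single quotes around \t stop the shell from touching the backslash, so Sqoop receives the two characters and interprets them as a tab itself.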

Export from Hive to Oracle:

For export, I will reuse some of the parameters from the import, since they are common to both import and export jobs. Assume I processed (with MR jobs) the data generated by the import job and inserted the results into Hive tables. Now I want to export those Hive tables to Oracle. If you are familiar with Hive, you may know that Hive moves/copies data to its warehouse folder by default. I'm using Hortonworks's distribution, and in my case the table's folder inside Hive's warehouse is located at "/apps/hive/warehouse/emp_record". Here, emp_record is one of the Hive tables I want to export.

I have already created a matching table "Emp_Record" in my Oracle database inside the same schema "Employee_Reporting". To export the Hive table, I'm executing the following command:
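A sketch of such an export command, built from the parameters discussed in this section rather than copied from the original run. The function name export_emp_record and the SQOOP_BIN variable are hypothetical, and the port and mapper count are assumptions.

```shell
# Hypothetical wrapper around the export. SQOOP_BIN is an optional,
# made-up override for the sqoop binary; it defaults to "sqoop" on the PATH.
export_emp_record() {
  "${SQOOP_BIN:-sqoop}" export \
    --connect "jdbc:oracle:thin:@10.0.0.24:1521:test" \
    --username username1 \
    --password password1 \
    --table EMPLOYEE_REPORTING.EMP_RECORD \
    --export-dir /apps/hive/warehouse/emp_record \
    --input-fields-terminated-by '\t' \
    --input-lines-terminated-by '\n' \
    --input-null-string '\\N' \
    --input-null-non-string '\\N' \
    --verbose \
    -m 4
}
```

The Oracle table name is again written in all capital letters. The doubled backslash in '\\N' is intentional: Sqoop unescapes the value once, so the string it compares against the data is \N.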

Notice that instead of --target-dir, I'm using --export-dir. This is the location of the Hive table's warehouse folder, and the data will be exported from there.

Now assume that inside the warehouse directory I have a file 00000_1 (which contains the data of the Hive table Emp_Record) and some of its lines are:
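The rows below are invented purely for illustration (the column layout of ID, name, department and salary is an assumption, not the real Emp_Record schema). The snippet writes a small local file with the same shape: tab-separated fields, one row per line, and \N standing in for null.

```shell
# Hypothetical rows: ID, name, department, salary.
# In a printf format string, \t is a tab, \n a newline, and \\N a literal \N.
printf '1\tAlice\tSales\t5000\n'  > 00000_1
printf '2\tBob\t\\N\t4200\n'     >> 00000_1
printf '\\N\t\\N\t\\N\t\\N\n'    >> 00000_1

# Print the number of tab-separated fields in front of each row.
awk -F '\t' '{ print NF " fields: " $0 }' 00000_1
```

Every row should report four fields, including the last one, where all four values are \N.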

As you can see, the columns/fields are tab delimited and the rows are separated by a new line. We also see an entire row that contains null in every field (ideally you would not have null values, since you would filter them out in your M-R jobs). But say we have all kinds of values, so we need to tell Sqoop how to treat each of them. That is why I'm using the --input-fields-terminated-by parameter to say that the fields are tab delimited and the --input-lines-terminated-by parameter to separate the rows. From Sqoop's side, there are two kinds of columns: string columns and non-string columns. Sqoop needs to know which string value represents null. For that I'm using the --input-null-string and --input-null-non-string parameters for the two column types and passing '\\N' as their value, because in my case '\\N' is null.

Wrapping Up:

Sometimes you will face issues during export when your Oracle table has very tight constraints (e.g. not null, timestamp, values expected in a specific format, etc.). In that case, the best idea is to export the Hive table/HDFS data, without any modification, to a temporary Oracle table to make Sqoop happy :). Then write a SQL script to convert and filter the data into your expected values and load them into your main Oracle table.
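A minimal sketch of that second step, assuming a hypothetical staging table EMP_RECORD_STG whose columns are all VARCHAR2, and hypothetical column names (ID, NAME, HIRED_AT) plus a made-up timestamp format. The snippet only writes the SQL script to a file; adapt the conversions to your own constraints.

```shell
# Write a SQL script that converts staged text values and loads the main table.
cat > load_emp_record.sql <<'EOF'
-- Hypothetical names: EMP_RECORD_STG is the unconstrained staging table
-- that Sqoop exported into; ID, NAME and HIRED_AT are example columns.
INSERT INTO EMPLOYEE_REPORTING.EMP_RECORD (ID, NAME, HIRED_AT)
SELECT TO_NUMBER(ID),
       TRIM(NAME),
       TO_TIMESTAMP(HIRED_AT, 'YYYY-MM-DD HH24:MI:SS')
FROM   EMPLOYEE_REPORTING.EMP_RECORD_STG
WHERE  ID IS NOT NULL
AND    HIRED_AT IS NOT NULL;

COMMIT;
EOF
```

Run the script with your usual Oracle SQL client once the Sqoop export into the staging table has finished.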

The above two examples show only very basic Sqoop import and export jobs. There are a lot of settings in Sqoop you can use. Once you are able to run an export/import job successfully, I recommend trying the same job with different parameters to see how it goes. You can find all available options in the Sqoop user guide.

Note: For privacy reasons, I had to modify several lines of this post from my original post. So if you find that something is not working or you face any issues, please do not hesitate to contact me.

21 comments:

rohan, July 26, 2014 at 8:25 AM
This is very informative. We use sqoop import to pull data from Oracle tables into our Hive database. However, of late we are encountering Java heap size errors in the log for sqoop import. Increasing the mappers worked temporarily, but the same problem resurfaced in the following weeks. Any idea what this Java heap error with respect to sqoop import might be related to? Any insights would be very helpful.

Tanzir Musabbir, July 26, 2014 at 3:24 PM
Hi Rohan,
Thanks for stopping by. What's the size of the import you are running? Is it constant for all runs? We experienced a pretty similar issue, but in our case decreasing the mappers helped. We figured out that it happens (though not always) only when the export size is too small and we use more than 6 mappers. So the workaround we followed was to change the number of mappers dynamically based on input size. Hope it helps.

Anonymous, August 22, 2014 at 11:22 AM
Nice Post

Anonymous, August 22, 2014 at 11:24 AM
Hi, nice post. I tried to add a comment describing the problem I'm facing, but I'm not sure whether it reached you, so I'm publishing it again as an anonymous user. Importing CLOB data from an Oracle table into Hive/HBase goes fine. But after the import, the newlines/tabs in the CLOB data come through as '\x0A' and '\x09', which I don't want. Could you please let me know if there is any way to avoid such extra characters? Thanks in advance.

technobugs, September 8, 2014 at 3:34 AM
Hi, did you try importing Oracle views and synonyms to Hive?

Mufy, October 17, 2014 at 9:51 AM
Is there a way to preserve newline characters and special characters as is while importing from Oracle to Hive using Sqoop?

Unknown, November 3, 2015 at 3:54 PM
It looks like there is no activity on the post for a year, but I decided to try my luck and post my question here anyways :). Hopefully, one of you sqoop experts can guide me in the right direction.

I have a script to create a schema(TestV100), a table (Xy100) in that schema and export a tab delimited flat file to this oracle table.

This is the shell script: – ExportOracleTestV100.sh
#!/bin/bash
HOST=$1
USER=$2
PASS=$3
SCHEMA=$4
PORT=$5
SID=$6
SQOOP=/usr/bin/sqoop
JDBC="jdbc:oracle:thin:@$1:$5:$6"
SQOOP_EVAL="$SQOOP eval --connect $JDBC --username $USER --password $PASS --query"
#Create Schema and Tables;
${SQOOP_EVAL} "CREATE USER \"TestV100\" identified by \"password\""
${SQOOP_EVAL} "GRANT CONNECT TO \"TestV100\""
${SQOOP_EVAL} "ALTER USER \"TestV100\" QUOTA UNLIMITED ON USERS"
${SQOOP_EVAL} "DROP TABLE \"TestV100\".\"Xy100\""
${SQOOP_EVAL} "CREATE TABLE \"TestV100\".\"Xy100\"( \"a\" NVARCHAR2(255) DEFAULT NULL, \"x\" NUMBER(10,0) DEFAULT NULL, \"y\" NUMBER(10,0) DEFAULT NULL )"

## Load Data into tables; ##
SQOOP_EXPORT="/usr/bin/sudo -u hdfs $SQOOP export --connect ${JDBC} --username $USER --password $PASS --export-dir"
${SQOOP_EXPORT} "/hdfs_nfs_mount/tmp/oracle/TestV100/Xy100.txt" --table "\"\"$SCHEMA\".\"Xy100\"\"" --fields-terminated-by "\t" --input-null-string null -m 1

Input file: – cat /hdfs_nfs_mount/tmp/oracle/TestV100/Xy100.txt
c 8 3
a 1 4

Execution - sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose

Output:
[root@abc-repo-app1 rv]# sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/11/02 12:40:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.1
15/11/02 12:40:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/11/02 12:40:07 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
15/11/02 12:40:07 INFO manager.SqlManager: Using default fetchSize of 1000
15/11/02 12:40:07 INFO tool.CodeGenTool: Beginning code generation
15/11/02 12:40:08 INFO manager.OracleManager: Time zone has been set to GMT
15/11/02 12:40:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TestV100"."Xy100" t WHERE 1=0
15/11/02 12:40:08 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."Xy100". Please ensure that your table name is correct.
java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."Xy100". Please ensure that your table name is correct.
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1658)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

I see a lot of posts online about sqoop import and this error, and the solution is to change the table name to UPPERCASE in the command. But I am running export. Also, the Oracle table HAS to be created with mixed case in my environment.

I am running:
Sqoop version: 1.4.5-cdh5.4.1
Oracle Version: 11.1.0.6.0
Ojdbc Driver: ojdbc6.jar