This article describes the installation, configuration, and usage of Sqoop 1.4.6.
I. Installation and Configuration

1. Installing Sqoop

[hadoop@hdp01 ~]$ wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ tar -xzf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@hdp01 ~]$ mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /u01/sqoop

-- Edit the Sqoop environment script
[hadoop@hdp01 ~]$ cd /u01/sqoop/conf
[hadoop@hdp01 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@hdp01 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/u01/hadoop
export HADOOP_MAPRED_HOME=/u01/hadoop
export HBASE_HOME=/u01/hbase
export HIVE_HOME=/u01/hive
export ZOOCFGDIR=/u01/zookeeper/conf

-- Comment out the following in configure-sqoop (HCatalog and Accumulo are not installed here, so this suppresses the startup warnings)
#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#      HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi
#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

-- Edit the user environment variables
[hadoop@hdp01 ~]$ vi .bash_profile
export SQOOP_HOME=/u01/sqoop
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASSPATH=$SQOOP_CONF_DIR
export PATH=$PATH:$SQOOP_HOME/bin
[hadoop@hdp01 ~]$ source .bash_profile

-- Verify the Sqoop installation
[hadoop@hdp01 ~]$ sqoop version
2017-12-28 09:30:01,801 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
Alternatively, run sqoop-version.

-- Copy the JDBC drivers
Copy the MySQL, PostgreSQL, and Oracle JDBC drivers into $SQOOP_HOME/lib.
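If the drivers have already been downloaded locally, copying them looks like the following; the jar file names and versions below are illustrative, not necessarily the exact ones used in this environment:

[hadoop@hdp01 ~]$ cp mysql-connector-java-5.1.44-bin.jar /u01/sqoop/lib/
[hadoop@hdp01 ~]$ cp postgresql-42.1.4.jar /u01/sqoop/lib/
[hadoop@hdp01 ~]$ cp ojdbc7.jar /u01/sqoop/lib/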
II. Using Sqoop
1. Testing the JDBC driver connections

1.1 Sqoop and MySQL

[hadoop@hdp01 bin]$ sqoop list-tables --username root -P --connect jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
2017-12-28 09:38:19,587 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 09:38:23,067 [myid:] - INFO [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
PhoneTest
Phonehistory_store
tbAreaprefix
tbAreaprefix_bak
tbBill
tbBilltmp
tbCat
tbContact
tbDataPath
tbDeliverMsg
tbDeliverMsg2
tbDest
tbLocPrefix
tbMessage
tbPrice
tbReceiver
tbSSLog
tbSendState
tbSendState2
tbSmsSendState
tbTest
tbUser
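Note that the ? and = in jdbc:mysql://...?useSSL=false can be interpreted by the shell, so quoting the URL is safer. To avoid the interactive -P prompt, Sqoop also accepts --password-file; a minimal sketch, assuming the password is stored at /home/hadoop/.mysql.pwd (the path and contents are placeholders):

[hadoop@hdp01 ~]$ echo -n 'secret' > /home/hadoop/.mysql.pwd && chmod 400 /home/hadoop/.mysql.pwd
[hadoop@hdp01 ~]$ sqoop list-tables --username root --password-file file:///home/hadoop/.mysql.pwd --connect 'jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false'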
1.2 Sqoop and PostgreSQL
[hadoop@hdp01 ~]$ sqoop list-tables --username rhnuser -P --connect jdbc:postgresql://192.168.120.93:5432/rhndb
2017-12-28 09:40:24,842 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 09:40:29,775 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
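By default the PostgreSQL driver resolves tables through the search path, which normally means the public schema. If the tables live in a different schema, the driver's currentSchema URL parameter can be appended; the schema name below is illustrative:

[hadoop@hdp01 ~]$ sqoop list-tables --username rhnuser -P --connect 'jdbc:postgresql://192.168.120.93:5432/rhndb?currentSchema=reporting'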
1.3 Sqoop and Oracle
[hadoop@hdp01 ~]$ sqoop list-tables --username spwuser -P --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb --driver oracle.jdbc.driver.OracleDriver
2017-12-28 10:01:43,337 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 10:01:43,425 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
rhnservergroupmembers
rhntemplatestring
rhnservergrouptypefeature
rhnserverhistory
qrtz_fired_triggers
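Before running larger jobs against Oracle, it can be worth verifying the credentials and service name with a lightweight query; sqoop eval executes a single SQL statement and prints the result (the query below is just an example):

[hadoop@hdp01 ~]$ sqoop eval --username spwuser -P --connect jdbc:oracle:thin:@192.168.120.121:1521/rhndb --query "SELECT COUNT(*) FROM user_tables"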
1.4 Sqoop and Hive
Using the PostgreSQL table as the template, create a table named rhnpackagefile in Hive without importing any data; the data import itself is covered below.

[hadoop@hdp01 ~]$ sqoop create-hive-table --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --hive-database hivedb
2017-12-28 10:32:01,376 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 10:32:04,699 [myid:] - INFO [main:BaseSqoopTool@1353] - Using Hive-specific delimiters for output. You can override
2017-12-28 10:32:04,699 [myid:] - INFO [main:BaseSqoopTool@1354] - delimiters with --fields-terminated-by, etc.
2017-12-28 10:32:04,819 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 10:32:05,015 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 10:32:05,674 [myid:] - INFO [main:HiveImport@194] - Loading uploaded data into Hive
2017-12-28 10:32:09,089 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Class path contains multiple SLF4J bindings.
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,090 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Found binding in [jar:file:/u01/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-12-28 10:32:09,091 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-12-28 10:32:09,095 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-12-28 10:32:11,996 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - 
2017-12-28 10:32:11,996 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - Logging initialized using configuration in jar:file:/u01/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
2017-12-28 10:32:16,650 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 10:32:16,783 [myid:] - INFO [Thread-6:LoggingAsyncSink$LoggingThread@85] - Time taken: 3.433 seconds
2017-12-28 10:32:17,248 [myid:] - INFO [main:HiveImport@242] - Hive import complete.
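To confirm the table definition that create-hive-table produced, you can describe it from the Hive CLI (a quick check, not part of the original run):

[hadoop@hdp01 ~]$ hive -e 'DESCRIBE FORMATTED hivedb.rhnpackagefile;'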
2. Data Migration
2.1 PostgreSQL☞Hive

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --fields-terminated-by ',' --hive-import --hive-database hivedb --columns package_id,capability_id,device,inode,file_mode,username,groupname,rdev,file_size,mtime,checksum_id,linkto,flags,verifyflags,lang,created,modified --split-by modified -m 4
2017-12-28 11:24:46,666 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 11:24:48,891 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
2017-12-28 11:24:48,894 [myid:] - INFO [main:CodeGenTool@92] - Beginning code generation
2017-12-28 11:24:49,091 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 11:24:49,127 [myid:] - INFO [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 11:24:50,481 [myid:] - INFO [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/ca09f6bb133fa32808220902aedc0437/rhnpackagefile.jar
2017-12-28 11:24:50,493 [myid:] - WARN [main:PostgresqlManager@119] - It looks like you are importing from postgresql.
2017-12-28 11:24:50,493 [myid:] - WARN [main:PostgresqlManager@120] - This transfer can be faster! Use the --direct
2017-12-28 11:24:50,494 [myid:] - WARN [main:PostgresqlManager@121] - option to exercise a postgresql-specific fast path.
2017-12-28 11:24:50,495 [myid:] - INFO [main:ImportJobBase@235] - Beginning import of rhnpackagefile
2017-12-28 11:24:50,496 [myid:] - INFO [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 11:24:50,634 [myid:] - INFO [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 11:24:51,160 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:51,506 [myid:] - INFO [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 11:24:51,696 [myid:] - INFO [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
2017-12-28 11:24:53,801 [myid:] - INFO [main:DBInputFormat@192] - Using read commited transaction isolation
2017-12-28 11:24:53,805 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 11:24:53,805 [myid:] - INFO [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN("modified"), MAX("modified") FROM "rhnpackagefile"
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@64] - Generating splits for a textual index column.
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@65] - If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
2017-12-28 11:25:14,854 [myid:] - WARN [main:TextSplitter@67] - You are strongly encouraged to choose an integral split column.
2017-12-28 11:25:14,903 [myid:] - INFO [main:JobSubmitter@396] - number of splits:6
2017-12-28 11:25:14,997 [myid:] - INFO [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0009
2017-12-28 11:25:15,453 [myid:] - INFO [main:YarnClientImpl@236] - Submitted application application_1514358672274_0009
2017-12-28 11:25:15,485 [myid:] - INFO [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0009/
2017-12-28 11:25:15,486 [myid:] - INFO [main:Job@1334] - Running job: job_1514358672274_0009
2017-12-28 11:25:24,763 [myid:] - INFO [main:Job@1355] - Job job_1514358672274_0009 running in uber mode : false
2017-12-28 11:25:24,764 [myid:] - INFO [main:Job@1362] - map 0% reduce 0%
2017-12-28 11:26:00,465 [myid:] - INFO [main:Job@1362] - map 17% reduce 0%
2017-12-28 11:26:01,625 [myid:] - INFO [main:Job@1362] - map 50% reduce 0%
2017-12-28 11:26:03,643 [myid:] - INFO [main:Job@1362] - map 83% reduce 0%
2017-12-28 11:34:22,028 [myid:] - INFO [main:Job@1362] - map 100% reduce 0%
2017-12-28 11:34:22,035 [myid:] - INFO [main:Job@1373] - Job job_1514358672274_0009 completed successfully
2017-12-28 11:34:22,162 [myid:] - INFO [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=860052
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=913
        HDFS: Number of bytes written=3985558014
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters
        Killed map tasks=1
        Launched map tasks=7
        Other local map tasks=7
        Total time spent by all maps in occupied slots (ms)=1208611
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=1208611
        Total vcore-seconds taken by all map tasks=1208611
        Total megabyte-seconds taken by all map tasks=4331661824
    Map-Reduce Framework
        Map input records=18680041
        Map output records=18680041
        Input split bytes=913
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=4453
        CPU time spent (ms)=180780
        Physical memory (bytes) snapshot=1957969920
        Virtual memory (bytes) snapshot=30116270080
        Total committed heap usage (bytes)=1611661312
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=3985558014
2017-12-28 11:34:22,170 [myid:] - INFO [main:ImportJobBase@184] - Transferred 3.7118 GB in 571.0001 seconds (6.6566 MB/sec)
2017-12-28 11:34:22,174 [myid:] - INFO [main:ImportJobBase@186] - Retrieved 18680041 records.
2017-12-28 11:34:22,215 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM "rhnpackagefile" AS t LIMIT 1
2017-12-28 11:34:22,245 [myid:] - INFO [main:HiveImport@194] - Loading uploaded data into Hive
2017-12-28 11:34:28,609 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - 
2017-12-28 11:34:28,609 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - Logging initialized using configuration in jar:file:/u01/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
2017-12-28 11:34:31,619 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 11:34:31,622 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - Time taken: 1.666 seconds
2017-12-28 11:34:32,026 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - Loading data to table hivedb.rhnpackagefile
2017-12-28 11:36:14,783 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - OK
2017-12-28 11:36:14,908 [myid:] - INFO [Thread-98:LoggingAsyncSink$LoggingThread@85] - Time taken: 103.285 seconds
2017-12-28 11:36:15,363 [myid:] - INFO [main:HiveImport@242] - Hive import complete.
2017-12-28 11:36:15,372 [myid:] - INFO [main:HiveImport@278] - Export directory is contains the _SUCCESS file only, removing the directory.
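The TextSplitter warnings above appear because --split-by modified is treated as a textual column, which can produce uneven splits or, with case-insensitive collations, a partial import or duplicate records. A safer variant splits on an integral column; the sketch below assumes package_id is numeric (the log also suggests --direct for a PostgreSQL-specific fast path, though direct mode carries its own option restrictions):

[hadoop@hdp01 ~]$ sqoop import --connect jdbc:postgresql://192.168.120.93:5432/rhndb --table rhnpackagefile --username rhnuser -P --fields-terminated-by ',' --hive-import --hive-database hivedb --split-by package_id -m 4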
2.2 MySQL☞HDFS
[hadoop@hdp01 ~]$ sqoop import --connect jdbc:mysql://192.168.120.92:3306/smsqw --username smsqw -P --table tbDest --columns iMsgID,cDest,tTime,cSMID,iReSend,tLastProcess,cEnCode,tCreateDT,iNum,iResult,iPriority,iPayment,cState,tGpTime --split-by tGpTime --target-dir /user/DataSource/MySQL/tbDest
2017-12-28 14:36:52,550 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 14:36:55,496 [myid:] - INFO [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
2017-12-28 14:36:55,497 [myid:] - INFO [main:CodeGenTool@92] - Beginning code generation
Thu Dec 28 14:36:55 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2017-12-28 14:36:56,233 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest` AS t LIMIT 1
2017-12-28 14:36:56,253 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest` AS t LIMIT 1
2017-12-28 14:36:56,260 [myid:] - INFO [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/4a4024e6b2baa336939a9310f627636a/tbDest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 14:36:57,637 [myid:] - INFO [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/4a4024e6b2baa336939a9310f627636a/tbDest.jar
2017-12-28 14:36:57,650 [myid:] - WARN [main:MySQLManager@107] - It looks like you are importing from mysql.
2017-12-28 14:36:57,650 [myid:] - WARN [main:MySQLManager@108] - This transfer can be faster! Use the --direct
2017-12-28 14:36:57,650 [myid:] - WARN [main:MySQLManager@109] - option to exercise a MySQL-specific fast path.
2017-12-28 14:36:57,650 [myid:] - INFO [main:MySQLManager@189] - Setting zero DATETIME behavior to convertToNull (mysql)
2017-12-28 14:36:57,652 [myid:] - INFO [main:ImportJobBase@235] - Beginning import of tbDest
2017-12-28 14:36:57,653 [myid:] - INFO [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 14:36:57,820 [myid:] - INFO [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 14:36:58,229 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 14:36:58,581 [myid:] - INFO [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 14:36:58,770 [myid:] - INFO [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
Thu Dec 28 14:37:01 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2017-12-28 14:37:01,123 [myid:] - INFO [main:DBInputFormat@192] - Using read commited transaction isolation
2017-12-28 14:37:01,124 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 14:37:01,124 [myid:] - INFO [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN(`tGpTime`), MAX(`tGpTime`) FROM `tbDest`
2017-12-28 14:37:17,446 [myid:] - INFO [main:JobSubmitter@396] - number of splits:4
2017-12-28 14:37:17,541 [myid:] - INFO [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0012
2017-12-28 14:37:17,966 [myid:] - INFO [main:YarnClientImpl@236] - Submitted application application_1514358672274_0012
2017-12-28 14:37:17,996 [myid:] - INFO [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0012/
2017-12-28 14:37:17,996 [myid:] - INFO [main:Job@1334] - Running job: job_1514358672274_0012
2017-12-28 14:37:26,149 [myid:] - INFO [main:Job@1355] - Job job_1514358672274_0012 running in uber mode : false
2017-12-28 14:37:26,150 [myid:] - INFO [main:Job@1362] - map 0% reduce 0%
2017-12-28 14:39:52,733 [myid:] - INFO [main:Job@1362] - map 25% reduce 0%
2017-12-28 14:40:14,978 [myid:] - INFO [main:Job@1362] - map 75% reduce 0%
2017-12-28 14:40:43,183 [myid:] - INFO [main:Job@1362] - map 100% reduce 0%
2017-12-28 14:40:43,191 [myid:] - INFO [main:Job@1373] - Job job_1514358672274_0012 completed successfully
2017-12-28 14:40:43,321 [myid:] - INFO [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=573248
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=609
        HDFS: Number of bytes written=5399155888
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Killed map tasks=2
        Launched map tasks=6
        Other local map tasks=6
        Total time spent by all maps in occupied slots (ms)=724670
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=724670
        Total vcore-seconds taken by all map tasks=724670
        Total megabyte-seconds taken by all map tasks=2597217280
    Map-Reduce Framework
        Map input records=31037531
        Map output records=31037531
        Input split bytes=609
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=3675
        CPU time spent (ms)=588590
        Physical memory (bytes) snapshot=4045189120
        Virtual memory (bytes) snapshot=20141694976
        Total committed heap usage (bytes)=1943535616
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=5399155888
2017-12-28 14:40:43,329 [myid:] - INFO [main:ImportJobBase@184] - Transferred 5.0284 GB in 225.0893 seconds (22.8755 MB/sec)
2017-12-28 14:40:43,335 [myid:] - INFO [main:ImportJobBase@186] - Retrieved 31037531 records.
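The two SSL warnings in this run can be silenced the same way as in section 1.1, by appending useSSL=false to the connection URL (quoted so the shell leaves the ? and = intact):

[hadoop@hdp01 ~]$ sqoop import --connect 'jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false' --username smsqw -P --table tbDest --columns iMsgID,cDest,tTime,cSMID,iReSend,tLastProcess,cEnCode,tCreateDT,iNum,iResult,iPriority,iPayment,cState,tGpTime --split-by tGpTime --target-dir /user/DataSource/MySQL/tbDest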
2.3 HDFS☞MySQL
[hadoop@hdp01 ~]$ sqoop export --connect jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false --username smsqw -P --table tbDest2 --export-dir /user/DataSource/MySQL/tbDest
2017-12-28 16:03:18,922 [myid:] - INFO [main:Sqoop@92] - Running Sqoop version: 1.4.6
Enter password: 
2017-12-28 16:03:21,934 [myid:] - INFO [main:MySQLManager@69] - Preparing to use a MySQL streaming resultset.
2017-12-28 16:03:21,934 [myid:] - INFO [main:CodeGenTool@92] - Beginning code generation
2017-12-28 16:03:22,343 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest2` AS t LIMIT 1
2017-12-28 16:03:22,365 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: SELECT t.* FROM `tbDest2` AS t LIMIT 1
2017-12-28 16:03:22,373 [myid:] - INFO [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /u01/hadoop
Note: /tmp/sqoop-hadoop/compile/332a6c4b30e942c56cf7f507cdff5761/tbDest2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2017-12-28 16:03:23,752 [myid:] - INFO [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hadoop/compile/332a6c4b30e942c56cf7f507cdff5761/tbDest2.jar
2017-12-28 16:03:23,762 [myid:] - INFO [main:ExportJobBase@378] - Beginning export of tbDest2
2017-12-28 16:03:23,762 [myid:] - INFO [main:Configuration@1019] - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-28 16:03:24,011 [myid:] - INFO [main:Configuration@1019] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-28 16:03:24,738 [myid:] - INFO [main:Configuration@1019] - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2017-12-28 16:03:24,742 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2017-12-28 16:03:24,743 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 16:03:25,087 [myid:] - INFO [main:TimelineClientImpl@123] - Timeline service address: http://hdp01:8188/ws/v1/timeline/
2017-12-28 16:03:25,269 [myid:] - INFO [main:AHSProxy@42] - Connecting to Application History server at hdp01.thinkjoy.tt/192.168.120.96:10201
2017-12-28 16:03:27,400 [myid:] - INFO [main:FileInputFormat@281] - Total input paths to process : 4
2017-12-28 16:03:27,406 [myid:] - INFO [main:FileInputFormat@281] - Total input paths to process : 4
2017-12-28 16:03:27,484 [myid:] - INFO [main:JobSubmitter@396] - number of splits:4
2017-12-28 16:03:27,493 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2017-12-28 16:03:27,493 [myid:] - INFO [main:Configuration@1019] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-12-28 16:03:27,577 [myid:] - INFO [main:JobSubmitter@479] - Submitting tokens for job: job_1514358672274_0020
2017-12-28 16:03:28,062 [myid:] - INFO [main:YarnClientImpl@236] - Submitted application application_1514358672274_0020
2017-12-28 16:03:28,091 [myid:] - INFO [main:Job@1289] - The url to track the job: http://hdp01:8088/proxy/application_1514358672274_0020/
2017-12-28 16:03:28,092 [myid:] - INFO [main:Job@1334] - Running job: job_1514358672274_0020
2017-12-28 16:17:18,663 [myid:] - INFO [main:Job@1355] - Job job_1514358672274_0020 running in uber mode : false
2017-12-28 16:17:18,665 [myid:] - INFO [main:Job@1362] - map 0% reduce 0%
2017-12-28 16:17:34,148 [myid:] - INFO [main:Job@1362] - map 1% reduce 0%
2017-12-28 16:17:43,200 [myid:] - INFO [main:Job@1362] - map 2% reduce 0%
2017-12-28 16:17:55,269 [myid:] - INFO [main:Job@1362] - map 3% reduce 0%
......
2017-12-28 16:40:15,427 [myid:] - INFO [main:Job@1362] - map 100% reduce 0%
2017-12-28 16:40:32,491 [myid:] - INFO [main:Job@1373] - Job job_1514358672274_0020 completed successfully
2017-12-28 16:40:32,659 [myid:] - INFO [main:Job@1380] - Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=571960
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=5401517442
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=70
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Other local map tasks=1
        Rack-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=4931826
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4931826
        Total vcore-seconds taken by all map tasks=4931826
        Total megabyte-seconds taken by all map tasks=17675664384
    Map-Reduce Framework
        Map input records=31037531
        Map output records=31037531
        Input split bytes=2192
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=21815
        CPU time spent (ms)=1522470
        Physical memory (bytes) snapshot=3453595648
        Virtual memory (bytes) snapshot=20112125952
        Total committed heap usage (bytes)=477102080
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
2017-12-28 16:40:32,667 [myid:] - INFO [main:ExportJobBase@301] - Transferred 5.0306 GB in 2,227.9141 seconds (2.3122 MB/sec)
2017-12-28 16:40:32,671 [myid:] - INFO [main:ExportJobBase@303] - Exported 31037531 records.
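Re-running this export as-is would append duplicate rows to tbDest2. If the target table has a primary key, Sqoop can run an upsert-style export instead via --update-key/--update-mode; a sketch assuming iMsgID is that key:

[hadoop@hdp01 ~]$ sqoop export --connect 'jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false' --username smsqw -P --table tbDest2 --export-dir /user/DataSource/MySQL/tbDest --update-key iMsgID --update-mode allowinsert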
Appendix: a reference table of commonly used import and export parameters:
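A brief summary of the most common options (not exhaustive; run sqoop help import and sqoop help export for the full lists):

Option                                Applies to       Description
--connect <jdbc-uri>                  import/export    JDBC connection string
--username, -P, --password-file      import/export    Credentials (-P prompts interactively)
--table <name>                        import/export    Source table (import) or target table (export)
--columns <c1,c2,...>                 import/export    Subset of columns to transfer
--where <clause>                      import           Row filter applied to the source table
--query <sql>                         import           Free-form query used instead of --table
--split-by <column>                   import           Column used to partition work among mappers
-m, --num-mappers <n>                 import/export    Number of parallel map tasks
--target-dir <hdfs-path>              import           HDFS output directory
--export-dir <hdfs-path>              export           HDFS directory to read from
--fields-terminated-by <char>         import           Field delimiter written to the HDFS files
--input-fields-terminated-by <char>   export           Field delimiter expected in the HDFS files
--hive-import                         import           Load the imported data into Hive
--hive-database, --hive-table         import           Hive destination database/table
--direct                              import/export    Database-specific fast path (e.g. mysqldump)
--update-key <column>                 export           Export as UPDATE keyed on the given column
--update-mode <mode>                  export           updateonly (default) or allowinsert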
Reposted from: http://xmual.baihongyu.com/