Apache Hudi 0.12.2 Released
Long Term Support Release
We aim to maintain 0.12 for a longer period of time and provide stable releases through the latest 0.12.x versions for users to migrate to. This release (0.12.2) is the latest 0.12 release.
Migration Guide
This release (0.12.2) does not introduce a new table version, so no migration is needed if you are on 0.12.0. If you are migrating from an older release, please check the migration guides in the previous release notes, specifically the upgrade instructions in 0.6.0, 0.9.0, 0.10.0, 0.11.0, and 0.12.0.
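Since no table version change is involved, upgrading is usually just a matter of pointing the job at the 0.12.2 bundle. The sketch below is not part of the official notes; it shows a quick sanity check after the dependency bump, assuming Spark 3.3 with Scala 2.12 and a hypothetical table path.

```scala
// Minimal sketch (assumption, not from the release notes): after moving a Spark job
// to the 0.12.2 bundle, do a quick snapshot read of an existing table to confirm
// the upgrade. Launch spark-shell with a matching bundle, e.g.:
//   spark-shell --packages org.apache.hudi:hudi-spark3.3-bundle_2.12:0.12.2
val basePath = "s3a://my-bucket/hudi/my_table"     // hypothetical table location
val df = spark.read.format("hudi").load(basePath)  // snapshot read of the upgraded table
df.select("_hoodie_commit_time", "_hoodie_record_key").show(10, truncate = false)
```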
Bug Fixes
The 0.12.2 release is mainly intended for bug fixes and stability. The fixes span many components, including:
- DeltaStreamer
- Datatype/schema-related bug fixes
- Table services
- Metadata table
- Spark SQL
- Presto 稳定性/性能修复
- Trino 稳定性/性能修复
- Meta sync
- Flink engine
- Unit, functional, integration tests and CI
Release Notes
Sub-task
- [HUDI-5244] - Fix bugs in schema evolution client with lost operation field and not found schema
Bug
- [HUDI-3453] - Metadata table throws NPE when scheduling compaction plan
- [HUDI-3661] - Flink async compaction is not thread safe when use watermark
- [HUDI-4281] - Using hudi to build a large number of tables in spark on hive causes OOM
- [HUDI-4588] - Ingestion failing if source column is dropped
- [HUDI-4855] - Bootstrap table from Deltastreamer cannot be read in Spark
- [HUDI-4893] - More than 1 splits are created for a single log file for MOR table
- [HUDI-4898] - for mor table, presto/hive should respect payload class during merge parquet file and log file
- [HUDI-4901] - Add avro version to Flink profiles
- [HUDI-4946] - merge into with no preCombineField has dup row in only insert
- [HUDI-4952] - Reading from metadata table could fail when there are no completed commits
- [HUDI-4966] - Meta sync throws exception if TimestampBasedKeyGenerator is used to generate partition path containing slashes
- [HUDI-4971] - aws bundle causes class loading issue
- [HUDI-4975] - datahub sync bundle causes class loading issue
- [HUDI-4998] - Inference of META_SYNC_PARTITION_EXTRACTOR_CLASS does not work
- [HUDI-5003] - InLineFileSystem will throw NumberFormatException, cause the type of startOffset is int and out of bounds
- [HUDI-5007] - Prevent Hudi from reading the entire timeline's when performing a LATEST streaming read
- [HUDI-5008] - Avoid unset HoodieROTablePathFilter in IncrementalRelation
- [HUDI-5025] - Rollback failed with log file not found when rollOver in rollback process
- [HUDI-5041] - lock metric register conflict error
- [HUDI-5057] - Fix msck repair hudi table
- [HUDI-5058] - The primary key cannot be empty when Flink reads an error from the hudi table
- [HUDI-5061] - bulk insert operation don't throw other exception except IOE Exception
- [HUDI-5063] - totalScantime and other run time stats missing from commit metadata
- [HUDI-5070] - Fix Flaky TestCleaner test : testInsertAndCleanByCommits
- [HUDI-5076] - Non serializable path used with engineContext with metadata table initialization
- [HUDI-5087] - Max value read from metatable incorrect
- [HUDI-5088] - Failed to synchronize the hive metadata of the Flink table
- [HUDI-5092] - Querying Hudi table throws NoSuchMethodError in Databricks runtime
- [HUDI-5096] - boolean param is broken in HiveSyncTool
- [HUDI-5097] - Read 0 records from partitioned table without partition fields in table configs
- [HUDI-5151] - Flink data skipping doesn't work with ClassNotFoundException of InLineFileSystem
- [HUDI-5157] - Duplicate partition path for chained hudi tables.
- [HUDI-5163] - Failure handling w/ spark ds write failures
- [HUDI-5176] - Incremental source may miss commits if there are inflight commits before completed commits
- [HUDI-5185] - Compaction run fails with --hoodieConfigs
- [HUDI-5203] - Debezium payload does not handle null-field cases
- [HUDI-5228] - Flink table service job fs view conf overwrites the one of writing job
- [HUDI-5242] - Do not fail Meta sync in Deltastreamer when inline table service fails
- [HUDI-5251] - Unexpected avro dependency in flink 1.15 bundle
- [HUDI-5253] - HoodieMergeOnReadTableInputFormat could have duplicate records issue if it contains delta files while still splittable
- [HUDI-5260] - Insert into sql with strict insert mode and no preCombineField should not overwrite existing records
- [HUDI-5277] - RunClusteringProcedure can't exit correctly
- [HUDI-5286] - UnsupportedOperationException throws when enabling filesystem retry
- [HUDI-5291] - NPE in column stats for null values
- [HUDI-5320] - Spark SQL CTAS does not propagate Table properties to actual SparkSqlWriter
- [HUDI-5325] - Fix Create Table to propagate properly Metadata Table enabling config
- [HUDI-5336] - Fix log file parsing to consider "." at the beginning
- [HUDI-5346] - Fixing performance traps in CTAS
- [HUDI-5347] - Fix Merge Into performance traps
- [HUDI-5350] - oom cause compaction event lost
- [HUDI-5351] - Handle meta fields being disabled in Bulk Insert Partitioners
- [HUDI-5373] - Different fileids are assigned to the same bucket
- [HUDI-5375] - Fix re-using of file readers w/ metadata table in FileIndex
- [HUDI-5393] - Remove the reuse of metadata table writer for flink write client
- [HUDI-5403] - Input Format class has metadata table enabled for file listing unexpectedly by default
- [HUDI-5409] - Avoid file index and use fs view cache in COW input format
- [HUDI-5412] - Send the bootstrap event if the JM also rebooted
Improvement
- [HUDI-4526] - improve spillableMapBasePath disk directory is full
- [HUDI-4799] - improve analyzer exception tip when can not resolve expression
- [HUDI-4960] - Upgrade Jetty version for Timeline server
- [HUDI-4980] - Make avg record size calculated based on commit instant only
- [HUDI-4995] - Dependency conflicts on apache http with other projects
- [HUDI-4997] - use jackson-v2 replace jackson-v1 import
- [HUDI-5002] - Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement
- [HUDI-5027] - Replace hardcoded hbase config keys with HbaseConstants
- [HUDI-5045] - Add tests to integ test to test bulk_insert followed by upsert
- [HUDI-5066] - Support hoodie source metaclient cache for flink planner
- [HUDI-5102] - source operator(monitor and reader) support user uid
- [HUDI-5104] - Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter
- [HUDI-5111] - Add metadata on read support to integ tests
- [HUDI-5184] - Remove export PYSPARK_SUBMIT_ARGS="--master local[*]" from HoodiePySparkQuickstart.py
- [HUDI-5247] - Clean up java client tests
- [HUDI-5296] - Support disabling schema on read if not required
- [HUDI-5338] - Adjust coalesce behavior within "NONE" sort mode for bulk insert
- [HUDI-5344] - Upgrade com.google.protobuf:protobuf-java
- [HUDI-5345] - Avoid fs.exists calls for metadata table in HFileBootstrapIndex
- [HUDI-5348] - Cache file slices within MDT reader
- [HUDI-5357] - Optimize release artifacts' deployment
- [HUDI-5370] - Properly close file handles for Metadata writer
Test
- [HUDI-5383] - Test 0.12.2 release branch