Neta Art XL V2.0 - Better Dynamics, Anatomy and Epic Scenes

First release address:

Civitai - https://civitai.com/models/410737

Liblib - https://www.liblib.art/modelinfo/55b06e35dd724862b3524ff00b069fe8

I. Overview

We have updated Neta Art XL V2.0. Comparing to version 1.0, we have optimized the character’s posture dynamics, further strengthened hand stability, and the model is very good at telling stories with atmospheric epic scenes.
Main motivation:
- In V1.0, the default poses generated by our characters are relatively fixed, while the new version enhances the sense of dynamics, diverse character camera expressions, and more precise prompt control.
- On the basis of increasing more stylistic diversity, more stable anatomy has been maintained, especially the probability of having five fingers (instead of four or six) on the hands is now higher!
- Further training methods for issues such as Rectified Flow and Noise Offset have been better deployed. The current picture has a strong sense of epic, and can now show very bright and dark scenery (as shown in the figure below).

{
  "prompt": "1girl, full moon, moonlight shadow, cinematic lighting, battle field, fighting armies, castle, warriors around, waving, looking at viewer, Heterochromatic pupil, sitting on a cliff, very dark, epic scene, inferno, cowboy shot, depressed, angel wings, solo, white long hair, wings bangs, beautiful color, amazing quality,",
  "negative_prompt": "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, rating: sensitive, low contrast, signature, flexible deformity, abstract, low contrast, ",
  "resolution": "1344 x 768",
  "guidance_scale": 8,
  "num_inference_steps": 28,
  "sampler": "Euler a",
  "use_lora": null,
  "use_upscaler": {
    "upscale_method": "bislerp",
    "upscaler_strength": 0.65,
    "upscale_by": 1.5,
    "upscale_cfg": 11,
    "new_resolution": "2016 x 1152"
  }
}

Prompting guide

In Neta V2, we re-trained based on ArtiWaifu Diffusion V1.0 (AWA1). Therefore, we used the prompt order summarized in ArtiWaifu:

Tag order: art style ( by xxx ) - > character ( 1 frieren (sousou no frieren) ) - > race (elf) - > composition (cowboy shot) - > painting style ( impasto ) - > theme (fantasy theme) - > main environment (in the forest, at day) - > background (gradient background) - > action (sitting on ground) - > expression (expressionless) - > main characteristics (white hair) - > other characteristics (Twintails, green eyes, parted lips) - > clothing (wearing a white dress) - > clothing accessories (frills) - > other items (holding a magic wand) - > secondary environment (grass, sunshine) - > aesthetics ( beautiful color , detailed ) - > quality ( best quality) - > secondary description (birds, clouds, butterflies)

Prompt: (oil-pasto:0.6), half-pasto, anime coloring, 1girl, raiden shogun(genshin_impact), genshin_impact, solo, smile, sitting beside window, indoor, classroom, hand rest, japanese school uniform,

Negative prompt: ( worst quality : 1.3), low quality , lowres , messy , abstract, ugly, disfigured, bad anatomy, draft, deformed hands, fused fingers, signature, text, multi views

Sampler parameters: Eular a is recommended as default for 28 + inferences.

Neta Art XL 2.0 still supports a very wide range of CFGs (3-20). But this time we have re-aligned the standard CFG of the model to the normal range of 7-9 for everyone to switch experiments.

II. New style recommendations

We have updated some style activation words and optimized the training. To avoid too much confusion with the original meaning of these words, it is now recommended to add a style suffix when activating.

Classic anime(Activation word: anime coloring)	Optimize thick impasto(Activating word: oil-pasto style)	Semi-thick impasto(Activation word: half-impasto style)	Heavy training 2.5d(Activation word: 2.5d style)

Neta Art XL 2.0 updates a number of special artists, activated using the by xxx tag.

by chi4	by hitomio16	by ikky	by ciloranko	by yomu

You can refer to https://civitai.com/models/435207/artiwaifu-diffusion to find more supported artists

III. Better role dynamics

	1girl, jack-o’ challenge	1girl, upside-down	1boy, harry potter, magic wand, amazing quality	1girl, yoimiya(genshin impact), by as109, by antifreeze3, by sho (sho lwlw), by junny
Neta Art XL V2.0
Neta Art XL V1.0
Original question	Incoordination of limbs	Not distinguishing hands and feet	Scenes are more templated	V1’s stylization is not obvious and more like a fixed character illustration, while V2’s characters are very vivid and lively

IV. Epic scenes

Prompt: 1girl, amazing quality, ultra wide shot of Gal Gadot Wonder Woman, holding Mjölnir Hammer, fighting in THOR movie, action-packed, ultra-wide, battle scene, intense, high-quality digital camera, epic cinematic shot, high-resolution digital image, sharp focus, panchromatic film,

V. Training strategies

Model Training is divided into three stages.

Rough practice: Teach models basic knowledge and concepts of the second dimension.
Learning Rate: Constants 1e-5 (unet), 7.5e-6 (text encoder)
Equivalent batch size: 96
Optimizer: AdaFactor (False relative_step, scale_parameter and warmup_init)
Noise-free offset
Time step sampling: LogitNormal (ln3,1)
Minimum signal to noise ratio: 5
Note: Since the basic model has most of the prior knowledge, this stage is skipped in actual training.
Concept Reinforcement: This stage teaches and reinforces all the concepts that the model needs to learn, including artists and characters with less data. The core strategy is data weighting - increasing the weight of data with less data, that is, the number of times it is repeatedly trained in a single epoch, and supplemented with other strategies to help concept learning, such as randomly removing core features of characters.
The training parameters for the concept reinforcement stage are exactly the same as those for rough practice.
Refinement: This stage uses data with quality ratings of best and amazing to continue fine-tuning the model, freezes the text encoder, enables label randomization algorithm, adds noise offset of 0.0357, and uniformly samples time steps during training.
Learning Rate: Constants 2e-6 (unet), 1e-6 (text encoder)
Equivalent batch size: 48
Optimizer: AdaFactor (False relative_step, scale_parameter and warmup_init)
Noise offset: 0.0357
Time step sampling: uniform distribution (same as normal training)
Minimum signal to noise ratio: 5

VI. Terms of Use

The model was developed by Nieta Lab : Neta.art Lab - https://civitai.com/user/neta_art

Collaborative participants:

Euge: https://civitai.com/user/Euge\_
汤人烂: https://space.bilibili.com/8594480
Chenkin: https://civitai.com/user/Chenkin
Bo Dai: https://daibo.info/

Terms of Use:

The model was developed from ArtiWaifu and AAM XL using [Fair AI Public License 1.0-SD]
If you later modify, merge, or develop the model again, you need to open source the subsequent derived model .

VII. Summary and Outlook

Neta Art DiT follow-up training. Stay tuned and test our product for free: https://neta.art .

Bilibili: https://space.bilibili.com/505727005

RED: https://www.xiaohongshu.com/user/profile/63f2ebf2000000001001e206

Twitter: https://twitter.com/netaart_ai

Civitai：https://civitai.com/user/neta_art

Following is backlog for V1

This is the English Release Note for Neta Art XL V1.0, 中文翻译的发布报告请查看：

Neta Art XL V1.0 发布 - 最好用的 XL 二次元基础模型

Civitai - https://civitai.com/models/410737

Huggingface - https://huggingface.co/neta-art/neta-art-xl-1.0

SeaArt - https://www.seaart.me/models/detail/72e3e0818dd822314296be910b7a88b3

{
  "prompt": "1girl, by yoneyama mai, by maimuro, pseudo-impasto, waving, Heterochromatic pupil, (sitting on a neon planet), cowboy shot,amazing quality, depressed, angel wings, solo, white long hair, wings bangs, provocative, cosmic background, cables and wires contrast aesthetic, beautiful color, (amazing quality:1.5), masterpiece",
  "negative_prompt": "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, rating: sensitive, low contrast, signature, flexible deformity, abstract, low contrast, ",
  "resolution": "1344 x 768",
  "guidance_scale": 8,
  "num_inference_steps": 28,
  "sampler": "Euler a",
  "use_lora": null,
  "use_upscaler": {
    "upscale_method": "bislerp",
    "upscaler_strength": 0.65,
    "upscale_by": 1.5,
    "upscale_cfg": 11,
    "new_resolution": "2016 x 1152"
  }
}

I. Overview

Introducing Neta Art XL V1.0, the easiest-to-use SDXL Anime model so far.
Keywords: Best Character Coverage, Vivid storytelling, Diverse styles, Stable anatomy.
Major motivation:
- Better stability and anatomy for character visual storytelling purpose:
- Ordered prompt guide for model to easier follow prompts;
- A very good balance between better knowledge and stability.
Maintain a high ceiling standard for aesthetics across versatile anime art styles, while keeping the baseline of output appealing for general users.
Huge knowledge base, less loras for characters / styles / artists, so we make better use of static model acceleration techniques.

Characters Coverage - refer to both A3.1 + AIDXL lists in the following json file

characterfull.txt

cp2chs_aidvcnt10_2024-04.json

Prompting Guide

To avoid possible ambiguity in text prompt, and leave room for very complicated scene such as multi-character, we found enforcing an ordering in prompts leads to better instruct-following behaviors (Learn from NAI3 / Animagine3 / AIDXL). Specifically, we use the following order in Neta Art XL:

Tag Order: subject (1boy / 1girl) -> character (a girl named frieren from sousou no frieren series) -> Artist trigger (by xxx) -> race (elf) -> composition (cowboy shot) -> style (impasto style) -> theme (fantasy theme) -> main environment (in the forest, at day) -> background (gradient background) -> action (sitting on ground) -> expression (is expressionless) -> main characteristics (white hair) -> body characteristics (twintails, green eyes, parted lip) -> clothing (wearing a white dress) -> clothing accessories (frills) -> other items (a cat) -> secondary environment (grass, sunshine) -> aesthetics (beautiful color, detailed, aesthetic) -> quality ((best quality:1.3))

Prompt: 1girl, solo, character: lucy, white background, cowboy shot, hip cutout, sad expressions, pantyhose, best quality, beautiful color, by bigroll

More examples: Neta Art XL V1.0 生成样例（对外分享）

Negative prompts: (worst quality:1.3), low quality, lowres, messy, abstract, ugly, disfigured, bad anatomy, draft, deformed hands, fused fingers, signature, text, multi views

Sampler: Eular a normal as default, 28+ steps recommended.

One additional merit of Neta Art XL is that it supports a very wide range of CFGs (5 - 20 compared to 7 - 9 of previous models). While we empirically found higher CFG leads to more details and higher contrast, generally CFG 9 - 14 (important!) can be used for best results.

II. Highlight: Style Versatility

We carefully selected 13 style keys with good orthogonality and are commonly used in many scenarios, justified by usage data from Nieta AI (30M+ generations).

Having orthogonal styles means each style is effectively different from the others, allowing you to easily combine and create new styles without interference.

impasto	Flat color	Pseudo-impasto	Celluloid

2.5d	line-sketch	Pixel art	Korean Anime (trigger: kri-ani)

Cute (trigger word: chibi)	Chinese Traditional (trigger word: gufeng)	Dark Style (trigger word: helltaker maker)	Meme Sticker (trigger word: bqb)

Vibrant (trigger word: tiaotiaotang)	Remix: Gufeng + Pixel Art + Chibi	Remix: 2.5d + Korean Anime + line-sketch

Neta Art XL also includes a long list of artist styles, activated through by xxx clause.

By ask	By mochizukikei	By ziyun	By dino	By eku uekura

By hata	By poire	By rella	By fuzichoco	By void

Please refer to https://civitai.com/models/124189/anime-illust-diffusion-xl for a complete list of supported artists.

III. Expression, Posing, and Camera Angles

Face pinching, :o	Double v	Wiping tears	jumping-from-rooftop

Compared to other models, Neta excels at maintaining stability, prompt following ability, and anatomical accuracy even with challenging poses or camera angles that would cause degradation in other models. We compared our results to the second-best candidate models to highlight Neta’s advantages in these areas:

	1girl, running, grassland, closeup	Nahida, 1girl, female, solo, on side, (looking at viewer), forest, flora	1boy, lie on stomach	1boy, neuvillette (genshin impact), genshin impact, solo, (blue down jacket, white pants, brown snow boot, put both hands in the pocket:1.2), looking at viewer, snowy, ground with white snow and footprint, from above	1girl, arima kana, oshi no ko, solo, idol, idol clothes, one eye closed, red shirt, black skirt, black headwear, gloves, stage light, singing, open mouth, crowd, smile, pointing at viewer, masterpiece, best quality
Neta Art XL V1.0
AnimagineXL3.1
Fixed Issues	Figure not stable	Overfitted poses	Girls always appear	Not “looking at viewer”	Both great work!

IV. Multi-Character Scenes

Examples of group scenes with multiple characters

Prompt: 2people, a girl and a boy, kissing, wedding, pseudo-impasto

V. Text & Typography

Neta Art XL demonstrates good ability to keep poster-like text in good success rate.

Prompt: 1girl, double buns, purple and black dual-toned hair, twin buns hairstyle, holding a sign that says twin buns hairstyle, says "Merry christmas" in twin buns hairstyle, poster titled "Happy New Year", neta, best quality, intricate details, snowy forest, best quality, intricate details, 2024, best quality, intricate details,

VI. Training

Training workflow:

The model was trained based on AAM V10, with Neta dataset and AIDXL dataset; most regularization dataset are from AIDXL best quality data; The merge only occurs in final step: which we conclude crystal clearly that it’s “We merged 0.05 CLIP and 0.15 UNet input layers from Animagine 3.1”.

Data annotation combining multiple sources (Original prompt, CogVLM captions, WaifuTagger tags)
Post-processing techniques like semantic deduplication and hierarchical tag organization

a. Semantic Deduplication: This removed redundant tags by intelligently detecting when a higher-level tag (e.g. very long hair) semantically covered a lower-level one (e.g. long hair).

b. Tag Layering Algorithm: Tags were organized into hierarchical layers based on their priorities and related semantics (eg. by wlop influence the whole picture styling, while frills influence a small fraction). More dominant tags were placed in higher layers to prioritize their influence during training.

Dataset management tool from https://github.com/Eugeoter/waifuset

Using high-quality regularization data from AIDXL: High-quality regular datasets with “best” and “amazing” quality ratings from AIDXL. These datasets are manually selected and come with detailed annotations and natural language descriptions.
Finetuning on more knowledgeable base models like AAM, blending with AnimagineXL 3.1 Character Knowledge.

Challenges Faced:

Imbalance in learning different styles
Poor generalization for some styles to diverse scenes
Lack of details/texture in generations
Trigger word overlap with base model knowledge

Solutions Explored:

Data reweighting to balance style learning, and supplement diverse data per style.
Tuning sampling hyperparameters like minimum gamma and rectified flow. Rectified Flow is a training parameter that increases the sampling frequency in the middle time steps but weakens the weight of the model’s learning ability for small noises in the low time steps. This technique helps to improve the model’s ability to restore styles but requires the use of a knowledge-rich base model.
Randomizing / drop off trigger words during training.

VII. Evaluation（Not going to be a Research Paper, but useful for the team）

We know what CCIP and DPG-Bench are, but after a simple test we don’t think it fits our purpose very well, so we went with user study.

Nine models are evaluated using 16 styles (some unique to Neta Art, it not going to be very research oriented, thus biased :P) and 80 prompts. Each prompt generates 3 samples with different aspect ratios, resulting in an XYZ plot (Generated from https://github.com/talesofai/comfyui-browser).

XYZPlot of 3,840 samples, just for one-time evaluation. Here’s an online Example.

Evaluation is done on a 10-point scale across four axes under objective counting rules:

Prompt Following: Points deducted for every 5 images unrelated to the prompt.
Stability: Points deducted for distortions or breakages in heads, hands, feet, or composition.
Diversity: Points deducted for poor style restoration within each style column.
Generalizability: Points deducted for poor semantic generalization within each style column.

Additionally, 10 top model trainers provide subjective scores for Aesthetics, with an average calculated for each model.

VIII. License

Developed with ❤️ by: Neta.art Lab - https://civitai.com/user/neta_art

In collaboration with:

Euge: https://civitai.com/user/Euge\_
汤人烂: https://space.bilibili.com/8594480
Chenkin: https://civitai.com/user/Chenkin
Bo Dai: https://daibo.info/

Thanks to:

https://blog.novelai.net/introducing-novelai-diffusion-anime-v3-6d00d1c118c3

https://cagliostrolab.net/posts/animagine-xl-v3-release

https://civitai.com/models/269232/aam-xl-anime-mix

https://civitai.com/models/124189/anime-illust-diffusion-xl

https://github.com/deepghs/waifuc

Model type: Diffusion-based text-to-image generative model

License: We merged 0.05 CLIP and 0.15 UNet input layers from Animagine 3.1, thus [Fair AI Public License 1.0-SD]

IX. Conclusion and Future Work

Shortcomings:

Some characters are underfitted.
Styles are not activated well with long prompts.
Certain styles appear grayish at low CFG and short prompts. Partly explained in https://civitai.com/articles/4969.

Future Work:

Prepare larger training sets and more knowledge-based data to improve character, style, and detail handling.
Welcome others to join discussions, provide suggestions, and contribute to model advancement.

Neta Art XL 2.0 is on the way. Stay tuned with us, and test our product for FREE: http://neta.art/

Discord: https://discord.gg/AtRtbe9W8w

Twitter: https://twitter.com/netaart_ai

Civitai：https://civitai.com/user/neta_art